In the era of big data, web crawlers have “evolved” from crawling web pages to harvesting data. As big data applications accelerate, the value of data has risen sharply, and data has become an important resource in market competition. Malicious crawling of data has become correspondingly frequent. According to incomplete statistics, more than ten judicial cases involving web crawlers have arisen since 2016, most of them civil cases.
These include the 2016 case of the “Chelailai” app crawling real-time bus data from the “Kumike” app; the 2017 case of Maimai illegally capturing and using Sina Weibo user information; the 2017 case of Toutiao transplanting content data from Sina Weibo “Big V” accounts without authorization; the 2019 case of the Shuabao app crawling short videos and user comments from the Douyin app; and the 2021 case of the “Extreme” website crawling WeChat official account articles.
Beyond civil cases, criminal cases involving the crawling of personal information are also increasing, and in many of them hundreds of millions of records were crawled. For example, a criminal judgment issued in June 2021 by the People’s Court of Suiyang District, Shangqiu City, Henan Province, showed that Lu and Li had used self-developed crawler software to crawl Taobao for eight months, illegally obtaining nearly 1.2 billion pieces of user information.
The web crawler, originally a technology-neutral tool, threatens to become a “pest” once it is turned to data crawling. Where malicious crawlers harvest data at will, unauthorized crawling, circumvention of the Robots protocol, and crawling of data from competitors in the same industry are typical patterns. Where does the legal boundary of web crawling lie? How can data companies protect their legitimate rights and interests? How should the chaos of malicious data crawling be regulated, and the industry guided toward healthy, compliant development? These are questions that industry participants urgently need to answer in the era of big data.
Unauthorized data crawling may constitute unfair competition
Policy support has brought data into the spotlight. On April 9, 2020, the “Opinions of the Central Committee of the Communist Party of China and the State Council on Building a More Complete Factor Market Allocation System and Mechanism” were officially released, recognizing data as a new type of factor of production.
However, supporting systems and regulations on data rights have not yet been issued, and illegal data crawling has already arrived. In 2014, Sina Weibo sued Maimai, alleging that Maimai had illegally captured and used Sina Weibo user information without users’ permission or the Weibo platform’s authorization, and had illegally obtained and used the correspondence between contacts in Maimai registered users’ mobile phone address books and Weibo users. This case is also known as the first big data unfair competition dispute.
In 2016, the People’s Court of Beijing Haidian District (hereinafter the “Beijing Haidian Court”) held at first instance that a network platform may assert rights against the unauthorized use of user data that it has collected and used with users’ consent. In 2017, on appeal, the Beijing Intellectual Property Court found that Maimai had illegally captured and used Sina Weibo user information without users’ permission or the Weibo platform’s authorization, which constituted unfair competition.
Whether the crawling and use were authorized became a key consideration in these judgments. The Beijing Intellectual Property Court held that when a third-party developer obtains user information through an Open API, it must follow the rule of “user authorization + platform authorization + user authorization”: the user agrees that the platform may provide the information to third parties, the platform authorizes the third party to obtain the information, and the user in turn authorizes the third party to use it. The user’s consent must be specific and clear, a free decision made with full knowledge. The industry calls this the “triple authorization principle”.
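To make the structure of the principle concrete, here is a minimal Python sketch that models it as a chain of three consents, all of which must be present. The class, field, and function names are hypothetical, and the model is purely illustrative: a boolean cannot capture the court’s further requirement that each consent be specific, clear, and informed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ApiAccessRequest:
    """Hypothetical record of the consents behind one Open API data request."""
    user_consents_platform_sharing: bool   # user agrees the platform may share data with third parties
    platform_authorizes_third_party: bool  # platform grants this third party access
    user_authorizes_third_party: bool      # user separately authorizes this third party's use


def missing_authorizations(req: ApiAccessRequest) -> list[str]:
    """Return the links of the triple authorization chain that are absent."""
    checks = [
        ("user -> platform consent", req.user_consents_platform_sharing),
        ("platform -> third-party authorization", req.platform_authorizes_third_party),
        ("user -> third-party authorization", req.user_authorizes_third_party),
    ]
    return [step for step, granted in checks if not granted]


def triple_authorization_satisfied(req: ApiAccessRequest) -> bool:
    """All three links must be present; any single gap breaks the chain."""
    return not missing_authorizations(req)


if __name__ == "__main__":
    # A crawler that bypasses the platform: the user consented, the platform did not.
    req = ApiAccessRequest(True, False, True)
    print(missing_authorizations(req))  # ['platform -> third-party authorization']
```

The sketch reports which link is missing rather than a bare yes/no, mirroring how the cases above turned on the absence of platform authorization in particular.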
The “triple authorization principle” has significantly influenced subsequent cases and has become a key prerequisite for third parties to crawl and use data. Malicious crawlers, however, often cross this red line. In practice, data crawling occurs mostly on e-commerce and content platforms. Take content platforms as an example. In September 2021, the Hangzhou Internet Court tried the case in which Sishi (Hangzhou) New Media Technology Co., Ltd. (hereinafter “Sishi Company”) crawled data from the WeChat official account platform. The court held that the “Extreme” website operated by Sishi Company violated the principle of good faith by using commercially valuable data that another operator had lawfully collected with users’ consent, which constituted unfair competition.
This is especially true in digital content, where data is the core competitive resource of the industry and the data that content platforms collect and analyze often has extremely high economic value. Requiring content platform operators to open their core competitive resources to competitors without limit would not only damage creators’ creative environment, stifle overall content production, and leave consumers’ demand for high-quality content unmet, but would also run counter to the essence of “interconnection and interoperability”, hindering the continuous renewal of high-quality content and the sustained development of the Internet industry.
Circumventing the Robots protocol violates business ethics
In cases involving web crawlers, the Robots protocol is an unavoidable topic. Its full name is the “Robots Exclusion Standard” (also called the Robots Exclusion Protocol). Through a robots.txt file, a website tells search engines which pages may be crawled and which may not, so the protocol functions as a kind of industry “gentlemen’s agreement”.
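To show how this works mechanically, the sketch below uses Python’s standard-library `urllib.robotparser` to read a robots.txt file and filter the URLs a crawler plans to fetch. The site, paths, and user-agent name are hypothetical; the point is that the check runs entirely on the crawler’s side.

```python
from urllib import robotparser

# Hypothetical robots.txt content for an example site (not any real platform's file).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /cart
"""


def build_parser(robots_txt: str) -> robotparser.RobotFileParser:
    """Parse robots.txt text directly, without any network access."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def permitted(parser: robotparser.RobotFileParser, user_agent: str, urls: list[str]) -> list[str]:
    """Keep only the URLs this user agent is allowed to crawl."""
    return [url for url in urls if parser.can_fetch(user_agent, url)]


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    candidates = [
        "https://example.com/products/1",
        "https://example.com/private/report",
        "https://example.com/cart?id=3",
    ]
    # Only the product page survives; /private/ and /cart are disallowed.
    print(permitted(rules, "ExampleBot", candidates))
```

A crawler that simply never calls `can_fetch()` meets no technical barrier at all, which is why the protocol depends on the goodwill of the crawling party.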
Under the Robots protocol, search engines crawl within the permissions each website owner grants. In practice, content on Taobao.com cannot be found through the Baidu search engine, whereas product information from JD Mall (Jingdong Mall) can. The reason is that Taobao banned Baidu’s crawler in 2008, while JD Mall did not say “no” to it. These choices are closely tied to each company’s considerations about traffic sources and commercial interests.
Taobao bans Baidu crawler Baiduspider from accessing its website
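A ban of this kind targets one crawler by its user-agent name (Baidu’s crawler identifies itself as Baiduspider). The sketch below assumes a hypothetical robots.txt that blocks Baiduspider from the whole site while admitting every other crawler; it is not a copy of Taobao’s actual file, and the product URL is invented. It again relies only on Python’s `urllib.robotparser`.

```python
from urllib import robotparser

# Illustrative rules: block one named crawler everywhere, allow all others.
# This is NOT a copy of Taobao's actual robots.txt.
ROBOTS_TXT = """\
User-agent: Baiduspider
Disallow: /

User-agent: *
Allow: /
"""


def crawl_permissions(robots_txt: str, agents: list[str], url: str) -> dict[str, bool]:
    """Report, for each user agent, whether robots.txt permits fetching url."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # can_fetch() selects the group whose User-agent matches the agent name,
    # falling back to the "*" group, and applies that group's rules.
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    url = "https://shop.example.com/item/123"  # hypothetical product page
    print(crawl_permissions(ROBOTS_TXT, ["Baiduspider", "Googlebot"], url))
    # {'Baiduspider': False, 'Googlebot': True}
```

The same per-agent mechanism is what lets a platform admit some search engines while excluding a specific competitor’s crawler.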
However, the Robots protocol is not mandatory. Driven by commercial interests, crawling and anti-crawling have become an offensive and defensive battle. Some malicious crawlers take the risk of breaking through a platform’s Robots protocol, leaving the crawled party able only to defend itself passively through technical means and to assert its rights actively through judicial proceedings.
Well-known Internet companies have also been drawn into this battle; Sina Weibo and Toutiao, for example, have clashed more than once.
In 2017, Sina Weibo sued Toutiao, alleging that Toutiao had captured content from Sina Weibo accounts without authorization. On May 17, 2021, the Beijing Haidian Court delivered its first-instance judgment in this unfair competition dispute, holding that a third party without the platform’s authorization may not capture user content. It found that Bytedance, Toutiao’s parent company, had used a “copy and paste” approach to transplant Sina Weibo content to Toutiao on a large scale and in a targeted manner, which constituted unfair competition, and ordered it to pay Sina Weibo 20 million yuan in compensation.
Bytedance, for its part, sued Sina Weibo, alleging that Weibo’s use of the Robots protocol to block Toutiao’s search crawler constituted unfair competition. The first-instance judgment went against Sina Weibo and was reversed on appeal. On October 8, 2021, the Beijing Higher People’s Court held in its final judgment that a web platform’s restriction of web crawlers through the Robots protocol is an exercise of the operator’s autonomy. In a sense, the Robots protocol has become a means for enterprises to maintain their core competitiveness and for the market to maintain orderly competition, so website operators should be allowed to restrict crawling by other web robots through it.
Notably, restricting web crawlers through the Robots protocol does not violate the business ethics of the Internet industry. Sina Weibo is not alone in doing so: other Internet companies, Bytedance included, also use the Robots protocol to expressly prohibit crawling of their content.
Judicial adjudication guides data compliance
In cases of illegal data crawling, the data services that malicious crawlers build often compete directly with, or even substitute for, those of the crawled party.
In their judgments, courts have also made clear that consumer welfare in the Internet field is not increased by using crawled data in an obviously substitutive or homogeneous way. Illegal crawlers that harvest data at no cost to gain a competitive advantage are plainly engaging in unjustified “free-riding”. For example, in the case of the “Extreme” website crawling WeChat official account data, the court found that the website violated the principle of good faith by using, without authorization, commercially valuable data that another operator had lawfully collected with users’ consent, thereby substantially replacing part of that operator’s products or services and damaging the order of fair market competition.
In determining that data companies enjoy competitive rights and interests in data, courts have also treated the operating costs of data companies, such as the human, material, and financial resources spent on collecting and organizing data, as an important consideration. For example, in the case in which Douyin sued Shuabao for crawling data, the court held that Microlive Vision (Weibo Shijie), the developer and operator of the Douyin App, had invested human and financial resources to accumulate users and short video content through legitimate, lawful operation. The Shuabao App, by directly obtaining video resources and comments without making a corresponding investment, plundered Microlive Vision’s operating results, damaged Douyin’s legitimate rights and interests, and thereby committed unfair competition.
A study of the reasoning in these judgments shows that, as data companies face unauthorized malicious crawling, judicial organs are continually exploring ways to protect data rights and interests and settle disputes. In existing judgments, courts have taken a positive attitude toward the legitimate rights and interests of data companies. When a data company’s data rights are infringed, it may hold the infringer liable in tort: where others take data without permission, it may demand that the infringer stop the infringement and delete the illegally obtained data; where the infringer’s negligence causes damage, it may demand compensation.
From the perspective of market economy development, moreover, if data practitioners, and data companies in particular, cannot reasonably and effectively control the data they collect and store, and crawlers can take it at will and come and go freely, data companies will obviously have no incentive to invest in collecting, storing, and using massive amounts of data, let alone to unlock the value it contains or develop new data products. Without that incentive, neither the data industry nor the era of big data can develop.
Although the law has not yet defined data rights, more and more industry cases are arising in judicial practice. These cases show that data “rights” or “interests” are not impossible to define: the legal facts and scenarios underlying such interests form a chain of “authorization” that is relatively clear and has been recognized to a considerable extent. In particular, courts’ exploration and determination of data rights in trials will provide many references and lessons for guiding data companies toward data compliance.
Text/Wang Qiongfei
Editor/Lu Wei