Is Web Crawling Legal?
January 8, 2023 | By David Selden-Treiman | Filed in: web-crawler-development
The Legal Landscape of Web Crawling
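Generally, yes: web crawlers are legal when used responsibly. However, what you collect, how you collect it, and what you do with it can all create legal risk.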
The legality of web crawlers, also known as web spiders or web robots, is a complex and multifaceted issue that has implications for businesses, individuals, and the broader internet community. While web crawlers can be useful and even necessary for certain legitimate purposes, their use can also raise significant legal concerns, particularly when it comes to issues of privacy, copyright, and computer fraud.
Disclaimer
I’m not a lawyer, and I’m certainly not anyone’s lawyer. This isn’t legal advice. Treat it only as non-authoritative background information from an enthusiast. These are just some observations from a team that develops web crawlers frequently.
General Overview
Privacy
One of the primary legal considerations surrounding web crawlers is the issue of privacy. Web crawlers are designed to access and collect information from websites, and this can often include personal data such as names, addresses, and email addresses. In some cases, this information may be collected without the knowledge or consent of the individuals concerned, which can raise serious privacy concerns.
To address these concerns, many countries have implemented laws and regulations that govern the collection, use, and storage of personal data, including data gathered by web crawlers. For example, the European Union’s General Data Protection Regulation (GDPR) requires a lawful basis for processing personal data, such as the individual’s consent or the organization’s legitimate interests. It also requires that individuals be informed about how their data is collected and used. Similarly, the California Consumer Privacy Act (CCPA) gives California residents the right to know what personal information businesses collect about them and to request that it be deleted.
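One practical way to reduce privacy exposure is data minimization, one of the GDPR’s principles: don’t store personal details you don’t need. The Python sketch below assumes a crawler that only needs page text, and it scrubs email addresses and phone-number-like strings before saving that text. The function name `redact_personal_data` and both regular expressions are hypothetical and deliberately crude. They will miss many kinds of personal data, such as names and street addresses, and they may also over-redact. For example, some dates will be mistaken for phone numbers. Treat this as an illustration, not a compliance tool.

```python
import re

# Hypothetical, deliberately simple patterns. They are NOT exhaustive and will
# miss many kinds of personal data (names, street addresses, ID numbers).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Matches runs of 9+ digits/separators; this may also catch dates or other long numbers.
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact_personal_data(text: str) -> str:
    """Replace email addresses and phone-like numbers before storing scraped text."""
    text = EMAIL_RE.sub("[email removed]", text)
    text = PHONE_RE.sub("[phone removed]", text)
    return text


if __name__ == "__main__":
    sample = "Contact Jane at jane.doe@example.com or +1 (555) 010-9999."
    print(redact_personal_data(sample))
    # Contact Jane at [email removed] or [phone removed].
```

Redacting at ingestion time, before anything is written to disk, means the crawler never accumulates a store of personal data that would later need to be secured, disclosed, or deleted.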
Copyright Infringement
Another legal issue that can arise in relation to web crawlers is copyright infringement. Web crawlers are often used to scrape content from websites, which can include text, images, and other types of media. If this content is protected by copyright, the use of web crawlers to access and reproduce it without the permission of the copyright holder may constitute copyright infringement.
To avoid infringing on the copyrights of others, businesses and individuals using web crawlers should ensure that they have the necessary rights and permissions to use the content they collect. In some cases, this may require obtaining licenses or permission from the copyright holders. In other cases, it may be necessary to limit the scope of the crawl, for example by collecting facts and metadata rather than copying entire articles or images. Publicly accessible content is not automatically free of copyright.
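Checking a site’s robots.txt file is a common first step in limiting the scope of a crawl. It is not a copyright license, but it shows which paths the site operator has asked crawlers to avoid. The sketch below uses Python’s standard `urllib.robotparser` module. The user-agent string, the URLs, and the helper names `load_rules` and `filter_allowed` are hypothetical. To keep the example self-contained, it parses robots.txt text that is already in hand rather than downloading it. In a real crawler you might call `set_url()` and `read()` on the parser instead.

```python
from urllib import robotparser

# Hypothetical crawler identity; a real crawler should use its own name and contact URL.
USER_AGENT = "ExampleCrawler/1.0 (+https://example.com/bot)"


def load_rules(robots_txt: str) -> robotparser.RobotFileParser:
    """Parse robots.txt text that was already downloaded (no network access here)."""
    rules = robotparser.RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules


def filter_allowed(rules: robotparser.RobotFileParser, urls, user_agent: str = USER_AGENT):
    """Keep only the URLs that the site's robots.txt permits for this user agent."""
    return [url for url in urls if rules.can_fetch(user_agent, url)]


if __name__ == "__main__":
    robots_txt = "User-agent: *\nDisallow: /private/\nCrawl-delay: 5\n"
    rules = load_rules(robots_txt)
    urls = ["https://example.com/articles/1", "https://example.com/private/report"]
    print(filter_allowed(rules, urls))    # ['https://example.com/articles/1']
    print(rules.crawl_delay(USER_AGENT))  # 5
```

Honoring the `Crawl-delay` value between requests to the same site is a related courtesy. `crawl_delay()` returns `None` when the file doesn’t specify one, so a crawler would need its own default delay in that case.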
Computer Fraud (CFAA)
In addition to the privacy and copyright issues discussed above, the use of web crawlers can also raise concerns related to computer fraud and other cybercrimes. For example, web crawlers may be used to engage in activities such as click fraud, in which the crawler is used to artificially inflate the number of clicks on an advertisement. Similarly, web crawlers may be used to scrape sensitive data from websites, such as login credentials or financial information, which can then be used for nefarious purposes.
To combat these types of crimes, many countries have enacted laws that criminalize unauthorized access to computer systems, including access by web crawlers used for fraudulent or malicious purposes. For example, the U.S. Computer Fraud and Abuse Act (CFAA) makes it a federal crime to access a computer without authorization or to exceed authorized access. This law has been used to prosecute individuals and businesses who used automated tools to engage in illegal activities. The CFAA also gives private parties a civil cause of action, so it appears in lawsuits as well as in criminal prosecutions.
Criminal Cases in the US
There have been several legal cases in the United States involving the use of web crawlers. Here are two examples:
1. In 2011, federal prosecutors charged Aaron Swartz, a prominent internet activist, with using an automated script to download a large number of articles from the online academic database JSTOR through MIT’s network. Swartz was indicted on multiple felony counts, including wire fraud and violations of the Computer Fraud and Abuse Act, and faced the possibility of decades in prison. The case generated significant controversy. Prosecutors dropped the charges after Swartz’s suicide in January 2013.
2. In 2018, the U.S. Department of Justice indicted a Ukrainian man, Yuriy Kril, for using a web crawler to scrape data from LinkedIn. Kril was accused of using the data to create a database of personal information on millions of LinkedIn users, which he then sold to other individuals and organizations. Kril was ultimately sentenced to 18 months in prison for his actions.
These are just a few examples of legal cases involving the use of web crawlers in the United States. It is worth noting that while the use of web crawlers may be legal in some circumstances, it can also be illegal if it is used to engage in fraudulent or malicious activities, or if it violates the terms of service or other legal agreements.
Civil Cases in the US
On the civil side, improper use of a web crawler can lead to lawsuits. Here are some examples:
eBay Domestic Holdings, Inc. v. Craig Newmark and James Buckmaster
This case did not involve web crawling. eBay was a minority stockholder in Craigslist. In 2008 it sued Craigslist’s founders, Craig Newmark and James Buckmaster, in the Delaware Court of Chancery. The suit challenged board actions that diluted eBay’s stake, including the adoption of a stockholder rights plan (a “poison pill”).
In 2010 the court ruled that the founders had breached their fiduciary duties by adopting the rights plan, and it invalidated the plan. The case is an often-cited corporate-law decision rather than a precedent on scraping.
Craigslist Inc. v. 3Taps Inc.
Craigslist Inc. v. 3Taps Inc. was a lawsuit in which Craigslist, an online classifieds website, sued 3Taps. 3Taps was a data company that collected Craigslist’s listings and made them available to other services, such as the apartment-search site PadMapper. Craigslist sent 3Taps a cease-and-desist letter and blocked its IP addresses. 3Taps nevertheless kept collecting listings and used proxies to get around the block.
The case was heard in the U.S. District Court for the Northern District of California. In 2013 the court held that Craigslist had revoked 3Taps’ permission through the letter and the IP block. It ruled that 3Taps’ continued access could therefore be “without authorization” under the Computer Fraud and Abuse Act (CFAA), and it allowed that claim to proceed.
The parties settled in 2015. 3Taps agreed to pay Craigslist $1 million and to stop accessing its website.
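In CFAA disputes such as Craigslist v. 3Taps and Facebook v. Power Ventures, courts focused on access that continued after the site had clearly withdrawn permission through a cease-and-desist letter and IP blocking. The Python sketch below shows one conservative way a crawler might track that kind of refusal. It treats a 401 or 403 response, or a manual revocation such as one entered after receiving a letter, as a signal to stop crawling that domain entirely. It also backs off on 429 “Too Many Requests” responses. The `AccessPolicy` class, its method names, and the 60-second default are hypothetical choices for illustration. Following these rules does not by itself make a crawl lawful.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

DEFAULT_BACKOFF_SECONDS = 60.0  # hypothetical default when Retry-After is absent


@dataclass
class AccessPolicy:
    """Hypothetical per-domain bookkeeping for a crawler that stops when refused."""
    revoked: Set[str] = field(default_factory=set)

    def revoke(self, domain: str) -> None:
        """Stop crawling a domain, e.g. after receiving a cease-and-desist letter."""
        self.revoked.add(domain)

    def may_fetch(self, domain: str) -> bool:
        """Return False once access to the domain has been revoked."""
        return domain not in self.revoked

    def record_response(self, domain: str, status: int,
                        retry_after: Optional[str] = None) -> Optional[float]:
        """Update the policy from an HTTP status code.

        Returns the number of seconds to wait before the next request to this
        domain, or None if no extra wait is required.
        """
        if status in (401, 403):
            # Conservative choice: treat an access refusal as revoking permission.
            self.revoke(domain)
            return None
        if status == 429:
            # Retry-After may be seconds or an HTTP date; this sketch only parses seconds.
            if retry_after is not None and retry_after.strip().isdigit():
                return float(retry_after.strip())
            return DEFAULT_BACKOFF_SECONDS
        return None


if __name__ == "__main__":
    policy = AccessPolicy()
    print(policy.record_response("example.com", 429, "30"))  # 30.0
    policy.record_response("example.com", 403)
    print(policy.may_fetch("example.com"))                   # False
```

Treating every 403 as a revocation is stricter than necessary for sites that return 403 only on particular paths. A real crawler might scope the rule more narrowly. Still, when a site’s intent is unclear, erring toward stopping is the safer default.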
FTC v. LeadClick Media, LLC
FTC v. LeadClick Media, LLC did not involve web crawling; it concerned deceptive advertising. The U.S. Federal Trade Commission (FTC) sued LeadClick Media, an affiliate marketing network, because affiliates in its network promoted weight-loss products through fake news websites.
The district court in Connecticut granted summary judgment in favor of the FTC. LeadClick appealed to the U.S. Court of Appeals for the Second Circuit, which in 2016 affirmed LeadClick’s liability for deceptive practices. The appeals court also held that Section 230 of the Communications Decency Act did not shield LeadClick, because the company had participated in developing the deceptive content.
American Chemical Society v. ResearchGate GmbH
This case concerned copyright rather than web crawling. In 2018, the American Chemical Society (ACS), a professional organization for chemists, and the publisher Elsevier sued ResearchGate, a social networking website for scientists. They filed in the U.S. District Court for the District of Maryland. The plaintiffs alleged that ResearchGate infringed their copyrights by hosting and distributing journal articles that its users had uploaded.
Facebook, Inc. v. Power Ventures, Inc.
Facebook, Inc. v. Power Ventures, Inc. was a lawsuit in which Facebook sued Power Ventures, which ran a website that aggregated users’ accounts from multiple social networks. Power accessed Facebook using login credentials that its users supplied, and it used that access to send promotional messages to those users’ Facebook friends. Facebook sent Power a cease-and-desist letter and blocked its IP addresses, but Power continued to access the site.
The case was heard in the U.S. District Court for the Northern District of California, which ruled for Facebook. In 2016, the U.S. Court of Appeals for the Ninth Circuit held that Power Ventures violated the Computer Fraud and Abuse Act (CFAA) by continuing to access Facebook after receiving the cease-and-desist letter. However, the court reversed the finding under the CAN-SPAM Act and sent the case back for damages to be recalculated.
Associated Press v. Meltwater U.S. Holdings, Inc.
Associated Press v. Meltwater U.S. Holdings, Inc. was a lawsuit in which the Associated Press (AP), a news organization, sued Meltwater, a media intelligence company. Meltwater’s service crawled news websites and sent subscribers excerpts of articles, including AP stories. The AP argued that this copying infringed its copyrights.
The case was heard in the U.S. District Court for the Southern District of New York. In 2013 the court ruled in favor of the AP, holding that Meltwater’s copying was not fair use.
The parties settled later in 2013.
hiQ Labs, Inc. v. LinkedIn Corp.
hiQ Labs, Inc. v. LinkedIn Corp. was a lawsuit in which hiQ Labs, a data analytics company, sued LinkedIn, a professional networking website. LinkedIn had demanded that hiQ stop scraping public LinkedIn profiles and warned that further scraping would violate the Computer Fraud and Abuse Act (CFAA). In response, hiQ asked the court to stop LinkedIn from blocking its access.
The case was heard in the U.S. District Court for the Northern District of California. In 2017 the court issued a preliminary injunction prohibiting LinkedIn from blocking hiQ’s access to public profiles. LinkedIn appealed to the U.S. Court of Appeals for the Ninth Circuit, which upheld the injunction in 2019.
LinkedIn then petitioned the U.S. Supreme Court. In 2021 the Court vacated the decision and sent it back to the Ninth Circuit for reconsideration in light of Van Buren v. United States, a ruling that narrowed the CFAA’s “exceeds authorized access” clause. The Ninth Circuit reaffirmed the injunction in 2022. Later in 2022, the district court found that hiQ had breached LinkedIn’s User Agreement. The parties then settled, with hiQ agreeing to stop scraping LinkedIn.
Conclusion
In conclusion, while web crawlers can be a useful tool for legitimate purposes such as search engine optimization and data analysis, their use can also raise significant legal concerns. It is important for businesses and individuals using web crawlers to be aware of these issues and to ensure that their activities are in compliance with relevant laws and regulations. By taking these precautions, it is possible to use web crawlers in a manner that is both legal and ethical.
David Selden-Treiman is Director of Operations and a project manager at Potent Pages. He specializes in custom web crawler development, website optimization, server management, web application development, and custom programming. Working at Potent Pages since 2012 and programming since 2003, David has extensive expertise in using programming to solve problems for dozens of clients. He also has extensive experience managing and optimizing servers, managing dozens of servers for both Potent Pages and other clients.