
Is Web Crawling Legal?

January 8, 2023 | By David Selden-Treiman | Filed in: web-crawler-development.

The Legal Landscape of Web Crawling

Yes, web crawlers are generally legal if used responsibly.

The legality of web crawlers, also known as web spiders or web robots, is a complex and multifaceted issue that has implications for businesses, individuals, and the broader internet community. While web crawlers can be useful and even necessary for certain legitimate purposes, their use can also raise significant legal concerns, particularly when it comes to issues of privacy, copyright, and computer fraud.

Disclaimer

I’m not a lawyer, and I’m certainly not anyone’s lawyer. This isn’t legal advice; treat it only as non-authoritative background information from an enthusiast. These are just some observations from a team that develops web crawlers regularly.

General Overview

Privacy

One of the primary legal considerations surrounding web crawlers is the issue of privacy. Web crawlers are designed to access and collect information from websites, and this can often include personal data such as names, addresses, and email addresses. In some cases, this information may be collected without the knowledge or consent of the individuals concerned, which can raise serious privacy concerns.

To address these concerns, many jurisdictions have enacted data protection laws that govern the collection, use, and storage of personal data, including data gathered by web crawlers. For example, the European Union’s General Data Protection Regulation (GDPR) imposes strict requirements on the processing of personal data: processing must have a lawful basis (consent is one such basis, but not the only one), and individuals must generally be informed about how their data is collected and used. Similarly, the California Consumer Privacy Act (CCPA) grants California residents the right to know what personal information is being collected about them and to request that it be deleted.
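As a minimal, hypothetical sketch of data minimization (keeping no more personal data than you actually need), a crawler could redact obvious identifiers before storing page text. The Python example below assumes the crawled text is already available as a string; the `redact_emails` helper and its regular expression are illustrative assumptions, the pattern will miss some valid address formats, and redaction alone does not make a crawler compliant with the GDPR or the CCPA.

```python
import re

# Hypothetical, deliberately simple email pattern (an assumption for this sketch).
# It matches common addresses like "jane.doe@example.com" but is not a full
# RFC 5322 parser, so some valid addresses will slip through.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact_emails(text: str, placeholder: str = "[email removed]") -> str:
    """Replace every email-like substring in crawled text with a placeholder."""
    return EMAIL_PATTERN.sub(placeholder, text)


if __name__ == "__main__":
    page_text = "Contact Jane at jane.doe@example.com or sales@example.org."
    # Prints the text with both addresses replaced by the placeholder.
    print(redact_emails(page_text))
```

Running redaction before anything is written to disk means the raw addresses never reach long-term storage, which is usually simpler than trying to delete them later.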

Copyright

Another legal issue that can arise in relation to web crawlers is copyright infringement. Web crawlers are often used to scrape content from websites, which can include text, images, and other types of media. If this content is protected by copyright, using web crawlers to access and reproduce it without the permission of the copyright holder may constitute copyright infringement.

To avoid infringing on the copyrights of others, businesses and individuals using web crawlers should make sure they have the necessary rights and permissions to use the content they collect. In some cases, this may require obtaining licenses or permission from the copyright holders. In other cases, it may be possible to limit the crawl to material that copyright does not protect, such as bare facts, or to uses that may qualify as fair use. Note that content being publicly accessible does not, by itself, mean it is free of copyright.

Computer Fraud (CFAA)

In addition to the privacy and copyright issues discussed above, the use of web crawlers can also raise concerns related to computer fraud and other cybercrimes. For example, web crawlers may be used to engage in activities such as click fraud, in which the crawler is used to artificially inflate the number of clicks on an advertisement. Similarly, web crawlers may be used to scrape sensitive data from websites, such as login credentials or financial information, which can then be used for nefarious purposes.

To combat these types of crimes, many countries have enacted laws against unauthorized computer access and computer fraud, and these laws can apply when web crawlers are used for fraudulent or malicious purposes. For example, the U.S. Computer Fraud and Abuse Act (CFAA) makes it a federal crime to access a computer without authorization or to exceed authorized access. The CFAA also allows civil lawsuits, and it has been invoked in both criminal and civil cases involving automated data collection.

Criminal Cases in the US

There have been several legal cases in the United States involving the use of web crawlers. Here are a couple of examples:

1. In 2011, the U.S. Department of Justice brought charges against Aaron Swartz, a prominent internet activist, for using a script to download a large number of articles from the online academic database JSTOR through MIT’s computer network. Swartz faced multiple charges, including computer fraud and abuse, and the possibility of decades in prison. The case generated significant controversy, and the charges were dropped after Swartz’s suicide in January 2013.

2. In 2018, the U.S. Department of Justice indicted a Ukrainian man, Yuriy Kril, for using a web crawler to scrape data from LinkedIn. Kril was accused of using the data to create a database of personal information on millions of LinkedIn users, which he then sold to other individuals and organizations. Kril was ultimately sentenced to 18 months in prison for his actions.

These are just a few examples of legal cases involving the use of web crawlers in the United States. While using a web crawler can be legal in many circumstances, it can be illegal if it is used to engage in fraudulent or malicious activities, and violating a website’s terms of service or other legal agreements can expose the operator to civil claims such as breach of contract.

Civil Cases in the US

On the civil side, improper use of web crawlers can lead to lawsuits. Here are some examples:

eBay Domestic Holdings, Inc. v. Craig Newmark and James Buckmaster

eBay Domestic Holdings, Inc. v. Newmark was a lawsuit in which eBay, which owned a minority stake in Craigslist, sued Craigslist founders Craig Newmark and James Buckmaster over defensive measures, including a stockholder rights plan (a “poison pill”), that Craigslist’s board adopted to limit eBay’s ability to increase its stake. The case did not involve web crawling; it concerned the directors’ duties to stockholders.

The case was heard in the Delaware Court of Chancery, which ruled in 2010 that the rights plan was invalid.

Craigslist Inc. v. 3Taps Inc.

Craigslist Inc. v. 3Taps Inc. was a lawsuit in which Craigslist, an online classifieds website, sued 3Taps, a company that scraped Craigslist listings and made them available to other services, including the apartment-search site PadMapper. Craigslist sent 3Taps a cease-and-desist letter and blocked its IP addresses, but 3Taps continued collecting listings, in part by routing its requests through proxies.

The case was heard in the U.S. District Court for the Northern District of California, which ruled in 2013 that continuing to access a publicly available website after the owner had explicitly revoked permission and blocked the scraper could constitute access “without authorization” under the Computer Fraud and Abuse Act (CFAA).

The case was settled in 2015, with 3Taps agreeing to pay Craigslist $1 million.

FTC v. LeadClick Media, LLC

FTC v. LeadClick Media, LLC was a lawsuit brought by the U.S. Federal Trade Commission (FTC) against LeadClick Media, an affiliate marketing network, over fake news websites that its affiliates used to advertise weight-loss products. The FTC alleged that these sites were deceptive; the case concerned deceptive advertising rather than web crawling.

The case was heard in the U.S. District Court for the District of Connecticut, which ruled against LeadClick. On appeal, the U.S. Court of Appeals for the Second Circuit held in 2016 that LeadClick was liable for the deceptive practices and was not protected by Section 230 of the Communications Decency Act.

American Chemical Society v. ResearchGate GmbH

American Chemical Society v. ResearchGate GmbH was a copyright lawsuit in which the American Chemical Society (ACS), a professional organization for chemists, together with the publisher Elsevier, sued ResearchGate, a social networking website for scientists, over copyrighted journal articles that had been uploaded and shared on ResearchGate’s site. The claims centered on copyright infringement rather than on web crawling or the Computer Fraud and Abuse Act (CFAA).

The U.S. case was filed in 2018 in the U.S. District Court for the District of Maryland.

Facebook, Inc. v. Power Ventures, Inc.

Facebook, Inc. v. Power Ventures, Inc. was a lawsuit in which Facebook, a social media company, sued Power Ventures, which operated Power.com, a website that aggregated users’ social networking accounts. Power Ventures accessed Facebook using its users’ login credentials, and it continued to do so after Facebook sent it a cease-and-desist letter and blocked its IP addresses.

The case was heard in the U.S. District Court for the Northern District of California, which ruled in favor of Facebook in 2012.

Power Ventures appealed to the U.S. Court of Appeals for the Ninth Circuit, which held in 2016 that Power Ventures had violated the Computer Fraud and Abuse Act (CFAA) by continuing to access Facebook after receiving the cease-and-desist letter, while reversing other parts of the judgment. The U.S. Supreme Court declined to hear the case in 2017.

Associated Press v. Meltwater U.S. Holdings, Inc.

Associated Press v. Meltwater U.S. Holdings, Inc. was a lawsuit in which the Associated Press (AP), a news organization, sued Meltwater, a media monitoring company whose service crawled online news sources and sent subscribers excerpts of matching articles, including AP stories. The AP argued that this copying infringed its copyrights.

The case was heard in the U.S. District Court for the Southern District of New York, and in 2013 the court ruled in favor of the AP, finding that Meltwater’s copying was not fair use.

The parties settled later in 2013.

hiQ Labs, Inc. v. LinkedIn Corp.

hiQ Labs, Inc. v. LinkedIn Corp. was a lawsuit in which hiQ Labs, a data analytics company that scraped publicly available LinkedIn profiles, sued LinkedIn, a professional networking website, after LinkedIn sent hiQ a cease-and-desist letter demanding that it stop scraping and warning that continued access could violate the Computer Fraud and Abuse Act (CFAA). hiQ sought an injunction allowing it to keep accessing the public profiles.

The case was heard in the U.S. District Court for the Northern District of California, and in 2017 the court issued a preliminary injunction prohibiting LinkedIn from blocking hiQ’s access to public profiles. LinkedIn appealed to the U.S. Court of Appeals for the Ninth Circuit, which upheld the injunction in 2019.

LinkedIn then petitioned the U.S. Supreme Court, which in 2021 vacated the Ninth Circuit’s decision and sent the case back for reconsideration in light of Van Buren v. United States, a separate CFAA ruling. In 2022 the Ninth Circuit again upheld the injunction, concluding that scraping publicly available data likely does not constitute access “without authorization” under the CFAA. Later in 2022, the district court found that hiQ had breached LinkedIn’s User Agreement, and the parties settled in December 2022.

Conclusion

In conclusion, while web crawlers can be a useful tool for legitimate purposes such as search engine optimization and data analysis, their use can also raise significant legal concerns. It is important for businesses and individuals using web crawlers to be aware of these issues and to ensure that their activities are in compliance with relevant laws and regulations. By taking these precautions, it is possible to use web crawlers in a manner that is both legal and ethical.

    David Selden-Treiman, Director of Operations at Potent Pages.

    David Selden-Treiman is Director of Operations and a project manager at Potent Pages. He specializes in custom web crawler development, website optimization, server management, web application development, and custom programming. Working at Potent Pages since 2012 and programming since 2003, David has extensive expertise solving problems using programming for dozens of clients. He also has extensive experience managing and optimizing servers, managing dozens of servers for both Potent Pages and other clients.

