Archives: Robots.txt
Creating a Polite PHP Web Crawler: Checking robots.txt
May 31, 2018 | By David Selden-Treiman
In this tutorial, we create a PHP website spider that uses the robots.txt file to determine which pages we're allowed to download. We continue from our previous tutorials on building a robust web spider and expand it to check crawling permissions before downloading a page.