Advanced Web Scraping Tutorial Project

This repository is a companion to the article Advanced Web Scraping: Bypassing captcha, "403 Forbidden," and more. Please refer to the article for further details.

This is a Scrapy web scraper for the fictional Zipru torrent site. It is designed to bypass four distinct anti-scraping mechanisms:

User agent filtering.
Obfuscated JavaScript redirects.
Captchas.
Header consistency checks.
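Two of these mechanisms, user agent filtering and header consistency checks, are usually handled in a Scrapy project's settings. The fragment below is a minimal sketch, not the repository's actual settings file. `USER_AGENT`, `DEFAULT_REQUEST_HEADERS` and `DOWNLOADER_MIDDLEWARES` are real Scrapy settings, but the specific header values are illustrative assumptions, and the `zipru_scraper.middlewares.ThreatDefenceRedirectMiddleware` path is a hypothetical name for a custom middleware that would handle the JavaScript redirects and captchas.

```python
# settings.py: a minimal sketch, not the repository's actual file.

# User agent filtering: present a browser-like User-Agent instead of
# Scrapy's default "Scrapy/x.y (+https://scrapy.org)". The exact string
# is an illustrative assumption.
USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)

# Header consistency checks: send headers that a real browser with the
# User-Agent above would plausibly send. Values are assumptions.
DEFAULT_REQUEST_HEADERS = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
}

DOWNLOADER_MIDDLEWARES = {
    # Disable Scrapy's built-in redirect handling so that redirects to the
    # anti-bot page can be intercepted instead of followed blindly.
    "scrapy.downloadermiddlewares.redirect.RedirectMiddleware": None,
    # Hypothetical custom middleware (assumed module path) that would solve
    # the JavaScript redirect and captcha, then retry the original request.
    # 600 is the slot the built-in RedirectMiddleware normally occupies.
    "zipru_scraper.middlewares.ThreatDefenceRedirectMiddleware": 600,
}
```

Keeping the User-Agent and the other default headers together in one place makes it easier to keep them mutually consistent when one of them changes.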

The scraper is not actually functional because Zipru is not a real site. The code, however, is otherwise complete and can easily be adapted to work on other sites.
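When adapting the code to another site, one piece that typically changes is how a blocked request is recognized. The helper below is a minimal sketch, assuming the target site signals a block by redirecting to a fixed anti-bot page. The function name, the `/threat_defense.php` path and the set of redirect status codes are hypothetical choices for illustration; a real middleware would call something like this on each response before deciding whether to solve a challenge.

```python
from typing import Optional
from urllib.parse import urlparse

# Hypothetical path of the site's anti-bot page; replace it with whatever
# the target site actually redirects blocked clients to.
DEFENSE_PATH = "/threat_defense.php"

# HTTP status codes that carry a redirect in the Location header.
REDIRECT_STATUSES = {301, 302, 303, 307, 308}


def is_defense_redirect(status: int, location: Optional[str]) -> bool:
    """Return True if a response redirects to the anti-bot page.

    `location` is the value of the Location header (absolute or relative),
    or None if the header is absent.
    """
    if status not in REDIRECT_STATUSES or not location:
        return False
    # Compare only the path, so query strings such as "?defense=1" and the
    # scheme/host of an absolute URL do not affect the result.
    return urlparse(location).path == DEFENSE_PATH
```

For example, `is_defense_redirect(302, "/threat_defense.php?defense=1")` is true, while an ordinary redirect to a listing page or any 200 response is not.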
