Saturday, June 1 • 11:15am - 12:30pm
Scraping the unscrapable: advanced approaches to deal with complex sites and evade anti-scraping systems

Scraped data can often be the backbone of an investigation, but some websites are more difficult to scrape than others. This could be because of the sheer volume of data you need, or because of the way the site is built, which can make it hard to scrape either accidentally or deliberately. This session will cover best practices for dealing with tricky sites, including coping with captchas, using proxy and other scraping services, and the best ways to scale up your scraping by using the cloud. This is an advanced session aimed at people who already have experience of writing code to scrape websites and want to move up to the next level. Participants will leave with an understanding of how to approach hard-to-scrape websites, plus the tradeoffs and costs involved.

Speakers
Max Harlow

Financial Times
Max Harlow works on the visual and data journalism team at the Financial Times, focusing on investigations. He also runs Journocoders, a group for journalists to develop technical skills for use in their reporting.


Saturday June 1, 2024 11:15am - 12:30pm CEST
2.10