How to configure the web scraper
Our web scraping steps have a set of configuration options optimized for accuracy ahead of speed. We favor accuracy over speed because webpages often keep loading data for several seconds after the page has appeared in the browser. If you scrape too quickly, data can be missed.
However, you can of course edit these options to alter the performance of your scraper, including the speed. We recommend that when changing any of these settings, you test your scraper and check that the output is as desired.
To access these configuration options, toggle "Configure scraper" to on.
# Wait time between scrolls (ms)
This setting adjusts how long the scraper waits between scrolls. Some page types load content as you scroll. Our algorithms account for this, but if content loads slowly you may need to increase the wait time.
# Number of attempts when results not found
Our scraping algorithms retry when no results are found, because not all content loads when the page does. Many dynamic sites update content after the page first appears.
Reducing the number of attempts can greatly speed up your scrape. If the page has several steps before the scrape step, try reducing the number of attempts to 0. Test this setting, as it can also have a negative impact: with fewer attempts, content that loads late may be missed.
This is the setting we most often use to speed up our scrapers, though we strongly recommend that you test any change.
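The trade-off can be illustrated with a minimal Python sketch. This is not the scraper's actual implementation: `scrape_with_retries`, `scrape_once`, `wait_ms`, and `make_slow_page` are hypothetical names, and the fake page simply returns nothing on its first two reads to mimic content that loads late.

```python
import time
from typing import Callable, List


def scrape_with_retries(
    scrape_once: Callable[[], List[str]],
    attempts: int,
    wait_ms: int = 0,
) -> List[str]:
    """Scrape once, then retry up to `attempts` more times while nothing is found."""
    results = scrape_once()
    retries_left = attempts
    while not results and retries_left > 0:
        time.sleep(wait_ms / 1000)  # give the page time to load more content
        results = scrape_once()
        retries_left -= 1
    return results


def make_slow_page(empty_reads: int, rows: List[str]) -> Callable[[], List[str]]:
    """Hypothetical stand-in for a page whose content appears after a few reads."""
    state = {"reads": 0}

    def scrape_once() -> List[str]:
        state["reads"] += 1
        return rows if state["reads"] > empty_reads else []

    return scrape_once


# With 0 attempts the scrape is fast but returns nothing for this slow page.
fast = scrape_with_retries(make_slow_page(2, ["row 1", "row 2"]), attempts=0)
# With 3 attempts the content is found on the second retry.
careful = scrape_with_retries(make_slow_page(2, ["row 1", "row 2"]), attempts=3)
```

In this sketch, `fast` ends up empty while `careful` contains both rows, which is why a lower setting should always be tested against the real page.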
# Minimum wait before scraping (ms)
This setting gives you granular control over how long the scraper pauses before each scrape. Give this a low setting to speed up your scrape, but do test any changes. If content is still loading, it may be missed.
# Page number to start scraping on
When scraping paginated pages, you can set the page number to start scraping from.
# Specify exact number of pixels to scroll instead of auto-scrolling
Our scrapers automatically scroll down the page to load new content, wait, then scrape before scrolling again to load more content. In most cases, you will not need to adjust this setting. However, if you do, it allows you to specify the number of pixels to scroll before stopping to scrape.
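The scroll, wait, scrape cycle with a fixed pixel step can be sketched in Python as follows. This is an illustration under stated assumptions, not the scraper's own code: `scroll_and_scrape`, `scrape_at`, and `wait` are hypothetical names, and the page is simulated by a function that returns whatever is visible at a given scroll position.

```python
from typing import Callable, List


def scroll_and_scrape(
    page_height_px: int,
    step_px: int,
    scrape_at: Callable[[int], List[str]],
    wait: Callable[[], None] = lambda: None,
) -> List[str]:
    """Scroll `step_px` at a time, waiting and scraping after every scroll."""
    if step_px <= 0:
        raise ValueError("step_px must be a positive number of pixels")
    collected: List[str] = []
    position = 0
    while position < page_height_px:
        position = min(position + step_px, page_height_px)  # scroll down
        wait()  # wait time between scrolls
        for item in scrape_at(position):  # scrape what is visible now
            if item not in collected:  # skip items already collected
                collected.append(item)
    return collected


# Hypothetical page: 1000 px tall, one new item becomes visible at each stop.
items = scroll_and_scrape(1000, 400, lambda y: [f"item at {y}px"])
```

Here the simulated scraper stops at 400, 800, and 1000 pixels. A smaller step means more stops and a slower scrape, while a larger step risks skipping content that only loads at intermediate positions.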
# Force a re-scrape after each page change
The final configuration setting is for use when the scraper returns the same data each loop. This is very rare, but just in case, you can tick a box to force a new scrape.
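To check whether your output shows this symptom, you could compare each page's rows with those of the previous page. The Python sketch below is illustrative only: `repeated_pages` is a hypothetical helper that runs on output you have exported yourself, and it is not part of the scraper.

```python
from typing import List


def repeated_pages(pages: List[List[str]]) -> List[int]:
    """Return the 1-based numbers of pages whose rows match the previous page."""
    repeats = []
    for index in range(1, len(pages)):
        if pages[index] == pages[index - 1]:
            repeats.append(index + 1)
    return repeats


# Hypothetical output from a three-page loop where page 2 repeats page 1.
output = [["a", "b"], ["a", "b"], ["c", "d"]]
stale = repeated_pages(output)
```

If a check like this reports repeated pages, forcing a re-scrape after each page change is worth trying.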