Scrape data from a list of links
Learn how to build an automation that loops through a list of links from a spreadsheet, using our Google Sheets integration, and extracts data from each page. Add steps from the Builder and select the data to scrape using the Selector Tool. Follow the design pattern below:
Design pattern: Loop through links scraping data
Tip: We recommend using pages that share an identical structure, for example a list of LinkedIn profiles, so that the Selector Tool does not throw any errors.
To get started quickly, see our template.
# Building the automation
There are multiple steps used within the automation:
- Read data from a Google Sheet step.
- Loop through data step.
- Go to page step.
- Get data from bot's current page step.
- Write data to a Google Sheet step.
- Delete row from a Google Sheet step.
# Setup
Prepare your Google Sheets spreadsheet. Add a single link per row.
| Col A |
|---|
| https://example.com |
| https://example.com |
| https://example.com |
Create a new Axiom.ai automation by opening the extension and clicking "+ New automation". Use the Step Finder to add new steps to your automation.
# Read in the list of links
Add a Read data from a Google Sheet step to your automation. Configure as follows:
- Spreadsheet: select the spreadsheet containing your list of links.
- Sheet name: (optional) select the sheet within the spreadsheet containing your list of links.
- First cell: (optional) the first row and column to start reading, for example: "A1".
- Last cell: (optional) the last row and column to read, for example: "A21".
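The First cell and Last cell values are written in spreadsheet A1 notation. As a rough illustration of what a range such as "A1" to "A21" covers, the sketch below converts A1-style references into row and column numbers. The function names `parse_cell` and `range_size` are hypothetical and are not part of Axiom.ai; this only models the notation, not how the step reads the sheet.

```python
import re


def parse_cell(ref: str) -> tuple[int, int]:
    """Convert an A1-style reference such as "A21" into (row, column), both 1-based."""
    match = re.fullmatch(r"([A-Za-z]+)([1-9][0-9]*)", ref.strip())
    if match is None:
        raise ValueError(f"not an A1-style cell reference: {ref!r}")
    letters, digits = match.groups()
    column = 0
    for letter in letters.upper():
        # Column letters work like a base-26 number where A=1 ... Z=26.
        column = column * 26 + (ord(letter) - ord("A") + 1)
    return int(digits), column


def range_size(first_cell: str, last_cell: str) -> tuple[int, int]:
    """Return (row count, column count) covered by first_cell..last_cell, inclusive."""
    first_row, first_col = parse_cell(first_cell)
    last_row, last_col = parse_cell(last_cell)
    if last_row < first_row or last_col < first_col:
        raise ValueError("last cell must not be above or to the left of first cell")
    return last_row - first_row + 1, last_col - first_col + 1


print(range_size("A1", "A21"))  # (21, 1)
```

In this model, "A1" to "A21" covers 21 rows of a single column, which matches a sheet holding one link per row.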
# Loop through the list of links
Add a Loop through data step to your automation. Click "Insert data" to select the google-sheet-data data token that was output from the "Read data from a Google Sheet" step.
# Perform actions using the loop data
The "Loop through data" step iterates through each row of the google-sheet-data token, in this instance the list of links. Within the "Loop through data" step, use the "Insert data" option to select the google-sheet-data token; this gives each step access to the current row as the loop iterates through the data.
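Conceptually, the loop behaves like iterating over a list of rows and picking one column from each. The sketch below is only an analogy in Python: the `google_sheet_data` list is a hypothetical stand-in for the google-sheet-data token, and skipping blank cells is a choice made for this example rather than documented Axiom.ai behaviour.

```python
# Hypothetical stand-in for the google-sheet-data token:
# a list of rows, where each row is a list of cell values.
google_sheet_data = [
    ["https://example.com"],
    ["https://example.com"],
    ["https://example.com"],
]


def links_from_rows(rows, column_index=0):
    """Yield the value of one column from each row, skipping blank cells."""
    for row in rows:
        if column_index < len(row) and row[column_index].strip():
            yield row[column_index].strip()


for link in links_from_rows(google_sheet_data):
    # In the automation, this is where the steps inside the loop would run.
    print("visit:", link)
```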
# Scrape data from pages
Steps should be added inside the "Loop through data" step.
To navigate to the page, add a Go to page step to your automation. Click "Insert data", select the google-sheet-data token and select the column that contains the link you wish for the automation to visit - you may need to click "Clear all" to ensure only one column is selected.
Once you have navigated to the link, add a Get data from bot's current page step. Click "Select" to open the Selector Tool and select the elements on the page that you would like to scrape.
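To see why pages with an identical structure matter, consider this minimal sketch using Python's standard `html.parser` module. It is not how the Selector Tool works internally; it assumes, purely for illustration, that the element being scraped is the first `<h1>` on each page. A page that lacks that element yields nothing.

```python
from html.parser import HTMLParser


class FirstHeadingParser(HTMLParser):
    """Collect the text of the first <h1> element in a page."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self.heading = None

    def handle_starttag(self, tag, attrs):
        # Only start collecting for the first <h1> encountered.
        if tag == "h1" and self.heading is None:
            self._inside = True
            self.heading = ""

    def handle_data(self, data):
        if self._inside:
            self.heading += data

    def handle_endtag(self, tag):
        if tag == "h1":
            self._inside = False


def scrape_heading(html):
    """Return the first <h1> text, or None when the page has no <h1>."""
    parser = FirstHeadingParser()
    parser.feed(html)
    parser.close()
    return parser.heading.strip() if parser.heading is not None else None


# Two pages with the same structure both yield a value; a different layout does not.
print(scrape_heading("<html><body><h1>Profile A</h1></body></html>"))
print(scrape_heading("<html><body><h1>Profile B</h1></body></html>"))
print(scrape_heading("<html><body><p>No heading here</p></body></html>"))
```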
# Manage Google Sheet
Steps should be added inside the "Loop through data" step.
To store the scraped data, add a Write data to a Google Sheet step. Configure as follows:
- Spreadsheet: select the spreadsheet to write to.
- Sheet name: (optional) select the sheet to write to.
- DATA: select the scraped-data token.
- Write options: select "Add to existing data" to append to the end of your sheet, select "Clear data before writing" to clear the sheet and write the new data.
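The difference between the two write options can be modelled with plain lists. This is a simplified sketch, not Axiom.ai's implementation: `write_rows` and its `clear_first` flag are hypothetical names, and the sheet is represented as a list of rows.

```python
def write_rows(sheet, new_rows, clear_first=False):
    """Return the sheet contents after writing new_rows.

    clear_first=False models "Add to existing data" (append to the end).
    clear_first=True models "Clear data before writing".
    """
    existing = [] if clear_first else list(sheet)
    return existing + [list(row) for row in new_rows]


scraped_pages = [[["Page 1 title"]], [["Page 2 title"]], [["Page 3 title"]]]

appended = []
cleared = []
for scraped_data in scraped_pages:
    appended = write_rows(appended, scraped_data)
    cleared = write_rows(cleared, scraped_data, clear_first=True)

print(appended)  # [['Page 1 title'], ['Page 2 title'], ['Page 3 title']]
print(cleared)   # [['Page 3 title']]
```

In this model, clearing before every write inside a loop leaves only the rows from the final iteration, so appending is usually the option that keeps the results from every link.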
Optionally, once you have scraped the data from the link, add a Delete row from a Google Sheet step to your automation to delete the link from the Google Sheet. Configure as follows:
- Spreadsheet: select the spreadsheet containing your list of links.
- Sheet name: (optional) select the sheet containing your list of links.
- First row to delete: set to 1.
- Last row to delete: set to 1.
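Deleting row 1 on every iteration makes the links sheet behave like a queue: the rows are read once before the loop starts, and the top row is removed after each link is handled. The sketch below models that with an in-memory list. The names `process_queue` and `handle` are hypothetical, and the idea that leftover rows let an interrupted run pick up where it stopped is an inference from this model, not documented behaviour.

```python
def process_queue(sheet_rows, handle):
    """Process links read from a sheet, deleting row 1 after each one.

    sheet_rows is a hypothetical in-memory stand-in for the links sheet.
    The rows are copied once up front (as the read step does), then the
    loop removes the first remaining row after each link is handled.
    """
    snapshot = list(sheet_rows)  # data read before the loop starts
    for row in snapshot:
        handle(row)              # go to page, scrape, write results
        del sheet_rows[0]        # first row to delete = 1, last row to delete = 1
    return sheet_rows


sheet = [["https://example.com/a"], ["https://example.com/b"], ["https://example.com/c"]]
visited = []


def handle(row):
    # Simulate a failure on the third link to show what remains in the sheet.
    if row[0].endswith("/c"):
        raise RuntimeError("page failed to load")
    visited.append(row[0])


try:
    process_queue(sheet, handle)
except RuntimeError:
    pass

print(visited)  # ['https://example.com/a', 'https://example.com/b']
print(sheet)    # [['https://example.com/c']]
```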
# Wrapping up
This design pattern is best suited to web pages that share an identical structure, for example ecommerce product listings or social media profile pages. Combine it with the How to extract links and write to a Google Sheet guide to quickly get started.