How to extract emails from a webpage
Extracting a list of email addresses is useful when you need contact details from multiple websites. Not every website publishes email addresses on the page, because more sites are moving to contact forms, but the steps below will help you get started.
# Getting started
To get started, create a new automation within Axiom.ai.
# Extracting email addresses
To extract email addresses from a single page, follow the steps below:
- Add a Go to page step to your automation and enter the URL of the website that you would like to extract email addresses from.
- Add a Write Javascript step and paste in one of the scripts from the Scripts section, choosing the variant that fits your needs.
- Add a Write to a Google Sheet step and write the code-data token to your Google Sheet.
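The scripts in this guide return the matches wrapped in an outer array, `[emailAddresses]`. Assuming the Write to a Google Sheet step treats the outer array as rows and each inner array as the columns of one row (an assumption worth checking against the Axiom.ai documentation), that shape places every address in a single row. If one address per row suits your sheet better, a minimal sketch with a hypothetical `toRows` helper might look like this:

```javascript
// Hypothetical helper (not part of Axiom.ai): reshape a flat list of
// addresses so that each address sits in its own inner array.
// Assumption: outer array = rows, inner arrays = columns of a row.
function toRows(emails) {
  return emails.map((email) => [email]);
}
```

In any of the scripts, `return toRows(emailAddresses);` would then take the place of `return [emailAddresses];`.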
To use this method to extract email addresses from multiple pages, follow the steps below:
- Add a Read data from a Google Sheet step. This sheet should contain the list of URLs to visit. See How to extract links and write to a Google Sheet for more details on how to do this.
- Add a Loop through data step and use the google-sheet-data token to loop through the list of URLs.
- Inside the loop, add the steps from the section above.
# Scripts
Retrieve all email addresses:
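function getEmails() {
  // Search the page's full HTML so addresses inside attributes (e.g. mailto: links) are found too.
  const pageHTML = document.body.innerHTML;
  // match() returns null when nothing is found, so fall back to an empty array.
  const emailAddresses = pageHTML.match(/([a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9._-]+)/gi) || [];
  // Return the matches wrapped in an outer array.
  return [emailAddresses];
}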
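Because the pattern is permissive, it can also pick up strings that are not addresses, such as retina image filenames ("logo@2x.png"), and it keeps a trailing full stop when an address ends a sentence. The sketch below is one possible way to clean up the matches. The helper name `extractLikelyEmails` and the list of image extensions are assumptions for illustration, not part of Axiom.ai.

```javascript
// Hypothetical helper: the name and the extension list are illustrative assumptions.
const ASSET_EXTENSIONS = /\.(png|jpe?g|gif|svg|webp)$/i;

function extractLikelyEmails(html) {
  // Same pattern as the scripts in this guide; match() returns null when nothing is found.
  const matches = html.match(/[a-z0-9._-]+@[a-z0-9._-]+\.[a-z0-9._-]+/gi) || [];
  return matches
    // Trim trailing ".", "_" or "-" picked up from surrounding punctuation.
    .map((candidate) => candidate.replace(/[._-]+$/, ""))
    // Drop matches that end in an image extension, such as "logo@2x.png".
    .filter((candidate) => !ASSET_EXTENSIONS.test(candidate));
}
```

On a live page this could be called as `extractLikelyEmails(document.body.innerHTML)` in place of the `match()` call.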
Retrieve all email addresses, remove duplicates:
function getEmails() {
  const pageHTML = document.body.innerHTML;
  // match() returns null when nothing is found, so fall back to an empty array.
  const emailAddresses = pageHTML.match(/([a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9._-]+)/gi) || [];
  // A Set keeps only the first occurrence of each address.
  return [[...new Set(emailAddresses)]];
}
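A Set compares strings exactly, so "Info@Example.com" and "info@example.com" count as two different entries. If you want those treated as the same address, one option is to lowercase everything before deduplicating. This is a minimal sketch with a hypothetical `dedupeEmails` helper. It assumes that ignoring case is acceptable for your data, which holds for most mail providers in practice but is not guaranteed for the part before the "@".

```javascript
// Hypothetical helper: case-insensitive de-duplication of a flat list of addresses.
function dedupeEmails(emails) {
  // Lowercase first so "Info@Example.com" and "info@example.com" collapse to one entry.
  const normalized = emails.map((email) => email.toLowerCase());
  // Spreading a Set back into an array preserves first-seen order.
  return [...new Set(normalized)];
}
```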
Retrieve all email addresses that match a domain:
function getEmails() {
  const pageHTML = document.body.innerHTML;
  // Replace "example.com" with your domain, keeping the backslash before each dot
  // so that "." matches a literal dot rather than any character.
  const emailAddresses = pageHTML.match(/([a-zA-Z0-9._-]+@example\.com)/gi) || [];
  return [emailAddresses]; // or return [[...new Set(emailAddresses)]];
}
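If you filter by several domains, editing the pattern by hand each time is error-prone. The sketch below builds the pattern from a domain string instead. The function name `getEmailsForDomain` and its parameters are assumptions for illustration. It also adds a lookahead so that a longer domain such as "example.community" or "example.com.au" is not reported as "example.com".

```javascript
// Hypothetical helper: find addresses at one domain in a string of HTML.
function getEmailsForDomain(html, domain) {
  // Escape regex metacharacters so the "." in the domain matches a literal dot.
  const escaped = domain.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // The lookahead rejects matches where the domain continues with more
  // letters, digits, hyphens, or another dotted label.
  const pattern = new RegExp(
    "[a-z0-9._-]+@" + escaped + "(?![a-z0-9-]|\\.[a-z0-9])",
    "gi"
  );
  // match() returns null when nothing is found, so fall back to an empty array.
  return html.match(pattern) || [];
}
```

On a live page this could be called as `getEmailsForDomain(document.body.innerHTML, "example.com")`.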
# Testing your workflow
Click "Run" on your automation, then watch the builder for any errors (see Common Errors for more details) and check your Google Sheet for changes.
Alternatively, if you are a developer, you can paste this code into the Chrome DevTools console and call getEmails() to test it outside of your automation.
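To check the pattern without depending on a particular page, it can help to run it against a string you control. The sketch below is one way to do that. `getEmailsFrom` is a hypothetical variant of the script that takes the HTML as an argument, and the sample markup is made up for the example.

```javascript
// Hypothetical variant of getEmails(): takes the HTML as a string so it
// can be tried on any input, not only the current page.
function getEmailsFrom(html) {
  // match() returns null when nothing is found, so fall back to an empty array.
  const emailAddresses = html.match(/[a-z0-9._-]+@[a-z0-9._-]+\.[a-z0-9._-]+/gi) || [];
  return [emailAddresses];
}

// Made-up sample markup for a quick check.
const sampleHTML =
  '<p>Write to <a href="mailto:sales@example.com">sales</a> or support@example.org</p>';

console.log(getEmailsFrom(sampleHTML));
```

On a live page, `getEmailsFrom(document.body.innerHTML)` in the console should return the same result as the script used in the automation.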
# Wrapping up
Extracting email addresses from a page can be helpful from a marketing perspective, but it should be used sparingly to avoid spamming potential customers. We are excited to see what you do with this, so let us know over in our community.