Web scraping products from Amazon using ScrapeStorm
In this article, we will show you how to scrape products from Amazon using ScrapeStorm’s “Flowchart mode”.
Introduction to the scraping tool
ScrapeStorm (www.scrapestorm.com) is a new-generation web scraping tool based on artificial intelligence. It is the first scraper to support all three major operating systems: Windows, Mac, and Linux.
Introduction to the scraping target
Amazon.com, Inc., doing business as Amazon, is an American electronic commerce and cloud computing company based in Seattle, Washington, that was founded by Jeff Bezos on July 5, 1994.
Official Website: https://www.amazon.com/
Scraping fields
title, title_link, Thumbnail, brand, rating, price, review
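ScrapeStorm extracts these fields through its GUI, so no code is needed for the tutorial itself. The sketch below only illustrates what one scraped row might look like once the raw text is turned into typed values. Several parts are assumptions: the sample formats ('$1,299.99', '4.5 out of 5 stars', '12,345'), the reading of review as a review count, and all class and function names.

```python
import re
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Product:
    """One scraped search-result row; field names follow the list above."""
    title: str
    title_link: str
    thumbnail: str
    brand: str
    rating: Optional[float]
    price: Optional[float]
    review: Optional[int]  # assumption: the review field holds a review count


def parse_price(text: str) -> Optional[float]:
    """'$1,299.99' -> 1299.99 (assumes US-style thousands separators)."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    return float(match.group().replace(",", "")) if match else None


def parse_rating(text: str) -> Optional[float]:
    """'4.5 out of 5 stars' -> 4.5 (takes the first number in the string)."""
    match = re.search(r"\d+(?:\.\d+)?", text)
    return float(match.group()) if match else None


def parse_count(text: str) -> Optional[int]:
    """'12,345' -> 12345 (keeps digits only)."""
    digits = re.sub(r"\D", "", text)
    return int(digits) if digits else None


def to_product(raw: Dict[str, str]) -> Product:
    """Build a typed record from one row of raw extracted text (hypothetical helper)."""
    return Product(
        title=raw.get("title", "").strip(),
        title_link=raw.get("title_link", ""),
        thumbnail=raw.get("Thumbnail", ""),
        brand=raw.get("brand", "").strip(),
        rating=parse_rating(raw.get("rating", "")),
        price=parse_price(raw.get("price", "")),
        review=parse_count(raw.get("review", "")),
    )
```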
Features covered
How to use the “Extract Data” component
Preview of the scraped result
Exported to Excel 2007: [screenshot not included]
Images exported to a local folder: [screenshot not included]
1. Download and install ScrapeStorm, then register and log in
(1) Go to the official ScrapeStorm website, then download and install the latest version.
(2) Click Register/Login to register a new account and then log in to ScrapeStorm.
Tip: You can use this web scraping software without registering. However, tasks created under the anonymous account will be lost when you switch to a registered account, so we recommend registering before you start.
2. Create a task
(1) Copy the URL of the Amazon page you want to scrape
See the ScrapeStorm tutorial on how to enter the URL correctly.
(2) Create a new flowchart mode task
You can create a new scraping task directly in the software, or you can create one by importing rules.
See the ScrapeStorm tutorial on how to import and export scraping rules.
3. Configure the scraping rules
(1) Set the fields
Click a product field on the page, then select “Extract all elements” in the prompt box in the upper-left corner.
See the ScrapeStorm tutorial on the “Extract Data” component for more details.
After extracting the fields on the product list page, you can right-click a field to adjust its settings, for example to rename it, add or delete fields, or modify the extracted data.
The field settings are as follows: [screenshot not included]
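In ScrapeStorm, renaming, deleting, and modifying fields are all done with the right-click menu. To show what those operations do to the data, the sketch below applies the same three changes to exported rows in Python. It is not ScrapeStorm's implementation. The helper names, the new field name image_url, and the sample row are hypothetical.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]


def rename_field(rows: List[Row], old: str, new: str) -> List[Row]:
    """Rename a column in every row, keeping the column order."""
    return [{(new if key == old else key): value for key, value in row.items()} for row in rows]


def delete_field(rows: List[Row], name: str) -> List[Row]:
    """Drop a column from every row."""
    return [{key: value for key, value in row.items() if key != name} for row in rows]


def modify_field(rows: List[Row], name: str, fn: Callable[[str], str]) -> List[Row]:
    """Apply fn to one column's value in every row, e.g. to trim whitespace."""
    return [{key: (fn(value) if key == name else value) for key, value in row.items()} for row in rows]


# Hypothetical sample row, for illustration only.
rows = [{"title": "  Electric Kettle  ", "Thumbnail": "https://example.com/k.jpg", "review": "1,024"}]
rows = rename_field(rows, "Thumbnail", "image_url")
rows = delete_field(rows, "review")
rows = modify_field(rows, "title", str.strip)
print(rows)  # [{'title': 'Electric Kettle', 'image_url': 'https://example.com/k.jpg'}]
```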
(2) Set the page
So far we have scraped the product data from a single page. To scrape the following pages as well, click the “Next Page” button on the page and select “Loop Click Next” in the prompt box that appears in the upper-left corner.
See the ScrapeStorm tutorial on how to select the next-page button manually.
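“Loop Click Next” keeps moving to the next page until no next page is left. A minimal sketch of that loop logic follows. It assumes a hypothetical fetch function that returns a page's items and the URL of its next page (or None). It also adds a cycle guard and a page cap, which are assumptions rather than documented ScrapeStorm behavior.

```python
from typing import Callable, List, Optional, Tuple

# A page is reduced to (items on the page, URL of the next page or None).
Page = Tuple[List[str], Optional[str]]


def scrape_all_pages(start_url: str, fetch: Callable[[str], Page], max_pages: int = 100) -> List[str]:
    """Follow "next page" links until none is left, a page repeats, or max_pages is reached."""
    items: List[str] = []
    seen = set()
    url: Optional[str] = start_url
    while url is not None and url not in seen and len(seen) < max_pages:
        seen.add(url)
        page_items, url = fetch(url)
        items.extend(page_items)
    return items


# Hypothetical three-page site standing in for real search-result pages.
fake_site = {
    "p1": (["a", "b"], "p2"),
    "p2": (["c"], "p3"),
    "p3": (["d"], None),
}
print(scrape_all_pages("p1", fake_site.__getitem__))  # ['a', 'b', 'c', 'd']
```

The page cap and the seen-set keep a misbehaving "next" link from looping forever.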
4. Set up and start the scraping task
(1) Running and Anti-block settings
Once the rules are configured, we can start the scraping task. Click “Start” and a task window pops up. It has a “More Settings” button, which you can click to adjust settings that improve stability and the success rate.
Click “Setting” and set the waiting time according to how quickly the web page loads. Keep the system defaults for the anti-block settings, then click “Save”.
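The waiting time should cover how long the pages actually take to load. The sketch below turns that advice into arithmetic under a hypothetical rule of thumb: take the slowest observed load time, add a 50% margin, and never go below one second. The factor, the minimum, and the function names are assumptions, not ScrapeStorm defaults.

```python
import time
from typing import Callable, List


def time_page_load(load: Callable[[], object]) -> float:
    """Return how many wall-clock seconds a page-load callable takes."""
    start = time.perf_counter()
    load()
    return time.perf_counter() - start


def suggest_wait_seconds(load_times: List[float], safety_factor: float = 1.5,
                         minimum: float = 1.0) -> float:
    """Pick a waiting time that covers the slowest observed load, with some margin."""
    if not load_times:
        return minimum
    return max(minimum, round(max(load_times) * safety_factor, 1))


print(suggest_wait_seconds([0.8, 2.0, 1.2]))  # 3.0
```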
(2) Start scraping data
Users on the Premium Plan and above can use “Scheduled job” and “Sync to Database”. If you want to download images, check “Download images while running”. Then click “Start”.
For details, see the ScrapeStorm tutorials on scheduled jobs, syncing to a database, and downloading images.
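“Sync to Database” is configured inside ScrapeStorm. As a rough illustration of what keeping a table in sync can mean, the sketch below writes scraped rows into SQLite and replaces any existing row with the same title_link, so repeated runs update prices instead of duplicating products. The table layout, the choice of title_link as the key, and the sample rows are all hypothetical. This is not how ScrapeStorm's feature is documented to work.

```python
import sqlite3
from typing import Dict, Iterable

COLUMNS = ("title_link", "title", "brand", "price", "rating")


def sync_rows(conn: sqlite3.Connection, rows: Iterable[Dict[str, str]]) -> None:
    """Insert rows, replacing any existing row with the same title_link."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS products ("
        "title_link TEXT PRIMARY KEY, title TEXT, brand TEXT, price TEXT, rating TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO products (title_link, title, brand, price, rating) "
        "VALUES (:title_link, :title, :brand, :price, :rating)",
        # Fill missing columns with "" so every named placeholder has a value.
        ({col: row.get(col, "") for col in COLUMNS} for row in rows),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
sync_rows(conn, [{"title_link": "https://example.com/product/1", "title": "Kettle", "price": "$19.99"}])
sync_rows(conn, [{"title_link": "https://example.com/product/1", "title": "Kettle", "price": "$17.99"}])
print(conn.execute("SELECT title_link, price FROM products").fetchall())
# [('https://example.com/product/1', '$17.99')]
```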
(3) Wait a moment and you will see the data being scraped.
5. Export and view data
(1) Click “Export” to download your data.
(2) Choose the format to export according to your needs.
ScrapeStorm offers several export options: local Excel, CSV, HTML, or TXT files, or export to a database. Users on the Professional Plan and above can also publish directly to WordPress.
See the ScrapeStorm tutorial on how to view the extraction results and clear the extracted data.
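Once the data is exported as CSV, it can be processed with ordinary tools. A minimal sketch using Python's standard csv module follows. It assumes the export has a header row with a rating column. The helper names and the made-up sample rows are hypothetical.

```python
import csv
import io
from typing import Dict, List, TextIO


def read_export(f: TextIO) -> List[Dict[str, str]]:
    """Read a CSV export into one dict per product, keyed by the header row."""
    return list(csv.DictReader(f))


def rating_value(row: Dict[str, str]) -> float:
    """First number in the rating column; 0.0 if it is missing or not numeric."""
    try:
        return float((row.get("rating") or "").split()[0])
    except (ValueError, IndexError):
        return 0.0


def top_rated(rows: List[Dict[str, str]], n: int = 3) -> List[Dict[str, str]]:
    """Return the n rows with the highest rating."""
    return sorted(rows, key=rating_value, reverse=True)[:n]


# Made-up sample standing in for an exported file.
sample = io.StringIO("title,rating,price\nKettle,4.6,$19.99\nToaster,4.1,$24.99\nMug,4.8,$9.99\n")
rows = read_export(sample)
print([row["title"] for row in top_rated(rows, 2)])  # ['Mug', 'Kettle']
```

For a real export, pass `open("your_export.csv", newline="", encoding="utf-8-sig")` instead of the in-memory sample. The file name is a placeholder. The `utf-8-sig` encoding reads files with or without a byte-order mark, since it is not stated here whether the export includes one.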