
To extract product data from Amazon reliably, you need to understand the foundations of how web data is accessed and manipulated. If you’re new to this space, it helps to start with the basics of how websites are structured and how data is scraped, concepts that are core to modern scraping techniques.

Amazon actively detects and blocks scraping attempts, returning errors or misleading content. That’s where proxies and error handling in Python become essential tools in your toolkit.

In this guide, we’ll walk through how to scrape any Amazon product page using Python and proxies, with tips to avoid detection and keep your scraper running reliably.

We will extract the following data from the pages:

  • Product Title
  • Price
  • Rating
  • Availability 

You will also learn how to rotate proxies, avoid captchas, and scale your scraper without getting blocked.

Why Scraping Amazon Is Difficult 

Amazon employs multiple layers of anti-scraping defenses, including:

  • IP rate limiting and blacklisting.
  • JavaScript rendering for certain content.
  • Captchas and temporary blocks.
  • Geo-based content restrictions.
  • User-Agent and header fingerprinting.

Using requests.get(url) without the right setup often leads to incomplete data or even a ban.
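Before adding any defenses, it helps to be able to recognize a block when you see one. A quick heuristic is to check the status code and scan the body for captcha markers. Here is a minimal sketch; the marker strings are illustrative assumptions, not an exhaustive list of Amazon's block pages:

```python
def looks_blocked(status_code: int, html: str) -> bool:
    """Heuristic check for a blocked or captcha response.

    The marker strings below are examples; block pages vary,
    so extend the list as you encounter new ones.
    """
    if status_code in (403, 429, 503):
        return True
    markers = ("captcha", "robot check", "to discuss automated access")
    lowered = html.lower()
    return any(marker in lowered for marker in markers)

# A normal 200 response with product markup passes the check:
print(looks_blocked(200, "<span id='productTitle'>Echo Dot</span>"))  # False
print(looks_blocked(503, ""))  # True
```

Running this check on every response lets your scraper back off early instead of parsing junk HTML.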

Setting Up Your Environment

Before we scrape anything, make sure Python and the required libraries are installed.

Install Python

If Python isn’t already installed:

  • Download Python 3.x from python.org/downloads
  • During installation, check “Add Python to PATH”.
  • Verify by running:

python --version

You should see the installed version printed, for example:

Python 3.11.4

Install Required Libraries

Open your terminal or command prompt and install dependencies:

pip install requests beautifulsoup4

These allow us to:

  • Fetch HTML content from Amazon.
  • Parse the HTML and extract relevant product data.

Use a Virtual Environment 

For cleaner project management:

python -m venv amazon-scraper-env
source amazon-scraper-env/bin/activate  # On Windows: amazon-scraper-env\Scripts\activate
pip install requests beautifulsoup4

What Are We Scraping?

For the demo, we will scrape this Amazon product page: https://www.amazon.com/dp/B098FKXT8L

Remember: This method applies to any Amazon product link.

Send Request with Proxy and Headers

Amazon blocks requests that don’t look like real browsers. Sometimes, sending only the User-Agent is enough to get the data, but at other times, you may need to send more headers. 

To find the User-Agent your browser sends, press F12 to open DevTools, switch to the Network tab, and reload the page. Select the first request and examine its request headers.

(Screenshot: the request headers shown in the browser's Network tab.)

So we’ll add headers and use a proxy to make our requests look more like a regular browser.

import requests
from bs4 import BeautifulSoup

url = "https://www.amazon.com/dp/B098FKXT8L"

# Browser-like headers so the request doesn't look like a bot.
headers = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/114.0.0.0 Safari/537.36"
    ),
    "Accept-Language": "en-US,en;q=0.9",
}

# Route the request through an authenticated proxy.
proxies = {
    "http": "http://username:[email protected]:port",
    "https": "http://username:[email protected]:port",
}

response = requests.get(url, headers=headers, proxies=proxies, timeout=10)
response.raise_for_status()  # fail fast on 4xx/5xx responses
soup = BeautifulSoup(response.text, "html.parser")

Replace the proxy URL, username, password, and port with your Proxying.io credentials.

Extract Product Information

Now that we have the page HTML, let’s pull the product data.

def extract_data(soup):
    title = soup.find("span", id="productTitle")
    price = soup.find("span", class_="a-offscreen")
    rating = soup.find("span", class_="a-icon-alt")
    availability = soup.find("div", id="availability")
    return {
        "Title": title.get_text(strip=True) if title else "N/A",
        "Price": price.get_text(strip=True) if price else "N/A",
        "Rating": rating.get_text(strip=True) if rating else "N/A",
        "Availability": availability.get_text(strip=True) if availability else "N/A",
    }
data = extract_data(soup)
print(data)
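You can sanity-check extract_data without hitting Amazon at all by feeding it a small hand-written snippet that mimics the page's tags. The snippet below is illustrative, not real Amazon markup:

```python
from bs4 import BeautifulSoup

def extract_data(soup):
    # Same extraction logic as above, repeated here so the
    # snippet runs standalone.
    title = soup.find("span", id="productTitle")
    price = soup.find("span", class_="a-offscreen")
    rating = soup.find("span", class_="a-icon-alt")
    availability = soup.find("div", id="availability")
    return {
        "Title": title.get_text(strip=True) if title else "N/A",
        "Price": price.get_text(strip=True) if price else "N/A",
        "Rating": rating.get_text(strip=True) if rating else "N/A",
        "Availability": availability.get_text(strip=True) if availability else "N/A",
    }

sample_html = """
<span id="productTitle"> Demo Product </span>
<span class="a-offscreen">$19.99</span>
<div id="availability">In Stock</div>
"""
print(extract_data(BeautifulSoup(sample_html, "html.parser")))
# {'Title': 'Demo Product', 'Price': '$19.99', 'Rating': 'N/A', 'Availability': 'In Stock'}
```

Note how missing elements (the rating here) fall back to "N/A" instead of raising an exception.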

Avoid Bot Detection with Proxies

Amazon may serve a captcha or error page if you scrape too often from the same IP.

Fixes:

  • Rotate IPs with each request 
  • Add delays 
  • Change headers
  • Use residential proxies for better stealth
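Header rotation is easy to sketch in plain Python: keep a small pool of realistic User-Agent strings and pick one per request. The strings below show the format; in practice, refresh them so they match current browser versions:

```python
import random

USER_AGENTS = [
    # Example desktop User-Agent strings (illustrative, not curated).
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36",
]

def random_headers() -> dict:
    """Return request headers with a randomly chosen User-Agent."""
    return {
        "User-Agent": random.choice(USER_AGENTS),
        "Accept-Language": "en-US,en;q=0.9",
    }

headers = random_headers()
print(headers["User-Agent"])
```

Pass the result straight to requests.get(url, headers=random_headers(), ...) so each request varies.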

Proxying.io makes this easy with:

  • Rotating endpoints
  • Sticky sessions
  • Datacenter and residential IPs
  • Global locations for geo-unlocking

Rotate Proxies for Large-Scale Scraping

If you plan to scrape multiple products, you’ll need IP rotation:

import random

proxy_list = [
    "http://user:[email protected]:port",
    "http://user:[email protected]:port",
]

def get_random_proxy():
    return random.choice(proxy_list)

# Use the same proxy for both schemes so a single request
# doesn't split across two different exit IPs.
proxy = get_random_proxy()
response = requests.get(
    url,
    headers=headers,
    proxies={"http": proxy, "https": proxy},
)

Alternatively, use Proxying.io’s automatic rotating endpoint to handle this behind the scenes.

Save Scraped Data to CSV

Exporting your results makes them easier to work with later.

import csv
with open("amazon_product.csv", mode="w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=data.keys())
    writer.writeheader()
    writer.writerow(data)
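The same DictWriter pattern scales to many products: collect each product's dict in a list and write one row per product. Here is a minimal sketch with hand-made dicts standing in for real scraped results:

```python
import csv

# Placeholder rows standing in for real scraped product dicts.
products = [
    {"Title": "Product A", "Price": "$10.00",
     "Rating": "4.5 out of 5 stars", "Availability": "In Stock"},
    {"Title": "Product B", "Price": "$25.00",
     "Rating": "N/A", "Availability": "In Stock"},
]

with open("amazon_products.csv", mode="w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=products[0].keys())
    writer.writeheader()
    writer.writerows(products)  # one row per product
```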

Add Retry and Delay Logic

Always retry failed requests and slow your scraper slightly to avoid rate limits.

import time

def safe_request(url, headers, proxies, retries=3):
    for _ in range(retries):
        try:
            response = requests.get(url, headers=headers, proxies=proxies, timeout=10)
            # Treat a captcha page as a failure and retry.
            if "captcha" not in response.text.lower():
                return response
        except requests.RequestException:
            pass  # network error; fall through to the delay
        time.sleep(3)  # brief pause before the next attempt
    return None
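A small refinement on fixed delays is exponential backoff: wait longer after each failed attempt. Computing the schedule separately from the request loop keeps it easy to test:

```python
def backoff_delays(retries: int, base: float = 1.0, factor: float = 2.0) -> list:
    """Return a list of sleep durations, growing by `factor` each retry.

    e.g. retries=3, base=1.0 -> [1.0, 2.0, 4.0]
    """
    return [base * factor ** attempt for attempt in range(retries)]

print(backoff_delays(3))  # [1.0, 2.0, 4.0]
# Inside safe_request, you would time.sleep(delay) for the
# corresponding delay after each failed attempt.
```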

Use Selenium for Dynamic Elements

If certain data is loaded via JavaScript (e.g., product variations or reviews), use a browser automation tool.

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument("--headless")
# Note: Chrome's --proxy-server flag does not accept embedded
# username:password credentials. For authenticated proxies, use an
# IP-whitelisted endpoint or a tool such as selenium-wire instead.
options.add_argument("--proxy-server=http://proxy.proxying.io:port")
driver = webdriver.Chrome(options=options)
driver.get(url)
print(driver.page_source)
driver.quit()

Try Proxying.io for Free

Scraping Amazon data may seem difficult at first, but with the right tools (Python, BeautifulSoup, and reliable proxies) it becomes pretty simple. Proxying.io offers lightning-fast, anonymous proxies ideal for Amazon scraping, data collection, and automation workflows.

  • 25MB of free bandwidth
  • No credit card required
  • Datacenter and residential proxy pools
  • Global IPs, sticky sessions, and auto-rotation

Sign up now and start scraping Amazon the smart way.

Frequently Asked Questions (FAQs)

How do I scrape content that Amazon loads with JavaScript?

For dynamic elements loaded via JavaScript, use Selenium with a headless browser configuration. The tutorial includes an example using Selenium with Chrome options and a proxy for automation.

How do I save the scraped data?

The tutorial shows how to save scraped data to a CSV file using Python's csv module. The data is written with headers and values, ensuring it's organized and accessible for further analysis.

What product data can I extract?

You can extract the product title, price, rating, and availability. The tutorial provides a function to locate these elements using BeautifulSoup by targeting specific HTML tags and classes.
