
To extract product data from Amazon reliably, you need to understand the foundations of how web data is accessed and manipulated. If you’re new to this space, it helps to start with the basics of how websites are structured and how data is scraped, concepts that are core to modern scraping techniques.

Amazon actively detects and blocks scraping attempts, returning errors or misleading content. That’s where proxies and error handling in Python become essential tools in your toolkit.

In this guide, we’ll walk through how to scrape any Amazon product page using Python and proxies, with tips to avoid detection and keep your scraper running reliably.

We will extract the following data from the pages:

  • Product Title
  • Price
  • Rating
  • Availability 

You will also learn how to rotate proxies, avoid captchas, and scale your scraper without getting blocked.

Why Scraping Amazon Is Difficult 

Amazon employs multiple layers of anti-scraping defenses, including:

  • IP rate limiting and blacklisting.
  • JavaScript rendering for certain content.
  • Captchas and temporary blocks.
  • Geo-based content restrictions.
  • User-Agent and header fingerprinting.

Using requests.get(url) without the right setup often leads to incomplete data or even a ban.
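Before adding any defenses, it helps to be able to recognize a block when you see one. A quick heuristic is to check the status code and scan the body for captcha markers. Here is a minimal sketch; the marker strings are illustrative assumptions, not an exhaustive list of Amazon's block pages:

```python
def looks_blocked(status_code: int, html: str) -> bool:
    """Heuristic check for a blocked or captcha response.

    The marker strings below are examples; block pages vary,
    so extend the list as you encounter new ones.
    """
    if status_code in (403, 429, 503):
        return True
    markers = ("captcha", "robot check", "to discuss automated access")
    lowered = html.lower()
    return any(marker in lowered for marker in markers)

# A normal 200 response with product markup passes the check:
print(looks_blocked(200, "<span id='productTitle'>Echo Dot</span>"))  # False
print(looks_blocked(503, ""))  # True
```

Running this check on every response lets your scraper back off early instead of parsing junk HTML.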

Setting Up Your Environment

Before we scrape anything, make sure Python and the required libraries are installed.

Install Python

If Python isn’t already installed:

  • Download Python 3.x from python.org/downloads
  • During installation, check “Add Python to PATH”.
  • Verify by running:

python --version

You should see the installed version printed, for example:

Python 3.11.4

Install Required Libraries

Open your terminal or command prompt and install dependencies:

pip install requests beautifulsoup4

These allow us to:

  • Fetch HTML content from Amazon.
  • Parse the HTML and extract relevant product data.

Use a Virtual Environment 

For cleaner project management:

python -m venv amazon-scraper-env
source amazon-scraper-env/bin/activate  # On Windows: amazon-scraper-env\Scripts\activate
pip install requests beautifulsoup4

What Are We Scraping?

For the demo, we will scrape this Amazon product page: https://www.amazon.com/dp/B098FKXT8L

Remember: This method applies to any Amazon product link.

Send Request with Proxy and Headers

Amazon blocks requests that don’t look like real browsers. Sometimes, sending only the User-Agent is enough to get the data, but at other times, you may need to send more headers. 

To find the User-Agent your browser sends, press F12 to open DevTools, switch to the Network tab, and reload the page. Select the first request and examine its request headers.

(Screenshot: the request headers shown in the browser's Network tab.)

So we’ll add headers and use a proxy to make our requests look more like a regular browser.

import requests
from bs4 import BeautifulSoup

url = "https://www.amazon.com/dp/B098FKXT8L"

# Browser-like headers so the request doesn't look like a bot.
headers = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/114.0.0.0 Safari/537.36"
    ),
    "Accept-Language": "en-US,en;q=0.9",
}

# Route the request through an authenticated proxy.
proxies = {
    "http": "http://username:[email protected]:port",
    "https": "http://username:[email protected]:port",
}

response = requests.get(url, headers=headers, proxies=proxies, timeout=10)
response.raise_for_status()  # fail fast on 4xx/5xx responses
soup = BeautifulSoup(response.text, "html.parser")

Replace the proxy URL, username, password, and port with your Proxying.io credentials.

Extract Product Information

Now that we have the page HTML, let’s pull the product data.

def extract_data(soup):
    title = soup.find("span", id="productTitle")
    price = soup.find("span", class_="a-offscreen")
    rating = soup.find("span", class_="a-icon-alt")
    availability = soup.find("div", id="availability")
    return {
        "Title": title.get_text(strip=True) if title else "N/A",
        "Price": price.get_text(strip=True) if price else "N/A",
        "Rating": rating.get_text(strip=True) if rating else "N/A",
        "Availability": availability.get_text(strip=True) if availability else "N/A",
    }
data = extract_data(soup)
print(data)
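You can sanity-check extract_data without hitting Amazon at all by feeding it a small hand-written snippet that mimics the page's tags. The snippet below is illustrative, not real Amazon markup:

```python
from bs4 import BeautifulSoup

def extract_data(soup):
    # Same extraction logic as above, repeated here so the
    # snippet runs standalone.
    title = soup.find("span", id="productTitle")
    price = soup.find("span", class_="a-offscreen")
    rating = soup.find("span", class_="a-icon-alt")
    availability = soup.find("div", id="availability")
    return {
        "Title": title.get_text(strip=True) if title else "N/A",
        "Price": price.get_text(strip=True) if price else "N/A",
        "Rating": rating.get_text(strip=True) if rating else "N/A",
        "Availability": availability.get_text(strip=True) if availability else "N/A",
    }

sample_html = """
<span id="productTitle"> Demo Product </span>
<span class="a-offscreen">$19.99</span>
<div id="availability">In Stock</div>
"""
print(extract_data(BeautifulSoup(sample_html, "html.parser")))
# {'Title': 'Demo Product', 'Price': '$19.99', 'Rating': 'N/A', 'Availability': 'In Stock'}
```

Note how missing elements (the rating here) fall back to "N/A" instead of raising an exception.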

Avoid Bot Detection with Proxies

Amazon may serve a captcha or error page if you scrape too often from the same IP.

Fixes:

  • Rotate IPs with each request 
  • Add delays 
  • Change headers
  • Use residential proxies for better stealth
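Header rotation is easy to sketch in plain Python: keep a small pool of realistic User-Agent strings and pick one per request. The strings below show the format; in practice, refresh them so they match current browser versions:

```python
import random

USER_AGENTS = [
    # Example desktop User-Agent strings (illustrative, not curated).
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36",
]

def random_headers() -> dict:
    """Return request headers with a randomly chosen User-Agent."""
    return {
        "User-Agent": random.choice(USER_AGENTS),
        "Accept-Language": "en-US,en;q=0.9",
    }

headers = random_headers()
print(headers["User-Agent"])
```

Pass the result straight to requests.get(url, headers=random_headers(), ...) so each request varies.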

Proxying.io makes this easy with:

  • Rotating endpoints
  • Sticky sessions
  • Datacenter and residential IPs
  • Global locations for geo-unlocking

Rotate Proxies for Large-Scale Scraping

If you plan to scrape multiple products, you’ll need IP rotation:

import random

proxy_list = [
    "http://user:[email protected]:port",
    "http://user:[email protected]:port",
]

def get_random_proxy():
    return random.choice(proxy_list)

# Use the same proxy for both schemes so a single request
# doesn't split across two different exit IPs.
proxy = get_random_proxy()
response = requests.get(
    url,
    headers=headers,
    proxies={"http": proxy, "https": proxy},
)

Alternatively, use Proxying.io’s automatic rotating endpoint to handle this behind the scenes.

Save Scraped Data to CSV

Exporting your results makes them easier to work with later.

import csv
with open("amazon_product.csv", mode="w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=data.keys())
    writer.writeheader()
    writer.writerow(data)
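The same DictWriter pattern scales to many products: collect each product's dict in a list and write one row per product. Here is a minimal sketch with hand-made dicts standing in for real scraped results:

```python
import csv

# Placeholder rows standing in for real scraped product dicts.
products = [
    {"Title": "Product A", "Price": "$10.00",
     "Rating": "4.5 out of 5 stars", "Availability": "In Stock"},
    {"Title": "Product B", "Price": "$25.00",
     "Rating": "N/A", "Availability": "In Stock"},
]

with open("amazon_products.csv", mode="w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=products[0].keys())
    writer.writeheader()
    writer.writerows(products)  # one row per product
```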

Add Retry and Delay Logic

Always retry failed requests and slow your scraper slightly to avoid rate limits.

import time

def safe_request(url, headers, proxies, retries=3):
    for _ in range(retries):
        try:
            response = requests.get(url, headers=headers, proxies=proxies, timeout=10)
            # Treat a captcha page as a failure and retry.
            if "captcha" not in response.text.lower():
                return response
        except requests.RequestException:
            pass  # network error; fall through to the delay
        time.sleep(3)  # brief pause before the next attempt
    return None
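A small refinement on fixed delays is exponential backoff: wait longer after each failed attempt. Computing the schedule separately from the request loop keeps it easy to test:

```python
def backoff_delays(retries: int, base: float = 1.0, factor: float = 2.0) -> list:
    """Return a list of sleep durations, growing by `factor` each retry.

    e.g. retries=3, base=1.0 -> [1.0, 2.0, 4.0]
    """
    return [base * factor ** attempt for attempt in range(retries)]

print(backoff_delays(3))  # [1.0, 2.0, 4.0]
# Inside safe_request, you would time.sleep(delay) for the
# corresponding delay after each failed attempt.
```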

Use Selenium for Dynamic Elements

If certain data is loaded via JavaScript (e.g., product variations or reviews), use a browser automation tool.

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument("--headless")
# Note: Chrome's --proxy-server flag does not accept embedded
# username:password credentials. For authenticated proxies, use an
# IP-whitelisted endpoint or a tool such as selenium-wire instead.
options.add_argument("--proxy-server=http://proxy.proxying.io:port")
driver = webdriver.Chrome(options=options)
driver.get(url)
print(driver.page_source)
driver.quit()

Try Proxying.io for Free

Scraping Amazon data may seem difficult at first, but with the right tools (Python, BeautifulSoup, and reliable proxies) it becomes pretty simple. Proxying.io offers lightning-fast, anonymous proxies ideal for Amazon scraping, data collection, and automation workflows.

  • 25MB of free bandwidth
  • No credit card required
  • Datacenter and residential proxy pools
  • Global IPs, sticky sessions, and auto-rotation

Sign up now and start scraping Amazon the smart way.

Frequently Asked Questions (FAQs)

How do I scrape content that Amazon loads with JavaScript?

For dynamic elements loaded via JavaScript, use Selenium with a headless browser configuration. The tutorial includes an example using Selenium with Chrome options and a proxy for automation.

How do I save the scraped data?

The tutorial shows how to save scraped data to a CSV file using Python's csv module. The data is written with headers and values, ensuring it's organized and accessible for further analysis.

What product data can I extract?

You can extract the product title, price, rating, and availability. The tutorial provides a function to locate these elements using BeautifulSoup by targeting specific HTML tags and classes.
