Scrape Amazon Using Python (Updated)
https://www.scrapingdog.com/blog/scrape-amazon/

TL;DR

  • Walks you through how to scrape product pages on Amazon using Python with requests + BeautifulSoup (for title, images, price, rating, specs).
  • Shows how to mimic browser-like headers to bypass Amazon’s anti-bot mechanisms.
  • Details how to extract high-resolution images via regex search for hiRes in the page’s <script> content.
  • Provides a full example script with rotating user-agents for basic scraping.
  • Explains when you need to scale: using a proxy/API solution (specifically Scrapingdog’s Amazon Scraper API) to avoid IP blocks and handle high volume.
  • Covers how to call that API (by ASIN, domain, postal-code-based locale) and other related endpoints (offers, autocomplete) for richer Amazon data.

The e-commerce industry has grown in recent years, transforming from a mere convenience to an essential facet of our daily lives.

As digital storefronts multiply and consumers increasingly turn to online shopping, there’s an increasing demand for data that can drive decision-making, competitive strategies, and customer engagement in the digital marketplace.

Additionally, scraped Amazon product data can significantly enhance customer service automation by providing customer service teams with real-time product information, pricing details, and availability status, enabling them to respond more efficiently to customer inquiries and resolve issues faster.

If you are into an e-commerce niche, scraping Amazon can give you a lot of data points to understand the market.

In this guide, we will use Python to scrape Amazon, do price scraping from this platform, and demonstrate how to extract crucial information to help you make well-informed decisions in your business.

Setting up the prerequisites

I am assuming that you have already installed Python 3.x on your machine. If not, you can download it from here. Apart from this, we will require two third-party Python libraries.

  • Requests – We will use this library to make an HTTP connection to the Amazon page. It will help us extract the raw HTML from the target page.
  • BeautifulSoup – This is a powerful data parsing library. Using it, we will extract the necessary data from the raw HTML we get through the requests library.

Before we install these libraries we will have to create a dedicated folder for our project.

				
					mkdir amazonscraper
				
			

Now, we will have to install the above two libraries in this folder. Here is how you can do it.

				
					pip install beautifulsoup4
pip install requests
				
			
Now, you can create a Python file by any name you wish. This will be the main file where we will keep our code. I am naming it amazon.py

Downloading raw data from amazon.com

Let’s make a normal GET request to our target page and see what happens. For GET request we are going to use the requests library.
				
					import requests
from bs4 import BeautifulSoup

target_url="https://www.amazon.com/dp/B0BSHF7WHW"

resp = requests.get(target_url)

print(resp.text)
				
			

Once you run this code, you might see this.

This is a captcha from amazon.com, and it appears once their system detects that the incoming request is coming from a bot/script rather than a real human being.

To bypass this on-site protection of Amazon, we can send some headers like User-Agent. You can check which headers your browser sends to amazon.com when you open the URL, from the Network tab of the developer tools.

Once you pass these headers with the request, it will look like a request coming from a real browser. This can bring down the anti-bot wall of amazon.com. Let’s pass a few headers with our request.

				
					import requests
from bs4 import BeautifulSoup

target_url="https://www.amazon.com/dp/B0BSHF7WHW"

headers={"accept-language": "en-US,en;q=0.9","accept-encoding": "gzip, deflate, br","User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36","accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7"}

resp = requests.get(target_url, headers=headers)

print(resp.text)
				
			

Once you run this code you might be able to bypass the anti-scraping protection wall of Amazon.

Now let’s decide what exact information we want to scrape from the page.

What are we going to scrape from Amazon?

It is always good to decide in advance what you are going to extract from the target page. This way, we can analyze in advance where each element is placed inside the DOM.

Product details we are going to scrape from Amazon

We are going to scrape five data elements from the page.

  • Name of the product
  • Images
  • Price (Most important)
  • Rating
  • Specs

First, we are going to make the GET request to the target page using the requests library and then using BS4 we are going to parse out this data. Of course, there are multiple other libraries like lxml that can be used in place of BS4, but BS4 has the most powerful and easy-to-use API.

Before making the request we are going to analyze the page and find the location of each element inside the DOM. One should always do this exercise to identify the location of each element.

We are going to do this by simply using the developer tool. This can be accessed by right-clicking on the target element and then clicking on the inspect. This is the most common method, you might already know this.

Identifying the location of each element

Location of the title tag

Identifying location of title tag in source code of amazon website

Once you inspect the title you will find that the title text is located inside the h1 tag with the id title.

Coming back to our amazon.py file, we will write the code to extract this information from Amazon.

				
					import requests
from bs4 import BeautifulSoup

l=[]
o={}


url="https://www.amazon.com/dp/B0BSHF7WHW"

headers={"accept-language": "en-US,en;q=0.9","accept-encoding": "gzip, deflate, br","User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36","accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7"}

resp = requests.get(url, headers=headers)
print(resp.status_code)

soup=BeautifulSoup(resp.text,'html.parser')


try:
    o["title"]=soup.find('h1',{'id':'title'}).text.strip()
except:
    o["title"]=None





print(o)
				
			

Here the line soup=BeautifulSoup(resp.text,'html.parser') is using the BeautifulSoup library to create a BeautifulSoup object from the HTTP response text, with the specified HTML parser.

Then the soup.find() method returns the first occurrence of the h1 tag with the id title. We use the .text attribute to get the text from that element. Finally, I used the .strip() method to remove the leading and trailing whitespace from the text we receive.

Once you run this code you will get this.

				
{'title': 'Apple 2023 MacBook Pro Laptop M2 Pro chip with 12‑core CPU and 19‑core GPU: 16.2-inch Liquid Retina XDR Display, 16GB Unified Memory, 1TB SSD Storage. Works with iPhone/iPad; Space Gray'}
				
			

If you have not read the above section where we talked about downloading HTML data from the target page then you won’t be able to understand the above code. So, please read the above section before moving ahead.

Location of the image tag

This might be the most tricky part of this complete tutorial. Let’s inspect and find out why it is a little tricky.

Inspecting image tag in the source code of amazon website
As you can see, the img tag that holds the image is stored inside a div tag with the class imgTagWrapper.
				
					allimages = soup.find_all("div",{"class":"imgTagWrapper"})
print(len(allimages))
				
			

Once you print this it will return 3. Now, there are 6 images and we are getting just 3. The reason behind this is JS rendering. Amazon loads its images through an AJAX request at the backend. That’s why we never receive these images when we make an HTTP connection to the page through requests library.

Finding high-resolution images is not as simple as finding the title tag. But I will explain to you step by step how you can find all the images of the product.

  1. Copy any product image URL from the page.
  2. Then click on the view page source to open the source page of the target webpage.
  3. Then search for this image.

You will find that all the images are stored as a value for hiRes key.

All this information is stored inside a script tag. Now, here we will use regular expressions to find this pattern of "hiRes":"image_url".

We could still use BS4, but it would make the process a little lengthy and might slow down our scraper. Instead, we will use the pattern (.+?), which performs a non-greedy match of one or more characters. Let me explain what each character in this expression means.

  • The . matches any character except a newline
  • The + matches one or more occurrences of the preceding character.
  • The ? makes the match non-greedy, meaning that it will match the minimum number of characters needed to satisfy the pattern.

The regular expression will return all the matched sequences of characters from the HTML string we are going to pass.

				
import re  # regular-expression support (this import is also part of the complete script below)

images = re.findall('"hiRes":"(.+?)"', resp.text)
o["images"]=images
				
			

This will return all the high-resolution images of the product in a list. In general, it is not advised to use regular expression in data parsing but it can do wonders sometimes.
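To make the pattern concrete, here is a tiny, self-contained demonstration of the same regular expression on a made-up fragment of the script content (the URLs below are placeholders, not real Amazon data):

import re

# A made-up fragment resembling the script content that embeds the image URLs
sample = '"hiRes":"https://example.com/img1.jpg","thumb":"...","hiRes":"https://example.com/img2.jpg"'

print(re.findall('"hiRes":"(.+?)"', sample))
# ['https://example.com/img1.jpg', 'https://example.com/img2.jpg']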

Parsing the price tag

There are two price tags on the page, but we will only extract the one which is just below the rating.

We can see that the price tag is stored inside span tag with class a-price. Once you find this tag you can find the first child span tag to get the price. Here is how you can do it.
				
					try:
    o["price"]=soup.find("span",{"class":"a-price"}).find("span").text
except:
    o["price"]=None
				
			

Once you print object o, you will get to see the price.

				
					{'price': '$2,499.00'}
				
			

Extract rating

You can find the rating in the first i tag with class a-icon-star. Let’s see how to scrape this too.

				
					try:
    o["rating"]=soup.find("i",{"class":"a-icon-star"}).text
except:
    o["rating"]=None
				
			

It will return this.

				
					{'rating': '4.1 out of 5 stars'}
				
			

In the same manner, we can scrape the specs of the device.

Extract the specs of the device

These specs are stored inside tr tags with the class a-spacing-small. Once you find these, you have to find both span tags under each one to get the text. You can see this in the above image. Here is how it can be done.

				
					specs_arr=[]
specs_obj={}

specs = soup.find_all("tr",{"class":"a-spacing-small"})

for u in range(0,len(specs)):
    spanTags = specs[u].find_all("span")
    specs_obj[spanTags[0].text]=spanTags[1].text


specs_arr.append(specs_obj)
o["specs"]=specs_arr
				
			

Using .find_all() we are finding all the tr tags with class a-spacing-small. Then we are running a for loop to iterate over all the tr tags. Then under for loop we find all the span tags. Then finally we are extracting the text from each span tag.

Once you print the object o it will look like this.

Throughout the tutorial, we have used try/except statements to avoid any runtime errors. With this, we have now managed to scrape all the data we decided to scrape at the beginning of the tutorial.

Complete Code

You can, of course, make a few changes to the code to extract more data, because the page is packed with information. You can even use cron jobs to mail yourself an alert when the price drops, or integrate this technique into your app so it can email your users when the price of any item on Amazon drops.
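As a rough illustration of that idea, here is a minimal, hypothetical sketch of such a price-drop alert. It assumes you have wrapped the scraping logic above in a get_price(url) helper, and the SMTP host, credentials, and threshold are placeholders you would replace with your own; it is a sketch, not part of the tutorial’s main code.

import smtplib
from email.message import EmailMessage

def send_price_alert(product_url, current_price, threshold, smtp_user, smtp_password):
    # Build a simple notification email
    msg = EmailMessage()
    msg["Subject"] = f"Price drop alert: now {current_price}"
    msg["From"] = smtp_user
    msg["To"] = smtp_user
    msg.set_content(f"The product at {product_url} dropped below {threshold}: {current_price}")

    # Send it over SSL (Gmail is shown only as an example host)
    with smtplib.SMTP_SSL("smtp.gmail.com", 465) as server:
        server.login(smtp_user, smtp_password)
        server.send_message(msg)

# Example usage inside a cron-scheduled script (get_price is your own scraping helper):
# price = get_price("https://www.amazon.com/dp/B0BSHF7WHW")
# if price is not None and price < 2000:
#     send_price_alert("https://www.amazon.com/dp/B0BSHF7WHW", price, 2000, "you@example.com", "app-password")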

But for now, the code will look like this.

				
					import requests
from bs4 import BeautifulSoup
import re

l=[]
o={}
specs_arr=[]
specs_obj={}

target_url="https://www.amazon.com/dp/B0BSHF7WHW"

headers={"accept-language": "en-US,en;q=0.9","accept-encoding": "gzip, deflate, br","User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36","accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7"}

resp = requests.get(target_url, headers=headers)
print(resp.status_code)
if(resp.status_code != 200):
    print(resp)
soup=BeautifulSoup(resp.text,'html.parser')


try:
    o["title"]=soup.find('h1',{'id':'title'}).text.lstrip().rstrip()
except:
    o["title"]=None


images = re.findall('"hiRes":"(.+?)"', resp.text)
o["images"]=images

try:
    o["price"]=soup.find("span",{"class":"a-price"}).find("span").text
except:
    o["price"]=None

try:
    o["rating"]=soup.find("i",{"class":"a-icon-star"}).text
except:
    o["rating"]=None


specs = soup.find_all("tr",{"class":"a-spacing-small"})

for u in range(0,len(specs)):
    spanTags = specs[u].find_all("span")
    specs_obj[spanTags[0].text]=spanTags[1].text


specs_arr.append(specs_obj)
o["specs"]=specs_arr
l.append(o)


print(l)
				
			

Changing Headers on every request

With the above code, your scraping journey will come to a halt, once Amazon recognizes a pattern in the request.

To avoid this you can keep changing your headers to keep the scraper running. You can rotate a bunch of headers to overcome this challenge. Here is how it can be done.

				
					import requests
from bs4 import BeautifulSoup
import re
import random

l=[]
o={}
specs_arr=[]
specs_obj={}

useragents=['Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.88 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 11_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4894.117 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4855.118 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.88 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4892.86 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4854.191 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 11_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4859.153 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.79 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36/null',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36,gzip(gfe)',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4895.86 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 12_3_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 11_13) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4860.89 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4885.173 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4864.0 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 11_12) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4877.207 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 12_2_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.60 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML%2C like Gecko) Chrome/100.0.4896.127 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.133 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_16_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.75 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4872.118 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 12_3_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.88 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 11_13) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4876.128 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_3) AppleWebKit/537.36 (KHTML%2C like Gecko) Chrome/100.0.4896.127 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36']

target_url="https://www.amazon.com/dp/B0BSHF7WHW"

headers={"User-Agent":useragents[random.randint(0,31)],"accept-language": "en-US,en;q=0.9","accept-encoding": "gzip, deflate, br","accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7"}

resp = requests.get(target_url,headers=headers)
print(resp.status_code)
if(resp.status_code != 200):
    print(resp)
soup=BeautifulSoup(resp.text,'html.parser')


try:
    o["title"]=soup.find('h1',{'id':'title'}).text.lstrip().rstrip()
except:
    o["title"]=None


images = re.findall('"hiRes":"(.+?)"', resp.text)
o["images"]=images

try:
    o["price"]=soup.find("span",{"class":"a-price"}).find("span").text
except:
    o["price"]=None

try:
    o["rating"]=soup.find("i",{"class":"a-icon-star"}).text
except:
    o["rating"]=None


specs = soup.find_all("tr",{"class":"a-spacing-small"})

for u in range(0,len(specs)):
    spanTags = specs[u].find_all("span")
    specs_obj[spanTags[0].text]=spanTags[1].text


specs_arr.append(specs_obj)
o["specs"]=specs_arr
l.append(o)


print(l)
				
			

We are using the random library here to pick a random user agent from the useragents list on every run (random.choice handles the index for us). These user agents are all fairly recent, so you can more easily bypass the anti-scraping wall.

But again this technique is not enough to scrape Amazon at scale. What if you want to scrape millions of such pages? Then this technique is super inefficient because your IP will be blocked. So, for mass scraping one has to use a web scraping proxy API to avoid getting blocked while scraping.

Using Scrapingdog for scraping Amazon

The advantages of using Scrapingdog’s Amazon Scraper API are:

  • You won’t have to manage headers anymore.
  • Every request will go through a new IP. This keeps your IP anonymous.
  • Our API will automatically retry on its own if the first hit fails.
  • Scrapingdog will handle issues like changes in HTML tags. You won’t have to check every time for changes in tags. You can focus on data collection.

Let me show you how easy it is to scrape Amazon product pages using Scrapingdog with just an ASIN code. It would be great if you could read the documentation first before trying the API.

Before you try the API, you have to sign up for the free pack. The free pack comes with 1,000 credits, which is enough for testing the Amazon Scraper API.

				
					import requests

url = "https://api.scrapingdog.com/amazon/product"
params = {
    "api_key": "Your-API-Key",
    "domain": "com",
    "asin": "B0C22KCKVQ"
}

response = requests.get(url, params=params)

if response.status_code == 200:
    data = response.json()
    print(data)
else:
    print(f"Request failed with status code {response.status_code}")
				
			

Once you run this code you will get this beautiful JSON response.

This JSON contains almost all the data you see on the Amazon product page. 
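As a quick, hedged illustration of working with that response, the snippet below just inspects the top-level keys and pulls a couple of commonly present fields with .get(); the exact field names should be confirmed against the API documentation.

# `data` is the parsed JSON from the previous request
payload = data[0] if isinstance(data, list) else data   # defensive: some responses may be wrapped in a list

print(list(payload.keys()))              # see which fields the response actually contains

# Defensive access: these key names are illustrative, not guaranteed
print(payload.get("title"), payload.get("price"))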

Scraping Amazon data based on Postal Codes

Now, let’s scrape the data for a particular postal code. For this example, we are going to target New York. 10001 is the postal code of New York.

				
import requests

api_key = "Your-API-Key"
url = "https://api.scrapingdog.com/amazon/product"

params = {
    "api_key": api_key,
    "asin": "B0CTKXMQXK",
    "domain": "com",
    "postal_code": "10001",
    "country": "us"
}

response = requests.get(url, params=params)

if response.status_code == 200:
    data = response.json()
    print(data)
else:
    print(f"Request failed with status code: {response.status_code}")
				
			

Once you run this code you will get a beautiful JSON response based on the New York Location.

 

I have also created a video to guide you using Scrapingdog to scrape Amazon.

Scraping Amazon Offers Data Using Scrapingdog

This data will help you identify details about the seller, delivery options, pricing, etc.

				
					import requests

url = "https://api.scrapingdog.com/amazon/offers"

params = {
    "api_key": "your-api-key",
    "asin": "B0BVJT3HVN",
    "domain": "com",
    "country": "us"
}

try:
    response = requests.get(url, params=params)
    response.raise_for_status()  # Raise error for bad responses
    data = response.json()
    print("<img src="https://s.w.org/images/core/emoji/15.0.3/72x72/2705.png" alt="✅" class="wp-smiley" style="height: 1em; max-height: 1em;" /> API Response:")
    print(data)
except requests.exceptions.RequestException as e:
    print(f"<img src="https://s.w.org/images/core/emoji/15.0.3/72x72/274c.png" alt="❌" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Request failed: {e}")
				
			

After running this code you will get this JSON response.

In addition to this, if you’re building a keyword research tool, validating product ideas, or running sentiment analysis, you can use Scrapingdog’s Amazon Autocomplete API for these use cases.

You just have to make a GET request to the endpoint https://api.scrapingdog.com/amazon/autocomplete and pass your target keyword. For example, if you are looking for a pen holder, you will pass the prefix “pen holder”.

				
					import requests

# API URL and key
api_url = "https://api.scrapingdog.com/amazon/autocomplete"
api_key = "your-api-key"

# Search parameters
domain = "com"
prefix = "pen holder"

# Create a dictionary with the query parameters
params = {
    "api_key": api_key,
    "prefix": prefix
}

# Send the GET request with the specified parameters
response = requests.get(api_url, params=params)

# Check if the request was successful (status code 200)
if response.status_code == 200:
    data = response.json()
    print(data)
else:
    print(f"HTTP Request Error: {response.status_code}")
				
			

This will generate a list of keywords associated with the prefix.

Conclusion

Over 80% of the e-commerce businesses today rely on web scraping. If you’re not using it, you’re already falling behind. 

There are many marketplaces that you can scrape & extract data from. Having a strategy to scrape e-commerce data for your product can take you far ahead of your competitors. 

In this tutorial, we scraped various data elements from Amazon. First, we used the requests library to download the raw HTML, and then using BS4 we parsed the data we wanted. You can also use lxml in place of BS4 to extract data. Python and its libraries make scraping very simple for even a beginner. Once you scale, you can switch to web scraping APIs to scrape millions of such pages.

A combination of requests and Scrapingdog can help you scale your scraper. You will get more than a 99% success rate while scraping Amazon with Scrapingdog.

If you want to track the price of a product on Amazon, we have a comprehensive tutorial on tracking Amazon product prices using Python.

I hope you like this little tutorial. If you do, please don’t forget to share it with your friends and on your social media.

You can combine this data with business plan software to offer different solutions to your clients.

If you are a non-developer and want to scrape data from Amazon, here is some good news for you: we have recently launched a Google Sheets add-on, Amazon Scraper.

Here is the video 🎥 tutorial for this action.

Frequently Asked Questions

How does Amazon detect scraping?

Amazon detects scraping through its anti-bot mechanisms, which monitor your IP address and can block you if you continue to scrape. However, using a proxy management system can help you bypass this security measure.
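For illustration, here is a minimal sketch of routing requests through a rotating proxy gateway with the requests library; the proxy URL and credentials are placeholders you would replace with those of your own proxy provider.

import requests

# Placeholder gateway of a rotating proxy provider (replace with your own)
proxy = "http://username:password@proxy.example.com:8000"

proxies = {
    "http": proxy,
    "https": proxy,
}

resp = requests.get("https://www.amazon.com/dp/B0BSHF7WHW", proxies=proxies, timeout=30)
print(resp.status_code)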

Zenrows vs Scrapingbee vs Scrapingdog: Which One To Choose & Why
https://www.scrapingdog.com/blog/zenrows-vs-scrapingbee-vs-scrapingdog/

In this post, we’ll walk through a detailed comparison of three popular web-scraping API providers: ZenRows, ScrapingBee, and Scrapingdog. We’ll examine pricing, performance, success rates, and key features so you can decide which fits your needs.

We’ll be testing these APIs across multiple domains before sharing our final verdict. This report aims to help you identify the most suitable scraping service for your specific project needs.

Criteria To Test These APIs

We are going to scrape a few domains like Amazon, eBay, and Google. We will judge each scraper on the basis of these points.

  • Speed
  • Success Rate
  • Support
  • Scalability
  • Developer friendly

We are going to use this Python code to test all the APIs.

				
					import requests
import time
import random
import urllib.parse

# List of search terms
amazon_urls = ['https://www.amazon.de/dp/B0F13KXRG8','https://www.amazon.com.au/dp/B0D8V3N28Z','https://www.amazon.in/dp/B0FHB5V36G','https://www.amazon.com/dp/B0CDJ4LS6X','https://www.amazon.com.br/dp/B0FQHRR7L7/']

ebay_url=['https://www.ebay.it/usr/elzu51','https://www.ebay.com/sch/i.html?_nkw=watch','https://www.ebay.com/itm/324055713627','https://www.ebay.com.au/b/Smarthome/bn_21835561','https://www.ebay.com/p/25040975636']

serp_terms = ['burger','bat','beans','curd','meat']

# Replace with your actual API endpoint
# Make sure it includes {query} where the search term should be inserted
base_url = "https://app.example.com/scrape"


total_requests = 10
success_count = 0
total_time = 0
apiKey = "your-api-key"
for i in range(total_requests):
    try:
        # Pick a random target for the platform under test; switch between
        # amazon_urls, ebay_url, and serp_terms depending on which site you are benchmarking
        search_term = random.choice(serp_terms)

        

        params = {
    "api_key": apiKey,
    "results": 10,
    "query": search_term,
    "country": "us",
    "advance_search": "true",
    "domain": "google.com"
}

        # params={
        #     'api_key': apiKey,
        #     'search': search_term,
        #     'language': 'en'
        # }



        # url = base_url.format(query=search_term)

        start_time = time.time()
        response = requests.get(base_url,params=params)
        end_time = time.time()

        request_time = end_time - start_time
        total_time += request_time

        if response.status_code == 200:
            success_count += 1
        print(f"Request {i+1}: '{search_term}' took {request_time:.2f}s | Status: {response.status_code}")

    except Exception as e:
        print(f"Request {i+1} with '{search_term}' failed due to: {str(e)}")

# Final Stats
average_time = total_time / total_requests
success_rate = (success_count / total_requests) * 100

print(f"\n<img src="https://s.w.org/images/core/emoji/15.0.3/72x72/1f50d.png" alt="🔍" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Total Requests: {total_requests}")
print(f"<img src="https://s.w.org/images/core/emoji/15.0.3/72x72/2705.png" alt="✅" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Successful: {success_count}")
print(f"<img src="https://s.w.org/images/core/emoji/15.0.3/72x72/23f1.png" alt="⏱" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Average Time: {average_time:.2f} seconds")
print(f"<img src="https://s.w.org/images/core/emoji/15.0.3/72x72/1f4ca.png" alt="📊" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Success Rate: {success_rate:.2f}%")
				
			

Let’s first test how Zenrows performs in this test across different platforms.

Zenrows

A platform built for developers to scrape public data at scale. They have been a popular option in the community. Let’s test how this API performs against all of the targets we chose.

Feature & Pricing of Zenrows

  • You get free credits worth $1 on signup.
  • The credit cost changes from website to website. But the starting pack will cost you around $70 per month which includes 250000 credits.
  • Documentation is clear and the API can be integrated very easily.
  • Customers can contact them via instant chat support or email.

Test Result with Amazon

Test Result with eBay

Test Results with Google Search

Summary of All Tests (Zenrows)

  • ZenRows achieved a 40% success rate on Amazon, with an average response time of 19.48 seconds.
  • While scraping eBay, we got a success rate of 90% with an average response time of 3.93 seconds.
  • Scraping Google with ZenRows resulted in a 90% success rate and an average response time of 18.81 seconds.

Scrapingbee

ScrapingBee is a web-scraping API service designed to simplify and streamline data extraction from modern websites.

Features & Pricing of Scrapingbee

  • They offer 1000 free credits on signup.
  • Their basic plan costs around $49 per month and includes 250,000 credits.
  • The documentation is clear, and the APIs can be seamlessly integrated into any development environment.
  • You can contact them via chat support or through email.

Test Results with Amazon

Test Results with eBay

Test Results with Google Search

Summary of All Tests (Scrapingbee)

  • Scrapingbee achieved a 100% success rate on Amazon, with an average response time of 5.82 seconds.
  • While scraping eBay, we got a success rate of 80% with an average response time of 3.85 seconds.
  • Scraping Google with Scrapingbee resulted in a 100% success rate and an average response time of 7.02 seconds.

Read More: How Scrapingdog is a Better Alternative To Scrapingbee

Scrapingdog: A Better Alternative to Zenrows & Scrapingbee

Scrapingdog is a web-scraping API platform that lets you extract data from websites without worrying about proxies, CAPTCHA, or browser automation.

scrapingdog homepage

Features & Pricing of Scrapingdog

  • Scrapingdog provides free 1000 credits on signup.
  • The entry-level plan costs around $40 per month and includes 200,000 credits.
  • The documentation is developer-friendly, making it easy to integrate the API into any project.
  • You can contact us via chat support or by email support.

Test Results with Amazon

 

Test Results with eBay

 

Test Results with Google Search

 

Summary of All Tests (Scrapingdog)

  • Scrapingdog achieved a 100% success rate on Amazon, with an average response time of 4.27 seconds.
  • While scraping eBay, we got a success rate of 100% with an average response time of 3.14 seconds.
  • Scraping Google with Scrapingdog resulted in a 100% success rate and an average response time of 3.49 seconds.

Success Rate Comparison (Zenrows vs Scrapingbee vs Scrapingdog)

Provider Amazon eBay Google
ZenRows 40% 90% 90%
ScrapingBee 100% 80% 100%
Scrapingdog 100% 100% 100%

When comparing success rates across all three APIs, ScrapingDog delivered flawless performance, achieving a 100% success rate on Amazon, eBay, and Google. 

ScrapingBee performed reliably overall, maintaining 100% on Amazon and Google but dropping slightly to 80% on eBay. 

ZenRows, on the other hand, struggled with Amazon, managing only 40% success, though it performed much better on eBay and Google with 90% success each.

Speed Comparison

Provider Amazon eBay Google
ZenRows 19.48 s 3.93 s 18.81 s
ScrapingBee 5.82 s 3.85 s 7.02 s
ScrapingDog 4.27 s 3.14 s 3.49 s

In terms of speed, ScrapingDog once again led the pack with the fastest average response times across all three platforms, staying under 4.5 seconds, even on Google, which is typically the most challenging site to scrape.

 

ScrapingBee demonstrated stable performance, averaging between 3.8 and 7 seconds, but lagged slightly behind on Google. 

ZenRows was considerably slower on Amazon and Google, taking nearly 19 seconds per request, though it performed well on eBay.

Conclusion

After testing all three web scraping APIs, ZenRows, ScrapingBee, and ScrapingDog, across Amazon, eBay, and Google, here’s the takeaway:

  • ScrapingDog consistently came out on top, offering 100% success rates and the fastest response times across all platforms. It’s highly optimized for performance and reliability, making it the best choice for large-scale or production-grade scraping.
  • ScrapingBee delivered strong, stable results with solid success rates and good speed. It’s a balanced option if you prioritize simplicity and consistency.
  • ZenRows performed decently on eBay and Google but struggled significantly with Amazon, both in speed and success rate, suggesting its infrastructure isn’t yet fully tuned for heavy e-commerce scraping.

Additional Resources

10 Best Google SERP APIs in 2026 to Scale Data Extraction from Google Search
https://www.scrapingdog.com/blog/best-serp-apis/

TL;DR

  • Benchmarks 10 SERP APIs on speed, price and scale.
  • Times: Scrapingdog 1.83 s; Serper 2.87 s; Bright Data 5.58 s; SearchAPI 2.96 s; ScraperAPI 33.6 s.
  • Verdict: Scrapingdog & Serper are fastest; ScraperAPI slowest.
  • Pricing: Scrapingdog is economical at scale (~$0.00029 / request); most offer free trials.

Search engines hold a massive amount of data; to be specific, Google alone handles around 8.5 billion searches per day.

Scraping Google or any other search engine is worth considering if you need the data for SEO tools, lead generation, and price monitoring.

I’ve analyzed the best SERP APIs that deserve to be on this list. Each API has been tested on key factors like speed, scalability, and pricing. 

I’ve shared my results at the very end of this article.

Let’s get started!!

10 Best APIs for Scraping Google in 2026

We will be judging these APIs based on 5 attributes.

  • Scalability means how many pages you can scrape in a day.
  • Pricing of the API. What is the cost of one API call?
  • Speed means how fast an API can respond with results.
  • Developer-friendly refers to the ease with which a software engineer can use the service.
  • Stability refers to how much load a service can handle or for how long the service is in the market.
				
					import requests
import time
import random

# List of random words to use in the search query

search_terms_google = [
    "pizza", "burger", "sushi", "coffee", "tacos", "salad", "pasta", "steak",
    "sandwich", "noodles", "bbq", "dumplings", "shawarma", "falafel",
    "pancakes", "waffles", "curry", "soup", "kebab", "ramen"
]


base_url = "Your-API-URL"  # replace with the API endpoint you are testing; keep {query} where the search term should be inserted

total_requests = 50
success_count = 0
total_time = 0

for i in range(total_requests):
    try:
        # Pick a random search term from the list
        search_term = random.choice(search_terms_google)
        url = base_url.format(query=search_term)

        start_time = time.time()  # Record the start time
        response = requests.get(url)
        end_time = time.time()  # Record the end time

        # Calculate the time taken for this request
        request_time = end_time - start_time
        total_time += request_time

        # Check if the request was successful (status code 200)
        if response.status_code == 200:
            success_count += 1
        print(f"Request {i+1} with search term '{search_term}' took {request_time:.2f} seconds, Status: {response.status_code}")

    except Exception as e:
        print(f"Request {i+1} with search term '{search_term}' failed due to {str(e)}")

# Calculate the average time taken per request
average_time = total_time / total_requests
success_rate = (success_count / total_requests) * 100

# Print the results
print(f"\nTotal Requests: {total_requests}")
print(f"Successful Requests: {success_count}")
print(f"Average Time per Request: {average_time:.2f} seconds")
print(f"Success Rate: {success_rate:.2f}%")
				
			

We will test the APIs with the above Python code.

Scrapingdog’s Google SERP API

Scrapingdog’s Google Search API provides raw and parsed data from Google search results.

Now, we might be biased for including our API on top (and yes, it’s what I get paid for — JK, I’m the CTO!). But honestly, all the APIs are tested, I have the results in the screenshots all through this article.

Scrapingdog Google Scraper API

Details

  • With this API, you can make more than a billion requests every month, which makes it a healthy choice.
  • Per API call cost for scraping Google starts from $0.003 and goes below $0.00125 for higher volumes.
  • For testing the speed of the API we are going to test the API on POSTMAN.
test screen

It took around 1.83 seconds to complete the request.

  • Has documentation in multiple languages. From curl to Java, you will find a code snippet in almost every language.
  • Scrapingdog has been in the market for more than 5 years now, and you can see how customers have reviewed Scrapingdog so far on Trustpilot. The API is stable.
  • You can even test the API for free, we provide 1000 free credits to spin it.

Here’s a quick video tutorial on how you can use Scrapingdog’s Google Search Scraper API.

Recently, we have introduced a new endpoint for scraping all major search engines via one call. We are calling it the Universal Search API. If you are looking to get data from several engines, this API fits well: the results are filtered for you, so you don’t have to remove repetitive results yourself.

Further, using this API instead of calling each engine separately is a much more economical approach.

Data For SEO

Data for SEO provides the data required for creating any SEO tool. They have APIs for backlinks, keywords, search results, etc.

Details

  • Documentation is too noisy, which makes integration of the API time-consuming.
  • The pricing is not clear. Their pricing changes based on the speed you want. But the high-speed pack will cost $0.002 per search. The minimum investment is $2k per month.
  • They have been into scraping for so long and hence they have optimized it for scalability and stability.
  • Cannot comment on the speed as we were unable to test the API because of the very confusing documentation.

Apify

Apify is a web scraping and automation platform that provides tools and infrastructure to simplify data extraction, web automation, and data processing tasks. It allows developers to easily build and run web scrapers, crawlers, and other automation workflows without having to worry about infrastructure management.

Apify

Details

  • The documentation is pretty clear and makes integration simple.
  • The average response time was around 8.2 seconds.
apify results
  • Pricing starts from $0.003 per search and goes below $0.0019 per search in their Business packs.
  • They have been in this industry for a very long time, which indicates they are reliable and scalable.

SearchAPI

SearchAPI is another popular option among developers to scrape Google search results at scale.

This product has been around for a while now, and it is worth mentioning in this list because it performed well in our test.

Details

  • When you sign up, you get 100 free credits to test the API.
  • Documentation is clear, and the API can be easily integrated into any environment.
  • Pricing per page starts from $0.004 and drops below $0.002.

Testing

 

  • We got 100% success rate with an average response time of 2.96 seconds.

Bright Data

Bright Data as we all know is a huge company focused on data collection. They provide proxies, data scrapers, etc.

Brightdata Google Search API

Details

  • Their documentation is quite clear and testing is super simple.
  • We tested their API, and the average response time was close to 5.58 seconds, which is good.
  • Per API call cost starts from $0.005. The success rate is pretty great, which makes this API scalable and stable. The service is top-notch, and any of their products you use is solid.
  • The only downside with Brightdata is that it’s a bit more expensive compared to other providers.

Hasdata

Hasdata is another great option if you are looking for a search engine API. Their dashboard makes your onboarding training pretty simple.

Hasdata google search api

Details

  • Documentation is pretty simple and easy to understand.
  • Per API call response time is around 3.80 seconds.
  • In my testing, I observed that the API slows down if you hit it multiple times in a row, which suggests it may not perform well when chosen for scalability.
  • Per API call price starts from $0.003 and goes around $0.0004 with higher volumes.

Serper

Serper provides a dedicated solution for scraping all the Google products.

Serper google search api

Details

  • The documentation is clear, and the API can be integrated with ease.
  • It’s a new service, and the people behind it don’t seem to have much public presence.
  • Pricing per scrape starts from $0.001 and drops below $0.00075 with high volume.
  • If you need more than 10 results per query in its SERP API, then you will be charged 2 credits, so the pricing automatically doubles.
  • You can only contact them through email.

Testing

Serper testing
  • So, the API took around 2.87 seconds to scrape a single Google page.

SerpAPI

SerpAPI is the fastest Google search scraper API with the highest variety of Google-related APIs.

SerpAPI Google Search API

Details

  • The documentation is very clear and concise. You can quickly start scraping any Google service within minutes.
  • The average response time was around 5.49 seconds. API is fast and reliable. This API can be used for any commercial purpose which requires a high volume of scraping.
Serp api testing
  • Pricing starts at $0.01 per request and it goes down to $0.0083!
  • SerpAPI has been in this industry since 2016 and they have immense experience in this market. If you have a high-volume project then you can consider them.

Decodo(Smartproxy)

Decodo is another Google search API provider in this list.

Decodo google search api

Details

  • Documentation is simple. Integration with them is super simple.
  • They have a great proxy infrastructure, which ultimately assures a seamless data pipeline.
  • Pricing for Google scraping starts from $0.00125 and drops below $0.00095 with high volume.
  • You can contact them via chat or email.
  • It was not possible to test their API in our environment, so we tested it on the dashboard itself. So, their API took around 4 to 5 seconds to scrape a single Google Page.

ScraperAPI

ScraperAPI was initially launched as a free web scraping API but now it also offers multiple dedicated APIs around Google and its other services like SERP, News, Jobs, Shopping, etc.

ScraperAPI

Details

  • Documentation is very clear and has code snippets for all major languages like Java, NodeJS, etc. This makes testing this API super easy.
  • The average response time was around 33.6 seconds, and it might go up for high concurrency.

  • Pricing starts from $0.00196 per search and goes up to $0.0024 for bigger packs.
  • They have been in the market for a long time but the SERP API doesn’t meet the expectations.

Overall Results

Provider Response Time (s) Pricing ($ per request)
Scrapingdog 1.83 0.001 → 0.00029
Serper 2.87 0.001 → 0.00075
SearchAPI 2.96 0.004 → 0.002
Hasdata 3.8 0.00245 → 0.00083
Decodo 4.5 0.00125 → 0.00095
Brightdata 5.58 0.0011
SerpAPI 5.49 0.015 → 0.0075
Apify 8.0 0.003 → 0.0019
ScraperAPI 33.6 0.00196 → 0.0024
Dataforseo N/A 0.002

At first glance, many of the APIs we’ve discussed may appear quite similar. But once you dig deeper and start testing, you’ll notice that only a few (specifically two or three) are truly stable and suitable for production use.

serp api response time comparison bar graph

🚀 Conclusion: Serper, Scrapingdog & SearchAPI are the fastest, while ScraperAPI is the slowest among the tested services

The report above is based on a thorough analysis of each API, focusing on factors like speed, scalability, and pricing.

Almost all the APIs mentioned here offer free trials, so you can test them yourself firsthand and see which one fits your needs best.

Price Comparison

Scrapingdog offers the lowest effective pricing, dropping to $0.00029 per request at scale, far cheaper than competitors like SerpAPI ($0.015) or Apify ($0.003). Most other providers range between $0.0008 and $0.002 per request.
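As a rough sanity check, a few lines of arithmetic show how quickly these per-request differences compound at scale. The prices below are taken from the paragraph above, and the monthly volume is purely illustrative.

# Per-request prices quoted above (at-scale price for Scrapingdog, list prices for the others)
prices = {
    "Scrapingdog": 0.00029,
    "Apify": 0.003,
    "SerpAPI": 0.015,
}

monthly_requests = 1_000_000  # illustrative volume

for provider, price in prices.items():
    print(f"{provider}: ${price * monthly_requests:,.2f} per month for {monthly_requests:,} requests")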

Why You Should Choose A SERP API Instead of Building Your Own Scraper

While you could build your scraper to extract Google search results, maintaining them over time can be quite challenging.

Search engines, including Google, often block scrapers after approximately 100 requests, making it difficult to scale without hitting roadblocks.

You’d need to constantly update your scraper to bypass these restrictions, which can be time-consuming and inefficient.

For production purposes, using an API is a much better option.

Here’s why:

  1. Anonymity: With these APIs, you stay anonymous. Every request is made using a different IP address, so your IP is always hidden, preventing any blocks or restrictions from Google.

  2. Cost-Effective: These APIs are far more affordable than Google’s official API. You can scrape search results at a fraction of the cost.

  3. Parsed Data Options: Whether you need parsed JSON data for easy integration or raw HTML data for flexibility, these APIs offer both.

  4. Customization: Many API vendors offer customization options to tailor the API exactly to your needs, making it easier to extract the exact data you want.

  5. Reliability for Production: Unlike self-built scrapers that might get blocked or require constant maintenance, these APIs are designed to be stable, scalable, and perfect for production use.

  6. 24X7 Support: Round-the-clock support to help you solve any issues or queries, ensuring smooth operations.

What Data Other Than Google Search Can You Scrape From Google Products?

Search engine scraping is one of the most common ways to collect valuable data.

But search results aren’t the only data you can access. Other valuable sources can be scraped for more data. To name a few:

You can scrape Google AI Mode to keep track of your brand’s visibility, especially if SEO is one of the channels through which your brand acquires customers.

Scraping Google Maps opens up valuable opportunities to gather business details, reviews, and location data. This information is useful for local SEO, lead generation, and market analysis.

On the other hand, you can scrape Google News to do content analysis or monitor news coverage.

You can also collect data from other Google products, such as Google Scholar and Google Images.

I’ll continue to add more details and use cases for scraping these products as I write articles on them using Python. 

Additional Resources

Web Scraping with Scrapingdog

Scrape the web without the hassle of getting blocked
5 Best Indeed Scrapers To Test Out in 2025
https://www.scrapingdog.com/blog/best-indeed-scrapers/

TL;DR

  • Compares 5 Indeed scrapers — ScraperAPI, Scrapingdog, ZenRows, Bright Data, ScrapingBee using each product’s general scraper.
  • Criteria: speed, success rate, support, scalability, dev-friendliness; simple test harness shown.
  • Scrapingdog is featured with a free 1k-credit trial to test reliability on Indeed.
  • Bottom line: choose based on your success-vs-cost needs at target scale.

If you’re planning to scrape job-listing sites like Indeed (or similar platforms) at scale, choosing the right web scraping API can make a big difference. You’ll typically need:

  • Reliable JavaScript rendering (many job portals use dynamic loading)
  • Anti-bot & CAPTCHA handling
  • Proxy rotation / geo-flexibility
  • Predictable costs and data structure output

In this article we compare five major scraping APIs: ScraperAPI, Scrapingdog, ZenRows, Brightdata, and ScrapingBee. The goal is to help you decide which is best for scraping Indeed.com with high reliability and minimal fuss.

Criteria

We are going to test each of these five products and then compare them on the basis of:

  • Speed
  • Success rate
  • Support
  • Scalability
  • Developer friendly

We will use the general web scraper of each product to scrape Indeed.

We are going to use this Python code to test the different products.

				
					import requests
import time
import random
import urllib.parse

# List of search terms
indeed_urls = ['https://www.indeed.com/jobs?q=Software+Engineer&l=New%20York',"https://www.indeed.com/jobs?q=python&l=New+York%2C+NY","https://il.indeed.com/jobs?q=&l=israel&fromage=1&vjk=3e2c3c5a7577fa90","https://www.indeed.com/jobs?q=python&l=New+York%2C+NY","https://www.indeed.com/jobs?q=Assistant+Restaurant+Manager&start=0&l=Chicago%2C+IL"]



# Replace with your actual API endpoint
# Make sure it includes {query} where the search term should be inserted
base_url = "https://api.example.com/"

total_requests = 10
success_count = 0
total_time = 0

for i in range(total_requests):
    try:
        search_term = random.choice(indeed_urls)

        
  
        params = {
            'api_key': 'your-api-key',  # replace with your own API key
            'url': search_term
        }



        # url = base_url.format(query=search_term)

        start_time = time.time()
        response = requests.get(base_url,params=params)
        end_time = time.time()

        request_time = end_time - start_time
        total_time += request_time

        if response.status_code == 200:
            success_count += 1
        print(f"Request {i+1}: '{search_term}' took {request_time:.2f}s | Status: {response.status_code}")

    except Exception as e:
        print(f"Request {i+1} with '{search_term}' failed due to: {str(e)}")

# Final Stats
average_time = total_time / total_requests
success_rate = (success_count / total_requests) * 100

print(f"\n<img src="https://s.w.org/images/core/emoji/15.0.3/72x72/1f50d.png" alt="🔍" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Total Requests: {total_requests}")
print(f"<img src="https://s.w.org/images/core/emoji/15.0.3/72x72/2705.png" alt="✅" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Successful: {success_count}")
print(f"<img src="https://s.w.org/images/core/emoji/15.0.3/72x72/23f1.png" alt="⏱" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Average Time: {average_time:.2f} seconds")
print(f"<img src="https://s.w.org/images/core/emoji/15.0.3/72x72/1f4ca.png" alt="📊" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Success Rate: {success_rate:.2f}%")
				
			

Scrapingdog

Scrapingdog provides powerful web scrapers to scrape websites with CAPTCHA and bot protection.

  • You get 1000 free credits when you signup for the free pack.
  • To scrape Indeed, you’ll need to enable Stealth Mode. Pricing begins at roughly $0.002 per request and can go as low as $0.000583 on larger plans.
  • Documentation is pretty clear and API can be integrated easily with any system.
  • Support is available 24/7 through chat and email.

Testing Indeed

Summary

  • Scrapingdog scraped Indeed with a 100% success rate and an average response time of 14.47 seconds.

Scraperapi

Scraperapi provides a web scraping API to scrape any website. The scraper responds with the HTML data of the target website.

  • On a new sign-up, you get 5,000 free credits to test the API.
  • Each successful response will cost you around $0.0049 but the pricing drops to $0.00095 on their biggest pack.
  • The documentation is very clear and you can easily integrate their APIs in your system.
  • Customer support is only available through email. No instant chat support is available.

Testing Indeed

Summary

  • Scraperapi scraped Indeed at an average response time of 50.43 seconds with a 50% success rate.

Zenrows

Zenrows is another web scraping API on the market that offers a general scraper for scraping websites.

  • On signup, you get 1,000 credits, which remain active for the next 14 days.
  • Every request to indeed.com will cost you $0.025 and it goes down with bigger packs.
  • Dashboard is a little confusing to operate but documentation is clear and the API can be integrated easily.
  • Instant customer support through chat and email is available.

Testing Indeed

Summary

  • We got a 100% success rate with Zenrows, with an average response time of 22.23 seconds.

Brightdata

This is one of the pioneering companies in the scraping industry. They provide powerful scrapers and proxies to scrape websites.

Brightdata dashboard

  • You have to go through their KYC process in order to test the APIs and proxies.
  • You have to use their web unblocker to scrape indeed at scale.
  • Pricing starts from $0.0015 and drops to $0.001.
  • You can easily integrate their proxies in your system.
  • Support is available 24/7 and is always ready for your queries.

Testing Indeed

Summary

  • We got 100% success rate with Brightdata with an average response time of 6.36 seconds.

Read More: 5 Economical Brightdata Alternatives You Can Try

Scrapingbee

Scrapingbee also provides a general scraper to scrape websites at scale. Using their extract rule feature you can extract parsed JSON data from raw html data.

  • On signup you get free 1000 credits to test the API.
  • You’ll need to use their Stealth Proxy mode to scrape Indeed. The pricing starts at $0.0147 per request and drops to $0.00562 on their largest available plan.
  • APIs can be easily integrated in any working environment.
  • Support is available 24*7 through chat and email.

Testing Indeed

Summary

  • We got 98% success rate with an average response time of 15.88 seconds.

Price Comparison

Provider Starting Price / Request Lowest Price (High Volume) Approx. Cost per 1K Requests
Scrapingdog $0.002 $0.000583 ~$0.58 – $2.00
ScraperAPI $0.0049 $0.00095 ~$0.95 – $4.90
ZenRows $0.025 $0.022 ~$22 – $25
Bright Data $0.0015 $0.001 ~$1.00 – $1.50
ScrapingBee $0.0147 $0.00562 ~$5.62 – $14.70

When it comes to pricing, Scrapingdog clearly leads the pack, offering one of the lowest per-request costs in the industry, especially at scale.
While Bright Data remains competitive on volume, most other providers like ScraperAPI, ZenRows, and ScrapingBee are considerably more expensive for large-scale scraping operations.

If your use case involves frequent or high-volume scraping (like tracking Indeed job listings), Scrapingdog delivers the best balance between cost efficiency and scalability.

Speed Comparison

Provider Success Rate Average Response Time (seconds)
Scrapingdog 100% 14.47 s
ScraperAPI 50% 50.43 s
ZenRows 100% 22.23 s
Bright Data 100% 6.36 s
ScrapingBee 98% 15.88 s

When it comes to speed, Bright Data tops the chart with an impressive 6.36-second average response time, followed by Scrapingdog at 14.47 seconds, which maintained strong performance alongside a consistent 100% success rate.

In terms of reliability, Scrapingdog, ZenRows, and Bright Data all achieved perfect 100% success rates, while ScrapingBee performed well at 98%, and ScraperAPI lagged behind with only 50% reliability.

Final Verdict

All five providers delivered usable results, but their performance varied across speed and consistency. Bright Data was the fastest in response time, while Scrapingdog, ZenRows, and Bright Data maintained perfect success rates. ScrapingBee also performed reliably with only a slight dip in success, and ScraperAPI showed room for improvement in stability.

Ultimately, the best choice depends on your specific needs, whether that's speed, scalability, or cost efficiency. Each provider has its strengths, and the right fit comes down to balancing performance with your project's priorities.

Additional Resources

Web Scraping with Scrapingdog

Scrape the web without the hassle of getting blocked

How to Take Screenshot with Puppeteer (Step-by-Step Guide) https://www.scrapingdog.com/blog/how-to-take-screenshot-with-puppeteer/ https://www.scrapingdog.com/blog/how-to-take-screenshot-with-puppeteer/#respond Fri, 24 Oct 2025 09:55:25 +0000 https://www.scrapingdog.com/?p=30922

TL;DR

  • Quick setup: install puppeteer, launch headless Chrome, take a basic screenshot.
  • Full-page capture: pass { fullPage: true }; save to file.
  • Stability: await page.waitForSelector(...) before shooting to ensure the UI is ready.
  • For scale / rotation and hands-off rendering, use Scrapingdog’s Screenshot API instead of running your own browsers.

Capturing screenshots with Puppeteer is one of the easiest and most useful ways to automate browser tasks. Whether you’re testing UI changes, generating website previews, or scraping visual data, Puppeteer gives developers precise control over how to capture a page.

In this guide, we’ll walk through everything you need to know about taking screenshots using Puppeteer, from simple single-page captures to full-page screenshots.

What is Puppeteer?

Puppeteer is a Node.js library developed by Google that provides a high-level API to control Chrome or Chromium through the DevTools Protocol. It’s widely used for:

  • Web scraping and automation
  • End-to-end testing
  • PDF generation
  • Visual regression testing
  • Screenshot capture

When you install Puppeteer, it automatically downloads a compatible version of Chromium, so you can get started right away.

Prerequisites

Create a folder with any name you like. I am naming the folder screenshot.

				
					mkdir screenshot

				
			

Now, inside this folder, initialize the project and install Puppeteer with these commands.

				
					npm init -y
npm install puppeteer
				
			

Now, create a JS file where you will write your code. I am naming the file puppy.js. That’s all; our environment is ready.

Taking Our First Screenshot with Puppeteer

				
					let puppeteer = require('puppeteer');

(async () => {
  let browser = await puppeteer.launch();
  let page = await browser.newPage();
  await page.goto('https://www.scrapingdog.com');
  await page.screenshot({ path: 'screenshot.png' });
  await browser.close();
})();
				
			

The code is pretty simple, but let me explain it step by step.

  • Import Puppeteer — Loads the Puppeteer library to control a headless Chrome browser.
  • Start an async function — Allows the use of await for smoother asynchronous execution.
  • Launch the browser — Opens a new headless (invisible) Chrome instance.
  • Create a new page — Opens a fresh browser tab for interaction.
  • Go to the target URL — Navigates the page to https://www.scrapingdog.com.
  • Capture a screenshot — Takes the screenshot and saves it locally as screenshot.png.
  • Close the browser — Ends the session and frees up system resources.

Once you execute the code you will find the screenshot inside your folder screenshot.

How to Capture a Full-Page Screenshot

				
					let puppeteer = require('puppeteer');

(async () => {
  let browser = await puppeteer.launch();
  let page = await browser.newPage();
  await page.goto('https://www.scrapingdog.com');
  await page.screenshot({ path: 'screenshot.png' , fullPage: true});
  await browser.close();
})();
				
			

This ensures Puppeteer scrolls through the page and stitches everything into a single image.

If you don't want to use Puppeteer, or any other toolkit for that matter, to scale your screenshot generation, you can use Scrapingdog's Screenshot API. We manage proxies, headless browsers, and other corner cases to keep your screenshots blockage-free across any number of URLs.

Wait for Elements Before Taking Screenshot

Let’s take a screenshot of the Google home page once the search box appears.

				
					let puppeteer = require('puppeteer');

(async () => {
  // 1. Launch a browser
  let browser = await puppeteer.launch({ headless: true});

  // 2. Open a new page
  let page = await browser.newPage();

  // 3. Navigate to the website
  await page.goto('https://www.google.com', { waitUntil: 'domcontentloaded' });

  // 4. Wait for a specific element (Google search box)
  await page.waitForSelector('textarea[name="q"]');

  // 5. Take the screenshot
  await page.screenshot({
    path: 'google.png',
    fullPage: true
  });

  console.log("<img src="https://s.w.org/images/core/emoji/15.0.3/72x72/2705.png" alt="✅" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Screenshot taken after search box loaded!");

  // 6. Close the browser
  await browser.close();
})();
				
			

The code is almost the same; we have just used waitForSelector to pause execution until a particular element appears in the DOM.

Conclusion

Puppeteer makes taking screenshots in Node.js fast, flexible, and reliable — whether you’re capturing a simple webpage, an entire site, or specific UI components.

With just a few lines of code, you can automate screenshot generation for monitoring, reporting, or testing.

If you’re already using automation tools or APIs, Puppeteer integrates perfectly into your workflow for capturing website visuals at scale.

Additional Resources

Web Scraping with Scrapingdog

Scrape the web without the hassle of getting blocked

Serpapi vs Searchapi vs Scrapingdog: Which One Is Best For You & Why https://www.scrapingdog.com/blog/serpapi-vs-searchapi-vs-scrapingdog/ https://www.scrapingdog.com/blog/serpapi-vs-searchapi-vs-scrapingdog/#respond Thu, 09 Oct 2025 09:08:08 +0000 https://www.scrapingdog.com/?p=30522

TL;DR

  • Benchmarked SerpAPI, SearchAPI, and Scrapingdog on Google Search & Shopping with the same Node test harness.
  • Success rate: all hit 100%.
  • Avg response time (Search / Shopping): Scrapingdog 3.32s / 3.22s; SerpAPI 3.34s / 7.20s; SearchAPI 4.55s / 3.67s.
  • Pricing: Scrapingdog cheapest ($1 → $0.054 per 1k), SearchAPI mid, SerpAPI highest.
  • Bottom line: Scrapingdog leads on speed + price; pick based on scale and support needs.

Choosing the right SERP API can make or break your data projects. With so many providers promising speed, accuracy, and reliability, developers often get stuck comparing feature lists instead of focusing on what really matters.

In this article, we’ll look at SearchApi, SerpApi, and Scrapingdog, three well-known names in the SERP API space. Each offers similar core functionality (getting structured search results from Google and other engines), but they differ in pricing, scale, and developer-friendliness.

If you’re evaluating which service to use for your project, whether it’s SEO monitoring, price intelligence, or building data-driven applications, this breakdown will give you a side-by-side comparison of where each API stands and help you decide which one fits your use case best.

Criteria

We’ll compare two APIs from each product and evaluate them based on the following points.

  • Speed
  • Success rate
  • Support
  • Scalability
  • Developer friendly

We will use the Google Search API and the Google Shopping API from each product.

We will be using this code for testing each API. Every test result can be verified by using this code at your end.

				
					const axios = require("axios");

const google_serp_terms = ["shoes", "burger", "corona", "cricket", "tennis"];
const google_shopping_terms = ["jeans", "shoes", "shirts", "socks", "pants"];
const base_url = "https://api.example.com/google_shopping";
const total_requests = 10;
let success_count = 0;
let total_time = 0;
function getRandom(arr) {
  return arr[Math.floor(Math.random() * arr.length)];
}
const run = async () => {
  for (let i = 0; i < total_requests; i++) {
    let search_term;
    try {
      
    //   search_term = getRandom(google_serp_terms);
      search_term = getRandom(google_shopping_terms);
      const params = {
    api_key: 'your-api-key',
    query: search_term,
    language: 'en',
    country: 'us'
  }

      const start_time = Date.now();
      const response = await axios.get(base_url, { params });
      const end_time = Date.now();
      const request_time = (end_time - start_time) / 1000;
      total_time += request_time;
      if (response.status === 200) success_count++;
      console.log(
        `Request ${i + 1}: '${search_term}' took ${request_time.toFixed(
          2
        )}s | Status: ${response.status}`
      );
    } catch (e) {
      console.log(`Request ${i + 1} with '${search_term}' failed due to: ${e.message}`);
    }
  }
  const average_time = total_time / total_requests;
  const success_rate = (success_count / total_requests) * 100;
  console.log(`\n🔍 Total Requests: ${total_requests}`);
  console.log(`✅ Successful: ${success_count}`);
  console.log(`⏱ Average Time: ${average_time.toFixed(2)} seconds`);
  console.log(`📊 Success Rate: ${success_rate.toFixed(2)}%`);
};
run();
				
			

SerpAPI

Serpapi is the oldest player in this industry and provides a robust search API around Google.

Details

  • On creating a new account, you get 250 free credits.
  • Per scrape cost starts from $0.015 and goes below $0.0075 with a higher volume.
  • Documentation is very clean and can be integrated easily in any production environment.
  • Support is available 24*7 through chats and emails.
  • They recently launched an API that allowed extracting 100 results at once, bypassing Google’s removal of the num=100 parameter. However, Google quickly blocked this API.

Testing Google Search API

Here we will make 10 requests to the Google search endpoint and see what success rate we can achieve.

Testing Google Shopping API

Here we will make 10 calls to the shopping API.

Test Summary

  • The Google Search API from SerpApi took an average of 3.34 seconds to complete each request.
  • The Shopping API took an average of 7.20 seconds to complete a single request.

SearchAPI

SearchAPI is another product that provides SERP APIs for seamless scraping of Google products.

Details

  • You get 100 free credits once you sign up.
  • Per scrape cost starts from $0.004 and drops below $0.001.
  • Docs are very clear and concise.
  • Support is available through chats and emails.
  • They also provide multiple other APIs around Bing and other search engines.

Testing Google Search API

Testing Google Shopping API

Test Summary

  • The Google Search API from SearchAPI took an average of 4.55 seconds to complete each request.
  • The Shopping API took an average of 3.67 seconds to complete a single request.

Scrapingdog

Scrapingdog offers high-performance Google APIs that let you scrape search results and other Google products with speed and reliability.

Details

  • You get 1,000 free credits for testing the API. You can test any API directly from the dashboard.
  • Per scrape cost starts from $0.001 and drops below $0.000054 on top plans.
  • The documentation is clear and easy to follow, and the API can be seamlessly integrated into any environment.
  • Scrapingdog provides support through chat and emails. Agents are available online 24*7 to solve any issue.
  • They recently launched a Universal Search API that collects data from all major search engines in a single API call. This allows users to gather data faster and at a lower cost.

Testing Google Search API

Testing Google Shopping API

Test Summary

  • The Google Search API from Scrapingdog took an average of 3.32 seconds to complete each request.
  • The Shopping API took an average of 3.22 seconds to complete a single request.

Speed Comparison (SerpAPI vs SearchAPI vs Scrapingdog)

| Provider | Google Search API (Avg. Time) | Google Shopping API (Avg. Time) | Success Rate |
|---|---|---|---|
| SerpApi | 3.34 s | 7.20 s | 100% |
| SearchAPI | 4.55 s | 3.67 s | 100% |
| Scrapingdog | 3.32 s | 3.22 s | 100% |

All three providers, SerpApi, SearchAPI, and Scrapingdog, delivered a 100% success rate for both Google Search and Shopping APIs, showing solid reliability across the board.

When it comes to speed, Scrapingdog outperformed the others with the lowest average response times: 3.32s for Search and 3.22s for Shopping.

SearchAPI performed moderately well, while SerpApi showed noticeably slower results, especially in the Shopping API tests where it took 7.20s on average.

Overall, Scrapingdog stood out as the most efficient option, maintaining consistency across both APIs while delivering faster responses. 🚀

Price Comparison

| Provider | Starting Cost / Call | Lowest Cost / Call (High Volume) | Approx. Cost per 1K Calls |
|---|---|---|---|
| SerpApi | $0.015 | $0.0075 | ~$7.50 – $15.00 |
| SearchAPI | $0.004 | $0.001 | ~$1.00 – $4.00 |
| Scrapingdog | $0.001 | $0.000054 | ~$0.05 – $1.00 |

All three APIs offer volume-based pricing, but the gap between them is significant. SerpApi remains the most expensive, with costs ranging from $7.50 to $15.00 per 1,000 calls. SearchAPI sits in the mid-range, priced between $1.00 and $4.00 per 1,000 calls.

In contrast, Scrapingdog delivers the most affordable solution, costing only $1.00 per 1,000 calls at entry-level and dropping to just $0.054 per 1,000 calls at high volume.

Overall, Scrapingdog provides the best value for developers and enterprises seeking scalable data extraction without high recurring costs.

Final Verdict

  • Fastest Performance: Scrapingdog delivered the quickest response times for both Search and Shopping APIs.
  • Best Pricing: It’s also the most cost-efficient, up to 10–150× cheaper than competitors at scale.
  • Reliability: All three APIs achieved a 100% success rate during testing.
  • Overall Winner: Scrapingdog stands out as the best-balanced choice for speed, stability, and affordability.

Additional Resources

Web Scraping with Scrapingdog

Scrape the web without the hassle of getting blocked
6 Best Programming Languages for Web Scraping in 2025 https://www.scrapingdog.com/blog/best-language-for-web-scraping/ https://www.scrapingdog.com/blog/best-language-for-web-scraping/#comments Sat, 20 Sep 2025 09:46:05 +0000 https://scrapingdog.com/?p=11023

TL;DR

  • Pick the language you know; weigh flexibility, crawling ability, ease, scalability, and maintainability.
  • Python: most popular and beginner-friendly; strong libs (BS4/Selenium/Scrapy) but can be slower.
  • Ruby & JS/Node: Nokogiri/HTTParty and Puppeteer/Cheerio; Node suits streaming / distributed but can be less stable.
  • PHP & C++: PHP works with cURL but weak at scale; C++ is fast / parallel yet costly; tools like Scrapingdog are an alternative.

In 2025, the best programming language for web scraping will be the one that is best suited to the task at hand. Many languages can be used for web scraping, but the best one for a particular project will depend on the project’s goals and the programmer’s skills.

Python is a good choice for web scraping because it is a versatile language used for many tasks. It is also relatively easy to learn, so it is a good choice for those who are new to web scraping.

C++ will allow you to build a highly customized web scraping setup, as it offers excellent execution speed for this task.

PHP is another popular language for web scraping. It is not as powerful as Java, but it is easier to learn and use. It is also a good choice for those who want to scrape websites built with PHP.

Other alternative languages can be used for web scraping, but these are the most popular choices. Let’s dive in and explore the best language to scrape websites with a thorough comparison of their strengths and limitations.

Which Programming Language To Choose & Why?

It's important that a developer selects the programming language best suited to the data they want to scrape. These days, programming languages are quite robust when it comes to supporting different use cases, such as web scraping.

When a developer wants to build a web scraper, the best programming language to go for is the one they are most comfortable and familiar with. Web data often comes in highly complex formats, and since the structure of web pages changes frequently, developers need to adjust their code accordingly.

When selecting the programming language, the first and main criterion should be proper familiarity with it. Web scraping is supported in almost any programming language, so the one a developer is most familiar with should be chosen.

For instance, if you know PHP, start with PHP and take it from there. That way you already have resources and prior experience with the language, as well as knowledge about how it functions. It will also help you get your web scraping done faster.

The second consideration should be the availability of online resources for a particular programming language, whether for solving bugs or for finding ready-made solutions to common problems.

Apart from these, there are a few other parameters that you should consider when selecting any programming language for web scraping. Let’s have a look at those parameters. 

Parameters to Select the Best Programming Language

Flexibility

The more flexible a programming language is, the better it will be for a developer to use it for web scraping. Before choosing a language, make sure that it’s flexible enough for your desired endeavors.

Operational ability to feed database

The language you choose should make it straightforward to feed the scraped data into your database or storage of choice.

Crawling effectiveness

The language you choose must have the ability to crawl through web pages effectively.

Ease of coding

It’s really important that you can code easily using the language you choose.

Scalability

Scalability depends more on the overall technology stack than on the language itself. Some popular and battle-tested stacks that have proven to be capable of such scalability are Ruby on Rails (RoR), MEAN, .NET, Java Spring, and LAMP.

Maintainability

The cost of maintenance will depend on the maintainability of your technology stack and the programming language you choose for web scraping. Based on your goals and budget, choose a language whose maintenance you can afford.

Top 6 Programming Languages for Effective & Seamless Web Scraping

Python

When it comes to web scraping, the Python programming language is still the most popular choice. This language is a complete product, as it can handle almost all the processes that are related to data extraction smoothly. It’s very easy to understand for beginner coders, and it’s also easy to use for web scraping. You will be able to get up to speed on web scraping with Python if you are new to this. 

Core Features

  • Easy to understand
  • Second only to JavaScript in terms of availability of online community and resources
  • Comes with highly useful libraries
  • Pythonic idioms work great for searching, navigating, and modifying a parse tree
  • Advanced web scraping libraries that come in really handy while scraping web pages

Read More: Tutorial on Web Scraping using Python

Built-In Libraries/Advantages

Selenium – a browser automation library for Python that helps a lot with data extraction and web scraping, especially on pages that need a real browser.

BeautifulSoup – a Python library designed for really efficient and fast parsing and data extraction.

Scrapy – a popular web crawling and scraping framework built on the Twisted library, with a great set of debugging tools. Because Python offers Scrapy, it is a highly effective and popular choice for web scraping; a minimal spider sketch is shown below.
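
To give a feel for Scrapy, here is a minimal sketch of a spider. It crawls the books.toscrape.com demo site; the URL and CSS selectors are illustrative assumptions, so adjust them for your own target.

				
					import scrapy

class BooksSpider(scrapy.Spider):
    # Name used to refer to this spider from the Scrapy command line
    name = "books"
    start_urls = ["http://books.toscrape.com/"]

    def parse(self, response):
        # Each product card on the demo page sits in an <article class="product_pod"> element
        for book in response.css("article.product_pod"):
            yield {
                "name": book.css("h3 a::attr(title)").get(),
                "price": book.css("p.price_color::text").get(),
            }
				
			

You can run a standalone spider file like this with scrapy runspider books_spider.py -o books.json, which writes the yielded items to a JSON file.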

Limitations

  • Too many options for data visualization can be confusing
  • Can be slow due to its dynamic nature and line-by-line execution of code
  • Weaker database access protocols

Ruby

Ruby is an open-source programming language. Its user-friendly syntax is easy to understand, and you will be able to practice and apply this language without any hassle. Ruby draws on multiple languages like Smalltalk, Perl, Ada, and Eiffel, and it balances functional programming with imperative programming.

Core Features

  • HTTParty, Pry, and Nokogiri make setting up your web scraper hassle-free.
  • Nokogiri is a Ruby gem that offers XML, HTML, SAX, and Reader parsers with CSS and XPath selector support.
  • HTTParty helps send HTTP requests to the pages a developer wants to extract data from. It returns all the HTML of the page as a string.
  • Pry enables debugging a program
  • No code repetition 
  • Simple syntax
  • Convention over configuration

Ruby (programming language): What is a gem?

A Ruby gem is a library built by the Ruby community. It can also be described as a package of code structured to comply with Ruby-style software development. These gems contain classes and modules that can be used in your applications. You can use them in your code by installing them through RubyGems first.

RubyGems is a manager of packages for the Ruby language, and it provides a standard format for distributing programs and libraries. 

Ruby Scraping (How To Do It And Why It’s Useful)

Ruby is popular for creating web scraping tools and for building internationalized SaaS solutions. Ruby is used for web scraping a lot, as it is an effective solution for extracting information for businesses. It is secure, cost-effective, flexible, and highly productive too. The steps of Ruby scraping are:

  • Creating the Scraping file
  • Sending the HTTP queries
  • Launching NokoGiri
  • Parsing
  • Export

Read More: Web Scraping with Ruby | Tips & Techniques for Seamless Scraping

Limitations

  • Relatively slower than other languages
  • Supported by a user community only, not a company
  • Difficult to locate good documentation, especially for less-known libraries and gems
  • Inefficient multithreading support

Javascript

JavaScript was mainly built for front-end web development. Node.js is the runtime that lets you use JavaScript for web scraping on the back end. Node.js comes with libraries like Nightmare and Puppeteer that are commonly used for web scraping.

Read More: Puppeteer Web Scraping Using Javascript

Node.JS

Node.js is a highly preferred choice for crawling web pages that rely on dynamically generated content. It also supports distributed crawling.

Node.js uses JavaScript to run non-blocking applications, which helps it handle many simultaneous events.

Framework

ExpressJS is a flexible and minimal web application framework for Node.js with features for web and mobile applications. Node.js also allows you to make quick and easy HTTP calls, and it helps traverse the DOM and extract data through Cheerio, which is an implementation of core jQuery for the server.

Read More: Step-by-Step Guide for Web Scraping with Node JS

Features

  • Conducts APIs and socket-based activities
  • Performs basic data extraction and web scraping activities
  • Good for streaming activities
  • Has a built-in library
  • Comes with a stable and basic communication
  • Good for scraping large-scale data

Limitations

  • Best suited for basic web scraping works
  • Requires multiple code changes because of unstable API
  • Not good for long-running processes
  • Stability is not that good
  • Lacks maturity

 

PHP

PHP might not be an ideal choice for creating a crawler program. You can go for the cURL library when web scraping with PHP, whether you are extracting images, graphics, videos, or other visual content. If you're looking for help building tailored scraping solutions, exploring Custom PHP Development can be a smart move.

Read More: Web Scraping with PHP

Core Features

  • Helps transfer files using protocols such as HTTP and FTP
  • Helps create web spiders that can be used to download information online
  • Uses 3% CPU
  • Open-source
  • Free of cost
  • Simple to use
  • Uses 39 MB of RAM
  • Can run 723 pages per 10 minutes

Limitations

  • Not suitable for large-scale data extraction
  • Weak multithreading support

C++

C++ lets you build a unique, highly customized setup for web scraping and offers outstanding execution speed for the task, but setting up a web scraping solution with this language can be quite costly. Make sure your budget allows for it. This language is best reserved for projects that are heavily focused on data extraction.

Core Features

  • Quite a simple user interface
  • Allows for efficiently parallelizing the scraper
  • Works great for extracting data
  • Conducts great web scraping if paired with dynamic coding
  • Can be used to write an HTML parsing library and fetch URLs

Limitations

  • Not great for just any web-related project, as it works better with a dynamic language
  • Expensive to use
  • Not best suited for creating crawlers

Which is better for web scraping Python or JavaScript?

I would say Python is the better language for web scraping due to its ease of use. It comes with a large number of libraries and frameworks, and strong support for data analysis and visualization. Python’s BeautifulSoup and requests libraries are widely used for web scraping, and they provide a simple and powerful way to extract data from HTML documents.

But there is a catch in all of this noise. Python is not great at handling concurrent threads, and your scraper can overload your server when you are scraping websites at very high volume. By default, Python code runs synchronously, which might be the only real disadvantage of using Python in a production scraper.
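
One common workaround is to keep using requests but run the blocking calls in a thread pool, so several pages download at the same time. Here is a minimal sketch; the URL list is just a placeholder.

				
					import requests
from concurrent.futures import ThreadPoolExecutor

# Placeholder list of pages to fetch; swap in your own URLs
urls = ["https://www.scrapingdog.com/"] * 5

def fetch(url):
    # Each call still blocks, but the pool runs several of them at once
    return requests.get(url).status_code

# Download up to 5 pages concurrently instead of one after another
with ThreadPoolExecutor(max_workers=5) as pool:
    print(list(pool.map(fetch, urls)))
				
			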

Example of Extracting title tag using requests and BS4.

				
					import requests
          from bs4 import BeautifulSoup

          url = 'https://www.scrapingdog.com/'

          # Send a GET request to the URL
          response = requests.get(url)

          # Parse the HTML content using Beautiful Soup
          soup = BeautifulSoup(response.content, 'html.parser')

          # Extract the title tag
          title = soup.title.string

          # Print the title
          print(title)
				
			

JavaScript, on the other hand, is a programming language that can be used on both the front end and the back end. With the combination of Cheerio and Axios, you can scrape most websites in seconds. But the learning curve is steeper with JavaScript, and hence beginners might get demotivated while scraping websites with it. JavaScript can also handle multiple requests with ease due to its asynchronous nature (tasks can be handled concurrently). So, if you want to scrape millions of pages, then JavaScript will be the better choice.

Example of Extracting title tag using Axios and Cheerio.

				
					  const axios = require('axios');
          const cheerio = require('cheerio');

          const url = 'https://www.scrapingdog.com/';

          // Send a GET request to the URL using Axios
          axios.get(url)
            .then(response => {
              // Load the HTML content into Cheerio
              const $ = cheerio.load(response.data);

              // Extract the title tag
              const title = $('title').text();

              // Print the title
              console.log(title);
            })
            .catch(error => {
              console.error(error);
            });
				
			

Alternative Solution: Readily Available Tools for Web Scraping

You can go for various open-source tools for web scraping that are free to use. While some of these tools require a certain amount of code modification, some don't require any coding at all. Most of these tools are limited to scraping the page a user is currently on and can't be scaled to scrape thousands of web pages in an automated way.

You can also use readily available services like Scrapingdog as your external web scraper. They can offer proxy services for scraping, or scrape the data directly and deliver it in the format you need. This frees up time for other development priorities instead of data pulling. Companies without developers or data engineers to support data analytics can benefit the most from these readily available tools and data.

Final Verdict: Who’s the Winner

No doubt, all these languages are great for web scraping. The best one entirely depends on your project requirements and skills. If you need a more powerful tool to handle complexities, go for C++ or Ruby. If ease of use and versatility is your thing, go for Python. And if you want something in between, go for PHP and its cURL library.

Frequently Asked Questions

Short Answer: Python.

Python is flexible and easy to learn. Moreover, it is one of the fastest languages to get a working scraper up and running in.

Yes, PHP is a back-end scripting language, and you can web scrape using plain PHP code.

No, it is not possible for a Java developer to simply switch Java code over to Python; the scraper has to be rewritten.

Python has a huge collection of libraries for web scraping. Hence, extracting data with Python is convenient and fast.

Scrapy is a more complex tool and thus can be used for large projects. On the other hand, BeautifulSoup can be used for small projects.

Additional Resources

Web Scraping with Scrapingdog

Scrape the web without the hassle of getting blocked
12 Use Cases of Web Scraping for Businesses in 2025 https://www.scrapingdog.com/blog/web-scraping-use-cases/ https://www.scrapingdog.com/blog/web-scraping-use-cases/#comments Wed, 17 Sep 2025 05:20:10 +0000 https://scrapingdog.com/?p=8483

TL;DR

  • 12 business use cases for web scraping in 2025.
  • Highlights: PR monitoring, data science, marketing / sales, customer sentiment, product development, lead gen, data enrichment, SEO / rank tracking, influencer discovery, price tracking, content repurposing.
  • Includes real examples using Scrapingdog APIs and no-code flows (Maps leads, rank tracker, pricing, transcripts → blogs).

Web scraping is an important and smart solution for nearly every industry, irrespective of domain. The crucial information it delivers provides actionable insights that help a business gain a competitive edge over its competitors.

If you are still skeptical about the uses of web scraping, we have compiled the industries in which the tool has successfully shown its value. In this article, we have listed web scraping use cases and applications from the market to help you take note of its usage.

Web Scraping Use Cases & Applications in Different Areas

Web scraping software has drastically changed the working processes of multiple businesses. Here are the different areas in which web scraping can be used.

Public Relations

Every brand needs to maintain its public relations properly so that it remains in the good books of the customers.

Data scraping helps companies collect and gather crucial information about their customers' reviews, complaints, and praise across different platforms.

The quicker you respond to the different outlooks of your customers, the easier it is to manage your brand image. By providing real-time information on such aspects, web scraping tools help you successfully foster smooth public relations and build a strong brand reputation.

Recently, using Scrapingdog’s suite of APIs, we built a workflow that helps to monitor brand sentiment across the web. You can watch the video below to understand how web scraping helps to monitor it: –

Data Science and Analytics

As the name suggests, the entire industry is dependent largely on the amount of efficient data provided on time. Web scraping helps data scientists acquire the required data set to further use in different business operations.

They might use such crucial information in building machine-learning algorithms and thus require a large volume of data to improve the accuracy of outputs. The presence of different data scraping tools has made the process much simpler by helping them extract relevant data quickly.

Marketing and Sales

Every other business is dependent on its marketing and sales strategies. But to build an effective strategy, businesses need to catch up with the recent industry trends and prepare for different market scenarios.

Web scraping helps them to collect price intelligence data, and product data, understand market demands, and conduct a competitive analysis.

It’s an excellent lead generation strategy that helps them to reach out to prospects by scraping the contact details of potential customers and makes the process easier.

A quick fill-up on all this essential information can alone provide them with the advantage to gain a competitive edge over their competitors.

The data extracted can be further used in product development and setting effective pricing strategies to make a difference in their industry. It also helps them to maximize their revenue generation and achieve high profits.

Also, with a thorough knowledge of the market and its expectations, a business can successfully take hold of its marketing and sales strategy.

Monitoring Consumers Sentiment

Customers are the core of any business on which every company builds itself. Thus, to make any venture successful, it is first important to understand customers’ sentiments thoroughly.

Data extracted from relevant platforms can help you get access to reviews, expectations, and their outlook on any idea in a real-time scenario so that you can accordingly optimize your functions.

You can constantly keep track of your customer’s changing expectations by collecting both historical and present data to make your forecasts and predictions much stronger.

Analyzing consumer feedback and reviews can help you understand different opportunities to improve your services as well as instantly take hold of a situation to put it to your advantage.

Understanding consumers’ sentiments in providing them with the best of facilities will eventually help you to stay one step ahead of your competitors.

Product Development

For any business to be successful, it is important that your product is user-friendly and as per the needs and wants of your customers. Thus, product development requires huge data to research their market and customers’ expectations.

Web scraping can help researchers enhance their product development process by providing them with detailed insights through the acquired data. You can successfully extract the data quickly to make the process much more efficient and smoother.

Lead Generation

A great lead-generation process can help businesses reach additional customers. Especially in the case of startups who rely heavily on their lead generation and conversion process to sustain themselves in the market, data scraping software has proven to be a boon. B2B lead generation services, supported by data scraping tools, streamline client identification and outreach.

It helps them to reach out to leads by scraping the contact details of potential customers and makes the process easier. Earlier, manually collecting and gathering such information took a lot of time and effort, which is now reduced with the help of the automated solution, web scraping.

Read More: How Web Scraping Can Help in Lead Generation

Here’s a simple automation that we built using Scrapingdog API & a no-code automation tool (Make) to get leads from Google Maps: –

Data Enrichment

Data enrichment is a technique to freshen up your old data with new data points. Web scraping, when done on the correct data sources, can be used to get the latest data, thus eliminating the risk of not reaching the right audience.

Data enrichment should not be confused with data cleaning which is altogether a different concept where the changes are made within the available set of data itself.

There are many use cases for data enrichment including marketing, sales engagement, investment, and many more.

To Build SEO/Rank Tracking Tools

According to a study by Scribd, B2B businesses get 34% of their traffic from this channel alone. (We have discussed this in the blog here)

If you are looking to build an SEO tool, you would certainly need a Google Search Scraping API. This API web scrapes Google search results.

You would get real-time data every time you scrape. This process can be automated too, for example, we recently built a Google Rank Tracking Tool in Google Sheets, which uses an API and is automated to track keywords daily. 

Also, we recently used n8n to show how you can build a rank tracker there too. Here is the link to that post. The rank tracking examples are meant to help you understand how web scraping can be helpful in SEO.

There can be many more instances when web scraping could help you with real-time data of competitors and the market.

Other than the search engines, you can scrape data from Google Images, News, Scholar, etc., too.

Web Scraping to Identify Niched Influencers

If you are into marketing you might be aware of what influencer marketing is. Collecting a database when you are starting an influencer marketing campaign can be difficult. One such application of web scraping can be to collect an influencer list for your next marketing campaign.

Knowing which platforms your target audience hangs out on and scraping those particular platforms might help you get niched influencers in no time. Although there are many influencer marketing tools to find them, with web scraping you can do it at a much lower cost.

Price Monitoring

How would you keep track of prices if you sell online on multiple e-commerce platforms?

There is a good chance that you have competitors selling on those platforms, too.

A simple way to keep track of them is by price scraping. We recently built a workflow for tracking prices from Amazon & we used the Amazon product data scraper API here from Scrapingdog.

Repurpose Content

This might not be a traditional use case, but web scraping can also be used to repurpose content.

Yes, there are tools to repurpose your content, but with AI, it is now 10x faster and cheaper.

Think about this, if you created a YouTube video, and for the same topic, you would love to have a blog post.

Would you create a blog from the very scratch?

Well, there is a smarter way: you can scrape a transcript of your own video and convert it into a well-structured blog, which you can, of course, edit before the final posting.

This would save up a lot of time structuring your blog from the very start.

Here’s a video wherein we have used the YouTube Transcript API & given this to our AI model to help us create a blog.

This is just one repurpose use case; depending on the platforms you use, it would differ. But yes, scraping can be the core in there!

Feeding Content To LLMs

Web scraping can be used to train your LLMs. The web still holds a massive amount of data that LLMs have no idea of.

Not only that, you can build your own LLM models by passing structured data to them.

Companies often need models specialised in law, finance, healthcare, or e-commerce. Scraping helps pull domain-rich corpora from case law repositories, stock sites, medical journals, or product reviews.

Another angle is freshness. Since websites constantly update, scraping ensures your LLM doesn’t lag behind real-world changes, especially in quick-moving industries like finance or tech. And because scraping can provide both raw text and structured data (tables, FAQs, metadata), it enriches the training material beyond simple plain text.

Final Words

Web data extraction is expected to grow to 16 billion by 2035, expanding steadily at ~16% CAGR.

The use cases of web scraping will increase as the data demand rises. 

If you are a business, you can either avail it as a service or use an API for this task. We at Scrapingdog provide a Web Scraping API that can help you extract data from your desired platform.

Additional Resources

Web Scraping with Scrapingdog

Scrape the web without the hassle of getting blocked
How To Extract Data from Websites to Google Sheets https://www.scrapingdog.com/blog/scrape-website-with-google-sheets/ https://www.scrapingdog.com/blog/scrape-website-with-google-sheets/#comments Tue, 09 Sep 2025 06:44:49 +0000 https://scrapingdog.com/?p=10179

TL;DR

  • No-code scraping in Google Sheets with IMPORTXML / IMPORTHTML (IMPORTDATA noted).
  • IMPORTXML + XPath: start with one item, then generalize the selector to pull all names / prices; fix quote errors.
  • Scale via a dynamic URL using a page-number cell for pagination.
  • IMPORTHTML: pull a full table (Wikipedia demo) into a new sheet.

Web scraping is collecting data from the Internet for price aggregation, market research, lead generation, etc. However, web scraping is mainly done with major programming languages like Python, Node.js, or PHP, and because of this, many non-coders find it very difficult to collect data from the Internet. They have to hire a developer to complete even small data extraction tasks.

In this article, we will learn how we can scrape a website using Google Sheets without writing a single line of code. Google Sheets provides built-in functions like IMPORTHTML, IMPORTXML, and IMPORTDATA that allow you to import data from external sources directly into your spreadsheet, which makes it a handy tool for web scraping. Let's first understand these built-in functions one by one.

Google Sheets Functions

It is better to discuss Google Sheets’ capabilities before scraping a live website. As explained above, it offers three functions. Let’s discuss those functions in a little detail.

IMPORTHTML– This function provides you with the capability to import a structured list or a table from a website directly into the sheet. Isn’t that great?

				
					=IMPORTHTML("url", "query", index)
				
			
      • "url" is the URL of the webpage containing the table or list you want to import data from.
      • "query" specifies whether to import a table (“table”) or a list (“list”).
      • index the index of the table or list on the webpage. For example, if there are multiple tables on the page, you can specify which one to import by providing its index (e.g., 1 for the first table).

      IMPORTXML– This function can help you extract text/values or specific data elements from structured HTML or XML.

				
					=IMPORTXML(url, xpath_query)
				
			
  • url is the URL of the webpage or XML file containing the data you want to import.
  • xpath_query is the query used to specify the data element or value you want to extract from the XML or HTML source.

IMPORTDATA– This function can help you import data from any external CSV or TSV file directly into your Google Sheet. It will not be discussed further in this article because its application in web scraping is limited.

				
					=IMPORTDATA(url)
				
			

Scraping with Google Sheets

This section will be divided into two parts. In the first part, we will use IMPORTXML for scraping, and in the next section, we will use IMPORTHTML for the same.

Scraping with IMPORTXML

The first step would be to set up an empty or blank Google Sheet. You can do it by visiting https://sheets.google.com/.

You can click on Blank Spreadsheet to create a blank sheet. Once this is done we have to analyze the structure of the target website. For this tutorial, we are going to scrape this website https://scrapeme.live/shop/.

We are going to scrape the name of the Pokemon and its listed price. First, we will learn how we can scrape data for a single Pokemon and then later we will learn how it can be done for all the Pokemons on the page.

Scraping data for a single Pokemon

First, we will create three columns Name, Currency, and Price in our Google Sheet.

As you know IMPORTXML function takes two inputs as arguments.

  • One is the target URL and in our case the target URL is https://scrapeme.live/shop/
  • Second is the xpath_query which specifies the XPath expression used to extract specific data from the XML or HTML source.

I know you must be wondering how you will get this xpath_query, well that is super simple. We will take advantage of Chrome developer tools in this case. Right-click on the name of the first Pokemon and then click on Inspect to open Chrome Dev Tools.

Now, we need an XPath query for this element. This can be done by right-clicking on that h2 tag, then clicking the Copy button, and finally clicking the Copy XPath button. ⬇

This is what you will get once you copy the XPath.
				
					//*[@id="main"]/ul/li[1]/a[1]/h2
				
			
We can use this XPath query to get the name of the first Pokemon. Remember to replace any double quotes in the xpath_query with single quotes; otherwise, you will get an error in Google Sheets like the one in the image below.

The formula parse error can be resolved by using single quotes inside the xpath_query. So, once you type the correct function, Google Sheets will pull the name of the first Pokemon.

				
					=IMPORTXML("https://scrapeme.live/shop/", "//*[@id='main']/ul/li[1]/a[1]/h2")
				
			

We can see Bulbasaur being pulled from the target web page in the A2 cell of the sheet. Well, this was fast and efficient too!

Now, the question is how to pull all the names. Do we have to apply a different xpath_query for each Pokemon present on the target page? Well, the answer is NO. We just have to figure out an XPath query that selects all the names of the Pokemon at once.

If you look at our current xpath_query, you will notice that it is pulling data from the li element with index 1. If you remove that index, it will select all the name tags.

Great! Now, our new xpath_query will look like this.
				
					//*[@id='main']/ul/li/a[1]/h2
				
			

Let’s change our xpath_query in the IMPORTXML function.

				
					=IMPORTXML("https://scrapeme.live/shop/", "//*[@id='main']/ul/li/a[1]/h2")
				
			
Let’s use this in the Google Sheet now.
In just a few seconds, Google Sheets was able to pull all the data from the target page and populate it in the sheet itself. This was super COOL! Similarly, you can pull the currency and price.
The xpath_query for all the price tags will be //*[@id='main']/ul/li/a[1]/span/span.
				
					=IMPORTXML("https://scrapeme.live/shop/", "//*[@id='main']/ul/li/a[1]/span/span")

				
			

Let’s apply this to our currency column.

Let’s see whether we can scale this process by scraping more than one page. When you scroll down and click on page 2, you will notice that the website URL changes to https://scrapeme.live/shop/page/2/, and when you click on page 3 the URL changes to https://scrapeme.live/shop/page/3/. We can see the pattern: the number after page/ increases by 1 on every click. This much information is enough for us to scale our current scraping process.

Create another column Page in your spreadsheet.

We have to make our target URL dynamic so that it can pick the page value from the E2 cell. This can be done by changing our target URL to this.

				
					"https://scrapeme.live/shop/page/"&E2
				
			

Remember you have to change the target URL to the above URL for both the Name and Price columns. Now, the target URL changes based on the value you provide to the E2 cell.

This is how you can scale the web scraping process by concatenating the static part of the URL with the cell reference containing the dynamic part.
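
Putting the pieces together, the Name formula now looks like this and pulls whichever page number you type into the E2 cell (the Price formula changes in exactly the same way):

				
					=IMPORTXML("https://scrapeme.live/shop/page/"&E2, "//*[@id='main']/ul/li/a[1]/h2")
				
			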

Scraping with IMPORTHTML

Create another sheet within your current spreadsheet by clicking the plus button at the bottom.

For this section, we are going to use https://en.wikipedia.org/wiki/World_War_II_casualties as our target URL. We are going to pull country-wise data from this table.

				
					=IMPORTHTML("https://en.wikipedia.org/wiki/World_War_II_casualties", "table", 1)

				
			

The above function will pull this data.

This function helps you quickly import the data from a table.

Overall, IMPORTHTML is a versatile function that can save you time and effort by automating the process of importing data from HTML tables or lists on web pages directly into your Google Sheets. It’s especially useful for tasks that involve data scraping, reporting, analysis, and monitoring of external data sources.

However, IMPORTHTML may not always format imported data as expected. This can result in inconsistent formatting or unexpected changes to the data once it’s imported into Google Sheets. Users may need to manually adjust formatting or use additional formulas to clean up the imported data.

Limitations of using IMPORTXML and IMPORTHTML

  • IMPORTXML and IMPORTHTML are designed for simple data extraction tasks and may not support advanced scraping requirements such as interacting with JavaScript-generated content, handling dynamic web pages, or navigating complex website structures.
  • Google Sheets imposes rate limits on the frequency and volume of requests made by IMPORTXML and IMPORTHTML functions. Exceeding these limits can result in errors, delays, or temporary suspensions of the functions. This makes it challenging to scrape large volumes of data or scrape data from multiple websites rapidly.
  • Imported data may require additional formatting, cleaning, or transformation to make it usable for analysis or integration with other systems. This can introduce complexity and overhead, particularly when dealing with inconsistent data formats or messy HTML markup.

An alternative to scraping with Google Sheets – Scrapingdog

As discussed above, scraping with Google Sheets at scale has many limitations, and Scrapingdog can help you bypass all of those limitations.

Recently, we have introduced an add-on to Google Sheets that helps all non-coders to scrape data from different platforms.

Here is how you install this: go to extensions, search for Scrapingdog, and finally install the add-on.

Not A Developer? Scrapingdog Can Help You Extract Data in Your Desired Format

Contact us Today with Your Needs & Our Team Will Connect You Shortly!!

google sheet gif

Conclusion

We’ve explored the capabilities of IMPORTXML and IMPORTHTML functions in Google Sheets for web scraping. They provide a convenient and accessible way to extract data from websites directly into your spreadsheets, eliminating the need for complex coding or specialized software.

However, it’s important to be mindful of the limitations of IMPORTXML and IMPORTHTML, such as rate limits, HTML structure dependencies, and data formatting challenges. And for those, you can always use Scrapingdog’s Web Scraping APIs to scale your data extraction.

If you need any help integrating our APIs into your workflow, we are readily available to help you out on chat.

Additional Resources

Web Scraping with Scrapingdog

Scrape the web without the hassle of getting blocked
How To Extract Data From Any Website https://www.scrapingdog.com/blog/how-to-extract-data-from-website/ https://www.scrapingdog.com/blog/how-to-extract-data-from-website/#comments Mon, 08 Sep 2025 06:03:40 +0000 https://scrapingdog.com/?p=8396

TL;DR

  • Ways: manual, extensions, no-code tools, official APIs, services, or a custom script (pros / cons).
  • Demo: Python (requests + BS4) grabs book title, price, rating; code + output.
  • Workflow: choose targets → inspect DOM (SelectorGadget) → fetch & parse; watch for IP blocks.
  • Scale: Scrapingdog handles proxies; AI scraper returns structured JSON; 1,000 free credits.

Extracting data from a website can be a useful skill for a wide range of applications, such as data mining, data analysis, and automating repetitive tasks.

With the vast amount of data available on the internet, being able to get fresh data and analyze it can provide valuable insights and help you make informed & data-backed decisions.

Pulling information can help finance companies decide whether to buy or sell.

The travel industry can scrape and track prices from their niche market to get a competitive advantage.

Restaurants can use the data from reviews and make necessary changes if something is off.

Job seekers can scrape resume examples from various sites, which could help them format their resumes.

So, there are endless applications when you pull data from relevant websites.

In this article, we will see various methods for extracting data from a website and provide a step-by-step guide on how to do so.

Methods for extracting data from a website

There are several methods for extracting data from a website, and the best method for you will depend on your specific needs and the structure of the website you are working with.

Here are some common methods for extracting data:

Data Extraction methods

Manual copy and paste

One of the simplest methods for extracting data from a website is to simply copy and paste the data into a spreadsheet or other document. This method is suitable for small amounts of data and can be used when the data is easily accessible on the website.

| Pros | Cons |
|---|---|
| No risk of violating website terms of service | Prone to human error |
| Ideal for ad-hoc, one-time data extractions | Not scalable for ongoing or large tasks |

 

By Using Web browser extensions

Several web browser extensions can help you in this process. These extensions can be installed in your web browser and allow you to select and extract specific data points from a website.

| Pros | Cons |
|---|---|
| Easy to install and use directly in the browser | Limited customization options |
| Often free or low-cost solutions available | Can be blocked by websites or outdated with browser updates |

Web scraping tools

There are several no-code tools available that can help you extract data from a website. These tools can be used to navigate the website and extract specific data points based on your requirements.

| Pros | Cons |
|---|---|
| No coding skills required, making it accessible | Often requires a paid subscription |
| Can handle large amounts of data efficiently | Limited flexibility compared to custom scrapers |

Official Data APIs

Many websites offer APIs (Application Programming Interfaces) that allow you to access their data in a structured format. Using an API for web scraping can be a convenient way to extract data from a website, as the data is already organized and ready for use.

However, not all websites offer APIs, and those that do may have restrictions on how the data can be used.

| Pros | Cons |
|---|---|
| Provides structured and reliable data access | Limited to data the API provider chooses to share |
| Typically complies with website terms of service | Often has usage restrictions and rate limits |

Web scraping services

If you don’t want to handle proxies and headless browsers, you can use a web scraping service to extract data from a website. These services handle the technical aspects of web scraping and can provide you with data in your desired output format.

| Pros | Cons |
|---|---|
| Outsources technical complexities, saving time | Can be costly for large-scale projects |
| Handles proxies and IP rotation automatically | Limited control over scraping process |

Creating your own scraper

You can even code your own scraper. Then you can use libraries like BS4 to extract necessary data points out of the raw data.

But this process has a limitation, and that is IP blocking. If you want to use this process for heavy scraping, your IP will be blocked by the host in no time. But for small projects, this approach is cheaper and more manageable. Many developers combine it with ETL tools to efficiently extract, transform, and load data at scale while avoiding common scraping limitations.

Using any of these methods, you can extract data and then perform further analysis on it.

Pros:
  • Highly customizable to specific data needs
  • Can bypass limitations of pre-built tools

Cons:
  • Requires programming skills and maintenance
  • Risk of IP blocking or website detection

Creating Our Scraper Using Python to Extract Data

Now that you have an understanding of the different methods for extracting data from a website, let’s take a look at the general steps you can follow to extract data from a website.

  1. Identify the data you want: Before you start with the process, it is important to have a clear idea of what data you want to extract and why. This will help you determine the best approach for extracting the data.
  2. Inspect the website’s structure: You will need to understand how the website is structured and how the data is organized. You can use extensions like Selectorgadget to identify the location of any element.
  3. Script: After this, you have to prepare a script to automate the process. The script is mainly divided into two parts: first, you make an HTTP GET request to the target website, and second, you extract the data from the raw HTML using a parsing library like BS4 or Cheerio.

Let’s understand with an example. We will use Python for this example.

Also if you are new to Web scraping or Python, I have a dedicated guide on it. Do check it out!!

I am assuming that you have already installed Python on your machine.

The reason behind selecting Python is that it is a popular programming language that has a large and active community of developers, and it is well-suited for web scraping due to its libraries for accessing and parsing HTML and XML data.

For this example, we are going to install two Python libraries.

  1. Requests will help us make an HTTP connection with the target website.
  2. BeautifulSoup will help us to create an HTML tree for smooth data extraction.

At the start, we are going to create a folder where we will store our script. I have named the folder “dataextraction”.

				
					mkdir dataextraction
pip install requests
pip install beautifulsoup4
				
			

We will scrape this webpage: http://books.toscrape.com/. We will extract the following data from it:

  • Name of the book
  • Price
  • Rating

Let’s import the libraries that we have installed.

				
					import requests
from bs4 import BeautifulSoup
				
			

The next step would be to fetch HTML data from the target webpage. You can use the requests library to make an HTTP request to the web page and retrieve the response.

				
					l=[]
o={}

target_url="http://books.toscrape.com/"



resp = requests.get(target_url)
				
			

Now let’s parse the HTML code using Beautiful Soup. You can use the BeautifulSoup constructor to create a Beautiful Soup object from the HTML, and then use the object to navigate and extract the data you want.

				
					soup = BeautifulSoup(resp.text,'html.parser')
				
			

Before moving ahead, let's find the DOM location of each element by inspecting the page.

The article tag with the class product_pod holds all the data for a single book. So, it will be better for us to extract all of these tags into a list first; once we have that, we can pull the necessary details for any particular book.
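
A minimal line to collect them (this also appears in the complete code further below):

allBooks = soup.find_all("article", {"class": "product_pod"})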

The rating is stored as the second value of the class attribute of the p tag (for example, class="star-rating Three"). We will use the .get() method to extract this data.

				
					o["rating"]=allBooks[0].find("p").get("class")[1]
				
			

The name of the book is stored inside the title attribute of the a tag under the h3 tag.

				
					o["name"]=allBooks[0].find("h3").find("a").get("title")
				
			

Similarly, you can find the price data stored inside the p tag of class price_color.

				
					o["price"]=allBooks[0].find("p",{"class":"price_color"}).text
				
			

Complete Code

Using a similar technique, you can extract data for all the books on the page. You will have to run a for loop for that; a sketch of the loop follows the output below. For now, the code for a single book looks like this.

				
					import requests
from bs4 import BeautifulSoup

l=[]
o={}

target_url="http://books.toscrape.com/"



resp = requests.get(target_url)


soup = BeautifulSoup(resp.text,'html.parser')

allBooks = soup.find_all("article",{"class":"product_pod"})

o["rating"]=allBooks[0].find("p").get("class")[1]
o["name"]=allBooks[0].find("h3").find("a").get("title")
o["price"]=allBooks[0].find("p",{"class":"price_color"}).text
l.append(o)

print(l)
				
			

The output will look like this.

				
					[{'rating': 'Three', 'name': 'A Light in the Attic', 'price': '£51.77'}]
				
			
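
As mentioned above, here is one way the loop over all the books could look. It is a sketch that simply repeats the single-book logic for every article tag on the page:

import requests
from bs4 import BeautifulSoup

l = []

target_url = "http://books.toscrape.com/"
resp = requests.get(target_url)
soup = BeautifulSoup(resp.text, 'html.parser')

allBooks = soup.find_all("article", {"class": "product_pod"})

# Build one dictionary per book instead of only the first one.
for book in allBooks:
    o = {}
    o["rating"] = book.find("p").get("class")[1]
    o["name"] = book.find("h3").find("a").get("title")
    o["price"] = book.find("p", {"class": "price_color"}).text
    l.append(o)

print(l)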

How can Scrapingdog help you extract data from a website?

You can scrape data using any programming language. We used Python in this blog; however, if you want to scale up this process, you will need proxies.

Scrapingdog removes the hassle of integrating proxies and gives you a straightforward Web Scraping API.

You can watch the video tutorial below to understand more about how Scrapingdog can help you pull data from any website. ⬇

Using the API, you can create a seamless, unbreakable data pipeline that delivers data from any website. We use a proxy pool of over 10M IPs that rotates on every request, which helps prevent IP blocking.

We offer 1,000 free credits so you can test it. You can sign up here and try the API on your desired website.
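
For reference, a call to the Web Scraping API from Python could look like the sketch below. It assumes the commonly documented endpoint api.scrapingdog.com/scrape with api_key and url query parameters, and YOUR_API_KEY is a placeholder; check the documentation and dashboard for the exact options available on your plan.

import requests

# Placeholder key; get yours from the Scrapingdog dashboard after signing up.
api_key = "YOUR_API_KEY"

params = {
    "api_key": api_key,
    "url": "http://books.toscrape.com/",  # the page you want scraped
}

# Endpoint and parameter names assumed from Scrapingdog's docs; verify in your dashboard.
# Scrapingdog fetches the page through its rotating proxy pool and returns the HTML,
# which you can parse with BeautifulSoup exactly as before.
resp = requests.get("https://api.scrapingdog.com/scrape", params=params)
print(resp.status_code)
print(resp.text[:500])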

How To Use Scrapingdog’s AI Web Scraping API To Extract Structured Data

Along with the general web scraping API, Scrapingdog also provides an AI-enabled web scraper that can be used to feed data to LLMs. It returns data in structured JSON or Markdown format.

You can easily test this API on Scrapingdog’s dashboard. ⬇

Scrapingdog dashboard

In the general scraper section, you can enter the URL from which you want to extract structured data. In the “AI Query” parameter, you can tell the AI what data you want. (I am hoping you have signed up for Scrapingdog to test it.)

To better understand this, suppose I want to extract a summary of a webpage. I will put this URL in the URL parameter: https://www.searchenginejournal.com/google-says-gsc-sitemap-uploads-dont-guarantee-immediate-crawls/554747/

And in the AI query param, I will write “Give the summary of the webpage in JSON, summarize the page in 5 points.”

The output returned is JSON, as we asked:

				
					{
  "points": [
    "Google's John Mueller explained that uploading sitemaps does not guarantee immediate crawling of URLs and there are no fixed timelines for recrawling.",
    " Submitting the main sitemap.xml file is sufficient; individual granular sitemaps are not necessary according to Mueller.",
    "Using the URL Inspection tool can help request crawling for specific pages, but it only supports one URL at a time.",
    " While uploading all sitemaps containing changed URLs may provide reassurance, it is not mandatory for indexing.",
    " There is no guarantee or specific timeframe for when Google will crawl URLs listed in sitemaps."
  ]
}
				
			

Below is a quick video that shows how our dashboard works while using this scraper ⬇

In this way, you can summarize web pages at scale and get structured data in the output every time.

You can further add rules using “AI Extract Rules” to get exactly the data points you need. This feature can also be used to keep an eye on competitors for price monitoring.

Additional Resources

Web Scraping with Scrapingdog
