Introducing: Pay As You Go Pricing Model
https://www.scrapingdog.com/blog/introducing-pay-as-you-go-pricing-model/
Wed, 26 Nov 2025 13:37:19 +0000

Many of our users have been asking for PAYG (pay-as-you-go) plans, so we have recently added this option to our dashboard.

Any user who signs up for Scrapingdog services can find options for PAYG there.

How Many Credits Do You Get In PAYG?

For every $10, you will receive 25,000 credits; the minimum top-up is $10, and you can then top up in multiples of $10.

For $50, you will get 125,000 credits & for $100, you will get 250,000 credits.

PAYG credits don’t have an expiration date, so you can consume them as needed.
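For quick budgeting, the top-up math above is easy to script. Here is a minimal sketch based on the rates described in this post; the helper name is just for illustration:

def payg_credits(topup_dollars: int) -> int:
    """Convert a PAYG top-up into credits ($10 buys 25,000 credits)."""
    if topup_dollars < 10 or topup_dollars % 10 != 0:
        raise ValueError("Top-ups start at $10 and increase in multiples of $10.")
    return topup_dollars * 2500  # 25,000 credits per $10

print(payg_credits(10))   # 25000
print(payg_credits(50))   # 125000
print(payg_credits(100))  # 250000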

Who Are PAYG Plans Best Suited For?

PAYG plans are made for users who don’t want a fixed monthly bill and prefer using credits only when needed. These include:

  • Freelancers who work on client projects from time to time
  • One-time users who only need data for a single task or project
  • Small teams that are just getting started and don’t have steady API usage
  • Students or hobby users who want to test things without a monthly plan
  • Agencies that handle seasonal or unpredictable workloads
  • Anyone who wants full flexibility — top-up when you need it, use it at your own pace

Note: With PAYG plans, you still get access to all the APIs.

Just keep in mind that every API has a different credit cost.

You can check the documentation to understand how each API consumes credits and plan your usage easily.
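For example, here is a rough way to estimate how long a top-up will last. The per-API credit costs below are placeholders, not real values, so substitute the numbers from the documentation:

# Hypothetical per-request credit costs -- replace with the values from the docs.
credit_cost = {"amazon": 1, "google_serp": 5, "general_stealth": 25}

planned_requests = {"amazon": 10000, "google_serp": 2000}
credits_needed = sum(credit_cost[api] * count for api, count in planned_requests.items())

balance = 25000  # credits from a $10 top-up
print(f"Credits needed: {credits_needed}, left after the run: {balance - credits_needed}")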

If you are on a subscription plan that has expired for some reason, you can add more credits through this option in the dashboard.

Note: You cannot use PAYG & Subscription plans together.

For any more queries, you can definitely reach out to us on our chat or email us at info@scrapingdog.com.

Why Subscriptions are Still Better Than PAYG

Subscriptions are at least ~50% more economical than PAYG. For production work, you should always prefer a subscription. Subscriptions give you higher concurrency, more stable usage, and predictable monthly billing. If your team uses the API every day or your traffic keeps growing, a subscription saves a lot of money in the long run. PAYG is great for small or irregular tasks, but for anything serious or ongoing, subscriptions are the better choice.

Scrape Amazon Using Python (Updated)
https://www.scrapingdog.com/blog/scrape-amazon/
Mon, 24 Nov 2025 02:20:46 +0000

TL;DR

  • Walks you through how to scrape product pages on Amazon using Python with requests + BeautifulSoup (for title, images, price, rating, specs).
  • Shows how to mimic browser-like headers to bypass Amazon’s anti-bot mechanisms.
  • Details how to extract high-resolution images via regex search for hiRes in the page’s <script> content.
  • Provides a full example script with rotating user-agents for basic scraping.
  • Explains when you need to scale: using a proxy/API solution (specifically Scrapingdog’s Amazon Scraper API) to avoid IP blocks and handle high volume.
  • Covers how to call that API (by ASIN, domain, postal-code-based locale) and other related endpoints (offers, autocomplete) for richer Amazon data.

The e-commerce industry has grown in recent years, transforming from a mere convenience to an essential facet of our daily lives.

As digital storefronts multiply and consumers increasingly turn to online shopping, there’s an increasing demand for data that can drive decision-making, competitive strategies, and customer engagement in the digital marketplace.

Additionally, scraped Amazon product data can significantly enhance customer service automation by providing customer service teams with real-time product information, pricing details, and availability status, enabling them to respond more efficiently to customer inquiries and resolve issues faster.

If you are in the e-commerce niche, scraping Amazon can give you a lot of data points to understand the market.

In this guide, we will use Python to scrape Amazon, do price scraping from this platform, and demonstrate how to extract crucial information to help you make well-informed decisions in your business.

Setting up the prerequisites

I am assuming that you have already installed Python 3.x on your machine. If not, you can download it from here. Apart from this, we will require two third-party Python libraries.

  • Requests – We will use this library to make HTTP requests to the Amazon page. It will help us extract the raw HTML from the target page.
  • BeautifulSoup – This is a powerful data parsing library. Using it, we will extract the necessary data out of the raw HTML we get through the requests library.

Before we install these libraries we will have to create a dedicated folder for our project.

				
					mkdir amazonscraper
				
			

Now, we will have to install the above two libraries in this folder. Here is how you can do it.

				
					pip install beautifulsoup4
pip install requests
				
			
Now, you can create a Python file by any name you wish. This will be the main file where we will keep our code. I am naming it amazon.py

Downloading raw data from amazon.com

Let’s make a normal GET request to our target page and see what happens. For GET request we are going to use the requests library.
				
					import requests
from bs4 import BeautifulSoup

target_url="https://www.amazon.com/dp/B0BSHF7WHW"

resp = requests.get(target_url)

print(resp.text)
				
			

Once you run this code, you might see this.

This is a captcha from amazon.com, and it appears once their architecture detects that the incoming request is from a bot/script rather than a real human being.

To bypass this on-site protection, we can send headers like User-Agent. You can check which headers your browser sends to amazon.com by opening the URL and looking at the network tab of the developer tools.

Once you pass these headers with the request, it will look like a request coming from a real browser, which can get past the anti-bot wall of amazon.com. Let's pass a few headers to our request.

				
					import requests
from bs4 import BeautifulSoup

target_url="https://www.amazon.com/dp/B0BSHF7WHW"

headers={"accept-language": "en-US,en;q=0.9","accept-encoding": "gzip, deflate, br","User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36","accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7"}

resp = requests.get(target_url, headers=headers)

print(resp.text)
				
			

Once you run this code you might be able to bypass the anti-scraping protection wall of Amazon.

Now let’s decide what exact information we want to scrape from the page.

What are we going to scrape from Amazon?

It is always good to decide in advance what you are going to extract from the target page. This way, we can analyze in advance which element is placed where inside the DOM.

Product details we are going to scrape from Amazon

We are going to scrape five data elements from the page.

  • Name of the product
  • Images
  • Price (Most important)
  • Rating
  • Specs

First, we are going to make the GET request to the target page using the requests library and then using BS4 we are going to parse out this data. Of course, there are multiple other libraries like lxml that can be used in place of BS4, but BS4 has the most powerful and easy-to-use API.

Before making the request we are going to analyze the page and find the location of each element inside the DOM. One should always do this exercise to identify the location of each element.

We are going to do this by simply using the developer tool. This can be accessed by right-clicking on the target element and then clicking on the inspect. This is the most common method, you might already know this.

Identifying the location of each element

Location of the title tag

Identifying location of title tag in source code of amazon website

Once you inspect the title you will find that the title text is located inside the h1 tag with the id title.

Coming back to our amazon.py file, we will write the code to extract this information from Amazon.

				
					import requests
from bs4 import BeautifulSoup

l=[]
o={}


url="https://www.amazon.com/dp/B0BSHF7WHW"

headers={"accept-language": "en-US,en;q=0.9","accept-encoding": "gzip, deflate, br","User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36","accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7"}

resp = requests.get(url, headers=headers)
print(resp.status_code)

soup=BeautifulSoup(resp.text,'html.parser')


try:
    o["title"]=soup.find('h1',{'id':'title'}).text.strip()
except:
    o["title"]=None





print(o)
				
			

Here the line soup = BeautifulSoup(resp.text, 'html.parser') uses the BeautifulSoup library to create a BeautifulSoup object from the HTTP response text, with the specified HTML parser.

Then soup.find() returns the first occurrence of the h1 tag with the id title. We use the .text attribute to get the text from that element, and finally .strip() to remove the surrounding whitespace.

Once you run this code you will get this.

				
					{'title': 'Apple 2023 MacBook Pro Laptop M2 Pro chip with 12‑core CPU and 19‑core GPU: 16.2-inch Liquid Retina XDR Display, 16GB Unified Memory, 1TB SSD Storage. Works with iPhone/iPad; Space Gray'}
				
			

If you have not read the above section where we talked about downloading HTML data from the target page then you won’t be able to understand the above code. So, please read the above section before moving ahead.

Location of the image tag

This might be the most tricky part of this complete tutorial. Let’s inspect and find out why it is a little tricky.

Inspecting image tag in the source code of amazon website
As you can see, the img tag that holds the image is stored inside a div tag with the class imgTagWrapper.
				
					allimages = soup.find_all("div",{"class":"imgTagWrapper"})
print(len(allimages))
				
			

Once you print this, it will return 3. There are 6 images on the page, but we are getting just 3. The reason is JavaScript rendering: Amazon loads its images through an AJAX request in the background, so we never receive them when we fetch the page with the requests library.

Finding high-resolution images is not as simple as finding the title tag. But I will explain to you step by step how you can find all the images of the product.

  1. Copy any product image URL from the page.
  2. Then click on the view page source to open the source page of the target webpage.
  3. Then search for this image.

You will find that all the images are stored as values of the hiRes key.

All this information is stored inside a script tag. Now, here we will use a regular expression to find the pattern "hiRes":"image_url".

We could still use BS4, but that would make the process a little lengthy and might slow down our scraper. Instead, we will use the pattern (.+?), a non-greedy match for one or more characters. Let me explain what each character in this expression means.

  • The . matches any character except a newline
  • The + matches one or more occurrences of the preceding character.
  • The ? makes the match non-greedy, meaning that it will match the minimum number of characters needed to satisfy the pattern.

The regular expression will return all the matched sequences of characters from the HTML string we are going to pass.

				
					images = re.findall('"hiRes":"(.+?)"', resp.text)
o["images"]=images
				
			

This will return all the high-resolution images of the product in a list. In general, using regular expressions for data parsing is not advised, but they can do wonders sometimes.

Parsing the price tag

There are two price tags on the page, but we will only extract the one which is just below the rating.

We can see that the price is stored inside a span tag with the class a-price. Once you find this tag, you can take its first child span tag to get the price. Here is how you can do it.
				
					try:
    o["price"]=soup.find("span",{"class":"a-price"}).find("span").text
except:
    o["price"]=None
				
			

Once you print object o, you will get to see the price.

				
					{'price': '$2,499.00'}
				
			

Extract rating

You can find the rating in the first i tag with class a-icon-star. Let’s see how to scrape this too.

				
					try:
    o["rating"]=soup.find("i",{"class":"a-icon-star"}).text
except:
    o["rating"]=None
				
			

It will return this.

				
					{'rating': '4.1 out of 5 stars'}
				
			

In the same manner, we can scrape the specs of the device.

Extract the specs of the device

These specs are stored inside tr tags with the class a-spacing-small. Once you find these, you have to find both span tags under each row to get the text. You can see this in the above image. Here is how it can be done.

				
					specs_arr=[]
specs_obj={}

specs = soup.find_all("tr",{"class":"a-spacing-small"})

for u in range(0,len(specs)):
    spanTags = specs[u].find_all("span")
    specs_obj[spanTags[0].text]=spanTags[1].text


specs_arr.append(specs_obj)
o["specs"]=specs_arr
				
			

Using .find_all(), we collect all the tr tags with the class a-spacing-small. Then we run a for loop to iterate over them, find all the span tags inside each row, and finally extract the text from each span tag.

Once you print the object o it will look like this.

Throughout the tutorial, we have used try/except statements to avoid any runtime errors. With this, we have now managed to scrape all the data we decided on at the beginning of the tutorial.

Complete Code

You can of course make a few changes to the code to extract more data, because the page is full of information. You can even use cron jobs to mail yourself an alert when the price drops, or integrate this technique into your app to mail your users when the price of any item on Amazon drops (see the sketch after the complete code below).

But for now, the code will look like this.

				
					import requests
from bs4 import BeautifulSoup
import re

l=[]
o={}
specs_arr=[]
specs_obj={}

target_url="https://www.amazon.com/dp/B0BSHF7WHW"

headers={"accept-language": "en-US,en;q=0.9","accept-encoding": "gzip, deflate, br","User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36","accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7"}

resp = requests.get(target_url, headers=headers)
print(resp.status_code)
if(resp.status_code != 200):
    print(resp)
soup=BeautifulSoup(resp.text,'html.parser')


try:
    o["title"]=soup.find('h1',{'id':'title'}).text.lstrip().rstrip()
except:
    o["title"]=None


images = re.findall('"hiRes":"(.+?)"', resp.text)
o["images"]=images

try:
    o["price"]=soup.find("span",{"class":"a-price"}).find("span").text
except:
    o["price"]=None

try:
    o["rating"]=soup.find("i",{"class":"a-icon-star"}).text
except:
    o["rating"]=None


specs = soup.find_all("tr",{"class":"a-spacing-small"})

for u in range(0,len(specs)):
    spanTags = specs[u].find_all("span")
    specs_obj[spanTags[0].text]=spanTags[1].text


specs_arr.append(specs_obj)
o["specs"]=specs_arr
l.append(o)


print(l)
				
			

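And here is the rough price-alert sketch mentioned above. It assumes it runs at the end of the complete script (so the dictionary o is already populated); the SMTP host, credentials, threshold, and addresses are placeholders, not working values:

import smtplib
from email.message import EmailMessage

TARGET_PRICE = 2300.00  # example alert threshold in dollars

def send_price_alert(price_text, recipient="you@example.com"):
    # Build and send a plain-text alert email.
    msg = EmailMessage()
    msg["Subject"] = f"Amazon price drop: now {price_text}"
    msg["From"] = "alerts@example.com"
    msg["To"] = recipient
    msg.set_content(f"The product price dropped to {price_text}.")
    with smtplib.SMTP("smtp.example.com", 587) as server:  # placeholder SMTP server
        server.starttls()
        server.login("alerts@example.com", "app-password")  # placeholder credentials
        server.send_message(msg)

price_text = o.get("price")  # e.g. '$2,499.00' from the scraper above
if price_text:
    price_value = float(price_text.replace("$", "").replace(",", ""))
    if price_value < TARGET_PRICE:
        send_price_alert(price_text)

Run the whole script from a cron job and you have a basic price-drop alert.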
Changing Headers on every request

With the above code, your scraping journey will come to a halt once Amazon recognizes a pattern in your requests.

To avoid this you can keep changing your headers to keep the scraper running. You can rotate a bunch of headers to overcome this challenge. Here is how it can be done.

				
					import requests
from bs4 import BeautifulSoup
import re
import random

l=[]
o={}
specs_arr=[]
specs_obj={}

useragents=['Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.88 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 11_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4894.117 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4855.118 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.88 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4892.86 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4854.191 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 11_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4859.153 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.79 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36/null',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36,gzip(gfe)',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4895.86 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 12_3_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 11_13) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4860.89 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4885.173 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4864.0 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 11_12) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4877.207 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 12_2_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.60 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML%2C like Gecko) Chrome/100.0.4896.127 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.133 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_16_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.75 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4872.118 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 12_3_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.88 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 11_13) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4876.128 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_3) AppleWebKit/537.36 (KHTML%2C like Gecko) Chrome/100.0.4896.127 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36']

target_url="https://www.amazon.com/dp/B0BSHF7WHW"

headers={"User-Agent":useragents[random.randint(0,31)],"accept-language": "en-US,en;q=0.9","accept-encoding": "gzip, deflate, br","accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7"}

resp = requests.get(target_url,headers=headers)
print(resp.status_code)
if(resp.status_code != 200):
    print(resp)
soup=BeautifulSoup(resp.text,'html.parser')


try:
    o["title"]=soup.find('h1',{'id':'title'}).text.lstrip().rstrip()
except:
    o["title"]=None


images = re.findall('"hiRes":"(.+?)"', resp.text)
o["images"]=images

try:
    o["price"]=soup.find("span",{"class":"a-price"}).find("span").text
except:
    o["price"]=None

try:
    o["rating"]=soup.find("i",{"class":"a-icon-star"}).text
except:
    o["rating"]=None


specs = soup.find_all("tr",{"class":"a-spacing-small"})

for u in range(0,len(specs)):
    spanTags = specs[u].find_all("span")
    specs_obj[spanTags[0].text]=spanTags[1].text


specs_arr.append(specs_obj)
o["specs"]=specs_arr
l.append(o)


print(l)
				
			

We are using the random library here to pick a random user agent from the useragents list on every run. These user agents are all fairly recent, so you can more easily bypass the anti-scraping wall.

But again this technique is not enough to scrape Amazon at scale. What if you want to scrape millions of such pages? Then this technique is super inefficient because your IP will be blocked. So, for mass scraping one has to use a web scraping proxy API to avoid getting blocked while scraping.

Using Scrapingdog for scraping Amazon

The advantages of using Scrapingdog’s Amazon Scraper API are:

  • You won’t have to manage headers anymore.
  • Every request will go through a new IP. This keeps your IP anonymous.
  • Our API will automatically retry on its own if the first hit fails.
  • Scrapingdog will handle issues like changes in HTML tags. You won’t have to check every time for changes in tags. You can focus on data collection.

Let me show you how easy it is to scrape Amazon product pages using Scrapingdog with just an ASIN code. It would be great if you could read the documentation first before trying the API.

Before you try the API, you have to sign up for the free pack. The free pack comes with 1,000 credits, which is enough to test the Amazon Scraper API.

				
					import requests

url = "https://api.scrapingdog.com/amazon/product"
params = {
    "api_key": "Your-API-Key",
    "domain": "com",
    "asin": "B0C22KCKVQ"
}

response = requests.get(url, params=params)

if response.status_code == 200:
    data = response.json()
    print(data)
else:
    print(f"Request failed with status code {response.status_code}")
				
			

Once you run this code you will get this beautiful JSON response.

This JSON contains almost all the data you see on the Amazon product page. 
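If you only need a couple of fields from that response, you can pick them out of the parsed JSON. The key names below ("title", "price", "average_rating") are assumptions for illustration, so check the actual response or the documentation for the exact schema:

data = response.json()
# Key names are assumptions for illustration; inspect the real response for the exact schema.
for field in ("title", "price", "average_rating"):
    print(field, "->", data.get(field))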

Scraping Amazon data based on Postal Codes

Now, let's scrape the data for a particular postal code. For this example, we are going to target New York, using the postal code 10001.

				
import requests

api_key = "Your-API-Key"
url = "https://api.scrapingdog.com/amazon/product"

params = {
    "api_key": api_key,
    "asin": "B0CTKXMQXK",
    "domain": "com",
    "postal_code": "10001",
    "country": "us"
}

response = requests.get(url, params=params)

if response.status_code == 200:
    data = response.json()
    print(data)
else:
    print(f"Request failed with status code: {response.status_code}")
				
			

Once you run this code, you will get a beautiful JSON response based on the New York location.

I have also created a video to guide you through using Scrapingdog to scrape Amazon.

Scraping Amazon Offers Data Using Scrapingdog

This data will help you identify details about the seller, delivery options, pricing, etc.

				
					import requests

url = "https://api.scrapingdog.com/amazon/offers"

params = {
    "api_key": "your-api-key",
    "asin": "B0BVJT3HVN",
    "domain": "com",
    "country": "us"
}

try:
    response = requests.get(url, params=params)
    response.raise_for_status()  # Raise error for bad responses
    data = response.json()
    print("<img src="https://s.w.org/images/core/emoji/15.0.3/72x72/2705.png" alt="✅" class="wp-smiley" style="height: 1em; max-height: 1em;" /> API Response:")
    print(data)
except requests.exceptions.RequestException as e:
    print(f"<img src="https://s.w.org/images/core/emoji/15.0.3/72x72/274c.png" alt="❌" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Request failed: {e}")
				
			

After running this code you will get this JSON response.

In addition to this, if you’re building a keyword research tool, validating product ideas, or running sentiment analysis, you can use Scrapingdog’s Amazon Autocomplete API for these use cases.

You just have to make a GET request to the endpoint https://api.scrapingdog.com/amazon/autocomplete and pass your target keyword. For example, if you are looking for a pen holder, you will pass the prefix "pen holder".

				
					import requests

# API URL and key
api_url = "https://api.scrapingdog.com/amazon/autocomplete"
api_key = "your-api-key"

# Search parameters
domain = "com"
prefix = "pen holder"

# Create a dictionary with the query parameters
params = {
    "api_key": api_key,
    "prefix": prefix
}

# Send the GET request with the specified parameters
response = requests.get(api_url, params=params)

# Check if the request was successful (status code 200)
if response.status_code == 200:
    data = response.json()
    print(data)
else:
    print(f"HTTP Request Error: {response.status_code}")
				
			

This will generate a list of keywords associated with the prefix.

Conclusion

Over 80% of e-commerce businesses today rely on web scraping. If you're not using it, you're already falling behind.

There are many marketplaces that you can scrape & extract data from. Having a strategy to scrape e-commerce data for your product can take you far ahead of your competitors. 

In this tutorial, we scraped various data elements from Amazon. First, we used the requests library to download the raw HTML, and then using BS4 we parsed the data we wanted. You can also use lxml in place of BS4 to extract data. Python and its libraries make scraping very simple for even a beginner. Once you scale, you can switch to web scraping APIs to scrape millions of such pages.

A combination of requests and Scrapingdog can help you scale your scraper. You will get more than a 99% success rate while scraping Amazon with Scrapingdog.

If you want to track the price of a product on Amazon, we have a comprehensive tutorial on tracking Amazon product prices using Python.

I hope you like this little tutorial. If you do, please don’t forget to share it with your friends and on your social media.

You can combine this data with business plan software to offer different solutions to your clients.

If you are a non-developer and want to scrape data from Amazon, here is good news for you: we have recently launched a Google Sheets add-on, Amazon Scraper.

Here is the video 🎥 tutorial for it.

Frequently Asked Questions

 

Amazon detects scraping through its anti-bot mechanism, which checks your IP address and can block you if you keep scraping. Using a proxy management system will help you bypass this security measure.
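For reference, routing a request through a proxy with the requests library only takes a proxies dictionary. The proxy URL below is a placeholder; a rotating proxy service or proxy API would supply the real one:

import requests

proxies = {
    "http": "http://user:password@proxy.example.com:8080",   # placeholder proxy URL
    "https": "http://user:password@proxy.example.com:8080",
}
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}

resp = requests.get("https://www.amazon.com/dp/B0BSHF7WHW",
                    headers=headers, proxies=proxies, timeout=30)
print(resp.status_code)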

Zenrows vs Scrapingbee vs Scrapingdog: Which One To Choose & Why
https://www.scrapingdog.com/blog/zenrows-vs-scrapingbee-vs-scrapingdog/
Mon, 17 Nov 2025 10:38:36 +0000

In this post, we'll walk through a detailed comparison of three popular web-scraping API providers: ZenRows, ScrapingBee, and Scrapingdog. We'll examine pricing, performance, success rates, and key features so you can decide which fits your needs.

We'll be testing these APIs across multiple domains before sharing our final verdict. This report aims to help you identify the most suitable scraping service for your specific project needs.

Criteria To Test These APIs

We are going to scrape a few domains like Amazon, eBay, and Google. We will judge each scraper on the basis of these points.

  • Speed
  • Success Rate
  • Support
  • Scalability
  • Developer friendly

We are going to use this Python code to test all the APIs.

				
					import requests
import time
import random
import urllib.parse

# List of search terms
amazon_urls = ['https://www.amazon.de/dp/B0F13KXRG8','https://www.amazon.com.au/dp/B0D8V3N28Z','https://www.amazon.in/dp/B0FHB5V36G','https://www.amazon.com/dp/B0CDJ4LS6X','https://www.amazon.com.br/dp/B0FQHRR7L7/']

ebay_url=['https://www.ebay.it/usr/elzu51','https://www.ebay.com/sch/i.html?_nkw=watch','https://www.ebay.com/itm/324055713627','https://www.ebay.com.au/b/Smarthome/bn_21835561','https://www.ebay.com/p/25040975636']

serp_terms = ['burger','bat','beans','curd','meat']

# Replace with your actual API endpoint
# Make sure it includes {query} where the search term should be inserted
base_url = "https://app.example.com/scrape"


total_requests = 10
success_count = 0
total_time = 0
apiKey = "your-api-key"
for i in range(total_requests):
    try:
        # Pick a random target; keep only the choice line for the platform you are
        # testing (amazon_urls, ebay_url, or serp_terms -- the params below use serp_terms).
        search_term = random.choice(serp_terms)

        

        params = {
    "api_key": apiKey,
    "results": 10,
    "query": search_term,
    "country": "us",
    "advance_search": "true",
    "domain": "google.com"
}

        # params={
        #     'api_key': apiKey,
        #     'search': search_term,
        #     'language': 'en'
        # }



        # url = base_url.format(query=search_term)

        start_time = time.time()
        response = requests.get(base_url,params=params)
        end_time = time.time()

        request_time = end_time - start_time
        total_time += request_time

        if response.status_code == 200:
            success_count += 1
        print(f"Request {i+1}: '{search_term}' took {request_time:.2f}s | Status: {response.status_code}")

    except Exception as e:
        print(f"Request {i+1} with '{search_term}' failed due to: {str(e)}")

# Final Stats
average_time = total_time / total_requests
success_rate = (success_count / total_requests) * 100

print(f"\n<img src="https://s.w.org/images/core/emoji/15.0.3/72x72/1f50d.png" alt="🔍" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Total Requests: {total_requests}")
print(f"<img src="https://s.w.org/images/core/emoji/15.0.3/72x72/2705.png" alt="✅" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Successful: {success_count}")
print(f"<img src="https://s.w.org/images/core/emoji/15.0.3/72x72/23f1.png" alt="⏱" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Average Time: {average_time:.2f} seconds")
print(f"<img src="https://s.w.org/images/core/emoji/15.0.3/72x72/1f4ca.png" alt="📊" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Success Rate: {success_rate:.2f}%")
				
			

Let’s first test how Zenrows performs in this test across different platforms.

Zenrows

A platform built for developers to scrape public data at scale. They have been a popular option in the community. Let's test how this API performs against all of the targets we chose.

Feature & Pricing of Zenrows

  • You get free credits worth $1 on signup.
  • The credit cost changes from website to website, but the starting pack will cost you around $70 per month and includes 250,000 credits.
  • Documentation is clear and the API can be integrated very easily.
  • Customers can contact them via instant chat support or email.

Test Result with Amazon

Test Result with eBay

Test Results with Google Search

Summary of All Tests (Zenrows)

  • ZenRows achieved a 40% success rate on Amazon, with an average response time of 19.48 seconds.
  • While scraping eBay, we got a success rate of 90% with an average response time of 3.93 seconds.
  • Scraping Google with ZenRows resulted in a 90% success rate and an average response time of 18.81 seconds.

Scrapingbee

ScrapingBee is a web-scraping API service designed to simplify and streamline data extraction from modern websites.

Features & Pricing of Scrapingbee

  • They offer 1000 free credits on signup.
  • Their basic plan costs around $49 per month and includes 250,000 credits.
  • The documentation is clear, and the APIs can be seamlessly integrated into any development environment.
  • You can contact them via chat support or through email.

Test Results with Amazon

Test Results with eBay

Test Results with Google Search

Summary of All Tests (Scrapingbee)

  • Scrapingbee achieved a 100% success rate on Amazon, with an average response time of 5.82 seconds.
  • While scraping eBay, we got a success rate of 80% with an average response time of 3.85 seconds.
  • Scraping Google with Scrapingbee resulted in a 100% success rate and an average response time of 7.02 seconds.

Read More: How Scrapingdog is a Better Alternative To Scrapingbee

Scrapingdog: A Better Alternative to Zenrows & Scrapingbee

Scrapingdog is a web-scraping API platform that lets you extract data from websites without worrying about proxies, CAPTCHA, or browser automation.

scrapingdog homepage

Features & Pricing of Scrapingdog

  • Scrapingdog provides free 1000 credits on signup.
  • The entry-level plan costs around $40 per month and includes 200,000 credits.
  • The documentation is developer-friendly, making it easy to integrate the API into any project.
  • You can contact us via chat support or by email support.

Test Results with Amazon

 

Test Results with eBay

 

Test Results with Google Search

 

Summary of All Tests (Scrapingdog)

  • Scrapingdog achieved a 100% success rate on Amazon, with an average response time of 4.27 seconds.
  • While scraping eBay, we got a success rate of 100% with an average response time of 3.14 seconds.
  • Scraping Google with Scrapingdog resulted in a 100% success rate and an average response time of 3.49 seconds.

Success Rate Comparison (Zenrows vs Scrapingbee vs Scrapingdog)

Provider      Amazon   eBay   Google
ZenRows       40%      90%    90%
ScrapingBee   100%     80%    100%
Scrapingdog   100%     100%   100%

When comparing success rates across all three APIs, ScrapingDog delivered flawless performance, achieving a 100% success rate on Amazon, eBay, and Google. 

ScrapingBee performed reliably overall, maintaining 100% on Amazon and Google but dropping slightly to 80% on eBay. 

ZenRows, on the other hand, struggled with Amazon, managing only 40% success, though it performed much better on eBay and Google with 90% success each.

Speed Comparison

Provider      Amazon    eBay     Google
ZenRows       19.48 s   3.93 s   18.81 s
ScrapingBee   5.82 s    3.85 s   7.02 s
ScrapingDog   4.27 s    3.14 s   3.49 s

In terms of speed, ScrapingDog once again led the pack with the fastest average response times across all three platforms, staying under 4.5 seconds, even on Google, which is typically the most challenging site to scrape.

 

ScrapingBee demonstrated stable performance, averaging between 3.8 and 7 seconds, but lagged slightly behind on Google. 

ZenRows was considerably slower on Amazon and Google, taking nearly 19 seconds per request, though it performed well on eBay.

Conclusion

After testing all three web scraping APIs, ZenRows, ScrapingBee, and ScrapingDog, across Amazon, eBay, and Google, here's the takeaway:

  • ScrapingDog consistently came out on top, offering 100% success rates and the fastest response times across all platforms. It’s highly optimized for performance and reliability, making it the best choice for large-scale or production-grade scraping.
  • ScrapingBee delivered strong, stable results with solid success rates and good speed. It’s a balanced option if you prioritize simplicity and consistency.
  • ZenRows performed decently on eBay and Google but struggled significantly with Amazon, both in speed and success rate, suggesting its infrastructure isn’t yet fully tuned for heavy e-commerce scraping.

Additional Resources

10 Best Google SERP APIs in 2026 to Scale Data Extraction from Google Search
https://www.scrapingdog.com/blog/best-serp-apis/
Mon, 17 Nov 2025 00:09:28 +0000

TL;DR

  • Benchmarks 10 SERP APIs on speed, price and scale.
  • Times: Scrapingdog 1.83 s; Serper 2.87 s; Bright Data 5.58 s; SearchAPI 2.96 s; ScraperAPI 33.6 s.
  • Verdict: Scrapingdog & Serper are fastest; ScraperAPI slowest.
  • Pricing: Scrapingdog is economical at scale (~$0.00029 / request); most offer free trials.

Search engines hold a massive amount of data; to put a number on it, Google alone handles around 8.5 billion searches per day.

Scraping Google or any other search engine is worth considering if you need the data for SEO tools, lead generation, and price monitoring.

I’ve analyzed the best SERP APIs that deserve to be on this list. Each API has been tested on key factors like speed, scalability, and pricing. 

I’ve shared my results at the very end of this article.

Let’s get started!!

10 Best APIs for Scraping Google in 2026

We will be judging these APIs based on 5 attributes.

  • Scalability means how many pages you can scrape in a day.
  • Pricing of the API. What is the cost of one API call?
  • Speed means how fast an API can respond with results.
  • Developer-friendly refers to the ease with which a software engineer can use the service.
  • Stability refers to how much load a service can handle or for how long the service is in the market.
				
					import requests
import time
import random

# List of random words to use in the search query

search_terms_google = [
    "pizza", "burger", "sushi", "coffee", "tacos", "salad", "pasta", "steak",
    "sandwich", "noodles", "bbq", "dumplings", "shawarma", "falafel",
    "pancakes", "waffles", "curry", "soup", "kebab", "ramen"
  ];


base_url = "Your-API-URL"  # placeholder; must include {query} where the search term is inserted

total_requests = 50
success_count = 0
total_time = 0

for i in range(total_requests):
    try:
        # Pick a random search term from the list
        search_term = random.choice(search_terms_google)
        url = base_url.format(query=search_term)

        start_time = time.time()  # Record the start time
        response = requests.get(url)
        end_time = time.time()  # Record the end time

        # Calculate the time taken for this request
        request_time = end_time - start_time
        total_time += request_time

        # Check if the request was successful (status code 200)
        if response.status_code == 200:
            success_count += 1
        print(f"Request {i+1} with search term '{search_term}' took {request_time:.2f} seconds, Status: {response.status_code}")

    except Exception as e:
        print(f"Request {i+1} with search term '{search_term}' failed due to {str(e)}")

# Calculate the average time taken per request
average_time = total_time / total_requests
success_rate = (success_count / total_requests) * 100

# Print the results
print(f"\nTotal Requests: {total_requests}")
print(f"Successful Requests: {success_count}")
print(f"Average Time per Request: {average_time:.2f} seconds")
print(f"Success Rate: {success_rate:.2f}%")
				
			

We will test the APIs with the above Python code.

Scrapingdog’s Google SERP API

Scrapingdog’s Google Search API provides raw and parsed data from Google search results.

Now, we might be biased in putting our API on top (and yes, it's what I get paid for — JK, I'm the CTO!). But honestly, all the APIs were tested; the results are in the screenshots throughout this article.

Scrapingdog Google Scraper API

Details

  • With this API, you can make more than a billion API requests every month, which makes it a healthy choice for scale.
  • Per API call cost for scraping Google starts from $0.003 and goes below $0.00125 for higher volumes.
  • To test the speed of the API, we ran it in Postman.
test screen

It took around 1.83 seconds to complete the request.

  • Has documentation in multiple languages. From curl to Java, you will find a code snippet in almost every language.
  • Scrapingdog has been in the market for more than 5 years now, and you can see how customers have reviewed Scrapingdog so far on Trustpilot. The API is stable.
  • You can even test the API for free, we provide 1000 free credits to spin it.
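As a minimal sketch, a request to the Google Search API looks like the snippet below. The endpoint path follows the pattern of Scrapingdog's other endpoints and the query parameters mirror the ones used in our comparison test, but treat both as assumptions and confirm them against the documentation:

import requests

params = {
    "api_key": "your-api-key",
    "query": "web scraping",
    "results": 10,
    "country": "us",
    "domain": "google.com",
}

# Endpoint path is an assumption based on Scrapingdog's other endpoints; check the docs.
response = requests.get("https://api.scrapingdog.com/google", params=params)

if response.status_code == 200:
    print(response.json())
else:
    print(f"Request failed with status code {response.status_code}")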

Here’s a quick video tutorial on how you can use Scrapingdog’s Google Search Scraper API.

Recently, we introduced a new endpoint for scraping all major search engines via one call. We are calling it the Universal Search API. If you are looking to get data from several engines, this API is a good fit: the results are already filtered, so you don't have to remove repetitive results yourself.

Further, using this single API instead of calling each engine separately is much more economical.

Data For SEO

Data for SEO provides the data required for creating any SEO tool. They have APIs for backlinks, keywords, search results, etc.

Details

  • Documentation is too noisy, which makes integration of the API time-consuming.
  • The pricing is not clear. Their pricing changes based on the speed you want. But the high-speed pack will cost $0.002 per search. The minimum investment is $2k per month.
  • They have been in the scraping business for a long time and have optimized for scalability and stability.
  • Cannot comment on the speed as we were unable to test the API because of the very confusing documentation.

Apify

Apify is a web scraping and automation platform that provides tools and infrastructure to simplify data extraction, web automation, and data processing tasks. It allows developers to easily build and run web scrapers, crawlers, and other automation workflows without having to worry about infrastructure management.

Apify

Details

  • The documentation is pretty clear and makes integration simple.
  • The average response time was around 8.2 seconds.
apify results
  • Pricing starts from $0.003 per search and goes below $0.0019 per search in their Business packs.
  • They have been in this industry for a very long time, which indicates they are reliable and scalable.

SearchAPI

SearchAPI is another popular option among developers to scrape Google search results at scale.

This product has been around for a while now, and it's worth mentioning in this list because it performed well in our test.

Details

  • When you sign up, you get 100 free credits to test the API.
  • Documentation is clear, and the API can be easily integrated into any environment.
  • Pricing per page starts from $0.004 and drops below $0.002.

Testing

 

  • We got a 100% success rate with an average response time of 2.96 seconds.

Bright Data

Bright Data as we all know is a huge company focused on data collection. They provide proxies, data scrapers, etc.

Brightdata Google Search API

Details

  • Their documentation is quite clear and testing is super simple.
  • We tested their API, and the average response time was close to 5.58 seconds, which is good.
  • Per API call, the cost starts from $0.005. The success rate is pretty great, which makes this API scalable and stable. The service is top-notch and, again, any of their products you use is solid.
  • The only downside with Brightdata is that it’s a bit more expensive compared to other providers.

Hasdata

Hasdata is another great option if you are looking for a search engine API. Their dashboard makes onboarding pretty simple.

Hasdata google search api

Details

  • Documentation is pretty simple and easy to understand.
  • Per API call response time is around 3.80 seconds.
  • In my testing, I observed that the API slows down if you hit it repeatedly, which suggests it may not perform well when chosen for scalability.
  • Per API call price starts from $0.003 and goes around $0.0004 with higher volumes.

Serper

Serper provides a dedicated solution for scraping all the Google products.

Serper google search api

Details

  • The documentation is clear, and the API can be integrated with ease.
  • It’s a new service, and the people behind it don’t seem to have much public presence.
  • Pricing per scrape starts from $0.001 and drops below $0.00075 with high volume.
  • If you need more than 10 results per query in its SERP API, then you will be charged 2 credits, so the pricing automatically doubles.
  • You can only contact them through email.

Testing

Serper testing
  • So, the API took around 2.87 seconds to scrape a single Google page.

SerpAPI

SerpAPI is a fast Google search scraper API with one of the widest varieties of Google-related APIs.

SerpAPI Google Search API

Details

  • The documentation is very clear and concise. You can quickly start scraping any Google service within minutes.
  • The average response time was around 5.49 seconds. The API is fast and reliable, and it can be used for any commercial purpose that requires a high volume of scraping.
Serp api testing
  • Pricing starts at $0.01 per request and it goes down to $0.0083!
  • SerpAPI has been in this industry since 2016 and they have immense experience in this market. If you have a high-volume project then you can consider them.

Decodo (Smartproxy)

Decodo is another Google search API provider in this list.

Decodo google search api

Details

  • Documentation is simple, and integration is straightforward.
  • They have a great proxy infrastructure, which ultimately assures a seamless data pipeline.
  • Pricing for Google scraping starts from $0.00125 and drops below $0.00095 with high volume.
  • You can contact them via chat or email.
  • It was not possible to test their API in our environment, so we tested it from the dashboard instead. Their API took around 4 to 5 seconds to scrape a single Google page.

ScraperAPI

ScraperAPI was initially launched as a free web scraping API but now it also offers multiple dedicated APIs around Google and its other services like SERP, News, Jobs, Shopping, etc.

ScraperAPI

Details

  • Documentation is very clear and has code snippets for all major languages like Java, NodeJS, etc. This makes testing this API super easy.
  • The average response time was around 33.6 seconds, and it might go up for high concurrency.

  • Pricing starts from $0.00196 per search and goes up to $0.0024 for bigger packs.
  • They have been in the market for a long time, but the SERP API doesn't meet expectations.

Overall Results

Provider      Response Time (s)   Pricing ($ per request)
Scrapingdog   1.83                0.001 → 0.00029
Serper        2.87                0.001 → 0.00075
SearchAPI     2.96                0.004 → 0.002
Hasdata       3.8                 0.00245 → 0.00083
Decodo        4.5                 0.00125 → 0.00095
Brightdata    5.58                0.0011
SerpAPI       5.49                0.015 → 0.0075
Apify         8.0                 0.003 → 0.0019
ScraperAPI    33.6                0.00196 → 0.0024
Dataforseo    N/A                 0.002

At first glance, many of the APIs we’ve discussed may appear quite similar. But once you dig deeper and start testing, you’ll notice that only a few (specifically two or three) are truly stable and suitable for production use.

serp api response time comparison bar graph

🚀 Conclusion: Serper, Scrapingdog & SearchAPI are the fastest, while ScraperAPI is the slowest among the tested services.

The report above is based on a thorough analysis of each API, focusing on factors like speed, scalability, and pricing.

Almost all the APIs mentioned here offer free trials, so you can test them yourself firsthand and see which one fits your needs best.

Price Comparison

Scrapingdog offers the lowest effective pricing, dropping to $0.00029 per request at scale, far cheaper than competitors like SerpAPI ($0.015) or Apify ($0.003). Most other providers range between $0.0008 and $0.002 per request.
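To see what those per-request prices mean at scale, a quick back-of-the-envelope calculation helps; the monthly volume below is just an example:

requests_per_month = 1_000_000  # example volume

# Per-request prices at scale, taken from the table above.
price_per_request = {
    "Scrapingdog": 0.00029,
    "SerpAPI": 0.015,
    "Apify": 0.003,
}

for provider, price in price_per_request.items():
    print(f"{provider}: ${price * requests_per_month:,.2f} per month")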

Why You Should Choose A SERP API Instead of Building Your Own Scraper

While you could build your scraper to extract Google search results, maintaining them over time can be quite challenging.

Search engines, including Google, often block scrapers after approximately 100 requests, making it difficult to scale without hitting roadblocks.

You’d need to constantly update your scraper to bypass these restrictions, which can be time-consuming and inefficient.

For production purposes, using an API is a much better option.

Here’s why:

  1. Anonymity: With these APIs, you stay anonymous. Every request is made using a different IP address, so your IP is always hidden, preventing any blocks or restrictions from Google.

  2. Cost-Effective: These APIs are far more affordable than Google’s official API. You can scrape search results at a fraction of the cost.

  3. Parsed Data Options: Whether you need parsed JSON data for easy integration or raw HTML data for flexibility, these APIs offer both.

  4. Customization: Many API vendors offer customization options to tailor the API exactly to your needs, making it easier to extract the exact data you want.

  5. Reliability for Production: Unlike self-built scrapers that might get blocked or require constant maintenance, these APIs are designed to be stable, scalable, and perfect for production use.

  6. 24X7 Support: Round-the-clock support to help you solve any issues or queries, ensuring smooth operations.

What Data Other Than Google Search Can You Scrape From Google Products?

Search engine scraping is one of the most common ways to collect valuable data.

But search results aren’t the only data you can access. Other valuable sources can be scraped for more data. To name a few:

You can scrape Google AI Mode to keep track of your brand visibility if SEO is one of the channels through which your brand acquires customers.

Scraping Google Maps opens up valuable opportunities to gather business details, reviews, and location data. This information is useful for local SEO, lead generation, and market analysis.

On the other hand, you can scrape Google News to do content analysis or monitor news coverage.

You can also collect data from other Google products, such as Google Scholar and Google Images.

I’ll continue to add more details and use cases for scraping these products as I write articles on them using Python. 

Additional Resources

Web Scraping with Scrapingdog

Scrape the web without the hassle of getting blocked
3 Best Web Scraping APIs to Train Your LLMs
https://www.scrapingdog.com/blog/best-web-scraping-apis-to-train-your-llms/
Tue, 11 Nov 2025 12:35:34 +0000

If you’re training large language models (LLMs) or fine-tuning retrieval-augmented generation (RAG) systems, you need one thing above all: data at scale.

Clean, structured, and diverse data is what separates an average model from a competent one.

Websites today utilize dynamic content, JavaScript rendering, and bot protection layers that render traditional scraping ineffective.

In this guide, we will explore some of the best APIs that can be used to extract data and provide the output data in Markdown format.

Why Markdown Format Works Best for LLMs

When it comes to training LLMs, not all data formats are equal. Markdown is lightweight like plain text yet structured like HTML, which makes it a sweet-spot format.

This structure facilitates models’ understanding of context, hierarchy, and semantics. For example, distinguishing between a title, a subheading, or a list of steps. That is exactly why APIs that output Markdown are becoming the preferred choice for creating LLM-ready datasets.
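As a small illustration of that difference, here is a sketch that converts an HTML fragment to Markdown, assuming the html2text package is installed (pip install html2text):

import html2text

html = """
<h1>Getting started</h1>
<p>Install the client, then follow the steps below.</p>
<ul><li>Create an API key</li><li>Make your first request</li></ul>
"""

converter = html2text.HTML2Text()
converter.ignore_links = False  # keep link targets in the Markdown output
markdown = converter.handle(html)
print(markdown)  # the heading becomes '#', list items become '*' bullets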

Let’s now jump into the APIs that can extract clean, structured content ready for use in LLM training pipelines.

Best Web Scraping APIs for Training LLMs

Scrapingdog

Scrapingdog is a comprehensive web scraping API designed to handle large-volume, JavaScript-heavy pages with ease. It supports real browser rendering, automatic CAPTCHA solving, and IP rotation, all of which are crucial for building large datasets reliably.

LLm ready data

With our general scraper, you can get the output in Markdown format, making it immediately usable for model ingestion. Developers can scrape articles, documentation, or entire websites while preserving structure and hierarchy without HTML clutter.

The API integrates into your system easily, scales to millions of requests, and covers all essential parameters like geo-targeting, headers, and cookies. Whether you need domain-specific data or general web content, Scrapingdog helps ensure you get clean, structured, LLM-ready data.
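A minimal sketch of that flow is below. The endpoint follows the pattern used elsewhere in this feed for Scrapingdog's APIs, but the exact parameter that switches the output to Markdown is an assumption here, so verify the parameter name in the documentation:

import requests

params = {
    "api_key": "your-api-key",
    "url": "https://example.com/blog/some-article",
    "markdown": "true",  # assumed parameter name for Markdown output -- check the docs
}

# Endpoint path is an assumption based on Scrapingdog's other endpoints.
response = requests.get("https://api.scrapingdog.com/scrape", params=params)

if response.status_code == 200:
    print(response.text)  # Markdown-formatted page content
else:
    print(f"Request failed with status code {response.status_code}")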

Scrapegraphai

Scrapegraph AI is a relatively new player in the web scraping space, and it now offers Markdown output through a feature called Markdownify. This service transforms webpages into well-formatted Markdown by extracting only the relevant text and structural elements like headings, lists, and links.

While testing it, I found the API to be stable, responsive, and production-ready. It handles general-purpose content extraction well and delivers results in a predictable format.

Markdown is returned by default when using the Markdownify route, but developers also have the flexibility to switch between HTML and JSON formats by adjusting a simple parameter, useful if you want to run multiple post-processing pipelines from the same API.

From a cost-to-value perspective, it is an economical option. The Markdownify endpoint is especially helpful for quickly converting large volumes of web content into training-friendly input without needing to clean raw HTML or parse messy layouts.

All in all, it’s a lightweight but practical solution that fits neatly into any pipeline.

Firecrawl

Firecrawl has positioned itself as a specialized tool for extracting clean, LLM-ready data from websites. It supports structured output in Markdown format and allows developers to configure the format via a simple parameter during the request, making it quick to plug into any AI training pipeline.

In testing, the API showed strong consistency. It successfully scraped and converted content-heavy pages into well-structured Markdown without missing key elements. The output was clean, readable, and required minimal post-processing. Firecrawl’s documentation is developer-friendly, and the setup flow is smooth, especially for teams looking to move fast.

One point to note: while Firecrawl delivers reliable results, it sits slightly on the higher end in terms of pricing compared to other tools. That said, for teams prioritizing data quality and clarity in their LLM pipelines, the tradeoff may be worth it.

Conclusion

Each of the mentioned APIs has pros and cons of its own. The good thing is that you can test each of them and see which one fits your budget and use case best.

In case you need any help to integrate Scrapingdog’s APIs into your workflow, do reach out to us on Chat or email us at info@scrapingdog.com.

FAQs

Raw HTML includes scripts, navigation, ads, and other noise that can dilute training data quality. Markdown or cleaned formats are easier for models to learn from.

Long-form articles, technical documentation, FAQs, product pages, and tutorials — anything with structured, explanatory content.

Look for structural consistency, low noise, semantic accuracy (e.g., heading levels make sense), and absence of boilerplate like nav bars or footers.
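A few of these checks are easy to automate. Below is a rough sketch that inspects heading structure and flags common boilerplate in a Markdown document; the keyword list is just an example:

import re

BOILERPLATE_HINTS = ["cookie policy", "subscribe to our newsletter", "all rights reserved"]

def quality_report(markdown_text: str) -> dict:
    # Collect ATX headings (#, ##, ...) and scan for boilerplate phrases.
    headings = re.findall(r"^(#{1,6})\s", markdown_text, flags=re.MULTILINE)
    boilerplate_hits = [kw for kw in BOILERPLATE_HINTS if kw in markdown_text.lower()]
    return {
        "heading_count": len(headings),
        "max_heading_depth": max((len(h) for h in headings), default=0),
        "boilerplate_hits": boilerplate_hits,
        "length_chars": len(markdown_text),
    }

print(quality_report("# Title\n\nSome body text.\n\n## Section\nAll rights reserved."))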

Additional Resources

Web Scraping with Scrapingdog

Scrape the web without the hassle of getting blocked

5 Best Indeed Scrapers To Test Out in 2025
https://www.scrapingdog.com/blog/best-indeed-scrapers/
Thu, 06 Nov 2025 12:05:06 +0000

TL;DR

  • Compares 5 Indeed scrapers — ScraperAPI, Scrapingdog, ZenRows, Bright Data, ScrapingBee using each product’s general scraper.
  • Criteria: speed, success rate, support, scalability, dev-friendliness; simple test harness shown.
  • Scrapingdog is featured with a free 1k-credit trial to test reliability on Indeed.
  • Bottom line: choose based on your success-vs-cost needs at target scale.

If you’re planning to scrape job-listing sites like Indeed (or similar platforms) at scale, choosing the right web scraping API can make a big difference. You’ll typically need:

  • Reliable JavaScript rendering (many job portals use dynamic loading)
  • Anti-bot & CAPTCHA handling
  • Proxy rotation / geo-flexibility
  • Predictable costs and data structure output

In this article we compare five major scraping APIs: ScraperAPI, Scrapingdog, ZenRows, Brightdata, and ScrapingBee. The goal is to help you decide which is best for scraping Indeed.com with high reliability and minimal fuss.

Criteria

We are going to compare these five products on the basis of:

  • Speed
  • Success rate
  • Support
  • Scalability
  • Developer friendly

We will use the general web scraper of each product to scrape Indeed.

We are going to use this Python code to test the different products.

				
import requests
import time
import random

# List of Indeed search URLs to test against
indeed_urls = [
    'https://www.indeed.com/jobs?q=Software+Engineer&l=New%20York',
    "https://www.indeed.com/jobs?q=python&l=New+York%2C+NY",
    "https://il.indeed.com/jobs?q=&l=israel&fromage=1&vjk=3e2c3c5a7577fa90",
    "https://www.indeed.com/jobs?q=python&l=New+York%2C+NY",
    "https://www.indeed.com/jobs?q=Assistant+Restaurant+Manager&start=0&l=Chicago%2C+IL"
]

# Replace with the scraping API endpoint of the product being tested
base_url = "https://api.example.com/"

total_requests = 10
success_count = 0
total_time = 0

for i in range(total_requests):
    try:
        # Pick a random Indeed URL for this request
        search_term = random.choice(indeed_urls)

        params = {
            'api_key': 'YOUR_API_KEY',
            'url': search_term
        }

        # Time the request
        start_time = time.time()
        response = requests.get(base_url, params=params)
        end_time = time.time()

        request_time = end_time - start_time
        total_time += request_time

        if response.status_code == 200:
            success_count += 1
        print(f"Request {i+1}: '{search_term}' took {request_time:.2f}s | Status: {response.status_code}")

    except Exception as e:
        print(f"Request {i+1} with '{search_term}' failed due to: {str(e)}")

# Final stats
average_time = total_time / total_requests
success_rate = (success_count / total_requests) * 100

print(f"\n🔍 Total Requests: {total_requests}")
print(f"✅ Successful: {success_count}")
print(f"⏱ Average Time: {average_time:.2f} seconds")
print(f"📊 Success Rate: {success_rate:.2f}%")

Scrapingdog

Scrapingdog provides powerful web scrapers to scrape websites with CAPTCHA and bot protection.

  • You get 1,000 free credits when you sign up for the free pack.
  • To scrape Indeed, you’ll need to enable Stealth Mode (a hedged request sketch follows this list). Pricing begins at roughly $0.002 per request and can go as low as $0.000583 on larger plans.
  • Documentation is pretty clear and API can be integrated easily with any system.
  • Support is available 24*7 through chat and email.
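
For reference, pointing the general scraper at an Indeed URL looks roughly like this in Python. The base endpoint follows Scrapingdog's documented api_key + url pattern, but the exact parameter that enables Stealth Mode is an assumption here; check the documentation for the current name.

import requests

# Sketch of a single Indeed request through Scrapingdog's general scraping endpoint
params = {
    "api_key": "YOUR_API_KEY",
    "url": "https://www.indeed.com/jobs?q=python&l=New+York%2C+NY",
    # "stealth": "true",  # hypothetical flag; enable Stealth Mode per the docs
}

response = requests.get("https://api.scrapingdog.com/scrape", params=params, timeout=90)
print(response.status_code, len(response.text))  # raw HTML of the Indeed results page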

Testing Indeed

Summary

  • Scrapingdog scraped Indeed with a 100% success rate and an average response time of 14.47 seconds.

Scraperapi​

Scraperapi provides a web scraping API to scrape any website. The scraper responds with the HTML data of the target website.

  • On a new sign-up, you get 5,000 free credits to test the API.
  • Each successful response will cost you around $0.0049, but the pricing drops to $0.00095 on their biggest pack.
  • The documentation is very clear and you can easily integrate their APIs in your system.
  • Customer support is only available through email. No instant chat support is available.

Testing Indeed

Summary

  • Scraperapi scraped Indeed with a 50% success rate and an average response time of 50.43 seconds.

Zenrows

Zenrows is another web scraping API on the market that offers a general scraper for scraping websites.

  • On signup, you get 1,000 credits, which remain active for the next 14 days.
  • Every request to indeed.com will cost you $0.025, and it goes down with bigger packs.
  • The dashboard is a little confusing to operate, but the documentation is clear and the API can be integrated easily.
  • Instant customer support through chat and email is available.

Testing Indeed

Summary

  • We got a 100% success rate with Zenrows, with an average response time of 22.23 seconds.

Brightdata

This is one of the pioneering companies in the scraping industry. They provide powerful scrapers and proxies for scraping websites.

Brightdata dashboard

  • You have to go through their KYC process in order to test the APIs and proxies.
  • You have to use their Web Unlocker to scrape Indeed at scale.
  • Pricing starts from $0.0015 and drops to $0.001.
  • You can easily integrate their proxies in your system.
  • Support is available 24*7 and responds almost instantly.

Testing Indeed

Summary

  • We got a 100% success rate with Brightdata, with an average response time of 6.36 seconds.

Read More: 5 Economical Brightdata Alternatives You Can Try

Scrapingbee

Scrapingbee also provides a general scraper to scrape websites at scale. Using their extract rules feature, you can get parsed JSON data instead of raw HTML.

  • On signup you get free 1000 credits to test the API.
  • You’ll need to use their Stealth Proxy mode to scrape Indeed. The pricing starts at $0.0147 per request and drops to $0.00562 on their largest available plan.
  • APIs can be easily integrated in any working environment.
  • Support is available 24*7 through chat and email.

Testing Indeed

Summary

  • We got 98% success rate with an average response time of 15.88 seconds.

Price Comparison

| Provider | Starting Price / Request | Lowest Price (High Volume) | Approx. Cost per 1K Requests |
| --- | --- | --- | --- |
| Scrapingdog | $0.002 | $0.000583 | ~$0.58 – $2.00 |
| ScraperAPI | $0.0049 | $0.00095 | ~$0.95 – $4.90 |
| ZenRows | $0.025 | $0.022 | ~$22 – $25 |
| Bright Data | $0.0015 | $0.001 | ~$1.00 – $1.50 |
| ScrapingBee | $0.0147 | $0.00562 | ~$5.62 – $14.70 |
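
For clarity, the last column is simply the per-request price multiplied by 1,000, computed at both the high-volume price and the starting price:

# Approx. cost per 1K requests = price per request * 1,000
prices = {
    "Scrapingdog": (0.002, 0.000583),
    "ScraperAPI": (0.0049, 0.00095),
    "ZenRows": (0.025, 0.022),
    "Bright Data": (0.0015, 0.001),
    "ScrapingBee": (0.0147, 0.00562),
}

for provider, (starting, lowest) in prices.items():
    print(f"{provider}: ${lowest * 1000:.2f} to ${starting * 1000:.2f} per 1K requests")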

When it comes to pricing, Scrapingdog clearly leads the pack, offering one of the lowest per-request costs in the industry, especially at scale.
While Bright Data remains competitive on volume, most other providers, like ScraperAPI, ZenRows, and ScrapingBee, are considerably more expensive for large-scale scraping operations.

If your use case involves frequent or high-volume scraping (like tracking Indeed job listings), Scrapingdog delivers the best balance between cost efficiency and scalability.

Speed Comparison

| Provider | Success Rate | Average Response Time (seconds) |
| --- | --- | --- |
| Scrapingdog | 100% | 14.47 s |
| ScraperAPI | 50% | 50.43 s |
| ZenRows | 100% | 22.23 s |
| Bright Data | 100% | 6.36 s |
| ScrapingBee | 98% | 15.88 s |

When it comes to speed, Bright Data tops the chart with an impressive 6.36-second average response time, followed by Scrapingdog at 14.47 seconds, which maintained strong performance alongside a 100% success rate.

In terms of reliability, Scrapingdog, ZenRows, and Bright Data all achieved perfect 100% success rates, while ScrapingBee performed well at 98%, and ScraperAPI lagged behind with only 50% reliability.

Final Verdict

All five providers delivered usable results, but their performance varied across speed and consistency. Bright Data was the fastest in response time, while Scrapingdog, ZenRows, and Bright Data maintained perfect success rates. ScrapingBee also performed reliably with only a slight dip in success, and ScraperAPI showed room for improvement in stability.

Ultimately, the best choice depends on your specific needs, whether that's speed, scalability, or cost efficiency. Each provider has its strengths, and the right fit comes down to balancing performance with your project's priorities.

Additional Resources

Web Scraping with Scrapingdog

Scrape the web without the hassle of getting blocked

]]>
https://www.scrapingdog.com/blog/best-indeed-scrapers/feed/ 0
How Geogen Uses Scrapingdog’s API to Power Their AI Tracking Tool https://www.scrapingdog.com/blog/how-geogen-uses-scrapingdogs-api-to-power-their-ai-tracking-tool/ https://www.scrapingdog.com/blog/how-geogen-uses-scrapingdogs-api-to-power-their-ai-tracking-tool/#respond Wed, 05 Nov 2025 12:18:48 +0000 https://www.scrapingdog.com/?p=31103

As Google rolls out AI Overviews and other generative features in Search, visibility is no longer limited to traditional blue links.

Brands now need to understand how their pages appear within AI-generated summaries.

To track this, businesses need a reliable mechanism to monitor their presence in both Google’s organic results and AI-driven overviews.

Some call it advanced SEO, while others name it GEO (Generative Engine Optimization). Geogen helps companies measure and improve that visibility. Their platform tracks how often and where brands are mentioned within Google’s AI Overviews and compares it with traditional organic rankings.

To power their product, they utilize APIs from Scrapingdog, specifically the Google Search & AI Overviews API.

We recently asked Patrick Dewald, the CEO of Geogen, about their challenges and how they chose Scrapingdog.

Here's the testimonial they were happy to give us in the video below:

Challenges That the Geogen Team Had

Unlike traditional search tracking, analyzing AI results requires massive, clean, and contextual data.
The Geogen team needed:

  • Real-time Google Search data to compare conventional SEO visibility.
  • Access to AI Overview responses to understand how AI systems interpret and rank brands.
  • A fast, scalable, and accurate data provider to keep their insights reliable and current.

These requirements meant handling high concurrency, low latency, and large data volumes without compromising quality.

Why Geogen Chose Scrapingdog

After evaluating multiple vendors, Geogen found that Scrapingdog catered to all of their challenges:

  • ⚡ Fast Response Times: Ideal for large-scale crawls and AI data extraction.
  • 💪 High Success Rate: Even with complex queries and multiple geolocations.
  • 🔄 Concurrent Processing: Allowed Geogen to collect massive datasets in minutes.
  • 🧩 Intuitive API Playground: Enabled quick testing and seamless integration into existing pipelines.

Patrick Dewald, CEO of Geogen.io, shared:

Our requirements were stringent. Scrapingdog exceeded our expectations in success rates, concurrency, and speed. Their API playground made integration effortless.

Scrapingdog APIs Behind Geogen’s Insights

Geogen uses two endpoints from Scrapingdog at the core of its data operations:

A. Google Search API

This API helps Geogen gather real-time SERP data from multiple regions, devices, and query types.
It’s used to:

  • Monitor traditional keyword rankings.
  • Compare performance with AI search results.
  • Track competitors’ presence across geographies.

B. AI Overview API

The AI Overview API gives Geogen access to Google’s new AI-generated results, offering a first look at how AI summaries and recommendations mention brands.

By combining both APIs, Geogen can bridge the gap between old SEO and new AI visibility metrics, providing clients with deeper and more predictive insights.
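
To give a feel for what such calls look like, here is a hedged Python sketch of both requests. The Google Search endpoint and its query parameter follow Scrapingdog's documented pattern, while the AI Overview endpoint path and the response field names are assumptions; confirm both in the Scrapingdog documentation.

import requests

API_KEY = "YOUR_API_KEY"

# A. Traditional SERP data for a keyword
serp = requests.get(
    "https://api.scrapingdog.com/google",
    params={"api_key": API_KEY, "query": "best crm software", "country": "us"},
    timeout=60,
).json()
print(len(serp.get("organic_results", [])), "organic results")  # field name assumed

# B. AI Overview data for the same keyword (endpoint path assumed)
ai_overview = requests.get(
    "https://api.scrapingdog.com/google/ai_overview",
    params={"api_key": API_KEY, "query": "best crm software"},
    timeout=60,
).json()
print(ai_overview)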

Results Achieved by the Geogen Team

With Scrapingdog powering its backend, Geogen has achieved:

  • 99% request success rates across thousands of queries.
  • 3× faster data retrieval than previous providers.
  • Greater accuracy in mapping AI-based brand mentions and context.

These improvements allow Geogen to deliver actionable GEO analytics, further enabling their users to optimize for both traditional search engines and AI Overviews simultaneously.

Web Scraping with Scrapingdog

Scrape the web without the hassle of getting blocked

]]>
https://www.scrapingdog.com/blog/how-geogen-uses-scrapingdogs-api-to-power-their-ai-tracking-tool/feed/ 0
Automation to Convert YouTube Shorts to LinkedIn Post Using n8n & Scrapingdog https://www.scrapingdog.com/no-code-tutorials/automation-to-convert-youtube-shorts-to-linkedin-post-using-n8n-scrapingdog/ https://www.scrapingdog.com/no-code-tutorials/automation-to-convert-youtube-shorts-to-linkedin-post-using-n8n-scrapingdog/#respond Wed, 05 Nov 2025 09:17:47 +0000 https://www.scrapingdog.com/?p=31069

If you are a content creator and YouTube is where you produce your content, then this automation can help you a lot.

Today, with so much content competing for attention and attention spans shrinking every day, you need to pull eyeballs to your content from wherever you can.

If you have already built YouTube shorts, why not share them on other platforms as well?

This workflow takes the link of your YouTube short and, with the help of AI, converts it into a LinkedIn post without you lifting a finger.

Let's start building this automation. At the very end, I will also give you the blueprint for it so that you can use it as is in your workflows.

Tools used to build this automation

  1. Scrapingdog YouTube Transcript API (To extract the transcript of any video or shorts)
  2. n8n (To build our workflow)
  3. Google Sheets (to maintain the database)

Let’s start building it from scratch!

Building our Google Sheets

So, this database will hold the record of all our LinkedIn posts. It will contain the link to the YouTube Short, the LinkedIn post link once it is published, and the date when it was posted.

This is how our Google Sheet looks.

It has a column for the YouTube Shorts link, a column where the automation updates the LinkedIn URL once the post is published, and a column for the date, so we know which post went live when.

Now let’s head back to our n8n canvas.

Connecting nodes in n8n

So our workflow starts with a 'Scheduled Trigger' node. The next node is 'Google Sheets', where we will pull the YouTube Shorts links.

Here is the configuration of this node.

Now let’s test this node and see what output we get.

And as you can see, we get all the YouTube Shorts links in the output of the Sheets node.

Now we have a Limit node, which helps us take one URL at a time when this workflow runs, in case there is more than one URL in the spreadsheet.

Further, we have a code node that converts the YouTube URL to the video ID.

Let me first explain what a video ID is. In a typical URL like https://www.youtube.com/shorts/dI19zJiH5ok, the video ID is 'dI19zJiH5ok'. We need it because Scrapingdog's YouTube Transcript API takes the video ID as one of its input parameters.

You can read more about that in the Scrapingdog documentation here — https://docs.scrapingdog.com/youtube-scraper-api/youtube-transcripts-api

Let’s get back to our Code node. Here is the configuration of this node:

 

Here is the code that is used in the node:-

				
					// Get input data from previous node
const items = $input.all();

// Loop through items and extract video ID
return items.map(item => {
  const url = item.json["Video URL"];
  
  // Regex pattern to match common YouTube video URL formats
  const match = url.match(/(?:youtu\.be\/|youtube\.com\/(?:shorts\/|watch\?v=|embed\/|v\/))([\w-]{11})/);

  // Extract the video ID if matched
  const videoId = match ? match[1] : null;

  return {
    json: {
      ...item.json,
      video_id: videoId
    }
  };
});
				
			

And when you run this node, you will get one extra data point, video_id, in the output.

And now we will call Scrapingdog's YouTube Transcript API using the HTTP Request node.

 

You can see the configuration; the necessary parameters are api_key and v. You will find the api_key in your Scrapingdog dashboard; copy and paste it here.
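
If you want to test the same request outside n8n, it is just a GET call with those two parameters. A minimal Python sketch is below; the endpoint path is based on the documentation linked above, so verify it there before hardcoding it anywhere.

import requests

# The HTTP Request node is effectively making this call
params = {
    "api_key": "YOUR_API_KEY",
    "v": "dI19zJiH5ok",  # video ID extracted by the Code node
}

response = requests.get(
    "https://api.scrapingdog.com/youtube/transcripts",  # path per the docs; verify before use
    params=params,
    timeout=60,
)
print(response.json())  # transcript returned as an array of text chunks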

Let’s test this node & see how we get the output.

We have got the transcript in chunks in the output. Since this output is an array, we will split it out and then aggregate it into a single paragraph.

Let’s see the output after the aggregator.

Now we will feed this data to our AI node, wherein we will have some system & user prompts.

The system prompt that I am using is:

				
					You are a sharp LinkedIn copywriter. Style: simple, confident, no fluff, short sentences, varied lengths, UK English. Avoid clichés and corporate buzzwords. HARD RULES: Plain text only. No Markdown or formatting symbols of any kind (no *, **, _, # headers, > quotes, `code`). No bolding. Use normal sentences or a simple numbered list. 1–2 emojis max (optional). ≤ 2,800 characters.

				
			

And the user prompt that I am using is:

				
					YouTube Short transcript (cleaned): {{ $json.text }}

Context:
- Source video URL: {{ $('Get row(s) in sheet').item.json["Video URL"] }}
- Goal: turn this into a high-signal LinkedIn post.
- If there is a single standout insight, highlight it early.
- If a stat or quote appears, include it (once) with quotes.

Output:
- Hook (1–2 lines)
- Body (3–6 short paragraphs or a 4–7 point list)
- Total length ≤ 2,800 chars
- End the post with a blank line, then:
- Watch: {{ $('Get row(s) in sheet').item.json["Video URL"] }}
				
			

In the user prompt, I am also passing the aggregated transcript text and, at the very end, the video link, instructing the model to include the link in the LinkedIn post.

Now let’s test this node and see what output we get.

We will post this on LinkedIn; you can also post this on any other platform. But for the sake of this tutorial, we will keep it to LinkedIn.

Now, we will post this text on LinkedIn.

And the final step is to update our database (Google Sheets) with the URL of the live LinkedIn post.

Here’s the configuration of our Google Sheets node.

Finally, here is the blueprint for this automation that you can use as is in your n8n canvas.

The only things you would need are a Scrapingdog API key and access to n8n.

Here is a video tutorial of this workflow.

Additional Resources

Web Scraping with Scrapingdog

Scrape the web without the hassle of getting blocked

]]>
https://www.scrapingdog.com/no-code-tutorials/automation-to-convert-youtube-shorts-to-linkedin-post-using-n8n-scrapingdog/feed/ 0
Building Automation To Get The Best Tools in Any Category from YouTube https://www.scrapingdog.com/no-code-tutorials/building-automation-to-get-the-best-tools-in-any-category-from-youtube/ https://www.scrapingdog.com/no-code-tutorials/building-automation-to-get-the-best-tools-in-any-category-from-youtube/#respond Tue, 28 Oct 2025 06:55:16 +0000 https://www.scrapingdog.com/?p=30953

As a busy founder, you probably have a lot of tasks in your pipeline. You have heard that 'Time is Money', and saving it for the important tasks is likely high on your priority list.

With this automation, you can extract the best tools in any category from the top 5 YouTube videos on that topic.

This way, every time you search for a keyword like 'Best CRM Tools', the automation runs, scans the top 5 videos from YouTube, and gives you the 10 best tools mentioned in them, saving you a lot of time on manual research for each tool.

In the end, I will give you a blueprint for this automation so that you can download & use this as is in your n8n.

Let’s get started!

Tools We Will Be Using To Build This Workflow

  1. Scrapingdog’s YouTube Search & Transcript API
  2. n8n

The logic behind our automation is simple: we use the YouTube Search API to get the top videos, fetch and aggregate the transcripts of the top 5, and then send the aggregated text to an AI model with prompts that return the best tools, ranked by number of mentions, along with a link to each tool.

Building Workflow

This workflow starts with a Chat node, where we type in our category, as mentioned earlier (best CRM tools).

The message is taken as a keyword and sent to our next node, an HTTP Request node that calls Scrapingdog's YouTube Search API.

To understand more about this API, you can refer to the documentation here.
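
Outside n8n, the same request is a simple GET call. Here is a hedged Python sketch; the endpoint path and the name of the search parameter are assumptions, and the documentation linked above has the exact values.

import requests

# Roughly what the HTTP Request node sends to the YouTube Search API
params = {
    "api_key": "YOUR_API_KEY",
    "search_query": "best crm tools",  # assumed parameter name for the keyword
}

response = requests.get(
    "https://api.scrapingdog.com/youtube/search",  # assumed endpoint path
    params=params,
    timeout=60,
)
print(response.json())  # expected: an array of videos with title, link, channel, etc.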

Now, all the videos are in an array, so we will split them out using the Split node to get the data points for each video.

Let’s test this node as well.

There you go, each video’s data points are now separated.

We only want to analyze the top 5 videos, so we will now use the 'Limit' node to select just those.

Testing this node, you can see we get results matching the limit we set. You can set any limit here; 5 is a good number, so for this tutorial we have kept it at that.

Now we have to scrape the transcript of each video. For this, on one route, we will use the YouTube Transcript API, which gets us the captions of each video, and we will loop over the items to fetch the transcript of every video.

After the Loop Over Items node, we have a JavaScript Code node that converts the YouTube video link into just the video ID (v), since this is one of the parameters the API expects.

To read more about this API, you can refer to the documentation here.

Here is that JavaScript code:

				
					// Input: item.json.link (YouTube URL)
// Output: { "videoId": "2gTzid5Jl-w" }

function getYouTubeId(url) {
  if (!url) return null;

  try {
    // Try URL parser first
    const u = new URL(url.trim());
    const v = u.searchParams.get('v');
    if (v) return v;

    const host = u.hostname.replace(/^www\./, '');
    const parts = u.pathname.split('/').filter(Boolean);

    if (host === 'youtu.be' && parts[0]) return parts[0];                                // youtu.be/ID
    if (host.endsWith('youtube.com') && parts[0] === 'embed' && parts[1]) return parts[1]; // /embed/ID
    if (host.endsWith('youtube.com') && parts[0] === 'shorts' && parts[1]) return parts[1]; // /shorts/ID
  } catch { /* fall back to regex */ }

  // Fallback: regex for v=... anywhere in the string
  const m = String(url).match(/[?&]v=([^&#]+)/);
  if (m) return m[1];

  return null;
}

const url = $json.link;                // change if your field path differs
const videoId = getYouTubeId(url);

// Return ONLY the ID
return { json: { videoId } };
				
			

It returns a video ID, which we will use in our next HTTP request to the YouTube Transcript API.

When we test this module, we get the transcript of the video.

We will now aggregate this transcript and then connect it back to the end of the loop.

This gives us the aggregated transcript of one video; the workflow will do the same for all five videos.

Once we have the transcripts of all five videos, we aggregate them all on route 2 for further processing by the AI.

As you can see, the first node is Aggregate; the configuration for the same is here.

And with this, the output you get is:


It's time to feed this into our AI to pull the best tools out of these videos.

I am using the 'Basic LLM Chain' node here, and for the model, I am using OpenRouter with GPT-4o mini.

The system prompt that I have used is:

				
					You are a precise synthesis assistant. From an aggregated YouTube transcripts 
on one topic from 5 videos and for each there is a transcript but aggreagted 
in the input given to you, identify distinct tools/products/services that are 
mentioned. Compute mentions as the number of distinct summaries that 
referenced the tool (not raw word frequency). Deduplicate to canonical names. 
Select exactly 10 tools: rank by highest mentions; for ties sort 
alphabetically; if fewer than 10 multi-mention tools exist, fill remaining 
slots with single-mention tools in transcript order. For each tool, write one 
factual takeaway (≤18 words). Add the official homepage URL only if you are 
highly confident; otherwise set "url": null. Return STRICT JSON only that 
matches the schema shown by the user. No prose or markdown.
				
			

The user prompt is simple; we are just feeding it the aggregated text from the previous node.

This is the output you get in JSON:

Now we need to send this via email, but before that, a Code node takes care of the formatting and delivers the output as HTML.

The JavaScript code used is:

				
					// n8n Code node (JavaScript)
// Input shape (from previous node): items[0].json.output = { topic, tools: [...] }

const data = items[0].json.output || items[0].json;

// tiny helpers
const esc = (s) => String(s ?? "").replace(/[&<>"']/g, m => ({'&':'&amp;','<':'&lt;','>':'&gt;','"':'&quot;',"'":'&#39;'}[m]));
const fmtDate = new Date().toLocaleDateString(undefined, { year:'numeric', month:'short', day:'numeric' });

// Build table rows
const rows = (data.tools || []).map((t, i) => {
  const name = esc(t.name);
  const url  = t.url ? esc(t.url) : null;
  const takeaway = esc(t.takeaway || "");
  const mentions = Number(t.mentions ?? 0);

  const nameCell = url
    ? `<a href="${url}" style="color:#0b69c7;text-decoration:none;" target="_blank" rel="noopener noreferrer">${name}</a>`
    : `<span>${name}</span>`;

  return `
  <tr>
    <td style="padding:12px 14px;border-bottom:1px solid #eef2f7;color:#111827;font-weight:600;">${i+1}</td>
    <td style="padding:12px 14px;border-bottom:1px solid #eef2f7;color:#111827;">${nameCell}</td>
    <td style="padding:12px 14px;border-bottom:1px solid #eef2f7;">
      <span style="display:inline-block;background:#eef2ff;color:#1e40af;font-weight:600;border-radius:999px;padding:2px 10px;">${mentions}</span>
    </td>
    <td style="padding:12px 14px;border-bottom:1px solid #eef2f7;color:#374151;">${takeaway}</td>
  </tr>`;
}).join("");

// HTML email (inline styles for best compatibility)
const html = `
<!doctype html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <meta name="x-apple-disable-message-reformatting">
  <meta name="format-detection" content="telephone=no, date=no, address=no, email=no">
  <title>${esc(data.topic || "AI Tools Summary")}</title>
</head>
<body style="margin:0;padding:0;background:#f6f8fb;">
  <table role="presentation" width="100%" cellspacing="0" cellpadding="0" style="background:#f6f8fb;">
    <tr>
      <td align="center" style="padding:28px 16px;">
        <table role="presentation" width="720" cellspacing="0" cellpadding="0" style="max-width:720px;background:#ffffff;border-radius:12px;overflow:hidden;box-shadow:0 2px 8px rgba(16,24,40,.06);">
          <tr>
            <td style="padding:24px 24px 12px 24px;">
              <div style="font:700 20px/1.3 -apple-system,BlinkMacSystemFont,'Segoe UI',Roboto,Inter,'Helvetica Neue',Arial,sans-serif;color:#111827;">
                ${esc(data.topic || "AI Tools Summary")}
              </div>
              <div style="margin-top:6px;font:400 13px/1.4 -apple-system,BlinkMacSystemFont,'Segoe UI',Roboto,Inter,'Helvetica Neue',Arial,sans-serif;color:#6b7280;">
                Compiled on ${fmtDate}
              </div>
            </td>
          </tr>

          <tr>
            <td style="padding:0 24px 8px 24px;">
              <table width="100%" cellspacing="0" cellpadding="0" style="border-collapse:separate;border-spacing:0;width:100%;font-family:-apple-system,BlinkMacSystemFont,'Segoe UI',Roboto,Inter,'Helvetica Neue',Arial,sans-serif;">
                <thead>
                  <tr>
                    <th align="left" style="padding:12px 14px;background:#f9fafb;border-bottom:1px solid #eef2f7;color:#4b5563;font-weight:600;font-size:12px;text-transform:uppercase;letter-spacing:.03em;">#</th>
                    <th align="left" style="padding:12px 14px;background:#f9fafb;border-bottom:1px solid #eef2f7;color:#4b5563;font-weight:600;font-size:12px;text-transform:uppercase;letter-spacing:.03em;">Tool</th>
                    <th align="left" style="padding:12px 14px;background:#f9fafb;border-bottom:1px solid #eef2f7;color:#4b5563;font-weight:600;font-size:12px;text-transform:uppercase;letter-spacing:.03em;">Mentions</th>
                    <th align="left" style="padding:12px 14px;background:#f9fafb;border-bottom:1px solid #eef2f7;color:#4b5563;font-weight:600;font-size:12px;text-transform:uppercase;letter-spacing:.03em;">Takeaway</th>
                  </tr>
                </thead>
                <tbody>
                  ${rows || `<tr><td colspan="4" style="padding:18px 14px;color:#6b7280;border-bottom:1px solid #eef2f7;">No tools found.</td></tr>`}
                </tbody>
              </table>
            </td>
          </tr>

          <tr>
            <td style="padding:18px 24px 24px 24px;">
              <div style="font:400 12px/1.5 -apple-system,BlinkMacSystemFont,'Segoe UI',Roboto,Inter,'Helvetica Neue',Arial,sans-serif;color:#9ca3af;">
                Links go to official homepages when confidently identified; otherwise left blank.
              </div>
            </td>
          </tr>
        </table>

        <div style="height:24px;"></div>
      </td>
    </tr>
  </table>
</body>
</html>
`;

const subject = `Top tools summary — ${data.topic || "AI Tools"}`;

return [{ json: { subject, html } }];

				
			

It returns both the subject and the body of the email, which we finally map to the email node.

When this last step runs, you get an email at the specified address with all the details, like this.

Pretty cool, right?

And as I promised, here is the blueprint for this automation that you can use as is in your n8n canvas.

Additional Resources

Web Scraping with Scrapingdog

Scrape the web without the hassle of getting blocked

]]>
https://www.scrapingdog.com/no-code-tutorials/building-automation-to-get-the-best-tools-in-any-category-from-youtube/feed/ 0
How to Take Screenshot with Puppeteer (Step-by-Step Guide) https://www.scrapingdog.com/blog/how-to-take-screenshot-with-puppeteer/ https://www.scrapingdog.com/blog/how-to-take-screenshot-with-puppeteer/#respond Fri, 24 Oct 2025 09:55:25 +0000 https://www.scrapingdog.com/?p=30922

TL;DR

  • Quick setup: install puppeteer, launch headless Chrome, take a basic screenshot.
  • Full-page capture: pass { fullPage: true }; save to file.
  • Stability: await page.waitForSelector(...) before shooting to ensure the UI is ready.
  • For scale / rotation and hands-off rendering, use Scrapingdog’s Screenshot API instead of running your own browsers.

Capturing screenshots with Puppeteer is one of the easiest and most useful ways to automate browser tasks. Whether you’re testing UI changes, generating website previews, or scraping visual data, Puppeteer gives developers precise control over how to capture a page.

In this guide, we'll walk through everything you need to know about taking screenshots using Puppeteer, from simple single-page captures to full-page screenshots.

What is Puppeteer?

Puppeteer is a Node.js library developed by Google that provides a high-level API to control Chrome or Chromium through the DevTools Protocol. It’s widely used for:

  • Web scraping and automation
  • End-to-end testing
  • PDF generation
  • Visual regression testing
  • Screenshot capture

When you install Puppeteer, it automatically downloads a compatible version of Chromium, so you can get started right away.

Prerequisites

Create a folder with any name you like. I am naming the folder screenshot.

				
					mkdir screenshot

				
			

Now, inside this folder, initialize the project and install Puppeteer with these commands.

				
					npm init -y
npm install puppeteer
				
			

Now, create a JS file where you will write your code. I am naming the file puppy.js. That's all; our environment is ready.

Taking Our First Screenshot with Puppeteer

				
					let puppeteer = require('puppeteer');

(async () => {
  let browser = await puppeteer.launch();
  let page = await browser.newPage();
  await page.goto('https://www.scrapingdog.com');
  await page.screenshot({ path: 'screenshot.png' });
  await browser.close();
})();
				
			

The code is pretty simple, but let me explain it step by step:

  • Import Puppeteer — Loads the Puppeteer library to control a headless Chrome browser.
  • Start an async function — Allows the use of await for smoother asynchronous execution.
  • Launch the browser — Opens a new headless (invisible) Chrome instance.
  • Create a new page — Opens a fresh browser tab for interaction.
  • Go to the target URL — Navigates the page to https://www.scrapingdog.com.
  • Capture a screenshot — Takes the screenshot and saves it locally as screenshot.png.
  • Close the browser — Ends the session and frees up system resources.

Once you execute the code, you will find screenshot.png inside your screenshot folder.

How to Capture a Full-Page Screenshot

				
					let puppeteer = require('puppeteer');

(async () => {
  let browser = await puppeteer.launch();
  let page = await browser.newPage();
  await page.goto('https://www.scrapingdog.com');
  await page.screenshot({ path: 'screenshot.png' , fullPage: true});
  await browser.close();
})();
				
			

This tells Puppeteer to capture the entire scrollable page, not just the visible viewport, in a single image.

If you don't want to use Puppeteer (or any other toolkit, for that matter) to scale your screenshot generation, you can use Scrapingdog's Screenshot API. We manage proxies, headless browsers, and other corner cases to keep screenshots of any number of URLs blockage-free.
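
Since the Screenshot API is a plain HTTP endpoint, you can call it from any language without running Chromium yourself. Below is a minimal Python sketch; the endpoint and parameter names are assumptions for illustration, so check the Screenshot API docs for the exact ones.

import requests

# Offload the screenshot to an API instead of managing a headless browser
params = {
    "api_key": "YOUR_API_KEY",
    "url": "https://www.scrapingdog.com",
    "full_page": "true",  # hypothetical flag for a full-page capture
}

response = requests.get("https://api.scrapingdog.com/screenshot", params=params, timeout=90)

with open("screenshot.png", "wb") as f:
    f.write(response.content)  # the API is expected to return the image bytes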

Wait for Elements Before Taking Screenshot

Let's take a screenshot of the Google home page once the search box appears.

				
					let puppeteer = require('puppeteer');

(async () => {
  // 1. Launch a browser
  let browser = await puppeteer.launch({ headless: true});

  // 2. Open a new page
  let page = await browser.newPage();

  // 3. Navigate to the website
  await page.goto('https://www.google.com', { waitUntil: 'domcontentloaded' });

  // 4. Wait for a specific element (Google search box)
  await page.waitForSelector('textarea[name="q"]');

  // 5. Take the screenshot
  await page.screenshot({
    path: 'google.png',
    fullPage: true
  });

  console.log("<img src="https://s.w.org/images/core/emoji/15.0.3/72x72/2705.png" alt="✅" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Screenshot taken after search box loaded!");

  // 6. Close the browser
  await browser.close();
})();
				
			

The code is almost the same; we have just used waitForSelector to pause execution until a particular element appears in the DOM.

Conclusion

Puppeteer makes taking screenshots in Node.js fast, flexible, and reliable — whether you’re capturing a simple webpage, an entire site, or specific UI components.

With just a few lines of code, you can automate screenshot generation for monitoring, reporting, or testing.

If you’re already using automation tools or APIs, Puppeteer integrates perfectly into your workflow for capturing website visuals at scale.

Additional Resources

Web Scraping with Scrapingdog

Scrape the web without the hassle of getting blocked

]]>
https://www.scrapingdog.com/blog/how-to-take-screenshot-with-puppeteer/feed/ 0