
Guide To ScrapingBee: Universal Web API for Web Scraping


Web scraping is a technique for extracting data from the internet, which has become our main source of information. As the amount of data online keeps growing, web scraping techniques keep gaining traction, and worldwide interest in web scraping has risen steadily over the years.


There are many web scraping tools on the market: some offer a free service, some are paid, and others provide graphical interfaces for non-coders. Based on these different needs, we have already covered many web scraping tools in earlier articles.

ScrapingBee

Today we are going to discuss ScrapingBee, a very popular service used by many Fortune 500 companies on a daily basis for web scraping tasks.

ScrapingBee is a web scraping tool created by Kevin Sahin and Pierre de Wulf. Kevin is a web scraping expert and the author of a Java web scraping book; Pierre is a data scientist. ScrapingBee makes scraping the web easy, and because it is exposed as an API there is no need to worry about programming languages: it works well with every language. It solves many common problems, such as running headless Chrome on a server and dynamically rotating proxy IPs so you never get blocked. Because it scrapes a page's HTML the way a real browser in a headless environment does, ScrapingBee waits 2000 milliseconds before returning the source code. Here are some of the features you should know before getting started.

Features of ScrapingBee

  • Used for price monitoring and other web scraping tasks.
  • Extracts data without getting blocked.
  • Uses a large proxy pool.
  • No rate-limit barriers, thanks to dynamic proxies.
  • Lead generation directly from Google Sheets.
  • No need to worry about running headless Chrome on a server.

Getting Started

Go to the ScrapingBee website and sign up. They provide a free plan that includes 1,000 free API calls, which is enough to learn and test this API.


Now access the dashboard and copy the API key; we will need it later in this tutorial. ScrapingBee supports multiple languages, so you can use this API key directly in your projects from now on.
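As a small aside, it is good practice not to hardcode the key in your scripts. Below is a minimal sketch that reads it from an environment variable; the variable name SCRAPINGBEE_API_KEY is our own choice for this example, not something ScrapingBee requires.

import os

# Read the ScrapingBee API key from an environment variable
# (SCRAPINGBEE_API_KEY is an arbitrary name chosen for this example).
API_KEY = os.environ.get("SCRAPINGBEE_API_KEY")
if API_KEY is None:
    raise RuntimeError("Set the SCRAPINGBEE_API_KEY environment variable first")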


Installation

ScrapingBee provides a REST API, so it can be used from cURL or from any programming language such as Python, NodeJS, Java, Ruby, PHP, and Go. We are going to use Python with the Requests library and BeautifulSoup for the actual scraping. Install them using pip as follows:

# Install the Python Requests library:
pip install requests

# Additional modules we need during this tutorial:
pip install beautifulsoup4 lxml

Quickstart

Use the code below to call the ScrapingBee web API. Here we make a GET request with two parameters, the API key and the target URL, and in return the API responds with the HTML content of that URL.

Python

import requests

def send_request():
    # Call the ScrapingBee API endpoint with your API key and the target URL
    response = requests.get(
        url="https://app.scrapingbee.com/api/v1/",
        params={
            "api_key": "INSERT-YOUR-API-KEY",
            "url": "https://example.com/",
        },
    )
    print('Response HTTP Status Code: ', response.status_code)
    print('Response HTTP Response Body: ', response.content)

send_request()

We can use BeautifulSoup to make this output more readable by calling prettify() on the parsed HTML, as shown below; you can learn more in the BeautifulSoup documentation.
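A minimal sketch of that idea, reusing the response object from the quickstart snippet above (the parser choice, html.parser, is ours; any parser BeautifulSoup supports will do):

from bs4 import BeautifulSoup

# Parse the raw HTML returned by ScrapingBee and print an indented version
soup = BeautifulSoup(response.content, "html.parser")
print(soup.prettify())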

Encoding

You can also encode the URL you want to scrape by using urllib.parse as follows:

import urllib.parse
encoded_url = urllib.parse.quote("YOUR URL")
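For example, a target URL that contains spaces or its own query parameters should be percent-encoded before it is embedded in the ScrapingBee request URL. A small sketch, using an example.com URL purely for illustration (note that if you pass the target URL through the params argument of requests.get, as in the quickstart above, Requests handles this encoding for you):

import urllib.parse

target = "https://example.com/search?q=web scraping&page=1"
encoded_url = urllib.parse.quote(target)

# Build the full API URL by hand with the encoded target URL
api_url = (
    "https://app.scrapingbee.com/api/v1/"
    "?api_key=INSERT-YOUR-API-KEY&url=" + encoded_url
)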

The ScrapingBee API can also be used from other languages, for example:

Java

Examples are taken from here

import java.io.IOException;
import org.apache.http.client.fluent.*;

public class SendRequest
{
  public static void main(String[] args) {
    sendRequest();
  }

  private static void sendRequest() {

    // Classic (GET )

    try {

      // Create request
      Content content = Request.Get("https://app.scrapingbee.com/api/v1/?api_key=YOUR-API-KEY&url=YOUR-URL")

      // Fetch request and return content
      .execute().returnContent();

      // Print content
      System.out.println(content);
    }
    catch (IOException e) { System.out.println(e); }
  }
}

PHP

Example credit

<?php

// get cURL resource
$ch = curl_init();

// set url
curl_setopt($ch, CURLOPT_URL, 'https://app.scrapingbee.com/api/v1/?api_key=YOUR-API-KEY&url=YOUR-URL');

// set method
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, 'GET');

// return the transfer as a string
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);



// send the request and save response to $response
$response = curl_exec($ch);

// stop if fails
if (!$response) {
  die('Error: "' . curl_error($ch) . '" - Code: ' . curl_errno($ch));
}

echo 'HTTP Status Code: ' . curl_getinfo($ch, CURLINFO_HTTP_CODE) . PHP_EOL;
echo 'Response Body: ' . $response . PHP_EOL;

// close curl resource to free up system resources
curl_close($ch);
?>

Let’s scrape data from OLX using the ScrapingBee API

We are going to write a simple Python script that uses Requests, BeautifulSoup, and the ScrapingBee API for the URL request, and we will extract the tablet listings from OLX along with their names and prices:

  1. Import the modules.
# Import the required modules
import requests
from bs4 import BeautifulSoup
from time import sleep
  2. Initialize the URL and API parameters for our web API call; it will return the web page's source code.
KEY = 'Your_API_key'
URL = 'https://www.olx.in/tablets_c1455'
params = {'api_key': KEY, 'url': URL, 'render_js': 'False'}
  3. Send the request to the ScrapingBee web API and store the response in the variable "r".
r = requests.get('https://app.scrapingbee.com/api/v1/', params=params, timeout=20)
  4. Inspect OLX to find where the product names and prices are: open https://www.olx.in/tablets_c1455, right-click anywhere on the page and choose Inspect to open developer tools, then use the element-picker (cursor) icon at the top-left corner above the source code to inspect elements on the page.
  5. Check that the API call succeeded by looking at the status code, then use BeautifulSoup to select all elements with the class name 'EIR5N' (the class OLX uses for each listing card).
if r.status_code == 200:
    html = r.text
    soup = BeautifulSoup(html, 'lxml')
    links = soup.select('.EIR5N')
  6. Loop over the selected elements and extract the product name and price with find(): look for the span tags whose data-aut-id attribute is itemTitle and itemPrice respectively.
for span in links:
    product_name = span.find('span', {'data-aut-id': 'itemTitle'})
    print(product_name.text)

    price = span.find('span', {'data-aut-id': 'itemPrice'})
    print(price.text)
  7. Full code. (A variation with a few robustness tweaks is sketched after the output below.)
import requests
from bs4 import BeautifulSoup
from time import sleep

def main():
    KEY = 'Your_API_key'
    URL = 'https://www.olx.in/tablets_c1455'

    params = {'api_key': KEY, 'url': URL, 'render_js': 'False'}

    r = requests.get('https://app.scrapingbee.com/api/v1/', params=params, timeout=20)

    if r.status_code == 200:
        html = r.text
        soup = BeautifulSoup(html, 'lxml')
        classes = soup.select('.EIR5N')
        for span in classes:
            product_name = span.find('span', {'data-aut-id': 'itemTitle'})
            print(product_name.text)

            price = span.find('span', {'data-aut-id': 'itemPrice'})
            print(price.text)

main()

Output

The full code above prints the name and price of each tablet listing found on the page.
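As a hedged variation on the script above (our own sketch, not part of the original tutorial), the snippet below skips listings that are missing a title or price span instead of crashing, and collects the results into a list rather than printing them. The class name 'EIR5N' and the data-aut-id attributes are taken from the OLX markup inspected earlier and may change over time; the function name scrape_tablets is our own.

import requests
from bs4 import BeautifulSoup

def scrape_tablets(api_key, url='https://www.olx.in/tablets_c1455'):
    params = {'api_key': api_key, 'url': url, 'render_js': 'False'}
    r = requests.get('https://app.scrapingbee.com/api/v1/', params=params, timeout=20)
    r.raise_for_status()  # stop early on a non-200 response

    soup = BeautifulSoup(r.text, 'lxml')
    results = []
    for card in soup.select('.EIR5N'):
        name = card.find('span', {'data-aut-id': 'itemTitle'})
        price = card.find('span', {'data-aut-id': 'itemPrice'})
        # Skip cards where either span is missing instead of raising an error
        if name is None or price is None:
            continue
        results.append({'name': name.text.strip(), 'price': price.text.strip()})
    return results

# Example usage:
# for item in scrape_tablets('Your_API_key'):
#     print(item['name'], '-', item['price'])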

Conclusion

In this tutorial, we learned about ScrapingBee, an API used for web scraping. This API is special because it provides JavaScript rendering of pages, something you would otherwise need tools like Selenium with headless browsing for; the JavaScript rendering is based on the DOM model. We also worked through an example where we scraped product names and prices from OLX using this API.

Remember that ScrapingBee is not a scraping tool in itself; it is a web API that works alongside your scraping scripts when websites impose heavy restrictions and you need a solution that still returns the output without getting blocked. The ScrapingBee API can request a single URL a thousand times without getting blocked, it returns the source code very quickly, and it is very simple to use. For more information about this API, you can follow the official documentation.


Mohit Maithani

Mohit is a data and technology enthusiast with good exposure to solving real-world problems in various avenues of IT and the deep learning domain. He believes in solving people's daily problems with the help of technology.