Last Updated Aug 02, 2023

Web Scraping with PHP

Shyam Purkayastha

Web scraping is one of the fundamental ways of automating data gathering for internet research. Simple scraping jobs are straightforward, but scraping at scale is challenging: large websites often restrict requests based on the country or IP address they originate from.

The Abstract Web Scraping API is a scalable and reliable option for web scraping projects. It is simple to use, yet powerful: behind the scenes, it uses millions of proxies and IP addresses that are constantly rotated and validated to ensure uninterrupted extraction of data from web pages. In this blog post, we will build a demo PHP web application that uses the Abstract Web Scraping API to scrape a single web page.

Sign Up for Abstract Web Scraping API

If you are not yet familiar with the Abstract Web Scraping API, sign up for a free Abstract account to get access to all the APIs. Once logged in, you can access the Web Scraping API from the dashboard.

You can access your API key from the Web Scraping API console.

You can make a test request from the console to scrape the URL provided by default. You should see a large blob of HTML returned by the API. This confirms that your API key works and that you are ready to integrate the API into a PHP application.
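If you would rather check your key from code than from the console, the minimal sketch below makes the same kind of request from plain PHP using the cURL extension. It assumes the endpoint https://scrape.abstractapi.com/v1 and the api_key and url query parameters used later in this post; the function names abstract_scrape_url() and abstract_scrape() are hypothetical, chosen for illustration.

```php
<?php
// Minimal sketch: call the Abstract Web Scraping API from plain PHP with the cURL extension.
// Assumes the api_key/url query parameters used in this post; the function names are hypothetical.

function abstract_scrape_url(string $endpoint, string $apiKey, string $targetUrl): string
{
    // http_build_query() percent-encodes each value, so a target URL with its own
    // query string (?a=1&b=2) survives as a single "url" parameter.
    return $endpoint . '?' . http_build_query(['api_key' => $apiKey, 'url' => $targetUrl]);
}

/** @return array{0:int,1:string} HTTP status code and response body */
function abstract_scrape(string $endpoint, string $apiKey, string $targetUrl): array
{
    $ch = curl_init(abstract_scrape_url($endpoint, $apiKey, $targetUrl));
    if ($ch === false) {
        throw new RuntimeException('Could not initialise cURL');
    }
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_TIMEOUT        => 60,   // scraping can be slow; adjust as needed
    ]);
    $body = curl_exec($ch);
    if ($body === false) {
        $error = curl_error($ch);
        curl_close($ch);
        throw new RuntimeException("cURL error: $error");
    }
    $status = curl_getinfo($ch, CURLINFO_RESPONSE_CODE);
    curl_close($ch);
    return [$status, $body];
}
```

For example, `[$status, $html] = abstract_scrape('https://scrape.abstractapi.com/v1', (string) getenv('ABSTRACT_API_KEY'), 'https://example.com');` should give you a 200 status and the page HTML when the key is valid.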

Demo Web Scraping App with PHP

Let’s put this Web Scraping API to some use by building a demo application. PHP is one of the most widely used programming languages for building web applications, and Laravel is a popular PHP web framework. We will leverage both of these technologies to show you how to build a basic web scraping application.

Follow the steps in the rest of this post to create a Laravel-based web scraping app. But first, make sure your development environment meets these prerequisites.

Prerequisites

  1. PHP Runtime: Make sure you have a PHP 8 runtime available in your development environment.
  2. PHP Toolchain for Laravel: Make sure you also have the Composer package manager installed.

Step 1: Create a New Laravel Project

Open a terminal and run the following composer command to create a new Laravel project named php_abstractapi.


composer create-project --prefer-dist laravel/laravel php_abstractapi

This creates a directory named php_abstractapi under the current working directory. This is the project directory for the demo app, containing all the boilerplate code and dependencies. Change into this directory before running any of the remaining commands from the terminal.

Open the project directory in your favorite IDE to explore its structure.

Step 2: Test the Default Laravel App

The empty Laravel project can be tested by launching it from the terminal.


php artisan serve

This starts a development web server that hosts the default Laravel app at http://127.0.0.1:8000. Open this URL in a browser to see the default landing page.

Step 3: Add the API Credentials for Demo App

Open the environment file for the project and add two new environment variable entries for the Abstract API URL and Abstract API key.

File: .env


ABSTRACT_API_KEY=<YOUR_ABSTRACT_API_KEY>
ABSTRACT_API_URL=<ABSTRACT_WEB_SCRAPING_API_ENDPOINT>

Replace the placeholder <YOUR_ABSTRACT_API_KEY> with your Abstract API key and <ABSTRACT_WEB_SCRAPING_API_ENDPOINT> with the API URL. The URL is shown in the live test console of the Web Scraping API dashboard.

At the time of writing, this URL is https://scrape.abstractapi.com/v1.
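Laravel's documentation recommends calling env() only from configuration files, because once the configuration has been cached (php artisan config:cache), env() returns null everywhere else, including in a helper class like the one in the next step. A sketch of the usual pattern, assuming you add an entry named abstract (a name chosen here for illustration) to the existing config/services.php file:

```php
<?php
// config/services.php (excerpt). The 'abstract' key is a hypothetical name for this sketch;
// keep the file's existing entries and add this one alongside them.
return [
    // ...existing service entries...

    'abstract' => [
        'key' => env('ABSTRACT_API_KEY'),
        'url' => env('ABSTRACT_API_URL', 'https://scrape.abstractapi.com/v1'),
    ],
];
```

Code elsewhere in the app would then read config('services.abstract.key') and config('services.abstract.url') instead of calling env() directly.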

Step 4: Add the HTTP Helper Class for Handling Abstract API

Create a helper class, AbstractAPI.php, in a new Helpers subdirectory under app/Http.

File: app/Http/Helpers/AbstractAPI.php 

Add the following PHP code snippet within this file:


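<?php
namespace App\Http\Helpers;

use Illuminate\Support\Facades\Http;
use RuntimeException;

class AbstractAPI
{
    /**
     * Scrape $url through the Abstract Web Scraping API and return the page HTML.
     *
     * @throws \RuntimeException if the API credentials are not configured
     * @throws \Illuminate\Http\Client\ConnectionException if the API cannot be reached
     * @throws \Illuminate\Http\Client\RequestException if the API returns a 4xx/5xx status
     */
    public function make_request(string $url): string
    {
        // Read the credentials added to .env in Step 3
        $api_url = trim((string) env('ABSTRACT_API_URL', ''));
        $api_key = trim((string) env('ABSTRACT_API_KEY', ''));

        if ($api_url === "" || $api_key === "")
        {
            throw new RuntimeException("ABSTRACT API configuration error.");
        }

        // Call the Abstract API endpoint (not the target URL itself). Passing the
        // parameters as an array lets Laravel URL-encode the target URL correctly.
        $res = Http::get($api_url, [
            'api_key' => $api_key,
            'url'     => $url,
        ]);

        // Turn 4xx/5xx answers from the API into a RequestException
        $res->throw();

        return $res->body();
    }
}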

This helper class handles the call to Abstract Web Scraping API from the PHP backend.
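Whichever way the HTTP call is written, the target URL must be percent-encoded inside the API's query string. Plain string concatenation does not do that: any & in the target URL ends the url parameter early. The standalone illustration below uses only PHP built-ins, makes no network calls, and uses a placeholder key; it shows what the API would receive in each case.

```php
<?php
// Illustration only: compare naive concatenation with http_build_query() when the
// target URL carries its own query string. No network access is involved.

$endpoint = 'https://scrape.abstractapi.com/v1';
$apiKey   = 'YOUR_KEY'; // placeholder, not a real key
$target   = 'https://example.com/search?q=php&page=2';

// Naive: '&page=2' escapes from the url value and becomes a separate API parameter.
$naive = $endpoint . '?api_key=' . $apiKey . '&url=' . $target;

// Encoded: the whole target URL travels as one 'url' value.
$encoded = $endpoint . '?' . http_build_query(['api_key' => $apiKey, 'url' => $target]);

// Decode both query strings the way a server would.
parse_str(parse_url($naive, PHP_URL_QUERY), $naiveParams);
parse_str(parse_url($encoded, PHP_URL_QUERY), $encodedParams);

echo $naiveParams['url'], PHP_EOL;   // https://example.com/search?q=php (page=2 is lost)
echo $encodedParams['url'], PHP_EOL; // https://example.com/search?q=php&page=2
```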

Step 5: Add a New Controller Named WebscrapeController

From the terminal, add a new controller named WebscrapeController to the project.


php artisan make:controller WebscrapeController

This creates a new PHP file:

File: app/Http/Controllers/WebscrapeController.php

Replace the default content of the file with the following code:


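<?php

namespace App\Http\Controllers;

use Exception;
use Illuminate\Http\Client\ConnectionException;
use Illuminate\Http\Client\RequestException;
use Illuminate\Http\Request;
use App\Http\Helpers\AbstractAPI;

class WebscrapeController extends Controller
{
    // GET / : render the form (resources/views/webscrape.blade.php)
    public function index()
    {
        return view('webscrape');
    }

    // POST /requestapi : validate the URL, scrape it, and return the HTML as plain text
    public function requestapi(Request $request)
    {
        if (!$request->isMethod('post'))
        {
            return response("Invalid Method", 405)->header('Content-Type', 'text/plain');
        }

        $url = trim((string) $request->input('url', ''));
        if ($url === "")
        {
            return response("Invalid Url", 400)->header('Content-Type', 'text/plain');
        }
        if (filter_var($url, FILTER_VALIDATE_URL) === false)
        {
            return response("Bad Request", 400)->header('Content-Type', 'text/plain');
        }

        try
        {
            $html = (new AbstractAPI())->make_request($url);
            return response($html, 200)->header('Content-Type', 'text/plain');
        }
        catch (ConnectionException $e)
        {
            // The outgoing HTTP request could not connect (DNS failure, timeout, ...)
            return response("Could not connect to the remote host", 502)->header('Content-Type', 'text/plain');
        }
        catch (RequestException $e)
        {
            // The scraping API answered with a 4xx/5xx status
            return response("Scraping API error: HTTP ".$e->response->status(), 502)->header('Content-Type', 'text/plain');
        }
        catch (Exception $e)
        {
            return response("Error : ".$e->getMessage(), 500)->header('Content-Type', 'text/plain');
        }
    }
}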

The requestapi() method handles the ‘/requestapi’ endpoint, which is registered in the next step. It accepts the URL submitted by the frontend UI, validates it, and passes it to the Abstract Web Scraping API, which scrapes the contents of that URL. The method also catches ConnectionException, which Laravel's HTTP client throws when the outgoing request cannot connect (for example, on a DNS failure or timeout), and returns a plain-text error message instead.

The index() method renders the home page of the UI from the view named ‘webscrape’.
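The controller's check relies on filter_var() with FILTER_VALIDATE_URL, which accepts any well-formed URL scheme, including ftp://. If you only want to forward web pages to the scraper, a slightly stricter check could look like the sketch below; is_scrapable_url() is a hypothetical helper, not part of Laravel or the Abstract API.

```php
<?php
// Hypothetical stricter URL check: filter_var() alone accepts schemes such as ftp://,
// so this sketch additionally requires http or https.

function is_scrapable_url(string $url): bool
{
    $url = trim($url);
    if ($url === '' || filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false; // empty or not a well-formed URL (e.g. missing scheme)
    }
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    return in_array($scheme, ['http', 'https'], true);
}

var_dump(is_scrapable_url('https://example.com'));        // bool(true)
var_dump(is_scrapable_url('example.com'));                // bool(false): no scheme
var_dump(is_scrapable_url('ftp://example.com/file.txt')); // bool(false): not http(s)
```

Inside requestapi(), you could call it in place of the filter_var() check and return the same 400 response when it fails.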

Step 6: Update the App Routes

The app has two routes: ‘/’ displays the home page of the frontend UI, and ‘/requestapi’ triggers the scraping request.

You must register these routes in Laravel. To do this, replace the contents of the routes file:

File: routes/web.php


<?php

use App\Http\Controllers\WebscrapeController;
use Illuminate\Support\Facades\Route;

Route::get('/', [WebscrapeController::class,'index']);

Route::post('/requestapi', [WebscrapeController::class,'requestapi']);

Step 7: Create the HTML and JavaScript for the Demo App UI

At this point, all the backend PHP logic for the demo app is in place. The last piece is the graphical user interface (UI), an HTML page.

For this, create a new Blade view file under the resources/views subdirectory:

File: resources/views/webscrape.blade.php

Add the following content inside this view:


<!DOCTYPE html>
<html lang="{{ str_replace('_', '-', app()->getLocale()) }}">
    <head>
        <meta charset="utf-8">
        <meta name="viewport" content="width=device-width, initial-scale=1">
        <title>Php Web Scraper</title>
        <!-- Fonts -->
        <link href="https://fonts.googleapis.com/css2?family=Nunito:wght@400;600;700&display=swap" rel="stylesheet">
        <style>
            body {
                font-family: 'Nunito', sans-serif;
                height: 100%;
                font-size: 14px;
            }
            h5{
                font-weight: bold;
            }
            .gap-20{
                gap:20px;
            }
            textarea:focus { 
                outline: none !important;
                border-color: #dee2e6;
                box-shadow: 0 0 10px #dee2e6;
            }
            #response_status_code{
                left: 0;
                background-color: #dee2e6;
                border: 1px solid #dee2e6;
                color: black;
                font-weight: bold;
                text-align: center;
            }
        </style>
        <link rel="stylesheet" href="{{asset('css/bootstrap.min.css')}}">
    </head>
    <body>
        <header class="nav-header py-2 px-2 d-flex align-items-center justify-content-start shadow-md border-bottom">
            <h5>PHP Web Scraper using Abstract API</h5>
        </header>
        <main class="container-fluid" style="min-height: 500px;">
            <div class="row py-1 justify-content-center">
                <div class="col-12 d-flex align-items-center justify-content-center">
                    <span>PHP Web Scraper Request</span>
                </div>
                <div class="d-flex col-12 justify-content-center p-1">
                    <form method="post" class="col-6" id="request_form" action="{{ url('/requestapi') }}">
                        @csrf
                        <div class="row mb-2 align-items-center justify-content-center">
                            <div class="col-12">
                                <input type="text" required class="form-control" id="url" name="url" value="" placeholder="Enter Url here">
                            </div>
                        </div>
                        <div class="row mb-2 align-items-center justify-content-center">
                            <div class="col-auto">
                                <button id="submit_button" class="btn btn-sm btn-primary" type="submit">Submit</button>
                            </div>     
                            <div class="col-auto">
                                <button id="reset_button" class="btn btn-sm btn-danger" onclick="resetform()" type="button">Reset</button>
                            </div>     
                        </div>
                    </form>
                </div>
                <div id="loading" class="col-12 justify-content-center gap-20 w-100" style="display:none;">
                    <div class="d-flex justify-content-center">
                        <div class="spinner-border" role="status"><span class="visually-hidden">Loading...</span></div>
                    </div>
                    <div class="d-flex justify-content-center">
                        <div id="request_status">Loading...</div>
                    </div>
                </div>
                <div id="error_response" class="col-12 w-100 justify-content-center px-4" style="display:none;">
                    <div id="error_response_text" class="alert-danger text-center"></div>
                </div>
                <div id="apiresponse" style="display:none;" class="col-12 py-2 px-4 align-items-center justify-content-center position-relative">
                    <textarea id="response" style="height:500px;" class="p-2 container overflow-auto border bg-light">
                    </textarea>
                </div>
            </div>
        </main>

        <footer class="container-fluid">
            <div class="w-100 py-2 px-2 text-center text-sm">
                Laravel v{{ Illuminate\Foundation\Application::VERSION }} (PHP v{{ PHP_VERSION }})
            </div>
        </footer>
        <script type="text/javascript">

            var spinnerHtml = '<div class="d-flex justify-content-center"><div class="spinner-border" role="status"><span class="visually-hidden">Loading...</span></div></div>';
            var form_el = document.getElementById("request_form");
            document.getElementById("url").focus();
            
            function resetform()
            {
                document.getElementById("url").value = '';
                document.getElementById("url").focus();
                document.getElementById("loading").style.display = 'none';
                document.getElementById("submit_button").classList.remove('disabled');
                document.getElementById("reset_button").classList.remove('disabled');
                document.getElementById("apiresponse").style.display = "none";
                document.getElementById("error_response").style.display = "none";
                document.getElementById("error_response_text").innerHTML = '';
                document.getElementById("response").value = '';
                return true;
            }

            function make_request(url)
            {
                const xhttp = new XMLHttpRequest();
                xhttp.onload = function() 
                {
                    if (this.status != 200) 
                    {
                        document.getElementById("loading").style.display = 'none';
                        document.getElementById("submit_button").classList.remove('disabled');
                        document.getElementById("reset_button").classList.remove('disabled');
                        document.getElementById("apiresponse").style.display = "none";
                        var responseText = "Server Http Status: "+this.status+" : "+this.responseText;
                        // textContent shows the server's message as plain text instead of parsing it as HTML
                        document.getElementById("error_response_text").textContent = responseText;
                        document.getElementById("error_response").style.display = "flex";
                    } 
                    else 
                    { 
                        document.getElementById("loading").style.display = 'none';
                        document.getElementById("submit_button").classList.remove('disabled');
                        document.getElementById("reset_button").classList.remove('disabled');
                        document.getElementById("apiresponse").style.display = "flex";
                        document.getElementById("error_response").style.display = "none";
                        document.getElementById("error_response_text").innerHTML = '';
                        document.getElementById("response").value = this.responseText;
                    }
                }

                xhttp.open("POST", form_el.action);
                xhttp.setRequestHeader("Content-type", "application/x-www-form-urlencoded");
                xhttp.setRequestHeader("Cache-Control", "no-cache, no-store, must-revalidate");
                xhttp.setRequestHeader("X-CSRF-TOKEN", "{{ csrf_token() }}");
                // The submit handler passes the trimmed URL in as the url argument
                xhttp.send("url=" + encodeURIComponent(url));
            }

            form_el.addEventListener("submit", function(evt) 
            {
                evt.preventDefault();
                var url = document.getElementById("url").value.trim();
                document.getElementById("loading").style.display = 'flex';
                
                document.getElementById("error_response_text").innerHTML = "";
                document.getElementById("apiresponse").style.display = "none";
                document.getElementById("error_response").style.display = "none";

                document.getElementById("request_status").innerHTML = "Making Request..";
                
                document.getElementById("response").value = 'Processing...';
                document.getElementById("submit_button").classList.add('disabled');
                document.getElementById("reset_button").classList.add('disabled');
                make_request(url);
            });

        </script>
    </body>
</html>

This Bootstrap-based Blade view renders a web form that lets the user enter a URL and submit it. The JavaScript code intercepts the form submission, posts the URL to the ‘/requestapi’ endpoint of the PHP backend (sending Laravel's CSRF token in the X-CSRF-TOKEN header), and then displays either the scraped HTML or the error message.

Step 8: Add Bootstrap CSS

To ensure that Bootstrap styles are applied to the frontend UI, download bootstrap.min.css (Bootstrap 5) from the link below:

https://getbootstrap.com/docs/5.0/getting-started/download/

Create a sub-directory ‘css’ within the public sub-directory of the project and copy the downloaded bootstrap.min.css into it.

Alternatively, replace the local stylesheet link in the view's <head> with a link to the Bootstrap CDN:


<link href="https://cdn.jsdelivr.net/npm/bootstrap@5.0.2/dist/css/bootstrap.min.css" rel="stylesheet" integrity="sha384-EVSTQN3/azprG1Anm3QDgpJLIm9Nao0Yz1ztcQTwFspd3yD65VohhpuuCOmLASjC" crossorigin="anonymous">

With this step, we are done with all the code development for this demo app. Make sure to save all the files before proceeding with the next steps.

Step 9: Relaunch the Laravel Server

Relaunch the Laravel development server (php artisan serve) that you started in Step 2 to test the default Laravel app.

You should now see the demo app UI in the browser, as defined by the view created in Step 7.

Step 10: Test the Demo App

Now you are ready to test the app. 

Enter any website URL in the form and click Submit. The UI sends the request to the PHP backend and displays a loading icon while waiting for the response.

Behind the scenes, Laravel routes the request to the WebscrapeController, which gets the scraped webpage content from the Abstract Web Scraping API and returns it for display in the frontend.

Here is what the scraped content looks like for example.com

That’s it!

If you have followed the steps this far, pat yourself on the back. You have built and tested a PHP demo app for web scraping.

As you can see, all the heavy lifting of scraping the content was handled by the Abstract Web Scraping API, while you focused on building the UI and the backend logic for handling user requests.

FAQs

How To Scrape Data From Websites using PHP?

As a programming language suited to building web applications, PHP is capable of web scraping. You can run the scraping tasks within the PHP runtime using built-in extensions such as cURL, or leverage an external service. For large-scale scraping projects, a third-party API is recommended. The Abstract Web Scraping API offers a full-fledged, secure, and scalable solution for web scraping. It is easy to integrate this API within a PHP application, and it offers a free tier for basic scraping chores.
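Once you have the HTML, PHP's built-in DOM extension can pull structured data out of it. The sketch below extracts the page title and the <h1> headings with DOMXPath from an HTML string, such as the body returned by the scraping API; extract_headings() is a hypothetical helper name and the sample HTML is made up for the example.

```php
<?php
// Sketch: extract the <title> and all <h1> texts from an HTML string using PHP's DOM
// extension. extract_headings() is a hypothetical helper name.

function extract_headings(string $html): array
{
    if (trim($html) === '') {
        return ['title' => '', 'h1' => []]; // loadHTML() rejects empty input
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect libxml warnings instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $title = $xpath->query('//title')->item(0)?->textContent ?? '';

    $h1s = [];
    foreach ($xpath->query('//h1') as $node) {
        $h1s[] = trim($node->textContent);
    }
    return ['title' => trim($title), 'h1' => $h1s];
}

$html = '<html><head><title>Example Domain</title></head>'
      . '<body><h1>Example Domain</h1><p>Some text.</p></body></html>';
print_r(extract_headings($html));
```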

How To Crawl A Website in PHP?

There are many ways to crawl a website. You can write the crawling logic in PHP, using the built-in cURL library to scrape the homepage of a website and then parsing all the internal links to crawl additional pages. Alternatively, you can use an API for larger websites, which might block repeated scrape requests from the same IP address. With the Abstract Web Scraping API, you can undertake large-scale crawling tasks, because the API spreads the scraping requests across a large pool of IP addresses. Moreover, it is easy to integrate this API within PHP using cURL or other HTTP client libraries.
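The link-discovery step described above can be sketched with PHP's DOM extension: parse a page, keep only the links that point to the same host, and queue them for the next round of requests. This is a simplified illustration under stated assumptions: it resolves only root-relative links (such as /about), ignores relative paths like ../page and <base> tags, and leaves out the robots.txt handling and rate limiting a real crawler needs. internal_links() is a hypothetical name.

```php
<?php
// Sketch of a crawler's link-discovery step: collect same-host links from one page.
// Assumptions: only absolute and root-relative links are handled; no robots.txt check
// or rate limiting. internal_links() is a hypothetical helper name.

function internal_links(string $html, string $pageUrl): array
{
    $host = parse_url($pageUrl, PHP_URL_HOST);
    if (!is_string($host) || trim($html) === '') {
        return [];
    }
    $host = strtolower($host);
    $scheme = parse_url($pageUrl, PHP_URL_SCHEME) ?: 'https';

    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // tolerate messy real-world HTML
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = trim($a->getAttribute('href'));
        // Turn root-relative paths ("/about") into absolute URLs on the same host
        if (str_starts_with($href, '/') && !str_starts_with($href, '//')) {
            $href = $scheme . '://' . $host . $href;
        }
        $linkHost = parse_url($href, PHP_URL_HOST);
        if (is_string($linkHost) && strtolower($linkHost) === $host) {
            // Drop the #fragment and use array keys to de-duplicate
            $links[explode('#', $href, 2)[0]] = true;
        }
    }
    return array_keys($links);
}

$page = '<a href="/about">About</a>'
      . '<a href="https://example.com/blog#top">Blog</a>'
      . '<a href="https://other.org/">Elsewhere</a>'
      . '<a href="/about">About again</a>';
print_r(internal_links($page, 'https://example.com/'));
```

Each returned URL could then be passed to the scraping API in turn, with a visited set to avoid fetching the same page twice.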

Which Tool is best for Web Scraping?

There are many tools available for performing web scraping. However, if you want to have complete control over it, it is better to write your own web scraping tool. You can easily build a demo web scraping web application using PHP and Abstract API. PHP handles the web app interactions and accepts web scraping requests, while the Abstract Web Scraping API does the heavy lifting of scraping and returning the scraped content. Integrating the API within a PHP application is easy with a plethora of PHP HTTP client libraries, such as cURL. The Abstract Web Scraping API offers a free tier of 1000 requests per month to try the API.


Shyam Purkayastha
Shyam Purkayastha is a proficient web developer and PHP maestro, renowned for his expertise in API creation and integration. His deep knowledge of PHP scripting and backend development enables him to build scalable server-side applications. Shyam is particularly passionate about RESTful API design, significantly enhancing web service functionality and data interoperability.