
How to Extract Text from PDF using PHP


PDF (Portable Document Format) is widely used to preserve text and graphics for later use. PDF files are often used to present text and graphics on web pages for desktop and mobile browsers, usually by embedding them through a PDF viewer. The content of an embedded PDF is not part of the page's HTML, which can hurt SEO because that text does not appear in the page markup. To get around this problem, you can extract the text from the PDF and include it in the web page itself.

 

The PDF Parser library is useful for extracting content from PDF files with PHP. It parses PDF files and retrieves the text from all pages, and it also gives access to a document's objects, headers, and metadata. This article shows how to use PHP to extract text from PDF files.

 

In this sample code, we will retrieve text from a PDF file using PHP and the PDF Parser library. We will also demonstrate how to upload PDF documents and extract their text on the fly.

 

Install PDF Parser Library

 

To install the PDF Parser library using Composer, run the following command.

composer require smalot/pdfparser

 

Note that if you download the source code, you do not need to install the PDF Parser library separately; all of the necessary files are included in it. Downloading the source code is also the way to use PDF Parser without Composer.

 

Include the autoloader in the PHP script to load the PDF Parser library and its helper functions.

include 'vendor/autoload.php';
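As a defensive variant of the include above, a minimal sketch might check that the autoloader exists before loading it. The helper name loadPdfParser() is hypothetical (it is not part of PDF Parser), and the path assumes Composer's default vendor directory next to the script.

```php
<?php
// Hypothetical helper (not part of PDF Parser): load Composer's autoloader
// only if it exists, then report whether the Parser class can be found.
function loadPdfParser(string $autoloadPath): bool
{
    if (!is_file($autoloadPath)) {
        return false; // dependencies have not been installed yet
    }

    require_once $autoloadPath;

    // ::class only builds the class name string; class_exists() triggers autoloading.
    return class_exists(\Smalot\PdfParser\Parser::class);
}

// Assumed location: a "vendor" directory next to this script.
if (!loadPdfParser(__DIR__ . '/vendor/autoload.php')) {
    error_log('PDF Parser is not available; run "composer require smalot/pdfparser".');
}
```

Returning a boolean instead of failing immediately lets the calling script decide how to report a missing installation.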

 

Extract Text from PDF

 

Using PHP, the following code snippet pulls all of the text content from a PDF file.

 

- Load and initialise the PDF Parser library.


- Specify the path of the source PDF file from which the text content will be retrieved.


- Use the PDF Parser class's parseFile() method to parse a PDF file.


- Use the getText() method of the parsed document to extract the text from the PDF file.

 

// Initialize and load PDF Parser library
$parser = new \Smalot\PdfParser\Parser();

// Source PDF file to extract text
$file = 'path-to-file/Brochure.pdf';

// Parse pdf file using Parser library
$pdf = $parser->parseFile($file);

// Extract text from PDF
$textContent = $pdf->getText();
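If the text is needed page by page rather than as one string, the parsed document can be walked with getPages(), calling getText() on each page. The sketch below assumes exactly that; the helper name extractTextByPage() is hypothetical, and the commented usage reuses the placeholder path from the snippet above.

```php
<?php
// Hypothetical helper: collect the text of each page separately,
// keyed by 1-based page number. $pdf is assumed to be the document
// object returned by the parser's parseFile() method.
function extractTextByPage($pdf): array
{
    $pages = [];

    // array_values() guarantees sequential 0-based indexes for numbering.
    foreach (array_values($pdf->getPages()) as $index => $page) {
        $pages[$index + 1] = trim($page->getText());
    }

    return $pages;
}

// Usage, assuming the autoloader has been included and the file exists:
//
// $parser = new \Smalot\PdfParser\Parser();
// $pdf = $parser->parseFile('path-to-file/Brochure.pdf');
// foreach (extractTextByPage($pdf) as $number => $text) {
//     echo "Page $number: " . strlen($text) . " bytes of text\n";
// }
```

Keeping pages separate can help when each page's text is placed in its own section of the web page.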

 

Upload PDF File and Extract Text

 

This code sample demonstrates the step-by-step procedure of uploading PDF files and extracting the content using PHP.

 

PDF File Upload Form:

 

Create the HTML elements for a file upload form.

 

<form action="submit.php" method="post" enctype="multipart/form-data">
    <div class="form-input">
        <label for="pdf_file">PDF File</label>
        <input type="file" name="pdf_file" id="pdf_file" accept=".pdf,application/pdf" required>
    </div>
    <input type="submit" name="submit" class="btn" value="Extract Text">
</form>

 

When the form is submitted, the chosen file is sent to the server-side script to be processed further.

 

Text Extraction from an Uploaded PDF Using a Server-Side Script (submit.php):


The code below handles the uploaded file and extracts the text from the PDF.

 

- Use PHP's $_FILES array to get the name of the uploaded file.


- Get the file extension with the pathinfo() function and the PATHINFO_EXTENSION flag.


- Validate the file to ensure that it is a PDF.


- Use the tmp_name key in $_FILES to get the temporary path of the uploaded file.


- Using the PDF Parser library, parse the uploaded PDF file and retrieve the text content.


- Using PHP's nl2br() function, insert HTML line breaks (<br />) before the newlines (\n) in the text content.

 

$pdfText = '';
$statusMsg = '';
if(isset($_POST['submit'])){
    // If file is selected
    if(!empty($_FILES["pdf_file"]["name"])){
        // Get the file name and its lowercase extension
        $fileName = basename($_FILES["pdf_file"]["name"]);
        $fileType = strtolower(pathinfo($fileName, PATHINFO_EXTENSION));

        // Allow certain file formats
        $allowTypes = array('pdf');
        if(in_array($fileType, $allowTypes)){
            // Include autoloader file
            include 'vendor/autoload.php';

            // Initialize and load PDF Parser library
            $parser = new \Smalot\PdfParser\Parser();

            // Temporary path of the uploaded PDF file to extract text from
            $file = $_FILES["pdf_file"]["tmp_name"];

            try{
                // Parse pdf file using Parser library
                $pdf = $parser->parseFile($file);

                // Extract text from PDF
                $text = $pdf->getText();

                // Escape HTML special characters, then add line breaks
                $pdfText = nl2br(htmlspecialchars($text));
            }catch(\Exception $e){
                // The parser may throw an exception if the file cannot be parsed
                $statusMsg = '<p>Sorry, the uploaded PDF could not be parsed.</p>';
            }
        }else{
            $statusMsg = '<p>Sorry, only PDF files are allowed.</p>';
        }
    }else{
        $statusMsg = '<p>Please select a PDF file to extract text.</p>';
    }
}

// Display status message and text content
echo $statusMsg;
echo $pdfText;
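The extension check above relies on the file name sent by the browser, which the client controls. As a hedged sketch of a slightly stricter validation step, the helper below also checks the upload error code and the "%PDF-" signature that well-formed PDF files normally begin with. The function name looksLikePdfUpload() is hypothetical, and it takes one entry of $_FILES as its argument.

```php
<?php
// Hypothetical helper: decide whether one $_FILES entry looks like a PDF.
// In a real request handler, is_uploaded_file($file['tmp_name']) would
// normally be checked as well; it is left out so the sketch can be run
// outside of an HTTP upload.
function looksLikePdfUpload(array $file): bool
{
    // The upload itself must have succeeded.
    if (($file['error'] ?? UPLOAD_ERR_NO_FILE) !== UPLOAD_ERR_OK) {
        return false;
    }

    // Case-insensitive extension check on the client-supplied name.
    $extension = strtolower(pathinfo($file['name'] ?? '', PATHINFO_EXTENSION));
    if ($extension !== 'pdf') {
        return false;
    }

    // The temporary file must exist before it can be inspected.
    $path = $file['tmp_name'] ?? '';
    if ($path === '' || !is_file($path)) {
        return false;
    }

    // Read the first five bytes and compare them with the PDF signature.
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        return false;
    }
    $signature = fread($handle, 5);
    fclose($handle);

    return $signature === '%PDF-';
}

// Usage inside submit.php might look like this:
//
// if (looksLikePdfUpload($_FILES['pdf_file'])) {
//     // parse $_FILES['pdf_file']['tmp_name'] with PDF Parser
// }
```

A signature check is only a quick filter; the parser can still reject a file that passes it, so the parsing step should still handle errors.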

 

 

LICENSE OF USE

You can use it for personal or commercial projects. You can't resell it partially or in this form.

PRODUCT INFO

Create Date : Jan 20, 2022

Updated Date : Jan 23, 2022
