PDF (Portable Document Format) is widely used to preserve text and graphics for later use. PDF files are also used to present text and graphics on web pages viewed on computers and mobile phones. In most cases, the PDF is embedded in the page with a browser-based PDF viewer, so its text and graphics are not added to the page's HTML. Because the PDF content is not part of the page itself, this hurts the page's SEO. To get around this problem, you can extract the text from the PDF and add it to the web page.
The PDF Parser library is very useful for extracting content from PDF files with PHP. This PHP library decodes PDF files and retrieves the text from all pages. With it, PHP can read a PDF file's objects, headers, metadata, and text. This article shows how to use PHP to extract text from PDF files.
In this example, we will extract text from a PDF file using PHP and the PDF Parser package. We will also show how to upload PDF documents with PHP and extract their text on the fly.
To install the PDF Parser library with Composer, run the following command:
composer require smalot/pdfparser
Note that if you use this tutorial's source code package, you do not need to install the PDF Parser library separately; all of the necessary files are included. If you want to use PDF Parser without Composer, download the source code.
Include the autoloader in the PHP script to load the PDF Parser library and its helper functions.
include 'vendor/autoload.php';
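When PDF Parser is used without Composer, there is no vendor/autoload.php to include, so something else has to load the library's classes. A minimal PSR-4-style autoloader is one way to do that. This is a sketch under assumptions: the function name registerPsr4Autoloader() is hypothetical, and so is the directory pdfparser/src/Smalot/PdfParser; point it at wherever the Smalot\PdfParser classes actually live in your download, and check the library's composer.json for any dependencies it declares, since those would need loading too.

```php
<?php
// Sketch of a PSR-4-style autoloader: classes under a namespace prefix are
// loaded from matching files under a base directory. Composer's generated
// autoloader does this job (and more) when the library is installed with Composer.
function registerPsr4Autoloader(string $prefix, string $baseDir): void
{
    $prefix = trim($prefix, '\\') . '\\';
    $baseDir = rtrim($baseDir, '/\\') . DIRECTORY_SEPARATOR;

    spl_autoload_register(function (string $class) use ($prefix, $baseDir): void {
        // Ignore classes outside the registered namespace prefix.
        if (strncmp($class, $prefix, strlen($prefix)) !== 0) {
            return;
        }
        // e.g. "Smalot\PdfParser\Parser" -> "<baseDir>/Parser.php"
        $relative = substr($class, strlen($prefix));
        $file = $baseDir . str_replace('\\', DIRECTORY_SEPARATOR, $relative) . '.php';
        if (is_file($file)) {
            require $file;
        }
    });
}

// Hypothetical location of the downloaded PDF Parser sources.
registerPsr4Autoloader('Smalot\PdfParser', __DIR__ . '/pdfparser/src/Smalot/PdfParser');
```

After registration, `new \Smalot\PdfParser\Parser()` triggers the autoloader, which requires the matching file only if it exists under the base directory.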
Using PHP, the following code snippet pulls all of the text content from a PDF file.
- Load and initialise the PDF Parser library.
- Specify the path of the source PDF file from which the text content will be extracted.
- Use the Parser class's parseFile() method to parse the PDF file; it returns a Document object.
- Call the Document object's getText() method to extract the text from the PDF file.
// Initialize and load PDF Parser library
$parser = new \Smalot\PdfParser\Parser();

// Source PDF file to extract text
$file = 'path-to-file/Brochure.pdf';

// Parse pdf file using Parser library
$pdf = $parser->parseFile($file);

// Extract text from PDF
$textContent = $pdf->getText();
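getText() returns the text of all pages joined into one string. If you need the text page by page, the Document object returned by parseFile() also has a getPages() method, and each page object has its own getText(). The sketch below collects those into an array keyed by page number. The function name extractTextByPage() is hypothetical, and the function relies only on those two methods, so it can be exercised with stand-in objects instead of the real library.

```php
<?php
// Sketch: collect the text of each page separately, keyed by 1-based page
// number. $document is assumed to be the object returned by parseFile(),
// i.e. something with getPages(), whose items each have getText().
function extractTextByPage(object $document): array
{
    $texts = [];
    // array_values() guarantees sequential 0-based keys before renumbering.
    foreach (array_values($document->getPages()) as $index => $page) {
        $texts[$index + 1] = $page->getText();
    }
    return $texts;
}
```

With the library loaded, a call could look like `$pages = extractTextByPage($parser->parseFile('path-to-file/Brochure.pdf'));`, after which `$pages[1]` holds the first page's text.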
The following example shows, step by step, how to upload a PDF file and extract its content using PHP.
PDF File Upload Form:
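Create an HTML form with a file input. The form must use method="post" and enctype="multipart/form-data"; otherwise the browser does not send the file's contents to the server.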
<form action="submit.php" method="post" enctype="multipart/form-data">
<div class="form-input">
<label for="pdf_file">PDF File</label>
<input type="file" name="pdf_file" id="pdf_file" accept=".pdf,application/pdf" required="">
</div>
<input type="submit" name="submit" class="btn" value="Extract Text">
</form>
When the form is submitted, the chosen file is sent to the server-side script to be processed further.
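Before the server-side script parses anything, it can check the upload status that PHP records in $_FILES["pdf_file"]["error"]. A file larger than the configured upload limit, for example, still arrives with a name but without a usable tmp_name. The sketch below maps PHP's built-in UPLOAD_ERR_* codes to messages; the function name uploadErrorMessage() and the message wording are illustrative assumptions.

```php
<?php
// Sketch: translate a PHP upload error code ($_FILES[...]['error']) into a
// user-facing message. Returns an empty string when the upload succeeded.
function uploadErrorMessage(int $code): string
{
    switch ($code) {
        case UPLOAD_ERR_OK:
            return '';
        case UPLOAD_ERR_INI_SIZE:
        case UPLOAD_ERR_FORM_SIZE:
            return 'The file is too large.';
        case UPLOAD_ERR_PARTIAL:
            return 'The file was only partially uploaded.';
        case UPLOAD_ERR_NO_FILE:
            return 'Please select a PDF file to extract text.';
        default:
            // UPLOAD_ERR_NO_TMP_DIR, UPLOAD_ERR_CANT_WRITE, UPLOAD_ERR_EXTENSION
            return 'The server could not store the uploaded file.';
    }
}
```

In submit.php this could run first, e.g. `$statusMsg = uploadErrorMessage($_FILES["pdf_file"]["error"] ?? UPLOAD_ERR_NO_FILE);`, with parsing attempted only when the result is an empty string.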
Text Extraction from an Uploaded PDF Using a Server-Side Script (submit.php):
The code below handles the uploaded file and extracts the text from the PDF.
- In PHP, use $_FILES to get the file name.
- Get the file extension with the pathinfo() function and the PATHINFO_EXTENSION flag.
- Check the extension to make sure the file is a PDF.
- Use tmp_name in $_FILES to get the path of the uploaded temporary file.
- Parse the uploaded PDF file with the PDF Parser library and retrieve the text content.
- Use PHP's nl2br() function to convert newlines (\n) in the text content into HTML line breaks (<br />).
$pdfText = '';
$statusMsg = '';
if (isset($_POST['submit'])) {
    // If a file is selected
    if (!empty($_FILES["pdf_file"]["name"])) {
        // Original file name and lower-cased extension ("Brochure.PDF" is accepted too)
        $fileName = basename($_FILES["pdf_file"]["name"]);
        $fileType = strtolower(pathinfo($fileName, PATHINFO_EXTENSION));

        // Allow certain file formats
        $allowTypes = array('pdf');
        if (in_array($fileType, $allowTypes)) {
            // Include autoloader file
            include 'vendor/autoload.php';

            // Initialize and load PDF Parser library
            $parser = new \Smalot\PdfParser\Parser();

            // Source PDF file to extract text (the uploaded temporary file)
            $file = $_FILES["pdf_file"]["tmp_name"];

            try {
                // Parse pdf file using Parser library
                $pdf = $parser->parseFile($file);

                // Extract text from PDF
                $text = $pdf->getText();

                // Escape HTML special characters, then add line breaks
                $pdfText = nl2br(htmlspecialchars($text, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8'));
            } catch (\Exception $e) {
                // The parser throws an exception for files it cannot read as PDF
                $statusMsg = '<p>Sorry, the text could not be extracted from this PDF.</p>';
            }
        } else {
            $statusMsg = '<p>Sorry, only PDF files are allowed.</p>';
        }
    } else {
        $statusMsg = '<p>Please select a PDF file to extract text.</p>';
    }
}

// Display status message and text content
echo $statusMsg;
echo $pdfText;
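The script above decides whether a file is a PDF from its name alone and outputs the text as one block separated by <br /> tags. Two small helpers could tighten this up; both names, looksLikePdf() and pdfTextToHtml(), are hypothetical and not part of the PDF Parser library. looksLikePdf() also reads the start of the temporary file and looks for the %PDF- header; searching the first 1024 bytes rather than only the very first byte is an assumption meant to mirror the leniency of many PDF readers. pdfTextToHtml() escapes the text and wraps each blank-line-separated block in a <p> element, so the extracted text sits on the page as ordinary paragraphs.

```php
<?php
// Sketch: stricter check that an upload is a PDF. The extension is compared
// case-insensitively, then the first 1024 bytes (an assumed window) are
// searched for the "%PDF-" header.
function looksLikePdf(string $originalName, string $tmpPath): bool
{
    if (strtolower(pathinfo($originalName, PATHINFO_EXTENSION)) !== 'pdf') {
        return false;
    }
    $handle = @fopen($tmpPath, 'rb');
    if ($handle === false) {
        return false; // missing or unreadable file
    }
    $head = fread($handle, 1024);
    fclose($handle);
    return is_string($head) && strpos($head, '%PDF-') !== false;
}

// Sketch: turn extracted plain text into escaped HTML paragraphs.
// Blank lines separate paragraphs; single newlines become <br />.
function pdfTextToHtml(string $text): string
{
    // Normalize Windows and old Mac line endings to "\n".
    $text = str_replace(["\r\n", "\r"], "\n", $text);
    $html = [];
    foreach (preg_split('/\n\s*\n/', trim($text)) as $block) {
        $block = trim($block);
        if ($block === '') {
            continue;
        }
        // Escape first so markup in the PDF text is displayed, not interpreted.
        $escaped = htmlspecialchars($block, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
        $html[] = '<p>' . nl2br($escaped) . '</p>';
    }
    return implode("\n", $html);
}
```

In submit.php you might call `looksLikePdf($_FILES["pdf_file"]["name"], $_FILES["pdf_file"]["tmp_name"])` in place of the in_array() check and echo `pdfTextToHtml($text)` instead of the nl2br() output. PHP's is_uploaded_file() can additionally confirm that tmp_name really came from an HTTP upload.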
© ThemesGiant Copyright @2015-2022 | All rights reserved.