Extracting data from PDFs is a common need for businesses dealing with invoices, reports, forms, or other documents. Power Automate provides automated workflows to extract text, tables, and structured data from PDFs using AI Builder, Power Automate Desktop, and third-party connectors.
This guide covers:
✔ Methods to extract PDF data
✔ Step-by-step tutorial using Power Automate
✔ Best practices and troubleshooting
1. Methods to Extract Data from PDFs in Power Automate
A. AI Builder (Preferred for structured PDFs)
Uses AI-powered document processing to extract key fields.
Best for invoices, purchase orders, and structured documents.
Requires a premium Power Automate license.
B. Power Automate Desktop (For text-based extraction)
Uses OCR (Optical Character Recognition) for text extraction.
Works well for scanned PDFs and unstructured data.
Power Automate Desktop is included with Windows 11.
C. Third-Party Connectors (For complex documents)
Integrates with Adobe PDF Services, Encodian, or Cloudmersive.
Ideal for complex PDFs with images, handwriting, and tables.
Requires third-party licensing or API subscriptions.
2. Extracting Data Using AI Builder (Best for Structured PDFs)
Step 1: Set Up AI Builder in Power Automate
- Go to Power Automate (make.powerautomate.com; the older flow.microsoft.com address redirects there).
- Click AI Builder → Explore → Form Processing (labeled Document Processing in newer releases).
- Click Create a New Model → Upload sample PDFs with a consistent structure (AI Builder expects at least five samples per layout).
- Select the fields to extract (e.g., Invoice Number, Date, Total Amount).
- Train and publish the model.
Step 2: Create an Automated Flow
- Go to Power Automate → Click Create → Select Automated cloud flow.
- Choose Trigger: When a file is created in OneDrive or SharePoint.
- Add an AI Builder Action: Extract information from documents (the exact action name varies by release) → Select your AI model.
- Store extracted data in Excel, SharePoint List, or Dataverse.
- Save and test the flow.
Example Use Case:
- Extract invoice details from PDFs uploaded to OneDrive and save them in an Excel file.
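AI Builder returns a confidence score alongside each extracted field, so a flow can decide which values to store directly and which to send for manual review. The Python sketch below illustrates that routing logic outside Power Automate. The field names, the dictionary shape, and the 0.8 threshold are assumptions made for illustration, not the actual AI Builder output format.

```python
from typing import Dict, Tuple

# Hypothetical shape: field name -> (extracted value, confidence from 0 to 1).
ExtractedFields = Dict[str, Tuple[str, float]]


def split_by_confidence(
    fields: ExtractedFields, threshold: float = 0.8
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Separate fields that meet the confidence threshold from those that do not."""
    accepted: Dict[str, str] = {}
    needs_review: Dict[str, str] = {}
    for name, (value, confidence) in fields.items():
        if confidence >= threshold:
            accepted[name] = value
        else:
            needs_review[name] = value
    return accepted, needs_review


# Illustrative sample data, not real model output.
sample: ExtractedFields = {
    "Invoice Number": ("INV-1042", 0.97),
    "Date": ("2024-03-15", 0.91),
    "Total Amount": ("1,250.00", 0.62),
}

accepted, needs_review = split_by_confidence(sample)
print("Store directly:", accepted)
print("Send for review:", needs_review)
```

In a cloud flow, the same idea is usually expressed as a Condition step that compares a field's confidence value with a threshold before writing the row to Excel.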
3. Extracting Data Using Power Automate Desktop (OCR-Based)
Step 1: Create a Desktop Flow in Power Automate Desktop
- Open Power Automate Desktop.
- Click New Flow → Name it Extract PDF Data.
Step 2: Open the PDF File
- Add Launch Application action → Select a PDF viewer (Adobe, Edge).
- Use Send Keys (Ctrl + A, Ctrl + C) to copy all text. This works only when the PDF has a selectable text layer; scanned PDFs need an OCR action instead.
Step 3: Extract Specific Data
- Use Text Manipulation actions to extract required values.
- Save extracted data in a file or database.
Example Use Case:
- Extract names, addresses, or reference numbers from scanned PDF reports.
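The Text Manipulation step usually comes down to pattern matching on the copied text. The Python sketch below shows the kind of regular expressions that could pull out an invoice number, date, and total. The label wording, the date format, and the sample text are assumptions; real documents will need their own patterns.

```python
import re
from typing import Dict, Optional

# Assumed label formats; adjust the patterns to match the actual documents.
PATTERNS = {
    "invoice_number": re.compile(
        r"Invoice\s*(?:No\.?|Number|#)\s*:?\s*([A-Z0-9-]+)", re.IGNORECASE
    ),
    "date": re.compile(r"Date\s*:?\s*(\d{4}-\d{2}-\d{2})", re.IGNORECASE),
    "total": re.compile(
        r"Total(?:\s+Amount)?\s*:?\s*\$?\s*([\d,]+\.\d{2})", re.IGNORECASE
    ),
}


def parse_fields(text: str) -> Dict[str, Optional[str]]:
    """Return the first match for each pattern, or None when a field is missing."""
    result: Dict[str, Optional[str]] = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        result[name] = match.group(1) if match else None
    return result


# Illustrative text, standing in for what Ctrl + A, Ctrl + C would copy.
copied_text = """ACME Corp
Invoice Number: INV-1042
Date: 2024-03-15
Total Amount: $1,250.00
"""

fields = parse_fields(copied_text)
print(fields)
```

The same patterns can be reused in Power Automate Desktop's text actions that accept regular expressions, although escaping rules may differ slightly.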
4. Extracting Data Using Third-Party PDF Connectors
Popular Connectors:
Adobe PDF Services – Extracts text, tables, and images.
Encodian – Converts PDFs to Excel, extracts structured data.
Cloudmersive OCR – Extracts text from scanned PDFs.
Step 1: Use Adobe PDF Services
- Go to Power Automate → Create a Cloud Flow.
- Select trigger: When a file is created in OneDrive.
- Add Adobe PDF Action: Extract PDF Text.
- Store extracted text in Excel, Dataverse, or SharePoint.
Example Use Case:
- Convert PDF tables into structured Excel data.
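Text extracted from a PDF table often arrives as plain lines with columns separated by runs of spaces or tabs. The Python sketch below shows one way such text could be split into rows and written out as CSV, which Excel can open. The sample text and the rule of splitting on two or more spaces are assumptions; connector output varies, and some connectors return structured JSON instead.

```python
import csv
import io
import re
from typing import List


def table_text_to_rows(text: str) -> List[List[str]]:
    """Split each non-empty line into cells on tabs or runs of 2+ spaces."""
    rows: List[List[str]] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        rows.append(re.split(r"\s{2,}|\t", line))
    return rows


def rows_to_csv(rows: List[List[str]]) -> str:
    """Serialize rows to CSV text using the standard csv module."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Illustrative extracted text, not real connector output.
extracted = """
Item            Qty    Unit Price
Widget A        2      10.00
Widget B        1      25.50
"""

rows = table_text_to_rows(extracted)
csv_text = rows_to_csv(rows)
print(csv_text)
```

Splitting on two or more spaces keeps single-space values such as "Unit Price" in one cell, but it will misalign rows that contain empty cells, so the result should be checked against the source table.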
5. Best Practices for PDF Data Extraction
✔ Use AI Builder for structured PDFs (invoices, forms).
✔ Use OCR for scanned or image-based PDFs (reports, contracts).
✔ Keep source PDFs in a consistent OneDrive or SharePoint folder so triggers fire reliably before extraction.
✔ Validate extracted data using Power Automate conditions.
✔ Test different methods to find the most reliable approach.
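Validation typically means checking each extracted value against a simple rule before it is stored. The Python sketch below illustrates three such rules for an invoice record. The field names and the YYYY-MM-DD date format are assumptions chosen for the example; in a flow, each check would map to a Condition step.

```python
from datetime import datetime
from decimal import Decimal, InvalidOperation
from typing import Dict, List, Optional


def validate_invoice(record: Dict[str, Optional[str]]) -> List[str]:
    """Return a list of validation errors; an empty list means the record passed."""
    errors: List[str] = []

    # Rule 1: the invoice number must not be blank.
    if not (record.get("invoice_number") or "").strip():
        errors.append("invoice_number is empty")

    # Rule 2: the date must parse in the assumed YYYY-MM-DD format.
    try:
        datetime.strptime(record.get("date") or "", "%Y-%m-%d")
    except ValueError:
        errors.append("date is not in YYYY-MM-DD format")

    # Rule 3: the total must be a positive number (thousands separators allowed).
    try:
        total = Decimal((record.get("total") or "").replace(",", ""))
        if total <= 0:
            errors.append("total must be positive")
    except InvalidOperation:
        errors.append("total is not a number")

    return errors


good = {"invoice_number": "INV-1042", "date": "2024-03-15", "total": "1,250.00"}
bad = {"invoice_number": "", "date": "15/03/2024", "total": "abc"}

print(validate_invoice(good))
print(validate_invoice(bad))
```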
6. Troubleshooting Common Issues
❌ Extracted data is incorrect
Improve AI Builder training with more sample PDFs.
❌ Power Automate Desktop OCR fails
Increase image resolution or use third-party OCR tools.
❌ PDF structure is inconsistent
Use manual text parsing (Power Automate Desktop).
❌ Third-party connectors not working
Check API limits and authentication settings.
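When a connector fails because of API limits, retrying with increasing delays often resolves the problem. The Python sketch below illustrates exponential backoff in general terms. RateLimitError and flaky_extract are hypothetical stand-ins for a throttled API call, not part of any connector. In a cloud flow, comparable behavior is configured through an action's retry policy setting rather than written as code.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical error standing in for an HTTP 429 (too many requests) response."""


def call_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, doubling the wait after each rate-limit failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except RateLimitError:
            if attempt == max_attempts:
                raise  # Give up after the final attempt.
            sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError("max_attempts must be at least 1")


# Simulated call that is throttled twice and then succeeds.
calls = {"count": 0}


def flaky_extract() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise RateLimitError("too many requests")
    return "extracted text"


# Record the delays instead of sleeping so the demo runs instantly.
delays: List[float] = []
result = call_with_backoff(flaky_extract, sleep=delays.append)
print(result, delays)
```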