ParseXtract - PX
Automated data extraction.
PX works with images and PDFs and lets you extract structured data from semi-structured documents.
Features
Real-time
Automatically extract document data through real-time processing
Improve quality
Avoid the errors that come with manual processing.
Reduce costs
Eliminate the costs of manual data entry.
Easy integration
Smoothly integrate into your process with just 3 lines of code
Get structured data
Simplify your data workflows and make them more productive with machine learning technology.
The ParseXtract API lets you train on PDF documents, extract their data, and transform it into a structured JSON format.
Pre-trained
PX can be integrated quickly and easily to extract data from your documents.
Invoices, bank statements and payslips have already been modeled and pre-trained.
{
"detailedLabelId": "3f18d4a6bb6979ea3e9f7bce6ac61abc",
"extractedData": [
{
"name": "Invoice.Type.Identifier",
"value": "Invoice"
},
{
"name": "Invoice.Date",
"value": "18/09/2019"
},
{
"name": "Invoice.Number.Identifier",
"value": "2234567"
},
{
"name": "Supplier.Name.Literal",
"value": "Ma Société SARL"
},
{
"name": "Supplier.National.Identifier",
"value": "000000000000"
},
{
"name": "Supplier.Siret.Identifier",
"value": "554 874 445"
},
{
"name": "Supplier.Vatnumber.Identifier",
"value": "FR 000000000000"
},
{
"name": "Invoice.Currency",
"value": "EUR"
},
{
"name": "Invoice.TotalAmount.WithoutTaxes.Amount",
"value": "276,00"
},
{
"name": "Invoice.VATTotal.Amount",
"value": "55,20"
},
{
"name": "Invoice.TotalAmount.WithTaxes.Amount",
"value": "331,20"
},
{
"name": "Customer.Contact.Name.Literal",
"value": "Pénélope D. Seguin"
},
{
"name": "Customer.VATNumber.Identifier",
"value": ""
},
{
"name": "Customer.Address.Line1",
"value": "51 rue Nationale"
},
{
"name": "Customer.Address.ZipCode",
"value": "75003"
},
{
"name": "Customer.Address.City",
"value": "Paris"
}
],
"id": "DemoTrial_20100831_Armstrong_Neil_0014.pdf",
"labelId": "FactureMaSociete"
}
{
"detailedLabelId": "3f18d4a6bb6979ea3e9f7bce6ac61abc",
"extractedData": [
{
"name": "Employee.Identifier",
"value": "078904"
},
{
"name": "Employee.Full.Name",
"value": "Pénélope D Séguin"
},
{
"name": "Employee.SocialSecurityNumber",
"value": "2651132254647 79"
},
{
"name": "Employee.Address.Line1",
"value": "51 rue Nationale"
},
{
"name": "Employee.Address.ZipCode",
"value": "75003"
},
{
"name": "Employee.Address.City.Name",
"value": "Paris"
},
{
"name": "Payslip.StartDate",
"value": "01/08/2019"
},
{
"name": "Payslip.EndDate",
"value": "31/08/2019"
},
{
"name": "Company.Name",
"value": "Ma Société SARL"
},
{
"name": "Company.SIRET.Identifier",
"value": "55487445"
}
],
"id": "Bulletin_de_paie.pdf",
"labelId": "MaSociete_label"
}
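Because each response lists its fields as name/value pairs, a consumer will usually flatten `extractedData` into a lookup table before using it. The Python sketch below shows one way to do that and to sanity-check the invoice totals from the sample above. The helper names (`to_fields`, `parse_amount`) are illustrative assumptions, not part of the ParseXtract API, and amounts are assumed to use a decimal comma, as they do in the sample.

```python
import json
from decimal import Decimal

# Abbreviated copy of the invoice response shown above.
SAMPLE = """
{
  "extractedData": [
    {"name": "Invoice.Currency", "value": "EUR"},
    {"name": "Invoice.TotalAmount.WithoutTaxes.Amount", "value": "276,00"},
    {"name": "Invoice.VATTotal.Amount", "value": "55,20"},
    {"name": "Invoice.TotalAmount.WithTaxes.Amount", "value": "331,20"}
  ],
  "id": "DemoTrial_20100831_Armstrong_Neil_0014.pdf",
  "labelId": "FactureMaSociete"
}
"""


def to_fields(response: dict) -> dict:
    """Flatten the name/value pairs of `extractedData` into a plain dict."""
    return {item["name"]: item["value"] for item in response["extractedData"]}


def parse_amount(text: str) -> Decimal:
    """Parse an amount written with a decimal comma, e.g. "276,00"."""
    return Decimal(text.replace(" ", "").replace(",", "."))


fields = to_fields(json.loads(SAMPLE))
net = parse_amount(fields["Invoice.TotalAmount.WithoutTaxes.Amount"])
vat = parse_amount(fields["Invoice.VATTotal.Amount"])
gross = parse_amount(fields["Invoice.TotalAmount.WithTaxes.Amount"])

# Cross-check: the net amount plus VAT should equal the amount with taxes.
totals_consistent = (net + vat == gross)
print(fields["Invoice.Currency"], gross, totals_consistent)
```

Using `Decimal` rather than `float` keeps the comparison exact, which matters when the check is meant to flag extraction errors rather than rounding noise.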
Discover our approach, evolve your business!
We do it our own way: based on our family of unsupervised classifiers, document query language and query generator engine.
Divide et impera
Using several uncorrelated unsupervised classifiers allows us to group similar documents together.
For instance, we can recognize a trademark in a document's header or a recurring paragraph in its footer.
Once the documents are grouped into the correct homogeneous collections, finding the right extraction rules is easier.
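To make the grouping idea concrete, here is a deliberately simple Python sketch that clusters documents by the similarity of their header text. It is a toy stand-in under stated assumptions, not the classifiers PX uses: the Jaccard measure, the `0.5` threshold, the greedy strategy, and the sample headers are all chosen for illustration.

```python
def tokens(text: str) -> set:
    """Lowercase word set used as a crude fingerprint of a header."""
    return set(text.lower().split())


def jaccard(a: set, b: set) -> float:
    """Share of words two fingerprints have in common (0.0 to 1.0)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_by_header(headers: dict, threshold: float = 0.5) -> list:
    """Greedy grouping: a document joins the first group whose
    representative header is similar enough, else it starts a new group."""
    groups = []  # list of (representative fingerprint, [document ids])
    for doc_id, header in headers.items():
        fingerprint = tokens(header)
        for representative, members in groups:
            if jaccard(fingerprint, representative) >= threshold:
                members.append(doc_id)
                break
        else:
            groups.append((fingerprint, [doc_id]))
    return [members for _, members in groups]


# Hypothetical header lines taken from the first page of four documents.
headers = {
    "a.pdf": "Ma Société SARL Facture",
    "b.pdf": "Ma Société SARL Facture d'acompte",
    "c.pdf": "Banque Exemple Relevé de compte",
    "d.pdf": "Banque Exemple Relevé de compte mensuel",
}
groups = group_by_header(headers)
print(groups)
```

With these inputs the two supplier documents end up in one group and the two bank documents in another, so extraction rules can then be written once per group instead of once per document.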
Whitebox
We have developed our own query language (PQL) that lets us navigate a document's layout structure, jump to a specific point, and apply regex selectors.
Machine learning techniques are used to generate the extraction rules automatically.
Because these queries are human-readable, we can always correct or improve them in case of overfitting or other issues.
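The value of a readable rule is easier to see with an example. The Python sketch below is not PQL, whose syntax is not shown here; it only imitates the idea of jumping to an anchor in the text and applying a regex selector from there. The `Rule` class, its fields, and the sample text are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    """A readable extraction rule: find `anchor`, then apply `selector`
    to the text that follows it."""
    field: str
    anchor: str      # literal text to jump to
    selector: str    # regex with one capturing group

    def apply(self, text: str) -> Optional[str]:
        start = text.find(self.anchor)
        if start == -1:
            return None  # anchor not present in this document
        match = re.search(self.selector, text[start + len(self.anchor):])
        return match.group(1) if match else None


# Hypothetical plain-text rendering of an invoice.
document = """Ma Société SARL
Facture n° 2234567
Date : 18/09/2019
Total TTC : 331,20 EUR
"""

rules = [
    Rule("Invoice.Number.Identifier", "Facture n°", r"\s*(\d+)"),
    Rule("Invoice.Date", "Date :", r"\s*(\d{2}/\d{2}/\d{4})"),
    Rule("Invoice.TotalAmount.WithTaxes.Amount", "Total TTC :", r"\s*(\d+,\d{2})"),
]

extracted = {rule.field: rule.apply(document) for rule in rules}
print(extracted)
```

Because each rule is just an anchor and a pattern, a person can read it, see why it matched, and tighten it by hand if a generated rule turns out to be too specific to the training documents.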
Security, always the priority.
Our website and application traffic runs entirely over encrypted SSL, and HTTP Strict Transport Security (HSTS) ensures that browsers interact with Securibox exclusively over HTTPS, so credentials and other sensitive data are never leaked over the network.
“When you access our application, a unique token is sent along with each request, protecting against Cross-Site Request Forgery (CSRF). All sensitive data stored on our servers is encrypted with 256-bit AES and rotating keys, so the encryption is constantly changing.”
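As an illustration of the request token mentioned above, the following Python sketch issues a token bound to a session and verifies it on a later request. It is a generic pattern under stated assumptions, not Securibox's implementation: the HMAC-based scheme, the function names, and the in-memory secret are hypothetical.

```python
import hashlib
import hmac
import secrets

# Server-side secret; in this sketch it lives in memory only.
SERVER_SECRET = secrets.token_bytes(32)


def issue_csrf_token(session_id: str) -> str:
    """Derive a token tied to the session using HMAC-SHA256."""
    return hmac.new(SERVER_SECRET, session_id.encode(), hashlib.sha256).hexdigest()


def verify_csrf_token(session_id: str, token: str) -> bool:
    """Recompute the expected token and compare in constant time."""
    expected = issue_csrf_token(session_id)
    return hmac.compare_digest(expected, token)


session_id = secrets.token_urlsafe(16)
token = issue_csrf_token(session_id)

print(verify_csrf_token(session_id, token))           # legitimate request
print(verify_csrf_token(session_id, "forged-token"))  # forged request
```

A page on another site cannot read or compute this token, so a request it forges on the user's behalf fails verification even though the browser attaches the session cookie.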