When extracting information from documents, Duco Adaptive IDP can parse data types such as dates and numbers. This has the following advantages:
- no need to parse dates and numbers in your own application
- parsing rules that can be configured separately for each of your projects
- output from Duco Adaptive IDP that is formatted the way you need it
General behavior
Under the hood, the parser always looks at the context within a document to parse ambiguous dates and numbers. It tries to find unambiguous dates in the document, learns their format, and applies that format to the ambiguous dates and numbers in the same document.
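To make the idea of learning a format from context more concrete, here is a minimal sketch that votes on day-first versus month-first using only the dates in a text that cannot be misread. It assumes numeric dates of the form NN-NN-YYYY (with `-`, `/` or `.` as separator) and a simple majority vote; `infer_day_first` and `DATE_PATTERN` are hypothetical names, and this is an illustration, not Duco Adaptive IDP's actual implementation.

```python
import re
from typing import Optional

# Hypothetical pattern: numeric dates such as 25-12-2023, 12/25/2023 or 25.12.2023.
DATE_PATTERN = re.compile(r"\b(\d{1,2})[-/.](\d{1,2})[-/.](\d{4})\b")


def infer_day_first(text: str) -> Optional[bool]:
    """Vote on day-first vs month-first using only dates that are unambiguous.

    Returns True (day first), False (month first) or None (no clear evidence).
    """
    day_first = month_first = 0
    for first, second, _year in DATE_PATTERN.findall(text):
        a, b = int(first), int(second)
        if a > 12 >= b:
            day_first += 1  # e.g. 25-12-2023: 25 cannot be a month
        elif b > 12 >= a:
            month_first += 1  # e.g. 12-25-2023: 25 cannot be a month
    if day_first == month_first:
        return None
    return day_first > month_first
```

For a document containing "Invoice date 25-12-2023, due 01-03-2024", the unambiguous 25-12-2023 casts a day-first vote, so the ambiguous 01-03-2024 would be read as 1 March 2024.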
Date parsing
Parsing dates can be challenging, and the right approach often differs from project to project. Duco Adaptive IDP lets you configure how dates are parsed.
Missing data
When parts of a date are missing, you can define default rules for handling the situation.
When the day part of a date is missing, you can choose to:
- Go to human validation
- Use the first day of the month
- Use the last day of the month
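As a rough illustration of the two fill-in options for a missing day, the sketch below completes a month-year value. The rule names `"first_day"` and `"last_day"` are hypothetical labels for the options above, and raising an error stands in for routing the value to human validation.

```python
import calendar
from datetime import date


def fill_missing_day(year: int, month: int, rule: str) -> date:
    """Complete a month-year value with a hypothetical rule name:
    'first_day' or 'last_day'. Any other rule means human validation."""
    if rule == "first_day":
        return date(year, month, 1)
    if rule == "last_day":
        # monthrange returns (weekday of the 1st, number of days in the month)
        return date(year, month, calendar.monthrange(year, month)[1])
    raise ValueError("day is missing: route to human validation")
```

Using `calendar.monthrange` keeps month lengths and leap years correct, so February 2024 ends on the 29th.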
When the year part of a date is missing, you can choose to:
- Go to human validation
- Use the closest year
- Use the current year
- Use the next year
- Use the previous year
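The year options can be sketched in the same way. This is a hypothetical illustration: the rule names are invented labels, and `reference` stands in for whatever date the product compares against (the current date or, possibly, the upload date; the documentation above does not say which). "Closest year" is modelled here as the candidate in the previous, current or next year that lies nearest to the reference date.

```python
from datetime import date
from typing import Optional


def _try_date(year: int, month: int, day: int) -> Optional[date]:
    # Returns None for impossible dates such as 29 February in a non-leap year.
    try:
        return date(year, month, day)
    except ValueError:
        return None


def fill_missing_year(month: int, day: int, rule: str, reference: date) -> Optional[date]:
    """Complete a day-month value using a hypothetical rule name."""
    offsets = {"current_year": 0, "next_year": 1, "previous_year": -1}
    if rule in offsets:
        return _try_date(reference.year + offsets[rule], month, day)
    if rule == "closest_year":
        candidates = [_try_date(reference.year + delta, month, day) for delta in (-1, 0, 1)]
        valid = [c for c in candidates if c is not None]
        # Pick the candidate nearest to the reference date.
        return min(valid, key=lambda d: abs((d - reference).days), default=None)
    raise ValueError("year is missing: route to human validation")
```

With a reference date of 20 December 2023, "5 January" becomes 5 January 2024 under the closest-year rule, but 5 January 2023 under the current-year rule.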
AI parsing
You can enable AI as a parsing fallback. It can be used when parsing fails, when parsing stops on an ambiguous date, or both.
When the AI functionality is enabled, a text field is shown in which you can give extra instructions for parsing.
Failed parsing
Sometimes parsing simply fails. You can decide how to handle that situation:
- Go to human validation
- Make the entity value blank
- Remove the entity
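The order of events described above (regular parsing first, the optional AI fallback next, then the configured failure action) could be modelled roughly as follows. Everything here is hypothetical: `handle_entity`, the action strings and the result records are invented for illustration, the AI step is just a caller-supplied function, and the "parsing stops on an ambiguous date" trigger is not modelled.

```python
from typing import Callable, Optional

# Hypothetical action names mirroring the three failure options listed above.
HUMAN_VALIDATION = "human_validation"
BLANK = "blank"
REMOVE = "remove"


def handle_entity(
    raw: str,
    parse: Callable[[str], Optional[str]],
    on_failure: str,
    ai_fallback: Optional[Callable[[str], Optional[str]]] = None,
) -> dict:
    """Describe what happens to one extracted entity value."""
    value = parse(raw)
    if value is None and ai_fallback is not None:
        value = ai_fallback(raw)  # only consulted when the regular parser fails
    if value is not None:
        return {"status": "parsed", "value": value}
    if on_failure == BLANK:
        return {"status": "blanked", "value": ""}
    if on_failure == REMOVE:
        return {"status": "removed", "value": None}
    # Any other setting (including HUMAN_VALIDATION) keeps the raw text for review.
    return {"status": "needs_validation", "value": raw}
```

Keeping the failure action as a separate setting means the same parser can be reused across projects that want different outcomes for unparseable values.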
Parsing ambiguous two-part dates
Duco Adaptive IDP lets you configure how to handle ambiguous two-part dates. You can interpret them in the following formats:
- day - month
- month - year
- year - month
- month - day
- week - year
- year - week
- Closest to upload date
There is a special option, "Stop", that lets you exclude parsing options. When parsing stops this way, you can:
- Go to human validation
- Make the entity value blank
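One way to picture these settings, assuming the formats are tried in a configured priority order and that a "Stop" entry ends the search, is the sketch below. It covers only the month-year and week-year style formats (day-month and month-day would additionally need the missing-year rules); `interpret_two_part` and the format strings are hypothetical names, and week numbers are read as ISO weeks, which is also an assumption.

```python
from datetime import date
from typing import List, Optional, Tuple

STOP = "stop"


def interpret_two_part(value: str, priority: List[str]) -> Optional[Tuple[str, date]]:
    """Try the configured formats in order and return the first valid reading.

    None means a 'stop' entry was reached or nothing fitted; that is where the
    'Go to human validation' or 'Make the entity value blank' choice applies.
    """
    first, second = (int(part) for part in value.split("-"))
    for fmt in priority:
        if fmt == STOP:
            return None
        try:
            if fmt == "month-year":
                return fmt, date(second, first, 1)
            if fmt == "year-month":
                return fmt, date(first, second, 1)
            if fmt == "week-year":
                return fmt, date.fromisocalendar(second, first, 1)  # Monday of ISO week
            if fmt == "year-week":
                return fmt, date.fromisocalendar(first, second, 1)
        except ValueError:
            continue  # impossible reading (e.g. month 2023), try the next format
    return None
```

The same input "10-2023" yields October 2023 when month-year comes first, but the week starting 6 March 2023 when week-year comes first; placing "stop" at the top sends it straight to the configured stop action.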
Closest to upload date: chooses the reading that is closest to the upload date. For example (dates written as DD-MM-YYYY):
Upload date: 01-01-2023
Date on document: 01-03-2023
The date is ambiguous because it can be read as 01-03-2023 (1 March) or 03-01-2023 (3 January).
This rule chooses 03-01-2023, as it is the closer of the two possible dates to the upload date.
Parsing ambiguous three-part dates
Duco Adaptive IDP lets you configure how to handle ambiguous three-part dates. You can interpret them in the following formats:
- day - month - year
- month - day - year
- year - month - day
- Closest to upload date
There is a special option, "Stop", that lets you exclude parsing options. When parsing stops this way, you can:
- Go to human validation
- Make the entity value blank
Closest to upload date: chooses the reading that is closest to the upload date. For example (dates written as DD-MM-YYYY):
Upload date: 01-01-2023
Date on document: 01-03-2023
The date is ambiguous because it can be read as 01-03-2023 (1 March) or 03-01-2023 (3 January).
This rule chooses 03-01-2023, as it is the closer of the two possible dates to the upload date.
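The "Closest to upload date" rule can be sketched as: collect every valid reading of the value under the three formats, then take the one with the smallest distance in days to the upload date. This is a hypothetical illustration assuming hyphen-separated numeric input; `candidates`, `closest_to_upload` and `FORMATS` are invented names, and ties simply keep the reading from the earlier format in the list.

```python
from datetime import date
from typing import List

# Field order for each three-part reading named in the list above.
FORMATS = {
    "day-month-year": ("day", "month", "year"),
    "month-day-year": ("month", "day", "year"),
    "year-month-day": ("year", "month", "day"),
}


def candidates(value: str) -> List[date]:
    """Every distinct, valid calendar date the value could represent."""
    parts = [int(p) for p in value.split("-")]
    found: List[date] = []
    for order in FORMATS.values():
        fields = dict(zip(order, parts))
        try:
            d = date(fields["year"], fields["month"], fields["day"])
        except ValueError:
            continue  # e.g. month 13 or day 2023: this reading is impossible
        if d not in found:
            found.append(d)
    return found


def closest_to_upload(value: str, upload: date) -> date:
    """Pick the reading nearest to the upload date (ties keep the earlier format)."""
    options = candidates(value)
    if not options:
        raise ValueError(f"{value!r} is not a valid date in any format")
    return min(options, key=lambda d: abs((d - upload).days))
```

For the example above, `closest_to_upload("01-03-2023", date(2023, 1, 1))` compares 1 March (59 days away) with 3 January (2 days away) and returns 3 January 2023. A value such as 13-03-2023 has only one valid reading, so it is not ambiguous at all.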
Test parser
You can test your parser configuration. A default set of examples is provided, but you can also enter your own date value and press the "Test value" button to see how the parser interprets your input.
Number parsing
Parsing numbers can be challenging, and the right approach often differs from project to project. Duco Adaptive IDP lets you configure how numbers are parsed.
AI parsing
You can enable AI as a parsing fallback. It can be used when parsing fails, when parsing stops on an ambiguous number, or both.
When the AI functionality is enabled, a text field is shown in which you can give extra instructions for parsing.
Failed parsing
Sometimes parsing simply fails. You can decide how to handle that situation:
- Go to human validation
- Make the entity value blank
- Remove the entity
Parsing ambiguous numbers with decimals
Duco Adaptive IDP lets you configure how to handle ambiguous number formats. You can:
- Always treat decimal signs as decimal separators
- Always treat decimal signs as thousand separators
- Go to human validation
- Make the entity value blank
- Set default settings:
  - Use one of the following as the thousand separator:
    - Dot
    - Comma
  - Use one of the following as the decimal separator:
    - Dot
    - Comma
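To show where the ambiguity comes from, here is a minimal sketch. It assumes that a number is only ambiguous when it contains a single dot or comma with one to three digits before it and exactly three after it (as in "1,234"); values with both signs, a repeated sign, or a different digit count are resolved directly. The rule names are hypothetical labels for the options above, and the default-settings case takes only the decimal separator, treating the other sign as the thousand separator. `parse_number` is not Duco Adaptive IDP's API.

```python
import re
from decimal import Decimal
from typing import Optional


def parse_number(text: str, rule: str, decimal_sep: str = ".") -> Optional[Decimal]:
    """Parse values like '1,234' or '1.234,56'. Returns None when the rule
    defers an ambiguous value (e.g. to human validation or a blank value)."""
    s = text.strip()
    if not re.fullmatch(r"\d+([.,]\d+)*", s):
        raise ValueError(f"not a plain number: {text!r}")
    seps = [c for c in s if c in ".,"]
    if not seps:
        return Decimal(s)
    if len(set(seps)) == 2:
        dec = seps[-1]  # both signs present: the last one is the decimal separator
    elif len(seps) > 1:
        dec = None  # the same sign repeated (1,234,567) can only group thousands
    elif re.fullmatch(r"\d{1,3}[.,]\d{3}", s):
        # One sign, 1-3 digits before and exactly 3 after: genuinely ambiguous.
        if rule == "always_decimal":
            dec = seps[0]
        elif rule == "always_thousands":
            dec = None
        elif rule == "default":
            dec = seps[0] if seps[0] == decimal_sep else None
        else:
            return None  # 'human_validation' or 'blank': let the caller decide
    else:
        dec = seps[0]  # e.g. 12,5 or 1.25 can only be a decimal
    if dec is None:
        return Decimal(s.replace(".", "").replace(",", ""))
    integer, _, fraction = s.rpartition(dec)
    return Decimal(integer.replace(".", "").replace(",", "") + "." + fraction)
```

Using `Decimal` rather than `float` keeps values such as 1.234 exact, which matters for amounts.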
Test parser
You can test your parser configuration. A default set of examples is provided, but you can also enter your own number value and press the "Test value" button to see how the parser interprets your input.
The parser currently supports parsing natural language dates and numbers in English, French, and Dutch. If you require natural language parsing in other languages, please contact us.