In today’s enterprises, the majority of data is unstructured - found in formats like PDFs, emails and images. Traditionally, the industry has relied on ‘Human APIs’ - people manually ingesting data from unstructured formats into automation platforms. This approach is slow, costly and prone to risk.
While Intelligent Document Processing (IDP) solutions have tried to address this challenge, they often fall short when faced with changing conditions or diverse document types and formats. Their lack of adaptability results in inefficiencies and missed opportunities for automation.
With Duco’s Adaptive Intelligent Document Processing (AIDP) technology, you can automate the extraction and ingestion of data from unstructured sources, overcoming your biggest reconciliation blocker - handling unstructured data. The extracted data can also be used to feed into downstream systems. Duco’s AIDP is designed to extract data fields with exceptional accuracy from a wide variety of unstructured formats and seamlessly organise this information for your reconciliation processes.
In this training manual, we will guide you through using AIDP as an input source for Data Prep within Duco, enabling you to consolidate data from different sources into a single, normalised data set for use in reconciliation or other processes, within or outside of Duco. By the end, you will be equipped to leverage AIDP to its fullest potential.
What is AIDP?
Adaptive Intelligent Document Processing (AIDP) is a powerful technology designed to automate the classification, data extraction, and processing of information from diverse unstructured data formats. Unlike traditional IDP systems that rely heavily on fixed rules and templates, AIDP leverages artificial intelligence to continuously learn and adapt to new data, evolving use cases and changing business requirements without the need for manual reconfiguration. This flexibility allows you to efficiently handle a wide range of document types and formats, significantly boosting operational efficiency and reducing the need for manual intervention.
Key features
The core features of AIDP are:
- Artificial intelligence and machine learning capabilities
- Leverages artificial intelligence and machine learning to identify patterns and extract relevant information from documents, emails, images and more
- In AIDP, large multi-modal models (LLMs that also take layout information into account) recognise document types and extract data from them. Duco uses private, supervised models that are fine-tuned for specific document types, making them far more accurate than generic LLMs like ChatGPT
- Continuous learning
- AIDP continually learns from each document it processes and adapts to new documents and evolving business requirements without extensive manual (re)configuration
- Incorporates user corrections and validations into its learning process, avoiding repeat mistakes and enhancing future performance
- Adaptive capabilities
- Capable of managing diverse document formats (scans, emails, images etc.), languages and layouts without needing extensive manual configuration
- Algorithms learn from context and interpret layout and text, not just memorise it, so it can adapt when slight changes occur without the need to retrain
- Straightforward model training & transparent model analytics
- No-code training: AIDP does not require code to train the data models
- Active learning, model-assisted ‘autocorrect’ for annotations and user feedback all enhance model training
- Transparent model analytics: document classification and extraction performance, straight through processing rates, and business rule performance
Benefits
Duco’s AIDP transforms the reconciliation experience and will help you overcome your biggest reconciliation challenge - the reconciliation of unstructured data. AIDP adds value to reconciliation in different ways:
- Handles high levels of variety and change in documents and formats
- Reconciles multiple unstructured data inputs in a single process
- Increases accuracy, scalability and flexibility in document processing
- Removes risks related to manual processing (e.g., time-critical processes, human errors)
- Ensures transparency and lifecycle visibility
- Enhances governance and control
Example use cases
- Broker Statements
AIDP simplifies the processing of broker statements, which are often delivered in unstructured formats such as PDFs. By automatically extracting key financial data (such as trade details and fees), AIDP streamlines the reconciliation of these statements against internal systems, significantly reducing manual effort and improving data accuracy.
- OTC Confirmations
Over-the-counter trade confirmations often come in varied formats, with important trade terms buried within lengthy documents. AIDP extracts essential details (such as notional amounts, counterparties, settlement dates, and interest rates) and prepares this information for comparison against internal trade records. This ensures accurate reconciliation for complex OTC derivatives.
- CSA Documentation
Credit Support Annex documents, which govern collateral management for derivative transactions, are critical for compliance and risk mitigation. AIDP helps extract relevant clauses, collateral amounts, and legal terms, enabling financial institutions to reconcile collateral movements accurately and stay in line with regulatory requirements, all while reducing manual document handling.
- Private equity (PE) credit/capital call notices
Capital call notices from private equity funds can come in various unstructured formats, requiring detailed extraction of fund names, investor commitments, capital amounts, and due dates. AIDP automates the extraction of these key data points, allowing firms to streamline their response to capital calls and ensure reconciliation with internal cash flow forecasts and commitments.
These examples illustrate how AIDP enables automation of complex, unstructured documents across various use cases, increasing operational efficiency and ensuring accuracy in reconciliations.
Getting started with AIDP in Duco
What you will need to begin
Before diving into the AIDP capability within Duco, ensure you have the following in place:
- Data Prep should be enabled in your environment, as AIDP acts as an input source for this process
- The necessary permissions to submit and process documents for the relevant Data Prep workflow
These are the five key steps involved when using AIDP in a Data Prep process:
1. Configure an unstructured data input
2. Map and transform your data
3. Submit data into the Data Prep process
4. Validate documents (if required)
5. Create a snapshot for reconciliation and view results
This guide will take you through each step to ensure you can fully harness the power of Duco’s AIDP and streamline your reconciliation workflows.
Let’s get started!
Configuring an input in Data Prep for unstructured data formats
Create process and define inputs
To begin, create a new Data Prep process in Duco.
After setting up the process, under General you’ll see a Description box. Here, you can add a description to provide context around the process's function for your peers.
Next, you’ll need to define the unstructured data input(s). This can be done by navigating to the Data Inputs section in the Settings of your Data Prep process.
To add an input, click ‘Add inputs’, and a drop-down with several format options will appear, including Standard, Unstructured, FCA, FIXML, ISO 20022 and Swift MT Messages.
For this example, we’ll be adding an Unstructured data format.
Note that Duco supports reconciling multiple unstructured data inputs within a single reconciliation process.
Once you’ve selected an unstructured data input, you’ll need to specify the project and document type. Projects house the different document types that are defined and processed by your organisation, and can be set up based on geography, department or business unit, for example.
A project can contain multiple document types with different formats and layouts. In our example, we have simply named our Project ‘Broker invoices’, in which we process the document type ‘Broker Invoice’.
Duco’s AI models will extract and structure the data fields based on these document types. The extracted values from the data fields are then used for reconciliation.
In the interface, Duco will display the file type as ‘Unstructured’ in a grey box and mark it as ‘Incomplete’ until you define the Project and Document type, then click ‘Create input’.
Once you’ve selected a Project and Document type and clicked ‘Create input’, the corresponding data fields for your selection will populate on the right side of your screen. These are all the fields that have been configured, meaning they can be extracted and used as input for a Data Prep process. Each input in a Data Prep process represents one document type from your project. You can include multiple document types from the same project in a single Data Prep process.
For this unstructured input, we can now see five fields related to the input we selected. Duco will display the data type for each field in the grey box next to the field (e.g., text, decimal).
Next, we’ll add a second unstructured input, as Data Prep is commonly used to consolidate different files (often of different types) into one side of a reconciliation.
Once the second input is created, the data fields for this input will also populate on the right side of the screen. You can toggle between the two unstructured inputs to view their corresponding fields. In this example, the naming conventions for the fields as well as the brokers differ.
To clearly distinguish between the two inputs in the UI, we can adjust the naming conventions for the two brokers. Simply click on the file name and edit the label - for instance, ABN Amro and Bank of America in this case.
At this stage, we’ve successfully created our Data Prep process and defined the unstructured inputs. In the next section, we will explore how to map and transform the data to meet our specific requirements.
Mapping and transforming unstructured data inputs
After defining our unstructured data inputs, the next step is to select the necessary data fields and map them to our output for a two-sided reconciliation against our internal file. Mapping is an essential step in the Data Prep process, and the fields you choose to map will depend on your specific use case and requirements.
To begin, navigate to the settings of your Data Prep process and select ‘Map and transform’.
From here, we can add output fields by selecting an input we’ve defined earlier. Let’s start with mapping the fields for ABN Amro. A good practice is to begin with the primary file - i.e., the file containing the majority of output fields you intend to use. Depending on your needs, you can either map all data fields or only a select few. Once the fields are selected, you can map them by clicking the ‘Map’ button.
As fields are mapped, Duco will display your progress both for the specific input category and for the overall progress. For example, after mapping all fields for ABN Amro, your overall progress might show as 50%, indicating the fields for Bank of America still need to be mapped.
Mapping methods
When mapping fields for Bank of America, you have two options:
- Manual mapping: manually match the fields with those of ABN Amro. As you map each field, you will see the progress updated.
- Quick map functionality: use Duco’s quick mapping feature to quickly match and map the fields, which is especially useful for large files with many data fields. You can also toggle the ‘Show outstanding only’ option to focus only on the fields that haven’t been mapped yet. This will give you a sense of the mapping that still has to be done.
A best practice is to begin with the ‘quick wins’ by leveraging Duco’s quick map feature for fields that are obvious matches. For fields that don’t align perfectly due to naming conventions (such as ‘Description’ and ‘Product’ in our example), manual mapping will be required. Once all fields have been mapped, the progress bars will reflect this completion.
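Duco’s actual quick-map algorithm isn’t documented here, but the idea - automatically pairing fields whose names are obvious matches while leaving ambiguous ones (such as ‘Description’ vs ‘Product’) for manual mapping - can be sketched with simple name similarity. The threshold and matching method below are illustrative assumptions, not Duco’s implementation:

```python
from difflib import SequenceMatcher

def quick_map(left_fields, right_fields, threshold=0.8):
    """Hedged sketch of quick mapping: pair fields whose normalised
    names are similar enough; leave the rest for manual mapping.
    The 0.8 threshold is an illustrative assumption."""
    mapped, outstanding = {}, []
    for lf in left_fields:
        # Find the closest-named field on the other side
        best = max(right_fields,
                   key=lambda rf: SequenceMatcher(None, lf.lower(), rf.lower()).ratio())
        score = SequenceMatcher(None, lf.lower(), best.lower()).ratio()
        if score >= threshold:
            mapped[lf] = best          # a 'quick win'
        else:
            outstanding.append(lf)     # needs manual mapping
    return mapped, outstanding
```

Running this over our example fields would pair ‘Invoice Number’ and ‘Amount’ automatically, while ‘Description’ (vs ‘Product’) falls below the threshold and stays outstanding - mirroring the manual-mapping step described above.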
Data transformation and normalisation
When the necessary mapping has been completed, you can also apply transformation and normalisation rules to these data fields using Duco’s Natural Rule Language. To do so, simply click on the blue link for a specific data field, and a rule window will appear, allowing you to configure the necessary transformation.
Once we click on the blue link for a field, we can apply a transformation using NRL. In our example, we want to modify the invoice number by extracting only the first six characters. We will perform this transformation for both ABN Amro and Bank of America.
After creating and saving the rule, Duco will instantly apply the transformation to the data, indicating that a rule is now defined for that field. To verify that the rule is working as intended, we can review the transformed data. Additionally, we can click on the rule to view or edit its details at any time.
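While the rule itself is written in Duco’s Natural Rule Language, the transformation it performs is a simple substring extraction. As a hedged illustration (Python rather than NRL, with a hypothetical invoice number):

```python
def normalise_invoice_number(raw: str) -> str:
    """Keep only the first six characters of the invoice number,
    mirroring the NRL rule described above (illustrative only)."""
    return raw[:6]

# Hypothetical example value, not taken from the actual documents
normalise_invoice_number("INV123-2024-ABN")  # → "INV123"
```

Applying the same rule on both the ABN Amro and Bank of America inputs ensures the two sides normalise to comparable values.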
With our data fields mapped and the transformation rule in place, the process is fully configured. We are now ready to submit and process documents - in this case, the broker invoices. Let’s exit the settings and move on to the next step.
Submitting documents to your Data Prep process
After configuring inputs, mapping and transforming your data fields, the next step is to submit data to your Data Prep process. To begin, navigate to the ‘All submissions’ tab in your process.
Data submission
Here, you’ll see a submission page displaying any historical submissions, if available. To submit a new document, click the green ‘Submit data’ button located at the top left of the screen in the submissions section. A pop-up window will appear, where you can either click ‘Upload files’ or drag and drop your files directly into the window. Once you’ve selected the files you want to submit, choose the unstructured input from your Data Prep process and click ‘Submit data’.
As soon as the files are submitted, the submission page will begin populating with detailed information about the documents. This page contains various column headers that provide insights into each submission:
- Submission: displays the file name of the submitted document
- Input: the data input associated with the document
- Submission time: the date and time of the submission
- Submission status: overall status of the submission
- Processed
- Not processed
- Queued
- Deleted
- Document status: the status of the document itself, with a button to view submission details
- Processing
- Processed
- Action required
- Deleted
- Submitted by: how the document was submitted
- Manual
- SFTP
- API
- Used in: indicates where the processed data is being used
- Total: The total number of documents
- Submitted: the number of submitted documents
- Filtered: the number of filtered results
- Errors: the number of errors in a document
Data filters
You can apply filters to column headers based on your needs, such as filtering documents by submission time or document status. Additionally, you can reorder columns by clicking and dragging them into your desired order.
When a filter is active, Duco will display a grey filter icon for that column. To remove a filter, simply click ‘Reset’ in the relevant column. Alternatively, you can click ‘View’ to reset filters and/or sorting for all columns. There is also an option to check a box next to the ‘View’ button to display only documents requiring user action. This will give you an overview of any tasks that need attention.
In the next section, we will cover how Duco processes and validates documents after submission.
Validating and processing documents
Duco’s AIDP simplifies the automated processing of unstructured data from your documents while leveraging user feedback to continuously enhance its performance. From day one, our Professional Services team will equip you with a trained AI model for data extraction that is production-ready, to accelerate time to value and increase ROI. Over time, this model will improve based on user interactions, reducing the need for manual intervention. Documents that require human review will help retrain the AI models, leading to better accuracy and less human intervention as you go. In this section, we’ll walk you through how documents are processed and validated by the user.
Document status
In the submissions page, you can access documents by clicking the arrow icon in the ‘Document status’ column. Keep in mind that you won’t be able to access documents that are still processing or have been deleted. The documents can have the following statuses:
- Processing: the OCR (Optical Character Recognition) is being performed, making the document’s contents machine-readable for further processing
- Processed: the document has been successfully processed, either automatically or after human validation
- Action required: the document needs human validation before it can be fully processed
- Deleted: the document has been removed from the submissions
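These statuses, and the rule that documents still processing or already deleted cannot be opened from the submissions page, can be modelled as follows (an illustrative sketch, not Duco’s internal representation):

```python
from enum import Enum

class DocumentStatus(Enum):
    """The four document statuses described above."""
    PROCESSING = "Processing"
    PROCESSED = "Processed"
    ACTION_REQUIRED = "Action required"
    DELETED = "Deleted"

def can_open(status: DocumentStatus) -> bool:
    """A document can be opened from the submissions page unless it is
    still processing or has been deleted."""
    return status in (DocumentStatus.PROCESSED, DocumentStatus.ACTION_REQUIRED)
```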
Documents requiring validation will show an ‘Action required’ status in the ‘Document status’ column. To quickly locate these, you can filter the list by checking the ‘Action required’ box at the top of the column headers. This makes it easier to find documents that need immediate attention.
Document validation interface
To enter a document, simply click on the grey arrow icon next to ‘Action required’.
Once you click the icon, you’ll be taken to the document validation interface, where you can review and confirm the information extracted.
The view consists of three main sections:
- Left section: Document metadata and submissions
In this section, you’ll find key metadata about the document, such as its status, type, language, number of pages, name, document ID, upload date, and a description of the submission. You can also toggle between different documents within your upload. In our example, the submission contains a single Broker Invoice.
- Middle section: Entities, issues, annotations
Here, you can update the document’s status based on your review:
- Done: select this option once you’ve validated the document. This will update the document’s status to ‘Processed’
- Park: use this option to temporarily set the document aside for further review. You can add a reason for parking it (e.g., “Check with Middle Office team”)
- Reject: choose this option if the document should not be processed (irrelevant document, poor OCR results, etc.)
Below the status controls, you’ll see the different entities extracted from the document. In our case, these include Invoice Number, Amount, Description, Quantity and Unit Price. Each entity is colour-coded and includes a number indicating how many times it has been identified in the document. Entities in grey haven’t been found, which may be due to missed extraction or their absence from the document. The different types of entities include:
- Text entity: A value in textual form, like an account name or invoice number. When labelling documents, you can select one or more words to represent this entity.
- Paragraph entity: Optimised for longer text, these are used for labelling single paragraphs or a few lines of text. However, avoid using this for entire pages of text. Unlike text entities, paragraph entities cannot be part of composite entities (such as line items).
- Composite entity: A collection or group of related text entities. For instance, an invoice line item might consist of a description, unit price, quantity and amount.
Next, any issues with the document will be displayed. These are the reasons that the document is marked as ‘Action required’. In our example, the Invoice Number entity wasn’t found, which required manual validation. Other potential issues could include a confidence score below the threshold, non-unique values or failed parsing. Depending on the issue type, clicking on the issue item will take you to the relevant part of the document that needs validation. However, for required entities that were not found (like our missing invoice number), you won’t be able to click the issue since the value doesn’t exist in the document.
Below the issue section, you’ll see the annotations view, which can be displayed as either a list or a table. This section includes the extracted values for each entity, along with any parsed values if applicable. You can also see which user has made the annotations. In this example, all the annotations were made by the AI entity extraction model. If a user manually annotates an entity (such as the missing invoice number), their initials will appear next to the entity. Each annotation has a confidence score. If this score falls below the threshold, the document will require human validation. When a user manually annotates an entity, the confidence score is always set to 100%.
- Right section: Document view and annotations
This section displays the document that you are currently working on, with all the annotations for the extracted entities. To edit or add annotations, you’ll do so directly in this view.
By default, you’re viewing the document in Hybrid view, which shows the original PDF. You can switch to the Formatted view, which displays the OCR-processed document. On the top left, you can select the type of annotation you want to apply, such as Text Entity, Composite Entity, or Table Annotation. The top right provides options for annotation methods (click or drag), as well as zoom controls and full-screen mode for better navigation of the document.
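The routing logic described in the middle section - missing required entities, confidence scores below the threshold, and the 100% score assigned to manual annotations - can be sketched as follows. The threshold value and data shapes are illustrative assumptions, not Duco’s implementation:

```python
CONFIDENCE_THRESHOLD = 0.9  # illustrative; the real threshold is configured in Duco

def needs_human_validation(annotations, required_entities):
    """A document lands in 'Action required' if a required entity is
    missing, or if any annotation's confidence falls below the
    threshold. A manual annotation always carries confidence 1.0."""
    found = {a["entity"] for a in annotations}
    if any(entity not in found for entity in required_entities):
        return True
    return any(a["confidence"] < CONFIDENCE_THRESHOLD for a in annotations)
```

In our example, the model’s annotations alone would flag the document (the required ‘Invoice Number’ is missing); once a user manually annotates it at 100% confidence, the document no longer requires validation.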
Human validation
Now, we will have a closer look at document validation. When a document requires user validation, its status will be marked as ‘Action required’. By clicking on the corresponding icon, you can begin making changes to the submitted document.
You can switch between full view and partial view using the toggle in the middle section of your screen. For document validation tasks, it’s often more convenient to use the partial view for easier navigation and focus.
In our example, the required entity, ‘Invoice Number,’ was not detected by the entity extraction model. While the composite entities (which consist of items like description, unit price, quantity, and amount) were successfully extracted, the Invoice Number entity wasn’t found. Upon reviewing the document, we can see that the Invoice Number is present. The reason it was missed is likely related to the training of the extraction model - this model has been trained on a limited set of documents. To resolve this issue, the user must manually add an annotation.
There are three ways to annotate data within Duco:
- Text annotation
- Composite annotation
- Table annotation
We’ll explore each of these methods in detail, starting with text annotation.
Looking at our example, to annotate the ‘Invoice number’ entity, you have two options:
- Annotation via entities: begin by clicking the entity that you want to annotate (in the middle section, click the entity under ‘Entities’ or ‘Issues’), then click the text in the document
- Direct annotation: begin annotating by clicking or dragging the text in the document, then selecting the entity from the drop-down list that appears
Once the entity is annotated, the extracted value will appear in the ‘Annotations’ section. You can also click directly on the entity in the document view to view the extracted entity in the pop-up that appears. The annotations section will also show the initials of the user who made the annotation. You can edit the extracted value by typing in the parsed column.
All annotations can be edited and deleted in the document view:
- Editing: to edit an existing annotation, click on the entity and use the handle bars to adjust the selection
- Deleting: to delete an annotation, select the entity and click ‘Delete’ in the pop-up, or hover over the annotation and click the bin icon that appears
Next, let’s have a closer look at composite annotation. Composite entities are groups of related text entities. To perform a composite annotation, follow these steps:
- Select the composite icon at the top left of the document view
- Begin annotating the first entity by clicking the relevant text and selecting the appropriate entity from the drop-down
- Continue to add entities to your composite until completion
When annotating composites, the selected group will turn blue, indicating that all subsequent annotations will be added to that group. If you wish to deactivate the composite, you can:
- Click the composite annotation icon again, or
- Click the blue icon in front of the composite group, which will turn black, indicating that it’s deactivated
When annotating composites, you may need to annotate information hidden behind the black or blue composite icons in the document view. To do this, click the eye icon in the document view to hide the composite icons. Simply click the icon again to make them reappear.
Finally, you can use the table annotation method to efficiently annotate line items (composites) if needed for your use case. This convenient feature significantly accelerates the annotation process for line items. To begin, click the third annotation icon in the document view. Using your mouse, select the table you wish to annotate so that its rows and columns can be detected. Then, configure the rows and columns according to your specific requirements.
In the rare event that you are unable to annotate a data field due to poor OCR results, you can perform a manual annotation. While this method allows you to add a value to a data field when regular annotation is not possible, it is not a preferred annotation method, because the extraction models will not learn from manually annotated entities. Therefore, manual annotation should only be used as a last resort. To perform a manual annotation, simply click on ‘Manual Annotation’ in the middle section, then add the entity, value, and page number. For composite entities, first click the composite icon in the document view, then click ‘Manual Annotation’ and add the relevant information.
At any point in time during or after the validation process, you can access the history to review the actions taken on the document. This is useful for tracking changes, annotations, and ensuring that all issues have been properly addressed.
Once you’ve completed the validation and resolved any potential issues, simply click ‘Done’. Duco will notify you, and the document will be processed. Once the document is marked as done, you can close the window and return to Duco, confident that your data is now properly processed and ready for use. The extracted data will then populate in the pending ‘Results’ bucket.
It’s important to note that Duco’s AI models continuously improve based on user feedback. Each validation helps the AI learn, increasing accuracy and reducing the need for human intervention in future documents.
Create a snapshot and reconcile your data
Once your documents have been processed, either automatically or through user validation, the pending results bucket will begin to populate with the data fields that you previously mapped. Every submitted file will initially go into a ‘Pending’ status until you’ve reached a cut-off point, at which time you can proceed with reconciliation.
This cut-off point is flexible and will depend on your specific use case. Typically, it’s reached when you have enough submissions to proceed with reconciliation. When you’ve reached this point, the next step is to capture and package the data into a consumable unit for reconciliation.
Create a snapshot
Once the data is packaged, it’s called a snapshot. While we’ll walk you through creating snapshots manually, keep in mind that in a production environment, you can automate this process based on the following snapshot triggers:
- Time: packages the data at a specified time
- Completion: automatically creates a snapshot when all expected submissions have arrived
- Submissions: generates a snapshot with each file submission
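As an illustrative sketch (not Duco’s implementation), the three trigger types amount to a simple decision:

```python
def should_create_snapshot(trigger, *, now=None, cutoff=None,
                           received=0, expected=0, new_submission=False):
    """Hedged sketch of the three snapshot triggers described above.
    Parameter names are assumptions for illustration only."""
    if trigger == "time":          # package the data at a specified time
        return now is not None and cutoff is not None and now >= cutoff
    if trigger == "completion":    # all expected submissions have arrived
        return expected > 0 and received >= expected
    if trigger == "submissions":   # snapshot on every file submission
        return new_submission
    return False
```

For example, a ‘completion’ trigger with three expected broker invoices fires only once the third submission arrives.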
To manually create a snapshot, simply click the green ‘Create snapshot’ button in the pending results section.
You’ll receive a confirmation that the snapshot has been saved, and the data will be moved from the pending results section into ‘Snapshots’. You can view your captured data by navigating to the ‘Snapshots’ section and clicking the embedded snapshot link. You’ll find detailed information, including the number of inputs used, the timestamp, the trigger, and where the snapshot is being used. At this stage, your snapshot is ready to be used in reconciliation.
Moving to reconciliation
With the snapshot created, you’re now ready to move forward with reconciling the data against your internal file. To do this, create a new process for a two-sided reconciliation:
- Create new process: select the use case for your reconciliation, name the process, create a shortcode, and assign a data label
- Upload files: on the left side of your screen, upload your internal file (generic file), and on the right side, select the appropriate Data Prep process that includes the snapshot
Once the files are uploaded, manually match the fields between the two data sources. Note that Duco Alpha cannot predict field mappings if one side is a non-generic file.
After matching the fields, you can exit the settings and run the process by submitting new data. Upload the internal file and choose the snapshot you want to use for reconciliation.
When the run is complete, you will receive a notification. Based on the results, you can further build and refine your reconciliation process in Duco.
In this training, we’ve covered the entire process of using Duco’s AIDP capability with Data Prep - from defining unstructured inputs and mapping data fields, to submitting and validating documents and creating snapshots for reconciliation. With this foundation, you’re now equipped to confidently automate your reconciliation workflows and processes.