Tasks are the way to efficiently label and review your training data in Duco Adaptive IDP.
Duco Adaptive IDP contains a task module that makes it possible to quickly examine the documents in the training data and correct them if necessary. For example, you can select all documents containing one or more entities, and the module will step through the pages where those entities occur so you can check them.
This module has two panels: the first shows the tasks that have been created, and the second shows suggestions for new tasks.
Suggested tasks
Duco Adaptive IDP suggests tasks both for data uploaded in the training module and for data uploaded in the production module that needed human intervention.
The suggested tasks show the following columns:
- Task type - review or annotation task
- Documents - the number of documents in the task
- Source - documents uploaded in training or production
- Model type - entity extraction or document classification
- Document type - corresponds with the document types that are defined in project settings
- Language - the language of the documents in the task
Suggested annotation tasks
Suggested annotation tasks help speed up the annotation process by choosing the most useful data to annotate from scratch. At the end of each model training, Duco Adaptive IDP selects, among the unlabelled documents present in the training module at that moment, those documents from which the model can learn the most. Annotating these documents allows the model to become more accurate more quickly (active learning). It is therefore recommended to have sufficient unlabelled data in the training module when you trigger a training. You will notice that the documents in a suggested annotation task already contain some suggested annotations to speed up labelling.
If you haven't trained a model yet, Duco Adaptive IDP still bundles unlabelled documents for annotation. However, the resulting task won't include model-assisted labelling or any pre-selection of particularly relevant documents.
By default, the following is enabled:
- Grouping of similar documents
- Documents that add the most value are ranked first (based on document confidence score)
- Model-assisted labelling (predictions are loaded based on the previous training of the model)
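The ranking above is a form of active learning. Duco Adaptive IDP does not document how it scores documents, so the following Python sketch only illustrates the general idea under one assumption: a lower document confidence score means the model can learn more from that document (uncertainty sampling). The `Document` class and `rank_for_annotation` function are hypothetical and not part of the product.

```python
from dataclasses import dataclass


@dataclass
class Document:
    # Hypothetical record; not the product's data model.
    doc_id: str
    confidence: float  # assumed document confidence score between 0 and 1


def rank_for_annotation(docs: list[Document], limit: int) -> list[Document]:
    """Return up to `limit` documents, least confident first (uncertainty sampling)."""
    return sorted(docs, key=lambda d: d.confidence)[:limit]


pool = [Document("a", 0.92), Document("b", 0.41), Document("c", 0.67)]
print([d.doc_id for d in rank_for_annotation(pool, limit=2)])  # ['b', 'c']
```

Sorting by ascending confidence is the simplest uncertainty-sampling strategy; other selection criteria are possible.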
Suggested review tasks
Suggested review tasks help you improve the annotation quality of documents that are already annotated. These documents can either already be in the training set or come from production.
Suggested review tasks for training documents
Only documents that had the status PROCESSED before the last training can be included in this task. Out of those documents, only the ones that are likely to contain annotation errors are part of the suggested tasks.
By default, the following is enabled:
- Grouping of similar documents
- Misannotation hints (based on annotation confidence score): in this section Duco Adaptive IDP will give suggestions about fields or document types that are likely to be misannotated. As a user you still need to validate those hints since they are merely suggestions.
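To make the selection rules for this task concrete, here is a hypothetical Python sketch. It assumes each annotation carries a score expressing how confident the model is that the human label is correct, and it flags anything below an arbitrarily chosen threshold of 0.3. The class names, field names and threshold are illustrative assumptions, not the product's data model.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Annotation:
    field_name: str
    # Hypothetical: model confidence that the human-assigned label is correct.
    label_confidence: float


@dataclass
class TrainingDoc:
    doc_id: str
    status: str
    processed_at: datetime
    annotations: list[Annotation] = field(default_factory=list)


def misannotation_hints(
    docs: list[TrainingDoc], last_training: datetime, threshold: float = 0.3
) -> dict[str, list[str]]:
    """Map each document ID to the fields that look misannotated."""
    hints: dict[str, list[str]] = {}
    for doc in docs:
        # Only documents that were PROCESSED before the last training qualify.
        if doc.status != "PROCESSED" or doc.processed_at >= last_training:
            continue
        suspicious = [a.field_name for a in doc.annotations if a.label_confidence < threshold]
        # Documents without likely errors are left out of the task.
        if suspicious:
            hints[doc.doc_id] = suspicious
    return hints
```

The output is only a list of hints; as described above, a person still has to validate each one.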
Suggested review tasks for production documents
The following documents are included:
- Documents that required human validation and for which the human validation was completed
- Documents that were manually sent to training from production
In both cases, a document is included only if its production status is PROCESSED and its training status is Input required, which means it has not yet been approved as training data. By performing this task, you approve these production documents as training data.
This task resets after each model training because its results would otherwise be outdated, so only documents that were uploaded after the last model training are included.
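The inclusion rules for this task can be summarised as a filter. This Python sketch assumes hypothetical `production_status`, `training_status` and `uploaded_at` fields; the status values PROCESSED and Input required come from the text above, everything else is illustrative.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ProductionDoc:
    doc_id: str
    production_status: str  # hypothetical field name
    training_status: str    # hypothetical field name
    uploaded_at: datetime


def production_review_candidates(docs: list[ProductionDoc], last_training: datetime) -> list[str]:
    """IDs of production documents that would land in the suggested review task."""
    return [
        d.doc_id
        for d in docs
        # Validation finished in production, but not yet approved as training data.
        if d.production_status == "PROCESSED"
        and d.training_status == "Input required"
        # The task resets after each training, so older uploads are excluded.
        and d.uploaded_at > last_training
    ]
```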
By default, the following is enabled:
- Grouping of similar documents
Creating a suggested task
Clicking a suggested task opens a pop-up window.
The documents for the task have already been added; all that is left is to add operators and, if you want, fill in the optional information. Note that you can split a task among several operators to keep each share short.
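One simple way to split a task is sketched below in Python, under the assumption that documents are divided into contiguous, near-equal batches; the product may assign them differently. With 400 documents and four operators, each operator gets 100 documents. The function name and operator names are hypothetical.

```python
def split_among_operators(doc_ids: list[str], operators: list[str]) -> dict[str, list[str]]:
    """Divide documents into contiguous batches whose sizes differ by at most one.

    Hypothetical helper; operator names are assumed to be unique.
    """
    if not operators:
        raise ValueError("assign at least one operator")
    base, extra = divmod(len(doc_ids), len(operators))
    assignment: dict[str, list[str]] = {}
    start = 0
    for i, operator in enumerate(operators):
        # The first `extra` operators take one extra document each.
        size = base + (1 if i < extra else 0)
        assignment[operator] = doc_ids[start:start + size]
        start += size
    return assignment


batches = split_among_operators([f"doc-{n}" for n in range(400)], ["ana", "ben", "chris", "dana"])
print({op: len(ids) for op, ids in batches.items()})
# {'ana': 100, 'ben': 100, 'chris': 100, 'dana': 100}
```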
Custom tasks
When you create a new task with the '+ Task' button, a pop-up window opens.
In this window, you fill in the following fields for the task:
Field | Description
Validation type | Entity type: review or add entity annotations on documents
Languages | Select which languages you want to allow for the task
Filter options | Select a group of documents for checking. These options allow you to apply additional filters, such as the user who did the annotations or specific document statuses.
As a final step, you can calculate how many documents meet these conditions. As with suggested tasks, you can assign operators and fill in optional information.
Working with active tasks
Once a task is created (suggested or custom), it appears in the active tasks overview, which shows all tasks with their current progress.
Each task can be expanded to show the progress for each individual operator assigned to the task.
If you are assigned as an operator to a task, you get an extra link that lets you start or resume the task.
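Per-operator and overall progress can be derived from document counts. This hypothetical Python sketch assumes overall progress is weighted by the number of documents each operator was assigned; the product's exact calculation is not documented, and the function names are illustrative.

```python
def percent(done: int, total: int) -> float:
    """Progress as a percentage, rounded to one decimal; an empty assignment counts as 0%."""
    return 0.0 if total == 0 else round(100 * done / total, 1)


def task_progress(per_operator: dict[str, tuple[int, int]]) -> tuple[float, dict[str, float]]:
    """Return (overall %, {operator: %}) from (done, assigned) counts per operator.

    Assumes overall progress is weighted by the number of documents assigned.
    """
    done = sum(d for d, _ in per_operator.values())
    assigned = sum(a for _, a in per_operator.values())
    return percent(done, assigned), {op: percent(d, a) for op, (d, a) in per_operator.items()}


overall, by_operator = task_progress({"ana": (50, 100), "ben": (100, 100), "chris": (0, 200)})
print(overall, by_operator)  # 37.5 {'ana': 50.0, 'ben': 100.0, 'chris': 0.0}
```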
Document view
When the task is started, the first document will be opened in the labelling view and you can examine the different entities and/or the document type and language.
Task details
The task details give you a high-level overview of the part of the task that is relevant to you. For example, if a task of 400 documents is created and 100 of them are assigned to you, you only see those 100 documents.
The task section displays information about the task:
Field | Description
Progress | Your current progress, shown as a percentage and a progress bar
Source | The source of the documents in the task: training or production
Task type | Review: review documents that already have annotations
Deadline | The deadline by which the task should be completed
Description | The description of the task
Assignee | The assignee of the current task
Documents | Each document is represented by a colored, numbered button. Grey: the document needs to be processed.
Document metadata
The 'metadata' shows all the properties of the document:
Field | Description
Status | Processing: the document is being processed
Type | The document type
Language | The language of the document, either set manually or predicted by the OCR step
Pages | The number of pages in the document
Name | The name of the document
ID | The unique ID of the document, with a copy button
Upload date | The date and time the document was uploaded in Duco Adaptive IDP
Actions | Copy URL: copy the URL for easy sharing with colleagues, or with support when you have a problem
Edit document settings
The document settings allow you to change the name of the document, the language and the document type. Changing the document type will delete all the annotations because the entities are different for each document type.
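The rule that changing the document type deletes all annotations can be illustrated with a small hypothetical model in Python. It assumes that re-selecting the current type is a no-op; the class and method names are illustrative only and do not reflect how Duco Adaptive IDP stores documents.

```python
from dataclasses import dataclass, field


@dataclass
class DocumentSettings:
    # Hypothetical model of the editable document settings.
    name: str
    language: str
    doc_type: str
    annotations: list[str] = field(default_factory=list)

    def change_type(self, new_type: str) -> None:
        """Switch the document type, dropping annotations for the old type's entities."""
        if new_type == self.doc_type:
            return  # assumption: re-selecting the same type changes nothing
        self.annotations.clear()
        self.doc_type = new_type


doc = DocumentSettings("invoice_0042.pdf", "en", "Invoice", ["total_amount", "due_date"])
doc.change_type("Purchase order")
print(doc.doc_type, doc.annotations)  # Purchase order []
```

Annotations are cleared rather than mapped because, as stated above, each document type has its own set of entities.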
Document actions
At the top of the document processing screen you have a number of buttons that help you navigate between documents and uploads.
- Back - In human validation, this button puts the document back in the queue. In the training or production data modules, or in tasks, it takes you back to the overview of uploads, documents or tasks.
- Upload done - This button is only available in human validation. You use this button when you have successfully finished processing all documents from an upload. The upload will then go to the next step 'output' to send the result to your system.
- Done - When you are done with your intervention, you can mark the document as done.
- Park - This button is only available in human intervention and in the tasks module. Use it to park the document and handle it later. A pop-up asks you to enter a reason for parking. The reason reminds you why you parked this specific document and helps when you later ask colleagues or your manager for feedback about it.
- Reject - Use this button if there is a problem with the document. Besides standard errors such as 'bad OCR' or 'irrelevant document', you can define your own errors in the project settings; see Custom errors.
Entities
This list shows each entity defined for this document together with the number of times this entity was found in the document.
Suggestions and misannotations
The list shows the following data:
Field | Description
Type | The type of the suggestion: suggested annotation, missing annotations, wrong indices, wrong label or wrong composite group
Suggested | The suggested entity with its value in the document
Actions | Click on a row: selects the entity in the document view, giving you the option to apply or ignore the suggestion
Suggestion detail
The suggestion detail in the document view allows you to edit, apply or reject the suggested annotation.
The following actions are possible:
- Edit - Allows you to edit an annotation by dragging the start and end cursor
- Apply - Applies the suggestion and adds the annotation
- Reject - The suggestion will be ignored and not added as an annotation
Annotations
The list gives an overview of all entities, with their values, found in the document, whether predicted by a model or added manually by a person. Each entity has a color: entities found by the model are shown in a lighter, transparent shade, and entities added manually by a person in a darker shade.
The list shows the following data:
Field | Description
Name | The name of the entity
Value | The value of the entity
Parsed | The parsed value of the entity
User | The user who did the annotation. The value AI indicates that the annotation was predicted by the model.
Score | The confidence score of entities that were predicted by the model. A red color indicates that the score is lower than the threshold.
Page | The page on which the entity appears
Actions | Click on a row: selects the entity in the document view, giving you actions to perform on the entity
You can click a row in this list to see the value in the document. You can add additional entity values in the document by clicking the first and the last word of an entity, or by selecting it with dragging, and then selecting the correct label in the dropdown menu that will open. More info on how to perform entity annotation can be found in Annotation of training data.
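Selecting the first and the last word of an entity defines a span. This hypothetical Python sketch represents an annotation as a label plus inclusive word indices and accepts the two clicks in either order; the actual internal representation used by Duco Adaptive IDP is not documented, and the label name is illustrative.

```python
def annotate_span(words: list[str], first: int, last: int, label: str) -> dict:
    """Build an annotation from the indices of the first and last clicked words.

    Clicks may come in either order; indices are inclusive.
    """
    start, end = min(first, last), max(first, last)
    if start < 0 or end >= len(words):
        raise IndexError("word index outside the document")
    return {"label": label, "start": start, "end": end, "value": " ".join(words[start:end + 1])}


words = "Invoice number INV 2024 001 due 31-01-2024".split()
print(annotate_span(words, 4, 2, "invoice_number"))
# {'label': 'invoice_number', 'start': 2, 'end': 4, 'value': 'INV 2024 001'}
```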
Checkbox actions
Selecting one or more entities through the checkboxes allows you to perform the following actions:
- Delete annotations - Removes the selected annotations
- Enrich annotation - Can only be used on one annotation at a time and allows you to link an enrichment to the entity.
Document preview
The document preview shows what the document looks like, including all the entities that were found. The preview has two modes:
- Hybrid - the original document is shown
- Formatted - the text is displayed as the OCR model has recognised it
If certain words are not readable in one mode, you can switch to the other. You can add entities in both modes.
It is also possible to enlarge or reduce the display or to open it on a second screen.