What is Data Prep?
Duco’s Data Prep functionality allows you to feed multiple data sources - of different types and formats - into a single side of a reconciliation. This means end users can perform highly complex data transformations, from multiple sources in different formats, with no ETL necessary.
What are the benefits?
With Data Prep you can combine multiple processes into one, so you don’t need to set up as many reconciliations. This also means you can:
- Investigate your exceptions with reduced overhead
- Minimise any pre-processing or pre-transformation
- Spend less time on maintenance
There are three main scenarios in which multiple input sources for one side of a reconciliation need to be normalised to a single output format:
- Multiple internal systems that do similar things for slightly different data items
(e.g. front office system installations for different asset classes or maintaining multiple sub-ledgers for a general ledger)
- Something that you do with several external parties (e.g. all of the counterparties that you trade swaps with or that you have cash accounts with)
- A system produces related data that should be loaded together, but is produced as separate files (e.g. cash transactions and balances where the balances should be loaded together with the transactions as their opening/closing)
The multi-input feature is not intended to cover scenarios where a user wants to add “more columns” from multiple sources (i.e. enrichment activities such as adding settlement instructions); the platform already supports this using reference data tables. Instead, it is aimed at “more rows” scenarios.
- Up to 100 inputs
A single Data Prep process can deal well with up to 100 different input formats. These can be any of Duco’s in-built formats (e.g. SWIFT MT) or generic files (delimited, Excel). It is also possible to upload a ZIP file containing multiple individual files for the same input.
- Up to 400 columns
Every input (and the normalised output) can support up to 400 columns, regardless of the columns’ data types.
- Single-digit millions of items per day - across all inputs
A Data Prep process can handle up to high single-digit millions of items per day across all of its inputs.
- Processing takes no longer than 10-20 minutes
For 100 inputs of 400 fields each, mapped to 400 output fields, processing single-digit millions of records should take no longer than 10-20 minutes.
- Search, filter and sort interactively on data for a particular day
Select a particular day’s worth of data up-front (e.g. end of day run of the reconciliation) and interactively search, filter and sort
There are three principal components involved when using Data Prep:
- Creation of a Data Prep Process
- Creation of a Snapshot
- Use of a Snapshot in a matching process
The Data prep process is not a matching process. Rather, with this process you are configuring an output, derived from multiple sources, that can then serve as an input for a regular matching process.
Snapshots serve two purposes:
- units of onward consumption for reconciliation processes
- units for browsing data that has been processed by the Data Prep process
This guide will break-down each of these steps, explaining the configuration required and best practices.
Creating a Data Prep Process
Much in the way you would create any other type of process, the first step in creating a Data Prep process involves selecting Create Process from the Process tab.
From the Type drop-down select Data prep process. Then populate the requested details.
Note: Masking is not a supported feature for Data Prep Processes
Configuring Data Inputs
Uploading your data
Next you will be required to populate your Data Inputs.
This can be configured in three ways:
- Select Add inputs and you will be presented with a drop-down listing different file formats. Selecting Generic will then lead to an upload window where you can provide Duco with sample data from which it can identify the file’s type and structure. Note that it is possible to upload multiple files at once.
- Selecting any of the other options from the drop-down will not require the upload of a sample, as these are file formats that Duco has stored templates for (e.g. SWIFT MT).
- Alternatively, you can simply drag and drop your files into the Data Inputs pane. This can be an easy way to bulk upload a large number of inputs.
At this stage you should upload all of the inputs that will be involved in the process.
As you would expect with other process types, Duco automatically identifies the file structure and type. These details can be edited if necessary.
When you manually upload a file that is large and will likely take time to upload, Duco will show the uploading status and provide an option to cancel if you realise you have uploaded the wrong file.
Naming your Inputs
By default Duco will use the title of each respective file/sheet when naming the inputs.
However, these file names may not clearly identify the source. As such, best practice is to rename each input in a way that makes its source clearly identifiable, e.g. AAA Bank, Fenergo, etc. This will become especially important later on when you are analysing the results of the process, as you will be able to determine the source of each row of data.
Remember to hit enter to save your changes as they are made.
Extract file header data for transformation
You can choose to use header data from delimited files in your transformation rules within Data Prep. This, for example, enables you to extract a date or other metadata embedded in the first few lines of your input file, and use a transformation rule to repeat that value for every row of data in the file.
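As a rough illustration of the concept only (this is not Duco’s NRL; the file layout, field names, and parsing below are invented for the example), the following Python sketch extracts a date from a file’s header line and repeats it for every data row:

```python
import csv

# Hypothetical delimited file: the first line carries metadata
# (a statement date), followed by a header row and data rows.
raw = """STATEMENT DATE: 2024-03-29
ref,amount
A1,100.00
A2,250.50
"""

lines = raw.splitlines()

# Extract the date embedded in the first line of the file.
statement_date = lines[0].split(": ")[1]

# Repeat that value for every data row, as a transformation rule would.
rows = [dict(row, statement_date=statement_date)
        for row in csv.DictReader(lines[1:])]

for row in rows:
    print(row["ref"], row["amount"], row["statement_date"])
```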
If you find in the future that your Data Input settings require adjustment, you can choose to reprocess the input. For example, if you've updated the First data row in your file from 2 to 4, reprocessing your input will prompt Duco to pick out field headers in row 3 of your file, and sample values from row 4.
To make any adjustments, simply edit the relevant field to reflect the change, e.g. a different delimiter or a new First data row. This will then enable the Reprocess button.
Note: Existing input fields may be deleted as a result, and you may need to reconfigure your transformation rules.
Process balance as separate output lines when using MT940
If you use a SWIFT MT940 file as one of the inputs for your Data Prep process, you can choose either to output only transaction lines (the default) or to also include balance data as output lines. When balances are included, transaction and balance lines are distinguished by a field called Data type, which you can then use to process balances within Data Prep via the Map balances feature, or to pass them all through to the reconciliation processes.
To view the fields (and accompanying line of sample data) from within a file, either click on the input or check the box next to the file name. This will display the fields from the input. If necessary, these fields can then be edited in the same way you would expect with other process types.
A useful search function is available by clicking on the magnifying glass on the top-right of the pane.
Checking the inputs enables the Delete button, allowing you to remove inputs individually or in bulk.
Map and Transform
The next step after you have uploaded your inputs is to map your fields. This mapping will create your output schema, ie. all of the fields that you want to make available for use in your subsequent matching processes.
Best practice here is to identify from your inputs which one will act as your reference for the output format. This would be the input that is closest in structure to the desired output.
To make this step a little easier, at the Data Input stage you may want to upload your “reference input” first so it will appear at the top of the list.
First you will need to map the desired output fields that will constitute your output schema. Once these have been mapped, you will then map the corresponding fields from the other inputs to these output fields. For example:
Clicking on the name of a file will reveal its fields. You can then begin to check the specific fields you are looking to map as your output fields.
Once you select your field/s click on Map.
The field in the Input section will now be marked as “Mapped”. The Output section will populate with the values from the file.
If you need to modify the field name or data type for your mapped field, you can do so by hovering over the existing field name and then selecting the pencil icon that appears.
Now cycle through the “reference file” selecting all fields that you would like to have mapped. Remember that the output fields you map here can also be derived from any of the other files you have uploaded, not just the same file.
It is possible to bulk select fields when mapping.
As you do this, an overall progress bar at the bottom of the screen charts your mapping progress. The percentage shown is the number of “cells” that have been mapped, divided by the total number of cells (the number of output fields multiplied by the number of inputs).
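The calculation can be sketched as follows (a simple illustration; the function name and the numbers are invented for the example):

```python
# Overall progress score: mapped "cells" as a percentage of
# (output fields x inputs).
def mapping_progress(num_output_fields: int, num_inputs: int,
                     mapped_cells: int) -> float:
    total_cells = num_output_fields * num_inputs
    return 100 * mapped_cells / total_cells

# e.g. 10 output fields across 4 inputs, with 25 cells mapped so far:
print(mapping_progress(10, 4, 25))  # 62.5
```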
Also useful is the Show outstanding button which allows you to toggle between seeing all fields or just those that have not yet been mapped. Selecting this can spare the need to scroll through all of your fields as you perform your mapping.
Mapping your other inputs
Now you can begin to work through your other inputs. When you pick a new file to map, you will see the message Add mapping/rule under each field header. For simple one-to-one mapping between fields check the field from your input and then the corresponding Output Field you would like to map it to. Then select Map.
Once mapped, you will see the underlying cells populate with data, and the name of the input field will appear in blue underneath the Output field name.
You’ll also notice a status bar against each input you are mapping, that indicates how close you are to completing the mapping specifically for that input.
As you work through the fields you are mapping, you may find it useful to click on the Show outstanding button, as this will clear the screen of fields that have already been mapped. Note that any item that has not been mapped across all inputs will qualify as outstanding.
The magnifying glass icon next to Show outstanding can also be useful when working with large numbers of fields, as you can search and isolate a specific field.
Passing cash statement balances for reconciliation
You can also normalise your cash transaction data in your Data prep process, to be used in your cash reconciliation process. Now you can use rules to extract statement balances from your data file and pass them to the cash reconciliation process.
Sometimes your mapping will not be a straightforward one-to-one scenario, but will instead require some sort of data transformation. You can do this by selecting the Add mapping/rule button. This will return a pop-up box where you are able to configure an NRL rule to convert the data, e.g. transforming or normalising it.
If your data is transformed, Duco will evaluate the rule by applying it to the sample values from your file. This will allow you to immediately see whether or not your rule has worked, meaning you can catch mistakes early, rather than having to wait until a run has been completed using the data to identify problems with a rule.
You can easily identify where a rule has been applied, as a label saying Rule is placed under the field name.
Recently, we have added the ability to use the file name as part of the NRL rule. Often there is useful information embedded in the file name submitted to Duco that you may want to use in Data Prep processing. For example, part of the file name may be the trade date that applies to the transactions in that file.
Now you can use the submitted file name as part of your Map & Transform rules or Filter rules in Data Prep.
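As an illustrative analogue only (NRL syntax is Duco-specific, and the file name pattern below is invented for the example), this Python sketch shows the idea of deriving a trade date from a submitted file’s name:

```python
import re
from typing import Optional

def trade_date_from_filename(filename: str) -> Optional[str]:
    """Pull an ISO-format date out of a file name, if one is present."""
    match = re.search(r"\d{4}-\d{2}-\d{2}", filename)
    return match.group(0) if match else None

# The extracted date could then be attached to every transaction
# loaded from that file.
print(trade_date_from_filename("swaps_AAA_Bank_2024-03-29.csv"))  # 2024-03-29
```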
An alternative to mapping your fields individually is to leverage the Quick Map feature. This can be especially useful where you have a large number of inputs to be mapped to a particular field, particularly where the mapping is a simple one-to-one.
Generally users will find that the most efficient way to complete their mapping is to begin with Quick Map to clear out the simple one-to-one matches, and then move to manual mapping for the inputs requiring rules.
To use the feature, first select the field you are looking to map. This will then enable the Quick map button.
Now click on Quick Map. This will return a box where you can search for candidate fields from your other inputs to map against the output field you have selected. All of the fields across all of the other inputs will be available here.
Simply check any of the fields you would like to map from the selection, then either select Map, or Map and next if you want to cycle through other fields.
Note that the Quick map button will not be available if you have multiple fields checked.
You can use NRL rules to filter out records from individual inputs that you do not want to be processed in your data prep process.
Click Filter Rules, then New rule to create a new filter.
You will get a pop-up to choose the input this filter will be applied to. You can choose several data inputs at the same time.
You can then use Natural Rule Language to write the filter rule. Start typing to search for the required NRL operator.
Once the NRL is ready click Apply rule. Your filter will be saved.
Switch on the filter by moving the toggle. You will get a confirmation that the change is saved. Your filter is now active and will be applied to the next submission.
The rule can be deleted when highlighted.
The result control setting contains two separate toggles, allowing you to report filtered items from the main result set or the balance data individually.
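Conceptually, a filter rule is a predicate applied to each record of an input; records that fail it are excluded from processing. The following Python sketch illustrates the idea only (the records and the filter condition are invented for the example, and this is not NRL syntax):

```python
# Sample records from one hypothetical input.
records = [
    {"type": "TRADE", "amount": 100.0},
    {"type": "TEST", "amount": 0.0},
    {"type": "TRADE", "amount": 250.5},
]

# Filter-rule analogue: exclude zero-amount test records.
def keep(record: dict) -> bool:
    return not (record["type"] == "TEST" and record["amount"] == 0.0)

filtered = [r for r in records if keep(r)]
print(len(filtered))  # 2
```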
Best Practice Flow
Many users will find that the flow below is the most efficient way to create their Data Prep processes.
Like other process types, Data Prep offers different levels of access rights.
The table below outlines the permissions allocated to a user/group based on the roles associated with their account.
- View submitted data
- Submit new data
- Change process configuration
- Change process permissions
- Upload sample data for new data inputs
- Resubmit data, delete submissions, replace submissions
Enforcing configuration changes to go through 4-eye approval
Similar to our reconciliation processes, once your Data Prep process goes into production, you can now lock down the configurations so that any changes will need to go through a staging copy and subsequently a 4-eye approval process.
Creating a Snapshot
Snapshots are units of onward consumption for reconciliation processes. Or to put it another way, a snapshot represents a parcel of data that can then be used as an input on a generic two-sided process. This step is taken after you have completed your mappings and any necessary transformation.
However, before we look at the creation of a snapshot, first let’s begin with some orientation for this screen.
The results screen is divided into four tabs:
Pending: Results table showing data that has just been uploaded, but isn’t yet ready to be moved into a matching process. This data will need to be captured as a snapshot before it can be passed into a reconciliation. This tab has additional sub-views that are detailed later in the guide.
Snapshots: A quickview of recently created snapshots, with an option to view all snapshots. Here you can see key information about the snapshots that have been created, such as the number of inputs or the timestamp. You can also drill-down into a view of the actual data parceled together for a given snapshot.
Submissions: List of all files that have been submitted, regardless of whether or not they have been used in a snapshot.
Inputs: Lists your inputs, with the latest file that has been submitted for it. This view can be used to identify late submissions.
Submitting your data manually
To create a Snapshot, exit the Settings menu. This will navigate you to the Results screen, which at this stage will be blank.
Select Submit Data.
This will prompt you to provide submissions either through drag and drop or by upload.
You will then need to associate the files you have submitted with the appropriate input.
Select Submit data. The Results Table will then populate with the data from the input files.
You can view all of your snapshots using our infinite scroll feature, which eliminates the need to page through multiple screens. In addition, when new snapshots become available while you’re browsing, the app will prompt you; you can either ignore the prompt and continue browsing, or refresh to see the latest ones.
After data has been submitted, some other views will also populate. Selecting Submissions on the left of the screen you can view a list of the files you have submitted, complete with a Submission time. Errors will display the number of errors found in a submission, if any. If present, you can click on the number to drill into the detail around the errors.
Errors will list any bad submissions, identifying the offending file or field. An accompanying error message will explain the specific problem for each line item. For more information on errors click here.
Options are available to resubmit, replace or delete submissions. To perform any of these actions, go to the Submissions tab and select the relevant file/s and then go to the Actions button.
Submitting your data automatically
You can also submit files automatically through SFTP to your Data Prep processes. Just as with reconciliation processes, you can set up Submission routing to configure how files submitted to Duco through SFTP should be used in your Data Prep processes. For more on SFTP setup, see the article Automated file submissions via SFTP.
FAQ: What if I adjust the original output schema after I’ve made submissions? Say, for example, you decide to map a new field at this stage. No snapshot has been created yet, so the data is still regarded as being in a pending state. Any change to the configuration will prompt a pop-up message that will present you with the option to either discard the pending data you have already submitted, or to go ahead and create a snapshot.
Creating a snapshot
This data you have just uploaded is not yet ready to be moved into a matching process. First we need to create a snapshot. This can be thought of as a pending parcel of data that will then be passed into a reconciliation.
To create your snapshot manually, select Create snapshot.
This will then update the Snapshot column in the All submissions section to reflect the change. This view can be differentiated from the earlier Submissions tab by the fact that it will show all submissions made, not exclusively ones associated with this one particular snapshot.
The Actions button for this view allows for the option to make a resubmission.
The Snapshots tab will allow you to quickly navigate between your recent snapshots.
This view will list the snapshots, with the number of Inputs shown, the Snapshot date and a Triggered by column that will state whether the snapshot was manually or automatically created.
While scrolling to the right you will find details of how much data was submitted to a Data Prep process and processed per snapshot, helping you to quickly assess if all data have been received, or your filter rules are working as expected.
Clicking on a specific snapshot will take you to a Results view.
Results lets you see the actual data submitted line for line, for each of your inputs. These results represent the parcel of data that will be passed to a matching process.
The Inputs tab lists all of your Inputs, with the Latest submission column identifying the latest file submitted and Submission time providing a timestamp for the submission. This view can be especially useful from a controls perspective, as it can be used to identify outstanding submissions.
A reconciliation run may consume any number of snapshots (including none) from a Data Prep process. A run can be triggered in one of four ways for vanilla reconciliations:
- A snapshot has been created and the other side of the reconciliation already has an outstanding file on their side and so a reconciliation run is triggered.
- One or more snapshots have been created since the last reconciliation run. There is no file outstanding for the other side yet, but one arrives later on.
When that file arrives, a reconciliation run is triggered that will cause all snapshots that have not previously been processed in the reconciliation to be processed.
- A reconciliation run is triggered through a submission window.
(Similar to the above case, all “outstanding” snapshots will be consumed in this case.)
- A run is triggered manually in the reconciliation: This will now have the option of using data from snapshots that have not yet been processed in the reconciliation. This is described in more detail below.
For cash reconciliations, a new snapshot will immediately result in a cash reconciliation submission.
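The consumption model above can be sketched as follows (a simplified illustration, not Duco’s implementation; the data structure is invented for the example) - each run picks up every snapshot not yet processed by the reconciliation:

```python
# Hypothetical snapshot queue: one already consumed, two outstanding.
snapshots = [
    {"id": 1, "processed": True},
    {"id": 2, "processed": False},
    {"id": 3, "processed": False},
]

def trigger_run(snapshots: list) -> list:
    """Consume all outstanding snapshots and return their ids."""
    consumed = [s["id"] for s in snapshots if not s["processed"]]
    for s in snapshots:
        s["processed"] = True
    return consumed

print(trigger_run(snapshots))  # [2, 3]
```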
To automate the creation of snapshots you will need to configure Triggers. Three types of trigger are available:
- Time - a specified day/time
- Completion - when all submissions have arrived
- Submission - every time a submission arrives
These triggers can be used individually, or in combination with each other. An example of a use case could be where a trigger is used to generate a snapshot automatically at a particular cut off time. This could be achieved using a time trigger for, say every week day at 7am. This would generate a snapshot of all submitted data in a pending status up until this time.
Choosing multiple triggers will structure them using an Or condition, allowing a snapshot to be generated under flexible conditions.
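The Or combination can be sketched as follows (a simplified Python illustration; the trigger conditions shown are invented examples, not Duco’s implementation) - a snapshot is created if any configured trigger fires:

```python
from datetime import datetime

def time_trigger(now: datetime) -> bool:
    # Invented example: fire every weekday at 07:00.
    return now.weekday() < 5 and now.hour == 7

def completion_trigger(expected: set, received: set) -> bool:
    # Fire when all expected submissions have arrived.
    return expected <= received

def should_snapshot(now: datetime, expected: set, received: set) -> bool:
    # Multiple triggers are combined with Or: any one firing is enough.
    return time_trigger(now) or completion_trigger(expected, received)

# Friday 07:00, submissions incomplete - the time trigger still fires.
print(should_snapshot(datetime(2024, 3, 29, 7, 0),
                      {"a.csv", "b.csv"}, {"a.csv"}))  # True
```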
To delete a trigger, tick the check box and then click on Delete.
Using your Data prep process
Once configured, a Data prep process can then be selected as a Data Input for your match processes. In this example I have created a generic two-sided process. There is no change to the initial steps of the build, with the usual requirements for a process name, short code, etc. However, where you will see a change is in the Data Inputs step.
In the Format drop-down in Data Inputs you will have an option to select Process, Data prep process.
Selecting Data prep process will then lead to a second step where you specify the Data prep process you would like to use.
This will then take the output schema defined in your Data Prep process and use it as the Data Input for the process. As such, once selected the screen will then show all the Output Fields you defined.
Note that unlike with other Data Input types, the fields here will not be editable. This is because any necessary adjustments will already have been made within the Data Prep process.
Once you have selected both your Data Inputs you can then move on to configuring your match fields. Note that Duco Alpha is not currently available for processes using Data Prep.
Matching the fields is done in the regular manner, and NRL can be applied where necessary.
When it comes to submitting data for a process using Data Prep you will see a number of unique submission options.
This will submit any data you have in pending results, rather than in an existing snapshot. In effect, this option creates a snapshot (assuming the user has the requisite permissions) using that pending data, and then uses it as the submission. This can be particularly useful when setting up your process, or on a run where some additional data has arrived at the end of the day.
This will submit all of the snapshots not yet consumed by the reconciliation. An example of this could be where a number of snapshots have been generated overnight and now you are looking to use them as one consolidated submission.
Here you can identify a specific snapshot you are looking to submit.
Specific snapshot or onwards
This option allows you to pick a particular snapshot and have it submitted along with all snapshots that were created subsequently. A common use case for this submission type is where multiple processes leverage the same Data Prep process, but you want to consume specific snapshots into particular matching processes.
Viewing your results
When viewing the results of a process, it is possible to trace each item back to the constituent Data Prep input the data is derived from, while the Data used gadget on the Overview screen shows the specific snapshot used.