The following document describes the recommended steps for ensuring more effective data parsing when using fixed width files in Duco.
A fixed width file is a text file where the data is organised into rows and columns, and each column has a fixed width in characters. The file does not use delimiters to separate the fields, instead using padding characters to fill in the remaining space.
These files may, though not always, be produced to ensure backward compatibility with older legacy systems and standards.
File extension of such files could be .DAT, .TXT, no extension at all - and is generally not relevant.
Often, such files contain header and footer rows in a different format, that denote file characteristics and integrity checks such as number of rows, date produced, etc. Also, typically, there are no column names, and instead the files come with an accompanying documentation that describes the data and its relative positions within the file.
It is essential that this accompanying documentation is located before attempting to parse this file in Duco, and within the document, the section is identified that describes the data records, and not headers and footers.
This documentation could include field position, length and format:
Or, it could be just field order, length and type - leaving the user with the need to figure out the position :
If this is the case, you can open the file in an advanced editor such as Notepad++ or BBEdit and identify the position by placing the cursor at the start of the value:
Duco Data Inputs screen
We recommend using Data Prep for handling fixed width files. It makes subsequent handling of such files in the downstream processing easier, since it creates easy-to-handle output columns. It is also possible to use Fixed Width files directly in reconciliations. In that case, the NRL to extract the values would have to be done in the Matching Fields screen.
When attempting to parse such files in Duco, it will identify it in as a single-column comma-delimited file (comma is the default if no delimiter is found).
In Data Inputs, it is recommended to do the following:
- Replace the file name with the appropriate meaningful name for the input
- Rename the column (in this example, “H”, to a name that actually describes it: we usually use “FWRow”)
- Change the delimiter from comma to something less common, like a pipe character. That avoids Duco interpreting commas that may appear in your data as a delimiter, and splitting your fixed width row into multiple columns. Having a variable number of columns would lead to errors when loading such files.
- Remove the default quote character (avoids similar issues with quotes that may be in the rows)
- If needed, change First Data Row to where the body of your text actually starts
Data prep Map-and-Transform screen
Note: If you are ingesting this type of file into a reconciliation directly, you would use similar NRL in your process’ Match Fields screen
Background reading:
Configuring rules using Natural Rule Language (NRL)
Text fields:
These are the simplest: use Text portion of [Field] From-To Position NRL function, e.g.
Set to text portion of FWRow from 2 to 10
Numeric fields:
You may need to convert your text to a number. Sometimes, if decimal point is not provided in the file, you may need to obtain it by dividing by 100:
Set to first number embedded in text portion of FWRow from 12 to 16 / 100
Date fields:
You will need to convert your text to a date using Set to Date in <something> in format [format].
If, for example, the date is contained in 8 characters starting at position 179, like 20171005 in this example:
It would be:
Set to date in text portion of FWRow from 179 to 186 in the format yyyyMMdd
Data prep provides an extra bonus of near-instant validation of your NRL:
’
If your NRL rule returns blank, and you expect there to be data, it means that you probably did not pick up the correct position range, and the substring you extracted does not match the date format (e.g. yyyyMMdd) you defined. For troubleshooting purposes you may want to create a Text field and use From-To Position NRL to ensure you extract correct substring
Conclusion
The legwork associated with mapping a fixed width file should be a one time lift. There will only be a need to revisit the configuration if future changes or updates occur in the file, in which case you can easily adjust your rules to continue to parse out the necessary data points.