Duco normally takes an all-or-nothing approach when matching two fields: the fields either match or they do not. When the Fuzzy match rule is in place, Duco calculates the match score in a way that also accounts for how similar non-matching fields are. For example:
- The field values "Lloyds Banking Group" and "Lloyds Banking Gr." would have a similarity score of 85%.
- The field values "London" and "Singapore" would have a similarity score of 7%.
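To make the idea of a similarity score concrete, here is a minimal sketch that scores two text values between 0 and 100. It uses the `SequenceMatcher` class from Python's standard library as a stand-in. This is an assumption for illustration only: it is not Duco's algorithm, the helper name `similarity_percent` is hypothetical, and the numbers it produces will differ from the percentages quoted above.

```python
from difflib import SequenceMatcher


def similarity_percent(left: str, right: str) -> float:
    """Return a 0-100 similarity score for two text values.

    Stand-in only: difflib's ratio is used for illustration and is
    not the scoring method Duco applies.
    """
    return 100.0 * SequenceMatcher(None, left, right).ratio()


near = similarity_percent("Lloyds Banking Group", "Lloyds Banking Gr.")
far = similarity_percent("London", "Singapore")

# Near-identical values score high; unrelated values score low.
print(f"near-identical names: {near:.0f}%")
print(f"unrelated names:      {far:.0f}%")
```

The point of the sketch is the ordering rather than the exact values: the abbreviated company name scores far higher than the two unrelated city names.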
A word of caution
Example and purpose
Watch this quick video for an overview of Fuzzy Matching.
Let's now look at a practical example.
Imagine you need to match the following data:
Note the items highlighted in red. They match on Price but differ on Name. Duco normally takes an all-or-nothing approach when matching an individual pair of fields: two fields either match or they do not. In this case, the values in the Name fields are "Lloyds Banking Group" and "Lloyds Banking Gr.". Even though the difference is small, Duco considers these values as not matching. Because we are matching two fields ("Name" and "Price") and only one of them matches, Duco assigns a match score of 50% to this match result.
This way of matching often works very well. In this situation, however, the all-or-nothing approach, combined with the fact that we are matching only two fields, gives a result that is too coarse: a near-identical name scores the same as a completely different one. The Fuzzy match rule can help to get better results.
Running through the example
You can run through a fuzzy matching example by using the data provided at the bottom of the page. First, match the fields:
Run the process. You should obtain results like the following:
To add a fuzzy match rule:
- Click on Settings → Rules and Rule sets
- Click Add rule for the "Name" field
- Click Fuzzy match to apply the rule
The Rules and Rule sets screen will show the Fuzzy match rule as follows:
If you run the matching process with the Fuzzy match rule in place, you should get the following results:
Note that the result above now has a score of 92%, whereas it had a score of 50% without the fuzzy matching rule. When a fuzzy match rule is in place, Duco calculates a score that indicates how similar two text fields are. Two fields that are similar get a high score, and two fields that are different get a low score.
- The field values "Lloyds Banking Group" and "Lloyds Banking Gr." have a similarity score of 85%.
- The field values "London" and "Singapore" have a similarity score of 7%.
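The 50% and 92% scores in this example are consistent with a simple average of per-field scores, where an exact match counts as 100, a mismatch counts as 0, and a fuzzy-matched field contributes its similarity score. The sketch below works through that arithmetic. Equal weighting of the fields and the function name `match_score` are assumptions made for illustration; the source does not spell out Duco's formula.

```python
from typing import Sequence


def match_score(field_scores: Sequence[float]) -> float:
    """Average per-field scores (each 0-100) into one match score.

    Assumption: every matched field carries equal weight.
    """
    if not field_scores:
        raise ValueError("at least one field score is required")
    return sum(field_scores) / len(field_scores)


# Without the Fuzzy match rule: Name mismatches (0), Price matches (100).
exact_only = match_score([0.0, 100.0])

# With the Fuzzy match rule: Name contributes its 85% similarity instead of 0.
with_fuzzy = match_score([85.0, 100.0])

print(exact_only)  # 50.0
print(with_fuzzy)  # 92.5
```

Under these assumptions the second value is 92.5, which lines up with the 92% shown in the results if the displayed score drops the fractional part.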
When this is useful
- When matching only a single field. For example, if all you have is a list of names, the fuzzy match rule will find the closest match by text similarity. Without fuzzy matching, any name that is not identical on both sides stays unmatched, so you will very likely end up with a high percentage of unmatched items.
- When data quality is low. The fuzzy match rule can lift a partial match over the match threshold, which leaves you with fewer unmatched items. Unmatched items are difficult to pair manually.
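Both cases above come down to picking the most similar candidate and accepting it only if it clears a threshold. The following sketch illustrates that idea with `get_close_matches` from Python's standard library. It is not how Duco is implemented: the helper name `closest_match` and the threshold of 0.6 are hypothetical values chosen for the example, and the candidate values are taken from the examples in this article.

```python
from difflib import get_close_matches
from typing import Optional, Sequence


def closest_match(
    name: str, candidates: Sequence[str], threshold: float = 0.6
) -> Optional[str]:
    """Return the most similar candidate, or None if none clears the threshold.

    The threshold (0.0-1.0) is a hypothetical value, not a Duco setting.
    """
    matches = get_close_matches(name, candidates, n=1, cutoff=threshold)
    return matches[0] if matches else None


candidates = ["Lloyds Banking Gr.", "Singapore"]

# A slightly different spelling still finds its counterpart.
print(closest_match("Lloyds Banking Group", candidates))

# A value with no similar counterpart stays unmatched (None).
print(closest_match("London", candidates))
```

With an exact comparison, both lookups would come back unmatched. With a similarity threshold, only the value that has no similar counterpart is left for manual review.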