Over 60 reference data sources to improve your master data quality
CDQ works as a broker for open and premium data: Our CDQ Cloud Platform already uses over 60 selected data sources, such as Google Maps or national commercial registers, to improve your master data quality. With the help of integrated sources, business partner data can be validated, curated, and enriched. This makes incomplete or incorrect records a thing of the past.
Some of our integrated data sources
How to integrate new reference data sources
After determining which external reference data source you want to use, you need to take the following steps:
Start by mapping data fields. Simply put, for each data field you want to use, you must define the corresponding counterpart in your database. For example, UK Companies House provides address information in semantically ambiguous data fields such as “AddressLine1”, which require mapping and transformation into your data model.
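A field mapping of this kind can be sketched as a simple lookup table. This is a minimal illustration, not our production mapping: only “AddressLine1” comes from the example above, while the other source field names and all target field names are hypothetical.

```python
# Hypothetical mapping from source field names to target data-model fields.
# Only "AddressLine1" is taken from the text; every other name is an assumption.
FIELD_MAPPING = {
    "CompanyName": "name",
    "AddressLine1": "address.line1",
    "PostTown": "address.locality",
    "PostCode": "address.postal_code",
}


def map_record(source_record: dict, mapping: dict = FIELD_MAPPING) -> dict:
    """Copy mapped fields into a flat target record; skip unmapped or empty ones."""
    target = {}
    for source_field, target_field in mapping.items():
        value = source_record.get(source_field)
        if value is None:
            continue
        value = str(value).strip()  # trim stray whitespace from the source
        if value:
            target[target_field] = value
    return target


# Made-up sample record for illustration only.
raw = {
    "CompanyName": "EXAMPLE TRADING LTD",
    "AddressLine1": " 10 Example Street ",
    "PostTown": "LONDON",
    "PostCode": "",
    "Unmapped": "ignored",
}
mapped = map_record(raw)
```

Keeping the mapping in a table rather than in code branches makes it easier to review and to adjust when a source renames a field.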
After mapping your data, you need to curate it. For example, if you look at the information from Zefix (Central Business Name Index: https://www.zefix.ch), you can see that the “address” field combines two pieces of information in one value: the street name (in our database, this is “thoroughfare value”) and the house number (in our database, this is “thoroughfare number”).
Some sources mix even more information into a single data field, as you can see in the example of the address field in the European Commission’s VAT database (accessed via VIES: http://ec.europa.eu/taxation_customs/vies).
Our software developers resolve these types of discrepancies with transformation code that delivers the required information in a semantically unambiguous and structured representation.
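To illustrate the curation step, here is a minimal sketch that splits a combined address value into a street name and a house number. It assumes the street name comes first and the number last, as is common in Swiss addresses; the function name and the output keys are hypothetical, and real-world sources need far more rules than this single pattern.

```python
import re

# Street name followed by a house number such as "12", "12a" or "12-14".
_STREET_THEN_NUMBER = re.compile(
    r"^(?P<value>.+?)\s+(?P<number>\d+[A-Za-z]?(?:[-/]\d+[A-Za-z]?)?)$"
)


def split_thoroughfare(address: str) -> dict:
    """Split 'Street 12' into thoroughfare value and thoroughfare number.

    If the pattern does not match, the whole text is kept as the value
    and the number is left empty instead of guessing.
    """
    text = " ".join(address.split())  # collapse repeated whitespace
    match = _STREET_THEN_NUMBER.match(text)
    if match is None:
        return {"thoroughfare_value": text, "thoroughfare_number": None}
    return {
        "thoroughfare_value": match.group("value"),
        "thoroughfare_number": match.group("number"),
    }
```

Values that do not fit the pattern, such as a post office box or an address that starts with the number, are passed through unchanged so that they can be reviewed or handled by a more specific rule.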
To keep integrated data current, you first need to research how often each data source is updated, because this varies widely. The “Global Legal Entity Identifier Foundation” updates its data daily, the Belgian Company Register updates its data weekly, and the Company Register in Norway updates monthly.
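The update frequencies can be recorded per source and used to decide when a new import is due. The following is only a sketch: the source identifiers are hypothetical, and “monthly” is approximated as 30 days.

```python
from datetime import date, timedelta

# Cadences as stated in the text; the dictionary keys are hypothetical identifiers.
UPDATE_INTERVALS = {
    "gleif": timedelta(days=1),
    "belgian_company_register": timedelta(weeks=1),
    "norwegian_company_register": timedelta(days=30),  # "monthly" approximated
}


def is_refresh_due(source: str, last_import: date, today: date) -> bool:
    """Return True once the source's update interval has elapsed since the last import."""
    return today - last_import >= UPDATE_INTERVALS[source]
```

For example, a weekly source that was last imported three days ago is not yet due, while a daily source imported yesterday is.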
Then, you have to find a technical solution for integrating this data. Instead of relying on API connections for our data sources, we have programmed bots that monitor updated dump files (usually provided in various formats such as CSV, JSON, or Excel).
Our expert team programs the bots so that we receive new data as soon as it is published and can provide the latest data to the users of our DQaaS services.
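One simple way such a bot could notice a new dump file is to compare a checksum of the latest download with the checksum from the previous run. This is an assumption for illustration, not a description of how our bots work; the function names and the state file layout are hypothetical, and the download step itself is left out.

```python
import hashlib
import json
from pathlib import Path


def fingerprint(path: Path) -> str:
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def dump_has_changed(dump_path: Path, state_path: Path) -> bool:
    """Return True and store the new checksum if the dump differs from the last run."""
    current = fingerprint(dump_path)
    previous = None
    if state_path.exists():
        previous = json.loads(state_path.read_text(encoding="utf-8")).get("sha256")
    if current == previous:
        return False
    state_path.write_text(json.dumps({"sha256": current}), encoding="utf-8")
    return True
```

A scheduled job could call dump_has_changed after each download and trigger the mapping and curation steps only when it returns True, which avoids reprocessing an unchanged file.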
Using open reference data sources is beneficial. However, making this data usable requires a lot of effort. Additionally, these integrations involve continuous maintenance. We monitor changes to the data as well as the data models and implement these changes accordingly.