Surprisingly simple: Data profiling as a tool to tackle the dirty data problem

Mick Gouldson describes his recent experience of how foryouandyourcustomers in Melbourne applied methods to visualise compliance with data validity requirements, easing and speeding up the mass onboarding of acquired products.

As companies seek to grow through the integration of their supply chains or by mergers and acquisitions, they inevitably face the problem of integrating product data from various sources with varying degrees of data integrity. The problem is made worse by a complex set of data models and business rules that companies enforce before the successful integration of digital product information can be achieved.

The world is seemingly getting smaller. Organisations, however, seek to grow through the consolidation, acquisition and integration of similar but unique businesses.

A typical strategy for large enterprise organisations is to integrate these businesses into their core systems and business processes in an attempt to optimise the operations and ultimately reduce overheads. Whilst each of the acquired businesses may present a unique offering to the market, the enterprise will seek to consolidate those operations that are not contributing to the differentiation. This means consolidating and integrating the data that supports these businesses. Product Master Data is some of the most complex to deal with in this respect.

Product data brings unique challenges, as each of the acquired businesses will no doubt take a different approach to defining process, data completeness and data quality. Generally, the more an enterprise grows, the more control it needs to apply in formalising processes and uplifting product data quality. That in turn helps to prevent chaos and allows the enterprise to meet its organisational standards.

To achieve the required level of control, a data cleansing project is inevitably launched. In my time in business and IT consulting, I have seen these projects play out in various ways:

 –  The IT team are made responsible – this rarely goes well, especially when the IT team might take a technical and non-product view of the issues.

 –  The Category / Product team are given the task of “fixing” the data – this may result in ad-hoc cleansing attempts, with poor planning and poor visibility of where the problems in the data are.

The approach that has proven to work best is to make the Category team responsible and give it the right support from IT to understand where to focus its energy. That way the team can effectively plan its attack and use the tools available to identify the most common issues.

The Project

Recently, I had the opportunity to work with an enterprise-sized business in the Automotive Aftermarket space in Europe. They have a strategy of Master Data consolidation across their various brands, and their Master Data Management platform of choice is Stibo STEP, supporting SAP as their ERP.

“The data profile functionality provides detailed overviews of data in specific branches of the hierarchy in the tree. Each profile contains information about the profiled data and provides easy access to correcting data errors.”

– from the Stibo help documentation.

At any one time, there are at least 2 or 3 Product Data Migration projects running as a result of mergers or acquisitions. Each of these projects involves unique products from various product groups, such as clutch, suspension or braking products.

Each of these products has its own data completeness and quality requirements. To add to the complexity, a significant number of business rules and conditions had to be passed before the products could be successfully migrated into SAP.

The Problem

When foryouandyourcustomers became involved, the Product Managers were struggling with data quality. They were trying to clean up data but did not have any visibility of where the problems lay. They would fix one problem, only to run into another problem in the data.

Their trial and error approach was inefficient and cumbersome, and they had no understanding of how much longer data cleansing would take. Taking a step back to focus on the big issues was not possible. What the Product Managers needed was an insight into where to spend their time and how to plan their data cleansing approach.

What we did

Stibo STEP is quite advanced in its Data Profiling capabilities. We decided to use them to provide basic dashboards, reports and insight into where the issues in the data lay. This would help the business to plan and report on progress.

Specifically, we implemented Attribute Value profiling, which provides insight into frequent and rare values, common patterns, maximum length, and attribute usage across product groups. We took each of the business rules, conditions and validations that are executed when a user attempts to approve a product, and wrote the output of each rule into multi-valued attributes in a new attribute group stored on the products. Fortunately, we had a great base to work from: the business rules had highly detailed error messages and error codes, and the whole lot was well documented from a previous project.
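To make the idea of Attribute Value profiling more concrete, here is a minimal sketch in plain Python. It is not the STEP implementation or its API; the function names, the pattern notation (letters become "A", digits become "9") and the sample part numbers are all assumptions made for illustration.

```python
from collections import Counter


def value_pattern(value: str) -> str:
    """Reduce a value to its shape: letters become 'A', digits become '9'."""
    return "".join(
        "A" if ch.isalpha() else "9" if ch.isdigit() else ch
        for ch in value
    )


def profile_attribute(values, top_n=3):
    """Summarise one attribute across a list of products (None or '' = not populated)."""
    populated = [v for v in values if v not in (None, "")]
    counts = Counter(populated)
    return {
        # Share of products that have the attribute populated at all.
        "usage": len(populated) / len(values) if values else 0.0,
        "max_length": max((len(v) for v in populated), default=0),
        "frequent": counts.most_common(top_n),
        # Values that occur exactly once are often typos or one-off formats.
        "rare": sorted(v for v, n in counts.items() if n == 1),
        "patterns": Counter(value_pattern(v) for v in populated).most_common(top_n),
    }


# Hypothetical part numbers, including one gap and one odd format.
part_numbers = ["BP-1001", "BP-1002", "BP-1002", None, "bp1003"]
profile = profile_attribute(part_numbers)
print(profile)
```

In this made-up sample, the pattern count is what exposes the outlier: three values share one shape and a single value has another, which is the kind of signal a Product Manager can act on without reading every record.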

Once the business rules had run and the attributes were populated, we executed the data profiling over these attributes and products. We added the required pages to the WebUI to integrate the solution into the business processes. The result was a neat “Out of the Box” dashboard, available on each product group, that allowed the Product Managers to see what types of errors were present on their products and which were the most prevalent.
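The mechanism of writing rule output onto the products and then profiling it can be sketched roughly as follows. This is an illustration only, written in plain Python rather than STEP business rules; the rule names, error codes, field names and sample products are hypothetical, and the `validation_errors` list merely stands in for the multi-valued attribute.

```python
from collections import Counter

# Hypothetical validation rules: each returns True when the product passes.
RULES = {
    "E001 Missing gross weight": lambda p: p.get("gross_weight") is not None,
    "E002 Missing EAN": lambda p: bool(p.get("ean")),
    "E003 Description longer than 40 characters":
        lambda p: len(p.get("description", "")) <= 40,
}


def collect_errors(product):
    """Run every rule and return the messages of the ones that fail."""
    return [message for message, passes in RULES.items() if not passes(product)]


def profile_errors(products):
    """Count how many products carry each error, per product group."""
    by_group = {}
    for product in products:
        # Stand-in for the multi-valued attribute written onto the product.
        product["validation_errors"] = collect_errors(product)
        group_counts = by_group.setdefault(product["group"], Counter())
        group_counts.update(product["validation_errors"])
    return by_group


# Made-up sample data; the EAN values are placeholders, not real codes.
products = [
    {"id": "P1", "group": "Braking", "ean": "0000000000017",
     "gross_weight": 1.2, "description": "Brake pad set"},
    {"id": "P2", "group": "Braking", "ean": "",
     "gross_weight": None, "description": "Brake disc"},
    {"id": "P3", "group": "Braking", "ean": "",
     "gross_weight": 0.8, "description": "Brake hose"},
    {"id": "P4", "group": "Clutch", "ean": "0000000000024",
     "gross_weight": None, "description": "Clutch kit"},
]

dashboard = profile_errors(products)
for group, counts in dashboard.items():
    print(group, counts.most_common())
```

Because each error is stored as an ordinary attribute value, the frequency count per product group is all that is needed to rank the errors by prevalence.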

This solution also provided the capability to select a particular type of error and to filter and search for products based on that error. The Product Managers could then create targeted actions to research or procure the data that needed to be populated, either by exporting the products that needed specific remediation or by running bulk updates to solve the specific issue. The data profile dashboard also allowed the Product Team to track progress on remediation and formulate a plan of attack, something that was simply not possible previously.
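As a rough illustration of the filter-and-export step, the sketch below selects the products carrying one error code and writes them to CSV text for offline remediation. It is plain Python under stated assumptions, not the STEP export function: the field names, error codes and sample records are hypothetical.

```python
import csv
import io


def products_with_error(products, error_code):
    """Return the products whose error list contains the given code."""
    return [p for p in products if error_code in p["validation_errors"]]


def export_for_remediation(products, columns):
    """Write the selected products to CSV text for offline correction."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=columns,
        extrasaction="ignore",  # skip fields that are not in the export layout
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(products)
    return buffer.getvalue()


# Hypothetical products with their error attribute already populated.
products = [
    {"id": "P1", "group": "Braking", "ean": "", "validation_errors": ["E002"]},
    {"id": "P2", "group": "Braking", "ean": "0000000000017", "validation_errors": []},
    {"id": "P3", "group": "Clutch", "ean": "", "validation_errors": ["E001", "E002"]},
]

missing_ean = products_with_error(products, "E002")
print(export_for_remediation(missing_ean, ["id", "group", "ean"]))
```

The export contains only the columns someone needs to fill in, so the file can go straight to whoever is sourcing the missing values.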

Conclusion

Whilst what we implemented in STEP was quite simple, the business value created was significant. With this solution in place, the Product Managers and their teams can now:

 – Understand the validity of their product data

 – Find systemic problems in their product data

 – Plan and measure the progress of their data cleansing projects

 – Forecast throughput and timelines.
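One simple way to turn the dashboard counts into a forecast is to record the number of open errors at regular intervals and extrapolate from the average rate of fixes. The sketch below assumes weekly snapshots and a constant throughput; the function name and the figures are hypothetical and do not come from the project.

```python
import math


def forecast_weeks_remaining(open_errors_by_week):
    """Estimate weeks left from weekly snapshots of the open error count."""
    if len(open_errors_by_week) < 2:
        raise ValueError("need at least two snapshots")
    fixed = open_errors_by_week[0] - open_errors_by_week[-1]
    throughput = fixed / (len(open_errors_by_week) - 1)  # errors fixed per week
    if throughput <= 0:
        return None  # no net progress yet, so no forecast
    return math.ceil(open_errors_by_week[-1] / throughput)


snapshots = [1200, 1050, 930, 780]  # hypothetical weekly totals of open errors
print(forecast_weeks_remaining(snapshots))
```

A straight-line estimate like this is crude, since the easy errors tend to be fixed first, but it gives the team a number to plan against and to revise each week.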

The solution was so well accepted that it has now also been adopted for general product maintenance and updates, in addition to the original data migration focus. This approach and solution is yet another example of foryouandyourcustomers demonstrating one of our core values: “Surprisingly Simple”.