Best Practice

A Beginner's Guide to Data Quality: Introduction 101 - Part 1 📈

  • 20 June 2023

Hi everyone!


We are continuing our platform best practice series with Data Quality (DQ)! If you missed the previous posts, so far we've covered Data Governance and Data Observability. Check out the articles below:



This post will provide an overview of the key features of Ataccama ONE Data Quality and help you get started! We will cover different functionalities under DQ in the following days and weeks.


Before diving into the details, we recommend familiarizing yourself with the basic concepts described in our comprehensive guide. Although not mandatory, it will provide a solid foundation for this tutorial. However, you can still follow along without completing all the steps related to Catalog and Glossary.


In this guide, we will primarily focus on two areas of the application: Data Quality and Business Glossary.

  1. Data Quality: This section serves as your central hub for data quality monitoring and rule development. It consists of several subsections:
    • Rules: Create and manage data quality rules.
    • Components: Explore various components related to data quality.
    • Monitoring Project: Evaluate and monitor the data quality of selected catalog items.
    • Lookups: Access reference data for data quality checks.
  2. Business Glossary: This is the centralized storage for all your business terms.


Now, let's familiarize ourselves with the key concepts of data quality that we'll be working with:

  • Rules: Rules help you validate data based on defined conditions. There are two types of rules:
    • DQ evaluation rules: Evaluate the quality of catalog items and their attributes during data quality evaluation. The results include DQ metrics at the catalog item, attribute, and term levels.
    • Detection rules: Identify business domains and automatically assign appropriate business terms to attributes.
  • DQ dimensions: These classify rules based on the implemented logic type. Each rule belongs to a specific DQ dimension. The "Overall quality" metric in DQ results represents the aggregated score of contributing DQ dimensions.
  • Monitoring projects: These projects evaluate the data quality of selected catalog items and monitor it over time. The evaluation includes:
    • DQ checks: These are similar to DQ rules but applied directly to data instead of business terms.
    • Structure checks: Track missing attributes or changes in attribute data types.
    • Anomaly detection: Powered by AI, it alerts you of possible inconsistencies in your data, allowing you to confirm or dismiss the findings.
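To make the idea of a structure check more concrete, here is a minimal Python sketch. It is not how Ataccama ONE implements it; the function name `structure_check` and the `{attribute: data_type}` snapshot format are assumptions made up for illustration. It simply compares a baseline snapshot of a catalog item with the current one and reports missing attributes and changed data types.

```python
from typing import Dict, List, Tuple, Union

Report = Dict[str, Union[List[str], List[Tuple[str, str, str]]]]


def structure_check(baseline: Dict[str, str], current: Dict[str, str]) -> Report:
    """Compare two hypothetical {attribute_name: data_type} snapshots."""
    # Attributes that existed in the baseline but are gone now.
    missing = sorted(name for name in baseline if name not in current)
    # Attributes present in both snapshots whose data type differs.
    type_changes = sorted(
        (name, baseline[name], current[name])
        for name in baseline
        if name in current and baseline[name] != current[name]
    )
    return {"missing": missing, "type_changes": type_changes}


baseline = {"id": "integer", "email": "string", "signup_date": "date"}
current = {"id": "string", "email": "string"}

report = structure_check(baseline, current)
print(report)
```

In this example, `signup_date` is reported as missing and `id` as having changed from `integer` to `string`.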

Now, let's explore how to effectively work with data assets and utilize data quality insights.

Searching for Data Assets

To find the data asset you want to work with, you can use the full-text search, apply filters, or combine both methods. When data quality evaluation results are available, you can include them in your search criteria to quickly find assets with data issues.

For example, you can use the "Data Quality" filter to locate all data assets with an overall quality score higher than 70%.
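If it helps to think about the search in code terms, the combination of full-text search and a quality filter could be sketched as below. This is only an illustration in Python, not the platform's search API; the `Asset` class, the `search_assets` function, and the sample scores are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Asset:
    name: str
    # Overall quality as a percentage (0-100); None means not evaluated yet.
    overall_quality: Optional[float]


def search_assets(
    assets: List[Asset], text: str = "", min_quality: Optional[float] = None
) -> List[Asset]:
    """Keep assets whose name contains `text` and whose score exceeds `min_quality`."""
    results = []
    for asset in assets:
        if text.lower() not in asset.name.lower():
            continue
        if min_quality is not None:
            # Assets without evaluation results cannot satisfy a quality filter.
            if asset.overall_quality is None or asset.overall_quality <= min_quality:
                continue
        results.append(asset)
    return results


assets = [
    Asset("customers", 92.5),
    Asset("customer_orders", 64.0),
    Asset("products", None),
]

high_quality = search_assets(assets, min_quality=70)
print([a.name for a in high_quality])
```

Flipping the comparison would give you the assets below a given score instead, which is usually the list you want when hunting for data issues.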


Exploring Data Quality Insights

Data quality insights provide information about the quality of your data and how it aligns with defined rules. The key metric to focus on is the "Overall quality," which aggregates results from all globally configured DQ dimensions. Each DQ dimension represents a different aspect of data quality, such as validity, accuracy, uniqueness, or completeness.
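As a rough mental model of how an aggregated score can be built from several dimensions, consider the Python sketch below. The aggregation shown here is an assumption for illustration only: a record counts as passing overall only if it passes in every contributing dimension. The function name `overall_quality` and the sample data are hypothetical, so please check the product documentation for the exact calculation.

```python
from typing import Dict, List, Optional


def overall_quality(
    records: List[Dict[str, bool]], contributing: List[str]
) -> Optional[float]:
    """Percentage of records that pass every contributing dimension.

    Assumption: each record is a {dimension_name: passed} mapping, and a
    dimension missing from a record is treated as passed.
    """
    if not records:
        return None
    passed = sum(
        1 for record in records if all(record.get(dim, True) for dim in contributing)
    )
    return 100.0 * passed / len(records)


records = [
    {"validity": True, "completeness": True, "uniqueness": False},
    {"validity": True, "completeness": False, "uniqueness": True},
    {"validity": False, "completeness": True, "uniqueness": True},
    {"validity": True, "completeness": True, "uniqueness": True},
]

# Only validity and completeness contribute in this example.
score = overall_quality(records, ["validity", "completeness"])
print(score)
```

Here two of the four records pass both contributing dimensions, so the sketch returns 50.0. The uniqueness result on the first record does not affect the score because that dimension is not in the contributing list.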

You can access DQ insights from the Knowledge Catalog screen and throughout the platform, specifically on the "Data Quality" tab of catalog items, attributes, or business terms. When viewing catalog item DQ results, you can select a specific attribute to quickly access detailed DQ results.

Configuring Automatic Term Detection

Categorizing your data is an important step in understanding it better. Ataccama ONE provides two methods for term assignment: manual assignment and automatic term detection using detection rules.

Detection rules operate independently from AI-powered term detection, which generates term suggestions. Detection rules allow you to define specific conditions for recognizing business domains, such as verifying value format or referencing a list of values. We will guide you through the process of creating a detection rule in the following section.

The assignment of a term to catalog item attributes based on a detection rule depends on two factors:

  • The rule logic: How the application identifies the attributes to which a term should be applied.
  • The detection threshold: You specify the percentage of attribute values that must fulfill the rule conditions. If the threshold is not met, the term is not assigned.
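The interplay of rule logic and detection threshold can be sketched in a few lines of Python. This is a simplified illustration, not Ataccama's implementation: the email regular expression, the function names, and the choice to count every value (including invalid ones) in the denominator are assumptions.

```python
import re
from typing import Callable, List

# Deliberately simple email pattern, for illustration only.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def matches_email(value: object) -> bool:
    """Rule logic: does the value look like an email address?"""
    return isinstance(value, str) and EMAIL_RE.fullmatch(value) is not None


def should_assign_term(
    values: List[object], rule: Callable[[object], bool], threshold_pct: float
) -> bool:
    """Assign the term only if enough values satisfy the rule."""
    if not values:
        return False
    passing = sum(1 for value in values if rule(value))
    return 100.0 * passing / len(values) >= threshold_pct


values = ["a@example.com", "b@example.org", "not-an-email", "c@example.net"]

# 3 of 4 values (75%) satisfy the rule.
print(should_assign_term(values, matches_email, threshold_pct=70))
print(should_assign_term(values, matches_email, threshold_pct=80))
```

With a threshold of 70% the term would be assigned to this attribute, while a threshold of 80% would leave it unassigned.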

Term suggestions are generated through AI detection. If enabled for a term (default setting), the platform suggests assigning the term to data assets based on similarities with other assets that already have assigned terms. You can then review and approve or reject these suggestions based on their accuracy.

Creating a Detection Rule

To define automatic term detection for a specific term, follow these steps:

  1. Create a rule, define the rule implementation logic (conditions that the records are evaluated against), and test the rule.
  2. Map the rule to the term.

To create a new rule, go to the "Data Quality" section, navigate to the "Rules" tab, and click "Create."


Enter a name and description for the rule, and specify the rule owner. You can use an existing detection rule as a template by duplicating it and making necessary modifications.

Configure the detection logic of the rule from the "Implementation" tab. You can add multiple conditions, and a record must pass all the conditions by default. However, you can also use the "Advanced expression" mode for a less strict logical operator.
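The difference between the default behavior (all conditions must pass) and a less strict operator can be illustrated with plain Python. This is a stand-in for the idea only and does not use the platform's expression syntax; the three condition functions are hypothetical examples.

```python
from typing import Callable, List, Optional

Condition = Callable[[Optional[str]], bool]


def not_empty(value: Optional[str]) -> bool:
    return value is not None and value.strip() != ""


def has_at_sign(value: Optional[str]) -> bool:
    return value is not None and "@" in value


def max_length_50(value: Optional[str]) -> bool:
    return value is not None and len(value) <= 50


def passes_all(value: Optional[str], conditions: List[Condition]) -> bool:
    """Default behavior: every condition must hold (logical AND)."""
    return all(condition(value) for condition in conditions)


def passes_any(value: Optional[str], conditions: List[Condition]) -> bool:
    """Less strict alternative: one condition is enough (logical OR)."""
    return any(condition(value) for condition in conditions)


conditions = [not_empty, has_at_sign, max_length_50]

print(passes_all("user@example.com", conditions))
print(passes_all("no-at-sign", conditions))
print(passes_any("no-at-sign", conditions))
```

The value "no-at-sign" fails under AND because one condition is not met, but passes under OR because it is non-empty and short enough.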

You can leverage attribute "Pattern Analysis" results to create a detection rule quickly. For example, navigate to the catalog item "customers," open the "Profile & DQ Insights" tab, select the "email" attribute, check the available patterns, and use them in rule creation.
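If you have not worked with pattern analysis before, the general idea is to reduce each value to a character-class mask and count how often each mask occurs. The Python sketch below shows the concept only; the mask notation (`L` for letters, `D` for digits) and the function names are made up for this example and are not necessarily what the platform displays.

```python
from collections import Counter
from typing import Iterable


def value_pattern(value: str) -> str:
    """Replace digits with 'D' and letters with 'L'; keep other characters."""
    mask = []
    for ch in value:
        if ch.isdigit():
            mask.append("D")
        elif ch.isalpha():
            mask.append("L")
        else:
            mask.append(ch)
    return "".join(mask)


def pattern_frequencies(values: Iterable[str]) -> Counter:
    """Count how many values share each pattern."""
    return Counter(value_pattern(value) for value in values)


values = ["AB-123", "CD-456", "EF-78"]
frequencies = pattern_frequencies(values)
print(frequencies.most_common())
```

The most frequent pattern is a natural starting point for a detection rule condition, and the rarer ones often point at values worth a closer look.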


Test your detection rule to ensure it functions as intended before sharing it with other users. Once verified, publish the rule or submit it for publishing if you lack the necessary permissions.


Finally, map the published detection rule to the corresponding business term. In the "Business Glossary" section, locate and open the term, navigate to the "Settings" tab, and add the detection rule. Specify the detection threshold, which determines the percentage of attribute records that must pass the rule for the business term to be assigned.

Save your changes and publish them so that they take effect.

We hope this guide has given you a solid foundation for getting started with Ataccama ONE Data Quality. If you have any questions, feel free to ask in the comments 👇


2 replies


Hi, is there any possibility to associate specific rules to specific attributes inside a business term? I can only select rules to apply for all the attributes of the respective business term. This is something I can do at catalog item level, I mean, associate a particular rule with a particular field.

For example, if I include a lot of attributes in a particular business term, not all of these attributes need completeness or validity checks, but the Settings section of the business term does not allow me to select which rule to apply attribute by attribute... I can only select rules for all the attributes.
Thanks in advance


Hi @GastonQ, at the moment, the rule will be applied to all attributes where the term is mapped and you cannot select specific attributes to map to from there.

If there is a rule running on an attribute that you don't believe is appropriate, then you can go to that specific catalog item and pause the DQ rule, preventing the DQ rule from running on that specific attribute.

Hope this helps!
