
Happy Tuesday Community!

 

We started this week by sharing best practices on Monitoring Projects, and today we’ll continue with how to configure them. If you missed the first post, you can find it here:

Let's explore the best practices for configuring Monitoring Projects to ensure robust data quality checks and anomaly detection.

Select Catalog Items

  1. Go to the Configuration & Results tab.
  2. To add catalog items:
    • For a new project, select Add catalog items.

       

    • To add more items to an existing project, select +Add under Items to Monitor.

       

    • To remove an item, use the more options icon and select Delete.
  3. The listed catalog items will display a summary of:
    • Structure: Aggregated results for structure checks on attributes.
    • Anomalies: Detected anomalies on attributes.
    • DQ Checks: Results of data quality checks assigned to attributes.
    • Overall: Aggregated quality based on assigned DQ checks contributing to overall quality.
    • Individual DQ dimensions, listed with their abbreviations.

For more detailed results or to edit checks, click on a catalog item to access the attribute view. Proceed to configure each item as required.
 

DQ Dimensions

By default, DQ dimensions contributing to overall quality match global settings (see Configuring Data Quality Dimensions). To customize the configuration:

  1. Navigate to the Overview tab.
  2. In the Overall quality contribution widget, select desired dimensions to contribute to overall data quality results for the project.

     

To revert to default settings, select Revert to default.
 

Structure, DQ Checks, and Anomaly Detection

After selecting catalog items, configure each item individually to add checks:

  • Structure Checks: Alert for missing columns or data type changes.
  • Anomaly Detection: AI-powered alerts for potential anomalies.
  • DQ Checks: Assign Data Quality Evaluation rules (see Working with Rules).

For more information on each check, refer to the links provided in the text.
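As a rough illustration of what a structure check looks for, here is a minimal Python sketch that compares an expected schema with an observed one and reports missing columns and data type changes. It is a conceptual example only, not the product's implementation: the function name `structure_check`, the column names, and the type labels are all made up for this sketch.

```python
def structure_check(expected, observed):
    """Compare an expected schema with an observed one.

    Both arguments map column name -> data type name (hypothetical labels).
    Returns a list of human-readable alerts.
    """
    alerts = []
    for column, expected_type in expected.items():
        if column not in observed:
            # The column was expected but is no longer present.
            alerts.append(f"missing column: {column}")
        elif observed[column] != expected_type:
            # The column exists but its data type has changed.
            alerts.append(
                f"type change: {column} {expected_type} -> {observed[column]}"
            )
    return alerts


# Hypothetical schemas, invented for illustration.
expected = {"customer_id": "INTEGER", "email": "STRING", "signup_date": "DATE"}
observed = {"customer_id": "STRING", "email": "STRING"}

alerts = structure_check(expected, observed)
for alert in alerts:
    print(alert)
```

In this sketch, only attributes listed in `expected` are checked, which loosely mirrors the idea of marking individual attributes as mandatory.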
 

Assign Checks

  1. Use the plus icons to enable or add Structure Checks, Anomaly Detection, and DQ Checks.
    • Structure Checks can be enabled on an attribute-by-attribute basis by selecting +Make Mandatory.

       

    • Anomaly Detection can be enabled on an attribute-by-attribute basis by selecting +Enable Detection.
       

       

  2. For DQ Checks, choose either:
    • Assign custom DQ checks using the plus icon under Applied DQ Checks and select rules from the list.

       

    • Use the rule suggestions option to review and accept/reject suggested rules for the catalog item.

       

To remove checks, hover over the applied check in the table and click on the cross icon. Remember to publish changes after implementation.
 

Additional Configuration Options
 

Filter by Attributes

Use attributes as filters to view DQ results for specific data sources. Enable filters for attributes and publish the changes to the project.
 

 

Update Rule References

Keep track of any rules that have been edited in the project. Admins can enable or disable notifications for all users through the core:reference trait. Make sure the required reference strategy is in place before you configure the project.
 


 

Configure Parallel Processing

Global and project-specific parallelization settings allow efficient processing. Configure them through property settings or the web app's advanced settings.

 

By following these best practices, you can ensure an effective and accurate Monitoring Project configuration for maintaining high data quality.

Any questions? Thoughts? Share them in the comments 👇

 

 

Hello Community 🌻 Can anyone advise how the “Checks” count is calculated please? 

Looking at our monitoring projects it doesn’t seem to add up to any of the following so I am confused what it represents 😆

It doesn't appear to be:

  • Number of DQ checks applied
  • Number of attributes with DQ checks applied
  • Number of attributes with invalid rows
  • Number of failed data quality rules (i.e. rules with less than 100% result)
  • Number of invalid rows

Thank you!

UPDATE: I can see it is actually the number of checks applied 🤦🏻 sorry! I was incorrectly double counting some of the checks. Where a check applies to more than one attribute, I was counting it as multiple checks rather than one. For example, I was incorrectly counting this as 4 checks rather than 3, because the 3rd check compares two attributes to each other:
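To illustrate the difference in Python with made-up rule and attribute names (none of these come from a real project, and this only models the counting described above, not how the application stores checks):

```python
# Hypothetical applied checks: each check is listed once,
# together with the attributes it is applied to.
applied_checks = [
    {"rule": "not_empty", "attributes": ["first_name"]},
    {"rule": "valid_email", "attributes": ["email"]},
    # One check that compares two attributes to each other.
    {"rule": "end_after_start", "attributes": ["start_date", "end_date"]},
]

# Counting each applied check once matches the "Checks" count.
checks_count = len(applied_checks)

# Counting once per attribute touched double counts the comparison check.
double_counted = sum(len(check["attributes"]) for check in applied_checks)

print(checks_count)    # 3
print(double_counted)  # 4
```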

 

