Skip to main content
Question

DQ Rule for missing values


Forum|alt.badge.img+1

I want to find out missing values in the columns. 

Use case, column called colors which has values as [‘ORANGE’,’BLACK’,’BLUE’] and rule is if RED is not part of column values, then Rule should fail and alert that RED color is missing in table. 

Forum|alt.badge.img
  • Data Voyager
  • April 24, 2025

I would create a SQL Catalog Item based on your table, selecting a count of each colour (including values that aren’t listed), something like the following: 


SELECT Color, COUNT(*) AS Count
FROM ColorsTable
WHERE Color IN ('Black', 'Orange', 'Blue', 'Red')
GROUP BY Color;

Then, build a DQ rule on the SQL Catalog Item, where if the count total is 0, record fails, if greater than 0 record passes. That way, if Red has no records, then Red fails. It would require you to know all the acceptable colours, and list them out specifically like I have in the WHERE clause. 

Not sure if there is a better way, there might be something in Data Aggregation rules that could do this maybe? Nothing I’m familiar with though. 


Forum|alt.badge.img+1
  • Data Pioneer
  • April 24, 2025

Thanks ​@bobparry for your suggestion.  I was thinking different way.  

Create Aggregated rule like Maximum (find (‘RED’, <Attribute>), this way, you can if you find the RED anywhere in column values, it will true value. Maximum will return true. If RED doesn't available, all the values will be false. Hence it will return false. 

But I am looking for any better approach where we can provide the list of value to find the same without creating additional catalog items.

Should there be an option to calculate such thing by creating SQL statement under DQ rule section, instead of creating new catalog item for each scenario. 


Reply


Cookie policy

We use cookies to enhance and personalize your experience. If you accept you agree to our full cookie policy. Learn more about our cookies.

 
Cookie settings