Question

Best Way to Set Up Freshness Check as a DQ Rule


Does anyone know how to essentially recreate a freshness check rule with the data quality rules so we can assign the DQ rules to a monitoring project? We are working in Ataccama One Web and do not want to use Ataccama One Desktop currently. 

I have an attribute that holds the date/time when new records are loaded into our table. If we get at least one new record on the current date, then we want every single record in the table to pass. So with one record uploaded on the current date, the monitoring project will pass with 100% data quality. However, if no record is loaded on the current date, then the monitoring project will fail with 0% data quality.

 

What is the best approach to set up a rule like this? Thank you in advance!

We want to use the monitoring project's automation capabilities rather than manually running the freshness check from the knowledge catalog every single day.


7 replies

Lisa Kovalskaia
Ataccamer

@ryan.carpenter hi! You could use the following logic in an aggregation rule with no group by parameter (which means the entire dataset is taken as a single group). Would this work or do you see any caveats?
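For illustration, a rough SQL equivalent of that kind of no-group-by aggregation condition might look like the sketch below. This is not the exact rule expression from the reply above; the attribute and table names are placeholders.

    -- Sketch only: pass when at least one record has a load date equal to today.
    -- load_timestamp and my_table are illustrative placeholder names.
    SELECT
        CASE
            WHEN SUM(CASE WHEN CAST(load_timestamp AS DATE) = CURRENT_DATE
                          THEN 1 ELSE 0 END) >= 1
            THEN 'PASS'   -- at least one record loaded today: whole table counts as fresh (100% DQ)
            ELSE 'FAIL'   -- no record loaded today: monitoring project fails (0% DQ)
        END AS freshness_check
    FROM my_table;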

 


@Lisa Kovalskaia  Thanks for this thorough response. 

This rule logic does work. I am surprised that the solution involves using a validity rule rather than a timeliness one.

Thank you! 

Ryan


Lisa Kovalskaia
Ataccamer

@ryan.carpenter no problem! To clarify the dimensions aspect, I just went with validity by default, since this won’t make any material impact on the evaluation result. You can definitely configure this as a timeliness rule, to get it to properly contribute to the timeliness dimension on the DQ dashboards and to have a different set of valid/invalid result labels on the data.

 


Hi @Lisa Kovalskaia, I also have a similar requirement to check data timeliness. However, I was wondering how the DQ evaluation would work if we use group-by aggregation rules along with other non-aggregation rules (completeness/data format checks) on the same data set at the same time. Here, one rule checks data at the row level whereas the other checks at the table level. Will it not cause any delays when you have a large volume of data, say a few million records? Looking forward to your suggestions here. Thanks!


Lisa Kovalskaia
Ataccamer

@mp_ataccamauser Hi! Aggregation rules certainly take longer to execute, but I'd say a dataset in the lower millions of records is very common, and it's definitely not unusual to combine record-level and group-level rules on one monitoring project or catalog item. Performance would depend on the platform resources (which are scalable within the license parameters), how fast Ataccama can get the data from the source, and how many jobs you're expecting to run in parallel. Overall, I'd suggest running a realistic scenario in a test environment, benchmarking the results, and then figuring out whether there's room to optimize performance from there.

If you had the chance to do some test runs already - are you concerned with anything in particular from your results? 


Yes, @Lisa Kovalskaia,

License and platform resources could be the reason it's taking a few hours for me to run profiling plus DQ checks for aggregation and other non-aggregate validation rules. For now, as a workaround, I am using two data objects: one for table-level aggregations using SQL catalog items with rules applied on top, and the other for row-level DQ checks. I am working with more than 400 million records.
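For illustration, a SQL catalog item for the table-level aggregation side of that workaround might look roughly like the sketch below (schema, table, and column names are assumptions). The aggregation rules then evaluate this single pre-aggregated row instead of the full 400-million-row table.

    -- Sketch only: pre-aggregate load dates so table-level rules evaluate one row.
    -- source_schema.large_table and load_timestamp are illustrative placeholder names.
    SELECT
        CAST(MAX(load_timestamp) AS DATE) AS last_load_date,   -- most recent load date
        COUNT(*)                          AS total_records     -- overall row count
    FROM source_schema.large_table;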


Lisa Kovalskaia
Ataccamer

@mp_ataccamauser What you're doing is a valid option; the only downside I see outright is the additional operational overhead, since you need to maintain an additional SQL CI. Is this a major use case for you, i.e., do you need to create a large number of these SQL CIs for aggregation checks? Depending on your priorities, scaling up the license/DPE might be a good idea. Are you running on-prem or in the Ataccama cloud?

