Hi folks,

We have a requirement to run DQ evaluation on a table with billions of records (more than 10 billion). We have to perform various checks on this table, including aggregations for uniqueness, completeness, data format, etc. Making a subset of the data doesn't help reduce the volume to an executable range either, as each subset still counts in the billions.

What would be the ideal way to do DQ on such tables? Your inputs are highly appreciated. Thanks!

@anna.spakova @Lisa Kovalskaia any suggestions here?

Hi @mp_ataccamauser,

Thank you for your question. If you need the whole dataset to perform the DQ checks, I don't think there is much that we can do.

You don't mention the data source technology here, but in case you don't need the invalid-records export, you might want to consider pushdown processing.

Another option I can think of is to prepare at least some of the more performance-demanding operations in your source system instead of in Ataccama; the aggregations in particular will always be slower in Ataccama, I think. For example, if you are checking uniqueness, you can do the computation in the source system and create an additional column that stores, e.g., the size of the group for each record. Ataccama would then just evaluate this column with a simple DQ check (> 1 → INVALID), as sketched below.
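A minimal sketch of that idea, assuming a SQL source; the table and column names (customer, business_key) are placeholders:

```sql
-- Precompute the per-record group size in the source system so that
-- Ataccama only needs a trivial rule on GROUP_SIZE (> 1 -> INVALID).
CREATE OR REPLACE TABLE customer_dq AS
SELECT
    t.*,
    COUNT(*) OVER (PARTITION BY business_key) AS group_size
FROM customer AS t;
```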

Another question could be: is partitioning possible? Again, it depends on the technology and the use case. You mention that you don't see a way to make subsets, so for the aggregations it is probably not an option. But those could be done in the source, and then in Ataccama you could split the data into smaller chunks, or even try the whole dataset with just the simple checks. The question could also be what the data actually is: billions of records that need to be unique is quite interesting.
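If chunking does turn out to be viable, one hedged sketch is to derive a stable bucket from the key and let each run process one slice. This assumes, for example, a Snowflake source; the bucket count and the key name are also assumptions:

```sql
-- Split the table into 10 stable buckets and run the simple checks
-- once per bucket (0..9). HASH() here is Snowflake's 64-bit hash.
SELECT *
FROM customer_dq
WHERE MOD(ABS(HASH(business_key)), 10) = 0;  -- bucket 0 of 10
```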

Btw in general, Ataccama should still handle it, but make sure that the DPE has sufficient tmp storage and disk space.

Perhaps Lisa will suggest some more options.

Kind regards,

Anna


Thanks for the quick response @anna.spakova. We are using a Snowflake database in this scenario. Also, uniqueness is mainly checked for a combination of multiple fields (6 fields, to be precise) being unique under a certain condition. Do you think it would help if we took only the DQ fields in the source DB and created an intermediate table/view that is pulled in as a catalog item for DQ in Ataccama?
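For reference, the view we have in mind would be shaped roughly like the sketch below; all object names, the six key fields, and the filter condition are placeholders:

```sql
-- Keep only the fields needed for DQ, and precompute the group size
-- for the 6-field uniqueness check. The WHERE clause stands in for
-- the condition under which uniqueness must hold.
CREATE OR REPLACE VIEW orders_dq_v AS
SELECT
    f1, f2, f3, f4, f5, f6,
    other_dq_field,
    COUNT(*) OVER (PARTITION BY f1, f2, f3, f4, f5, f6) AS group_size
FROM orders
WHERE status = 'ACTIVE';
```

Ataccama would then catalog orders_dq_v and evaluate a simple group_size > 1 rule.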