
Hi all!

There is a table in the source that contains about 100 million records and is updated every day. Is there any way to profile the table incrementally, so that only the new data is processed (for example, the records that appeared today)? In other words, instead of re-profiling all 100 million rows, roughly 1 million new records would be added on top of the 99 million that have already been profiled.

Hello,

What is the source technology? A relational database? Hadoop? And which version of Ataccama are you using?

For a relational source, I can suggest creating a SQL catalog item (available from 13.6), where you limit the SELECT statement by date, e.g. timestamp > today()-1, so that only the new data is loaded. However, this will always profile only the new data; you will never get the whole dataset from it. You can keep the full table as another catalog item (CI) that you profile only occasionally, e.g. once a month, for things like time series analysis and so on.
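A minimal sketch of what the SELECT of such a catalog item could look like (source_schema.source_table and load_timestamp are placeholder names, and the exact date expression depends on the source database):

SELECT *
FROM source_schema.source_table           -- placeholder: the 100M-row source table
WHERE load_timestamp > CURRENT_DATE - 1   -- placeholder column: only rows loaded since yesterday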

Anna

 


Hello, @anna.spakova 

Source technology: Greenplum (PostgreSQL-based DBMS)
Ataccama version: 13.6.0

Yes, I also considered such options, but the problem is that I always need the complete dataset. What I was hoping for is a way to profile the entire table where the profiling process only checks the new data without discarding the old results. In other words, with 100 million records in the table, profiling would add the 1 million new records on top of the 99 million already covered by the existing Catalog Item.
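For reference, the filtered variant I considered would look roughly like this on Greenplum (PostgreSQL syntax; my_schema.big_table and updated_at are just placeholder names for the table and its change-date column):

SELECT *
FROM my_schema.big_table
WHERE updated_at >= CURRENT_DATE - INTERVAL '1 day'   -- only yesterday's and today's records

but profiling only this catalog item would never cover the full 100 million rows.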

