Hello,
what is the source technology: a relational database, Hadoop, or something else? And which version of Ataccama are you using?
If it's a relational database, I would suggest creating a SQL catalog item (available since 13.6) whose SELECT statement filters by date, e.g. timestamp > today() - 1, so that only the new data is loaded. Keep in mind that such a catalog item will only ever profile the new data; you will never get the whole dataset from it. For that, you can keep a separate catalog item on the full table and profile it less often, e.g. once a month, for time series analysis and similar use cases.
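For illustration, here is a minimal sketch of the two catalog item queries in Greenplum (PostgreSQL syntax), where the date filter is written with current_date. The table sales.transactions and its load timestamp column load_ts are hypothetical placeholders; replace them with your own table and the column that marks when a row was loaded.

```sql
-- Hypothetical names: sales.transactions, load_ts.

-- Incremental catalog item: only rows loaded since the start of yesterday.
-- Profiling this one covers the new data only.
SELECT *
FROM sales.transactions
WHERE load_ts >= current_date - INTERVAL '1 day';

-- Full catalog item: the whole table, profiled less often (e.g. monthly).
SELECT *
FROM sales.transactions;
```

Using >= with a date boundary rather than a rolling now() - 1 day keeps each daily run aligned to calendar days, so consecutive runs neither skip nor double-count rows loaded near the run time.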
Anna
Hello, @anna.spakova
Source technology: Greenplum (a PostgreSQL-based DBMS)
Ataccama version: 13.6.0
Yes, I had also considered such options, but the problem is that I always need a profile of the complete dataset. So I was hoping there is a way to profile the entire table where each profiling run processes only the new data while keeping the results for the old data. For example, with 100 million records in the table, a run would add the profile of the 1 million new records to the existing catalog item's profile of the other 99 million records.