Solved

Temp Data Files Purging in Ataccama

  • April 5, 2023
  • 1 reply
  • 117 views


DPM and DPE - Ataccama ONE Gen2 Platform Latest

Once DPE connects to a data source, it creates metadata and, optionally, samples of records that do not satisfy DQ rules.

Could you please answer the following questions or point me to a link that answers them?
1. Does DPE also create other temporary files as a result of processing?
2. What happens to the metadata, the sample records that do not satisfy DQ rules, and any temporary files generated during processing after the connection to the data source is closed?
3. How does purging of data work in Ataccama?

Best answer by anna.spakova

Hello, I hope I can provide some information:

ad 1. Yes, during evaluation (either profiling or DQ), DPE creates temporary files on the server (by default in the /tmp folder; this can be changed). Once the evaluation is over, all these files are deleted.
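
To illustrate the lifecycle described in ad 1, here is a minimal Python sketch of the general pattern: work files go into a configurable temporary location and are removed when the evaluation ends. This is not Ataccama code or configuration. The function name `evaluate_with_temp_files` and the `temp_root` parameter are hypothetical and only stand in for "a temp folder that can be changed".

```python
import os
import tempfile


def evaluate_with_temp_files(rows, temp_root=None):
    """Spool rows to a temporary file, compute on it, then clean up.

    temp_root is a hypothetical setting: None means the system default
    temp location (often /tmp on Linux), otherwise a custom folder.
    """
    with tempfile.TemporaryDirectory(dir=temp_root) as work_dir:
        spool_path = os.path.join(work_dir, "spool.txt")

        # Write the incoming data to a work file.
        with open(spool_path, "w", encoding="utf-8") as f:
            for row in rows:
                f.write(f"{row}\n")

        # The "evaluation": here it is only a line count.
        with open(spool_path, encoding="utf-8") as f:
            count = sum(1 for _ in f)

    # Leaving the with-block deletes work_dir and everything in it.
    return count, work_dir
```

Calling `evaluate_with_temp_files(["a", "b", "c"])` returns the count together with the path of the work folder, which no longer exists by the time the function returns.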

ad 2. Samples of invalid records are stored in MinIO, so they remain in the platform even after the connection is closed. This can be disabled in the configuration of the monitoring project. Metadata (e.g. profiling results, table attributes, aggregated DQ results) is stored in the Ataccama database. The temporary files mentioned in #1 are deleted once the evaluation is finished.

ad 3. What exactly do you mean by this? For example, which specific queries are run against the database? At a very high level, once a job (profiling or DQ) is started, Ataccama DPE runs several SELECT queries against the data source and, as mentioned, creates several temporary files with the data on the DPE server, on which Ataccama then does the computation. So the actual profiling or DQ evaluation happens in DPE, not in the data source. If you need more details, I can ask our engineering team.
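
The flow in ad 3 (query the source, spool the data locally, compute in the engine) can be sketched roughly as below. This is only an illustration under assumptions: an in-memory SQLite database stands in for the data source, the function name `profile_column` is hypothetical, and the statistics computed here are not the ones DPE actually produces.

```python
import csv
import os
import sqlite3
import tempfile


def profile_column(conn, table, column):
    """Pull one column with SELECT, spool it to a temp file, profile it locally."""
    # Sketch only: identifiers are interpolated, so pass trusted names.
    cursor = conn.execute(f'SELECT "{column}" FROM "{table}"')

    fd, path = tempfile.mkstemp(suffix=".csv")
    try:
        # Step 1: copy the query result into a temporary file.
        with os.fdopen(fd, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            for (value,) in cursor:
                is_null = value is None
                writer.writerow(["" if is_null else str(value), "1" if is_null else "0"])

        # Step 2: compute on the local file, not in the data source.
        rows = nulls = 0
        distinct = set()
        with open(path, newline="", encoding="utf-8") as f:
            for value, null_flag in csv.reader(f):
                rows += 1
                if null_flag == "1":
                    nulls += 1
                else:
                    distinct.add(value)
    finally:
        # Step 3: the temporary file is removed once evaluation is done.
        os.remove(path)

    return {"rows": rows, "nulls": nulls, "distinct": len(distinct), "temp_path": path}
```

In this sketch the source only ever sees a plain SELECT; the counting happens on the spooled file, which is deleted in the `finally` block whether or not the computation succeeds.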

Hope this helps :)

Anna


1 reply

anna.spakova
Ataccamer
  • Ataccamer
  • 144 replies
  • Answer
  • April 17, 2023


