Solved

Extracting data quality results - detailed failed records

  • September 23, 2022
  • 5 replies
  • 299 views

  • Universe Traveller
  • 4 replies

Hi,

We are using Ataccama ONE version 12.6. We have a project containing 40+ models with 90+ rules. I am looking for a way to get the detailed invalid/failed records for each model, i.e. all the attributes from the model along with the names of the invalid rule instances.

The idea is to get the whole failed dataset so we can automate the remediation process. We already have an automated process for remediating the data using other technologies.

I don’t want to use the “download results as CSV” option, because we are transitioning to the latest version of Ataccama.

I am considering a couple of options:

  1. Extract the detailed failed records from the backend tables using SQL (wherever Ataccama stores the results), or understand how those CSV files are generated in the backend and use the same mechanism to get the data and store it in a database.
  2. Read the data through Ataccama’s built-in APIs and load it into a database. I am going through the GraphQL options. Could you please let me know if there is a way to get similar data through GraphQL? If yes, please share a sample query.

Thanks in advance!

Best answer by Marnix Wisselaar

Hi Paul,

 

Happy to exchange our current solution in a meeting. We are currently on 13.9.1. Let me know.

Current solutions:

  • An MP template which we use to extract the post-processing results (facts table data)
  • Metadata readers which read the metadata in the dimensional tables
  • A workflow which orchestrates the backend

We currently get the detailed results and do the aggregation in SQL and Power BI. We will move to the new steps in 13.9 so that we only get the aggregates.
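As an illustration of that last step, here is a minimal sketch of the kind of aggregation that can be run once the detailed results have landed in a database. The table and column names (dq_results_fact, rule_instance, is_invalid, and so on) are assumptions made for the example; the real names depend on how your post-processing component writes the facts table.

```sql
-- Hypothetical facts table landed by the post-processing component:
--   dq_results_fact(run_ts, catalog_item, attribute_name, rule_instance, record_key, is_invalid)
-- Aggregate invalid counts per catalog item, rule instance, and run date for reporting.
SELECT
    catalog_item,
    rule_instance,
    CAST(run_ts AS DATE)                            AS run_date,
    COUNT(*)                                        AS records_checked,
    SUM(CASE WHEN is_invalid = 1 THEN 1 ELSE 0 END) AS invalid_records
FROM dq_results_fact
GROUP BY catalog_item, rule_instance, CAST(run_ts AS DATE)
ORDER BY invalid_records DESC;
```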

Mail is marnix.wisselaar@stater.nl

Marnix


5 replies

  • Data Pioneer L2
  • 31 replies
  • September 26, 2022

+1



Interesting, let us know how this unfolds. We have a similar challenge.

We have Gen3, and the monitoring projects are really nice, so maybe migrate to Gen2 first. We use a monitoring project for each project; each may contain several catalog items. I’ve created a template (component) for the post-processing into a data quality repository, as we report and process further from there. Applying this component stores the information from the monitoring project in a database.
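To make the “data quality repository” a bit more concrete, below is one possible shape for such a repository table. This is purely an illustration of the idea; the component in your environment defines its own columns and granularity, and these names are assumptions, not Ataccama’s actual schema.

```sql
-- Illustrative DDL for a DQ repository table fed by a post-processing component.
-- All names here are assumptions made for the example.
CREATE TABLE dq_monitoring_results (
    result_id          BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    run_ts             TIMESTAMP    NOT NULL,  -- when the monitoring project ran
    monitoring_project VARCHAR(200) NOT NULL,
    catalog_item       VARCHAR(200) NOT NULL,
    attribute_name     VARCHAR(200),
    rule_instance      VARCHAR(200) NOT NULL,
    records_checked    BIGINT       NOT NULL,
    invalid_records    BIGINT       NOT NULL
);
```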


  • Space Explorer
  • 2 replies
  • February 28, 2023

We are also considering a similar use case for storing DQ results in a backend database. We are using v13.9 and looking to migrate to v14 in the near future. We would like the aggregated metrics as well as the detailed invalid or exception records.

Any advice on the best practices or methods for doing this would be more than welcome.

Thanks




Cansu
  • Community Manager
  • 651 replies
  • March 7, 2023

Hi folks, I second @Marnix Wisselaar’s suggestion. To follow up on that, you could also try using the post-processing plan (per single table) and filtering the rows with an invalid rule (there is a column for that).

From that point, you can continue with whatever action you’d like to take, e.g. writing the records into a SQL database.
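To picture the downstream side: if the post-processing plan writes the per-record results (including the column that holds the failed rule instances) into a staging table, the remediation feed becomes a simple filter. The table and column names below are assumptions about what such a plan might write out, not a fixed schema.

```sql
-- Hypothetical staging table written by the post-processing plan:
--   dq_detailed_results(record_key, catalog_item, invalid_rule_instances, loaded_ts, ...)
-- Push only the failed records into a remediation queue for the downstream process.
INSERT INTO dq_remediation_queue (record_key, catalog_item, invalid_rule_instances, queued_ts)
SELECT record_key,
       catalog_item,
       invalid_rule_instances,
       CURRENT_TIMESTAMP
FROM   dq_detailed_results
WHERE  invalid_rule_instances IS NOT NULL
  AND  invalid_rule_instances <> '';
```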

The GraphQL APIs can’t return per-record data: the full data doesn’t leave DPE processing (except for a small sample of invalid results), and we don’t store it in detail, only the aggregated results.

Getting the data in CSV format is not recommended, since the file can be extremely large. Therefore, we recommend the plan approach (stream-like processing) rather than the GraphQL APIs. Hope this helps!



