Solved

Extracting data quality results - detailed failed records

  • 24 September 2022
  • 5 replies
  • 217 views

Userlevel 1
Badge +1
  • Universe Traveller
  • 4 replies

Hi,

We are using Ataccama ONE version 12.6. We have a project containing 40+ models with 90+ rules. I am looking for a way to get the detailed invalid/failed records for each model, i.e. all the attributes from the model along with the names of the invalid rule instances.

The idea is to get the whole failed dataset so we can automate the remediation process. We already have an automated process to remediate the data using other technologies.

I don’t want to use the "download results as CSV" option, because we are transitioning to the latest version of Ataccama.

I am looking at a couple of options:

  1. Extract the detailed failed records from the backend tables (wherever Ataccama stores the results) using SQL, or understand how those CSV files are generated in the backend so we can get the data the same way and store it in a database.
  2. Read the data through Ataccama's built-in APIs and load it into a database. I am going through the GraphQL options. Could you please let me know if there is a way to get similar data through GraphQL? If yes, please share a sample query. (A rough sketch of this pattern follows below.)
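
For illustration, here is a minimal sketch of what option 2 could look like in Python. Everything Ataccama-specific in it (the endpoint path, the query shape, the field names) is a hypothetical placeholder, not the real API; as the accepted answer below explains, the GraphQL API only exposes aggregated results plus a small sample of invalid records.

```python
# Minimal sketch of option 2: pull results over HTTP and land them in a database.
# Everything Ataccama-specific here is HYPOTHETICAL: the endpoint path, the
# GraphQL query, and the field names are placeholders, not the real API.
import sqlite3

import requests

BASE_URL = "https://ataccama.example.com/graphql"  # hypothetical endpoint
TOKEN = "..."  # auth token, obtained however your deployment issues them

# Hypothetical query shape: aggregated rule results for one monitoring project.
QUERY = """
query ($projectId: ID!) {
  monitoringProject(id: $projectId) {
    ruleResults { ruleName attributeName invalidCount validCount }
  }
}
"""

def load_results(project_id: str) -> None:
    resp = requests.post(
        BASE_URL,
        json={"query": QUERY, "variables": {"projectId": project_id}},
        headers={"Authorization": f"Bearer {TOKEN}"},
        timeout=30,
    )
    resp.raise_for_status()
    rows = resp.json()["data"]["monitoringProject"]["ruleResults"]

    # SQLite stands in for whatever target database you load into.
    con = sqlite3.connect("dq_results.db")
    con.execute(
        "CREATE TABLE IF NOT EXISTS rule_results "
        "(rule_name TEXT, attribute_name TEXT, invalid_count INT, valid_count INT)"
    )
    con.executemany(
        "INSERT INTO rule_results VALUES (?, ?, ?, ?)",
        [(r["ruleName"], r["attributeName"], r["invalidCount"], r["validCount"])
         for r in rows],
    )
    con.commit()
    con.close()
```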

Thanks in Advance!


Best answer by Marnix Wisselaar 6 March 2023, 10:45


5 replies

Userlevel 2
Badge +3

+1

Userlevel 4
Badge +3

Interesting; let us know how this unfolds. We have a similar challenge.

We have Gen3, and the monitoring projects are really nice, so maybe migrate first to Gen2. We use one monitoring project per project; each may contain several catalog items. I’ve created a template (component) for post-processing into a data quality repository, as we report and process further from there. Applying this component stores the information from the monitoring project in a database.
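
For readers wondering what such a repository might look like, here is a minimal sketch of the kind of fact and dimension tables a post-processing component could populate. The schema is hypothetical and purely illustrative, not the actual template:

```python
# Hypothetical star-schema sketch of a small DQ repository: one fact row per
# (catalog item, attribute, rule, run). All table and column names are
# illustrative only; the real component's schema will differ.
import sqlite3

DDL = """
CREATE TABLE IF NOT EXISTS dim_rule (
    rule_id   INTEGER PRIMARY KEY,
    rule_name TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS fact_dq_result (
    run_id        TEXT NOT NULL,      -- one monitoring-project run
    catalog_item  TEXT NOT NULL,
    attribute     TEXT NOT NULL,
    rule_id       INTEGER NOT NULL REFERENCES dim_rule(rule_id),
    invalid_count INTEGER NOT NULL,
    valid_count   INTEGER NOT NULL,
    run_ts        TEXT NOT NULL       -- ISO timestamp of the run
);
"""

with sqlite3.connect("dq_repository.db") as con:
    con.executescript(DDL)
```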

We have a similar use case, which we are also considering, for storing DQ results in a backend database. We are using v13.9 and looking to migrate to v14 in the near future. We would like the aggregated metrics as well as the detailed invalid or exception records.

Any advice on the best practices or methods for doing this would be more than welcome.

Thanks

Userlevel 4
Badge +3

Hi Paul,

 

Happy to exchange our current solution in a meeting. We are currently on 13.9.1. Let me know.

Current solutions:

  • An MP (monitoring project) template which we use to get the post-processing results (facts table data)
  • Metadata readers which read the metadata into dimensional tables
  • A workflow which orchestrates the backend

We currently get the detailed results and do the aggregation in SQL and Power BI. We will move to the new steps in 13.9 to get only the aggregates.
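
To make the "aggregation in SQL" step concrete, here is an illustrative roll-up, assuming a hypothetical detail table fact_dq_detail(run_ts, catalog_item, rule_name, is_invalid) with one row per checked record; the real table and column names will differ:

```python
# Illustrative roll-up over detailed per-record results. The table
# fact_dq_detail is hypothetical; this is the kind of aggregation that
# SQL or Power BI can do once the detail lands in the repository.
import sqlite3

AGG_SQL = """
SELECT
    date(run_ts)                     AS run_date,
    catalog_item,
    rule_name,
    SUM(is_invalid)                  AS invalid_count,
    COUNT(*) - SUM(is_invalid)       AS valid_count,
    1.0 * SUM(is_invalid) / COUNT(*) AS invalid_ratio
FROM fact_dq_detail
GROUP BY date(run_ts), catalog_item, rule_name
ORDER BY run_date, catalog_item, rule_name;
"""

with sqlite3.connect("dq_repository.db") as con:
    for row in con.execute(AGG_SQL):
        print(row)
```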

Mail is marnix.wisselaar@stater.nl

Marnix

Userlevel 6
Badge +7

Hi folks, I second @Marnix Wisselaar’s suggestion. To follow up on that, you could also use the post-processing plan (per single table) and filter the rows with an invalid rule result (there's a column for that).

From that point, you can continue with whatever action you’d like to take, e.g. loading the rows into a SQL database.
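
The plan itself would be built in ONE Desktop, but the filtering logic it performs is roughly the following, sketched here in Python with hypothetical record and column names (a filter on the invalid-rules column, then a database write):

```python
# The equivalent of the post-processing filter, sketched outside Ataccama.
# Hypothetical record shape: each processed row carries its attribute values
# plus a column listing the rule instances that flagged it as invalid.
import sqlite3

def write_invalid_rows(rows, db_path="remediation.db"):
    """Keep only rows flagged by at least one rule and land them in a DB."""
    invalid = [r for r in rows if r["invalid_rules"]]  # the 'column for that'
    with sqlite3.connect(db_path) as con:
        con.execute(
            "CREATE TABLE IF NOT EXISTS invalid_records "
            "(record_id TEXT, invalid_rules TEXT)"
        )
        con.executemany(
            "INSERT INTO invalid_records VALUES (?, ?)",
            [(r["id"], ",".join(r["invalid_rules"])) for r in invalid],
        )

# Example usage with two mock records:
write_invalid_rows([
    {"id": "1", "invalid_rules": ["email_format"]},
    {"id": "2", "invalid_rules": []},  # valid row, filtered out
])
```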

The GraphQL APIs can’t return per-record data: the full data never leaves DPE processing (except for a small sample of invalid results), and we do not store it in detail, only the aggregated results.

Getting the data in CSV format is not a recommended practice since the file can be extremely large. Therefore, we recommend using the plan approach (stream-like processing) as opposed to the GraphQL APIs. Hope this helps!
