Solved

Extracting data quality results - detailed failed records

  • September 23, 2022
  • 5 replies
  • 299 views

  • Universe Traveller
  • 4 replies

Hi,

We are using Ataccama ONE version 12.6. We have a project containing 40+ models with 90+ rules. I am looking for a way to get the detailed invalid/failed records for each model, i.e. all the attributes from the model along with the names of the invalid rule instances.

The idea is to get the whole failed dataset so we can automate the remediation process. We already have an automated process for remediating the data using other technologies.

I don’t want to use the “download results as CSV” option, because we are transitioning to the latest version of Ataccama.

I am considering a couple of options:

  1. Extract the detailed failed records from the backend tables using SQL (wherever Ataccama stores the results), or understand how those CSV files are generated in the backend and use the same mechanism to get the data and store it in a database.
  2. Read the data through Ataccama’s built-in APIs and load it into a database. I am going through the GraphQL options. Could you please let me know if there is a way to get similar data through GraphQL? If yes, please share a sample query.

Thanks in advance!

Best answer by Marnix Wisselaar

Hi Paul,

 

Happy to exchange our current solution in a meeting. We are currently on 13.9.1. Let me know.

Current solutions:

  • An MP template which we use to extract the post-processing results (facts table data)
  • Metadata readers which read the metadata in the dimensional tables
  • A workflow which orchestrates the backend

We currently get the detailed results and do the aggregation in SQL and Power BI. We will move to the new steps in 13.9 so that we only get the aggregates.
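As an illustration of that last step, here is a minimal sketch of the kind of aggregation that can be run once the detailed results have landed in a database. The table and column names (dq_results_fact, rule_instance, is_invalid, and so on) are assumptions made for the example; the real names depend on how your post-processing component writes the facts table.

```sql
-- Hypothetical facts table landed by the post-processing component:
--   dq_results_fact(run_ts, catalog_item, attribute_name, rule_instance, record_key, is_invalid)
-- Aggregate invalid counts per catalog item, rule instance, and run date for reporting.
SELECT
    catalog_item,
    rule_instance,
    CAST(run_ts AS DATE)                            AS run_date,
    COUNT(*)                                        AS records_checked,
    SUM(CASE WHEN is_invalid = 1 THEN 1 ELSE 0 END) AS invalid_records
FROM dq_results_fact
GROUP BY catalog_item, rule_instance, CAST(run_ts AS DATE)
ORDER BY invalid_records DESC;
```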

Mail is marnix.wisselaar@stater.nl

Marnix


5 replies

  • Data Pioneer L2
  • 31 replies
  • September 26, 2022

+1



Interesting, let us know how this unfolds. We have a similar challenge.

We have Gen3, and the monitoring projects are really nice, so maybe migrate to Gen2 first. We use a monitoring project for each project; each may contain several catalog items. I’ve created a template (component) for the post-processing into a data quality repository, as we report and process further from there. Applying this component stores the information from the monitoring project in a database.
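To make the “data quality repository” a bit more concrete, below is one possible shape for such a repository table. This is purely an illustration of the idea; the component in your environment defines its own columns and granularity, and these names are assumptions, not Ataccama’s actual schema.

```sql
-- Illustrative DDL for a DQ repository table fed by a post-processing component.
-- All names here are assumptions made for the example.
CREATE TABLE dq_monitoring_results (
    result_id          BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    run_ts             TIMESTAMP    NOT NULL,  -- when the monitoring project ran
    monitoring_project VARCHAR(200) NOT NULL,
    catalog_item       VARCHAR(200) NOT NULL,
    attribute_name     VARCHAR(200),
    rule_instance      VARCHAR(200) NOT NULL,
    records_checked    BIGINT       NOT NULL,
    invalid_records    BIGINT       NOT NULL
);
```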


  • Space Explorer
  • 2 replies
  • February 28, 2023

We are also considering a similar use case for storing DQ results in a backend database. We are using v13.9 and looking to migrate to v14 in the near future. We would like the aggregated metrics as well as the detailed invalid or exception records.

Any advice on the best practices or methods for doing this would be more than welcome.

Thanks




Cansu
  • Community Manager
  • 651 replies
  • March 7, 2023

Hi folks, I second @Marnix Wisselaar’s suggestion. To follow up on that, you could also try using the post-processing plan (per single table) and filtering the rows with an invalid rule (there is a column for that).

From that point, you can continue with whatever action you’d like to take, e.g. writing the records into a SQL database.
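To picture the downstream side: if the post-processing plan writes the per-record results (including the column that holds the failed rule instances) into a staging table, the remediation feed becomes a simple filter. The table and column names below are assumptions about what such a plan might write out, not a fixed schema.

```sql
-- Hypothetical staging table written by the post-processing plan:
--   dq_detailed_results(record_key, catalog_item, invalid_rule_instances, loaded_ts, ...)
-- Push only the failed records into a remediation queue for the downstream process.
INSERT INTO dq_remediation_queue (record_key, catalog_item, invalid_rule_instances, queued_ts)
SELECT record_key,
       catalog_item,
       invalid_rule_instances,
       CURRENT_TIMESTAMP
FROM   dq_detailed_results
WHERE  invalid_rule_instances IS NOT NULL
  AND  invalid_rule_instances <> '';
```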

The GraphQL APIs can’t return per-record data: the full data doesn’t leave DPE processing (except for a small sample of invalid results), and we don’t store it in detail, only the aggregated results.

Getting the data in CSV format is not recommended, since the file can be extremely large. Therefore, we recommend the plan approach (stream-like processing) rather than the GraphQL APIs. Hope this helps!



