Hi @mp_ataccamauser , I believe that, as part of the default post-processing output, you should also be able to get dqCheckId (as part of the invalid_rules/invalid_rules_explanations strings), which can be used to tie the results to other pieces of metadata such as the monitoring project, catalog item, rule, and so on. However, while you can add metadata reader steps to post-processing, I don't think the post-processing component is the best place to do that, as it would significantly complicate the logic and make the process run slower and consume more resources.
I'd say it's more common to do some filtering, row splits (generating a separate row of data for each failed DQ check), and parsing of key information such as attribute names, ruleInstanceIds (dqCheckIds), and so on.
And then, once you have all the data, you can read the metadata separately through a plan running on the orchestration server and join it with the post-processing results using the DQ check ID to generate the final enriched output dataset.
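To make the split-and-join idea concrete, here is a minimal Python sketch. The field names, the assumption that invalid_rules holds semicolon-separated dqCheckIds, and the metadata lookup structure are all illustrative placeholders, not the actual Ataccama output format, which depends on your post-processing configuration:

```python
# Hypothetical illustration only: real column names and the exact
# invalid_rules format depend on your post-processing setup.

def split_failed_checks(rows):
    """Generate one output row per failed DQ check found in a record."""
    for row in rows:
        # Assumption: invalid_rules holds semicolon-separated dqCheckIds, e.g. "dq1;dq2".
        for dq_check_id in filter(None, row["invalid_rules"].split(";")):
            yield {"record_id": row["record_id"], "dqCheckId": dq_check_id}

def enrich(split_rows, metadata_by_check_id):
    """Join split post-processing rows with metadata read separately
    (e.g. by a plan on the orchestration server), keyed by dqCheckId."""
    for row in split_rows:
        meta = metadata_by_check_id.get(row["dqCheckId"], {})
        yield {**row, **meta}
```

The same filter/split/join logic could of course be expressed as plan steps instead of Python; the sketch just shows the data flow.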
I hope this answers your question.
Ivan
Thanks @ivan.kozlov , one question: when you say the metadata reader step on the orchestration server, do you mean to run it on the Ataccama server altogether? How do we do that? Sorry, I am going to work on this for the first time, so I'm not very well acquainted with it. I really appreciate your input here.
@mp_ataccamauser you would still have your post-processing components, but only to structure/format the data in the desired format. However, you can also deploy an additional workflow/component on the orchestration server which would read the platform metadata just once and then use it to enrich the data generated by the post-processing components. By default, post-processing results are stored in the internal MinIO storage of the platform, which can be accessed from the runtime server, and this way you should be able to read all the files generated by post-processing at once.
How to do this is a more complex question, as you'll need to prepare the components/workflows, push them to your Git, deploy them from there to your server, and only then will you be able to run them from the Runtime Server Admin Center.
You can find some documentation links relevant to your case in the following thread:
In general, it's hard to explain how to do this at a high level without actually building the solution, so if you're facing any difficulties, I'd recommend getting in touch with your Ataccama representative and maybe involving our professional services team to help with this.
But if you don't really need to do this for all the projects you have, I guess it can be done inside the post-processing component as well; I'd say the choice of approach depends on the scale at which you need to make these changes. For a single project with a few post-processing components, it might be sufficient to do it in the actual post-processing components, but if you have tens or hundreds of projects with a large number of post-processing components, doing it in a centralized manner using the orchestration server is the better approach.