Skip to main content

Hello

I am looking to create a One Desktop plan to extract records which are out of sync. I need to compare tables coming from different databases and sources. Or better suggestion to achieve the same.

Need help with this.

 

Hello ​@sumisha ,

 

thank you for your question. Can you maybe elaborate a bit more on your use-case? Otherwise, the ONE Desktop plans should achieve this using the JDBC Reader steps to connect to your databases, joins to join the data and several different output and transformation steps to gather the output. One thing to look out for is performance; the joins in the application are much slower than the database joins, so in case of joining large datasets, you might face some resource issues.

Is there a specific reason why you want to use the Desktop tool? Is this going to be part of some larger pipeline? Or is it because of the Catalog limitations that currently don’t allow to use one rule on multiple tables at once? For that, you can use the Virtual Catalog Item to prepare the data.

Thank you.

Kind regards

Anna


Thanks Anna. I need to filter records (historical records) from one source (data warehouse) before comparing with different source. Issue is: will need to create virtual catalogs for all tables in first source, which is like 1000+ tables and not good idea to do that. I was planning to create a global filter in DPM  (DSL query) which solves my problem where I use Data Recon feature of Web Ui but creates another one where I can’t generally create Virtual catalog items, which extracts selected columns from tables. 


I was exploring One Desktop to get actual records which are out of sync. But looks like I need to address first problem to get basic recon values.    


Hi ​@sumisha ,

the data reconcilliation projects do just a basic comparison so you wouldn’t be able to get the records which are out of sync (it practically compares profiling results of both source and target, not the individual records).

I don’t think we have a quick solution for this. If your tables aren’t too big (less than <10k records), I would suggest trying Transformation plan instead of VCI and send it to ONE Data on the output. That can be then used in the monitoring project to actually compare the records. Then you can configure another transformation plan as post-processing to send the out-of-sync records to DB or some other storage (S3, ADLS, ...) Why I am mentioning the plans - you might try to explore the API since it is all configured via front-end:

So if you are able to find some generic pattern to apply to all of your 1000+, this might be a solution. Otherwise, via the VCIs the situation is more complicated and to easy to resolve. Creating the projects with the rules is then also possible via API (even the ONE Metadata Writer steps actually).

Let me know if this might be feasible. But it will still require some development work.

Kind regards,

Anna


Thanks Anna. Appreciate your response. I will explore this further.


Reply