
Is it a good idea to use MDC Read inside export operations?


alexAguilarMx
Data Pioneer

Hi,

I am wondering whether it is a good idea to:

  • use the MDC Reader step in delta export operations where we need to complement the data with information from entities in another layer

or

  • add another Data Source that reads this other entity in full mode, then use a Join step to get the required information.

Both are feasible, but I'm not sure which is better in terms of batch processing, performance, etc.

Best answer by AKislyakov


5 replies

AKislyakov
  • Ataccamer
  • 146 replies
  • Answer
  • July 26, 2022

Well, this depends heavily on the proportion of data that changes between executions. If it is low, I suggest the first approach; if it is around 10% or higher, then definitely the second. If it varies a lot, the second approach will again give you much more stable execution times.


Another approach I might suggest is the Traversing Plan Publisher, which can fetch related records in an optimized way.
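The trade-off between the first two approaches can be sketched as a toy cost model in Python. This is not Ataccama code: CostModel, lookup_cost, join_cost and choose_strategy are hypothetical names, and the relative costs (one keyed lookup per changed record costing about 10 times one row of a full sequential read) are assumptions chosen only so that the break-even point lands at the 10% rule of thumb. Measure your own plans before relying on any numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostModel:
    # Hypothetical relative costs; replace with measurements from your own runs.
    lookup_cost_per_record: float = 10.0  # one keyed read per changed record
    scan_cost_per_record: float = 1.0     # one row of a full sequential read


def lookup_cost(changed_records: int, model: CostModel) -> float:
    """Approach 1: fetch related data for each changed record (MDC Reader style)."""
    return changed_records * model.lookup_cost_per_record


def join_cost(other_entity_records: int, model: CostModel) -> float:
    """Approach 2: read the other entity in full mode and join (Join step style).

    The cost does not depend on how many records changed, so runtime stays stable.
    """
    return other_entity_records * model.scan_cost_per_record


def choose_strategy(
    changed_records: int,
    other_entity_records: int,
    model: CostModel = CostModel(),
) -> str:
    """Return "lookup" or "join", whichever the toy model estimates as cheaper."""
    if lookup_cost(changed_records, model) < join_cost(other_entity_records, model):
        return "lookup"
    return "join"


if __name__ == "__main__":
    total = 1_000_000
    for ratio in (0.01, 0.05, 0.10, 0.30):
        changed = int(total * ratio)
        print(f"{ratio:.0%} changed -> {choose_strategy(changed, total)}")
```

With these assumed costs, choose_strategy(10_000, 1_000_000) picks "lookup" and choose_strategy(100_000, 1_000_000) picks "join". Note that join_cost does not depend on the number of changed records at all, which is why the second approach gives more stable execution times when the change volume varies.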


alexAguilarMx
Data Pioneer
  • Author
  • Data Pioneer
  • 9 replies
  • August 16, 2022

@AKislyakov

Thank you for the recommendation. I want to follow the first approach, but what can I do to optimize the initial execution? It will extract the information from the full table, which could take days, and I want to keep the delta watermark for the next executions.

Regards,

Alejandro


AKislyakov
  • Ataccamer
  • 146 replies
  • August 16, 2022

Hi @alexAguilarMx,

This depends heavily on the data volumes. If you have a low to moderate number of records in the full_master entity (say, below 10M, although the number is arbitrary), then waiting for the plan to finish might be the better option, because developing and testing a workaround might take longer than just waiting for the non-optimized plan to finish.

Otherwise, you can develop two export operations: one using the Join step approach for the initial execution and a second one using the MDM Read approach for ongoing delta operations.
Then use the first one for the initial load and supply the referenceTransactionId parameter to the second one so that it skips the changes already covered by the initial load. If you go with this approach, I strongly suggest disabling all incoming integrations (batch loads, streaming, online services) for the whole period of the initial export and the first run of the delta one.
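The referenceTransactionId hand-off between the two export operations can be sketched generically in Python. This is not Ataccama's API: the Change record, its integer txn_id, and the functions initial_export and delta_export are hypothetical stand-ins, assuming transaction ids increase monotonically and that a delta export returns only the changes committed after the reference id it is given.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    txn_id: int     # hypothetical, monotonically increasing transaction id
    record_id: str
    value: str


def initial_export(changes: list[Change]) -> tuple[dict[str, str], int]:
    """Full export: build the current state and return the last txn id it covers.

    That id plays the role of referenceTransactionId for the delta export.
    """
    state: dict[str, str] = {}
    last_txn_id = 0
    for change in sorted(changes, key=lambda c: c.txn_id):
        state[change.record_id] = change.value  # later changes overwrite earlier ones
        last_txn_id = change.txn_id
    return state, last_txn_id


def delta_export(changes: list[Change], reference_txn_id: int) -> list[Change]:
    """Delta export: only changes committed after the reference transaction."""
    return sorted(
        (c for c in changes if c.txn_id > reference_txn_id),
        key=lambda c: c.txn_id,
    )


if __name__ == "__main__":
    log = [Change(1, "a", "v1"), Change(2, "b", "v1"), Change(3, "a", "v2")]
    snapshot, reference_id = initial_export(log)
    log.append(Change(4, "b", "v2"))  # arrives after the initial export finished
    print(snapshot, reference_id)
    print(delta_export(log, reference_id))
```

The sketch also hints at why incoming integrations should be paused: if a change commits while the full export is running, whether it lands in the snapshot and whether its id falls above or below the reference id depend on timing, so it can end up missing from both exports or appearing in both.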


alexAguilarMx
Data Pioneer
  • Author
  • Data Pioneer
  • 9 replies
  • August 16, 2022

Hi @AKislyakov,

That really helps. One last question: how can I obtain the referenceTransactionId value for the second one?

Regards,

Alejandro


alexAguilarMx
Data Pioneer
  • Author
  • Data Pioneer
  • 9 replies
  • August 16, 2022

Hey @AKislyakov,

I found the referenceTransactionId in the __export_reg table.

Regards,

Alejandro

