Solved

Implement checkpoints in MDM data consolidation process

  • 9 April 2024
  • 1 reply
  • 47 views


Hi Team, could someone please help us understand how to implement checkpoints in the master data consolidation process? Currently, MDM runs consolidation as one lengthy workflow, from data acquisition through to publishing data to the downstream systems [data acquisition → change detection → cleansing → match → merge]. If any one of the components fails a few hours into the run because of a technical issue, the entire flow has to be restarted. To avoid this, we would like to restart the jobs from the point of failure. Is there any way to break down or separate the jobs, or to implement checkpoints for restarting? Please share your recommendations.

Thanks, 


Best answer by Pele 11 April 2024, 12:52




Hi @srini,

Let me give you some context and explain the options available for your scenario.

MDM processing is transactional, so you don’t need to worry about data consistency if a failure occurs: everything is either committed or rolled back.

For long-running batches, which are typically initial load operations, there is a feature called “checkpoints”. The related documentation is here: https://docs.ataccama.com/mdm/latest/input-and-output-interfaces/initial-load-operation.html#resumable-initial-load-checkpoints

Please note that the feature has some preconditions. The MDM parallel strategy needs to be disabled (nme.parallel.strategy=NONE), which also means you can’t use it for regular processing and should use it only for initial loads. Also remember that a checkpoint is a set of files that capture the processing state, so it may require a relatively significant amount of free disk space (similar to what you would normally allocate for java.io.tmpdir).
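
If it helps, the two preconditions above can be verified with a small pre-flight script before starting a long initial load. This is only a sketch, not part of the product: the property name nme.parallel.strategy comes from this answer, while the properties file you point it at, the directory you check, and the required free space are assumptions you would need to adapt to your own installation (see the linked documentation for where checkpoint files are written).

```python
import shutil
from pathlib import Path


def read_properties(path):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def checkpoint_preflight(props, checkpoint_dir, required_bytes):
    """Return a list of problems; an empty list means both checks passed.

    checkpoint_dir and required_bytes are your own estimates (assumptions),
    not values read from MDM.
    """
    problems = []

    # Precondition 1: the parallel strategy must be disabled.
    strategy = props.get("nme.parallel.strategy")
    if strategy != "NONE":
        problems.append(
            f"nme.parallel.strategy is {strategy!r}, expected 'NONE'"
        )

    # Precondition 2: enough free disk space for the checkpoint files.
    free = shutil.disk_usage(checkpoint_dir).free
    if free < required_bytes:
        problems.append(
            f"only {free} bytes free in {checkpoint_dir}, need {required_bytes}"
        )

    return problems
```

You would call read_properties on your runtime properties file, then pass the result to checkpoint_preflight together with the target directory and a size estimate comparable to what you allocate for java.io.tmpdir. An empty list means neither check found a problem.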

Let us know if you have more questions.

Regards,

Petr
