I have a requirement to exclude certain columns from change detection, so that a message is not treated as a change when only those fields arrive with different values.
I tried adding the field to the ignored comparison columns as shown below, but when the message came in again with a new value for this field, it was still processed. Are there more steps needed to ignore a field? Also note that we consume data through a Kafka streaming consumer.
Hi @hkulkarni, this setting applies only to batch operations, not streaming/online requests.
For streaming, there is an option (Selected Columns) to limit which columns should be processed.
E.g., with the configuration below, only the src_first_name column will be changed by the Streaming interface.
Hi @AKislyakov,
Thanks for your response, but adding the field in the consumer would restrict which data gets loaded for that field. We want to load the message, including the ignored field, only when other fields of the message have changed. A change in the ignored field alone should not count in change detection.
E.g.: src_firstname, src_last_name, message_timestamp. We want to process the message with all three fields only when a change is detected in src_firstname or src_last_name, without considering message_timestamp. If there is no change in src_firstname or src_last_name but message_timestamp is updated, do not process the message.
Please let me know your thoughts.
Thanks,
Unfortunately, there is no exact "ignored comparison columns" feature for streaming interfaces. However, you can add MDM Read steps to check previous values of the columns and drop any redundant records. Depending on your version, you might also need to start the MDM Server with Streaming Consumers deactivated and start them manually after the server startup (nme.stream.consumers.active=false).
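A minimal sketch of that "read previous values and drop redundant records" check, written in plain Python rather than as Ataccama plan steps. The function names, the `source_id` key, and the dictionary standing in for the MDM Read lookup are all hypothetical; only the column names come from this thread.

```python
from typing import Optional

# Columns that must not count as a change (taken from the example in this thread).
IGNORED_COLUMNS = {"message_timestamp"}


def has_relevant_change(incoming: dict, stored: Optional[dict]) -> bool:
    """Return True if the record is new or differs in any non-ignored column."""
    if stored is None:
        return True  # nothing stored yet, so the record must be processed
    compared = (set(incoming) | set(stored)) - IGNORED_COLUMNS
    return any(incoming.get(col) != stored.get(col) for col in compared)


def drop_redundant(messages: list, repository: dict, key: str = "source_id") -> list:
    """Keep only messages that change something other than the ignored columns.

    `repository` is a stand-in for the previous values an MDM Read step would
    return (hypothetical: a dict keyed by source ID).
    """
    return [m for m in messages if has_relevant_change(m, repository.get(m[key]))]
```

With this check in front of the load, a message that differs only in message_timestamp is dropped, while a message that changes src_firstname or src_last_name passes through with all three fields.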
Hi @AKislyakov,
Thanks for your response.
This might change our current process, so no worries. We’ll try to do something in our layer of Ataccama.
But could you please shed some light on change detection and data acquisition so we can understand them better? For example, in which step does each actually happen?
Once a message is consumed by the consumer, does it happen in the integration input phase or while loading into the instance tables?
Thanks,
Change detection occurs during the "Change detection" phase (or subtask) of processing. All major steps are described here (MD Process Monitoring :: Ataccama ONE).
Specifically for a Stream Consumer:
1. The stream consumer collects a specified number of messages (or waits for a timeout).
2. The stream consumer initiates MDM processing and provides the gathered messages.
3. The Plan Transformer is started to convert message data into the MDM format (this is the "Data acquisition" subtask).
4. The output of the Plan Transformer is compared to the data stored in the MDM Repository (the "Change detection" subtask).
5. Changed records are then cleansed, matched, merged, etc. (the "Master Data Consolidation" subtask).
6. Finally, the results of MDM Processing (both instance and master tables) are saved to the MDM Repository (the "Committing" subtask).
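The order of these subtasks can be sketched roughly as follows. This is a simplified Python model, not Ataccama code: `collect_batch`, `process_batch`, the `transform` and `consolidate` callbacks, and the two dictionaries standing in for the instance and master tables are all assumptions made for illustration.

```python
import time
from queue import Queue, Empty


def collect_batch(queue: Queue, max_messages: int, timeout_s: float) -> list:
    """Gather up to max_messages, or stop once timeout_s has elapsed."""
    batch = []
    deadline = time.monotonic() + timeout_s
    while len(batch) < max_messages:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            batch.append(queue.get(timeout=remaining))
        except Empty:
            break
    return batch


def process_batch(batch, instance, master, transform, consolidate):
    """Run one processing cycle over the gathered messages."""
    # Data acquisition: convert raw messages into the repository format.
    records = [transform(m) for m in batch]
    # Change detection: keep only records that differ from what is stored.
    changed = [r for r in records if instance.get(r["id"]) != r]
    # Master data consolidation: cleanse/match/merge, reduced here to one callback.
    consolidated = {r["id"]: consolidate(r) for r in changed}
    # Committing: save both instance and master results.
    for r in changed:
        instance[r["id"]] = r
    master.update(consolidated)
    return changed
```

The point of the sketch is the ordering: change detection sits after data acquisition and before consolidation, so unchanged records never reach the consolidation or commit steps.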
@AKislyakov - My apologies for raising a separate issue here, but I have a requirement similar to the one described above by @hkulkarni.
As mentioned above:
E.g.: src_firstname, src_last_name, message_timestamp. We want to process the message with all three fields only when a change is detected in src_firstname or src_last_name, without considering message_timestamp. If there is no change in src_firstname or src_last_name but message_timestamp is updated, do not process the message.
Now, what I am looking for is this: in the above case, if only message_timestamp is updated, the timestamp should be reflected on the INSTANCE layer but not on the MASTER layer. Is there a way this can be achieved? If a change is detected in src_firstname or src_last_name, then the timestamp field should be updated on both the INSTANCE and MASTER layers. My requirement is based on BATCH operations only.
Any help will be greatly appreciated.
Thank you, Ritesh Ranjan
@AKislyakov , thank you for your explanation.
Hi @ritesh.ranjan
> We want to process the message with all three fields only when there is change detected in src_firstname, src_last_name and not consider message_timestamp
If you map message_timestamp to the source_timestamp field, then this is the default behavior. The source_timestamp is updated only when other columns are changed. Additionally, this field is used to manage out-of-order messages. For example, if a message has a source_timestamp value earlier than what's stored in the MDM Repository, that message is ignored. This works for all interfaces (Batch, Stream, and Online APIs).
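The source_timestamp rules described above can be modeled roughly like this. It is a plain Python sketch of the behavior as stated in this post, not the product's implementation; `apply_message` and the record layout are hypothetical.

```python
from typing import Optional


def apply_message(stored: Optional[dict], incoming: dict,
                  ts_field: str = "source_timestamp") -> dict:
    """Return the record that should be kept after `incoming` arrives."""
    if stored is None:
        return incoming  # first version of the record
    if incoming[ts_field] < stored[ts_field]:
        return stored  # out-of-order message: older than stored data, ignored
    data_changed = any(
        incoming[col] != stored.get(col) for col in incoming if col != ts_field
    )
    # A newer timestamp alone does not count as a change, so the stored
    # record (including its timestamp) is kept unless other columns differ.
    return incoming if data_changed else stored
```

For example, with a stored record at timestamp 10, a message at timestamp 12 with identical names leaves the stored record untouched, while a message at timestamp 12 with a new src_firstname replaces it and carries the new timestamp.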
Additionally, for Batch operations you can define ignored comparison columns, which are excluded from change detection. You can do it either globally (MDM Preferences :: Ataccama ONE) or for selected batch load operations (Batch Interface :: Ataccama ONE).