
I am working on a use case that compares two columns from two different CSV files. I need to compare these columns, remove any duplicates, and merge them into a single column. I am trying to use the Join component, but I am not sure whether this approach will work or whether a better solution is available. If anyone can suggest a better approach, it would be helpful. Thanks.

The output will come from two different multiplicators, which contain those two columns.

 

 


Hi @Karthikeyan,

 

Join will work, especially if you have any additional columns that you need to bring into the resulting data set. In this case you need to use Join Type = OTHER to preserve unmatched records from both inputs.
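For anyone who wants to see what "preserve unmatched records from both inputs" means outside the tool, here is a minimal Python sketch of that kind of join. It is only an illustration, not how the Join step is implemented; the function name `full_outer_join`, the column names and the sample rows are all made up for the example.

```python
def full_outer_join(left_rows, right_rows, left_key, right_key):
    """Join two lists of dicts on a key, keeping unmatched rows from both sides."""
    # Index the right input by its key so lookups are cheap.
    right_by_key = {}
    for row in right_rows:
        right_by_key.setdefault(row[right_key], []).append(row)

    joined = []
    matched_right_keys = set()

    # Every left row is kept; it is paired with its matches, or with None.
    for left in left_rows:
        key = left[left_key]
        matches = right_by_key.get(key)
        if matches:
            matched_right_keys.add(key)
            for right in matches:
                joined.append({"key": key, "left": left, "right": right})
        else:
            joined.append({"key": key, "left": left, "right": None})

    # Right rows that never matched are kept as well.
    for key, rows in right_by_key.items():
        if key not in matched_right_keys:
            for right in rows:
                joined.append({"key": key, "left": None, "right": right})
    return joined


# Hypothetical sample data: "id" in the first input, "code" in the second.
left = [{"id": "a", "name": "Ann"}, {"id": "b", "name": "Bob"}]
right = [{"code": "b", "city": "Oslo"}, {"code": "c", "city": "Rome"}]

result = full_outer_join(left, right, "id", "code")
merged_keys = [row["key"] for row in result]
```

The `key` field of the result is the single merged column: values present in either input appear once per matching pair, and the extra columns from each side come along with them.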

Another approach, especially if you need to remove duplicates within the input data files as well, is to use a Union step to combine the two sets, followed by a Group Aggregator step to remove duplicates. Here is an example of what the result may look like, based on 05.06 Union.plan from the Tutorials project.
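The same union-then-deduplicate idea can be sketched in plain Python, in case it helps to check the expected result on a small sample. This is an illustration under assumptions, not the plan itself: the helper names `read_column` and `union_distinct`, the column names `email` and `contact`, and the in-memory CSV content are all hypothetical.

```python
import csv
import io


def read_column(csv_file, column):
    """Return the values of one named column from an open CSV file."""
    return [row[column] for row in csv.DictReader(csv_file)]


def union_distinct(*columns):
    """Concatenate the columns (union), then keep the first occurrence of each value."""
    seen = set()
    merged = []
    for values in columns:
        for value in values:
            if value not in seen:
                seen.add(value)
                merged.append(value)
    return merged


# Hypothetical inputs; with real files, use open(path, newline="") instead.
first_csv = io.StringIO("email\na@x.com\nb@x.com\na@x.com\n")
second_csv = io.StringIO("contact\nb@x.com\nc@x.com\n")

merged = union_distinct(
    read_column(first_csv, "email"),
    read_column(second_csv, "contact"),
)
```

Because the duplicates are removed after the two columns are combined, repeats inside a single file (the second `a@x.com` here) are dropped along with repeats across files.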

 

 


@AKislyakov - thanks for the quick response. I will try the union approach and check the results.


Hello @Karthikeyan, I’m closing this thread. Please feel free to follow up here with any of your questions or create a new post 🙋‍♀️


@AKislyakov - do you mean Join Type = OUTER? I am not able to find Join Type = OTHER to preserve unmatched records from both inputs.

 


Yep

