Solved

Compare two different columns from two different files



I am working on a use case that compares two different columns in two different CSV files. I need to compare these columns, remove any duplicates, and merge them into a single column. I am trying to use the Join component, but I am not sure whether this approach will work or whether a better solution is available. If anyone can suggest a better approach, it would be helpful. Thanks.

Best answer by AKislyakov

Hi @Karthikeyan,

 

A Join will work, especially if you have additional columns that you need to bring into the resulting data set. For this case, set Join Type = OUTER to preserve unmatched records from both inputs.
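To illustrate what preserving unmatched records from both inputs means, here is a minimal Python sketch of a full outer join. It runs outside the product and is not the Join step's implementation; the function name `full_outer_join`, the sample rows, and the column names `email` and `mail` are all hypothetical.

```python
def full_outer_join(left_rows, right_rows, left_key, right_key):
    """Join two lists of dicts on a key, keeping unmatched rows from both sides."""
    # Index the right input by its key column so lookups are cheap.
    right_by_key = {}
    for row in right_rows:
        right_by_key.setdefault(row[right_key], []).append(row)

    joined = []
    matched_keys = set()
    for left in left_rows:
        key = left[left_key]
        matches = right_by_key.get(key)
        if matches:
            matched_keys.add(key)
            for right in matches:
                joined.append({"key": key, "left": left, "right": right})
        else:
            # Left row with no partner: kept, with an empty right side.
            joined.append({"key": key, "left": left, "right": None})

    # Right rows that never matched: kept, with an empty left side.
    for key, rows in right_by_key.items():
        if key not in matched_keys:
            for right in rows:
                joined.append({"key": key, "left": None, "right": right})
    return joined


# Hypothetical sample data standing in for the two CSV files.
file_a = [{"email": "a@example.com", "name": "Ann"},
          {"email": "b@example.com", "name": "Bob"}]
file_b = [{"mail": "b@example.com", "city": "Brno"},
          {"mail": "c@example.com", "city": "Cork"}]

result = full_outer_join(file_a, file_b, "email", "mail")
merged_column = [row["key"] for row in result]
print(merged_column)
```

Note that this sketch does not remove duplicates that already exist inside a single input, so a value repeated in one file would still appear more than once in the output.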

Another approach, especially if you also need to remove duplicates within the input files, is to use a Union step to combine the two sets, followed by a Group Aggregator step to remove duplicates. Here is an example of how the result may look, based on 05.06 Union.plan from the Tutorials project.
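For readers who want to see the union-then-deduplicate logic spelled out, the following is a rough Python sketch using only the standard library. It is not the contents of 05.06 Union.plan and does not use any product API; the helper names, the inline CSV text, and the column names `email` and `mail` are assumptions made for illustration.

```python
import csv
import io


def read_column(csv_text, column):
    """Return the values of one named column from CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row[column] for row in reader]


def union_and_deduplicate(*columns):
    """Append all columns into one list, keeping the first occurrence of each value."""
    seen = set()
    merged = []
    for column in columns:
        for value in column:
            if value not in seen:
                seen.add(value)
                merged.append(value)
    return merged


# Hypothetical file contents; FILE_A contains a duplicate within itself.
FILE_A = "id,email\n1,a@example.com\n2,b@example.com\n3,a@example.com\n"
FILE_B = "ref,mail\n10,b@example.com\n11,c@example.com\n"

merged = union_and_deduplicate(
    read_column(FILE_A, "email"),
    read_column(FILE_B, "mail"),
)
print(merged)
```

Because the values are appended first and deduplicated afterwards, duplicates inside a single file are removed along with duplicates across the two files, which is the reason for preferring this approach in that case.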

 

 


6 replies

  • Author
  • Data Pioneer
  • 12 replies
  • June 3, 2024

The data will come from the outputs of two different Multiplicator steps, which contain those two columns.

 

 


  • Ataccamer
  • 150 replies
  • Answer
  • June 3, 2024

Hi @Karthikeyan,

 

A Join will work, especially if you have additional columns that you need to bring into the resulting data set. For this case, set Join Type = OTHER to preserve unmatched records from both inputs.

Another approach, especially if you also need to remove duplicates within the input files, is to use a Union step to combine the two sets, followed by a Group Aggregator step to remove duplicates. Here is an example of how the result may look, based on 05.06 Union.plan from the Tutorials project.

 

 


  • Author
  • Data Pioneer
  • 12 replies
  • June 3, 2024

@AKislyakov - thanks for the quick response. I will try the Union approach and check the results.


Cansu
Community Manager
  • Community Manager
  • 635 replies
  • June 3, 2024

Hello @Karthikeyan, I’m closing this thread. Please feel free to follow up here with any of your questions or create a new post 🙋‍♀️


  • Author
  • Data Pioneer
  • 12 replies
  • June 4, 2024

@AKislyakov - do you mean Join Type = OUTER? I am not able to find Join Type = OTHER to preserve unmatched records from both inputs.

 


  • Ataccamer
  • 150 replies
  • June 4, 2024

Yep

