Removing HTML tags

Question

Hi Team, I am reading from Collibra using the Collibra reader, and some of the data has HTML tags in it. Is there a way to filter out the HTML tags when they are present in the data, i.e. strip them? Let me know if more information is needed. Thanks.

Samuel Muvdi · Accepted Answer

Hi! For removing HTML tags, I would recommend using either a transliterate step or a regex matching step.

With the transliterate step, you can set up rules that replace the tags with nothing. When you test this, you should see that the tags are removed from the cio_data column.

The other way to achieve this (I think the faster method) is to use the regex matching step with a pattern like this:

<([a-zA-Z0-9_]+)>([\S\s]*)<([a-zA-Z0-9_]+)>

When testing this, we can see that it gives us just the data between the tags and no tags: $2 captures our data without the tags :)

Hope this helps!!
-Samuel
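To try the regex idea outside the platform, here is a minimal Python sketch. It is not Ataccama expression syntax, and the function names (extract_between_tags, strip_tags) and the ANY_TAG pattern are assumptions made for illustration. The first function applies the pattern quoted above and returns capture group 2 (the $2 from the answer). Note that the quoted pattern only matches tags made of letters, digits and underscores, so a closing tag such as </p> is not matched by it. The second function is a more general alternative that removes every opening or closing tag.

```python
import re

# Pattern quoted in the answer: a tag, the content, then another tag.
ANSWER_PATTERN = re.compile(r"<([a-zA-Z0-9_]+)>([\S\s]*)<([a-zA-Z0-9_]+)>")

# Assumption (not from the thread): matches any opening or closing tag,
# including tags that carry attributes.
ANY_TAG = re.compile(r"</?[a-zA-Z][^>]*>")


def extract_between_tags(value: str) -> str:
    """Return capture group 2 of the quoted pattern, or the input if no match."""
    match = ANSWER_PATTERN.search(value)
    return match.group(2) if match else value


def strip_tags(value: str) -> str:
    """Remove every opening or closing tag, keeping the text in between."""
    return ANY_TAG.sub("", value)


if __name__ == "__main__":
    print(extract_between_tags("<p>hello<br>"))
    print(strip_tags("<b>bold</b> and <i>italic</i>"))
```

The substitution approach is usually the safer choice when a value can contain several tags or closing tags with a slash, since it does not depend on the value having exactly one tag on each side.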

Cansu · Answer

Hi @Karthikeyan, I’m closing this thread for now. If you have any follow-up questions please feel free to share them in the comments or create a new post 🙋‍♀️
