Solved

Removing HTML tags

June 25, 2024

Karthikeyan
Data Pioneer
12 replies

Hi Team,

I am reading from Collibra using the Collibra reader, and some of the data contains HTML tags such as <div>. Is there a way to filter out the HTML tags when they are present in the data, i.e. strip them from the values?

Let me know if more information is needed.

Thanks.

Best answer by Samuel Muvdi

Hi!

To remove HTML tags, I would recommend using either a transliterate step or a regex matching step.

Using the transliterate step you can do something like this:

When you test this, you should see that the tags have been removed from the cio_data column.
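
The step configuration itself is not reproduced in this thread, so as a rough stand-in here is a minimal Python sketch of the same idea: replace anything that looks like a tag with an empty string. This is not Ataccama syntax. `strip_tags` and `TAG_PATTERN` are hypothetical names, and the tag pattern is an assumption about what counts as a tag in the data.

```python
import re
from html import unescape

# Assumption: a "tag" is "<", an optional "/", a letter, then anything up to ">".
# This covers <div>, </div>, <br/> and tags with attributes, but not comments.
TAG_PATTERN = re.compile(r"</?[a-zA-Z][^>]*>")


def strip_tags(value):
    """Return value with tag-like substrings removed; None passes through."""
    if value is None:
        return None
    without_tags = TAG_PATTERN.sub("", value)
    # Decode entities such as &amp; that often travel with HTML fragments.
    return unescape(without_tags).strip()


if __name__ == "__main__":
    # Hypothetical sample value for a column like cio_data.
    print(strip_tags("<div>Customer <b>master</b> data</div>"))
```

Because every tag is replaced individually, nested tags are handled as well, and a bare "<" or ">" in ordinary text (for example "x < 5") is left alone.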

The other way to achieve this (and I think the faster one) is to use the regex matching step.

When testing this, we can see that it returns just the data between the tags, without the tags themselves. The expression is:

<([a-zA-Z0-9_]+)>([\S\s]*)</([a-zA-Z0-9_]+)>

The second capture group, $2, then holds our data without the tags :)
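
To see what the capture group picks up, here is a small Python sketch of the same kind of pattern. It is only an illustration outside Ataccama: `inner_text` and `PATTERN` are hypothetical names, and the pattern assumes the closing tag is written with a "/" (as in `</div>`), since a closing tag would otherwise not match the `[a-zA-Z0-9_]+` group.

```python
import re

# Assumption: opening tag has no attributes and the closing tag starts with "</".
# [\S\s]* matches any character including newlines, so no DOTALL flag is needed.
PATTERN = re.compile(r"<([a-zA-Z0-9_]+)>([\S\s]*)</([a-zA-Z0-9_]+)>")


def inner_text(value):
    """Return capture group 2 (the text between the tags), or value if no match."""
    match = PATTERN.search(value)
    return match.group(2) if match else value


if __name__ == "__main__":
    print(inner_text("<div>Data steward notes</div>"))
```

Two things to keep in mind with this approach: the greedy middle group only peels off the outermost pair of tags, so `<div><b>x</b></div>` comes back as `<b>x</b>`, and tags with attributes (for example `<div class="x">`) do not match the opening group at all.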

Hope this helps!

Samuel

Samuel Muvdi
Ataccamer
27 replies
Answer
July 1, 2024

Karthikeyan
Author
Data Pioneer
12 replies
July 2, 2024

@Samuel Muvdi - Thanks for this, I will check and get back to you.

Ataccama Community Admin
Intergalactic Expert
704 replies
July 12, 2024

Hi @Karthikeyan, I’m closing this thread for now. If you have any follow-up questions please feel free to share them in the comments or create a new post 🙋‍♀️
