Solved

Removing HTML tags

1 year ago
June 25, 2024
3 replies
58 views

Karthikeyan
Data Pioneer

Hi Team,

I am reading from Collibra using the Collibra reader, and some of the data contains HTML tags (for example, <div>). Is there a way to filter out the HTML tags when they are present in the data, that is, to strip them from the values?

Let me know if more information is needed.

Thanks.

Best answer by Samuel Muvdi

Hi!

To remove HTML tags, I would recommend using either a transliterate step or a regex matching step.

Using the transliterate step you can do something like this:

When you test this, you should see that the tags have been removed from the cio_data column.
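To illustrate the idea behind the transliterate approach outside of the tool, here is a minimal Python sketch of literal find-and-replace rules applied to a value. It is not Ataccama configuration: the rule table `TAG_RULES`, the function `strip_known_tags`, and the sample value are all hypothetical and only show the principle of mapping each known tag to a replacement string.

```python
# Hypothetical rule table (an assumption, not taken from the thread):
# each literal tag maps to its replacement; an empty string deletes the tag.
TAG_RULES = {
    "<div>": "",
    "</div>": "",
    "<br>": " ",
    "<p>": "",
    "</p>": "",
}


def strip_known_tags(value: str, rules: dict = TAG_RULES) -> str:
    """Apply every literal replacement rule to the value, then trim whitespace."""
    for tag, replacement in rules.items():
        value = value.replace(tag, replacement)
    return value.strip()


# Sample value standing in for one row of the cio_data column.
cio_data = "<div>Customer <p>address</p></div>"
print(strip_known_tags(cio_data))  # prints "Customer address"
```

A literal rule table like this only removes the tags it lists, so it suits data where the set of tags is small and known in advance.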

The other way to achieve this (and, I think, the faster method) is to use the regex matching step, like so:

When testing this, we can see that it returns just the data between the tags, without the tags themselves:

<([a-zA-Z0-9_]+)>([\S\s]*)</([a-zA-Z0-9_]+)>

We can then see that $2 (the second capture group) contains our data without the tags :)
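For anyone who wants to try the capture-group idea outside of the tool, here is a small Python sketch. It uses a slightly loosened form of the pattern (the optional `/` before the closing tag name is my assumption), and the names `WRAPPED`, `ANY_TAG`, `inner_text`, and `strip_all_tags` are hypothetical. Group 2 plays the role of $2.

```python
import re

# Loosened form of the thread's pattern; the optional "/" is an assumption.
WRAPPED = re.compile(r"<([a-zA-Z0-9_]+)>([\S\s]*)</?([a-zA-Z0-9_]+)>")

# Generic alternative (not from the thread): matches any single tag,
# including tags that carry attributes.
ANY_TAG = re.compile(r"<[^>]+>")


def inner_text(value: str) -> str:
    """Return capture group 2, the text between the outermost tags.

    If the pattern does not match, the value is returned unchanged.
    """
    match = WRAPPED.search(value)
    return match.group(2) if match else value


def strip_all_tags(value: str) -> str:
    """Delete every tag in the value, nested or not."""
    return ANY_TAG.sub("", value)


print(inner_text("<div>hello</div>"))                   # prints "hello"
print(inner_text("<div>a <b>bold</b> word</div>"))      # prints "a <b>bold</b> word"
print(strip_all_tags("<div>a <b>bold</b> word</div>"))  # prints "a bold word"
```

Note that the capture-group pattern only removes the outermost pair of tags, so nested tags stay inside group 2, and an opening tag with attributes does not match it at all. If your data has nested or attributed tags, replacing every match of a generic tag pattern with an empty string may be the better fit.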

Hope this helps!

Samuel

Samuel Muvdi
Ataccamer
1 year ago
July 1, 2024

Karthikeyan
Data Pioneer
1 year ago
July 2, 2024

@Samuel Muvdi - Thanks for this. I will check and get back to you.

Cansu
Community Manager
11 months ago
July 12, 2024

Hi @Karthikeyan, I’m closing this thread for now. If you have any follow-up questions please feel free to share them in the comments or create a new post 🙋‍♀️
