Solved

Removing Html tags

  • June 25, 2024
  • 3 replies
  • 67 views


Hi Team,

I am reading from Collibra using the Collibra reader, and some of the data has HTML tags such as <p> and <div> in it. Is there a way to filter out the HTML tags if they are present in the data, i.e. strip them from the values?

Let me know if more information is needed.

Thanks.

Best answer by Samuel Muvdi

Hi!

For removing HTML tags, I would recommend using either a transliterate step or a regex matching step.

Using the transliterate step you can do something like this:


When you test this out, you should see that the <p> tags have been removed from the cio_data column.
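Outside the tool, the idea behind this approach (replace each known tag with an empty string) can be sketched in a few lines of Python. This is only an illustration of the technique, not the transliterate step's actual configuration; the tag list and the function name `strip_listed_tags` are assumptions made up for the example.

```python
# Hypothetical list of literal tags to strip; extend it to cover your data.
TAGS_TO_REMOVE = ["<p>", "</p>", "<div>", "</div>"]


def strip_listed_tags(value: str) -> str:
    """Remove each listed tag by plain string replacement (no regex)."""
    for tag in TAGS_TO_REMOVE:
        value = value.replace(tag, "")
    return value


if __name__ == "__main__":
    print(strip_listed_tags("<p>Customer <div>record</div></p>"))
```

This only removes tags that appear in the list exactly as written, so tags with attributes (for example `<div class="x">`) would be left in place.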


The other way to achieve this (I think the faster method) is to use the regex matching step with an expression like this:

<([a-zA-Z0-9_]+)>([\S\s]*)<([a-zA-Z0-9_]+)>

When testing this out, we can see that it gives us just the data between the tags, without the tags themselves.


We can then see that $2 (the second capture group) holds our data without tags :)
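To see what the second capture group holds, here is a rough Python sketch that applies the same expression with the standard `re` module. It is not the regex matching step itself, and the helper names `between_tags` and `strip_all_tags` are made up for illustration. Note that, as written, the expression expects the second tag in the same form as the first (no slash), so a value like `<p>hello</p>` does not match it in Python. The second pattern, `ANY_TAG`, is my own assumption of a more general alternative that removes any opening or closing tag.

```python
import re

# Expression quoted from the answer; group 2 is the text between the two tags.
PATTERN = re.compile(r"<([a-zA-Z0-9_]+)>([\S\s]*)<([a-zA-Z0-9_]+)>")

# Assumption (not from the answer): matches any opening or closing tag,
# including tags with attributes such as <div class="x">.
ANY_TAG = re.compile(r"</?[a-zA-Z][^>]*>")


def between_tags(value: str) -> str:
    """Return group 2 of the quoted expression, or the value unchanged if no match."""
    match = PATTERN.search(value)
    return match.group(2) if match else value


def strip_all_tags(value: str) -> str:
    """Delete every tag-like substring and keep the remaining text."""
    return ANY_TAG.sub("", value)


if __name__ == "__main__":
    print(between_tags("<p>hello<p>"))
    print(strip_all_tags('<p>hello</p> <div class="x">world</div>'))
```

Substituting every tag with an empty string, rather than capturing the text between one pair, also copes with values that contain several tags or nested tags.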

Hope this helps!

Samuel

  • Ataccamer
  • 27 replies
  • Answer
  • July 1, 2024


  • Author
  • Data Pioneer
  • 12 replies
  • July 2, 2024

@Samuel Muvdi - Thanks for this, I will check and get back to you.


Hi @Karthikeyan, I'm closing this thread for now. If you have any follow-up questions, please feel free to share them in the comments or create a new post 🙋‍♀️