Solved

Parsing XML with repeated tags containing data

1 year ago
July 3, 2023
8 replies
476 views

aiwalia
Data Pioneer

I am trying to parse an XML file that contains employee data in the following format:

<TABLES>
<DATA>
<item>
<WA>Employee1No Employee1Name Employee1PhNo Employee1Email</WA>
</item>
<item>
<WA>Employee2No Employee2Name Employee2PhNo Employee2Email</WA>
</item>
….
</DATA>
</TABLES>

I am using the XML Parser step to parse this, but it does not return any result.
I think the issue is that the same tag is repeated for multiple records, because I am able to parse the file when there is just one record, i.e. one <item> tag with one <WA> tag.

Can anyone please help me with this?

Best answer by AKislyakov

Hi @aiwalia,

The XML Reader step creates a separate row for each element found by the XPath expression in the Data Stream Configuration.
In your case, you need to put /TABLES/DATA/item there.

If you have additional data stored at the level of TABLES or DATA, you can configure child streams to get it. There is an example of such an XML Reader in the Tutorials project, in 01.04 Read xml file.plan.
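Outside of Ataccama, the "one row per element matched by the XPath" behaviour can be illustrated with a minimal Python sketch. It uses the standard-library xml.etree.ElementTree rather than the XML Reader step, and the sample document and the function name rows_for_items are assumptions made for illustration. ElementTree evaluates paths relative to the root element, so /TABLES/DATA/item is written as ./DATA/item here.

```python
import xml.etree.ElementTree as ET

# Illustrative sample modelled on the structure in the question (assumed data).
SAMPLE = """<TABLES>
  <DATA>
    <item><WA>Employee1No Employee1Name Employee1PhNo Employee1Email</WA></item>
    <item><WA>Employee2No Employee2Name Employee2PhNo Employee2Email</WA></item>
  </DATA>
</TABLES>"""


def rows_for_items(xml_text):
    """Return one row (the text of <WA>) per <item> element."""
    root = ET.fromstring(xml_text)  # root is the <TABLES> element
    # Every <item> matched by the path becomes a separate row.
    return [item.findtext("WA") for item in root.findall("./DATA/item")]


rows = rows_for_items(SAMPLE)
for row in rows:
    print(row)
```

Each repeated <item> produces its own row, which is the behaviour the data stream XPath is meant to give in the plan.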


AKislyakov
Ataccamer
1 year ago
July 3, 2023

Hi @aiwalia,

aiwalia
Data Pioneer
1 year ago
July 3, 2023

@AKislyakov
I am using the same concept in the XML Parser step.
In the example you mentioned, the XML file contains different tag names, and each tag holds one detail (e.g. there is a tag for sin). In my situation, I have a single tag name (<WA>) that contains several details (as shown in the question above).
Here, I don't get any output, as the same tag is repeated several times.
Is there a way to run a step multiple times in an incremental manner (like a for or while loop)?

Zdenek Tomis
Ataccamer
1 year ago
July 3, 2023

Hi @aiwalia,

When you use the aforementioned XPath with WA appended,

/TABLES/DATA/item/WA

the XML Reader creates a record for each <WA> element. The contents of the record are the full contents of the <WA> element.

If you want to separate the individual info from each line into columns, you have to further process it using an additional step. In your example, you can use the Regex Matching step to split the line into four groups. You could use the pattern

(\w+) (\w+) (\w+) (\w+)

to achieve your goal. Note that if EmployeeName or EmployeePhNo can contain spaces, you would need a more complex pattern or a different kind of processing step.
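As a rough illustration of what the pattern does, here is a minimal Python sketch using the standard re module. It is not the Regex Matching step itself; the function name split_wa and the choice of a full-string match are assumptions. Note that \w does not match characters such as @ or ., so a real e-mail address would not fit the last group without adjusting the pattern.

```python
import re

# The pattern suggested above: four space-separated groups of word characters.
PATTERN = re.compile(r"(\w+) (\w+) (\w+) (\w+)")


def split_wa(line):
    """Split one <WA> line into four columns, or return None if it does not fit."""
    match = PATTERN.fullmatch(line.strip())
    return match.groups() if match else None


columns = split_wa("Employee1No Employee1Name Employee1PhNo Employee1Email")
print(columns)

# A name containing a space yields five tokens, so the simple pattern rejects it.
print(split_wa("1 John Smith 555 jsmith"))
```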

See the attached example I came up with

aiwalia
Data Pioneer
1 year ago
July 3, 2023

Hi @Zdenek Tomis
Thank you for the reply.
Unfortunately, my issue is not for parsing the contents within the <WA> tag.
So, I’m trying to explain with a few images of my plan and input and output.

[Image: XML tags under the result header of the input text file]

The output has only one record, with the contents of the first WA tag.
Can you please suggest a way to get the contents of each WA tag as a separate record in the output?

AKislyakov
Ataccamer
1 year ago
July 3, 2023

You need to put WA in a child data stream.

You can also access columns from parent data stream using Shadow Columns section.

aiwalia
Data Pioneer
1 year ago
July 5, 2023

@Zdenek Tomis
Did you find any solution to my issue?

Zdenek Tomis
Ataccamer
1 year ago
July 5, 2023

@aiwalia

I tried what you tried and could not get even a single record.

What worked for me was to include the whole path as xpath, including the root element _-BODS_-RFC_READ_TABLE.

Furthermore, when there are multiple records, you have to use the XPath of the record as the data stream XPath. In the Columns section of the XML Parser step configuration, you can then put the XPath that selects the desired part of each record, e.g. WA. Alternatively, you can include the WA part in the data stream XPath, and it works the same way; in that case, put a dot ('.') in the Columns configuration XPath to refer to the current element.
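The two equivalent configurations can be sketched in Python with the standard-library xml.etree.ElementTree. This is an analogy, not the XML Parser step: the root element name ROOT is a hypothetical stand-in for the real root element, the sample values are invented, and ElementTree paths are relative to the root rather than absolute.

```python
import xml.etree.ElementTree as ET

# ROOT is a hypothetical placeholder for the actual root element of the file.
SAMPLE = """<ROOT><TABLES><DATA>
<item><WA>first record</WA></item>
<item><WA>second record</WA></item>
</DATA></TABLES></ROOT>"""

root = ET.fromstring(SAMPLE)

# Option 1: the record path selects <item>; the column path is "WA".
via_item = [item.findtext("WA") for item in root.findall("./TABLES/DATA/item")]

# Option 2: the record path already ends in WA; the column is the current
# element itself (the '.' case), so its own text is read.
via_wa = [wa.text for wa in root.findall("./TABLES/DATA/item/WA")]

print(via_item)
print(via_wa)
```

Both lists contain one entry per <WA> element, which mirrors why either configuration produces one output record per repeated tag.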

Here is my setup:

As the input file, I used a similar XML with different data in the WA elements. I put all the XML on one line, as otherwise the Text Reader step parsed it as multiple records, but it seems you have this part solved on your end.

And this is the output:

I hope I understood your issue correctly; if not, please don't hesitate to elaborate. I can send you my version of the plan if it would help.

aiwalia
Data Pioneer
1 year ago
July 6, 2023

@Zdenek Tomis
Thanks a lot :).

This was the solution required.
