
Parsing XML with repeated tags containing data


I am trying to parse an XML file that contains employee data in the following format:

<TABLES>
    <DATA>
        <item>
            <WA>Employee1No Employee1Name Employee1PhNo Employee1Email</WA>
        </item>
        <item>
            <WA>Employee2No Employee2Name Employee2PhNo Employee2Email</WA>
        </item>
        ….
        ….
    </DATA>
</TABLES>


I am using the XML Parser step to parse this, but it does not return any results.
I think the issue is that the same tag is repeated for multiple records,
because I can parse the file when it contains just one record, i.e. one <WA> and one <item> tag.

Can anyone please help me with this?

Hi @aiwalia,


The XML Reader step creates a separate row for each element matched by the XPath expression in the Data Stream Configuration.
In your case, you need to put /TABLES/DATA/item there, like this:

[Screenshot: Data Stream Configuration]

If you have additional data stored at the TABLES or DATA level, you can configure child streams to read it. There is an example of such an XML Reader in the Tutorials project, in 01.04 Read xml file.plan.


@AKislyakov 
I am using the same concept in the XML Parser step.
In the example you mentioned, the XML file contains different tag names, and each tag holds one detail (e.g. there is a tag for sin). In my case, a single tag name (<WA>) contains several different details (as shown in the question above).
Here, I don't get any output because the same tag is repeated several times.
Is there a way to run a step repeatedly, in an incremental manner (like a for or while loop)?


Hi @aiwalia,

When you use the XPath

/TABLES/DATA/item/WA

the XML Reader will create a record for each <WA> element, and the content of each record will be the full content of that <WA> element.
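Outside of the tool, the same XPath behaviour can be sketched with Python's standard xml.etree.ElementTree module: one path evaluation yields one result per matching <WA> element, no matter how many times the tag repeats. This is only an illustration of the XPath semantics, not how the XML Reader step is implemented. ElementTree supports a subset of XPath and evaluates paths relative to the root element (here <TABLES>), so /TABLES/DATA/item/WA is written as DATA/item/WA. The function name wa_records is hypothetical.

```python
import xml.etree.ElementTree as ET

# Sample input shaped like the XML in the question (two <item> records).
XML = """<TABLES>
  <DATA>
    <item><WA>Employee1No Employee1Name Employee1PhNo Employee1Email</WA></item>
    <item><WA>Employee2No Employee2Name Employee2PhNo Employee2Email</WA></item>
  </DATA>
</TABLES>"""


def wa_records(xml_text):
    """Return one record (the full text of a <WA> element) per matched element."""
    root = ET.fromstring(xml_text)  # root is the <TABLES> element
    # Paths are relative to the root, so /TABLES/DATA/item/WA becomes DATA/item/WA.
    return [wa.text or "" for wa in root.findall("DATA/item/WA")]


for record in wa_records(XML):
    print(record)
```

Each printed line corresponds to one output record; the repeated tag name is what produces multiple rows, not an obstacle to them.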

If you want to split the individual values in each line into columns, you have to process the line further with an additional step. In your example, you can use the Regex Matching step to split the line into four groups. You could use the pattern

(\w+) (\w+) (\w+) (\w+)

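to achieve your goal. Note that if EmployeeName or EmployeePhNo can contain spaces, you would need a more complex pattern or a different kind of processing step. The same applies if a value contains characters other than letters, digits, and underscores (the only characters \w matches), such as the @ and . in an email address.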
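The pattern can be tried out with Python's re module. This sketch assumes the whole <WA> value must match (it uses fullmatch); whether the Regex Matching step anchors the pattern the same way is not confirmed here. The TOLERANT pattern, the split_wa helper, and the sample values ("1001 Alice ...") are hypothetical, added to show where \w+ stops working.

```python
import re
from typing import Optional, Tuple

# The pattern from the answer: four runs of word characters separated by single spaces.
STRICT = re.compile(r"(\w+) (\w+) (\w+) (\w+)")

# Hypothetical, more tolerant variant: \S+ accepts any non-space characters,
# so values like "+1-555-0100" or "alice@example.com" still match.
# It still assumes that no field contains a space.
TOLERANT = re.compile(r"(\S+) (\S+) (\S+) (\S+)")


def split_wa(line: str, pattern: "re.Pattern") -> Optional[Tuple[str, ...]]:
    """Split one <WA> value into (No, Name, PhNo, Email), or None if it does not match."""
    m = pattern.fullmatch(line.strip())
    return m.groups() if m else None


print(split_wa("Employee1No Employee1Name Employee1PhNo Employee1Email", STRICT))
# ('Employee1No', 'Employee1Name', 'Employee1PhNo', 'Employee1Email')
print(split_wa("1001 Alice +1-555-0100 alice@example.com", STRICT))    # None
print(split_wa("1001 Alice +1-555-0100 alice@example.com", TOLERANT))
# ('1001', 'Alice', '+1-555-0100', 'alice@example.com')
```

Neither pattern handles names with spaces such as "Alice Smith"; that case needs a different delimiter or a pattern that knows which field may contain spaces.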

See the attached example I came up with


Hi @Zdenek Tomis 
Thank you for the reply.
Unfortunately, my issue is not about parsing the contents of the <WA> tag.
So I'll try to explain it with a few images of my plan, input, and output.
[Image: XML tags as shown under the result header of the input text file]
[Image: steps in the plan]
[Image: XPath as entered in the XML Parser step]

[Image: contents of the output file]

[Image: input text file]

The output has only one record, with the contents of the first WA tag.
Can you please suggest a way to get the contents of each WA tag as a separate record in the output?


You need to put WA into a child data stream.

You can also access columns from the parent data stream using the Shadow Columns section.


@Zdenek Tomis 
Did you find any solution to my issue?



@aiwalia 

I tried what you tried and could not get even a single record.

What worked for me was to include the whole path in the XPath, including the root element _-BODS_-RFC_READ_TABLE.
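The effect of the root element can be reproduced with Python's xml.etree.ElementTree. ElementTree has no document node, so this sketch wraps the parsed root in a hypothetical container element named document to emulate how an absolute XPath is anchored at the document root. The root element name is taken from this thread; the WA values and the count_matches helper are made up for illustration. A path that skips the real root matches nothing, while the full path matches every record.

```python
import xml.etree.ElementTree as ET

# Shape described in the thread: the real root is _-BODS_-RFC_READ_TABLE,
# and TABLES sits below it. The WA values are invented.
XML = """<_-BODS_-RFC_READ_TABLE>
  <TABLES>
    <DATA>
      <item><WA>A1 B1 C1 D1</WA></item>
      <item><WA>A2 B2 C2 D2</WA></item>
    </DATA>
  </TABLES>
</_-BODS_-RFC_READ_TABLE>"""

root = ET.fromstring(XML)

# Synthetic parent standing in for the XPath document node.
doc = ET.Element("document")
doc.append(root)


def count_matches(absolute_xpath: str) -> int:
    """Strip the leading '/' and evaluate the remaining steps from the synthetic parent."""
    return len(doc.findall(absolute_xpath.lstrip("/")))


print(count_matches("/TABLES/DATA/item/WA"))                          # 0
print(count_matches("/_-BODS_-RFC_READ_TABLE/TABLES/DATA/item/WA"))  # 2
```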

Furthermore, when there are multiple records, you have to use the XPath of the record as the data stream XPath. In the Columns section of the XML Parser step configuration, you can then put an XPath that selects the desired part of each record, e.g. WA. Alternatively, you can include the WA part in the data stream XPath, which works the same; in that case, put a dot ('.') in the Columns XPath to refer to the current element.
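The two configurations described above can be modelled with ElementTree: the data stream path picks the elements that become rows, and the column path is evaluated relative to each of those elements, with "." meaning the element itself. This is a sketch of the idea only, not of the XML Parser step's internals; the rows helper and the sample values are hypothetical, and ElementTree paths here start below the root element instead of being absolute.

```python
import xml.etree.ElementTree as ET

XML = """<_-BODS_-RFC_READ_TABLE>
  <TABLES><DATA>
    <item><WA>A1 B1 C1 D1</WA></item>
    <item><WA>A2 B2 C2 D2</WA></item>
  </DATA></TABLES>
</_-BODS_-RFC_READ_TABLE>"""

root = ET.fromstring(XML)  # root is <_-BODS_-RFC_READ_TABLE>


def rows(stream_path: str, column_path: str) -> list:
    """One row per element matched by stream_path; the column value is
    evaluated relative to that element (the 'current element')."""
    return [rec.findtext(column_path) for rec in root.findall(stream_path)]


# Option 1: the data stream selects each record, the column selects WA inside it.
option1 = rows("TABLES/DATA/item", "WA")
# Option 2: the data stream already selects WA, so the column path is ".".
option2 = rows("TABLES/DATA/item/WA", ".")

print(option1)             # ['A1 B1 C1 D1', 'A2 B2 C2 D2']
print(option1 == option2)  # True
```

Option 1 is the more flexible choice if each record later grows more child elements, since every extra column is just another relative path.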

Here is my setup:

[Image: the plan, same as yours]
[Image: the main XML Parser configuration]
[Image: the Columns configuration]

As input, I used a similar XML file with different data in the WA elements, and I put the whole XML on one line, because otherwise the Text Reader step parsed it as multiple records. It seems you have already solved this part on your end, though.
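Why the one-line input matters can be sketched in Python, assuming (as the reply observed) that the Text Reader emits one record per line. Most single lines of a multi-line XML document are not well-formed XML on their own, while the same text joined into one line parses as a single document. The parses helper and the sample data are hypothetical; joining with an empty string also assumes no text value is split across lines.

```python
import xml.etree.ElementTree as ET

# Hypothetical input spread over several lines, as in the question.
MULTILINE = """<TABLES>
  <DATA>
    <item><WA>A1 B1 C1 D1</WA></item>
    <item><WA>A2 B2 C2 D2</WA></item>
  </DATA>
</TABLES>"""


def parses(fragment: str) -> bool:
    """True if the fragment is a well-formed XML document on its own."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


# A line-oriented reader would pass each line to the parser as a separate record.
per_line = [parses(line) for line in MULTILINE.splitlines()]

# Joining the stripped lines first gives the parser one complete document.
one_line = "".join(line.strip() for line in MULTILINE.splitlines())

print(per_line)
print(parses(one_line))  # True
```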

And this is the output:

[Image: my output, which seems correct]

I hope I understood your issue correctly; if not, please don’t hesitate to elaborate. I can send you my version of the plan if it would help.


@Zdenek Tomis 
Thanks a lot 🙂.

This was the solution required.

