My input file has 3 different record types within it. The first can be considered the header record consists of 43 of 1000 bytes and is defined by a "recordtype" of DTL. Then we have a PRM record type that has has different data from column 34 thru 1000. Then we have a SUP record type that also has different data from column 34 thru 1000. I have been able to split the file using condition statements but that only works for the first 43 bytes - I thought shadow columns would work but no values are passed when using that option.
so each DTL record can have more than one PRM or SUP after it.
Any ideas on how to resolve this issue? thanks
Help get this topic noticed by sharing it on
Twitter,
Facebook, or email.
Twitter,
Facebook, or email.
-
Hello Allen,
DQ Analyzer is prepared for analysis simple CSV, Fixed Width files or database tables. Reading files with different record formats is solvable, but it is a bit complicated, since the DQ Analyzer metadata editor won't help you much.
I understand you are able to create a plan and use it instead of simple Create profile dialog. What do you need are Reader steps for each distinct record format. What actually probably happened is that the data of other formats then defined in the input step are already lost during the reading process.
I don't know, whether you read the file using Text File Reader of Fixed Width File Reader. You need to create multiple readers (you can easily copy and paste them in the editor). Open the steps and setup the columns for each desired record format in each reader under the columns node.
Afterwards you need to filter different record formats then you are reading. There are 2 ways. Under Advanced tab in the reader you can specify a Regular expression to filter records with different format. I don't know, which expression would I advice, as I do not know the exact format of your file, but I understand there are codes at the start for each different record format at the start of the line. Second way might be easier, but it is harder to debug. You can choose to skip whole lines from the input file that doesn't have the record format configured in the Reader step. When the DQ Analyzer encounters such line, it tries to read all columns it can by default and erase the rest. You can configure the behaviour under Error Handling Strategy/Error Instructions node. I suggest to do it similar to this:
Using the Regular expression to filter records is usually safier. If you try the Error Handling Strategy approach, you may want to try to put on the Put To Reject (writing the records with different record formats to reject file specified also in the Error Handling Strategy for debugging) and Put To Log (writing a message about invalid record for each invalid record) setting for all the error types. -
-
Martin: Thanks for getting back to me on this. I have been trying what you have suggested without much luck. I believe the problem is in my inability to setup the regular expression correctly. The file layout is simple it is 1000 bytes long with the first 3 bytes being the recordtype field. This field can be either "DTL" or "PRM" or "SUP". the DTL record only populates the first 42 bytes and all records share the first 33 bytes of data.
Can you provide me with an example of how the expression should be coded? -
-
I am going to send you the solution by email.
I have used regular expression ^(DTL|PRM).* to filter all rows starting with either DTL or PRM, so I get only records starting with SUP in the data flow with an appropriate number of columns and record format.
What am I a bit afraid is that your file doesn't contain new lines as mine does. In that case, you may need to delete the Line separator \r\n from the step and change Line Max Read Length property on Advanced tab to 1000.
-
Loading Profile...




EMPLOYEE
