How do I do data quality on inbound text file objects?

  • 5 April 2024
  • 1 reply
  • 30 views

Hello:

I really appreciate the community helping me solve more use cases and get value out of the investment we made in Ataccama.

Now I have another area that I need help with.

We have a vendor that drops a file every night. The file is neither comma-delimited nor fixed-length: the first character of each line determines the format of that line.

For example, if the first character is 1, the line is 10 characters long, with the first 5 characters as an identifier and the next 5 characters as a status.

If the first character is 2, the line is a different record type (type 2) that is 100 characters long, with each field at a fixed position.

When I scan that file through Ataccama, it infers the file attributes from the first row. Is there a way I can give the metadata to the profiler?

Also, I need this to be a monitoring project that runs DQ on demand, which I would like to trigger through Airflow when the file arrives.

Can someone give me some help on where to start, what to do, things to consider, etc.?

Appreciate your input. Thanks

anna.spakova 23 days ago

Hello @Prasad Rani ,

Thank you for the question. Your use case is currently not supported. In future versions (I am not sure which version you are currently on) it should be possible to define metadata for flat files and Excel files; however, even that applies at the level of the whole file. In your case, it looks like the metadata differs for each line.

To advise you further, we would have to see the data and think of a potential workaround, such as pre-processing the file using workflows, VCIs, etc.
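
As a rough illustration of the pre-processing idea only (this is a plain Python sketch, not an Ataccama workflow or an Ataccama API): the file could be split by its first character into one delimited file per record type, so that each output has a single layout that can be profiled on its own. The type 1 layout follows the description in the question, assuming the type digit is the first character of the identifier. The type 2 field names and positions, the function names, and the output file names are all hypothetical placeholders.

```python
import csv
from pathlib import Path

# Hypothetical layouts: record type -> (expected line length, fields).
# Each field is (name, start, end) with 0-based, end-exclusive offsets.
# Type 1 follows the question (5-char identifier, 5-char status).
# Type 2 is a placeholder; the real positions must come from the vendor spec.
LAYOUTS = {
    "1": (10, [("identifier", 0, 5), ("status", 5, 10)]),
    "2": (100, [("identifier", 0, 5), ("payload", 5, 100)]),
}


def parse_line(line):
    """Return (record_type, row_dict), or (None, reason) if the line fits no layout."""
    record_type = line[:1]
    layout = LAYOUTS.get(record_type)
    if layout is None:
        return None, f"unknown record type {record_type!r}"
    expected_len, fields = layout
    if len(line) != expected_len:
        return None, f"type {record_type}: expected {expected_len} chars, got {len(line)}"
    return record_type, {name: line[start:end].strip() for name, start, end in fields}


def _write_csv(path, header, rows):
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=header)
        writer.writeheader()
        writer.writerows(rows)


def split_by_record_type(src, out_dir):
    """Write type_<n>.csv per record type plus rejects.csv; return row counts."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    rows = {record_type: [] for record_type in LAYOUTS}
    rejects = []
    with open(src, encoding="utf-8") as f:
        for line_no, raw in enumerate(f, start=1):
            line = raw.rstrip("\r\n")  # keep trailing spaces, they are part of the record
            if not line:
                continue  # skip blank lines
            record_type, parsed = parse_line(line)
            if record_type is None:
                rejects.append({"line_no": line_no, "reason": parsed, "raw": line})
            else:
                rows[record_type].append(parsed)
    for record_type, (_, fields) in LAYOUTS.items():
        header = [name for name, _, _ in fields]
        _write_csv(out_dir / f"type_{record_type}.csv", header, rows[record_type])
    _write_csv(out_dir / "rejects.csv", ["line_no", "reason", "raw"], rejects)
    counts = {record_type: len(r) for record_type, r in rows.items()}
    counts["rejects"] = len(rejects)
    return counts
```

Lines that match no layout, or that have the wrong length for their type, end up in `rejects.csv`, which is a useful data quality signal in its own right. If Airflow is already orchestrating the run, a task could call `split_by_record_type` once the file arrives, before the DQ run is triggered.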

Kind regards,

Anna
