Question

Merging partitioned Parquet files

  • 3 July 2024
  • 3 replies
  • 36 views

Hello all, 

I have Parquet files in Azure Data Lake Storage Gen2, organized in containers and folders. Each folder contains a delta.log file and the Parquet files. Is there a way to merge these Parquet files into one table? The table should represent the current (up-to-date) state of the data in those files.

This is what I see now in Ataccama. I want to merge this data into one table.

Thanks for replies, Matus

3 replies

Hello!

You could probably achieve this with transformations in the ONE web app. Add multiple Catalog Item Input steps, each pointing to one of the partitioned Parquet files. Then, using the partition key the Parquet files share, do a union to stack the files vertically on top of each other. Finally, write the result to a new table with the ONE Data table writer, which gives you the data in tabular form.

 

Best regards,

Samuel :) 
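
A side note on the union approach above, as a rough sketch only: if the delta.log mentioned in the question is a standard Delta Lake `_delta_log` folder (an assumption, the thread does not confirm it), a folder can also hold Parquet files that were superseded by later commits, so a union of every file may include stale rows. The JSON commit files record `add` and `remove` actions, and replaying them in order gives the set of files that make up the current table. The Python below uses only the standard library; the function names and the example file names are made up for illustration, and checkpoint files are deliberately ignored.

```python
import json
import os
import tempfile


def current_delta_files(table_dir):
    """Return the data files that belong to the latest table version.

    Replays the JSON commits in _delta_log in order: an "add" action puts a
    file into the snapshot, a "remove" action takes it out.
    Assumption of this sketch: no checkpoint files need to be read.
    """
    log_dir = os.path.join(table_dir, "_delta_log")
    # Commit names are zero-padded version numbers, so a plain sort is enough.
    commits = sorted(f for f in os.listdir(log_dir) if f.endswith(".json"))
    live = set()
    for name in commits:
        with open(os.path.join(log_dir, name), encoding="utf-8") as fh:
            for line in fh:
                line = line.strip()
                if not line:
                    continue
                action = json.loads(line)
                if "add" in action:
                    live.add(action["add"]["path"])
                elif "remove" in action:
                    live.discard(action["remove"]["path"])
    return sorted(live)


def build_example_table(root):
    """Write a tiny, made-up _delta_log so the function can be tried out."""
    log_dir = os.path.join(root, "_delta_log")
    os.makedirs(log_dir)
    commits = [
        [{"add": {"path": "part-0000.parquet"}},
         {"add": {"path": "part-0001.parquet"}}],
        # The second commit rewrites part-0000 into part-0002.
        [{"remove": {"path": "part-0000.parquet"}},
         {"add": {"path": "part-0002.parquet"}}],
    ]
    for version, actions in enumerate(commits):
        commit_path = os.path.join(log_dir, f"{version:020d}.json")
        with open(commit_path, "w", encoding="utf-8") as fh:
            fh.write("\n".join(json.dumps(a) for a in actions))


with tempfile.TemporaryDirectory() as root:
    build_example_table(root)
    print(current_delta_files(root))
```

In this made-up example only `part-0001.parquet` and `part-0002.parquet` remain, so those would be the two files to feed into the union. Paths in a real log are relative to the table folder and may be URL-encoded, which this sketch does not handle.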

Hi Samuel

Thanks for the reply. So there is no way to read partitioned Parquet files as one table in a single read and then write that table to some data storage? What about Azure Synapse?

And do you know whether the Ataccama web app will be able to read partitioned Parquet files in the future?

Thanks, Matus

Hi! Sorry for the late reply. I believe the only way to do this is to use a metastore connection and read the files as a Hive table; you can then profile the table and run MP on its different partitions.