
Hi team,

I would like to understand a couple of things about implementing slices directly on a catalog item versus filtering out records via SQL or virtual catalog items, and then using those items in monitoring projects (MPs).

Suppose I add a data slice on a data catalog item X that has, e.g., 300 million records in total, and the sliced subset contains around 30-35 million records.

Question: when I run the MP, will it first pull all 300 million records, then filter down to the ~30 million records based on the slice, and only then run the DQ evaluation?

Also, if I use a SQL catalog item (SCI) instead, I am already cutting out the undesired records, so the DQ evaluation is applied directly to the 30 million records. Will that be faster than the scenario above?

In any case (slice, SCI, or VCI), how is the overall quality of the data catalog item calculated here? Are there any guidelines for deriving the DQ of the whole dataset from the output/score of a DQ evaluation on a subset of the data?
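To make that last question concrete, here is a minimal sketch (plain Python, all numbers hypothetical) of why I am asking: a score measured on the subset only generalizes to the full catalog item if the subset is representative, otherwise the true overall score is a record-weighted average that depends on the unevaluated records.

```python
# Hypothetical figures for illustration only.
total_records = 300_000_000            # full catalog item X
slice_records = 30_000_000             # records matched by the slice / SCI
slice_score = 0.95                     # DQ score measured on the subset

# If the remaining 270M records behave differently (rate unknown in
# practice; 0.80 is an assumption here), the whole-dataset score is a
# record-weighted average, not simply the subset score:
rest_records = total_records - slice_records
rest_score = 0.80
overall = (slice_records * slice_score + rest_records * rest_score) / total_records

print(round(overall, 3))
```

So my question is really which of these two numbers (the subset score, or some extrapolated whole-dataset score) the platform reports on the catalog item in each of the slice / SCI / VCI setups.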

Really looking forward to your input here!

Thanks.
