Vendor Data Source metadata changes

  • 1 April 2024
  • 2 replies
  • 50 views

Userlevel 2
Badge +1

Hello Community.

 

I have a vendor who shared their data through Snowflake Data Sharing. We added it as a data source and scanned the source to add the objects into my Data Catalog. I even have a sccheduled job that scans this source everyday (end of day) for sample profiling.

Today, we had an issue that 10 of the tables that were previously present in the shared data source, were dropped, two new tables created, and some of the tables, have new fields added, existing fields removed, some fields renamed.

This source is about 300 tables, but some of the tables ahve about 700+ fields. So, I would like to know best way for me to get a report/alert/notification on

  1. any new tables added to the source
  2. any existing tables dropped
  3. any existing table modified with new fields added
  4. any existing table modified with existing fields dropped/removed.

 

What is the best way to accomplish this?

For the dropped tables that previously existed, my end of day profile job is showing error in the DPM (because the table exists in my data dictionary, it tried to profile the table and found that table is missing). May be I can somehow capture that error from DPM on the profile job error on those tables. But how about new additions, how about field changes?

 

Appreciate your inputs!!

 

srija piratla 28 days ago

Hi @Prasad Rani ,

For your question to achieve this we can use change data capture mechanism for metadata to observe the changes.

1) Create a job/Workflow to store the information_schema.tables and information_schema.columns of shared database. (Lets assume Table Name Name : Current_information_schema_table  and Current_information_schema_columns.                    2) Compare the information schema of the database received from step 1 with the old  Information Schema (Table Name Name : snapshot_information_schema_tables/snapshot_information_schema_columns) Using a SQL Query.                    3) Copy the Current_information_schema to snapshot_information_schema for tomorrows comparison. You can use below query to observe the changes.

 

View original

2 replies

Badge

Hi @Prasad Rani ,

For your question to achieve this we can use change data capture mechanism for metadata to observe the changes.

1) Create a job/Workflow to store the information_schema.tables and information_schema.columns of shared database. (Lets assume Table Name Name : Current_information_schema_table  and Current_information_schema_columns.                    2) Compare the information schema of the database received from step 1 with the old  Information Schema (Table Name Name : snapshot_information_schema_tables/snapshot_information_schema_columns) Using a SQL Query.                    3) Copy the Current_information_schema to snapshot_information_schema for tomorrows comparison. You can use below query to observe the changes.

 

Userlevel 3
Badge +2

Hi @Prasad Rani! Data Observability can be configured to track and notify users of schema changes (i.e. new/deleted columns, change in data types, etc.). Additional information on Data Observability capabilities can be found here. Hope this helps! 

 

Reply