
Hi community!

 

In this post, we will cover the fundamentals of configuring the Profiling step 👣

If you missed it, you can check out the first post, an introduction to data profiling, here 👇🏽

 

The Profiling Step

When you create a profile using a plan, the Profiling step becomes a pivotal part of your data journey, connecting your data source to insights. Let's explore how to make the most of it.

General Category

Basic Tab

In the Basic tab, you can set the step's name, output file name, location, and the default locale for generated files.
 

Masks Tab

The Masks tab is where you define and edit masks. Masks help reveal data structure without displaying the actual content. For instance, you can use "D" to represent a digit and "L" for a letter. Configure characters, symbols, repeated symbols, and thresholds for your masks here.
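To make the idea of a mask concrete, here is a minimal Python sketch. It is only an illustration of the "D" for digit and "L" for letter convention described above, not the product's implementation; the function names `to_mask` and `mask_frequencies` are made up for this example, and it ignores the repeated-symbol and threshold settings.

```python
from collections import Counter


def to_mask(value: str) -> str:
    """Replace each digit with 'D' and each letter with 'L'; keep everything else."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("D")
        elif ch.isalpha():
            out.append("L")
        else:
            out.append(ch)  # punctuation and spaces stay visible in the mask
    return "".join(out)


def mask_frequencies(values):
    """Count how often each mask occurs in a column (hypothetical helper)."""
    return Counter(to_mask(v) for v in values)


print(to_mask("AB-123"))  # LL-DDD
print(mask_frequencies(["110 00", "602 00", "N/A"]))
```

Counting masks instead of raw values is what lets you spot unexpected formats without looking at the actual content.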
 

Right-click a row to add, delete, or edit the characters.
 


Drill-through Tab

The Drill-through tab enables drill-through functionality, allowing you to inspect individual records that compose the generated statistics. You'll need a database connection for this. Configure it by specifying the database name, optional table prefix, and display limit.
 

Foreign Keys Tab

If you have multiple inputs connected to the same Profiling step, the Foreign Keys tab lets you analyze foreign key relationships. You can specify the left and right input names and the columns to analyze for these relationships.
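As a rough picture of what a foreign key analysis checks, the Python sketch below measures how many values from a column of the left input also exist in a column of the right input. This is an assumption about the general technique, not the Profiling step's actual algorithm; `foreign_key_coverage` and the sample data are hypothetical.

```python
def foreign_key_coverage(left_values, right_values):
    """Return (share of non-null left values found on the right, sorted orphan values)."""
    right = set(right_values)
    left = [v for v in left_values if v is not None]
    if not left:
        return 1.0, []
    matched = sum(1 for v in left if v in right)
    orphans = sorted({v for v in left if v not in right})
    return matched / len(left), orphans


# Hypothetical inputs: orders.customer_id (left) against customers.id (right)
order_customer_ids = [1, 2, 2, 5]
customer_ids = [1, 2, 3]
coverage, orphans = foreign_key_coverage(order_customer_ids, customer_ids)
print(coverage, orphans)
```

A coverage below 1.0 together with a list of orphan values is usually the first thing to look at when a relationship between two inputs looks broken.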
 

 

Input Category
 

Data Tab

The Data tab displays all data to be profiled and allows for individual column configuration. You can set expressions, data types, masks, domain analysis, standard statistics, frequency analysis, group size analysis, locale, and add comments here.
 

Dependencies Tab

Use the Dependencies tab to test dependencies between fields in different columns. Define the name, determinant (key), and dependants. A threshold parameter helps assess dependency levels.
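One way to think about the determinant, dependant, and threshold is sketched below in Python. It assumes the dependency level is the share of rows whose dependant value agrees with the most common dependant value for their determinant; that definition, the function name `dependency_strength`, and the 0.9 threshold are assumptions for illustration and may differ from what the Profiling step computes.

```python
from collections import Counter, defaultdict


def dependency_strength(rows, determinant, dependant):
    """Share of rows consistent with the dominant dependant value per determinant value."""
    groups = defaultdict(Counter)
    for row in rows:
        groups[row[determinant]][row[dependant]] += 1
    total = sum(sum(counter.values()) for counter in groups.values())
    if total == 0:
        return 1.0
    consistent = sum(counter.most_common(1)[0][1] for counter in groups.values())
    return consistent / total


rows = [
    {"zip": "11000", "city": "Prague"},
    {"zip": "11000", "city": "Prague"},
    {"zip": "11000", "city": "Praha"},
    {"zip": "60200", "city": "Brno"},
]
strength = dependency_strength(rows, "zip", "city")
threshold = 0.9  # hypothetical value
print(strength, strength >= threshold)
```

In this sample, one spelling variant is enough to push the dependency below the threshold, which is exactly the kind of inconsistency this analysis is meant to surface.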
 

Roll Ups Tab

The Roll Ups tab lets you create separate profile analyses for specific subsets of your data. For example, you can analyze data based on gender values to uncover patterns.
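Conceptually, a roll up is a profile computed per group. The Python sketch below shows the idea with a few basic statistics per gender value; the function `roll_up` and the sample columns are hypothetical, and a real profile would include many more statistics than these four.

```python
from collections import defaultdict
from statistics import mean


def roll_up(rows, group_by, measure):
    """Compute a small set of statistics for `measure` within each `group_by` value."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_by]].append(row[measure])
    return {
        key: {
            "count": len(vals),
            "min": min(vals),
            "max": max(vals),
            "mean": mean(vals),
        }
        for key, vals in groups.items()
    }


rows = [
    {"gender": "F", "age": 30},
    {"gender": "F", "age": 40},
    {"gender": "M", "age": 50},
]
print(roll_up(rows, "gender", "age"))
```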
 

You can always click “Fill Columns” to select all options.

Business Rules Tab

In the Business Rules tab, you can define Boolean expressions for evaluation and results presentation in the Profile Viewer. This helps you ensure data conforms to your business rules.
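To illustrate what evaluating Boolean rules over a data set looks like, here is a small Python sketch. The rules are written as Python lambdas purely for this example; in the Profiling step you would write them in the tool's own expression language, and the rule names, columns, and `evaluate_rules` helper are all hypothetical.

```python
def evaluate_rules(rows, rules):
    """Count how many rows pass and fail each named Boolean rule."""
    results = {}
    for name, predicate in rules.items():
        passed = sum(1 for row in rows if predicate(row))
        results[name] = {"passed": passed, "failed": len(rows) - passed}
    return results


rows = [
    {"age": 34, "email": "ann@example.com"},
    {"age": -1, "email": "bob@example.com"},
    {"age": 51, "email": "not-an-email"},
]
rules = {
    "age_is_positive": lambda r: r["age"] > 0,
    "email_has_at_sign": lambda r: "@" in r["email"],
}
print(evaluate_rules(rows, rules))
```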
 

Primary Keys Tab

Use the Primary Keys tab to analyze column uniqueness and identify primary key candidates. Enter the column names to analyze, either individually or in combination.
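The Python sketch below shows one simple way to reason about uniqueness for a single column or a combination of columns. It assumes uniqueness is the ratio of distinct keys to rows, which is a common definition but not necessarily the one the Profile Viewer reports; the `uniqueness` function and sample data are hypothetical.

```python
def uniqueness(rows, columns):
    """Ratio of distinct key tuples to rows; 1.0 means the columns could serve as a key."""
    keys = [tuple(row[c] for c in columns) for row in rows]
    if not keys:
        return 1.0
    return len(set(keys)) / len(keys)


rows = [
    {"first_name": "Ann", "last_name": "Lee"},
    {"first_name": "Ann", "last_name": "Kim"},
    {"first_name": "Bob", "last_name": "Lee"},
]
print(uniqueness(rows, ["first_name"]))               # single column
print(uniqueness(rows, ["first_name", "last_name"]))  # combination of columns
```

Here neither column is unique on its own, but the combination is, which is why analyzing columns together can be worthwhile.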
 

Configuring Business Domain Analysis

Business domain analysis helps determine the type of data in a column in a business context (e.g., name, address, postal code). You can configure strict and loose thresholds to control how many domains are displayed as "matched" in the results.

To change these settings:

  1. Switch to the Profiling step layout.
  2. Select the Business Domains node.
  3. Modify the settings as needed and save your changes.
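To give a feel for how strict and loose thresholds might interact, here is a Python sketch. It assumes a domain is detected by a regular expression and that the thresholds are compared against the share of matching values; both are assumptions made for this example, and the default values 0.9 and 0.5, the labels, and the `classify_domain` function are hypothetical rather than the tool's actual settings.

```python
import re


def classify_domain(values, pattern, strict=0.9, loose=0.5):
    """Label a column by the share of non-empty values that fully match `pattern`."""
    non_empty = [v for v in values if v]
    if not non_empty:
        return "no data"
    matching = sum(1 for v in non_empty if re.fullmatch(pattern, v))
    ratio = matching / len(non_empty)
    if ratio >= strict:
        return "strict match"
    if ratio >= loose:
        return "loose match"
    return "no match"


postal_codes = ["11000", "60200", "abc", "70030"]
print(classify_domain(postal_codes, r"\d{5}"))
```

Lowering the strict threshold makes more columns count as confident matches, while raising the loose threshold hides weaker candidates from the results.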

Configuring the Profiling Step is essential to gain insights into your data's quality and structure. What are your tips and tricks on the profiling configuration? Share them in the comments below 👇🏽
