Hi community!
In this post, we cover the fundamentals of configuring the Profiling step.
If you missed it, check out the first post, which introduces data profiling.
The Profiling Step
When you create a profile using a plan, the Profiling step is what connects your data source to the insights you get out of it. Let's explore how to make the most of it.
General Category
Basic Tab
In the Basic tab, you can set the step's name, output file name, location, and the default locale for generated files.
Masks Tab
The Masks tab is where you define and edit masks. Masks help reveal data structure without displaying the actual content. For instance, you can use "D" to represent a digit and "L" for a letter. Configure characters, symbols, repeated symbols, and thresholds for your masks here.
Right-click a row to add, delete, or edit the characters.
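To make the mask idea concrete outside the tool, here is a minimal sketch in plain Python. It is not how the Profiling step is implemented; the function names `mask` and `mask_frequencies` and the sample phone numbers are made up for illustration, and only the "D" for digit and "L" for letter convention comes from the description above.

```python
from collections import Counter


def mask(value: str) -> str:
    """Return a structural mask: digits become 'D', letters become 'L'."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("D")
        elif ch.isalpha():
            out.append("L")
        else:
            out.append(ch)  # keep punctuation and spaces unchanged
    return "".join(out)


def mask_frequencies(values):
    """Count how often each mask occurs in a column of strings."""
    return Counter(mask(v) for v in values)


# Hypothetical sample column
phone_numbers = ["555-1234", "555-9876", "55A-0000"]
print(mask_frequencies(phone_numbers).most_common())
# prints [('DDD-DDDD', 2), ('DDL-DDDD', 1)]
```

The rare mask `DDL-DDDD` stands out immediately, which is the kind of structural outlier masks are meant to surface without showing the underlying values.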
Drill-through Tab
The Drill-through tab enables drill-through functionality, allowing you to inspect individual records that compose the generated statistics. You'll need a database connection for this. Configure it by specifying the database name, optional table prefix, and display limit.
Foreign Keys Tab
If you have multiple inputs connected to the same Profiling step, the Foreign Keys tab lets you analyze foreign key relationships. You can specify the left and right input names and the columns to analyze for these relationships.
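Conceptually, a foreign key analysis asks how many values in a left column also exist in a right column. The following is a rough sketch of that idea in plain Python, not the tool's actual algorithm; `foreign_key_check` and the sample IDs are hypothetical.

```python
def foreign_key_check(left_values, right_values):
    """Return (coverage, orphans) for a candidate foreign key.

    coverage: share of non-null left values that appear in the right column.
    orphans:  non-null left values that have no match on the right.
    """
    right = set(right_values)
    present = [v for v in left_values if v is not None]
    if not present:
        return 0.0, []
    orphans = [v for v in present if v not in right]
    return 1 - len(orphans) / len(present), orphans


# Hypothetical inputs: orders reference customers by ID
order_customer_ids = [1, 2, 2, 4]  # left input column
customer_ids = [1, 2, 3]           # right input column

coverage, orphans = foreign_key_check(order_customer_ids, customer_ids)
print(coverage, orphans)  # prints 0.75 [4]
```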
Input Category
Data Tab
The Data tab displays all data to be profiled and allows for individual column configuration. You can set expressions, data types, masks, domain analysis, standard statistics, frequency analysis, group size analysis, locale, and add comments here.
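For readers new to these terms, the sketch below shows what standard statistics and frequency analysis might compute for a single column. It is an illustrative assumption in plain Python, not the tool's output format; `column_profile` and the sample city values are invented.

```python
from collections import Counter


def column_profile(values, top_n=3):
    """Compute a few basic statistics and the most frequent values."""
    non_null = [v for v in values if v is not None]
    return {
        "count": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
        # Frequency analysis: most common values with their counts
        "top_values": Counter(non_null).most_common(top_n),
    }


# Hypothetical sample column
cities = ["Prague", "Toronto", "Prague", None, "Boston"]
profile = column_profile(cities)
print(profile["nulls"], profile["distinct"], profile["top_values"][0])
# prints 1 3 ('Prague', 2)
```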
Dependencies Tab
Use the Dependencies tab to test dependencies between fields in different columns. Define the name, determinant (key), and dependants. A threshold parameter helps assess dependency levels.
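One way to think about a dependency with a threshold: for each determinant value, find the most common dependant value and measure what share of rows agree with it. The sketch below uses that definition as an assumption; the tool may calculate dependency levels differently, and `dependency_strength`, `THRESHOLD`, and the sample rows are hypothetical.

```python
from collections import Counter, defaultdict


def dependency_strength(rows, determinant, dependant):
    """Share of rows whose dependant value is the most common one
    for their determinant value (1.0 means a perfect dependency)."""
    groups = defaultdict(Counter)
    for row in rows:
        groups[row[determinant]][row[dependant]] += 1
    total = sum(sum(counts.values()) for counts in groups.values())
    if total == 0:
        return 0.0
    consistent = sum(counts.most_common(1)[0][1] for counts in groups.values())
    return consistent / total


# Hypothetical data: does "zip" determine "city"?
rows = [
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "NYC"},      # inconsistent spelling
    {"zip": "02108", "city": "Boston"},
]

THRESHOLD = 0.7  # assumed meaning: minimum strength to report the dependency
strength = dependency_strength(rows, "zip", "city")
print(strength, strength >= THRESHOLD)  # prints 0.75 True
```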
Roll Ups Tab
The Roll Ups tab lets you create separate profile analyses for specific subsets of your data. For example, you can analyze data based on gender values to uncover patterns.
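In spirit, a roll up resembles a group-by followed by a separate profile per group. Here is a small plain-Python sketch under that assumption; `roll_up` and the sample records are made up, and the real step produces a much richer profile per subset.

```python
from collections import defaultdict
from statistics import mean


def roll_up(rows, group_by, column):
    """Profile one column separately for each value of the grouping column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_by]].append(row[column])
    return {
        key: {
            "count": len(vals),
            "min": min(vals),
            "max": max(vals),
            "mean": mean(vals),
        }
        for key, vals in groups.items()
    }


# Hypothetical sample records
people = [
    {"gender": "F", "age": 30},
    {"gender": "F", "age": 40},
    {"gender": "M", "age": 25},
]
by_gender = roll_up(people, "gender", "age")
print(by_gender["F"])
# prints {'count': 2, 'min': 30, 'max': 40, 'mean': 35}
```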
Business Rules Tab
In the Business Rules tab, you can define Boolean expressions for evaluation and results presentation in the Profile Viewer. This helps you ensure data conforms to your business rules.
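As an illustration of what evaluating Boolean rules over records looks like, the sketch below uses Python lambdas as stand-ins for rule expressions. In the tool you would write the rules in its own expression syntax; the rule names, `evaluate_rules`, and the sample data are all hypothetical.

```python
# Each rule is a name plus a Boolean expression over one record.
rules = {
    "age_is_plausible": lambda r: 0 <= r["age"] <= 120,
    "email_has_at_sign": lambda r: "@" in r["email"],
}


def evaluate_rules(rows, rules):
    """Count how many records pass and fail each rule."""
    results = {}
    for name, rule in rules.items():
        passed = sum(1 for row in rows if rule(row))
        results[name] = {"passed": passed, "failed": len(rows) - passed}
    return results


# Hypothetical sample records
customers = [
    {"age": 34, "email": "ann@example.com"},
    {"age": 150, "email": "bob@example.com"},
    {"age": 28, "email": "not-an-email"},
]
report = evaluate_rules(customers, rules)
print(report["age_is_plausible"])  # prints {'passed': 2, 'failed': 1}
```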
Primary Keys Tab
Use the Primary Keys tab to analyze column uniqueness and identify primary keys. Enter the column names to analyze, either individually or in combination.
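The difference between checking columns individually and in combination can be shown with a short plain-Python sketch. It assumes uniqueness is the ratio of distinct key values to total rows, which may not match the exact metric the tool reports; `uniqueness` and the sample rows are hypothetical.

```python
def uniqueness(rows, columns):
    """Ratio of distinct key values to rows for the given column combination.

    A ratio of 1.0 means the combination is unique in this data set.
    """
    keys = [tuple(row[c] for c in columns) for row in rows]
    if not keys:
        return 0.0
    return len(set(keys)) / len(keys)


# Hypothetical sample rows
records = [
    {"first": "Ann", "last": "Lee"},
    {"first": "Ann", "last": "Kim"},
    {"first": "Bob", "last": "Lee"},
]

print(uniqueness(records, ["first"]) == 1.0)          # prints False
print(uniqueness(records, ["first", "last"]) == 1.0)  # prints True
```

Neither column is unique by itself here, but the pair is, which is why analyzing combinations can reveal a composite key that single-column checks miss.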
Configuring Business Domain Analysis
Business domain analysis helps determine the type of data in a column in a business context (e.g., name, address, postal code). You can configure strict and loose thresholds to control how many domains are displayed as "matched" in the results.
To change these settings:
- Switch to the Profiling step layout.
- Select the Business Domains node.
- Modify the settings as needed and save your changes.
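To give a feel for how two thresholds can sort results into match levels, here is a hedged sketch in plain Python. It assumes a domain is scored by the share of values that fit a pattern and that the strict and loose thresholds are cut-offs on that share; how the tool actually applies them may differ. The pattern, the default threshold values, and `classify_domain` are invented for illustration.

```python
import re

# Hypothetical domain definition: a five-digit postal code
POSTAL_CODE = re.compile(r"\d{5}")


def classify_domain(values, pattern, strict=0.9, loose=0.6):
    """Classify a column against a domain pattern using two thresholds.

    Returns (label, share), where share is the fraction of non-empty
    values that fully match the pattern.
    """
    non_empty = [v for v in values if v]
    if not non_empty:
        return "no match", 0.0
    share = sum(1 for v in non_empty if pattern.fullmatch(v)) / len(non_empty)
    if share >= strict:
        return "strict match", share
    if share >= loose:
        return "loose match", share
    return "no match", share


# Hypothetical sample column: one value is missing a digit
codes = ["10001", "02108", "9021", "60614"]
print(classify_domain(codes, POSTAL_CODE))  # prints ('loose match', 0.75)
```

Raising the loose threshold in this sketch would drop the column to "no match", which mirrors how tightening thresholds reduces the number of domains shown as matched.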
Configuring the Profiling step is essential to gaining insight into your data's quality and structure. What are your tips and tricks for profiling configuration? Share them in the comments below.