
ONE Desktop Plans: Column Naming & Debugging 🪲

  • 6 September 2023

Hi everyone!

Today, we have a quick best-practice guide covering the basics of column naming and debugging in ONE Desktop plans.

When working with ONE Desktop, maintaining a consistent and meaningful naming convention for columns in your output files is crucial. This practice ensures clarity, organization, and seamless data processing. In this guide, we'll provide recommendations for naming conventions and debugging your expressions and steps for efficient plan execution.
 

Column Naming Conventions

Output files in ONE Desktop can contain numerous columns, especially after processing input data in DQC. To keep your data well-organized, we recommend grouping columns by content. Below are the key prefixes and suffixes used in our naming convention:
 

Prefixes

| Attribute Prefix | Explanation |
| --- | --- |
| src_xxx | Source input values (read-only attribute) |
| meta_xxx | Source input metadata |
| dec_xxx | Decoded source input values |
| dic_xxx | Translated master value storage (src_xxx is the source) |
| pur_xxx | Pre-cleansed values (operational column) |
| tmp_xxx | Temporary column (operational column) |
| pat_xxx | Attribute structure description (patterns) |
| cnt_xxx | Counter (operational column) |
| std_xxx | Standardized value of the attribute |
| cio_xxx | Cleansed instance output of the attribute |
| cmo_xxx | Cleansed master output of the attribute |
| out_xxx | Operational output column (components interface) |
| sco_xxx | Attribute score (higher numbers indicate worse data, 0 means perfect) |
| exp_xxx | Scoring explanation column: cleansing codes for each attribute |
| lbl_xxx | Human-readable and/or GUI-friendly data quality explanation based on sco and exp attributes |
| sco_instance | Instance score (usually the sum of attribute scores) |
| exp_instance | Instance explanation code (list of error messages); aggregated attribute explanations |
| mat_xxx | Matching value of the attribute |
| uni_can_id | Candidate group ID |
| uni_can_id_old | Candidate group ID (old, from the last unification process) |
| uni_mat_id | Matching group ID |
| uni_mat_id_old | Matching group ID (old, from the last unification process) |
| uni_role or uni_instance_role | Instance unification role (Master, Slave, ...) |
| uni_msr_role | Merge survivor record role |
| uni_rule_name | Name of the applied unification rule |
| uni_grp_can_role | Group unification role for the candidate group (A, C, M, U) |
| uni_grp_mat_role | Group unification role for the matching group (A, C, M, U) |

 

Suffixes

| Attribute Suffix | Explanation |
| --- | --- |
| xxx_rpl | Replacement data |
| xxx_pat | Parsing data |
| xxx_id | Attribute IDs |
| xxx_orig | Original values found during parsing (example: pur_first_name_orig) |
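
For example, a single input attribute such as first_name typically travels through this lifecycle: src_first_name (raw input), pur_first_name (pre-cleansed), std_first_name (standardized), and cio_first_name (cleansed instance output), with sco_first_name and exp_first_name carrying its score and explanation. If you want to check a plan's output columns against the convention mechanically, a minimal Python sketch might look like this (a hypothetical helper script, not part of ONE Desktop; the prefix and suffix lists are taken from the tables above):

```python
import re

# Prefixes and suffixes from the naming-convention tables above.
PREFIXES = (
    "src", "meta", "dec", "dic", "pur", "tmp", "pat", "cnt", "std",
    "cio", "cmo", "out", "sco", "exp", "lbl", "mat", "uni",
)
SUFFIXES = ("rpl", "pat", "id", "orig")

NAME_RE = re.compile(r"^(?P<prefix>" + "|".join(PREFIXES) + r")_[a-z0-9_]+$")

def check_column(name: str) -> str:
    """Describe whether a column name follows the convention."""
    m = NAME_RE.match(name)
    if not m:
        return "non-standard"
    suffix = next((s for s in SUFFIXES if name.endswith("_" + s)), None)
    return f"prefix={m.group('prefix')}" + (f", suffix={suffix}" if suffix else "")

for col in ["src_first_name", "pur_first_name_orig", "sco_instance", "FirstName"]:
    print(f"{col}: {check_column(col)}")
```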


Obsolete and Rarely Used Prefixes

| Attribute Prefix | Explanation |
| --- | --- |
| cyr_xxx | Attribute analysis of Cyrillic characters (operational column) |
| lat_xxx | Attribute analysis of Latin characters (operational column) |
| length_xxx | Attribute length analysis (operational column) |
| char_xxx | Attribute character analysis (operational column) |
| word_xxx | Attribute word analysis (operational column) |
| qma_xxx | Attribute quality mark, ABCDX (operational column) |
| qme_xxx | Entity quality mark, ABCDX (operational column) |
| uir_xxx | Address lookup file data (operational column) |
| rpl_can_xxx | Replacement candidates (incorrect data) |
| bin_xxx | Dust bin for text waste (operational column) |
| pri_xxx | Primary unification (operational column) |
| sec_xxx | Secondary unification (operational column) |

 

Debugging Expressions and Steps

 

When using ONE Desktop plans for your data processing tasks, it's essential to debug expressions and steps efficiently to ensure the accuracy and reliability of your data transformations. Debugging allows you to identify and fix errors without running the entire plan, saving you time and effort.
 

Debugging a Function (Expression)

Debugging a function or expression can be instrumental in verifying its correctness. Here's how to do it:

1. Accessing the Expression Debugger

In ONE Desktop plans, you can debug functions or expressions, such as those used in Business Rules within the Profiling step. Follow these steps:

  • Click on the "Debug..." button associated with the expression you want to debug.
  • A dialog will appear, allowing you to edit the expression at the top.
  • In the Data sources section, enter test values next to the column names.
  • Click "Evaluate" to see the result.
  • The expression tree on the right of the Data sources section will display the evaluation/transformation steps, along with the results of each transformation in brackets.

Debugging expressions provides insights into how they affect your data, making it easier to spot and correct issues.
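
The debugger evaluates the expression inside ONE Desktop, but it can also help to prototype the same logic outside the tool. The sketch below is plain Python, not ONE's expression language; it simply mirrors the debugger's idea of showing each transformation step with its intermediate result in brackets (the trim-and-uppercase rule is a made-up example):

```python
# Hypothetical offline re-creation of an expression-debugger session:
# apply each transformation in turn and print its intermediate result.
def debug_expression(value: str) -> str:
    result = value
    for name, fn in [("trim", str.strip), ("upper", str.upper)]:
        result = fn(result)
        print(f"{name} [{result}]")   # mimic the tree's bracketed results
    return result

debug_expression("  john smith ")
# trim [john smith]
# upper [JOHN SMITH]
```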

Debugging a Regular Expression

Regular expressions play a crucial role in data processing. To debug a regular expression, follow these steps:

  • Navigate to the Regex Matching step in your ONE Desktop plan.
  • In the "Properties of Regex Matching" dialog on the left, select the regular expression you want to debug.
  • Click the "Debug..." button next to the "Pattern" field.
  • A new screen will appear.
  • Enter text into the "Input text" field.
  • Press "Evaluate" to see the results and substitution options.

If you encounter errors in your regular expression, you can edit it within the same dialog and repeat the evaluation procedure until it's correct.
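
It can also be convenient to try a pattern against sample strings outside the dialog before pasting it into the step. The Python sketch below uses a made-up phone-number pattern; keep in mind that the regex dialect ONE Desktop accepts may differ from Python's re module in a few constructs, so treat this only as a quick sanity check:

```python
import re

# Hypothetical pattern: capture the area code and number of a US phone.
pattern = re.compile(r"\((?P<area>\d{3})\)\s*(?P<number>\d{3}-\d{4})")

for text in ["(212) 555-0142", "555-0142"]:
    m = pattern.search(text)
    if m:
        print(f"{text} -> area={m.group('area')}, number={m.group('number')}")
    else:
        print(f"{text} -> no match; edit the pattern and evaluate again")
```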

Debugging a Step

When a step in your plan contains multiple transformations, debugging the step as a whole can help you understand how these transformations impact your data. Here's how to do it:

  • Right-click the step you want to debug.
  • Select "Debug."
  • A "Step Debugger" window will open, divided into two parts: "in" and "out," listing the same columns.
  • You can choose which columns to display by clicking the "Filter Columns" button (a small table icon).
  • Additionally, the "Filter" feature can help you narrow down the selection when dealing with many columns.

While debugging a step doesn't allow you to edit expressions directly, it provides a real-data testing environment. You can import data by right-clicking the "in" part and selecting "Import Data..." to populate the table with the first 500 rows of data from a selected file. Please note that this functionality isn't available for database tables, so use a file with actual data matching the step's configuration.
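
Because the import reads from a file rather than a database table, one practical workaround is to prepare a small sample file whose columns match the step's input. Here is a minimal sketch (the file names are hypothetical) that trims a large CSV down to the first 500 rows:

```python
import csv

def sample_file(src: str, dst: str, rows: int = 500) -> None:
    """Copy the header plus the first `rows` data rows of a CSV file."""
    with open(src, newline="", encoding="utf-8") as fin, \
         open(dst, "w", newline="", encoding="utf-8") as fout:
        reader, writer = csv.reader(fin), csv.writer(fout)
        writer.writerow(next(reader))        # keep the header row
        for i, row in enumerate(reader):
            if i >= rows:
                break
            writer.writerow(row)

sample_file("customers_full.csv", "customers_sample.csv")
```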

Any questions, comments, or best practices? Share them in the comments 👇
