
Hi everyone!

Today, we have a quick best-practice guide covering the basics of column naming and debugging in ONE Desktop plans.

When working with ONE Desktop, a consistent and meaningful naming convention for the columns in your output files is crucial: it keeps your data clear and organized and makes downstream processing easier. In this guide, we recommend naming conventions and show how to debug your expressions and steps so your plans run efficiently.
 

Column Naming Conventions

Output files in ONE Desktop can contain numerous columns, especially after processing input data in DQC. To keep your data well-organized, we recommend grouping columns by content. Below are the key prefixes and suffixes used in our naming convention:
 

Prefixes

Attribute Prefix | Explanation
src_xxx | Source input values (read-only attribute)
meta_xxx | Source input metadata
dec_xxx | Decoded source input values
dic_xxx | Translated master value storage (src_xxx is the source)
pur_xxx | Pre-cleansed values (operational column)
tmp_xxx | Temporary column (operational column)
pat_xxx | Attribute structure description (patterns)
cnt_xxx | Counter (operational column)
std_xxx | Standardized value of the attribute
cio_xxx | Cleansed instance output of the attribute
cmo_xxx | Cleansed master output of the attribute
out_xxx | Operational output column (components interface)
sco_xxx | Attribute score (higher numbers indicate worse data, 0 means perfect)
exp_xxx | Scoring explanation column (cleansing codes for each attribute)
lbl_xxx | Human-readable and/or GUI-friendly data quality explanation based on sco and exp attributes
sco_instance | Instance score (usually the sum of attribute scores)
exp_instance | Instance explanation code (list of error messages); aggregated attribute explanations
mat_xxx | Matching value of the attribute
uni_can_id | Candidate group ID
uni_can_id_old | Candidate group ID (old, from the last unification process)
uni_mat_id | Matching group ID
uni_mat_id_old | Matching group ID (old, from the last unification process)
uni_role or uni_instance_role | Instance unification role (Master, Slave, ...)
uni_msr_role | Merge survivor record role
uni_rule_name | Name of the applied unification rule
uni_grp_can_role | Group unification role for the candidate group (A, C, M, U)
uni_grp_mat_role | Group unification role for the matching group (A, C, M, U)

Note: Row highlighting in the tables indicates the purpose of the columns.
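To make the scoring columns concrete, here is a minimal Python sketch of how sco_instance and exp_instance can be derived from the attribute-level sco_xxx and exp_xxx columns. It assumes the common case described in the table, where the instance score is the sum of attribute scores. The row values, the explanation codes (such as EMAIL_NO_TLD), and the space separator are invented for illustration; this is not how ONE Desktop computes these columns internally.

```python
from typing import Dict, Union

Value = Union[int, str]

def instance_score(row: Dict[str, Value]) -> int:
    """sco_instance as the sum of all attribute-level sco_xxx columns."""
    return sum(v for k, v in row.items() if k.startswith("sco_") and k != "sco_instance")

def instance_explanation(row: Dict[str, Value], sep: str = " ") -> str:
    """exp_instance as the non-empty exp_xxx codes joined in column order (separator is an assumption)."""
    return sep.join(str(v) for k, v in row.items()
                    if k.startswith("exp_") and k != "exp_instance" and v)

# Hypothetical output row: scores and explanation codes are invented for illustration.
row: Dict[str, Value] = {
    "src_first_name": "jOHN", "sco_first_name": 100, "exp_first_name": "FN_CASE_FIXED",
    "src_email": "john@example", "sco_email": 1000, "exp_email": "EMAIL_NO_TLD",
    "src_phone": "+420123456789", "sco_phone": 0, "exp_phone": "",
}
row["sco_instance"] = instance_score(row)
row["exp_instance"] = instance_explanation(row)
print(row["sco_instance"], row["exp_instance"])  # 1100 FN_CASE_FIXED EMAIL_NO_TLD
```

A perfect attribute (score 0, empty explanation) adds nothing to either instance column, which matches the "0 means perfect" rule for sco_xxx.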
 

Suffixes

Attribute Suffix | Explanation
xxx_rpl | Replacement data
xxx_pat | Parsing data
xxx_id | Attribute IDs
xxx_orig | Original values found during parsing (example: pur_first_name_orig)
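Prefixes and suffixes combine into names like pur_first_name_orig. Below is a minimal Python sketch that splits a column name back into prefix, attribute, and suffix using the two tables above. The function name parse_column is hypothetical, and the sketch assumes attribute names never end in one of the reserved suffixes.

```python
from typing import Optional, Tuple

# Prefixes and suffixes from the tables above, without the _xxx placeholder.
PREFIXES = {"src", "meta", "dec", "dic", "pur", "tmp", "pat", "cnt", "std", "cio",
            "cmo", "out", "sco", "exp", "lbl", "mat", "uni"}
SUFFIXES = {"rpl", "pat", "id", "orig"}

def parse_column(name: str) -> Tuple[Optional[str], str, Optional[str]]:
    """Hypothetical helper: split a column name into (prefix, attribute, suffix).

    A part that is not a known prefix or suffix is returned as None.
    """
    parts = name.split("_")
    prefix = parts[0] if len(parts) > 1 and parts[0] in PREFIXES else None
    body = parts[1:] if prefix else parts
    # Only treat the last part as a suffix if something is left for the attribute name.
    suffix = body[-1] if len(body) > 1 and body[-1] in SUFFIXES else None
    if suffix:
        body = body[:-1]
    return prefix, "_".join(body), suffix

for col in ["pur_first_name_orig", "sco_instance", "customer_name"]:
    print(col, "->", parse_column(col))
```

Note that pat appears both as a prefix and as a suffix, so its position in the name decides which meaning applies.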


Obsolete and Rarely Used Prefixes

Attribute Prefix | Explanation
cyr_xxx | Attribute analysis of Cyrillic characters (operational column)
lat_xxx | Attribute analysis of Latin characters (operational column)
length_xxx | Attribute length analysis (operational column)
char_xxx | Attribute character analysis (operational column)
word_xxx | Attribute word analysis (operational column)
qma_xxx | Attribute quality mark: A, B, C, D, X (operational column)
qme_xxx | Entity quality mark: A, B, C, D, X (operational column)
uir_xxx | Address lookup file data (operational column)
rpl_can_xxx | Replacement candidates (incorrect data)
bin_xxx | Dust bin for text waste (operational column)
pri_xxx | Primary unification (operational column)
sec_xxx | Secondary unification (operational column)
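If you're cleaning up older plans, a quick check for these obsolete prefixes can help. This is a minimal Python sketch that assumes you already have the list of column names (for example, from the output file's header); find_obsolete is a hypothetical helper name.

```python
from typing import List

# Prefixes from the "Obsolete and Rarely Used Prefixes" table, including the trailing underscore
# so that names like "latitude" are not flagged by accident.
OBSOLETE_PREFIXES = (
    "cyr_", "lat_", "length_", "char_", "word_", "qma_", "qme_",
    "uir_", "rpl_can_", "bin_", "pri_", "sec_",
)

def find_obsolete(columns: List[str]) -> List[str]:
    """Hypothetical helper: return column names that start with an obsolete or rarely used prefix."""
    return [c for c in columns if c.startswith(OBSOLETE_PREFIXES)]

print(find_obsolete(["src_name", "length_name", "rpl_can_city", "std_city", "sec_key"]))
# ['length_name', 'rpl_can_city', 'sec_key']
```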

 

Debugging Expressions and Steps

 

When you use Ataccama Desktop Plans for data processing, you need an efficient way to debug expressions and steps so that your data transformations are accurate and reliable. Debugging lets you find and fix errors without running the entire plan, which saves time and effort.
 

Debugging a Function (Expression)

Debugging a function or expression helps you verify that it works as intended. Here's how to do it:

1. Accessing the Expression Debugger

In Ataccama Desktop Plans, you can debug functions or expressions, such as those used in Business Rules within the Profiling step. Follow these steps:

  • Click the "Debug..." button associated with the expression you want to debug.
  • In the dialog that appears, edit the expression at the top if needed.
  • Enter test input values next to the column names.
  • Click "Evaluate" to see the result.
  • The expression tree to the right of the Data sources section displays each evaluation/transformation step, with the result of each transformation in brackets.

Debugging expressions provides insights into how they affect your data, making it easier to spot and correct issues.
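The expression tree is easier to read once you see what it does: each node is evaluated from the leaves up, and its result is shown in brackets. Here is a minimal Python sketch of that idea. The node types, the trim/upper/concat functions, and the output format are assumptions for illustration; they are not ONE Desktop's expression language or its debugger.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple, Union

@dataclass
class Col:
    """Reference to an input column."""
    name: str

@dataclass
class Call:
    """Function call node; args may be Col, Call, or string literals."""
    func: str
    args: List["Node"] = field(default_factory=list)

Node = Union[Col, Call, str]

# Hypothetical function table: plain Python stand-ins, not ONE Desktop functions.
FUNCS: Dict[str, Callable[..., str]] = {
    "trim": lambda s: s.strip(),
    "upper": lambda s: s.upper(),
    "concat": lambda *parts: "".join(parts),
}

def evaluate(node: Node, row: Dict[str, str], depth: int = 0,
             trace: Optional[List[str]] = None) -> Tuple[str, List[str]]:
    """Evaluate node against row, recording one indented 'label [result]' line per node."""
    if trace is None:
        trace = []
    slot = len(trace)  # reserve this node's line so it prints above its children
    trace.append("")
    if isinstance(node, Col):
        result, label = row[node.name], node.name
    elif isinstance(node, str):
        result, label = node, repr(node)
    else:
        args = [evaluate(arg, row, depth + 1, trace)[0] for arg in node.args]
        result, label = FUNCS[node.func](*args), node.func + "()"
    trace[slot] = "  " * depth + f"{label} [{result!r}]"
    return result, trace

expr = Call("upper", [Call("concat", [Call("trim", [Col("first_name")]), " ", Col("last_name")])])
value, tree = evaluate(expr, {"first_name": "  john ", "last_name": "smith"})
print("\n".join(tree))
```

Reading the printed tree from the bottom up shows where a value first goes wrong, which is the same way you would use the debugger's bracketed results.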

Debugging a Regular Expression

Regular expressions play a crucial role in data processing. To debug a regular expression, follow these steps:

  • Navigate to the Regex Matching step in the Ataccama Desktop Plan.
  • In the "Properties of Regex Matching" dialog on the left, select the regular expression you want to debug.
  • Click the "Debug..." button next to the "Pattern" field.
  • A new screen will appear.
  • Enter text into the "Input text" field.
  • Press "Evaluate" to see the results and substitution options.

If you encounter errors in your regular expression, you can edit it within the same dialog and repeat the evaluation procedure until it's correct.
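Outside ONE Desktop, you can try the same evaluate-and-fix loop with Python's re module. This is a minimal sketch: debug_regex is a hypothetical helper, and Python's regex syntax may differ in details from the flavor ONE Desktop uses, so re-check the final pattern in the Regex Matching debugger.

```python
import re
from typing import Optional

def debug_regex(pattern: str, text: str, replacement: Optional[str] = None) -> dict:
    """Hypothetical helper: compile pattern and report matches, groups, and an optional substitution."""
    try:
        rx = re.compile(pattern)
    except re.error as err:
        # Report where compilation failed so the pattern can be fixed and re-evaluated.
        return {"error": f"{err.msg} at position {err.pos}"}
    report = {
        "full_match": rx.fullmatch(text) is not None,
        # Named groups are shown as a dict; otherwise the positional groups tuple.
        "matches": [(m.group(0), m.groupdict() or m.groups()) for m in rx.finditer(text)],
    }
    if replacement is not None:
        report["substituted"] = rx.sub(replacement, text)
    return report

print(debug_regex(r"(?P<area>\d{3})-(?P<number>\d{4})", "call 555-0199 now", r"\g<area> \g<number>"))
print(debug_regex(r"(\d+", "42"))  # unbalanced parenthesis: returns an "error" entry
```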

Debugging a Step

When a step in your plan contains multiple transformations, debugging the step as a whole can help you understand how these transformations impact your data. Here's how to do it:

  • Right-click the step you want to debug.
  • Select "Debug."
  • A "Step Debugger" window will open, divided into two parts: "in" and "out," listing the same columns.
  • You can choose which columns to display by clicking the "Filter Columns" button (a small table icon).
  • Additionally, the "Filter" feature can help you narrow down the selection when dealing with many columns.
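Conceptually, the Step Debugger runs each input row through the step and shows the selected columns side by side in "in" and "out". Here is a minimal Python sketch of that comparison. The helpers debug_step and filter_columns, and the example cleanse_name step (trim into pur_name, title case into std_name), are all hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Row = Dict[str, Optional[str]]

def filter_columns(all_columns: List[str], name_filter: str = "") -> List[str]:
    """Keep columns whose name contains name_filter (case-insensitive), like a filter box."""
    return [c for c in all_columns if name_filter.lower() in c.lower()]

def debug_step(step: Callable[[Row], Row], rows: Iterable[Row],
               columns: List[str]) -> List[Tuple[Row, Row]]:
    """Run step on each row and return (in, out) pairs restricted to the chosen columns."""
    pairs = []
    for row in rows:
        out = step(dict(row))  # pass a copy so the step cannot modify the "in" row
        pairs.append(({c: row.get(c) for c in columns}, {c: out.get(c) for c in columns}))
    return pairs

# Hypothetical step: pre-cleanse (trim) and standardize (title case) a name.
def cleanse_name(row: Row) -> Row:
    row["pur_name"] = (row["src_name"] or "").strip()
    row["std_name"] = row["pur_name"].title()
    return row

columns = filter_columns(["src_name", "pur_name", "std_name", "src_city"], "name")
for before, after in debug_step(cleanse_name, [{"src_name": "  ada lovelace "}], columns):
    print("in: ", before)
    print("out:", after)
```

Columns the step creates show up as None on the "in" side, which makes it easy to see what each transformation added.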

While debugging a step doesn't allow you to edit expressions directly, it gives you a testing environment with real data. To import data, right-click the "in" part and select "Import Data..."; this fills the table with the first 500 rows of the selected file. This option isn't available for database tables, so import from a file whose data matches the step's input configuration.
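If you want to prepare or sanity-check a sample file before importing it, this minimal Python sketch reads at most the first 500 rows of a CSV file and fails early when the file lacks any of the step's input columns. The CSV format, the import_preview name, and the column check are assumptions; ONE Desktop's own import may accept other formats and behave differently.

```python
import csv
import io
from itertools import islice
from typing import Dict, List, TextIO

PREVIEW_ROWS = 500  # the row limit mentioned for "Import Data..."

def import_preview(handle: TextIO, step_columns: List[str],
                   limit: int = PREVIEW_ROWS) -> List[Dict[str, str]]:
    """Hypothetical helper: read at most `limit` CSV rows, failing if a step input column is missing."""
    reader = csv.DictReader(handle)
    missing = [c for c in step_columns if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"file lacks the step's input columns: {missing}")
    return list(islice(reader, limit))

# Example with an in-memory file of 600 data rows; only the first 500 are kept.
sample = io.StringIO("src_name,src_city\n" + "".join(f"n{i},c{i}\n" for i in range(600)))
rows = import_preview(sample, ["src_name", "src_city"])
print(len(rows), rows[0])
```

For a real file, open it with `open(path, newline="", encoding="utf-8")` and pass the handle instead of the in-memory sample.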

Any questions, comments, or best practices? Share them in the comments 👇
