Hi everyone!
Today, we have a quick best practice covering the basics of column naming and debugging in ONE Desktop plans.
When working with ONE Desktop, maintaining a consistent and meaningful naming convention for the columns in your output files is crucial. It keeps your data clear, organized, and easy to process downstream. In this guide, we'll share our recommended naming conventions and show how to debug expressions and steps for efficient plan execution.
Column Naming Conventions
Output files in ONE Desktop can contain numerous columns, especially after processing input data in DQC. To keep your data well-organized, we recommend grouping columns by content. Below are the key prefixes and suffixes used in our naming convention:
Prefixes
Attribute Prefix | Explanation |
---|---|
src_xxx | Source input values (read-only attribute) |
meta_xxx | Source input metadata |
dec_xxx | Decoded source input values |
dic_xxx | Translated master value storage (src_xxx is the source) |
pur_xxx | Pre-cleansed values (operational column) |
tmp_xxx | Temporary column (operational column) |
pat_xxx | Attribute structure description (patterns) |
cnt_xxx | Counter (operational column) |
std_xxx | Standardized value of the attribute |
cio_xxx | Cleansed instance output of the attribute |
cmo_xxx | Cleansed master output of the attribute |
out_xxx | Operational output column (components interface) |
sco_xxx | Attribute score (higher numbers indicate worse data, 0 means perfect) |
exp_xxx | Scoring explanation column - cleansing codes for each attribute |
lbl_xxx | Human-readable and/or GUI-friendly data quality explanation based on sco and exp attributes |
sco_instance | Instance score (usually, the sum of attribute scores) |
exp_instance | Instance explanation code (list of error messages); aggregated attribute explanations |
mat_xxx | Matching value of the attribute |
uni_can_id | Candidate group ID |
uni_can_id_old | Candidate group ID (old, from the last unification process) |
uni_mat_id | Matching group ID |
uni_mat_id_old | Matching group ID (old, from the last unification process) |
uni_role or uni_instance_role | Instance unification role (Master, Slave, ...) |
uni_msr_role | Merge survivor record role |
uni_rule_name | Name of the applied unification rule |
uni_grp_can_role | Group unification role for the candidate group (A, C, M, U) |
uni_grp_mat_role | Group unification role for the matching group (A, C, M, U) |
Suffixes
Attribute Suffix | Explanation |
---|---|
xxx_rpl | Replacement data |
xxx_pat | Parsing data |
xxx_id | Attribute IDs |
xxx_orig | Original values found during parsing (example: pur_first_name_orig) |
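If you review or generate plans programmatically, a quick script can flag columns that stray from the convention. Here's a minimal sketch in Python; the prefix list and the helper function are our own illustration, not something shipped with ONE Desktop, so adapt them to whatever your project has agreed on:

```python
# Minimal sketch: flag output column names whose prefix is not part of the convention.
# The prefix set and the helper are illustrative only -- ONE Desktop does not ship
# this function.

KNOWN_PREFIXES = {
    "src", "meta", "dec", "dic", "pur", "tmp", "pat", "cnt", "std",
    "cio", "cmo", "out", "sco", "exp", "lbl", "mat", "uni",
}

def unconventional_columns(columns):
    """Return the column names whose prefix is not in KNOWN_PREFIXES."""
    return [c for c in columns if c.split("_", 1)[0] not in KNOWN_PREFIXES]

# 'firstName' is reported because it carries no recognized prefix.
print(unconventional_columns(["src_first_name", "std_first_name", "sco_instance", "firstName"]))
```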
Obsolete and Rarely Used Prefixes
Attribute Prefix | Explanation |
---|---|
cyr_xxx | Attribute analysis of Cyrillic characters (operational column) |
lat_xxx | Attribute analysis of Latin characters (operational column) |
length_xxx | Attribute length analysis (operational column) |
char_xxx | Attribute character analysis (operational column) |
word_xxx | Attribute word analysis (operational column) |
qma_xxx | Attribute quality mark - ABCDX (operational column) |
qme_xxx | Entity quality mark - ABCDX (operational column) |
uir_xxx | Address lookup file data (operational column) |
rpl_can_xxx | Replacement candidates (incorrect data) |
bin_xxx | Dust bin for text waste (operational column) |
pri_xxx | Primary unification (operational column) |
sec_xxx | Secondary unification (operational column) |
Debugging Expressions and Steps
When using ONE Desktop plans for your data processing tasks, it's essential to debug expressions and steps efficiently to ensure the accuracy and reliability of your data transformations. Debugging lets you identify and fix errors without running the entire plan, saving you time and effort.
Debugging a Function (Expression)
Debugging a function or expression can be instrumental in verifying its correctness. Here's how to do it:
Accessing the Expression Debugger
In ONE Desktop, you can debug functions or expressions, such as those used in Business Rules within the Profiling step. Follow these steps:
- Click on the "Debug..." button associated with the expression you want to debug.
- A dialog will appear, allowing you to edit the expression at the top.
- Enter test values next to the column names in the Data sources section.
- Click "Evaluate" to see the result.
- The expression tree on the right of the Data sources section will display the evaluation/transformation steps, along with the results of each transformation in brackets.
Debugging expressions provides insights into how they affect your data, making it easier to spot and correct issues.
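If you like to prototype a rule before wiring it into the plan, you can mimic the expression tree's step-by-step view by chaining the transformations in a script and printing each intermediate result. Below is a rough Python analogy; the sample value and the cleansing steps are invented, and this is not Ataccama expression syntax:

```python
# Rough analogy of the expression tree: apply each transformation in order and
# print the intermediate result in brackets. The sample value and the cleansing
# steps are invented for illustration.

value = "  smith, john "

steps = [
    ("trim", str.strip),
    ("upper", str.upper),
    ("remove comma", lambda s: s.replace(",", "")),
]

for name, transform in steps:
    value = transform(value)
    print(f"{name} [{value}]")
```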
Debugging a Regular Expression
Regular expressions play a crucial role in data processing. To debug a regular expression, follow these steps:
- Navigate to the Regex Matching step in your ONE Desktop plan.
- In the "Properties of Regex Matching" dialog on the left, select the regular expression you want to debug.
- Click the "Debug..." button next to the "Pattern" field.
- A new screen will appear.
- Enter text into the "Input text" field.
- Press "Evaluate" to see the results and substitution options.
If you encounter errors in your regular expression, you can edit it within the same dialog and repeat the evaluation procedure until it's correct.
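Since regular expression syntax is largely portable, it can also help to try the same pattern against your sample strings outside the plan. Here's a short Python sketch using the standard re module; the phone-number-style pattern and test values are invented for illustration:

```python
import re

# Invented pattern and test values for illustration only -- not taken from any real plan.
pattern = re.compile(r"^\+?(\d{3})[ -]?(\d{3})[ -]?(\d{3})$")

for text in ["+420 123 456", "123-456-789", "12 34"]:
    match = pattern.match(text)
    if match:
        print(f"'{text}' matches, groups: {match.groups()}")
    else:
        print(f"'{text}' does not match")
```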
Debugging a Step
When a step in your plan contains multiple transformations, debugging the step as a whole can help you understand how these transformations impact your data. Here's how to do it:
- Right-click the step you want to debug.
- Select "Debug."
- A "Step Debugger" window will open, divided into two parts: "in" and "out," listing the same columns.
- You can choose which columns to display by clicking the "Filter Columns" button (a small table icon).
- Additionally, the "Filter" feature can help you narrow down the selection when dealing with many columns.
While debugging a step doesn't allow you to edit expressions directly, it provides a real-data testing environment. You can import data by right-clicking the "in" part and selecting "Import Data..." to populate the table with the first 500 rows of data from a selected file. Please note that this functionality isn't available for database tables, so use a file containing real data that matches the step's input configuration.
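Because direct import isn't possible for database tables, one workaround is to export a small sample from the table to a delimited file and import that file instead. Below is a hedged sketch using pandas with SQLAlchemy; the connection string, query, and output path are placeholders, not part of any ONE Desktop configuration:

```python
# Hedged workaround: the step debugger cannot import directly from a database table,
# so dump a small sample to a delimited file and import that file instead.
# The connection string, query, and output path below are placeholders.

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("postgresql://user:password@dbhost:5432/crm")   # placeholder connection
sample = pd.read_sql_query("SELECT * FROM party LIMIT 500", engine)    # the debugger only reads 500 rows anyway
sample.to_csv("party_sample.csv", sep=";", index=False)                # import this file in the "in" part
```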
Any questions, comments, or best practices? Share them in the comments!