Hello,I am currently working on a data masking project in ONE Desktop and I'm seeking some advice on how to approach a specific scenario using two tables.I have two CSV files: Classification Table (CSV #1): This table contains two columns - 'ColumnName' and 'DataClassification'. The 'ColumnName' field lists the names of various columns, and 'DataClassification' provides their respective data classifications, such as "Restricted", "Confidential", "Public", or it may be blank. Data Table (CSV #2): This table contains multiple columns, named according to the entries in the 'ColumnName' field of the Classification Table. Each column holds different sets of data. Classification Table (CSV #1):ColumnName DataClassification Name Confidential Age Public Email Restricted Data Table (CSV #2):Name Age Email Alice 30 alice@example.com Bob 25 bob@example.com My goal is to mask the data in the Data Table (CSV #2) based on the classification provided in the Classification Table (CSV #1). For instance, if a column is marked as "Confidential" in CSV #1, the entire corresponding column in CSV #2 should be masked accordingly.I'm looking for guidance or suggestions on the most scalable way to achieve this without resorting to writing a custom step. It seems like all of the possible steps in the desktop app go row by row, which is an issue if I’m trying to conditionally mask an entire column, or require manual input to specify the columns.

If you open an Ataccama plan in a text editor or click the XML button at the top of One Desktop, you will see that the plan is actually an XML file.Thus, you can generate plans as any other XML-files.In attached Example (bin/csv_read_write_generator.comp)I take an input that contains list of csv files and their columns in following formatFOLDER_NAMEFILE_NAMEFILE_EXTORDERCOLUMN_NAME../data/in/sample_inputcsv1col_1../data/in/sample_inputcsv2col_2And for each combination of FOLDER_NAME, FILE_NAME and FILE_EXT generate a plan in the generated_plans folder that:Reads the FOLDER_NAME/FILE_NAME.FILE_EXT file with specified columns;Adds a new column;Writes original columns to the FOLDER_NAME/../out/FILE_NAME.FILE_EXT output file.Here is an example of how generated plan looks like

Masking based on column instead of row?

Hello,

I am currently working on a data masking project in ONE Desktop and I'm seeking some advice on how to approach a specific scenario using two tables.

I have two CSV files:

Classification Table (CSV #1): This table contains two columns - 'ColumnName' and 'DataClassification'. The 'ColumnName' field lists the names of various columns, and 'DataClassification' provides their respective data classifications, such as "Restricted", "Confidential", "Public", or it may be blank.
Data Table (CSV #2): This table contains multiple columns, named according to the entries in the 'ColumnName' field of the Classification Table. Each column holds different sets of data.

Classification Table (CSV #1):

`ColumnName`	`DataClassification`
`Name`	`Confidential`
`Age`	`Public`
`Email`	`Restricted`

Data Table (CSV #2):

`Name`	`Age`	`Email`
`Alice`	`30`	`alice@example.com`
`Bob`	`25`	`bob@example.com`

My goal is to mask the data in the Data Table (CSV #2) based on the classification provided in the Classification Table (CSV #1). For instance, if a column is marked as "Confidential" in CSV #1, the entire corresponding column in CSV #2 should be masked accordingly.

I'm looking for guidance or suggestions on the most scalable way to achieve this without resorting to writing a custom step. It seems like all of the possible steps in the desktop app go row by row, which is an issue if I’m trying to conditionally mask an entire column, or require manual input to specify the columns.

Page 1 / 1

Hi @danielsu

The most scalable way it to leverage the fact that Ataccama plans are actually XML-files.
So what you can do is:

Use Classification Table as an input
Generate a plan that will Read CSV file based on columns list
Apply necessary transformations based on Classification
Write results wherever you want.

This approach though implies that your Classification Table holds all the columns from the Data Table and in correct order. if this is not true you can use semi-manual approach.

Create a generator plan as in previous approach, but this time instead of actual Read/Write steps use Integration Input/Output steps. And use .comp extension for the result
Manually create a plan that will read data from Data Table and write it to desired destination
Put component created on step #1 to plan created at step #2 and use Auto-Map to link columns.

@AKislyakov

If I have the columns that need to be masked in a lookup table, is it possible to for the columns to be fed in an alter column step or similar?

We’re trying to make this as generic as possible as part as an export project execution, so any help would be appreciated.

No, the problem is that inside expression you don’t know what the name of the target column is. Another approach could be to transpose the data to vertical format, apply masking based on Column Name and then transform it back.

`Row ID`	`Column Name`	`Column Value`
`1`	`Name`	Alice
`1`	`Age`	30
`1`	Email	alice@example.com

The problem with this approach is that this transformation parts need manual configuration and are error-prone.

@AKislyakov Thank you so much for the reply. I’ve been looking into possible solutions and I’m also wondering if there’s any function in Ataccama expression to get the current entry’s column? Like a simple getColumnName()? That would solve the problem as well as I could pair that with a lookup table of the columns that need to be masked.

Also, could you clarify step #2?

Generate a plan that will Read CSV file based on columns list

How exactly can I read a CSV file based on the columns list without hardcoding an expression in the input reader?

I’m also a little confused on what you meant by leveraging the fact that Ataccama plans are XML files. How can that be leveraged to solve this problem?

Appreciate the support.

If you open an Ataccama plan in a text editor or click the XML button at the top of One Desktop, you will see that the plan is actually an XML file.

Thus, you can generate plans as any other XML-files.

In attached Example (bin/csv_read_write_generator.comp) I take an input that contains list of csv files and their columns in following format

FOLDER_NAME	FILE_NAME	FILE_EXT	ORDER	COLUMN_NAME
../data/in/	sample_input	csv	1	col_1
../data/in/	sample_input	csv	2	col_2

And for each combination of FOLDER_NAME, FILE_NAME and FILE_EXT generate a plan in the generated_plans folder that:

Reads the FOLDER_NAME/FILE_NAME.FILE_EXT file with specified columns;
Adds a new column;
Writes original columns to the FOLDER_NAME/../out/FILE_NAME.FILE_EXT output file.

Here is an example of how generated plan looks like

Hi @danielsu - I’m closing this thread for now. If you have any questions please don’t hesitate to follow up here or create a new post ‍♀️

Reply

Sign up

Login to the Ataccama Community

Scanning file for viruses.

This file cannot be downloaded