Data masking in Google DataStudio


First, why is there a need for Data masking?
Consider a table containing a column of Social Security Number as an example. Without Data Masking, the SSN would be displayed as 123–456–7899. With data masking, this number is displayed as 123-xxx-xxx-7899.

Then an insider leaks this sensitive information. Data masking minimizes the potential damage. The data-masked field does NOT have enough information to help identify a person.

Data masking allows us to hide all or part of the data in a specified column. This is one of the new security controls under ISO270001.

The problem with Data Masking in Google DataStudio:
However, Google Data Studio does not support data masking. It has no built in function to create data masking. In contrast, Microsoft SQL 2016 (and newer versions) and even MySQL have built-in Data Masking function.

The workaround solution:
I created a function that masks the middle portion of our sensitive data. In this example, I designated the Destination IP address (dst_ip) of our SIEM (Security Incidents and Event Management) as “sensitive”.

Applying masking, here is the resulting table, where the MASK_DstIP is shown below:

As a background, this table is part of our security dashboard. It is shared with Security Interns from University of Pangasinan. We wanted to give them enough data to be able to hunt for IOCs (indicators of compromise) plus any Indicators of Attacks. But still protect the sensitive data at the same time.

In Google DataStudio, first edit the datasource used by DataStudio report. I created a new field based on the Dst_IP column.

Here is the rather rough formula that I used:

CONCAT
(LEFT_TEXT(dst_ip, 4), regexp_replace(substr(dst_ip,5,LENGTH(dst_ip)-7),’+’,”XXX”), RIGHT_text(dst_ip, 4))

The short laymans explanation is that I split the IP address into three parts. I used the LEFT_TEXT to get the first part. It gets the leftmost 4 characters. Then for the middle portion, I used the regexp to replace all numbers from zero to nine with “XXX” . Finally, for the 3rd part, I extracted the rightmost 4 characters using the RIGHT_TEXT.

I then used the concat() function to combine all these three parts into one new column named: Mask_DstIP. Hope this helps! Let me know what you think.