PII Digest

Interesting Articles on Handling Sensitive Data

Xin Cheng
5 min read · Sep 6, 2024

Some resources on PII handling.

https://www.elpasotexas.gov/assets/Documents/CoEP/Community-Development/Forms-and-Notices/Manuals/DCHD-PII-Policy-updated.pdf

Types of PII (depending on impact): PII that may require legal notification of a breach, Legally Protected PII that is considered Sensitive/Confidential, Other Forms of PII with the potential for misuse

https://nvlpubs.nist.gov/nistpubs/legacy/sp/nistspecialpublication800-122.pdf

Organizations should categorize their PII by the PII confidentiality impact level. (dimensions: Identifiability (direct, indirect), Quantity of PII, Data Field Sensitivity, Context of Use, Obligations to Protect Confidentiality, Access to and Location of PII)
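NIST SP 800-122 lists the factors but leaves it to each organization to decide how to combine them. As a rough sketch only, the snippet below rates each factor and takes the highest rating as the overall level. This "high-water-mark" rule, the `Impact` enum, and the factor key names are assumptions made for illustration, not something the publication prescribes.

```python
from enum import IntEnum


class Impact(IntEnum):
    LOW = 1
    MODERATE = 2
    HIGH = 3


# The six factors listed above, as hypothetical dictionary keys.
FACTORS = (
    "identifiability",
    "quantity",
    "field_sensitivity",
    "context_of_use",
    "obligations",
    "access_and_location",
)


def confidentiality_impact(ratings: dict) -> Impact:
    """Combine per-factor ratings with a high-water-mark rule (an assumption)."""
    missing = [f for f in FACTORS if f not in ratings]
    if missing:
        raise ValueError(f"unrated factors: {missing}")
    return max(ratings[f] for f in FACTORS)


# Example: everything is low except one highly sensitive field.
ratings = {f: Impact.LOW for f in FACTORS}
ratings["field_sensitivity"] = Impact.HIGH
level = confidentiality_impact(ratings)
print(level.name)
```

An organization could just as well weight the factors or review them case by case; the point is only that the level is derived from all six dimensions rather than from one.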

Then apply the appropriate safeguards for PII based on the PII confidentiality impact level. (Creating Policies and Procedures, Conducting Training, De-Identifying PII, Access Enforcement, Access Control for Mobile Devices, Transmission Confidentiality, Auditing Events)

https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=10431740

Under GDPR, PII may have to be removed because of a data subject request or a regulatory requirement. In general, a system should have the following capabilities:

  1. Detection, discovery, and classification of PII
  2. Entity linking from a large number of datasets
  3. Removal of PII (pseudonymized, anonymized (replaced with a meaningless value), or deleted (set to NULL))
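The three removal options in the last item can be sketched as per-field actions. This is a minimal illustration, not the paper's implementation: the field names, the `SECRET_KEY`, the `"REDACTED"` placeholder, and the `remove_pii` helper are all hypothetical, and a keyed hash is only one possible way to produce a pseudonym.

```python
import hashlib
import hmac

# Hypothetical key for the demo; a real key would live in a secrets manager.
SECRET_KEY = b"demo-key-not-for-production"


def pseudonymize(value: str) -> str:
    """Keyed hash: the same input always yields the same token, so joins still work."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def anonymize(value: str) -> str:
    """Replace the value with a constant, meaningless placeholder."""
    return "REDACTED"


def delete(value: str) -> None:
    """Drop the value entirely (NULL in a database)."""
    return None


ACTIONS = {"pseudonymize": pseudonymize, "anonymize": anonymize, "delete": delete}


def remove_pii(record: dict, plan: dict) -> dict:
    """Return a copy of the record with the planned action applied to each field."""
    out = dict(record)
    for field, action in plan.items():
        if out.get(field) is not None:
            out[field] = ACTIONS[action](out[field])
    return out


record = {"email": "jane@example.com", "name": "Jane Doe", "phone": "555-0100", "plan": "pro"}
plan = {"email": "pseudonymize", "name": "anonymize", "phone": "delete"}
cleaned = remove_pii(record, plan)
print(cleaned)
```

Fields that are not in the plan (here `plan`) pass through unchanged, and the original record is left untouched.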

PII that has analytical value (e.g. location, age) can sometimes be given special treatment instead of being removed: generalization (a broader region, an age group) or aggregation, so that an individual cannot be identified.
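A small sketch of generalization followed by aggregation, under assumed rules: ages are bucketed into 10-year bands, postal codes keep only their first two characters, and groups smaller than a hypothetical `MIN_GROUP_SIZE` are suppressed because a group of one still points at an individual. The sample data and all thresholds are made up.

```python
from collections import Counter

MIN_GROUP_SIZE = 2  # hypothetical suppression threshold


def age_group(age: int, width: int = 10) -> str:
    """Generalize an exact age into a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def generalize_region(postal_code: str, keep: int = 2) -> str:
    """Keep only the leading characters of a postal code and mask the rest."""
    return postal_code[:keep] + "*" * (len(postal_code) - keep)


people = [
    {"age": 34, "postal_code": "79901"},
    {"age": 37, "postal_code": "79925"},
    {"age": 52, "postal_code": "10001"},
]

# Generalize each row, then aggregate to counts per (age band, region).
generalized = [(age_group(p["age"]), generalize_region(p["postal_code"])) for p in people]
counts = Counter(generalized)

# Suppress groups that are too small to hide an individual.
safe = {group: n for group, n in counts.items() if n >= MIN_GROUP_SIZE}
print(safe)
```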

A metadata model helps in PII handling:

  • Reference metadata: data describing Zones, Object types, PII Domains, Actions, Retention Periods, and Status.
  • Business Rules metadata: data describing the business rules for personal data removal and anonymization.
  • Object metadata: data describing objects such as datasets and features.
  • Action metadata: data describing the actions that should be performed to remove or anonymize the data.
  • Linked entities metadata: data describing records that belong to the same data subject across datasets.
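One way to picture such a metadata model is as a handful of small record types. The sketch below is an assumption about shape only: the class names, fields, and the `plan_actions` helper are hypothetical and simply mirror the categories in the list above.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):  # reference metadata: allowed actions
    PSEUDONYMIZE = "pseudonymize"
    ANONYMIZE = "anonymize"
    DELETE = "delete"


@dataclass(frozen=True)
class Feature:  # object metadata: one column of one dataset
    dataset: str
    name: str
    pii_domain: str  # reference metadata, e.g. "contact" or "identity"


@dataclass(frozen=True)
class BusinessRule:  # business rules metadata
    pii_domain: str
    action: Action
    retention_days: int


@dataclass(frozen=True)
class LinkedEntity:  # linked entities metadata: where a data subject appears
    subject_id: str
    dataset: str
    row_key: str


def plan_actions(features, rules):
    """Derive action metadata: which action applies to which feature."""
    by_domain = {rule.pii_domain: rule for rule in rules}
    return [
        (f.dataset, f.name, by_domain[f.pii_domain].action)
        for f in features
        if f.pii_domain in by_domain
    ]


features = [
    Feature("customers", "email", "contact"),
    Feature("customers", "signup_date", "none"),
    Feature("orders", "ship_name", "identity"),
]
rules = [
    BusinessRule("contact", Action.PSEUDONYMIZE, retention_days=365),
    BusinessRule("identity", Action.ANONYMIZE, retention_days=730),
]
actions = plan_actions(features, rules)
print(actions)
```

Keeping the rules keyed by PII domain rather than by column means a new dataset only needs its features classified; the removal behavior follows from the existing rules.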

https://www.anonos.com/hubfs/Anonos_BigPrivacy_GDPR_Blueprint_2nd_Edition.pdf

GDPR frequently recommends pseudonymisation of personal data, meaning maintaining those links (suitably accessible by encrypted keys and the like) in the hands of authorized parties only. For example, imagine that someone with an incurable medical condition is having their health data used to further investigational drug discovery. If an effective drug is discovered, GDPR-defined pseudonymisation enables the person to be contacted, treated and cured, whereas anonymisation makes it theoretically impossible to find that person again.
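The difference from anonymisation is that the link back to the person still exists, but only for authorized parties. A minimal sketch of that idea, assuming an in-memory lookup table and a simple boolean authorization flag (both hypothetical; a real system would use an access-controlled, encrypted store):

```python
import secrets


class PseudonymVault:
    """Holds the mapping between identifiers and random tokens."""

    def __init__(self):
        self._forward = {}  # identifier -> token
        self._reverse = {}  # token -> identifier

    def pseudonymize(self, identifier: str) -> str:
        """Return a stable random token for the identifier."""
        if identifier not in self._forward:
            token = secrets.token_hex(8)
            self._forward[identifier] = token
            self._reverse[token] = identifier
        return self._forward[identifier]

    def reidentify(self, token: str, authorized: bool) -> str:
        """Resolve a token back to the identifier, for authorized callers only."""
        if not authorized:
            raise PermissionError("re-identification requires authorization")
        return self._reverse[token]


vault = PseudonymVault()
token = vault.pseudonymize("patient-0042")

# Researchers work with the token; an authorized party can later find the patient.
patient = vault.reidentify(token, authorized=True)
print(token != "patient-0042", patient)
```

If the vault is destroyed, the tokens can no longer be resolved through it, which is closer to the anonymisation case described above.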

Limitations of security-only and privacy-only compliance solutions:

  • Security-only: security tools such as encryption, hashing, static or stateless tokenization, data masking and related approaches help to protect against the unauthorized identification of data subjects from data that directly reveals their identity within a single data source. However, those tools do nothing to protect against unauthorized re-identification of data subjects by correlating data attributes that exist in multiple data sources.
  • Privacy-only: technologies developed to safeguard privacy rights work on a binary access/no-access basis, which constrains the utility and value of the data because it eliminates linkages.
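The re-identification risk from correlating attributes across sources can be shown with a toy join. All data below is invented, and the choice of quasi-identifiers (`zip`, `birth_year`, `sex`) is an assumption for the example: the first source has no names, yet a unique match on those attributes in a second source recovers them.

```python
# Source A: health data with direct identifiers already removed.
health = [
    {"zip": "79901", "birth_year": 1984, "sex": "F", "diagnosis": "asthma"},
    {"zip": "79925", "birth_year": 1990, "sex": "M", "diagnosis": "diabetes"},
]

# Source B: a separate list that still carries names.
public = [
    {"name": "Jane Doe", "zip": "79901", "birth_year": 1984, "sex": "F"},
    {"name": "John Roe", "zip": "79925", "birth_year": 1990, "sex": "M"},
    {"name": "Ann Poe", "zip": "10001", "birth_year": 1975, "sex": "F"},
]

QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")


def link(a, b, keys=QUASI_IDENTIFIERS):
    """Match rows of a to rows of b on the quasi-identifiers."""
    index = {}
    for row in b:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)
    matches = []
    for row in a:
        candidates = index.get(tuple(row[k] for k in keys), [])
        if len(candidates) == 1:  # a unique match re-identifies the person
            matches.append((candidates[0]["name"], row["diagnosis"]))
    return matches


reidentified = link(health, public)
print(reidentified)
```

Masking or encrypting the name column in source A would not have changed this result, which is the limitation the security-only point describes.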

https://cppa.ca.gov/regulations/pdf/cppa_regs.pdf

https://www.dataprotection.ie/sites/default/files/uploads/2019-07/190708%20Guidance%20for%20SMEs.pdf

Retention policies and procedures: are data held for no longer than is necessary for the purposes for which they were collected?
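A retention check like this is straightforward to automate once each record carries its collection date and purpose. The sketch below assumes a per-purpose retention table; the purposes, periods, and record layout are hypothetical, not taken from the guidance.

```python
from datetime import date, timedelta

# Hypothetical retention periods per purpose, in days.
RETENTION_DAYS = {"marketing": 365, "billing": 7 * 365}


def is_expired(collected_on: date, purpose: str, today: date) -> bool:
    """True when the record has been held longer than its purpose allows."""
    return today > collected_on + timedelta(days=RETENTION_DAYS[purpose])


records = [
    {"id": 1, "purpose": "marketing", "collected_on": date(2022, 1, 10)},
    {"id": 2, "purpose": "billing", "collected_on": date(2022, 1, 10)},
]

today = date(2024, 9, 6)
to_purge = [r["id"] for r in records if is_expired(r["collected_on"], r["purpose"], today)]
print(to_purge)
```

Passing `today` in explicitly, rather than calling `date.today()` inside the function, keeps the check reproducible in audits and tests.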

https://docs.aws.amazon.com/pdfs/whitepapers/latest/navigating-gdpr-compliance/navigating-gdpr-compliance.pdf

AWS capabilities to support GDPR
Data access control: IAM, Temporary Access Tokens Through AWS STS, MFA, granular access to AWS resources, Control Access to Web Applications and Mobile Apps (Amazon Cognito)

Monitoring and Logging: Manage and Configure Assets with AWS Config, Compliance Auditing and Security Analytics (CloudTrail), CloudWatch Logs, Discovering and Protecting Data at Scale with Amazon Macie (PII detection), Centralized Security Management (AWS Control Tower, AWS Security Hub, Amazon GuardDuty, Amazon Inspector)

Data protection: Encrypt Data at Rest, Encrypt Data in Transit (VPN, AWS Direct Connect, AWS Certificate Manager/TLS), Encryption Tools (AWS Key Management Service, AWS CloudHSM, AWS Encryption SDK, Amazon DynamoDB Encryption Client, Linux DM-Crypt Infrastructure), Data Protection by Design and by Default (Nitro Systems, no operator access)

https://ico.org.uk/media/for-organisations/guide-to-the-general-data-protection-regulation-gdpr-1-0.pdf

https://www.gsa.gov/system/files/Personally-Identifiable-Information-%28PII%29-Processing-and-Transparency-Controls-%5BCIO-IT-Privacy-24-01.pdf

https://d1.awsstatic.com/partner-network/partner-solutions/data-governance-eBook.pdf

https://gdpr.eu/privacy-notice/

A privacy notice should be transparent about how personal data is used and how long it is retained (these should not be details hidden in the backend). But most privacy notices are not explicit about the details, e.g. "We will keep Personal Data no longer than necessary to fulfill the purposes described in this Notice. Under our record retention policy, we are required to destroy Personal Data after we no longer need it according to specific retention periods. However, we may need to hold Personal Data beyond these retention periods due to regulatory requirements or in response to a regulatory audit, investigation, or other legal matter. These requirements also apply to our third-party service providers."

https://www.publicationsduquebec.gouv.qc.ca/fileadmin/Fichiers_client/lois_et_reglements/LoisAnnuelles/en/2021/2021C25A.PDF

https://www.accessprivacy.com/AccessPrivacy/media/AccessPrivacy/Content/AccessPrivacy-Bill-64-GDPR-Summary-Comparison-Table.pdf

Where an organization anonymizes personal information after the information is no longer required, the organization must anonymize the data in accordance with “generally accepted best practices” (s. 23). Bill 64 defines “anonymize” in an absolute and stringent fashion as “irreversibly no longer allows the person to be identified directly or indirectly” (s. 23).

Solutions integrated with Databricks Unity Catalog

Integrated with Databricks

Not integrated

https://www.opentext.com/products/data-privacy-and-protection

To comply with regulations on PII, firms need to give transparent notice of how they handle users' private data (but in practice the notices tend to leave room for the firm).

Sample Privacy Notice

https://business.bofa.com/content/dam/flagship/pdf/data-privacy-english.pdf

https://www.expat.hsbc.com/privacy/

https://firstbusiness.bank/privacy-notice/

https://business.bofa.com/content/dam/boamlimages/documents/articles/ID18_0208/GBAM-GDPR-Privacy-Notice-EMEA.pdf

https://www.wellsfargo.com/privacy-security/notice-of-data-collection/

https://www.jpmorgan.com/content/dam/jpm/global/documents/us-privacy-policies/jpm-apac-privacy-notice.pdf

https://www.pwc.com/m1/en/services/consulting/technology/cyber-security/documents/data-privacy-handbook.pdf

https://www.termsfeed.com/public/uploads/2021/12/sample-privacy-policy-template.pdf

Sample Data Retention policy

https://cdn2.hubspot.net/hub/166442/file-18451295-pdf/docs/ar_sample_bank_record_retention_policy.pdf?t=1422311019000

https://www.fl-counties.com/themes/bootstrap_subtheme/sitefinity/documents/2009_09_24_record_retention_and_destruction.pdf

https://www.regionalfoodbank.org/wp-content/uploads/2021/01/Document_Retention_Destruction_Policy.pdf

Retention durations in some regulations

