Data Protection Glossary

Anonymization and Pseudonymization

Anonymization permanently removes the link to a person. Pseudonymization replaces identifiers with a pseudonym that can be traced back using separately stored additional information. Both are key GDPR techniques for implementing data protection by design (Art. 25 GDPR).

Anonymization and pseudonymization are two different methods for reducing how personal data is. Under the GDPR, they have fundamentally different legal consequences.

Anonymization: No longer personally identifiable

Truly anonymized data is no longer personal data and therefore falls outside the scope of the GDPR (Recital 26).

True anonymization approach:

  • Original: Name=Hans Müller, Age=42, Condition=Diabetes
  • Anonymized: Age=42, Condition=Diabetes (if age alone does not allow re-identification)
  • But: if attributes such as age, ZIP code and condition together single out one individual, possibly in combination with other available data, the dataset is NOT anonymized!
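You can test the warning above by measuring k. Group the records by their quasi-identifiers and find the smallest group: a group of size 1 means someone is singled out. The sketch below makes several assumptions. The records and names are hypothetical. Age and ZIP code are treated as the quasi-identifiers. The condition is treated as the sensitive attribute and is left unchanged.

```python
from collections import Counter

DIRECT_IDENTIFIERS = {"name"}            # hypothetical: removed outright
QUASI_IDENTIFIERS = ("age", "zip_code")  # hypothetical: linkable with other data


def strip_direct_identifiers(records):
    """Drop fields that name a person directly."""
    return [{k: v for k, v in r.items() if k not in DIRECT_IDENTIFIERS} for r in records]


def smallest_group_size(records, quasi_identifiers):
    """Return k: the size of the smallest group with identical quasi-identifier values."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())


def generalize(record):
    """Coarsen quasi-identifiers: age to a decade band, ZIP code to its first two digits."""
    decade = record["age"] // 10 * 10
    return {**record, "age": f"{decade}-{decade + 9}", "zip_code": record["zip_code"][:2] + "***"}


records = [
    {"name": "Hans Müller", "age": 42, "zip_code": "40210", "condition": "Diabetes"},
    {"name": "Anna Schmidt", "age": 42, "zip_code": "40210", "condition": "Asthma"},
    {"name": "Peter Weber", "age": 47, "zip_code": "40227", "condition": "Diabetes"},
]

stripped = strip_direct_identifiers(records)
print(smallest_group_size(stripped, QUASI_IDENTIFIERS))     # 1: the 47-year-old in 40227 is unique
generalized = [generalize(r) for r in stripped]
print(smallest_group_size(generalized, QUASI_IDENTIFIERS))  # 3: all share ("40-49", "40***")
```

Removing the name alone leaves k = 1. Only coarsening the quasi-identifiers hides each person in a group of three.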

The Problem: Re-identification

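True anonymization is extremely difficult. Netflix released "anonymized" movie ratings for the Netflix Prize. Researchers then showed that a handful of known ratings with approximate dates was enough to single out most subscribers in the dataset. They also re-identified individual users by cross-referencing public IMDb reviews.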

Techniques for true anonymization:

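  • k-anonymity: Every combination of quasi-identifiers (e.g. age, ZIP code) appears in at least k records, so each person is hidden in a group of at least k
  • Differential Privacy: Calibrated statistical noise keeps results nearly the same whether or not any single person's data is included (Apple and Google use this)
  • Aggregation: Only sums/averages are released, no individual values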
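One standard way to apply differential privacy to an aggregate such as a count is the Laplace mechanism. The sketch below is illustrative only. The patient records and the epsilon value are hypothetical. It also uses Python's `random` module, so it is not hardened against the floating-point and randomness weaknesses that vetted differential-privacy libraries address.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponential variables with rate 1/scale
    # is Laplace(0, scale)-distributed.
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def dp_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    """Release a count with epsilon-differential privacy (Laplace mechanism).

    Adding or removing one person changes a count by at most 1
    (sensitivity = 1), so the noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1 / epsilon, rng)


patients = [
    {"age_group": "40-45", "diagnosis": "Diabetes"},
    {"age_group": "40-45", "diagnosis": "Asthma"},
    {"age_group": "50-55", "diagnosis": "Diabetes"},
]
rng = random.Random(42)
noisy = dp_count(patients, lambda r: r["diagnosis"] == "Diabetes", epsilon=0.5, rng=rng)
print(f"Noisy diabetes count: {noisy:.2f}")
```

A smaller epsilon means more noise and stronger protection. Each additional release of the same data consumes more of the privacy budget.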

Pseudonymization: GDPR Technique

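Pseudonymization replaces identifiers with pseudonyms. Anyone holding the additional information (the "key") can re-link the data to the person. The GDPR therefore requires that this information be kept separately and protected by technical and organisational measures (Art. 4(5) GDPR).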

Example database:

-- Original
SELECT * FROM patients WHERE id = 12345;
-- id=12345, name="Hans Müller", address="Hauptstr. 1", diagnosis="Diabetes"

Pseudonym table (separate, access-controlled): pseudonym_id=ABC123 ↔ patient_id=12345

Analysis table (for researchers): pseudonym_id=ABC123, age_group=40-45, region="NRW", diagnosis="Diabetes"
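The split into a pseudonym table and an analysis table can be sketched as follows. Several details are hypothetical: the function name, and the input fields `age` and `region` (the original record has an address, not a region). Name and address are simply not copied into the research rows. Age is coarsened into the 5-year bands used above.

```python
import secrets


def pseudonymize(patients):
    """Split patient records into a key table and rows for researchers."""
    key_table = {}      # pseudonym_id -> patient_id; store separately, access-controlled
    analysis_rows = []  # what researchers see: no name, no address, no patient_id
    for p in patients:
        # Random pseudonym: it cannot be recomputed from the patient ID,
        # so the key table is the only link back to the person.
        pseudonym_id = secrets.token_hex(6).upper()
        while pseudonym_id in key_table:  # avoid the (unlikely) collision
            pseudonym_id = secrets.token_hex(6).upper()
        key_table[pseudonym_id] = p["id"]
        lower = p["age"] // 5 * 5
        analysis_rows.append({
            "pseudonym_id": pseudonym_id,
            "age_group": f"{lower}-{lower + 5}",
            "region": p["region"],
            "diagnosis": p["diagnosis"],
        })
    return key_table, analysis_rows


patients = [{"id": 12345, "name": "Hans Müller", "address": "Hauptstr. 1",
             "age": 42, "region": "NRW", "diagnosis": "Diabetes"}]
key_table, rows = pseudonymize(patients)
print(rows)  # e.g. [{'pseudonym_id': '…', 'age_group': '40-45', 'region': 'NRW', 'diagnosis': 'Diabetes'}]
```

In a real system, the key table and the analysis table would live in different stores with different access rights.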

Legal effect (GDPR Recital 26):

  • Pseudonymized data IS still personal data (if a key exists)
  • However: Art. 32 GDPR lists pseudonymization as a protective measure (reduces risk)
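  • Art. 89 GDPR: Processing for archiving, research or statistical purposes requires safeguards such as pseudonymization. Union or Member State law may then provide derogations from certain data subject rights.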

Practical Application

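Application logging:

import hashlib
import hmac
import logging
import os

logger = logging.getLogger("auth")
# Instead of logging in plain text:
# logger.info("Login: user=hans.müller@company.de, ip=185.1.2.3")

# Pseudonymized logging with a keyed hash (HMAC-SHA256). The key is read from an
# environment variable (example name) and must never appear in logs or code.
SECRET_KEY = os.environ["LOG_PSEUDONYM_KEY"].encode()
def pseudonym(value: str, length: int = 12) -> str:
    return hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()[:length]
logger.info("Login: user=%s, ip=%s", pseudonym("hans.müller@company.de"), pseudonym("185.1.2.3", 8))
# Log analysis possible (same person = same hash), but no real names visible.
# Anyone holding SECRET_KEY can recompute the hashes: pseudonymization, not anonymization.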
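A small, self-contained experiment shows why the secret key matters. An unkeyed SHA-256 hash of an e-mail address can be reversed by hashing a list of candidate addresses and comparing. A keyed HMAC pseudonym resists this for anyone without the key. The candidate list and helper names are hypothetical. The risk is even greater for IPv4 addresses: there are only about 4.3 billion, so all of them can be hashed and compared.

```python
import hashlib
import hmac
import secrets


def plain_hash(value: str) -> str:
    return hashlib.sha256(value.encode()).hexdigest()


def keyed_pseudonym(value: str, key: bytes) -> str:
    return hmac.new(key, value.encode(), hashlib.sha256).hexdigest()


def reverse_by_guessing(target: str, candidates, hash_fn):
    """Hash every candidate value; return the one that matches the target, else None."""
    for candidate in candidates:
        if hash_fn(candidate) == target:
            return candidate
    return None


# Hypothetical staff list an attacker might already have
candidates = ["anna.schmidt@company.de", "hans.müller@company.de", "peter.weber@company.de"]

logged_plain = plain_hash("hans.müller@company.de")
print(reverse_by_guessing(logged_plain, candidates, plain_hash))  # hans.müller@company.de

key = secrets.token_bytes(32)
logged_keyed = keyed_pseudonym("hans.müller@company.de", key)
wrong_key = secrets.token_bytes(32)  # an attacker without the real key
print(reverse_by_guessing(logged_keyed, candidates, lambda v: keyed_pseudonym(v, wrong_key)))  # None
# The key holder can still re-link, which is why this remains pseudonymization
print(reverse_by_guessing(logged_keyed, candidates, lambda v: keyed_pseudonym(v, key)))  # hans.müller@company.de
```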

Analytics:

// Plausible Analytics (privacy-friendly):
// No cookies, no cross-site or cross-day tracking, no fingerprinting
// IP addresses are not stored; unique visitors are counted via a salted hash that rotates daily

// Google Analytics (problematic without consent):
// Client ID cookie, optional User-ID, cross-session tracking → personal reference
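The sketch below counts unique visitors per day without storing IP addresses. It is loosely modeled on the daily-rotating-salt approach Plausible describes in its public documentation, but it is not Plausible's code, and the class and method names are invented. While a day's salt exists, that day's visitor IDs are still linkable. The data is therefore pseudonymous until the salt is discarded.

```python
import hashlib
import hmac
import secrets
from datetime import date


class DailyVisitorCounter:
    """Count unique visitors per day; only salted hashes are kept, never IPs."""

    def __init__(self, site_domain: str):
        self.site_domain = site_domain
        self._day = None
        self._salt = b""
        self._visitor_ids = set()

    def _rotate_if_needed(self, today: date) -> None:
        if today != self._day:
            self._day = today
            self._salt = secrets.token_bytes(16)  # the old salt is discarded
            self._visitor_ids = set()             # yesterday's hashes are dropped too

    def record_visit(self, ip: str, user_agent: str, today: date) -> None:
        self._rotate_if_needed(today)
        data = f"{self.site_domain}|{ip}|{user_agent}".encode()
        visitor_id = hmac.new(self._salt, data, hashlib.sha256).hexdigest()
        self._visitor_ids.add(visitor_id)

    def unique_visitors(self) -> int:
        return len(self._visitor_ids)


counter = DailyVisitorCounter("example.com")
day = date(2024, 5, 1)
counter.record_visit("185.1.2.3", "Firefox", day)
counter.record_visit("185.1.2.3", "Firefox", day)  # same visitor, same day
counter.record_visit("185.1.2.4", "Firefox", day)
print(counter.unique_visitors())  # 2
```

Because the salt changes daily, the same visitor on the next day gets a new, unlinkable ID.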

GDPR Data Minimization vs. Anonymization

Art. 5(1)(c) GDPR (data minimization): collect only data that is adequate, relevant and limited to what is necessary for the purpose.

Preferred strategy:

  1. First, consider: Do we really need this data?
  2. If yes: Collect only necessary fields
  3. If data is to be kept for later analysis: anonymize it or, if re-linking is still needed, pseudonymize it
  4. Define retention periods and delete data automatically once they expire
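Steps 2 and 4 can be automated. The sketch below makes several illustrative assumptions: a hypothetical `login_log` table in SQLite, an allowlist of fields, and a 90-day retention period. In practice, `purge_expired` would run on a schedule, for example as a daily job.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=90)                               # hypothetical retention period
ALLOWED_FIELDS = ("user_pseudonym", "event", "created_at")  # step 2: only necessary fields


def minimize(event: dict) -> dict:
    """Keep only allowlisted fields; anything else (e.g. raw IP) is never stored."""
    return {k: event[k] for k in ALLOWED_FIELDS}


def store(conn: sqlite3.Connection, event: dict) -> None:
    conn.execute(
        "INSERT INTO login_log (user_pseudonym, event, created_at) "
        "VALUES (:user_pseudonym, :event, :created_at)",
        minimize(event),
    )


def purge_expired(conn: sqlite3.Connection, now: datetime) -> int:
    """Step 4: delete rows older than the retention period; return how many were deleted."""
    # Timestamps are stored as UTC ISO-8601 strings, so string comparison orders them correctly.
    cutoff = (now - RETENTION).isoformat()
    cur = conn.execute("DELETE FROM login_log WHERE created_at < ?", (cutoff,))
    conn.commit()
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE login_log (user_pseudonym TEXT, event TEXT, created_at TEXT)")
now = datetime(2024, 6, 1, tzinfo=timezone.utc)
store(conn, {"user_pseudonym": "a1b2c3", "event": "login", "ip": "185.1.2.3",
             "created_at": (now - timedelta(days=120)).isoformat()})
store(conn, {"user_pseudonym": "d4e5f6", "event": "login",
             "created_at": (now - timedelta(days=10)).isoformat()})
deleted = purge_expired(conn, now)
remaining = conn.execute("SELECT COUNT(*) FROM login_log").fetchone()[0]
print(deleted, remaining)  # 1 1
```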

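Pseudonymization can also permit longer storage. Under Art. 5(1)(e) GDPR, data processed solely for archiving, research or statistical purposes may be kept longer than would otherwise be allowed. This requires appropriate safeguards under Art. 89(1) GDPR, such as pseudonymization.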