Anonymization and Pseudonymization
Anonymization permanently removes the link between data and a person. Pseudonymization replaces identifiers with a pseudonym and can be reversed by anyone who holds the additional information (the key). Both are key techniques under the GDPR for implementing data protection by design.
Anonymization and pseudonymization are two different methods for reducing the personal nature of data, with fundamentally different legal consequences under the GDPR.
Anonymization: No longer personally identifiable
Anonymized data is no longer subject to the GDPR because it is no longer considered personal data (Recital 26).
True anonymization approach:
- Original: Name=Hans Müller, Age=42, Condition=Diabetes
- Anonymized: Age=42, Condition=Diabetes (only if the remaining attributes do not allow re-identification)
- But: if Age + ZIP code + Condition together uniquely identify the individual, the data is NOT anonymized!
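The uniqueness problem described above can be checked mechanically. The following is a minimal sketch, assuming a small list of hypothetical records with the fields `age`, `zip` and `condition` (the data and function names are illustrative, not from a real system). It counts how many rows share each combination of attributes and reports the rows that stand alone.

```python
from collections import Counter

# Hypothetical records; field names and values are assumptions for this sketch.
records = [
    {"age": 42, "zip": "50667", "condition": "Diabetes"},
    {"age": 42, "zip": "50667", "condition": "Asthma"},
    {"age": 42, "zip": "50668", "condition": "Diabetes"},
    {"age": 37, "zip": "50667", "condition": "Diabetes"},
]


def group_sizes(rows, attributes):
    """Count how many rows share each combination of attribute values."""
    return Counter(tuple(row[a] for a in attributes) for row in rows)


def unique_rows(rows, attributes):
    """Return the rows whose attribute combination occurs exactly once."""
    sizes = group_sizes(rows, attributes)
    return [row for row in rows if sizes[tuple(row[a] for a in attributes)] == 1]


# Age alone singles out only one person; the full combination singles out everyone.
by_age = unique_rows(records, ["age"])
by_all = unique_rows(records, ["age", "zip", "condition"])
print(len(by_age), len(by_all))
```

Any row returned by `unique_rows` is a candidate for re-identification by someone who already knows those attributes about a person.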
The Problem: Re-identification
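True anonymization is extremely difficult. Netflix published "anonymized" movie ratings, and researchers showed that knowing only a handful of a subscriber's ratings was enough to single out the large majority of records (figures such as 84% are cited for this uniqueness result). They demonstrated the attack by matching records to public IMDb profiles.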
Techniques for true anonymization:
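- k-anonymity: Each combination of quasi-identifiers (e.g., age group + ZIP code) is shared by at least k records in the dataset
- Differential Privacy: Calibrated statistical noise is added to query results so that the contribution of any single individual is hidden (Apple and Google use this)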
- Aggregation: Only sums/averages, no individual values
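The three techniques listed above can be illustrated with a short Python sketch. It assumes hypothetical records and helper names (`k_of`, `age_band`, `noisy_count`), generalizes age and ZIP code to raise k, publishes only per-condition counts, and adds Laplace noise to a counting query. It is a toy illustration, not a production-grade privacy mechanism.

```python
import random
from collections import Counter

# Hypothetical records; all names and values are assumptions for this sketch.
rows = [
    {"age": 41, "zip": "50667", "condition": "Diabetes"},
    {"age": 43, "zip": "50668", "condition": "Diabetes"},
    {"age": 44, "zip": "50670", "condition": "Asthma"},
    {"age": 47, "zip": "50999", "condition": "Diabetes"},
    {"age": 48, "zip": "50996", "condition": "Asthma"},
]


def k_of(data, quasi_identifiers):
    """k-anonymity: size of the smallest group sharing the same quasi-identifiers."""
    sizes = Counter(tuple(r[q] for q in quasi_identifiers) for r in data)
    return min(sizes.values())


def age_band(age, width=5):
    """Generalize an exact age to a band such as '40-44'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


# Generalization: coarser age and ZIP code values make groups larger.
generalized = [
    {"age": age_band(r["age"]), "zip": r["zip"][:3] + "**", "condition": r["condition"]}
    for r in rows
]
k_raw = k_of(rows, ["age", "zip"])
k_generalized = k_of(generalized, ["age", "zip"])

# Aggregation: publish only counts per condition, no individual rows.
counts = Counter(r["condition"] for r in rows)


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential samples."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def is_diabetic(row):
    return row["condition"] == "Diabetes"


def noisy_count(data, predicate, epsilon, rng):
    """Differential privacy: a counting query has sensitivity 1, so scale = 1/epsilon."""
    true_count = sum(1 for r in data if predicate(r))
    return true_count + laplace_noise(1 / epsilon, rng)


noisy = noisy_count(rows, is_diabetic, epsilon=1.0, rng=random.Random(0))
print(k_raw, k_generalized, dict(counts))
```

A smaller `epsilon` means more noise and stronger protection. Real deployments also need to track the privacy budget across repeated queries, which this sketch leaves out.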
Pseudonymization: GDPR Technique
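Pseudonymization replaces identifiers with pseudonyms. The data can be attributed to a person again only with additional information (the "key"), which must be kept separately and protected (Art. 4(5) GDPR).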
Example database:
-- Original
SELECT * FROM patients WHERE id = 12345;
-- id=12345, name="Hans Müller", address="Hauptstr. 1", diagnosis="Diabetes"
Pseudonym table (separate, access-controlled): pseudonym_id=ABC123 ↔ patient_id=12345
Analysis table (for researchers): pseudonym_id=ABC123, age_group=40-45, region="NRW", diagnosis="Diabetes"
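The split into a pseudonym table and an analysis table could be sketched in Python as follows. The patient record reuses the example values from this article; the `age` field, the function names and the use of random tokens as pseudonyms are assumptions made for illustration.

```python
import secrets


def age_group(age, width=5):
    """Half-open band [low, low + width), written like the 40-45 example."""
    low = (age // width) * width
    return f"{low}-{low + width}"


def pseudonymize(patients):
    """Split records into a key table and an analysis table without direct identifiers."""
    key_table = {}       # pseudonym_id -> patient_id (store separately, access-controlled)
    analysis_rows = []   # what researchers get to see
    for p in patients:
        # Random token: the pseudonym cannot be recomputed from the patient data.
        pseudonym_id = secrets.token_hex(8)
        key_table[pseudonym_id] = p["id"]
        analysis_rows.append({
            "pseudonym_id": pseudonym_id,
            "age_group": age_group(p["age"]),
            "region": p["region"],
            "diagnosis": p["diagnosis"],
        })
    return key_table, analysis_rows


patients = [{
    "id": 12345, "name": "Hans Müller", "address": "Hauptstr. 1",
    "age": 42, "region": "NRW", "diagnosis": "Diabetes",
}]
key_table, analysis_rows = pseudonymize(patients)

# Re-identification is only possible for someone who holds the key table.
row = analysis_rows[0]
original_id = key_table[row["pseudonym_id"]]
```

Deleting the key table removes the direct way back to the person, but the analysis rows may still be re-identifiable through their remaining attributes.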
Legal effect (GDPR Recital 26):
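- Pseudonymized data IS still personal data (as long as the additional information needed for re-identification exists)
- However: Art. 32(1)(a) GDPR names pseudonymization as a security measure, and Art. 25 names it as a measure for data protection by design (it reduces risk)
- Art. 89 GDPR: Processing for research/statistics requires safeguards, which may include pseudonymization, and allows certain derogations from data subject rights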
Practical Application
Application logging:
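# Instead of logging identifiers in plain text
import hashlib
import logging
import os
logger = logging.getLogger(__name__)
SECRET_SALT = os.environ["SECRET_SALT"]  # secret value, stored outside the log system
logger.info("Login: user=hans.müller@company.de, ip=185.1.2.3")
# Pseudonymized logging: hash each identifier together with the secret salt
def pseudonym(value: str, length: int) -> str:
    return hashlib.sha256(f"{value}{SECRET_SALT}".encode()).hexdigest()[:length]
user_hash = pseudonym("hans.müller@company.de", 12)
ip_hash = pseudonym("185.1.2.3", 8)
logger.info("Login: user=%s, ip=%s", user_hash, ip_hash)
# Log analysis remains possible (same input = same hash), but no real names are visible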
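The logging example above derives pseudonyms by hashing the value together with a secret salt. A commonly used alternative is a keyed hash (HMAC), sketched below with Python's standard `hmac` module. The function name, the demo keys and the 16-character truncation are assumptions for illustration; a real key would come from a secrets store, not from source code.

```python
import hashlib
import hmac


def pseudonymize(value: str, key: bytes, length: int = 16) -> str:
    """Derive a stable pseudonym from a value using HMAC-SHA256 with a secret key."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]


# Demo keys only; hypothetical values for this sketch.
demo_key = b"demo-key-do-not-use-in-production"
other_key = b"another-demo-key"

first = pseudonymize("hans.müller@company.de", demo_key)
second = pseudonymize("hans.müller@company.de", demo_key)
rotated = pseudonymize("hans.müller@company.de", other_key)

# Same key: same pseudonym, so log entries can still be correlated.
# Different key: different pseudonym, so rotating the key breaks linkability.
print(first == second, first == rotated)
```

Without the key, the small input space of IPv4 addresses or known e-mail addresses cannot simply be hashed by brute force to reverse the pseudonyms. With the key it can, which is why the result remains pseudonymized rather than anonymized.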
Analytics:
- Plausible Analytics (privacy-friendly): no session-based tracking and no personal reference; IP addresses are not stored and no fingerprinting is used
- Google Analytics (problematic without consent): user ID and cross-session tracking create a personal reference
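One general way to count visitors without storing IP addresses or linking visits across days is to hash the request attributes with a salt that is discarded daily. The sketch below illustrates that idea only. The class and method names are hypothetical, and it is not the implementation of any particular analytics product.

```python
import hashlib
import secrets


class DailyVisitorCounter:
    """Counts distinct visitors per day without keeping IP addresses."""

    def __init__(self):
        self._salt = secrets.token_bytes(16)  # lives in memory only
        self._seen = set()

    def record(self, ip: str, user_agent: str) -> None:
        """Store only a salted hash of the request attributes, never the raw IP."""
        data = self._salt + ip.encode("utf-8") + b"|" + user_agent.encode("utf-8")
        self._seen.add(hashlib.sha256(data).hexdigest())

    def rotate(self) -> int:
        """Call once per day: return the count, then discard the salt and all hashes."""
        count = len(self._seen)
        self._salt = secrets.token_bytes(16)
        self._seen.clear()
        return count


counter = DailyVisitorCounter()
counter.record("185.1.2.3", "Firefox")
counter.record("185.1.2.3", "Firefox")   # same visitor, counted once
counter.record("185.9.9.9", "Safari")
visitors_today = counter.rotate()
visitors_after_rotation = counter.rotate()
```

Because the salt is thrown away at rotation, hashes from different days cannot be compared, so no cross-day profile of a visitor can be built from them.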
GDPR Data Minimization vs. Anonymization
Art. 5(1)(c) GDPR: Data minimization. Personal data must be limited to what is necessary for the purpose for which it is processed.
Preferred strategy:
- First, consider: Do we really need this data?
- If yes: Collect only necessary fields
- If data must be kept for later analysis: Anonymize or pseudonymize it
- Define retention periods and delete data automatically when they expire
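The strategy above (collect only necessary fields, define retention periods, delete automatically) could look like this in code. It is a minimal sketch: the categories, retention periods and field names are hypothetical, and real retention periods depend on the purpose and legal basis of the processing.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical policy values for this sketch.
ALLOWED_FIELDS = {"id", "category", "created_at", "event"}
RETENTION = {
    "access_log": timedelta(days=90),
    "analytics": timedelta(days=365),
}


def minimize(record):
    """Keep only the fields that are actually needed."""
    return {k: v for k, v in record.items() if k in ALLOWED_FIELDS}


def purge_expired(records, now):
    """Return only the records that are still within their retention period."""
    return [r for r in records if now - r["created_at"] <= RETENTION[r["category"]]]


utc = timezone.utc
now = datetime(2024, 6, 1, tzinfo=utc)
stored = [
    minimize({"id": 1, "category": "access_log", "event": "login",
              "created_at": datetime(2024, 5, 1, tzinfo=utc), "email": "hans.müller@company.de"}),
    minimize({"id": 2, "category": "access_log", "event": "login",
              "created_at": datetime(2024, 1, 1, tzinfo=utc)}),
    minimize({"id": 3, "category": "analytics", "event": "pageview",
              "created_at": datetime(2023, 9, 1, tzinfo=utc)}),
]
kept = purge_expired(stored, now)
kept_ids = [r["id"] for r in kept]
```

In practice a job like `purge_expired` would run on a schedule against the actual data store, so deletion does not depend on someone remembering to do it.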
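Pseudonymization does not make processing lawful by itself, but as a recognized safeguard it can support analyzing data for longer than would be permitted without protective measures, for example for research or statistical purposes under Art. 5(1)(e) in conjunction with Art. 89(1) GDPR.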