Anonymization and Pseudonymization - Definition & Explanation

Anonymization and pseudonymization are two different methods for reducing the personal nature of data—with fundamentally different legal consequences under the GDPR.

Anonymization: No longer personally identifiable

Anonymized data is no longer subject to the GDPR—it is no longer considered personal data.

True anonymization approach:

Original: Name=Hans Müller, Age=42, Condition=Diabetes
Anonymized: Age=42, Condition=Diabetes (if age alone does not allow re-identification)
But: If Age+ZIP Code+Condition together uniquely identify the individual → NOT anonymized!
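
The warning above can be checked mechanically. The following is a minimal sketch, assuming a small in-memory list of records with hypothetical column names (`age`, `zip`, `condition`); it reports the size of the smallest group of rows that share the same values for a chosen set of columns. A result of 1 means at least one person is uniquely identified by that combination.

```python
from collections import Counter

# Hypothetical records after the name column has been dropped.
records = [
    {"age": 42, "zip": "45127", "condition": "Diabetes"},
    {"age": 42, "zip": "45130", "condition": "Asthma"},
    {"age": 57, "zip": "45127", "condition": "Diabetes"},
    {"age": 57, "zip": "45127", "condition": "Asthma"},
]

def smallest_group(rows, quasi_identifiers):
    """Return the size of the smallest group of rows that share the same
    values for the given quasi-identifier columns."""
    groups = Counter(tuple(row[col] for col in quasi_identifiers) for row in rows)
    return min(groups.values())

# Age alone: every age occurs twice, so nobody is unique on age.
print(smallest_group(records, ["age"]))
# Age plus ZIP code: the two 42-year-olds live in different ZIP codes,
# so each of them is unique on this combination.
print(smallest_group(records, ["age", "zip"]))
```

A check like this only covers the columns that are actually in the dataset. It says nothing about external data an attacker might join against.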

The Problem: Re-identification

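True anonymization is extremely difficult. A well-known example: Netflix published "anonymized" movie ratings. Researchers then showed that knowing only a handful of a subscriber's ratings was enough to single out that subscriber's record for the large majority of users, and they linked some records to public IMDb profiles.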

Techniques for true anonymization:

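k-anonymity: Each combination of quasi-identifiers (e.g., age and ZIP code) is shared by at least k records in the dataset
Differential Privacy: Calibrated statistical noise is added to results so that no individual's contribution can be inferred (Apple and Google use this)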
Aggregation: Only sums/averages, no individual values
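
To illustrate the differential privacy idea, here is a minimal sketch of a counting query with Laplace noise. The function names, the example data, and the value of `epsilon` are assumptions chosen for illustration. A production system would rely on a vetted library, because naive floating-point sampling like this can leak information.

```python
import random

def laplace_noise(scale, rng):
    # The difference of two independent exponential samples with mean
    # `scale` follows a Laplace distribution centered on 0.
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def noisy_count(values, predicate, epsilon, rng):
    """Count matching values and add Laplace noise.

    Adding or removing one person changes a count by at most 1
    (sensitivity 1), so the noise scale is 1 / epsilon.
    """
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1 / epsilon, rng)

rng = random.Random(0)  # fixed seed only to make the sketch reproducible
diagnoses = ["Diabetes", "Asthma", "Diabetes", "Diabetes", "Asthma"]
released = noisy_count(diagnoses, lambda d: d == "Diabetes", epsilon=0.5, rng=rng)
print(round(released, 2))  # a noisy value near the true count of 3
```

A smaller `epsilon` means more noise and stronger protection, at the cost of less accurate results.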

Pseudonymization: GDPR Technique

Pseudonymization replaces identifiers with pseudonyms. Re-identification remains possible for anyone who holds the "key".

Example database:

-- Original
SELECT * FROM patients WHERE id = 12345;
-- id=12345, name="Hans Müller", address="Hauptstr. 1", diagnosis="Diabetes"

Pseudonym table (separate, access-controlled): pseudonym_id=ABC123 ↔ patient_id=12345

Analysis table (for researchers): pseudonym_id=ABC123, age_group=40-45, region="NRW", diagnosis="Diabetes"
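
The split into a key table and an analysis table could be sketched as follows. This is an illustration under assumptions, not a reference implementation: the `age` and `region` input fields, the 5-year age bands, and the function name are hypothetical, and the key table is a plain dictionary standing in for a separate, access-controlled store.

```python
import secrets

def pseudonymize(patients):
    """Split patient rows into a key table and an analysis table."""
    key_table = {}      # pseudonym_id -> patient_id; store separately, restrict access
    analysis_rows = []  # the only data researchers get to see
    for p in patients:
        # Random pseudonym, not derived from any patient attribute.
        pseudonym_id = secrets.token_hex(8)
        key_table[pseudonym_id] = p["id"]
        low = (p["age"] // 5) * 5  # generalize the exact age to a band
        analysis_rows.append({
            "pseudonym_id": pseudonym_id,
            "age_group": f"{low}-{low + 5}",
            "region": p["region"],
            "diagnosis": p["diagnosis"],
        })
    return key_table, analysis_rows

patients = [{
    "id": 12345, "name": "Hans Müller", "address": "Hauptstr. 1",
    "age": 42, "region": "NRW", "diagnosis": "Diabetes",
}]
key_table, analysis_rows = pseudonymize(patients)
print(analysis_rows[0]["age_group"])  # 40-45
```

Name and address never reach the analysis table. As long as the key table exists, however, the analysis rows remain personal data.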

Legal effect (GDPR Recital 26):

Pseudonymized data IS still personal data (if a key exists)
However: Art. 32 GDPR lists pseudonymization as a protective measure (reduces risk)
Art. 89 GDPR: Exemptions for research/statistics using pseudonymized data

Practical Application

Application logging:

import hashlib
import logging

logger = logging.getLogger(__name__)
SECRET_SALT = "load-from-secret-store"  # placeholder; never hard-code or log the real value

# Instead of logging in plain text
logger.info("Login: user=hans.müller@company.de, ip=185.1.2.3")

# Pseudonymized logging
user_hash = hashlib.sha256(f"hans.müller@company.de{SECRET_SALT}".encode()).hexdigest()[:12]
ip_hash = hashlib.sha256(f"185.1.2.3{SECRET_SALT}".encode()).hexdigest()[:8]
logger.info(f"Login: user={user_hash}, ip={ip_hash}")
# Log analysis possible (same person = same hash), but no real names visible
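
A keyed HMAC is a commonly used alternative to appending a salt to the value before hashing. The sketch below uses Python's standard `hmac` module. The function name, the key value, and the truncation length are assumptions for illustration. In practice the key would come from a secret store and be rotated according to policy.

```python
import hashlib
import hmac

def keyed_pseudonym(value: str, key: bytes, length: int = 16) -> str:
    """Return a truncated HMAC-SHA256 of `value` under `key`.

    The same value and key always give the same pseudonym, so log
    entries stay linkable. Without the key, guessing e-mail addresses
    or IPs and comparing hashes does not work.
    """
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]

key = b"example-key-do-not-use"  # hypothetical; load from a secret store
user_token = keyed_pseudonym("hans.müller@company.de", key)
ip_token = keyed_pseudonym("185.1.2.3", key)
print(len(user_token), len(ip_token))  # 16 16
```

Anyone who holds the key can recompute pseudonyms for guessed inputs, so the output is still pseudonymized data, not anonymized data.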

Analytics:

// Plausible Analytics (privacy-friendly):
// No tracking via sessions, no personal reference
// IP is not stored, no fingerprinting

// Google Analytics (problematic without consent):
// User ID, cross-session tracking → personal reference

GDPR Data Minimization vs. Anonymization

Art. 5(1)(c) GDPR: Data minimization – collect only what is strictly necessary.

Preferred strategy:

First, consider: Do we really need this data?
If yes: Collect only necessary fields
If data must be kept for later purposes: Anonymize or pseudonymize it
Define retention periods and delete data automatically once they expire
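
The steps above can be sketched in a few lines. The field whitelist, the 90-day retention period, and the record layout are hypothetical values chosen for illustration. Real retention periods depend on the purpose and the applicable legal basis.

```python
from datetime import datetime, timedelta, timezone

ALLOWED_FIELDS = {"event", "created_at"}  # hypothetical whitelist of necessary fields
RETENTION = timedelta(days=90)            # hypothetical retention period

def minimize(record):
    """Keep only the fields that are actually needed."""
    return {k: v for k, v in record.items() if k in ALLOWED_FIELDS}

def purge_expired(records, now):
    """Drop every record that is older than the retention period."""
    cutoff = now - RETENTION
    return [r for r in records if r["created_at"] >= cutoff]

now = datetime(2024, 6, 1, tzinfo=timezone.utc)
raw = [
    {"event": "login", "email": "hans.müller@company.de",
     "created_at": datetime(2024, 5, 1, tzinfo=timezone.utc)},
    {"event": "login", "email": "hans.müller@company.de",
     "created_at": datetime(2024, 1, 1, tzinfo=timezone.utc)},
]
stored = purge_expired([minimize(r) for r in raw], now)
print(len(stored))  # 1: the January record is past the 90-day limit
```

In a real system the purge would typically run as a scheduled job against the data store rather than on an in-memory list.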

Pseudonymization is a safeguard recognized by the GDPR. It can help justify analyzing data for longer than would be permitted without protective measures, but the data remains personal data and still requires a legal basis.
