v1.5.0
Performance

Benchmark Results

Precision, recall, F1, and throughput for the SPG heuristic and full (heuristic + ML) pipelines, measured against a labeled synthetic dataset of 128 ground-truth PII samples across 9 types.

Headline Numbers
Configuration Comparison
Configuration | Precision | Recall | F1 Score | Throughput (MB/s) | Heap (MB)
Comparison baseline: Microsoft Presidio. The Python library Microsoft Presidio is the de facto open-source reference for PII detection. It is measured separately (in a Python environment, on the same synthetic dataset exported as JSON) because it cannot be invoked directly from the JVM test suite. Indicative numbers from the project README: Presidio achieves macro F1 ≈ 0.82 on English text at ~18 MB/s (single-threaded, spaCy lg model). SPG Heuristic+ML targets macro F1 ≥ 0.93 at ≥ 200 MB/s on the JVM. These are targets rather than measured results; if they are met, the throughput advantage is roughly 11× (200 / 18 ≈ 11.1) at comparable or better accuracy.
Methodology

Results are produced by BenchmarkResult.compute() in the test suite. Each configuration is run against a synthetic labeled dataset of 128 samples spread across 9 PII types (SSN, EMAIL, PHONE, CREDIT_CARD, API_KEY, PASSWORD, IP_ADDRESS, BANK_ACCOUNT, DATE_OF_BIRTH) plus 20 negative / clean samples. Matching is overlap-based: a detection counts as a true positive when its character span overlaps the labeled span and the PII type matches. Macro precision / recall / F1 are averaged uniformly across types that appear in the reference data.
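The matching and averaging rules above can be illustrated with a minimal sketch. It is not the project's BenchmarkResult.compute(); the class OverlapScoringSketch, the Span record, and macroScores are hypothetical names introduced here. It also assumes half-open character spans and one-to-one greedy matching (each labeled span can be claimed by at most one detection), neither of which the description above specifies.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class OverlapScoringSketch {

    /** Hypothetical typed span over character offsets; end is exclusive (assumption). */
    record Span(String type, int start, int end) {
        /** True when the PII types match and the two spans share at least one character. */
        boolean overlaps(Span other) {
            return type.equals(other.type) && start < other.end && other.start < end;
        }
    }

    private static double ratio(int numerator, int denominator) {
        return denominator == 0 ? 0.0 : (double) numerator / denominator;
    }

    /** Returns {macro precision, macro recall, macro F1} over types present in gold. */
    static double[] macroScores(List<Span> gold, List<Span> predicted) {
        Map<String, int[]> counts = new TreeMap<>(); // type -> {tp, fp, fn}
        for (Span g : gold) {
            counts.putIfAbsent(g.type(), new int[3]);
        }
        // Snapshot the reference types before predictions can add new ones.
        Set<String> goldTypes = new TreeSet<>(counts.keySet());

        boolean[] matched = new boolean[gold.size()];
        for (Span p : predicted) {
            int hit = -1;
            for (int i = 0; i < gold.size(); i++) {
                if (!matched[i] && p.overlaps(gold.get(i))) {
                    hit = i;
                    break;
                }
            }
            if (hit >= 0) {
                matched[hit] = true;                 // assumption: one-to-one matching
                counts.get(p.type())[0]++;           // true positive
            } else {
                counts.computeIfAbsent(p.type(), t -> new int[3])[1]++; // false positive
            }
        }
        for (int i = 0; i < gold.size(); i++) {
            if (!matched[i]) {
                counts.get(gold.get(i).type())[2]++; // false negative
            }
        }

        double pSum = 0, rSum = 0, fSum = 0;
        for (String type : goldTypes) {
            int[] c = counts.get(type);
            double precision = ratio(c[0], c[0] + c[1]);
            double recall = ratio(c[0], c[0] + c[2]);
            pSum += precision;
            rSum += recall;
            fSum += (precision + recall == 0) ? 0 : 2 * precision * recall / (precision + recall);
        }
        int n = Math.max(1, goldTypes.size());
        return new double[] { pSum / n, rSum / n, fSum / n };
    }

    public static void main(String[] args) {
        List<Span> gold = List.of(
                new Span("EMAIL", 10, 25), new Span("SSN", 40, 51), new Span("PHONE", 60, 72));
        List<Span> predicted = List.of(
                new Span("EMAIL", 12, 25),  // partial overlap, same type: true positive
                new Span("SSN", 80, 91),    // no overlap: false positive, gold SSN is missed
                new Span("PHONE", 60, 72)); // exact match: true positive
        double[] m = macroScores(gold, predicted);
        System.out.printf("P=%.3f R=%.3f F1=%.3f%n", m[0], m[1], m[2]);
    }
}
```

In this toy input, EMAIL and PHONE each score 1.0 and SSN scores 0.0, so all three macro figures come out to 2/3. Because the average is uniform across types, one badly detected type pulls the macro score down regardless of how many samples it has.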

Throughput assumes 2 bytes per Java char (UTF-16). Heap delta is measured via MemoryMXBean.getHeapMemoryUsage() before and after the full dataset pass, after a forced GC cycle.
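As a rough illustration of these two measurements, the sketch below computes throughput from the UTF-16 size of the input and takes heap readings through MemoryMXBean. The class ThroughputHeapSketch, its methods, and the Consumer-based detector parameter are hypothetical stand-ins, not the project's code. It assumes 1 MB = 1024 × 1024 bytes and that a GC is requested before both heap readings; the description above does not pin down either point.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.util.List;
import java.util.function.Consumer;

public class ThroughputHeapSketch {

    // Assumption: "MB" means 1024 * 1024 bytes.
    static final double BYTES_PER_MB = 1024.0 * 1024.0;

    /** Hypothetical result holder for one benchmark pass. */
    record Measurement(double throughputMBps, double heapDeltaMB) {}

    /** Input size in bytes, counting 2 bytes per Java char (UTF-16 code unit). */
    static long utf16Bytes(List<String> texts) {
        long chars = 0;
        for (String t : texts) {
            chars += t.length();
        }
        return chars * 2L;
    }

    static double throughputMBps(long bytes, long elapsedNanos) {
        double seconds = Math.max(1L, elapsedNanos) / 1e9; // guard against a zero interval
        return (bytes / BYTES_PER_MB) / seconds;
    }

    /** Requests a GC, then reads used heap. The JVM treats the request as a hint. */
    static long usedHeapAfterGc(MemoryMXBean bean) {
        bean.gc();
        return bean.getHeapMemoryUsage().getUsed();
    }

    /** Runs the detector over every text once and reports throughput and heap delta. */
    static Measurement measure(List<String> texts, Consumer<String> detector) {
        MemoryMXBean bean = ManagementFactory.getMemoryMXBean();
        long heapBefore = usedHeapAfterGc(bean);

        long start = System.nanoTime();
        for (String t : texts) {
            detector.accept(t);
        }
        long elapsed = System.nanoTime() - start;

        long heapAfter = usedHeapAfterGc(bean);
        return new Measurement(
                throughputMBps(utf16Bytes(texts), elapsed),
                (heapAfter - heapBefore) / BYTES_PER_MB);
    }

    public static void main(String[] args) {
        List<String> texts = List.of("Contact: jane@example.com", "SSN 123-45-6789");
        // Placeholder detector: a real run would call the pipeline under test here.
        Measurement m = measure(texts, text -> text.chars().filter(Character::isDigit).count());
        System.out.printf("throughput=%.2f MB/s heapDelta=%.3f MB%n",
                m.throughputMBps(), m.heapDeltaMB());
    }
}
```

A single untimed pass like this is noisy on the JVM: JIT warm-up and GC timing can dominate small inputs, and the heap delta can be negative if the collector reclaims unrelated garbage between readings.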

Run It Yourself
# Clone the repository
git clone https://github.com/Sushegaad/Semantic-Privacy-Guard.git
cd Semantic-Privacy-Guard

# Run the full benchmark suite (requires Java 17+ and Maven 3.8+)
mvn test -P benchmark

# Results are printed to stdout AND written to:
#   docs/benchmark-results.json
# Refresh this page after running to see live numbers

The benchmark Maven profile targets **/*BenchmarkTest.java. No extra model files or environment variables are required for the heuristic and ML configurations.

Links