● Build passing · v1.5.0
Open Source · Apache 2.0 · Java 17+

Open-source Privacy Firewall for AI Applications

Prevent sensitive data from reaching LLMs — without breaking prompts.

Three-layer detection (Regex, Naive Bayes ML, and Apache OpenNLP NER) runs entirely on your own infrastructure, with zero cloud calls.

SSN · Credit Card · Email · Phone · IPv4 / IPv6 · API Keys · Passwords · Names (ML+NLP) · Orgs (ML+NLP) · IBAN / Bank · DOB · Custom Patterns

10+ PII Types · 97% Macro Precision · 390+ MB/s Throughput · 0 Cloud Dependencies

🔬 Live Playground

Paste any text containing PII and click Redact (or press Ctrl+Enter). Each detected value is replaced with a structured token such as [EMAIL_1] or [SSN_1], and a reverse map records the original values so they can be restored later (de-tokenisation).
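
A minimal Java sketch of the token round trip, assuming the detectors have already produced sorted, non-overlapping spans. The class `TokenRoundTrip`, its `Span` record, and the `tokenize`/`detokenize` methods are hypothetical names for illustration, not the library's API; numbering tokens per PII type and reusing a token when the same value repeats are assumptions about how the tokens are built.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: names and token-numbering rules are assumptions, not the project's PIITokenizer API.
final class TokenRoundTrip {
    record Span(int start, int end, String type) {}

    private final Map<String, Integer> counters = new HashMap<>();        // PII type -> last number issued
    private final Map<String, String> valueToToken = new HashMap<>();     // "TYPE|value" -> token
    private final Map<String, String> reverseMap = new LinkedHashMap<>(); // token -> original value

    /** Replaces each span (sorted, non-overlapping) with a token like [EMAIL_1]; repeated values reuse their token. */
    String tokenize(String text, List<Span> spans) {
        StringBuilder out = new StringBuilder();
        int cursor = 0;
        for (Span s : spans) {
            String value = text.substring(s.start(), s.end());
            String token = valueToToken.computeIfAbsent(s.type() + "|" + value, k -> {
                int n = counters.merge(s.type(), 1, Integer::sum); // next number for this type
                String t = "[" + s.type() + "_" + n + "]";
                reverseMap.put(t, value);                          // remember how to undo it
                return t;
            });
            out.append(text, cursor, s.start()).append(token);    // copy text before the span, then the token
            cursor = s.end();
        }
        return out.append(text.substring(cursor)).toString();
    }

    /** Restores original values in the LLM's reply using the reverse map. */
    String detokenize(String reply) {
        String result = reply;
        for (Map.Entry<String, String> e : reverseMap.entrySet()) {
            result = result.replace(e.getKey(), e.getValue());
        }
        return result;
    }

    /** Read-only copy of the token -> original-value map. */
    Map<String, String> reverseMap() {
        return Map.copyOf(reverseMap);
    }
}
```

Reusing the token for a repeated value keeps the prompt internally consistent: the LLM sees one [EMAIL_1] referenced twice rather than two unrelated placeholders, and the closing bracket prevents [EMAIL_1] from matching inside [EMAIL_10] during de-tokenisation.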

Under the Hood

Three-Layer Detection Pipeline

Each layer catches what the others miss. Matches are de-duplicated and merged into a single ranked result.

📄 Input: Raw Text
Any string: LLM prompt, email body, support ticket, API log line…

🔤 Layer 1: HeuristicDetector
10 pre-compiled regexes, with Luhn validation for card numbers, NANP rules for phone numbers, and Shannon entropy for API keys.
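
The Layer 1 validators can be sketched in plain Java. `HeuristicChecks` and its methods are hypothetical; the card-length bound (13 to 19 digits), the simplified NANP digit rules, and the API-key thresholds (at least 20 characters and at least 3.5 bits of entropy per character) are assumptions chosen for illustration, not the project's actual values. Entropy is computed as H = -Σ p(c) log2 p(c) over the character frequencies of the candidate string.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: not the project's HeuristicDetector; thresholds are illustrative assumptions.
final class HeuristicChecks {
    private HeuristicChecks() {}

    /** Luhn (mod 10) checksum; spaces and dashes are ignored. */
    static boolean luhnValid(String candidate) {
        String digits = candidate.replaceAll("[ -]", "");
        if (!digits.matches("\\d{13,19}")) {
            return false; // assume card numbers are 13 to 19 digits
        }
        int sum = 0;
        boolean doubleIt = false;
        for (int i = digits.length() - 1; i >= 0; i--) { // walk right to left
            int d = digits.charAt(i) - '0';
            if (doubleIt) {
                d *= 2;
                if (d > 9) d -= 9; // same as summing the two digits of d
            }
            sum += d;
            doubleIt = !doubleIt; // double every second digit
        }
        return sum % 10 == 0;
    }

    /** Simplified NANP check: 10 digits (optional leading 1); area code and exchange must start with 2-9. */
    static boolean nanpValid(String phone) {
        String digits = phone.replaceAll("\\D", "");
        if (digits.length() == 11 && digits.startsWith("1")) {
            digits = digits.substring(1); // drop the +1 country code
        }
        return digits.matches("[2-9]\\d{2}[2-9]\\d{6}");
    }

    /** Shannon entropy in bits per character. */
    static double shannonEntropy(String s) {
        if (s.isEmpty()) return 0.0;
        Map<Character, Integer> counts = new HashMap<>();
        for (char c : s.toCharArray()) counts.merge(c, 1, Integer::sum);
        double h = 0.0;
        for (int n : counts.values()) {
            double p = (double) n / s.length();
            h -= p * (Math.log(p) / Math.log(2));
        }
        return h;
    }

    /** Flags long, high-entropy tokens as likely secrets (thresholds are assumptions). */
    static boolean looksLikeApiKey(String token) {
        return token.length() >= 20 && shannonEntropy(token) >= 3.5;
    }
}
```

In a regex-first design like this, the checks run only on strings the patterns already matched, so a 16-digit order number that fails Luhn, or a long but repetitive identifier with low entropy, is discarded instead of redacted.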

🧠 Layer 2: MLDetector
Multinomial Naive Bayes. Classifies title-cased tokens using bag-of-words context features.
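
A minimal multinomial Naive Bayes classifier over bag-of-words context features, sketched under assumptions: `NaiveBayesSketch`, the labels (`PERSON`, `ORG`), and the choice of the lower-cased words surrounding a title-cased token as features are illustrative, not MLDetector's actual feature set or training data. Scoring uses log probabilities with Laplace (add-one) smoothing so that a single unseen word cannot zero out a class.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: not the project's MLDetector; labels and features are illustrative.
final class NaiveBayesSketch {
    private final Map<String, Integer> docCounts = new HashMap<>();               // label -> training examples
    private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>(); // label -> word -> count
    private final Map<String, Integer> totalWords = new HashMap<>();              // label -> total words seen
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    /** Adds one example: the context words around a title-cased token, plus its label. */
    void train(List<String> contextWords, String label) {
        totalDocs++;
        docCounts.merge(label, 1, Integer::sum);
        Map<String, Integer> counts = wordCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String w : contextWords) {
            String word = w.toLowerCase();
            counts.merge(word, 1, Integer::sum);
            totalWords.merge(label, 1, Integer::sum);
            vocabulary.add(word);
        }
    }

    /** Returns the label maximising log P(label) + sum of log P(word | label), add-one smoothed. */
    String classify(List<String> contextWords) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        int v = vocabulary.size();
        for (String label : docCounts.keySet()) {
            double score = Math.log((double) docCounts.get(label) / totalDocs); // class prior
            Map<String, Integer> counts = wordCounts.get(label);
            int total = totalWords.getOrDefault(label, 0);
            for (String w : contextWords) {
                int c = counts.getOrDefault(w.toLowerCase(), 0);
                score += Math.log((c + 1.0) / (total + v)); // Laplace-smoothed likelihood
            }
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }
}
```

Summing logarithms instead of multiplying probabilities avoids floating-point underflow when a token has many context words.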

🔭 Layer 3: NLPDetector
Apache OpenNLP MaxEnt NER. Handles compound names, hyphens, and varied syntactic positions.

⚖️ Merge: CompositeDetector
De-duplicates overlapping matches. When more than one layer detects the same span, the match is promoted to HYBRID with elevated confidence.
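
One way to implement the merge step, sketched under stated assumptions: the `Detection` record, the `HYBRID_BOOST` of 0.1, and the greedy rule for partial overlaps (higher confidence first, then the longer span) are hypothetical choices, not CompositeDetector's documented behaviour.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the merge step; the boost value and overlap rule are assumptions.
record Detection(int start, int end, String type, String source, double confidence) {
    int length() { return end - start; }
    boolean overlaps(Detection o) { return start < o.end && o.start < end; }
}

final class MergeSketch {
    static final double HYBRID_BOOST = 0.1; // assumed confidence bonus when layers agree

    static List<Detection> merge(List<Detection> all) {
        // 1. Group detections of the exact same span; agreement between layers becomes HYBRID.
        Map<String, List<Detection>> bySpan = new LinkedHashMap<>();
        for (Detection d : all) {
            bySpan.computeIfAbsent(d.start() + ":" + d.end(), k -> new ArrayList<>()).add(d);
        }
        List<Detection> collapsed = new ArrayList<>();
        for (List<Detection> group : bySpan.values()) {
            Detection best = group.stream().max(Comparator.comparingDouble(Detection::confidence)).orElseThrow();
            long layers = group.stream().map(Detection::source).distinct().count();
            if (layers > 1) {
                collapsed.add(new Detection(best.start(), best.end(), best.type(), "HYBRID",
                        Math.min(1.0, best.confidence() + HYBRID_BOOST)));
            } else {
                collapsed.add(best);
            }
        }
        // 2. Resolve partial overlaps greedily: higher confidence wins, ties go to the longer span.
        collapsed.sort(Comparator.comparingDouble(Detection::confidence).reversed()
                .thenComparing(Comparator.comparingInt(Detection::length).reversed()));
        List<Detection> kept = new ArrayList<>();
        for (Detection d : collapsed) {
            if (kept.stream().noneMatch(d::overlaps)) kept.add(d);
        }
        // 3. Return in text order, ready for left-to-right replacement.
        kept.sort(Comparator.comparingInt(Detection::start));
        return kept;
    }
}
```

Returning the survivors in text order (rather than by confidence) is a convenience for the tokenizer, which can then rewrite the string in a single left-to-right pass.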

Output: PIITokenizer
TOKEN / MASK / BLANK modes. Builds a reverse map for de-tokenisation after the LLM responds.
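
The three output modes can be illustrated with a single switch expression. This is a sketch: `RedactionModes` is a hypothetical name, and the concrete MASK output (one `*` per character) and BLANK output (empty string) are assumptions about what those modes produce. Reverse-map bookkeeping is left out to keep the focus on the modes; only TOKEN output is designed to be reversible.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: the exact MASK and BLANK formats are assumptions, not documented behaviour.
final class RedactionModes {
    enum Mode { TOKEN, MASK, BLANK }

    record Span(int start, int end, String type) {}

    /** Rewrites sorted, non-overlapping spans according to the chosen mode. */
    static String redact(String text, List<Span> spans, Mode mode) {
        StringBuilder out = new StringBuilder();
        Map<String, Integer> counters = new HashMap<>(); // per-type numbering for TOKEN mode
        int cursor = 0;
        for (Span s : spans) {
            String value = text.substring(s.start(), s.end());
            String replacement = switch (mode) {
                case TOKEN -> "[" + s.type() + "_" + counters.merge(s.type(), 1, Integer::sum) + "]";
                case MASK -> "*".repeat(value.length()); // hides content, keeps length
                case BLANK -> "";                        // drops the value entirely
            };
            out.append(text, cursor, s.start()).append(replacement);
            cursor = s.end();
        }
        return out.append(text.substring(cursor)).toString();
    }
}
```

TOKEN mode keeps the prompt readable for the LLM while staying reversible; MASK and BLANK discard the value, which suits logs or analytics where nothing needs to be restored.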

🌱 Spring AI Integration (New in v1.4.0)

Drop SPGAdvisor into any Spring AI ChatClient to redact PII automatically before every LLM call: three lines of code, zero configuration required.
