Open Source · Apache 2.0 · Java 17+ · Zero Cloud Dependencies

AI Privacy Firewall for Java

Detect and redact sensitive information before it reaches ChatGPT, Claude, Gemini, MCP tools, RAG pipelines, or internal AI systems.

PII Detection · PHI Detection · Financial Data Detection · AI Prompt Sanitization · On-Prem Deployment · Java Native

12+ PII Types · 97% Precision · 390+ MB/s · 3 Detection Layers · 0 Cloud Deps

Try It Yourself

Real-World AI Prompt Scenarios

Each example shows how Semantic Privacy Guard (SPG) sanitizes a prompt before it reaches an LLM.

🎫

Customer Support Ticket

Customer PII sent alongside a support summary request to an AI agent.

🏥

Healthcare Record

Patient PHI in a prompt asking an LLM to summarise a medical record.

🏢

Enterprise Employee Record

HR data with salary, ID, and email sent to an internal AI assistant.

🤖

AI Prompt Example

A management summary request containing several sensitive identifiers.

Edge Cases

Test the Limits

These examples show where simple regex stops and where the full Java library with OpenNLP NER takes over.

Standard format · Browser catches ✓

Card: 4532 0151 1283 0366
DOB: born on 03/15/1985
AWS: AKIAIOSFODNN7EXAMPLE

Structured patterns — regex + Luhn validation catches cards, prefixed DOBs, and well-known API key formats instantly.
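The Luhn validation mentioned for card numbers can be sketched as follows. This is a minimal illustration, not SPG's actual code: the class name `LuhnSketch` and the method `passesLuhn` are hypothetical, and the 13 to 19 digit length bound is an assumption about typical card numbers.

```java
// Hypothetical helper for illustration only; not part of the SPG API.
final class LuhnSketch {

    // Returns true when the candidate (spaces and dashes allowed) passes the Luhn checksum.
    static boolean passesLuhn(String candidate) {
        String digits = candidate.replaceAll("[ -]", "");
        if (!digits.matches("\\d{13,19}")) {
            return false; // assumed length bound for payment card numbers
        }
        int sum = 0;
        boolean doubleIt = false; // every second digit, counting from the right, is doubled
        for (int i = digits.length() - 1; i >= 0; i--) {
            int d = digits.charAt(i) - '0';
            if (doubleIt) {
                d *= 2;
                if (d > 9) {
                    d -= 9; // same as adding the two digits of the product
                }
            }
            sum += d;
            doubleIt = !doubleIt;
        }
        return sum % 10 == 0;
    }

    public static void main(String[] args) {
        System.out.println(passesLuhn("4532 0151 1283 0366")); // true
        System.out.println(passesLuhn("4532 0151 1283 0367")); // false
    }
}
```

Running a checksum after the regex match is what lets a detector reject arbitrary 16-digit strings such as order numbers, which match the pattern but fail the check.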

Obfuscated email · Full Java lib needed

john dot smith at gmail dot com

Spoken / obfuscated format. The browser regex engine won't match this — it requires the full Java library's contextual NLP layer.

Spoken numbers · Full Java lib needed

call me at five five five one two three four SSN one two three dash four five dash six seven eight nine

Written-out digits. The NER layer in the full Java library identifies these using linguistic context, not pattern matching.

Compound names · Partial (ML layer)

Dear Dr. Mary Elizabeth Watson-Jones, please sign the attached NDA.

Multi-token hyphenated names. The ML layer catches context-preceded names; the full Java NER catches compound forms like "Watson-Jones".

IBAN variants · Browser catches ✓

IBAN: GB29NWBK60161331926819
DE89370400440532013000
FR7630006000011234567890189

International Bank Account Numbers. The regex pattern covers all ISO 13616 country codes — caught in the browser demo.

Password patterns · Browser catches ✓

password=MyS3cr3tP@ss!
DB_PASSWORD=Secure#2024
secret=xK9#mN2$vL7@qR5
passphrase: correct horse

Key-value credential patterns. Keyword-prefix detection catches all standard config file formats, including dotenv and YAML.
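Keyword-prefix detection of this kind might look like the sketch below. The regex, the keyword list, and the class name `CredentialPatternSketch` are assumptions made for illustration; the patterns SPG actually ships may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; the keyword list and regex are assumptions, not SPG's patterns.
final class CredentialPatternSketch {

    // An optional prefix such as "DB_", a credential keyword, then "=" or ":" and a value.
    private static final Pattern KEY_VALUE = Pattern.compile(
            "(?i)\\b([A-Z0-9_]*(?:password|passphrase|secret))\\s*[=:]\\s*(\\S+)");

    // Returns the value part of every key-value credential found in the text.
    static List<String> findSecretValues(String text) {
        List<String> values = new ArrayList<>();
        Matcher m = KEY_VALUE.matcher(text);
        while (m.find()) {
            values.add(m.group(2));
        }
        return values;
    }

    public static void main(String[] args) {
        String sample = "password=MyS3cr3tP@ss! DB_PASSWORD=Secure#2024 passphrase: correct horse";
        // Only the first whitespace-delimited token after the separator is captured,
        // so the multi-word passphrase yields "correct".
        System.out.println(findSecretValues(sample));
    }
}
```

Capturing a single token keeps the pattern simple; a detector that must redact multi-word passphrases would need a rule for where the value ends.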

💡 The browser playground runs the Regex and Naive Bayes ML layers. The full Java library adds Apache OpenNLP NER as a third layer, which handles compound names, obfuscated formats, and natural-language identifiers.

Under the Hood

Three-Layer Detection Pipeline

Each layer catches what the others miss. Matches are de-duplicated and merged into a single ranked result.

📄

Input

Raw Text

Any string: LLM prompt, email body, support ticket, API log line…

🔤

Layer 1

HeuristicDetector

10 pre-compiled regexes. Luhn validation for cards, NANP for phones, Shannon entropy for API keys.
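The Shannon entropy check named for API keys in Layer 1 can be illustrated with a short sketch. The class `EntropySketch`, the 16-character minimum, and the 3.5 bits-per-character threshold are assumptions chosen for this example, not values taken from SPG.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; threshold and minimum length are illustrative assumptions.
final class EntropySketch {

    // Shannon entropy in bits per character: H = -sum(p * log2(p)) over character frequencies.
    static double shannonEntropy(String s) {
        if (s.isEmpty()) {
            return 0.0;
        }
        Map<Character, Integer> counts = new HashMap<>();
        for (char c : s.toCharArray()) {
            counts.merge(c, 1, Integer::sum);
        }
        double entropy = 0.0;
        for (int n : counts.values()) {
            double p = (double) n / s.length();
            entropy -= p * (Math.log(p) / Math.log(2));
        }
        return entropy;
    }

    // Flags long tokens whose characters are spread widely enough to look random.
    static boolean looksLikeKey(String token) {
        return token.length() >= 16 && shannonEntropy(token) >= 3.5;
    }

    public static void main(String[] args) {
        System.out.println(looksLikeKey("AKIAIOSFODNN7EXAMPLE")); // true
        System.out.println(looksLikeKey("aaaaaaaaaaaaaaaaaaaa")); // false
    }
}
```

Entropy complements prefix patterns: it can flag random-looking tokens that carry no recognisable vendor prefix, at the cost of a threshold that needs tuning against false positives.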

🧠

Layer 2

MLDetector

Multinomial Naive Bayes. Classifies title-cased tokens using bag-of-words context features.

🔭

Layer 3

NLPDetector

Apache OpenNLP MaxEnt NER. Handles compound names, hyphens, and varied syntactic positions.

⚖️

Merge

CompositeDetector

De-duplicates overlapping matches. Promotes same-span detections to HYBRID with elevated confidence.
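One way to picture the merge step is the sketch below. The `Match` record, the `merge` method, the 0.10 confidence boost, and the rule of keeping the higher-confidence match on partial overlap are all assumptions made for illustration; only the HYBRID label and the idea of elevated confidence come from the description above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of a merge step; not SPG's CompositeDetector.
final class MergeSketch {

    // A detected span [start, end) with its PII type, originating layer, and confidence.
    record Match(int start, int end, String type, String source, double confidence) {}

    static List<Match> merge(List<Match> raw) {
        List<Match> sorted = new ArrayList<>(raw);
        sorted.sort(Comparator.comparingInt(Match::start).thenComparingInt(Match::end));

        List<Match> merged = new ArrayList<>();
        for (Match m : sorted) {
            if (merged.isEmpty()) {
                merged.add(m);
                continue;
            }
            int lastIdx = merged.size() - 1;
            Match last = merged.get(lastIdx);
            if (m.start() == last.start() && m.end() == last.end()) {
                // Two layers agree on the exact span: promote to HYBRID with a boost (assumed 0.10).
                double boosted = Math.min(1.0, Math.max(m.confidence(), last.confidence()) + 0.10);
                merged.set(lastIdx, new Match(m.start(), m.end(), last.type(), "HYBRID", boosted));
            } else if (m.start() < last.end()) {
                // Partial overlap: keep whichever match is more confident (assumed rule).
                if (m.confidence() > last.confidence()) {
                    merged.set(lastIdx, m);
                }
            } else {
                merged.add(m);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Match> result = merge(List.of(
                new Match(0, 10, "PERSON", "REGEX", 0.7),
                new Match(0, 10, "PERSON", "ML", 0.8),
                new Match(20, 30, "PHONE", "REGEX", 0.9)));
        result.forEach(System.out::println);
    }
}
```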

✅

Output

PIITokenizer

TOKEN / MASK / BLANK modes. Builds reverse map for de-tokenisation after the LLM responds.
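The three output modes and the reverse map could work roughly as in the sketch below. Everything here is hypothetical: the class `TokenizerSketch`, the `[EMAIL_n]` token format, the simplified email regex, and the reading of MASK as asterisks and BLANK as removal are assumptions, not the behaviour of SPG's PIITokenizer.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; token format and mode semantics are assumptions.
final class TokenizerSketch {

    enum Mode { TOKEN, MASK, BLANK }

    // Simplified email pattern, used only to have something to redact.
    private static final Pattern EMAIL =
            Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");

    // Maps each placeholder token back to the original value.
    final Map<String, String> reverseMap = new LinkedHashMap<>();

    String redact(String text, Mode mode) {
        Matcher m = EMAIL.matcher(text);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String replacement = switch (mode) {
                case TOKEN -> {
                    String token = "[EMAIL_" + (reverseMap.size() + 1) + "]";
                    reverseMap.put(token, m.group());
                    yield token;
                }
                case MASK -> "*".repeat(m.group().length());
                case BLANK -> "";
            };
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Substitutes original values back into text that still contains the tokens.
    String restore(String llmResponse) {
        String result = llmResponse;
        for (Map.Entry<String, String> e : reverseMap.entrySet()) {
            result = result.replace(e.getKey(), e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        TokenizerSketch t = new TokenizerSketch();
        String safe = t.redact("Contact jane.doe@example.com today.", Mode.TOKEN);
        System.out.println(safe);                           // Contact [EMAIL_1] today.
        System.out.println(t.restore("Reply to [EMAIL_1].")); // Reply to jane.doe@example.com.
    }
}
```

Only TOKEN mode is reversible in this sketch, because MASK and BLANK discard the information needed to rebuild the original text.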

Get Started

Add to Your Project in 30 Seconds

Available on Maven Central. Works with Spring Boot, Quarkus, Micronaut, or plain Java 17+.

Maven Dependency

Add one dependency; no configuration is required.

pom.xml

<dependency>
  <groupId>io.github.sushegaad</groupId>
  <artifactId>semantic-privacy-guard</artifactId>
  <version>1.6.0</version>
</dependency>

⭐

If SPG saved your data, star the repo

Help other Java developers discover an open-source AI privacy firewall that runs entirely on-prem — no cloud, no cost.

Latest: v1.6.0 · License: Apache 2.0 · Requires: Java 17+
