PRIVACY·LGPD·2026
PII Redaction Models (pt-BR / LGPD)
Open models, demos, and benchmarks for redacting Brazilian Portuguese personal and sensitive data — including LGPD categories ordinary tools miss.
/ Overview
Most redaction tooling is built for English and trained on the identifiers that US compliance regimes care about. Brazilian Portuguese text, and the LGPD’s specific categories of sensitive data, fall through the gaps. This project is a set of open models, live demos, and benchmarks aimed squarely at closing them.
The point is practical privacy: catch the data that actually has to be removed before a document leaves the building, in the language and legal frame it was written in.
/ What it does
- Tags personal and sensitive data in pt-BR text at the token level, including LGPD categories ordinary tools miss.
- Ships as open models with runnable demos, so the behaviour can be inspected rather than trusted on faith.
- Publishes benchmarks against existing tools so the trade-offs are visible, not asserted.
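To make the tagging step concrete, here is a minimal sketch of how tagged spans could be turned into redacted text. It is not the project's code. The span format (character offsets plus a label), the label names `NOME`, `CPF`, and `FILIACAO_SINDICAL`, the helper `span_of`, and the sample sentence are all assumptions made up for illustration. In practice the spans would come from a model rather than from string search.

```python
from typing import List, TypedDict


class Span(TypedDict):
    # Hypothetical span shape: character offsets into the text plus a label.
    start: int
    end: int
    label: str


def span_of(text: str, fragment: str, label: str) -> Span:
    """Illustration-only helper: locate a fragment and wrap it as a span.
    A real pipeline would get these spans from the tagging model."""
    start = text.index(fragment)
    return {"start": start, "end": start + len(fragment), "label": label}


def redact(text: str, spans: List[Span]) -> str:
    """Replace each tagged span with a [LABEL] placeholder."""
    out = []
    cursor = 0
    for span in sorted(spans, key=lambda s: s["start"]):
        if span["start"] < cursor:
            # Overlaps an earlier span: keep the earlier placeholder, but
            # extend the removed region so no tail of this span leaks.
            cursor = max(cursor, span["end"])
            continue
        out.append(text[cursor:span["start"]])
        out.append(f"[{span['label']}]")
        cursor = span["end"]
    out.append(text[cursor:])
    return "".join(out)


# Invented sample sentence; the labels are hypothetical, not the project's schema.
text = "Maria Souza, CPF 123.456.789-09, é filiada ao sindicato dos metalúrgicos."
spans = [
    span_of(text, "Maria Souza", "NOME"),
    span_of(text, "123.456.789-09", "CPF"),
    span_of(text, "sindicato dos metalúrgicos", "FILIACAO_SINDICAL"),
]
print(redact(text, spans))
```

Overlapping spans are merged rather than dropped. For redaction, removing slightly too much is the safer failure than leaving part of an identifier behind.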
/ Approach
Built on GLiNER and evaluated openly: every claim about coverage is backed by a benchmark you can re-run. Reproducibility and honest error reporting matter more than a headline number — the goal is a model a privacy team can actually rely on under LGPD.
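As a rough illustration of what a re-runnable benchmark could report, the sketch below scores predicted entities against gold annotations using exact span-and-label matching. The matching rule, the function names `score` and `recall_by_label`, the label names, and the sample annotations are assumptions for this example. The project's actual benchmarks may be defined differently.

```python
from typing import Dict, Set, Tuple

# Hypothetical entity shape: (start offset, end offset, label).
Entity = Tuple[int, int, str]


def score(gold: Set[Entity], predicted: Set[Entity]) -> Dict[str, float]:
    """Exact-match precision, recall, and F1 over (start, end, label) triples."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def recall_by_label(gold: Set[Entity], predicted: Set[Entity]) -> Dict[str, float]:
    """Per-label recall, so a missed category is visible instead of averaged away."""
    totals: Dict[str, int] = {}
    hits: Dict[str, int] = {}
    for entity in gold:
        label = entity[2]
        totals[label] = totals.get(label, 0) + 1
        if entity in predicted:
            hits[label] = hits.get(label, 0) + 1
    return {label: hits.get(label, 0) / totals[label] for label in totals}


# Invented annotations: the predicted set misses the sensitive-category span.
gold = {(0, 11, "NOME"), (17, 31, "CPF"), (46, 72, "FILIACAO_SINDICAL")}
predicted = {(0, 11, "NOME"), (17, 31, "CPF")}

print(score(gold, predicted))
print(recall_by_label(gold, predicted))
```

Reporting recall per label alongside the aggregate is one way to keep error reporting honest. A tool can post a strong overall score while missing an entire sensitive category, and the per-label view shows that directly.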