PRIVACY·LGPD·2026

PII Redaction Models (pt-BR / LGPD)

Open models, demos, and benchmarks for redacting Brazilian Portuguese personal and sensitive data — including LGPD categories ordinary tools miss.



/ Overview

Most redaction tooling is built for English and trained on the kinds of identifiers US compliance cares about. Brazilian Portuguese text, and the specific categories of sensitive personal data the LGPD defines, fall through the gaps. This project is a set of open models, live demos, and benchmarks aimed squarely at closing them.

The point is practical privacy: catch the data that actually has to be removed before a document leaves the building, in the language and legal frame it was written in.

/ What it does

Tags personal and sensitive data in pt-BR text at the token level, including LGPD categories ordinary tools miss.
Ships as open models with runnable demos, so the behaviour can be inspected rather than trusted on faith.
Publishes benchmarks against existing tools so the trade-offs are visible, not asserted.
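Once a model has tagged the sensitive spans, the redaction step itself is a small string operation. The sketch below is a minimal illustration. It assumes predictions arrive as dicts with character offsets `start`/`end` and a `label`, which is the general shape GLiNER-style span predictions take. The `redact` helper, the placeholder format, and the example labels are hypothetical and are not this project's code.

```python
from typing import Iterable, Mapping


def redact(text: str, entities: Iterable[Mapping], mask: str = "[{label}]") -> str:
    """Replace each predicted character span [start, end) with a label placeholder.

    `entities` is assumed to be a list of dicts with "start", "end" and "label"
    keys (hypothetical input shape, modelled on GLiNER-style predictions).
    """
    # Process spans left to right; on equal starts, the longer span comes first.
    spans = sorted(entities, key=lambda e: (e["start"], -e["end"]))
    parts = []
    cursor = 0  # end of the text already emitted or redacted
    for e in spans:
        start, end = e["start"], e["end"]
        if start < cursor:
            # Overlaps the previous redaction: widen it so no fragment leaks.
            cursor = max(cursor, end)
            continue
        parts.append(text[cursor:start])
        parts.append(mask.format(label=e["label"].upper()))
        cursor = end
    parts.append(text[cursor:])
    return "".join(parts)


if __name__ == "__main__":
    text = "Meu CPF é 123.456.789-09 e moro em Recife."
    # Hypothetical predictions; offsets computed here rather than hard-coded.
    cpf, city = "123.456.789-09", "Recife"
    preds = [
        {"start": text.index(cpf), "end": text.index(cpf) + len(cpf), "label": "cpf"},
        {"start": text.index(city), "end": text.index(city) + len(city), "label": "location"},
    ]
    print(redact(text, preds))  # Meu CPF é [CPF] e moro em [LOCATION].
```

Overlapping predictions are merged into a single placeholder rather than dropped. When deciding what to remove, leaking the tail of a partially covered identifier is worse than masking a few extra characters.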

/ Approach

Built on GLiNER, a generalist named-entity recognition model, and evaluated openly: every claim about coverage is backed by a benchmark you can re-run. Reproducibility and honest error reporting matter more than a headline number; the goal is a model a privacy team can actually rely on under the LGPD.
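A re-runnable benchmark for span tagging usually reduces to span-level precision, recall, and F1. The project's exact scoring protocol isn't specified here. This is a minimal sketch of one common choice, strict exact-match scoring over `(start, end, label)` triples. The `span_prf` helper and the span format are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# (start, end, label) with character offsets; hypothetical span format.
Span = Tuple[int, int, str]


def span_prf(gold: Iterable[Span], pred: Iterable[Span]) -> Dict[str, float]:
    """Strict span-level precision, recall and F1.

    A prediction counts only if start, end and label all match a gold span;
    Counter intersection handles duplicate spans correctly.
    """
    g, p = Counter(gold), Counter(pred)
    tp = sum((g & p).values())
    n_pred, n_gold = sum(p.values()), sum(g.values())
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    gold = [(10, 24, "cpf"), (35, 41, "location")]
    pred = [(10, 24, "cpf"), (0, 3, "person")]  # one hit, one false alarm, one miss
    print(span_prf(gold, pred))  # {'precision': 0.5, 'recall': 0.5, 'f1': 0.5}
```

In redaction, a missed span is a leak, while a false alarm only over-masks. Recall is therefore usually the number to watch. Lenient, overlap-based matching is a common alternative and gives credit for partially caught spans.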