
PRIVACY·LGPD·2026

PII Redaction Models (pt-BR / LGPD)

Open models, demos, and benchmarks for redacting personal and sensitive data in Brazilian Portuguese text, including the LGPD categories that ordinary tools miss.


/ Overview

Most redaction tooling is built for English and trained on the identifiers that US compliance regimes care about. Brazilian Portuguese text, and the categories of sensitive data that the LGPD singles out (health, religious belief, political opinion, and union membership, among others), are poorly covered as a result. This project is a set of open models, live demos, and benchmarks aimed squarely at that gap.

The point is practical privacy: catch the data that actually has to be removed before a document leaves the building, in the language and legal frame it was written in.

/ What it does

  • Tags personal and sensitive data in pt-BR text at the token level, including LGPD categories ordinary tools miss.
  • Ships as open models with runnable demos, so the behaviour can be inspected rather than trusted on faith.
  • Publishes benchmarks against existing tools so the trade-offs are visible, not asserted.
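As a rough illustration of what happens after tagging, the sketch below turns tagged spans into a redacted string. It assumes the tagger returns character-offset spans as dictionaries with `start`, `end`, and `label` keys, which is the shape GLiNER's `predict_entities` is understood to return. The `redact` and `span_of` helpers and the label names (`NOME`, `CPF`, `FILIACAO_SINDICAL`) are hypothetical and not taken from the project.

```python
from typing import TypedDict


class Span(TypedDict):
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive
    label: str  # hypothetical label name, e.g. "CPF"


def span_of(text: str, fragment: str, label: str) -> Span:
    """Build a span for the first occurrence of `fragment` (demo helper)."""
    start = text.index(fragment)
    return {"start": start, "end": start + len(fragment), "label": label}


def redact(text: str, spans: list[Span]) -> str:
    """Replace every tagged span with a [LABEL] placeholder.

    Overlapping spans are merged under the label of the span that
    starts first, so no tagged character survives in the output.
    """
    out: list[str] = []
    cursor = 0
    for span in sorted(spans, key=lambda s: (s["start"], -s["end"])):
        if span["start"] >= cursor:
            out.append(text[cursor:span["start"]])
            out.append(f"[{span['label']}]")
        # Always advance past the span, even when it overlapped.
        cursor = max(cursor, span["end"])
    out.append(text[cursor:])
    return "".join(out)


text = "Maria Silva, CPF 123.456.789-09, é filiada ao sindicato."
spans = [
    span_of(text, "Maria Silva", "NOME"),
    span_of(text, "123.456.789-09", "CPF"),
    # Union membership is a sensitive category under the LGPD.
    span_of(text, "filiada ao sindicato", "FILIACAO_SINDICAL"),
]
redacted = redact(text, spans)
print(redacted)  # -> [NOME], CPF [CPF], é [FILIACAO_SINDICAL].
```

Merging overlaps rather than skipping them errs on the side of over-redaction, which is usually the safer failure mode when a document is about to leave the organisation.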

/ Approach

Built on GLiNER and evaluated openly: every claim about coverage is backed by a benchmark you can re-run. Reproducibility and honest error reporting matter more than a headline number, because the goal is a model a privacy team can actually rely on under LGPD.
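To make "honest error reporting" concrete, here is a minimal sketch of per-label scoring that reports missed spans alongside precision and recall. The project's actual metric and data format are not specified here, so exact-match span scoring, the `score` function, and the label names are all assumptions made for illustration.

```python
from collections import Counter

# A span is (start, end, label); exact match on all three counts as a hit.
Span = tuple[int, int, str]


def score(gold: set[Span], predicted: set[Span]) -> dict[str, dict[str, float]]:
    """Per-label precision, recall, F1, and count of missed gold spans."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for span in predicted:
        (tp if span in gold else fp)[span[2]] += 1
    for span in gold - predicted:
        fn[span[2]] += 1

    report: dict[str, dict[str, float]] = {}
    for label in sorted(set(tp) | set(fp) | set(fn)):
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        report[label] = {
            "precision": precision,
            "recall": recall,
            "f1": f1,
            "missed": fn[label],  # gold spans the model failed to tag
        }
    return report


# Hypothetical gold annotations and model output for one sentence.
gold = {(0, 11, "NOME"), (17, 31, "CPF"), (35, 55, "FILIACAO_SINDICAL")}
predicted = {(0, 11, "NOME"), (17, 31, "CPF"), (40, 50, "NOME")}

report = score(gold, predicted)
for label, metrics in report.items():
    print(label, metrics)
```

Reporting per label rather than as one aggregate keeps a weak sensitive-data category from hiding behind strong scores on easy identifiers such as CPF numbers. For redaction, recall and the `missed` count are usually the numbers to read first, since a missed span is data that leaks.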