// the find

charliehzm/medharness

★ 102 · Python · Apache-2.0 · updated Jun 2026

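For all LLM traffic in a healthcare institution: PHI stays in-house · models are allowlisted · everything is auditable · costs stay under control.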

MedHarness is a compliance gateway that sits in front of LLM APIs in Chinese healthcare settings, enforcing PHI detection, de-identification, model allowlisting, and audit logging before any prompt reaches an upstream model. It targets hospital IT departments and medical AI vendors who need to demonstrate regulatory compliance (PIPL and China's data security laws) and keep patient data from leaking to external model providers. The open-core model gives you the gateway and 11 MCP components for free; the trained Chinese medical NER models and managed hosting are commercial.

The fail-closed architecture is the right default: any gate failure returns 503 rather than passing the request through, which is exactly what a compliance layer should do. The zero-trust data classification approach (the gateway signs the tier, so clients can't self-report their sensitivity level) closes the obvious bypass where an application claims its data is low-sensitivity to get routed to a cheaper model. The WORM + hash-chain audit log with 6-year replay is a serious engineering decision, not a checkbox, and ClickHouse is a reasonable choice for append-only compliance storage at this volume. The weekly red-team drill in CI, with hard thresholds (PHI recall ≥ 92%, injection block rate ≥ 95%) and issues opened automatically on failure, is a disciplined way to prevent compliance regressions.
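To make the hash-chain claim concrete, here is a minimal sketch of how such a log detects tampering. It is not MedHarness's implementation: the record layout, the `GENESIS` anchor, and the function names `append_record` and `verify_chain` are all assumptions for illustration, and the WORM storage layer and ClickHouse are not modeled at all.

```python
import hashlib
import json

# Hypothetical anchor used as the "previous hash" of the first record.
GENESIS = "0" * 64


def _digest(prev_hash: str, payload: dict) -> str:
    """Hash the previous record's hash together with this record's payload."""
    body = json.dumps(payload, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_record(chain: list, payload: dict) -> dict:
    """Append a record whose hash commits to everything before it."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    record = {
        "payload": payload,
        "prev_hash": prev_hash,
        "hash": _digest(prev_hash, payload),
    }
    chain.append(record)
    return record


def verify_chain(chain: list) -> bool:
    """Recompute every link; an edited or deleted record breaks the chain."""
    prev_hash = GENESIS
    for record in chain:
        if record["prev_hash"] != prev_hash:
            return False
        if record["hash"] != _digest(prev_hash, record["payload"]):
            return False
        prev_hash = record["hash"]
    return True
```

Because each hash covers the previous one, changing or dropping a record in the middle invalidates every later link, which is what makes a replay over a multi-year window checkable rather than taken on trust.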

The community version's phi-detector is rule-based + Presidio, which the project openly admits, but it means recall on real Chinese clinical text (medication names, diagnosis codes, unusual ID formats) will degrade significantly from the advertised recall of 1.0. That number comes from the synthetic corpus in tests/red-team-drills/fixtures, not production data. The gateway substrate is described obliquely as 'a mature open-source gateway kernel' with the actual project name withheld, which makes it hard to evaluate the security posture of the base layer or understand what you're actually deploying. AES-256-GCM de-identification works for reversible pseudonymization, but the key provider defaults to a file-based provider that almost certainly gets baked into the image or mounted as a volume, making key rotation and HSM integration an exercise left entirely to the operator. With 13 container images on a 4 vCPU / 4-5 GB host, the single-machine deployment model will struggle under any meaningful concurrent load from a real HIS system.
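The recall concern can be illustrated with a small sketch of how span-level recall is measured against a labeled sample. Everything here is an assumption for illustration: the two regex rules stand in for a generic rule-based detector (they are not MedHarness's phi-detector or Presidio), the sample sentence and its identifiers are synthetic, and only the 0.92 threshold comes from the project's stated drill criteria.

```python
import re

# Threshold quoted in the review (PHI recall >= 92%).
PHI_RECALL_THRESHOLD = 0.92

# Hypothetical rules: an 18-character resident ID and an 11-digit mobile number.
RULES = {
    "ID_NUMBER": re.compile(r"(?<!\d)\d{17}[\dXx](?!\d)"),
    "PHONE": re.compile(r"(?<!\d)1[3-9]\d{9}(?!\d)"),
}


def detect(text: str) -> set:
    """Return the (start, end, label) spans found by the regex rules."""
    return {
        (m.start(), m.end(), label)
        for label, pattern in RULES.items()
        for m in pattern.finditer(text)
    }


def span(text: str, value: str, label: str) -> tuple:
    """Build a gold-label span by locating the value in the text."""
    start = text.index(value)
    return (start, start + len(value), label)


def recall(expected: set, detected: set) -> float:
    """Fraction of labeled PHI spans that the detector actually found."""
    if not expected:
        return 1.0
    return len(expected & detected) / len(expected)


# Synthetic sentence: a name, an ID number, and a phone number.
text = "患者张伟，身份证号110101199003074518，电话13800138000。"
expected = {
    span(text, "张伟", "NAME"),
    span(text, "110101199003074518", "ID_NUMBER"),
    span(text, "13800138000", "PHONE"),
}
score = recall(expected, detect(text))
passes = score >= PHI_RECALL_THRESHOLD
```

In this toy case the pattern rules catch the two well-formatted identifiers but have no way to recognize the patient name, so recall lands at two out of three and the threshold check fails. A fixture set made up mostly of well-formatted identifiers would score far higher with the same rules, which is why the corpus behind a recall figure matters as much as the figure itself.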
