// the find

BYVoid/OpenCC

★ 9,789 · C++ · Apache-2.0 · updated Jul 2026

Library for conversion between Traditional and Simplified Chinese

OpenCC is a C++ library for converting between Simplified and Traditional Chinese, with proper phrase-level disambiguation and regional vocabulary support for Mainland China, Taiwan, and Hong Kong. It's the de facto standard for this problem — used by fcitx, RIME, and GoldenDict. If you need Chinese character conversion that isn't naive character substitution, this is what you reach for.

Phrase-level segmentation means it correctly handles the one-simplified-to-many-traditional ambiguity that character-by-character converters botch: '发' is '髮' in '头发' (hair) but '發' in '发展' (development), and only phrase context can tell the two apart. Regional vocabulary configs (s2twp, tw2sp) go beyond character forms to swap in the vocabulary that differs between Mainland China, Taiwan, and Hong Kong, so '数据库' becomes '資料庫' under s2twp rather than the literal '數據庫'. The --inspect diagnostic mode outputs per-stage JSON showing exactly which conversion step produced a surprising result, which is genuinely useful when debugging a dictionary entry. Python and Node.js bindings ship as prebuilt wheels and Node-API binaries, so most users never touch the C++ toolchain.
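To make the phrase-level point concrete, here is a toy sketch of greedy longest-match conversion with a single-character fallback, followed by a separate regional-vocabulary pass over the Traditional output. This is not OpenCC's code: the tiny dictionaries and the function names (`max_match`, `char_by_char`, `s2t`, `s2tw_phrases`) are hypothetical, and the two-stage split only loosely mirrors the idea of chaining dictionary lookups.

```python
# Toy sketch (not OpenCC's implementation): longest-match phrase lookup with a
# per-character fallback. All dictionary entries are hypothetical samples.

# Simplified -> Traditional phrases, tried before single characters.
ST_PHRASES = {
    "头发": "頭髮",  # hair: 发 -> 髮
    "发展": "發展",  # development: 发 -> 發
}

# Single-character fallback; '发' can only map to one form here.
ST_CHARS = {"头": "頭", "发": "發", "展": "展", "数": "數", "据": "據", "库": "庫"}

# Hypothetical Taiwan vocabulary stage, keyed on Traditional text.
TW_PHRASES = {"數據庫": "資料庫"}


def max_match(text: str, phrases: dict[str, str], chars: dict[str, str]) -> str:
    """At each position take the longest matching phrase key; otherwise
    convert one character (or keep it unchanged if it has no entry)."""
    longest = max((len(k) for k in phrases), default=1)
    out = []
    i = 0
    while i < len(text):
        for size in range(min(longest, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in phrases:
                out.append(phrases[piece])
                i += size
                break
        else:  # no phrase matched at position i
            out.append(chars.get(text[i], text[i]))
            i += 1
    return "".join(out)


def char_by_char(text: str, chars: dict[str, str]) -> str:
    """Naive substitution: ignores context, so ambiguous characters go wrong."""
    return "".join(chars.get(c, c) for c in text)


def s2t(text: str) -> str:
    return max_match(text, ST_PHRASES, ST_CHARS)


def s2tw_phrases(text: str) -> str:
    # Second stage: swap regional vocabulary on already-Traditional text.
    return max_match(s2t(text), TW_PHRASES, {})


if __name__ == "__main__":
    print(char_by_char("头发", ST_CHARS))  # naive: wrong form for "hair"
    print(s2t("头发"), s2t("发展"))
    print(s2t("数据库"), s2tw_phrases("数据库"))
```

The naive path turns '头发' into '頭發', while the phrase lookup yields '頭髮'; the regional pass then rewrites '數據庫' to '資料庫'. Real dictionaries are far larger and OpenCC's own segmenter and config format differ from this sketch.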

The Jieba segmentation plugin ABI is explicitly unstable and will break between releases, so don't ship a product that depends on it yet. The Hong Kong phrase configs (s2hkp, hk2sp) are marked as still under development, so HK regional vocabulary coverage is noticeably thinner than Taiwan's. The 1.4.0 SOVERSION bump to libopencc.so.1.4 means any C++ project that dynamically links OpenCC needs a recompile, and the README buries this in the release notes instead of leading with it. On Windows, mixing MSVC-built and MinGW-built plugin and host binaries is unsupported, and nothing fails at runtime: you just get silent misbehavior.
