// the find

liuhuanyong/QASystemOnMedicalKG

★ 7,304 · Python · updated Aug 2024

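A tutorial and implementation of a disease-centered medical knowledge graph and a QA system built on it. Knowledge graph construction, automatic question answering, KG-based question answering. A moderately sized medical-domain knowledge graph centered on diseases, used to provide automatic question answering and analysis services.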

A tutorial project that builds a Chinese medical knowledge graph from scratch using Neo4j, then layers a rule-based QA system on top. Covers 44k entities and 300k relationships scraped from a medical website, supporting 18 question types like symptom lookup, drug recommendations, and treatment duration. Aimed at NLP students learning KG construction, not production medical applications.

The end-to-end pipeline is genuinely educational: scraping, entity extraction, graph ingestion, and Cypher-based QA are all present and connected. The knowledge graph schema is sensibly disease-centric, with clearly typed relationships (has_symptom, recommand_drug, acompany_with) that make the graph traversal logic easy to follow. The pre-built medical.json dataset means you can skip the multi-hour scrape and go straight to querying. The QA intent classifier is honest rule-based NLP and does not pretend that ML is doing work it isn't doing here.
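To make the disease-centric schema concrete, here is a minimal sketch of how one record could be turned into typed relationship triples before graph ingestion. It is not the repository's code: the record field names (`name`, `symptom`, `recommand_drug`, `acompany`), the node labels, and the sample record are all assumptions made for illustration. Only the three relationship names come from the description above.

```python
import json

# Assumed mapping: record field -> (relationship type, target node label).
# The relationship names appear in the review; fields and labels are hypothetical.
FIELD_TO_REL = {
    "symptom": ("has_symptom", "Symptom"),
    "recommand_drug": ("recommand_drug", "Drug"),
    "acompany": ("acompany_with", "Disease"),
}


def record_to_triples(record):
    """Expand one disease record into (disease, relation, label, target) tuples."""
    disease = record["name"]
    triples = []
    for field, (relation, label) in FIELD_TO_REL.items():
        # Missing fields are treated as empty lists, so partial records still work.
        for target in record.get(field, []):
            triples.append((disease, relation, label, target))
    return triples


# Placeholder record, not taken from medical.json.
sample = json.loads(
    '{"name": "example-disease",'
    ' "symptom": ["example-symptom-1", "example-symptom-2"],'
    ' "recommand_drug": ["example-drug"]}'
)

for triple in record_to_triples(sample):
    print(triple)
```

Because every edge starts at the disease node, each question type reduces to a single-hop traversal from a disease, which is why the query logic stays easy to read.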

Python 3.6 bytecode cache files are committed to the repo, and the last meaningful update was in 2024 with no sign of modernization for newer Python versions, so expect dependency pain. The data is scraped from a single Chinese medical website without attribution or quality controls; drug recommendations and symptom associations should not be trusted for any real use. The QA system is entirely keyword matching with Cypher templates: it breaks on anything outside its fixed question patterns and has no fallback. There is no English support and no internationalization path, which limits the audience to people working with Chinese-language medical data.
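The keyword-plus-template approach, and the way it fails, can be sketched in a few lines. This is a hypothetical illustration, not the project's classifier: the keyword lists, intent names, entity dictionary, node labels, and the `name` property are assumptions. Only the relationship names come from the description above.

```python
# Hypothetical keyword rules: intent name -> trigger words.
INTENT_KEYWORDS = {
    "disease_symptom": ["symptom"],
    "disease_drug": ["drug", "medicine", "medication"],
}

# One Cypher template per intent; node labels and the name property are assumed.
CYPHER_TEMPLATES = {
    "disease_symptom": (
        "MATCH (d:Disease)-[:has_symptom]->(s:Symptom) "
        "WHERE d.name = $name RETURN s.name"
    ),
    "disease_drug": (
        "MATCH (d:Disease)-[:recommand_drug]->(m:Drug) "
        "WHERE d.name = $name RETURN m.name"
    ),
}

# Stand-in for an entity dictionary built from the graph.
KNOWN_DISEASES = ("example-disease-a", "example-disease-b")


def classify(question):
    """Return (intent, disease) if a rule matches, otherwise None."""
    text = question.lower()
    disease = next((d for d in KNOWN_DISEASES if d in text), None)
    if disease is None:
        return None
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return intent, disease
    return None


def build_query(question):
    """Return (cypher, params) for a matched question, or None when nothing matches."""
    match = classify(question)
    if match is None:
        return None  # no fallback: unmatched questions simply go unanswered
    intent, disease = match
    return CYPHER_TEMPLATES[intent], {"name": disease}
```

In this sketch, a question that names a known disease and contains a trigger word yields a parameterized query, while anything phrased differently (a paraphrase without the trigger word, or a disease missing from the dictionary) returns `None`. That is the brittleness described above.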
