YE Hui, JI Dong-hong. Research on Symptom and Medicine Information Abstraction of TCM Book Jin Gui Yao Lue Based on Conditional Random Field[J]. Chinese Journal of Library and Information Science for Traditional Chinese Medicine, 2016, 40(5): 14-17. [doi:10.3969/j.issn.2095-5707.2016.05.004]
Research on Symptom and Medicine Information Extraction from Jin Gui Yao Lue Based on Multi-feature Conditional Random Fields
- Title:
- Research on Symptom and Medicine Information Abstraction of TCM Book Jin Gui Yao Lue Based on Conditional Random Field
- Article ID:
- 2095-5707(2016)05-0014-04
- Keywords:
- conditional random fields (CRF); Jin Gui Yao Lue; symptom and medicine information extraction; ancient TCM books
- CLC number:
- R222.3
- Document code:
- A
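- Abstract (translated from the Chinese):
- Objective To use natural language processing to develop a method that can effectively extract the symptom and medicine entities contained in the texts of ancient TCM books. Methods Taking Jin Gui Yao Lue as an example and using the conditional random field (CRF) algorithm, the text was first segmented into words. Part of speech and a key-value-based TCM diagnostic tag set were then used as auxiliary features, and symptom-medicine BIO tags served as the training labels. The trained model was then used to tag the test-set text automatically. Results Automatic tagging with the multi-feature CRF achieved a precision of 84.5%, a recall of 70.9% and an F-measure of 77.1%. Conclusion A multi-feature model obtained by training a CRF with part-of-speech and TCM diagnostic tag-set features effectively improves the CRF algorithm's ability to extract entities from ancient TCM books, and the resulting model can be used to extract symptom and medicine entities from ancient TCM texts automatically.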
- Abstract:
- Objective To find an effective method, based on natural language processing, for extracting symptom and medicine information from the ancient TCM book Jin Gui Yao Lue. Methods Taking Jin Gui Yao Lue as an example and using conditional random fields (CRF), the texts were first segmented into words; part of speech and a key-value-based TCM diagnostic tag set were then used as auxiliary features, and symptom-medicine BIO tags served as the training labels. The trained model was then used to label the test texts automatically. Results Automatic labeling with the multi-feature CRF achieved a precision of 84.5%, a recall of 70.9% and an F-measure of 77.1%. Conclusion The multi-feature CRF model trained with part-of-speech and TCM diagnostic tag-set features effectively improves entity extraction from ancient TCM books, and it can be used to extract symptom and medicine entities from these texts automatically.
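The pipeline summarized in the abstract (word segmentation, per-token part-of-speech and diagnostic-tag features, BIO labels, then precision/recall/F scoring) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' code: the entity type names `SYM` and `MED`, the feature template, the toy segmentation, POS tags and diagnostic-tag values, and every function name are hypothetical. CRF training itself is left to whichever toolkit one chooses, since the abstract does not name one. Scoring assumes exact-match, entity-level evaluation, which the abstract also does not specify. The final line checks that the reported precision (84.5%) and recall (70.9%) are consistent with the reported F-measure (77.1%) under F = 2PR / (P + R).

```python
from typing import Dict, List, Optional, Set, Tuple

# (start token index, end index exclusive, entity type); "SYM"/"MED" are assumed type names.
Span = Tuple[int, int, str]


def spans_to_bio(n_tokens: int, spans: List[Span]) -> List[str]:
    """Encode entity spans as BIO tags: B-X opens an entity, I-X continues it, O is outside."""
    tags = ["O"] * n_tokens
    for start, end, etype in spans:
        tags[start] = "B-" + etype
        for i in range(start + 1, end):
            tags[i] = "I-" + etype
    return tags


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Decode BIO tags into spans; an I-X that does not continue an X entity opens a new one."""
    spans: Set[Span] = set()
    start: Optional[int] = None
    etype: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing entity
        if tag.startswith("I-") and tag[2:] == etype:
            continue  # the current entity continues
        if start is not None:
            spans.add((start, i, etype))
        start, etype = (i, tag[2:]) if tag != "O" else (None, None)
    return spans


def token_features(words: List[str], pos: List[str], diag: List[str], i: int) -> Dict[str, str]:
    """Hypothetical template for token i: word, POS and diagnostic-tag value, plus neighbours."""
    feats = {"w": words[i], "pos": pos[i], "diag": diag[i]}
    if i > 0:
        feats["w-1"], feats["pos-1"] = words[i - 1], pos[i - 1]
    else:
        feats["BOS"] = "1"  # beginning of sentence
    if i + 1 < len(words):
        feats["w+1"], feats["pos+1"] = words[i + 1], pos[i + 1]
    else:
        feats["EOS"] = "1"  # end of sentence
    return feats


def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def prf(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Exact-match entity-level precision, recall and F1 over a corpus of BIO sequences."""
    tp = n_pred = n_gold = 0
    for g, p in zip(gold, pred):
        gs, ps = bio_to_spans(g), bio_to_spans(p)
        tp += len(gs & ps)
        n_pred += len(ps)
        n_gold += len(gs)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    return precision, recall, f_measure(precision, recall)


if __name__ == "__main__":
    # Toy clause with a hypothetical segmentation, POS tags and diagnostic-tag values.
    words = ["胸背", "痛", "，", "短气", "，", "栝蒌薤白白酒汤", "主之"]
    pos = ["n", "a", "w", "v", "w", "n", "v"]
    diag = ["body_part", "symptom", "O", "symptom", "O", "formula", "O"]
    gold = spans_to_bio(len(words), [(0, 2, "SYM"), (3, 4, "SYM"), (5, 6, "MED")])
    X = [token_features(words, pos, diag, i) for i in range(len(words))]  # CRF input
    print(gold)  # ['B-SYM', 'I-SYM', 'O', 'B-SYM', 'O', 'B-MED', 'O']
    print(X[0])  # feature dict for 胸背
    pred = ["B-SYM", "I-SYM", "O", "O", "O", "B-MED", "O"]  # a model that misses 短气
    print(prf([gold], [pred]))  # roughly (1.0, 0.667, 0.8)
    print(round(f_measure(0.845, 0.709), 3))  # 0.771
```

The per-token feature dicts are the kind of input that CRF toolkits typically consume (one dict per token, one list per sentence). Scoring whole spans rather than individual tags is one common convention; token-level scoring would give different numbers.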
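References:
[1] MA Rui-min, MA Min-yan. Multi-strategy biomedical named entity recognition based on CRFs[J]. Journal of Qiqihar University, 2011, 27(1): 39-42. (in Chinese)
[2] LAFFERTY J D, MCCALLUM A, PEREIRA F C N. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data[C]//The 18th International Conference on Machine Learning. San Francisco: Morgan Kaufmann Publishers Inc., 2001: 282-289.
[3] MENG Hong-yu. Research on automatic recognition of TCM terms in Shang Han Lun based on conditional random fields[D]. Beijing: Beijing University of Chinese Medicine, 2014: 33-34. (in Chinese)
[4] WANG Guo-long, DU Jian-qiang, HAO Zhu-lin, et al. Part-of-speech tagging and feature recombination of classical TCM diagnostic texts[J]. Computer Engineering and Design, 2015, 36(3): 836-841. (in Chinese)
[5] WEI Zun-qiang, SHU Hong-ping, WANG Ya-qiang. Research on TCM symptom name recognition based on sequence labeling[J]. Shandong Industrial Technology, 2015(8): 237-238. (in Chinese)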
Memo
Received: 2016-06-08
Last Update:
2016-09-26