SHEN Liyan, JIANG Haiyan, HU Bin, et al. A study on joint entity recognition and relation extraction for rice diseases pests weeds and drugs[J]. Journal of Nanjing Agricultural University, 2020, 43(6): 1151-1161. [doi:10.7685/jnau.201912024]

Joint Extraction Algorithm for Entities and Relations of Rice Diseases, Pests, Weeds and Drugs

Journal of Nanjing Agricultural University [ISSN: 1000-2030 / CN: 32-1148/S]

Volume: 43
Issue: 2020, No. 6
Pages: 1151-1161
Section: Food and Engineering
Publication date: 2020-11-10

Article Information

Title:
A study on joint entity recognition and relation extraction for rice diseases pests weeds and drugs
Author(s):
SHEN Liyan(1), JIANG Haiyan(1,2), HU Bin(1), XIE Yuancheng(1)
1. College of Artificial Intelligence, Nanjing Agricultural University, Nanjing 210095, China;
2. National Engineering and Technology Center for Information Agriculture, Nanjing Agricultural University, Nanjing 210095, China
Keywords: diseases, pests and weeds; entity relationship extraction; long short-term memory networks; attention mechanism
CLC number: TP391.1
DOI: 10.7685/jnau.201912024
Abstract:
[Objectives] To automatically extract entities, and the relations between rice diseases, pests and weeds and the drugs used against them, from rice disease, pest and weed control texts, providing data for building knowledge graphs in the crop-system domain.
[Methods] Disease, pest and weed control texts contain many entities without clear boundaries, and drug entities can be linked to disease, pest and weed entities by several types of relation. To handle both characteristics, a joint entity recognition and relation extraction algorithm for rice diseases, pests and weeds (JE-DPW) was designed. It combines a double-layer bi-directional long short-term memory (BiLSTM) network with an attention mechanism and uses a new tagging scheme. In the decoding layer, the forward and backward passes of the BiLSTM strengthen the extraction of complex semantic features from the control texts; a softmax classifier then assigns a category label to each character, which yields entity recognition. In parallel, the attention mechanism determines which relations link the current character to the preceding characters, so that entities and multiple relations are extracted jointly.
[Results] The model was trained on a disease, pest, weed and drug control text data set containing 7 380 entities and 8 605 relations and evaluated on a test set. JE-DPW reached an accuracy of 91.3% in entity extraction and 76.8% in relation classification, and 88.1% in recognizing entities without clear boundaries. Its entity extraction accuracy was 8.1% higher than that of a BiLSTM-based method, and its relation classification accuracy was 22.6% and 19.7% higher than those of methods based on a recurrent neural network (RNN) and a long short-term memory (LSTM) network, respectively. As the number of relations increased, JE-DPW maintained an advantage of 17.4% to 20.1% in relation extraction F1 score.
[Conclusions] The proposed algorithm effectively improves the accuracy of joint entity and relation extraction from rice disease, pest and weed control texts and speeds up the construction of knowledge bases for the crop-system domain.
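The Methods paragraph describes two output heads over the BiLSTM features: a softmax classifier that labels every character (entity recognition) and an attention step that relates the current character to earlier ones (relation extraction). The plain-Python sketch below illustrates only that decoding logic, under stated assumptions: the BiLSTM encoder is replaced by made-up feature vectors, and the B/I/E/S tag set, the `DPW`/`DRUG` entity types, the `controlled_by` relation, the diagonal bilinear attention score and the use of the current position as a "no relation" slot are all hypothetical choices, not the JE-DPW tagging scheme or attention formula from the paper.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical character labels: B/I/E/S = begin/inside/end/single-character
# span of an entity type; O = outside any entity. "DPW" stands in for a
# disease, pest or weed entity and "DRUG" for a drug; JE-DPW's tags may differ.
LABELS = ["O", "B-DPW", "I-DPW", "E-DPW", "S-DPW",
          "B-DRUG", "I-DRUG", "E-DRUG", "S-DRUG"]

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type)


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def tag_characters(label_logits: Sequence[Sequence[float]]) -> List[str]:
    """Entity head: softmax over the label set, then argmax, per character."""
    tags = []
    for logits in label_logits:
        probs = softmax(logits)
        tags.append(LABELS[probs.index(max(probs))])
    return tags


def decode_spans(tags: Sequence[str]) -> List[Span]:
    """Convert B/I/E/S tags into entity spans, dropping ill-formed runs."""
    spans: List[Span] = []
    start: Optional[int] = None
    etype: Optional[str] = None
    for i, tag in enumerate(tags):
        if tag == "O":
            start, etype = None, None
            continue
        prefix, kind = tag.split("-", 1)
        if prefix == "S":
            spans.append((i, i + 1, kind))
            start, etype = None, None
        elif prefix == "B":
            start, etype = i, kind
        elif prefix == "I" and etype != kind:
            start, etype = None, None  # "I" without a matching "B"
        elif prefix == "E":
            if start is not None and etype == kind:
                spans.append((start, i + 1, kind))
            start, etype = None, None
    return spans


def attend_previous(features: Sequence[Sequence[float]],
                    rel_weight: Sequence[float], i: int) -> int:
    """Relation head: softmax attention of character i over positions 0..i.

    Score(i, j) = sum_k h_i[k] * w[k] * h_j[k]; position i itself is the
    "no relation" slot, so returning i means no link for this relation.
    """
    h_i = features[i]
    scores = [sum(a * w * b for a, w, b in zip(h_i, rel_weight, features[j]))
              for j in range(i + 1)]
    weights = softmax(scores)
    return weights.index(max(weights))


def joint_decode(label_logits: Sequence[Sequence[float]],
                 features: Sequence[Sequence[float]],
                 rel_weights: Dict[str, Sequence[float]]):
    """Return entity spans and (earlier_span, relation, current_span) triples."""
    spans = decode_spans(tag_characters(label_logits))
    by_last_char = {span[1] - 1: span for span in spans}
    triples = []
    for span in spans:
        i = span[1] - 1  # relations are anchored on each entity's last character
        for rel, weight in rel_weights.items():
            j = attend_previous(features, weight, i)
            if j != i and j in by_last_char:
                triples.append((by_last_char[j], rel, span))
    return spans, triples


# Toy 7-character clause: chars 0-2 name a disease, char 3 is a function word,
# chars 4-6 name a drug. All logits and features are made up for illustration.
example_tags = ["B-DPW", "I-DPW", "E-DPW", "O", "B-DRUG", "I-DRUG", "E-DRUG"]
example_logits = [[5.0 if name == t else 0.0 for name in LABELS]
                  for t in example_tags]
example_features = [[0.0, 0.0], [0.0, 0.0], [3.0, 0.0], [0.0, 0.0],
                    [0.0, 0.0], [0.0, 0.0], [1.0, 0.0]]
spans, triples = joint_decode(example_logits, example_features,
                              {"controlled_by": [1.0, 1.0]})
print(spans)    # [(0, 3, 'DPW'), (4, 7, 'DRUG')]
print(triples)  # [((0, 3, 'DPW'), 'controlled_by', (4, 7, 'DRUG'))]
```

In a trained model the label logits and per-character features would come from the BiLSTM layers and the relation weights would be learned. Anchoring relations on an entity's last character is just one simple way to lift character-level attention to entity-level triples; JE-DPW may do this differently.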

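The Results report accuracy for the entity and relation tasks and F1 for relation extraction, but the abstract does not state how a prediction is matched against the annotation. A common convention, assumed here rather than taken from the paper, counts a predicted (head, relation, tail) triple as correct only if all three parts exactly match a gold triple. A minimal sketch of precision, recall and F1 under that assumption, using hypothetical placeholder triples:

```python
from typing import Hashable, Iterable, Tuple


def prf1(predicted: Iterable[Hashable],
         gold: Iterable[Hashable]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F1 over sets of extracted items."""
    pred_set, gold_set = set(predicted), set(gold)
    correct = len(pred_set & gold_set)  # items that match a gold item exactly
    precision = correct / len(pred_set) if pred_set else 0.0
    recall = correct / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical placeholder triples, not data from the paper.
gold = {
    ("disease_A", "controlled_by", "drug_X"),
    ("pest_B", "controlled_by", "drug_Y"),
    ("weed_C", "controlled_by", "drug_Z"),
    ("pest_D", "controlled_by", "drug_Y"),
}
predicted = {
    ("disease_A", "controlled_by", "drug_X"),
    ("pest_B", "controlled_by", "drug_Y"),
    ("weed_C", "controlled_by", "drug_W"),  # wrong drug, so it counts as an error
}
p, r, f = prf1(predicted, gold)
print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")  # P=0.667 R=0.500 F1=0.571
```

Because the inputs are converted to sets, duplicate predictions are counted once; a per-relation-type breakdown would call `prf1` on triples filtered by relation.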
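Memo:
Received: 2019-12-13.
Funding: National Key Research and Development Program of China (2016YFD0300607); Jiangsu Province Postgraduate Training Innovation Project (SJCX18_0198).
About the author: SHEN Liyan, master's student.
Corresponding author: JIANG Haiyan, professor, Ph.D.; research interests: pattern recognition and intelligent systems, agricultural artificial intelligence. E-mail: jianghy@njau.edu.cn.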