NERChem: Adapting NERBio to Chemical Patents via Full-token Features and Named Entity Feature with Chemical Sub-Class Composition

Abstract

Chemical patents contain detailed information on novel chemical compounds that is valuable to the chemical and pharmaceutical industries. In this paper, we introduce NERChem, a system that recognizes chemical named entity mentions in chemical patents. NERChem is based on a conditional random field (CRF) model. Our approach incorporates (1) class composition, which combines chemical classes whose naming conventions are similar; (2) BioNE features, which distinguish chemical mentions from other biomedical NE mentions in the patents; and (3) full-token word features, which resolve the tokenization granularity problem. We evaluated our approach on the BioCreative V CHEMDNER-patent corpus and achieved an F-score of 87.17% in the CEMP (Chemical Entity Mention in Patents) task and a sensitivity of 98.58% in the CPD (Chemical Passage Detection) task, ranking alongside the top systems.
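To make two of the ideas above more concrete, the following is a minimal sketch, not the NERChem implementation. It assumes a fine-grained tokenizer that splits text into runs of letters, runs of digits, and single symbols, and it attaches the enclosing whitespace-delimited token to each sub-token as a full-token feature. It also shows class composition as a simple label mapping. The tokenizer regex, the function names, the feature keys, and the particular grouping in `CLASS_GROUPS` are all hypothetical and chosen only for illustration; the abstract does not specify them.

```python
import re

# Hypothetical fine-grained tokenizer: runs of letters, runs of digits,
# or any single non-space symbol. The real system's tokenizer may differ.
SUBTOKEN_RE = re.compile(r"[A-Za-z]+|\d+|[^\sA-Za-z\d]")


def subtokens_with_full_token(text):
    """Return one feature dict per sub-token, each carrying its full token."""
    rows = []
    for full in text.split():
        for sub in SUBTOKEN_RE.findall(full):
            # "full_token" is the whitespace-delimited token that contains
            # this sub-token, so the CRF can still see the coarse unit.
            rows.append({"word": sub, "full_token": full})
    return rows


# Hypothetical grouping: which sub-classes are actually merged is an
# assumption here, not taken from the paper.
CLASS_GROUPS = {"SYSTEMATIC": "NAME", "TRIVIAL": "NAME"}


def compose_label(bio_tag):
    """Map a BIO tag such as 'B-TRIVIAL' onto its composed class."""
    if bio_tag == "O":
        return "O"
    prefix, cls = bio_tag.split("-", 1)
    return f"{prefix}-{CLASS_GROUPS.get(cls, cls)}"


if __name__ == "__main__":
    for row in subtokens_with_full_token("of 2-chloroaniline"):
        print(row)
    print(compose_label("B-TRIVIAL"))
```

In this sketch, every sub-token of "2-chloroaniline" shares the same `full_token` value, which is one simple way a sequence labeler could recover coarse-grained context after fine-grained tokenization.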

Reference

R. T. Tsai, Y. C. Hsiao, and P. T. Lai. NERChem: Adapting NERBio to Chemical Patents via Full-token Features and Named Entity Feature with Chemical Sub-Class Composition. July 2016.

Contact us

If you have any questions or comments, please contact us by email:
Yu-Cheng Hsiao: leo10816@gmail.com
Po-Ting Lai: potinglai@gmail.com