BP02: Disease Named Entity Recognition Using NCBI Corpus
|Title||BP02: Disease Named Entity Recognition Using NCBI Corpus|
|Publication Type||Conference Paper|
|Year of Publication||2016|
|Authors||Hahn T, Rahman HUr, Segall R|
|Conference Name||International Conference on Biomedical Ontology and BioCreative (ICBO BioCreative 2016)|
|Publisher||CEUR-ws.org Volume 1747|
Named Entity Recognition (NER) in biomedical literature is a very active research area. NER is a crucial component of biomedical text mining because it enables information retrieval, reasoning and knowledge discovery. Much research in this area has focused on semantic type categories such as "DNA", "RNA", "proteins" and "genes". Disease NER, and human disease NER in particular, has not yet received the attention it needs. Traditional machine learning approaches lack precision for disease NER because they depend on token-level features, sentence-level features and the integration of orthographic, contextual and linguistic features. This paper proposes a method for disease NER that uses sentence-level and token-level features with Conditional Random Fields, trained on the NCBI disease corpus. Our system uses a rich feature set that includes orthographic, contextual, affix, bigram, part-of-speech and stem-based features. With these feature sets, our approach achieved a maximum F-score of 94% on the training set under 10-fold cross-validation for semantic labeling of the NCBI disease corpus. On the test and development sets, the model achieved F-scores of 88% and 85%, respectively.
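As a rough illustration of the kind of token-level features named in the abstract, the sketch below builds orthographic, affix, contextual and bigram features for each token in a sentence. It is not the authors' implementation: the function names (`word_shape`, `token_features`, `sentence_features`), the feature keys, the window size and the affix length are all assumptions made for this example. Part-of-speech and stem features are left out because they require an external tagger and stemmer.

```python
import re


def word_shape(token):
    """Map a token to a coarse orthographic shape, e.g. 'BRCA1' -> 'XXXXd'."""
    shape = re.sub(r"[A-Z]", "X", token)
    shape = re.sub(r"[a-z]", "x", shape)
    return re.sub(r"\d", "d", shape)


def token_features(tokens, i, window=1, affix_len=3):
    """Build a feature dict for tokens[i] (hypothetical feature names)."""
    tok = tokens[i]
    low = tok.lower()
    feats = {
        # Orthographic features
        "lower": low,
        "shape": word_shape(tok),
        "is_title": tok.istitle(),
        "is_upper": tok.isupper(),
        "has_digit": any(c.isdigit() for c in tok),
        "has_hyphen": "-" in tok,
        # Affix features
        "prefix": low[:affix_len],
        "suffix": low[-affix_len:],
    }
    # Contextual features: neighbouring tokens inside the window
    for off in range(-window, window + 1):
        if off == 0:
            continue
        j = i + off
        inside = 0 <= j < len(tokens)
        feats[f"ctx[{off:+d}]"] = tokens[j].lower() if inside else "<PAD>"
    # Bigram features: current token joined with its left and right neighbour
    prev_tok = tokens[i - 1].lower() if i > 0 else "<BOS>"
    next_tok = tokens[i + 1].lower() if i + 1 < len(tokens) else "<EOS>"
    feats["bigram_prev"] = prev_tok + "_" + low
    feats["bigram_next"] = low + "_" + next_tok
    return feats


def sentence_features(tokens):
    """Return one feature dict per token, the usual input shape for a CRF."""
    return [token_features(tokens, i) for i in range(len(tokens))]
```

A list of such dictionaries per sentence, paired with a list of labels per token (for example in a BIO scheme), is the input format that typical linear-chain CRF toolkits expect. The labeling scheme is likewise an assumption here, since the abstract does not state it.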