We use the BioCreative V BEL corpus [1] to evaluate our approach. The corpus contains BEL statements together with their corresponding evidence sentences. We evaluate the extracted BEL statements with the F1 measure at four levels. At the term level, only the named entities (NEs) are evaluated, and an NE is regarded as correct if its identifier is correct. At the function level, the discovered functions are evaluated, and a function is correct when both the NE’s identifier and the function are correct. At the relationship level, only the NEs and the relationships are considered, and a relationship is correct when both NEs’ identifiers and the relationship type are correct. At the BEL level, a statement counts as a true positive only if the NEs’ identifiers, the functions, and the relationship type are all correct.
[1] Fluck J, Madan S, Ansari S, et al. Training and evaluation corpora for the extraction of causal relationships encoded in biological expression language (BEL). Database: The Journal of Biological Databases and Curation. 2016;2016:baw113. doi:10.1093/database/baw113.
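To make the four evaluation levels concrete, the following is a minimal sketch, not the official BioCreative V evaluation script, of how precision, recall, and F1 could be computed at each level. It assumes that every BEL statement can be reduced to a subject term, a relationship type, and an object term, and that each term is an identifier plus an optional function name. The `Term` and `Statement` classes, the `units` and `prf` helpers, and the example identifiers are hypothetical illustrations, and the set-based matching may differ in detail from the official scorer.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set, Tuple


@dataclass(frozen=True)
class Term:
    # Hypothetical representation: a namespaced identifier plus an optional function.
    identifier: str          # e.g. "HGNC:AKT1" (illustrative)
    function: Optional[str]  # e.g. "act", or None when the term has no function


@dataclass(frozen=True)
class Statement:
    # Hypothetical reduction of a BEL statement to subject, relationship, object.
    subject: Term
    relation: str            # relationship type, e.g. "increases"
    obj: Term


def units(statements: Iterable[Statement], level: str) -> Set[Tuple]:
    """Map statements to the units compared at one evaluation level."""
    out: Set[Tuple] = set()
    for s in statements:
        if level == "term":
            # Term level: only the NE identifiers matter.
            out.update({(s.subject.identifier,), (s.obj.identifier,)})
        elif level == "function":
            # Function level: identifier and function must both match.
            for t in (s.subject, s.obj):
                if t.function is not None:
                    out.add((t.identifier, t.function))
        elif level == "relationship":
            # Relationship level: both identifiers and the relationship type.
            out.add((s.subject.identifier, s.relation, s.obj.identifier))
        elif level == "bel":
            # BEL level: identifiers, functions and relationship type all together.
            out.add((s.subject.identifier, s.subject.function, s.relation,
                     s.obj.identifier, s.obj.function))
        else:
            raise ValueError(f"unknown level: {level}")
    return out


def prf(gold: Iterable[Statement], predicted: Iterable[Statement],
        level: str) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) at the given level; empty sets score 0.0."""
    g, p = units(gold, level), units(predicted, level)
    tp = len(g & p)  # true positives: units present in both gold and prediction
    precision = tp / len(p) if p else 0.0
    recall = tp / len(g) if g else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


gold = [Statement(Term("HGNC:AKT1", None), "increases", Term("HGNC:MTOR", "act"))]
pred = [Statement(Term("HGNC:AKT1", None), "increases", Term("HGNC:MTOR", None))]
for level in ("term", "function", "relationship", "bel"):
    print(level, prf(gold, pred, level))
```

In this example the prediction recovers both entities and the relationship but misses the function on the object term, so it scores full marks at the term and relationship levels and zero at the function and BEL levels. This mirrors how the stricter levels require progressively more of the statement to be correct.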

Overall performance at each evaluation level

Performance for each type