Cite Article

Comparative Analysis of Different Data Representations for the Task of Chemical Compound Extraction

BibTeX

@article{IJASEIT6432,
   author = {Basel Alshaikhdeeb and Kamsuriah Ahmad},
   title = {Comparative Analysis of Different Data Representations for the Task of Chemical Compound Extraction},
   journal = {International Journal on Advanced Science, Engineering and Information Technology},
   volume = {8},
   number = {5},
   year = {2018},
   pages = {2189--2195},
   keywords = {Chemical Compounds Extraction; Data Representation; N-gram; Detailed-Attributes; Naïve Bayes; Support Vector Machine; Attribute Selection},
   abstract = {Chemical compound extraction refers to the task of recognizing chemical instances such as oxygen, nitrogen, and others. The majority of studies that have addressed this task used machine-learning techniques. The key challenge in using machine-learning techniques lies in employing a robust set of features. The literature shows that numerous types of features have been used for chemical compound extraction, and the dimensionality of these features is determined by the data representation. Some researchers have used N-gram representation for biomedical named-entity recognition, where the most significant terms are represented as features. Others have used detailed-attribute representation, in which the features are generalized. As a result, identifying the combination of features that yields high-accuracy classification is challenging. This paper applies the Wrapper Subset Selection approach using two data representations: N-gram and detailed-attribute. Since each data representation suits a specific classification algorithm, two classifiers were used: Naïve Bayes (for detailed-attribute) and Support Vector Machine (for N-gram). The results show that feature selection using the detailed-attribute representation outperformed that of the N-gram representation, achieving an F-measure of 0.722. Besides the higher classification accuracy, the features selected using the detailed-attribute representation are more meaningful and can be applied to further datasets.},
   issn = {2088-5334},
   publisher = {INSIGHT - Indonesian Society for Knowledge and Human Development},
   url = {http://ijaseit.insightsociety.org/index.php?option=com_content&view=article&id=9&Itemid=1&article_id=6432},
   doi = {10.18517/ijaseit.8.5.6432}
}

EndNote

%A Alshaikhdeeb, Basel
%A Ahmad, Kamsuriah
%D 2018
%T Comparative Analysis of Different Data Representations for the Task of Chemical Compound Extraction
%P 2189-2195
%! Comparative Analysis of Different Data Representations for the Task of Chemical Compound Extraction
%K Chemical Compounds Extraction; Data Representation; N-gram; Detailed-Attributes; Naïve Bayes; Support Vector Machine; Attribute Selection
%X Chemical compound extraction refers to the task of recognizing chemical instances such as oxygen, nitrogen, and others. The majority of studies that have addressed this task used machine-learning techniques. The key challenge in using machine-learning techniques lies in employing a robust set of features. The literature shows that numerous types of features have been used for chemical compound extraction, and the dimensionality of these features is determined by the data representation. Some researchers have used N-gram representation for biomedical named-entity recognition, where the most significant terms are represented as features. Others have used detailed-attribute representation, in which the features are generalized. As a result, identifying the combination of features that yields high-accuracy classification is challenging. This paper applies the Wrapper Subset Selection approach using two data representations: N-gram and detailed-attribute. Since each data representation suits a specific classification algorithm, two classifiers were used: Naïve Bayes (for detailed-attribute) and Support Vector Machine (for N-gram). The results show that feature selection using the detailed-attribute representation outperformed that of the N-gram representation, achieving an F-measure of 0.722. Besides the higher classification accuracy, the features selected using the detailed-attribute representation are more meaningful and can be applied to further datasets.
%U http://ijaseit.insightsociety.org/index.php?option=com_content&view=article&id=9&Itemid=1&article_id=6432
%R doi:10.18517/ijaseit.8.5.6432
%J International Journal on Advanced Science, Engineering and Information Technology
%V 8
%N 5
%@ 2088-5334

IEEE

B. Alshaikhdeeb and K. Ahmad, "Comparative Analysis of Different Data Representations for the Task of Chemical Compound Extraction," International Journal on Advanced Science, Engineering and Information Technology, vol. 8, no. 5, pp. 2189-2195, 2018. [Online]. Available: http://dx.doi.org/10.18517/ijaseit.8.5.6432

RefMan/ProCite (RIS)

TY  - JOUR
AU  - Alshaikhdeeb, Basel
AU  - Ahmad, Kamsuriah
PY  - 2018
TI  - Comparative Analysis of Different Data Representations for the Task of Chemical Compound Extraction
JF  - International Journal on Advanced Science, Engineering and Information Technology
VL  - 8
IS  - 5
Y2  - 2018
SP  - 2189
EP  - 2195
SN  - 2088-5334
PB  - INSIGHT - Indonesian Society for Knowledge and Human Development
KW  - Chemical Compounds Extraction; Data Representation; N-gram; Detailed-Attributes; Naïve Bayes; Support Vector Machine; Attribute Selection
N2  - Chemical compound extraction refers to the task of recognizing chemical instances such as oxygen, nitrogen, and others. The majority of studies that have addressed this task used machine-learning techniques. The key challenge in using machine-learning techniques lies in employing a robust set of features. The literature shows that numerous types of features have been used for chemical compound extraction, and the dimensionality of these features is determined by the data representation. Some researchers have used N-gram representation for biomedical named-entity recognition, where the most significant terms are represented as features. Others have used detailed-attribute representation, in which the features are generalized. As a result, identifying the combination of features that yields high-accuracy classification is challenging. This paper applies the Wrapper Subset Selection approach using two data representations: N-gram and detailed-attribute. Since each data representation suits a specific classification algorithm, two classifiers were used: Naïve Bayes (for detailed-attribute) and Support Vector Machine (for N-gram). The results show that feature selection using the detailed-attribute representation outperformed that of the N-gram representation, achieving an F-measure of 0.722. Besides the higher classification accuracy, the features selected using the detailed-attribute representation are more meaningful and can be applied to further datasets.
UR  - http://ijaseit.insightsociety.org/index.php?option=com_content&view=article&id=9&Itemid=1&article_id=6432
DO  - 10.18517/ijaseit.8.5.6432
ER  - 

RefWorks

RT Journal Article
ID 6432
A1 Alshaikhdeeb, Basel
A1 Ahmad, Kamsuriah
T1 Comparative Analysis of Different Data Representations for the Task of Chemical Compound Extraction
JF International Journal on Advanced Science, Engineering and Information Technology
VO 8
IS 5
YR 2018
SP 2189
OP 2195
SN 2088-5334
PB INSIGHT - Indonesian Society for Knowledge and Human Development
K1 Chemical Compounds Extraction; Data Representation; N-gram; Detailed-Attributes; Naïve Bayes; Support Vector Machine; Attribute Selection
AB Chemical compound extraction refers to the task of recognizing chemical instances such as oxygen, nitrogen, and others. The majority of studies that have addressed this task used machine-learning techniques. The key challenge in using machine-learning techniques lies in employing a robust set of features. The literature shows that numerous types of features have been used for chemical compound extraction, and the dimensionality of these features is determined by the data representation. Some researchers have used N-gram representation for biomedical named-entity recognition, where the most significant terms are represented as features. Others have used detailed-attribute representation, in which the features are generalized. As a result, identifying the combination of features that yields high-accuracy classification is challenging. This paper applies the Wrapper Subset Selection approach using two data representations: N-gram and detailed-attribute. Since each data representation suits a specific classification algorithm, two classifiers were used: Naïve Bayes (for detailed-attribute) and Support Vector Machine (for N-gram). The results show that feature selection using the detailed-attribute representation outperformed that of the N-gram representation, achieving an F-measure of 0.722. Besides the higher classification accuracy, the features selected using the detailed-attribute representation are more meaningful and can be applied to further datasets.
LK http://ijaseit.insightsociety.org/index.php?option=com_content&view=article&id=9&Itemid=1&article_id=6432
DO 10.18517/ijaseit.8.5.6432