International Journal on Advanced Science, Engineering and Information Technology, Vol. 7 (2017) No. 3, pages: 808-814, DOI:10.18517/ijaseit.7.3.2395

Representing Semantics of Text by Acquiring its Canonical Form

Mohammed Ahmed Taiye, Siti Sakira Kamaruddin, Farzana Kabir Ahmad

Abstract

Canonical form is the notion that related ideas should share the same meaning representation. It greatly simplifies processing, because a single meaning representation stands for a wide range of expressions. The central issue in text representation is to devise a formal approach for capturing the meaning, or semantics, of sentences. The difficulties include heterogeneity and inconsistency in text. Polysemous, synonymous and homonymous words, as well as morphemes, pose serious obstacles when trying to capture senses in sentences. Senses therefore need to be captured and represented in order to resolve vagueness and improve the understanding of senses in documents for knowledge creation purposes. We introduce a simple and straightforward method to capture the canonical form of sentences. The proposed method first identifies the canonical forms using the Word Sense Disambiguation (WSD) technique and then applies the First Order Predicate Logic (FOPL) scheme to represent the identified canonical forms. We adopted two WSD algorithms, Lesk and Selectional Preference Restriction, which concentrate mainly on disambiguating senses in words, phrases and sentences. We also adopted the FOPL scheme to analyse the predicate-argument structure of sentences, employing the consequence logic theorem to test for satisfiability, validity and completeness of the information in sentences.
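As an illustration only, the two stages can be sketched in Python as a simplified Lesk step followed by a predicate-style rendering. This is a minimal sketch under stated assumptions and not the authors' implementation: the sense inventory, the glosses, the sense identifiers, the stopword list and the helper names (`tokens`, `simplified_lesk`, `to_predicate`) are all hypothetical, and the Selectional Preference Restriction algorithm and the logical tests are not covered.

```python
# Toy sense inventory. The glosses and sense ids are hypothetical examples,
# not taken from the paper or from any real lexicon.
SENSES = {
    "bank": {
        "bank.financial": "an institution that accepts deposits and lends money",
        "bank.river": "sloping land beside a river or stream of water",
    },
}

# Small illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"a", "an", "the", "of", "or", "and", "that", "to", "in", "on", "at"}


def tokens(text):
    """Lowercase, split on whitespace, strip punctuation, drop stopwords."""
    words = (w.strip(".,;:!?").lower() for w in text.split())
    return {w for w in words if w and w not in STOPWORDS}


def simplified_lesk(word, sentence, inventory=SENSES):
    """Return the sense id whose gloss shares the most words with the sentence.

    Returns None when the word has no entry in the inventory.
    """
    context = tokens(sentence) - {word}
    best_sense, best_overlap = None, -1
    for sense_id, gloss in inventory.get(word, {}).items():
        overlap = len(context & tokens(gloss))
        if overlap > best_overlap:
            best_sense, best_overlap = sense_id, overlap
    return best_sense


def to_predicate(predicate, *args):
    """Render a predicate and its arguments as a FOPL-style atom string."""
    return f"{predicate}({', '.join(args)})"


if __name__ == "__main__":
    sentence = "She deposits money at the bank"
    sense = simplified_lesk("bank", sentence)
    print(to_predicate("deposit", "she", "money", sense))
```

In this sketch, sentences that use the same word in the same sense map to the same sense identifier, which is the property a canonical form is meant to provide. A realistic system would draw glosses from a lexical resource rather than a hand-written dictionary.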

Keywords:

Semantic; Natural Language Processing; Canonical Form; First Order Predicate Logic; Word Sense Disambiguation
