International Journal on Advanced Science, Engineering and Information Technology, Vol. 8 (2018) No. 4-2: Special Issue on Empowering the Nation via 4IR (The Fourth Industrial Revolution), pages: 1446-1452, Chief Editor: Khairuddin Omar | Editorial Board: Shahnorbanun Sahran Hassan, Nor Samsiah Sani, Heuiseok Lim & Danial Hoosyar, DOI:10.18517/ijaseit.8.4-2.6816

Hybrid Machine Translation with Multi-Source Encoder-Decoder Long Short-Term Memory in English-Malay Translation

Yin-Lai Yeong, Tien-Ping Tan, Keng Hoon Gan, Siti Khaotijah Mohammad

Abstract

Statistical Machine Translation (SMT) and Neural Machine Translation (NMT) are the state-of-the-art approaches in machine translation (MT). An SMT system produces translations based on the statistical analysis of text corpora, while an NMT system uses a deep neural network to model and generate a translation. SMT and NMT each have strengths and weaknesses. SMT may produce better translations than NMT when only a small parallel text corpus is available. Nevertheless, when a large amount of parallel text is available, the quality of the translation produced by NMT is often higher than that of SMT. In addition, studies have shown that SMT produces better translations than NMT when there is a domain mismatch between training and testing data. SMT also has an advantage on long sentences. Moreover, when a translation produced by NMT is wrong, it is very difficult to find the source of the error. In this paper, we investigate a hybrid approach that combines SMT and NMT to perform English to Malay translation. The motivation for using hybrid machine translation is to combine the strengths of both approaches to produce a more accurate translation. Our approach uses the multi-source encoder-decoder long short-term memory (LSTM) architecture. The architecture uses two encoders: one embeds the sentence to be translated, and the other embeds the initial translation produced by SMT. The translation from the SMT can be viewed as a “suggestion translation” to the neural MT. Our experiments show that the hybrid MT increases the BLEU scores of our best baseline machine translation in the computer science domain and the news domain from 21.21 and 48.35 to 35.97 and 61.81, respectively.
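
The two-encoder idea described in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the toy LSTM (without bias terms), the random untrained weights, the token ids, and the way the two final hidden states are merged (concatenation followed by a tanh projection) are all assumptions made for illustration, and the decoder itself is omitted. The names `LSTMEncoder`, `combine`, and `W_comb` are hypothetical.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

class LSTMEncoder:
    """Toy single-layer LSTM (no biases) that returns its final hidden state."""

    def __init__(self, vocab_size, hidden, rng):
        self.hidden = hidden
        self.embed = rand_matrix(vocab_size, hidden, rng)
        # One weight matrix per gate: input (i), forget (f), output (o),
        # and candidate cell (c). Each is applied to the concatenation [x; h].
        self.W = {g: rand_matrix(hidden, 2 * hidden, rng) for g in "ifoc"}

    def encode(self, token_ids):
        h = [0.0] * self.hidden
        c = [0.0] * self.hidden
        for t in token_ids:
            xh = self.embed[t] + h  # list concatenation [x; h]
            i = [sigmoid(z) for z in matvec(self.W["i"], xh)]
            f = [sigmoid(z) for z in matvec(self.W["f"], xh)]
            o = [sigmoid(z) for z in matvec(self.W["o"], xh)]
            g = [math.tanh(z) for z in matvec(self.W["c"], xh)]
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        return h

def combine(h_src, h_smt, W_comb):
    # Assumed merge step: project the concatenated encoder states to one
    # vector that would initialise the decoder.
    return [math.tanh(z) for z in matvec(W_comb, h_src + h_smt)]

rng = random.Random(0)
HIDDEN = 4
src_encoder = LSTMEncoder(vocab_size=10, hidden=HIDDEN, rng=rng)  # English source
smt_encoder = LSTMEncoder(vocab_size=10, hidden=HIDDEN, rng=rng)  # SMT suggestion
W_comb = rand_matrix(HIDDEN, 2 * HIDDEN, rng)

english_ids = [1, 4, 2]       # hypothetical ids for the English sentence
smt_malay_ids = [3, 5, 7, 2]  # hypothetical ids for the SMT "suggestion translation"

decoder_init = combine(
    src_encoder.encode(english_ids),
    smt_encoder.encode(smt_malay_ids),
    W_comb,
)
print(len(decoder_init))
```

In a trained system, a decoder LSTM would start from a state such as `decoder_init` and generate the Malay output token by token, so that both the original sentence and the SMT suggestion influence the final translation.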

Keywords:

Hybrid Machine Translation; Statistical Machine Translation; Neural Machine Translation

