Dealing with out-of-vocabulary problem in sentence alignment using word similarity
Sentence alignment plays an essential role in building bilingual corpora,
which are valuable resources for many applications such as statistical
machine translation. Among the various approaches to sentence alignment,
length-and-word-based methods, which rely on sentence length and
word correspondences, have been shown to be the most effective.
Nevertheless, a drawback of using bilingual dictionaries trained by IBM
Models in length-and-word-based methods is the out-of-vocabulary (OOV)
problem. We propose using word similarity learned from
monolingual corpora to overcome this problem. Experimental results showed
that our method can reduce the OOV ratio and achieve better
performance than some other length-and-word-based methods. This implies
that using word similarity learned from monolingual data may help to
deal with the OOV problem in sentence alignment.
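
The general idea of backing off from an OOV word to a similar in-vocabulary word can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function names, the cosine-similarity measure, the threshold value, and the toy dictionary and vectors are all hypothetical, and the abstract does not specify how similarity is computed or applied.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def lookup_with_similarity(word, dictionary, vectors, threshold=0.5):
    """Return (translations, dictionary_word_used).

    If `word` is in the bilingual dictionary, use its entry directly.
    Otherwise (the OOV case) back off to the most similar dictionary
    word according to monolingual word vectors, provided the similarity
    exceeds `threshold` (a hypothetical cutoff).
    """
    if word in dictionary:
        return dictionary[word], word
    if word not in vectors:
        return [], None  # no vector available, so no back-off is possible

    best_word, best_score = None, threshold
    for candidate in dictionary:
        if candidate in vectors:
            score = cosine(vectors[word], vectors[candidate])
            if score > best_score:
                best_word, best_score = candidate, score

    if best_word is None:
        return [], None
    return dictionary[best_word], best_word


# Toy, made-up data for illustration only.
dictionary = {"car": ["xe"], "house": ["nhà"]}
vectors = {
    "car": [0.9, 0.1],
    "automobile": [0.8, 0.2],  # OOV in the dictionary, but has a vector
    "house": [0.1, 0.9],
}

translations, used = lookup_with_similarity("automobile", dictionary, vectors)
print(translations, used)
```

In this toy setup the OOV word "automobile" borrows the dictionary entry of "car", its nearest neighbour in the vector space, so an aligner could still count a word correspondence for it.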
Title: | Dealing with out-of-vocabulary problem in sentence alignment using word similarity |
Authors: | Trieu, H.-L.; Nguyen, L.-M.; Nguyen, P.-T. |
Keywords: | Monolingual data; Out-of-vocabulary; Sentence alignment; Word similarity |
Issue Date: | 2016 |
Publisher: | Institute for the Study of Language and Information |
Citation: | Scopus |
Description: | Proceedings of the 30th Pacific Asia Conference on Language, Information and Computation, PACLIC 2016, 2016, Pages 259-266 |
URI: | http://repository.vnu.edu.vn/handle/VNU_123/29802 |
ISBN: | 978-896817428-5 |
Appears in Collections: | VNU Hanoi (ĐHQGHN) articles in Scopus |