Because low-resource NMT (LNMT) is trained on small-scale data, many structures in the test sets are not covered by the training data. We call these Unknown Structures and study how they degrade LNMT performance. Experiments show that once these Unknown Structures become Known Structures, LNMT performs better than or comparably to high-resource NMT. We propose an efficient algorithm to improve LNMT performance: instead of collecting large-scale parallel sentences, we mine a reasonable number of parallel phrases.
The code and dataset will be released soon.
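
To make the notion of Unknown Structures concrete, the following minimal sketch counts test-set structures that never occur in the training data. The abstract does not state how structures are identified, so this sketch assumes, purely for illustration, that a structure is a whitespace-tokenized word n-gram; the function names (`ngrams`, `collect_ngrams`, `unknown_structures`, `unknown_rate`) are hypothetical and are not taken from the authors' code.

```python
from typing import Iterable, Set, Tuple

Ngram = Tuple[str, ...]


def ngrams(sentence: str, n: int) -> Set[Ngram]:
    """Return the set of word n-grams in one whitespace-tokenized sentence."""
    tokens = sentence.split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def collect_ngrams(sentences: Iterable[str], n: int) -> Set[Ngram]:
    """Union of the n-grams of every sentence in a corpus."""
    seen: Set[Ngram] = set()
    for sentence in sentences:
        seen |= ngrams(sentence, n)
    return seen


def unknown_structures(train: Iterable[str], test: Iterable[str], n: int = 2) -> Set[Ngram]:
    """Test n-grams absent from training (an assumed proxy for Unknown Structures)."""
    return collect_ngrams(test, n) - collect_ngrams(train, n)


def unknown_rate(train: Iterable[str], test: Iterable[str], n: int = 2) -> float:
    """Fraction of distinct test n-grams that the training data does not cover."""
    test_ngrams = collect_ngrams(test, n)
    if not test_ngrams:
        return 0.0
    return len(test_ngrams - collect_ngrams(train, n)) / len(test_ngrams)


# Toy corpora, invented for illustration only.
train_corpus = ["the cat sat", "a dog ran"]
test_corpus = ["the cat ran"]
unknown = unknown_structures(train_corpus, test_corpus, n=2)
rate = unknown_rate(train_corpus, test_corpus, n=2)
```

Under this assumption, the n-grams returned by `unknown_structures` would be the natural candidates for parallel phrase mining: once translations of those phrases are added to the training data, they are covered and no longer count as unknown.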