Data Diversification: A Simple Strategy For Neural Machine Translation

Please use this identifier to cite or link to this item: https://astaroar.ripplewerkz.co/communities-collections/articles/16773

Title:

Data Diversification: A Simple Strategy For Neural Machine Translation

Journal Title:

Neural Information Processing Systems (NeurIPS)

Publication URL:

https://proceedings.neurips.cc/paper/2020/file/7221e5c8ec6b08ef6d3f9ff3ce6eb1d1-Paper.pdf

Authors:

Xuan-Phi Nguyen, Shafiq Joty, Kui Wu, Ai Ti Aw

Keywords:

Data Diversification, Data Augmentation, Neural Machine Translation

Publication Date:

01 December 2020

Abstract:

We introduce Data Diversification: a simple but effective strategy to boost neural machine translation (NMT) performance. It diversifies the training data by using the predictions of multiple forward and backward models and then merging them with the original dataset on which the final NMT model is trained. Our method is applicable to all NMT models. It does not require extra monolingual data like back-translation, nor does it add more computations and parameters like ensembles of models. Our method achieves state-of-the-art BLEU scores of 30.7 and 43.7 in the WMT'14 English-German and English-French translation tasks, respectively. It also substantially improves on 8 other translation tasks: 4 IWSLT tasks (English-German and English-French) and 4 low-resource translation tasks (English-Nepali and English-Sinhala). We demonstrate that our method is more effective than knowledge distillation and dual learning, it exhibits strong correlation with ensembles of models, and it trades perplexity off for better BLEU score.
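The merging step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `diversify`, `Trainer`, and `toy_train` are hypothetical, the number of model pairs `k` is an assumed parameter, and the "models" here are trivial lookup tables standing in for real NMT systems trained with different random seeds.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]                             # (source sentence, target sentence)
Translator = Callable[[str], str]                  # maps one sentence to a translation
Trainer = Callable[[List[Pair], int], Translator]  # hypothetical: (data, seed) -> model


def diversify(data: List[Pair], train: Trainer, k: int = 3) -> List[Pair]:
    """Merge the original pairs with predictions of k forward and k backward models."""
    reversed_data = [(tgt, src) for src, tgt in data]
    merged = list(data)                            # the original dataset is always kept
    for seed in range(k):
        forward = train(data, seed)                # source -> target model
        backward = train(reversed_data, seed)      # target -> source model
        # Forward model: real source paired with a predicted target.
        merged += [(src, forward(src)) for src, _ in data]
        # Backward model: predicted source paired with the real target.
        merged += [(backward(tgt), tgt) for _, tgt in data]
    return merged


def toy_train(pairs: List[Pair], seed: int) -> Translator:
    """Stand-in for NMT training (assumption): a lookup table tagged with the seed."""
    table = dict(pairs)
    return lambda sentence: f"{table.get(sentence, sentence)} <m{seed}>"


data = [("hello", "hallo"), ("thanks", "danke")]
augmented = diversify(data, toy_train, k=2)
print(len(augmented))  # 2 original + 2 seeds * (2 forward + 2 backward) = 10
```

The final model would then be trained on `augmented` in place of `data`. No monolingual data enters the loop, and only that single final model is used at inference time, which matches the abstract's contrast with back-translation and ensembling.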

Funding Info:

Xuan-Phi Nguyen is supported by the A*STAR Computing and Information Science (ACIS) scholarship, provided by the Agency for Science, Technology and Research (A*STAR), Singapore. Shafiq Joty would like to thank NRF (NRF2016IDM-TRANS001-062), Singapore, for its funding support.

URI:

https://astaroar.ripplewerkz.co/communities-collections/articles/16773

Collections:

Institute for Infocomm Research

Manuscripts in This Item:

File	Size	Format
data-diversification-a-simple-strategy-for-neural-machine-translation.pdf	282.51 KB	PDF