Fast and Accurate Preordering for SMT using Neural Networks
Adrià de Gispert  Gonzalo Iglesias  Bill Byrne
SDL Research
East Road, Cambridge CB1 1BH, U.K.
{agispert|giglesias|bbyrne}@sdl.com
Abstract
We propose the use of neural networks to model source-side preordering for faster and better statistical machine translation. The neural network trains a logistic regression model to predict whether two sibling nodes of the source-side parse tree should be swapped in order to obtain a more monotonic parallel corpus, based on samples extracted from the word-aligned parallel corpus. For multiple language pairs and domains, we show that this yields the best reordering performance against other state-of-the-art techniques, resulting in improved translation quality and very fast decoding.
1 Introduction
Preordering is a pre-processing task in translation that aims to reorder the source sentence so that it best resembles the order of the target sentence. If done correctly, it has a doubly beneficial effect: it allows a better estimation of word alignment and translation models, which results in higher translation quality for distant language pairs, and it speeds up decoding enormously, as less word movement is required.
Preordering schemes can be automatically learnt from source-side parsed, word-aligned parallel corpora. Recently, Jehl et al. (2014) described a scheme based on a feature-rich logistic regression model that predicts whether a pair of sibling nodes in the source-side dependency tree need to be permuted. Based on the node-pair swapping probability predictions of this model, a branch-and-bound search returns the best ordering of nodes in the tree.
We propose using a neural network (NN) to estimate this node-swapping probability. We find that this straightforward change to their scheme has multiple advantages:

1. The superior modeling capabilities of NNs achieve better performance at preordering and overall translation quality when using the same set of features.

2. There is no need to manually define which feature combinations are to be considered in training.

3. Preordering is even faster as a result of the previous point.

Our results in translating from English to Japanese, Korean, Chinese, Arabic and Hindi support these findings by comparing against two other preordering schemes.
1.1 Related Work
There is a strong research and commercial interest in preordering, as reflected by the extensive previous work on the subject (Collins et al., 2005; Xu et al., 2009; DeNero and Uszkoreit, 2011; Neubig et al., 2012). We are interested in practical, language-independent preordering approaches that rely only on automatic source-language parsers (Genzel, 2010). The most recent work in this area uses large-scale feature-rich discriminative models, effectively treating preordering either as a learning-to-rank (Yang et al., 2012), multi-classification (Lerner and Petrov, 2013) or logistic regression (Jehl et al., 2014) problem. In this paper we incorporate NNs into the latter approach.
Lately, an increasing body of work that uses NNs for various NLP tasks has been published, including language modeling (Bengio et al., 2003), POS tagging (Collobert et al., 2011), or dependency parsing (Chen and Manning, 2014). In translation, NNs have been used for improved word alignment (Yang et al., 2013; Tamura et al., 2014; Songyot and Chiang, 2014), to model reordering under an ITG grammar (Li et al., 2013), and to define additional feature functions to be used in decoding (Sundermeyer et al., 2014; Devlin et al., 2014). End-to-end translation systems based on NNs have also been proposed (Kalchbrenner and Blunsom, 2013; Sutskever et al., 2014; Bahdanau et al., 2014).
Despite the gains reported, only the approaches that do not dramatically affect decoding times can be directly applied to today's commercial SMT systems. Our paper is a step in this direction and, to the best of our knowledge, the first to describe the use of NNs in preordering for SMT.
2 Preordering as node-pair swapping
Jehl et al. (2014) describe a preordering scheme based on a logistic regression model that predicts whether a pair of sibling nodes in the source-side dependency tree need to be permuted in order to have a more monotonically-aligned parallel corpus. Their method can be briefly summarised by the pseudocode of Figure 1.
Let N be the set of nodes in the source tree, and let Cn be the set of children of node n. For each node with at least two children, first extract the node features (lines 1-2). Then, for each pair of its children nodes: extract their respective features (lines 4-5), produce all relevant feature combinations (line 6), and store the node-pair swapping probability predicted by a logistic regression model based on all available features (line 7). Once all pair-wise probabilities are stored, search for the best global permutation and sort Cn accordingly (lines 9-10).
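To make the search step concrete, the following minimal Python sketch scores candidate orderings of a node's children with the pairwise swap probabilities. It is a hypothetical stand-in, not the authors' implementation: exhaustive enumeration replaces the depth-first branch-and-bound search of Jehl et al. (2014), and p_swap is assumed to be precomputed.

import math
from itertools import permutations

def search_permutation(children, p_swap):
    """Hypothetical stand-in for SEARCHPERMUTATION.

    children: the children of one node, in source order.
    p_swap[(i, j)]: model probability that the source-ordered pair
    (i, j) should be swapped. Exhaustive search is used for clarity;
    Jehl et al. (2014) explore the same space with depth-first
    branch-and-bound.
    """
    def score(order):
        pos = {c: k for k, c in enumerate(order)}
        total = 0.0
        for a, i in enumerate(children):
            for j in children[a + 1:]:
                p = p_swap[(i, j)]
                # a pair contributes p if this ordering reverses it,
                # and (1 - p) if it keeps the original source order
                total += math.log(p) if pos[i] > pos[j] else math.log(1.0 - p)
        return total
    return max(permutations(children), key=score)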
As features, Jehl et al. (2014) use POS tags and dependency labels, as well as the identity and class of the head word (for the parent node) or the left/right-most word (for children nodes). These are combined into conjunctions of 2 or 3 features to create new features. For logistic regression, they train an L1-regularised linear model using LIBLINEAR (Fan et al., 2008). The training samples are labeled positive or negative depending on whether swapping the nodes reduces or increases the number of crossed links in the aligned parallel corpus.

PREORDERPARSETREE
 1  for each node n ∈ N, |Cn| > 1
 2      F ← GETFEATURES(n)
 3      for each pair of nodes i, j ∈ Cn, i ≠ j
 4          F ← F ∪ GETFEATURES(i)
 5          F ← F ∪ GETFEATURES(j)
 6          Fc ← FEATURECOMBINATIONS(F)
 7          pn(i, j) = LOGREGPREDICT(F, Fc)
 8      end for
 9      πn ← SEARCHPERMUTATION(pn)
10      SORT(Cn, πn)

Figure 1: Pseudocode for the preordering scheme of Jehl et al. (2014)
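Putting the pieces together, a minimal sketch of the procedure in Figure 1 might look as follows (shown for the NN variant described below, where line 6 is skipped). The helpers get_features and model.predict_swap_probability are hypothetical names, and search_permutation is the stand-in sketched earlier.

def preorder_parse_tree(nodes, children, model):
    """Minimal sketch of PREORDERPARSETREE (Figure 1).

    children[n]: the children of node n, in source order.
    get_features and model.predict_swap_probability are hypothetical.
    """
    for n in nodes:
        if len(children[n]) < 2:                # line 1: |Cn| > 1
            continue
        parent_feats = get_features(n)          # line 2
        p_swap = {}
        for a, i in enumerate(children[n]):
            for j in children[n][a + 1:]:       # lines 3-5: one direction per pair
                feats = parent_feats + get_features(i) + get_features(j)
                p_swap[(i, j)] = model.predict_swap_probability(feats)  # line 7
        # lines 9-10: search for the best permutation and reorder
        children[n] = list(search_permutation(children[n], p_swap))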
2.1 Applying Neural Networks
"A (feedforward) neural network is a series of logistic regression models stacked on top of each other, with the final layer being either another logistic regression model or a linear regression model" (Murphy, 2012).
Given this, we propose a straightforward alternative to the above framework: replace the linear logistic regression model by a neural network (NN). In this way, superior modeling of the node-swapping phenomenon can be expected. Additionally, feature combinations need not be engineered anymore, because they are learnt by the NN in training (line 6 in Figure 1 is skipped).
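As a concrete illustration of what is being skipped, the snippet below contrasts the explicit conjunction features a linear model requires with the raw indicators fed to the NN; the feature names are illustrative, not the exact templates of Jehl et al. (2014).

from itertools import combinations

def feature_combinations(feats):
    """Explicit 2-way conjunctions for the linear model: every pair of
    basic indicators becomes one new indicator, e.g. 'DEP:nsubj&POS:VB'."""
    return ['&'.join(pair) for pair in combinations(sorted(feats), 2)]

basic = ['POS:VB', 'DEP:nsubj', 'CLASS:17']         # illustrative indicators
linear_input = basic + feature_combinations(basic)  # what the linear LR consumes
nn_input = basic                                    # the NN learns interactions itself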
Training the neural network requires the same labeled samples that were used by Jehl et al. (2014). We use the NPLM toolkit out of the box (Vaswani et al., 2013). The architecture is a feed-forward neural network (Bengio et al., 2003) with four layers. The first layer i contains the input embeddings. The next two hidden layers (h1, h2) use rectified linear units; the last one is the softmax layer (o). We did not experiment with deeper NNs.
For our purposes, the input vocabulary of the NN is the set of all possible feature indicator names that are used for preordering.[1] There are no OOVs. Given the sequence of ~20 features seen by the preorderer, the NN is trained to predict whether the nodes should be reordered or not, i.e. |o| = 2. For the rest of the layers, we use |i| = 50, |h1| = 100, |h2| = 50. We set the learning rate to 1, the mini-batch size to 64 and the number of epochs to 20.

[1] Using a vocabulary of the 5K top-frequency English words, 50 word classes, approximately 40 POS tags and 50 dependency labels, the largest input vocabulary in our experiments is roughly 30,000.
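The forward pass of this architecture is small enough to sketch directly: each feature indicator selects an embedding row, the concatenation passes through the two rectified-linear hidden layers, and a 2-way softmax yields the swap probability. The shapes follow the layer sizes above, but the parameter values and names below are illustrative, not NPLM internals.

import numpy as np

N_FEATS, EMB, H1, H2, OUT = 20, 50, 100, 50, 2   # layer sizes from the text
VOCAB = 30000                                    # approximate input vocabulary
rng = np.random.default_rng(0)

# illustrative parameters; in practice these are learned with NPLM
E  = rng.normal(scale=0.1, size=(VOCAB, EMB))    # input embedding table
W1 = rng.normal(scale=0.1, size=(N_FEATS * EMB, H1)); b1 = np.zeros(H1)
W2 = rng.normal(scale=0.1, size=(H1, H2));            b2 = np.zeros(H2)
W3 = rng.normal(scale=0.1, size=(H2, OUT));           b3 = np.zeros(OUT)

def predict_swap_probability(feature_ids):
    """feature_ids: the N_FEATS feature-indicator indices for one node pair."""
    x  = E[feature_ids].reshape(-1)      # concatenate the selected embeddings
    h1 = np.maximum(0.0, x @ W1 + b1)    # rectified linear units
    h2 = np.maximum(0.0, h1 @ W2 + b2)
    z  = h2 @ W3 + b3
    z -= z.max()                         # numerically stable softmax
    p  = np.exp(z) / np.exp(z).sum()
    return p[1]                          # probability of the "swap" class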
3 Experiments
3.1 Data and setup
We report translation results from English into Japanese, Korean, Chinese, Arabic and Hindi. For each language pair, we use generic parallel data extracted from the web. The number of words is about 100M for Japanese and Korean, 300M for Chinese, 200M for Arabic and 9M for Hindi.
We use two types of dev/test sets: in-domain and mix-domain. The in-domain sets have 2K sentences each and were created by randomly extracting parallel sentences from the corpus, ensuring no repetitions remained. The mix-domain sets have about 1K sentences each and were created to evenly represent 10 different domains, including world news, chat/SMS, health, sport, science, business, and others.
Additionally, we report results on the English-to-Hindi WMT 2014 shared task (Bojar et al., 2014a) using the data provided.[2] The dev and test sets have 520 and 2507 sentences each. All dev and test sets have a single reference. We use SVMTool (Giménez and Màrquez, 2004) for POS tagging, and MaltParser (Nivre et al., 2007) for dependency parsing.

[2] HindEnCorp v0.5 (Bojar et al., 2014b)
3.2 Intrinsic reordering evaluation
We evaluate the intrinsic preordering task on a random 5K-sentence subset of the training data which is excluded from model estimation. We report the normalized crossing score c/s, where c is the number of crossing links (Genzel, 2010; Yang et al., 2012) in the aligned parallel corpus, and s is the number of source (e.g. English) words. Ideally we would like this metric to be zero, meaning a completely monotonic parallel corpus;[3] the more monotonic the corpus, the better the translation models will be and the faster decoding will run, as less distortion will be needed.

Normalizing over the number of source words allows us to compare this metric across language pairs, and so the potential impact of preordering on translation performance becomes apparent. See Figure 2 for results across several language pairs. In all cases our proposed NN-based preorderer achieves the lowest normalized crossing score among all preordering schemes.

[3] However, this may not be achievable given the alignment links and parse tree available. In this approach, only permutations of sibling nodes in the single source parse tree are permitted.

Figure 2: Normalized crossing score for English into Japanese, Korean, Hindi, Chinese, Arabic, Spanish and Portuguese.
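For reference, here is a sketch of the metric, assuming alignments are given as (source position, target position) index pairs per sentence: two links cross when their source order and target order disagree, and the corpus-level count is divided by the total number of source words.

def normalized_crossing_score(sentence_alignments, num_source_words):
    """Compute c/s for a corpus.

    sentence_alignments: one list of (source, target) index pairs per
    sentence pair. Two links (s1, t1) and (s2, t2) cross when s1 < s2
    but t1 > t2; a score of 0 means a completely monotonic corpus.
    """
    c = 0
    for links in sentence_alignments:
        links = sorted(links)
        c += sum(
            1
            for a in range(len(links))
            for b in range(a + 1, len(links))
            if links[a][0] < links[b][0] and links[a][1] > links[b][1]
        )
    return c / num_source_words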
3.3 Translation performance
For translation experiments, we use a phrase-based decoder that incorporates a set of standard features and a hierarchical reordering model (Galley and Manning, 2008). The decoder stack size is set to 1000. Weights are tuned using MERT to optimize BLEU on the dev set. For English-to-Japanese and English-to-Chinese we use character-based BLEU instead. To minimise optimization noise, we tune all our systems from flat parameters three times and report the average BLEU score and standard deviation on the test set.
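As a trivial illustration of the reporting protocol (the run scores below are placeholders, not results from the paper):

import statistics

bleu_runs = [55.4, 55.6, 55.8]        # hypothetical scores from 3 tuning runs
mean = statistics.mean(bleu_runs)     # 55.6
std  = statistics.stdev(bleu_runs)    # 0.2 (sample standard deviation)
print(f"{mean:.1f} ±{std:.1f}")       # reported as "55.6 ±0.2"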
Table 1 contrasts the performance obtained by the system when using no preordering capabilities (baseline), and when using three alternative preordering schemes: the rule-based approach of Genzel (2010), the linear-model logistic-regression approach of Jehl et al. (2014) and our NN-based preorderer. We report two baselines: one with distortion limit d = 10 and another with d = 3. For systems with preordering we only report d = 3, as increasing d does not improve performance.

d   system              speed   eng-jpn                  eng-kor
                        ratio   in          mixed        in          mixed
10  baseline            1x      54.5 ±0.2   26.2 ±0.2    33.5 ±0.3   9.7 ±0.2
3   baseline            3.2x    50.9 ±0.2   25.0 ±0.2    28.7 ±0.1   8.2 ±0.1
3   Genzel (2010)       2.7x    54.0 ±0.1   26.4 ±0.2    30.5 ±0.2   9.8 ±0.2
3   Jehl et al. (2014)  2.3x    55.0 ±0.1   26.9 ±0.2    33.1 ±0.1   10.4 ±0.1
3   this work           2.7x    55.6 ±0.2   27.2 ±0.1    33.4 ±0.1   10.6 ±0.2

d   system              eng-chi                  eng-ara                  eng-hin
                        in          mixed        in          mixed        mixed       wmt14
10  baseline            46.9 ±0.5   18.4 ±0.6    25.1 ±0.1   22.7 ±0.2    10.1 ±0.3   11.7 ±0.1
3   baseline            44.8 ±0.7   18.3 ±0.4    24.6 ±0.1   21.9 ±0.2    8.3 ±0.2    9.3 ±0.3
3   Genzel (2010)       45.4 ±0.2   17.9 ±0.2    24.8 ±0.1   21.6 ±0.3    9.6 ±0.2    11.4 ±0.3
3   Jehl et al. (2014)  45.8 ±0.1   18.5 ±0.3    25.1 ±0.2   22.4 ±0.2    10.0 ±0.1   12.7 ±0.3
3   this work           46.5 ±0.4   19.2 ±0.2    25.5 ±0.2   22.6 ±0.1    10.6 ±0.1   12.6 ±0.3
    best WMT14 constrained system                                                     11.1

Table 1: Translation performance for various language pairs using no preordering (baseline), and three alternative preordering systems. Average test BLEU score and standard deviation across 3 independent tuning runs. Speed ratio is calculated with respect to the speed of the slower baseline that uses d = 10. Stack size is 1000. For eng-jpn and eng-chi, character-based BLEU is used.
As shown, our preorderer obtains the best BLEU scores for all reported languages and domains, showing that the neural network models the dependency tree node-swapping phenomenon more accurately, and that the reductions in crossing score reported in the previous section have a positive impact on the final translation performance.
The bottom right-most column reports results on the WMT 2014 English-to-Hindi task. Our system achieves better results than the best score reported for this task.[4] In this case, the two logistic regression preorderers perform similarly, as the standard deviation is higher, possibly due to the small size of the dev set.

[4] Details at matrix.statmt.org/matrix/systems_list/1749
3.4 Decoding Speed
The main purpose of preordering is to achieve better translation performance under fast decoding conditions. In other words, by preordering the text we expect to be able to decode with less distortion or phrase reordering, resulting in faster decoding. This is shown in Table 1, which reports the speed ratio between each system and the speed of the top-most baseline, as measured in English-to-Japanese in-domain.[5] We find that decoding with d = 3 is about 3 times faster than with d = 10.

[5] Similar speed ratios were observed for other language pairs (not reported here for space).
We now take this further by reducing the stack size from 1000 to 50; see the results in Table 2. As expected, all systems accelerate with respect to our initial baseline. However, this usually comes at a cost in BLEU with respect to using a wider beam, unless preordering is used. In fact, the logistic regression preorderers achieve the same performance while decoding over 60 times faster than the baseline.
Interestingly, the NN-based preorderer turns out to be slightly faster than any of the other preordering approaches. This is because there is no need to explicitly create thousands of feature combinations for each node pair; simply performing the forward matrix multiplications given the input sequence of ~20 features is more efficient. Similar observations have been made recently in the context of dependency parsing with NNs (Chen and Manning, 2014). Note also that, on average, only 25 pair-wise probabilities are queried from the logistic regression model per source sentence. Overall, we incorporate the benefits of neural networks for preordering at no computational cost.

d   system              speed   eng-jpn                  eng-kor                  eng-chi
    (stack=50)          ratio   in          mixed        in          mixed        mixed
10  baseline            22x     53.6 (-0.9) 25.4 (-0.8)  32.8 (-0.7) 9.3 (-0.4)   17.9 (-0.5)
3   baseline            66x     50.5 (-0.4) 24.8 (-0.2)  28.8 (+0.1) 8.1 (-0.1)   18.0 (-0.3)
3   Genzel (2010)       64x     53.8 (-0.2) 26.3 (-0.1)  30.4 (-0.1) 9.8 (0.0)    18.1 (+0.2)
3   Jehl et al. (2014)  61x     55.0 (0.0)  26.5 (-0.4)  33.0 (-0.1) 10.4 (0.0)   18.3 (-0.2)
3   this work           65x     55.7 (+0.1) 27.2 (0.0)   33.2 (-0.2) 10.8 (+0.2)  19.1 (-0.1)

Table 2: Translation performance for a maximum stack size of 50. The figures in parentheses indicate the difference in BLEU score due to using the smaller stack size, that is, compared to the same systems in Table 1. Speed ratio is calculated with respect to the speed of the slower baseline that uses a stack of 1000, i.e. the first row in Table 1.
Currently, our preorderer takes 6.3% of the total decoding time (including 2.6% for parsing and 3.7% for actually preordering). We believe that further improvements in preordering will result in more translation gains and faster decoding as the distortion limit is lowered.
4 Conclusions
To the best of our knowledge, this is the first paper to describe the usage of NNs in preordering for SMT. We show that simply replacing the logistic regression node-swapping model with an NN model improves both crossing scores and translation performance across various language pairs. Feature combination engineering is avoided, which also results in even faster decoding times.
References
Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. CoRR, abs/1409.0473.

Yoshua Bengio, Réjean Ducharme, Pascal Vincent, and Christian Jauvin. 2003. A neural probabilistic language model. Journal of Machine Learning Research, 3:1137–1155.

Ondřej Bojar, Christian Buck, Christian Federmann, Barry Haddow, Philipp Koehn, Johannes Leveling, Christof Monz, Pavel Pecina, Matt Post, Herve Saint-Amand, Radu Soricut, Lucia Specia, and Aleš Tamchyna. 2014a. Findings of the 2014 workshop on statistical machine translation. In Proceedings of the Ninth Workshop on Statistical Machine Translation, pages 12–58.

Ondřej Bojar, Vojtěch Diatka, Pavel Rychlý, Pavel Straňák, Vít Suchomel, Aleš Tamchyna, and Daniel Zeman. 2014b. HindEnCorp - Hindi-English and Hindi-only corpus for machine translation. In Proceedings of LREC, pages 3550–3555.

Danqi Chen and Christopher Manning. 2014. A fast and accurate dependency parser using neural networks. In Proceedings of EMNLP, pages 740–750.

Michael Collins, Philipp Koehn, and Ivona Kucerova. 2005. Clause restructuring for statistical machine translation. In Proceedings of ACL, pages 531–540.

Ronan Collobert, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel Kuksa. 2011. Natural language processing (almost) from scratch. Journal of Machine Learning Research, 12:2493–2537.

John DeNero and Jakob Uszkoreit. 2011. Inducing sentence structure from parallel corpora for reordering. In Proceedings of EMNLP, pages 193–203.

Jacob Devlin, Rabih Zbib, Zhongqiang Huang, Thomas Lamar, Richard Schwartz, and John Makhoul. 2014. Fast and robust neural network joint models for statistical machine translation. In Proceedings of ACL, pages 1370–1380.

Rong-En Fan, Kai-Wei Chang, Cho-Jui Hsieh, Xiang-Rui Wang, and Chih-Jen Lin. 2008. LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research, 9:1871–1874.

Michel Galley and Christopher D. Manning. 2008. A simple and effective hierarchical phrase reordering model. In Proceedings of EMNLP, pages 847–855.

Dmitriy Genzel. 2010. Automatically learning source-side reordering rules for large scale machine translation. In Proceedings of COLING, pages 376–384.

Jesús Giménez and Lluís Màrquez. 2004. SVMTool: A general POS tagger generator based on Support Vector Machines. In Proceedings of LREC, pages 43–46.

Laura Jehl, Adrià de Gispert, Mark Hopkins, and Bill Byrne. 2014. Source-side preordering for translation using logistic regression and depth-first branch-and-bound search. In Proceedings of EACL, pages 239–248.

Nal Kalchbrenner and Phil Blunsom. 2013. Recurrent continuous translation models. In Proceedings of EMNLP, pages 1700–1709.

Uri Lerner and Slav Petrov. 2013. Source-side classifier preordering for machine translation. In Proceedings of EMNLP, pages 513–523.

Peng Li, Yang Liu, and Maosong Sun. 2013. Recursive autoencoders for ITG-based translation. In Proceedings of EMNLP, pages 567–577.

Kevin P. Murphy. 2012. Machine Learning: A Probabilistic Perspective. MIT Press, Cambridge, MA.

Graham Neubig, Taro Watanabe, and Shinsuke Mori. 2012. Inducing a discriminative parser to optimize machine translation reordering. In Proceedings of EMNLP-CoNLL, pages 843–853.

Joakim Nivre, Johan Hall, Jens Nilsson, Atanas Chanev, Gülşen Eryiğit, Sandra Kübler, Svetoslav Marinov, and Erwin Marsi. 2007. MaltParser: A language-independent system for data-driven dependency parsing. Natural Language Engineering, 13(2):95–135.

Theerawat Songyot and David Chiang. 2014. Improving word alignment using word similarity. In Proceedings of EMNLP, pages 1840–1845.

Martin Sundermeyer, Tamer Alkhouli, Joern Wuebker, and Hermann Ney. 2014. Translation modeling with bidirectional recurrent neural networks. In Proceedings of EMNLP, pages 14–25.

Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. Sequence to sequence learning with neural networks. CoRR, abs/1409.3215.

Akihiro Tamura, Taro Watanabe, and Eiichiro Sumita. 2014. Recurrent neural networks for word alignment model. In Proceedings of ACL, pages 1470–1480.

Ashish Vaswani, Yinggong Zhao, Victoria Fossum, and David Chiang. 2013. Decoding with large-scale neural language models improves translation. In Proceedings of EMNLP, pages 1387–1392.

Peng Xu, Jaeho Kang, Michael Ringgaard, and Franz Och. 2009. Using a dependency parser to improve SMT for subject-object-verb languages. In Proceedings of HLT-NAACL, pages 245–253.

Nan Yang, Mu Li, Dongdong Zhang, and Nenghai Yu. 2012. A ranking-based approach to word reordering for statistical machine translation. In Proceedings of ACL, pages 912–920.

Nan Yang, Shujie Liu, Mu Li, Ming Zhou, and Nenghai Yu. 2013. Word alignment modeling with context dependent deep neural network. In Proceedings of ACL, pages 166–175.

Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 

functions to be used in decoding (Sundermeyer et al., 2014; Devlin et al., 2014). End-to-end translation systems based on NNs have also been proposed (Kalchbrenner and Blunsom, 2013; Sutskever et al., 2014; Bahdanau et al., 2014).

Despite the gains reported, only the approaches that do not dramatically affect decoding times can be directly applied to today's commercial SMT systems. Our paper is a step in this direction and, to the best of our knowledge, it is the first to describe the usage of NNs in preordering for SMT.

2 Preordering as node-pair swapping

Jehl et al. (2014) describe a preordering scheme based on a logistic regression model that predicts whether a pair of sibling nodes in the source-side dependency tree need to be permuted in order to obtain a more monotonically-aligned parallel corpus. Their method can be briefly summarised by the pseudocode of Figure 1.

Let N be the set of nodes in the source tree, and let Cn be the set of children of node n. For each node with at least two children, first extract the node features (lines 1-2). Then, for each pair of its children: extract their respective features (lines 4-5), produce all relevant feature combinations (line 6), and store the node-pair swapping probability predicted by a logistic regression model based on all available features (line 7). Once all pair-wise probabilities are stored, search for the best global permutation and sort Cn accordingly (lines 9-10).

    PREORDERPARSETREE
     1  for each node n ∈ N, |Cn| > 1
     2      F ← GETFEATURES(n)
     3      for each pair of nodes i, j ∈ Cn, i ≠ j
     4          F ← F ∪ GETFEATURES(i)
     5          F ← F ∪ GETFEATURES(j)
     6          Fc ← FEATURECOMBINATIONS(F)
     7          pn(i, j) = LOGREGPREDICT(F, Fc)
     8      end for
     9      πn ← SEARCHPERMUTATION(pn)
    10      SORT(Cn, πn)

Figure 1: Pseudocode for the preordering scheme of Jehl et al. (2014).

As features, Jehl et al. (2014) use POS tags and dependency labels, as well as the identity and class of the head word (for the parent node) or the left/right-most word (for children nodes). These are combined into conjunctions of 2 or 3 features to create new features. For logistic regression, they train an L1-regularised linear model using LIBLINEAR (Fan et al., 2008). The training samples are labeled positive or negative depending on whether swapping the nodes reduces or increases the number of crossed links in the aligned parallel corpus.

2.1 Applying Neural Networks

"A (feedforward) neural network is a series of logistic regression models stacked on top of each other, with the final layer being either another logistic regression model or a linear regression model" (Murphy, 2012).

Given this, we propose a straightforward alternative to the above framework: replace the linear logistic regression model with a neural network (NN). In this way, superior modeling of the node-swapping phenomenon is to be expected. Additionally, feature combinations need not be engineered any more, because they are learnt by the NN in training (line 6 in Figure 1 is skipped).

Training the neural network requires the same labeled samples that were used by Jehl et al. (2014). We use the NPLM toolkit out-of-the-box (Vaswani et al., 2013).
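Putting the two previous paragraphs together, the modified preordering loop looks roughly as follows. This is a minimal Python sketch under our own naming: the Node class and get_features are simplified stand-ins for the real feature extractor, and an exhaustive permutation search stands in for the branch-and-bound of Jehl et al. (2014).

    import math
    from dataclasses import dataclass, field
    from itertools import permutations

    @dataclass
    class Node:
        pos: str                         # POS tag of the head word
        label: str                       # dependency label
        children: list = field(default_factory=list)

    def get_features(n):
        # illustrative only: the real feature set also includes head-word
        # identities/classes and left/right-most child words
        return ["pos=" + n.pos, "label=" + n.label]

    def perm_logprob(pi, p):
        # log-probability of child permutation pi under pairwise swap probs p
        s = 0.0
        for i in range(len(pi)):
            for j in range(i + 1, len(pi)):
                swapped = pi.index(i) > pi.index(j)
                s += math.log(p[i, j] if swapped else 1.0 - p[i, j])
        return s

    def preorder(node, swap_prob):
        # Figure 1, with LOGREGPREDICT replaced by a neural scorer; explicit
        # feature combinations (line 6) are no longer built
        for c in node.children:
            preorder(c, swap_prob)
        if len(node.children) < 2:
            return
        f = get_features(node)                                  # lines 1-2
        p = {(i, j): swap_prob(f + get_features(a) + get_features(b))
             for i, a in enumerate(node.children)               # lines 3-8
             for j, b in enumerate(node.children) if i != j}
        best = max(permutations(range(len(node.children))),     # line 9: exhaustive
                   key=lambda pi: perm_logprob(pi, p))          # search in place of
                                                                # branch-and-bound
        node.children = [node.children[k] for k in best]        # line 10

A neural swap_prob here is simply the softmax output of the network described next.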
The architecture is a feed-forward neural network (Bengio et al., 2003) with four layers. The first layer i contains the input embeddings. The next two hidden layers (h1, h2) use rectified linear units; the last one is the softmax layer (o). We did not experiment with deeper NNs.

For our purposes, the input vocabulary of the NN is the set of all possible feature indicator names that are used for preordering; there are no OOVs. (Using a vocabulary of the 5K top-frequency English words, 50 word classes, approximately 40 POS tags and 50 dependency labels, the largest input vocabulary in our experiments is roughly 30,000.) Given the sequence of ∼20 features seen by the preorderer, the NN is trained to predict whether the nodes should be reordered or not, i.e. |o| = 2. For the rest of the layers, we use |i| = 50, |h1| = 100, |h2| = 50. We set the learning rate to 1, the mini-batch size to 64 and the number of epochs to 20.
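To make the shapes concrete, the configuration above can be sketched as follows. This is an illustrative PyTorch stand-in of our own (the actual system uses NPLM, and all names here are ours): each sample is the sequence of ∼20 feature indicator ids, embedded, concatenated, and passed through the two ReLU layers to a binary softmax.

    import torch
    import torch.nn as nn

    class SwapClassifier(nn.Module):
        # embeddings -> ReLU(100) -> ReLU(50) -> softmax over {swap, no-swap}
        def __init__(self, vocab_size, n_feat=20, d_emb=50, d_h1=100, d_h2=50):
            super().__init__()
            self.emb = nn.Embedding(vocab_size, d_emb)        # layer i, |i| = 50
            self.ff = nn.Sequential(
                nn.Linear(n_feat * d_emb, d_h1), nn.ReLU(),   # h1, |h1| = 100
                nn.Linear(d_h1, d_h2), nn.ReLU(),             # h2, |h2| = 50
                nn.Linear(d_h2, 2))                           # o, |o| = 2

        def forward(self, x):          # x: (batch, n_feat) feature indicator ids
            return self.ff(self.emb(x).flatten(1))

    model = SwapClassifier(vocab_size=30000)
    optimizer = torch.optim.SGD(model.parameters(), lr=1.0)   # learning rate 1
    loss_fn = nn.CrossEntropyLoss()                           # applies the softmax
    # training: 20 epochs over mini-batches of 64 (x, y) samples:
    #   optimizer.zero_grad(); loss_fn(model(x), y).backward(); optimizer.step()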
3 Experiments

3.1 Data and setup

We report translation results from English into Japanese, Korean, Chinese, Arabic and Hindi. For each language pair, we use generic parallel data extracted from the web. The number of words is about 100M for Japanese and Korean, 300M for Chinese, 200M for Arabic and 9M for Hindi.

We use two types of dev/test sets: in-domain and mix-domain. The in-domain sets have 2K sentences each and were created by randomly extracting parallel sentences from the corpus, ensuring no repetitions remained. The mix-domain sets have about 1K sentences each and were created to evenly represent 10 different domains, including world news, chat/SMS, health, sport, science, business, and others.

Additionally, we report results on the English-to-Hindi WMT 2014 shared task (Bojar et al., 2014a) using the data provided, HindEnCorp v0.5 (Bojar et al., 2014b). The dev and test sets have 520 and 2507 sentences each. All dev and test sets have one single reference. We use SVMTool (Giménez and Màrquez, 2004) for POS tagging, and MaltParser (Nivre et al., 2007) for dependency parsing.

3.2 Intrinsic reordering evaluation

We evaluate the intrinsic preordering task on a random 5K-sentence subset of the training data which is excluded from model estimation. We report the normalized crossing score c/s, where c is the number of crossing links (Genzel, 2010; Yang et al., 2012) in the aligned parallel corpus, and s is the number of source (e.g. English) words. Ideally we would like this metric to be zero, meaning a completely monotonic parallel corpus (though this may not be achievable given the alignment links and parse tree available, since only permutations of sibling nodes in the single source parse tree are permitted). The more monotonic the corpus, the better the translation models will be and the faster decoding will run, as less distortion will be needed.

Normalizing over the number of source words allows us to compare this metric across language pairs, and so the potential impact of preordering on translation performance becomes apparent. See Figure 2 for results across several language pairs. In all cases our proposed NN-based preorderer achieves the lowest normalized crossing score among all preordering schemes.

[Figure 2: Normalized crossing score for English into Japanese, Korean, Hindi, Chinese, Arabic, Spanish and Portuguese.]
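For reference, one way to compute the crossing count c and the score c/s from word alignments (illustrative code of our own, counting a crossing whenever two alignment links disagree in order):

    def crossing_links(alignment):
        # alignment: list of (source_pos, target_pos) links for one sentence pair
        c = 0
        for a in range(len(alignment)):
            for b in range(a + 1, len(alignment)):
                (i1, j1), (i2, j2) = alignment[a], alignment[b]
                if (i1 - i2) * (j1 - j2) < 0:   # source/target orders disagree
                    c += 1
        return c

    def normalized_crossing_score(corpus):
        # corpus: list of (alignment, source_length) pairs; returns c/s
        c = sum(crossing_links(a) for a, _ in corpus)
        s = sum(n for _, n in corpus)
        return c / s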
3.3 Translation performance

For translation experiments, we use a phrase-based decoder that incorporates a set of standard features and a hierarchical reordering model (Galley and Manning, 2008). The decoder stack size is set to 1000. Weights are tuned using MERT to optimize BLEU on the dev set. In English-to-Japanese and English-to-Chinese we use character-BLEU instead. To minimise optimization noise, we tune all our systems from flat parameters three times and report the average BLEU score and standard deviation on the test set.

Table 1 contrasts the performance obtained by the system when using no preordering capabilities (baseline), and when using three alternative preordering schemes: the rule-based approach of Genzel (2010), the linear-model logistic-regression approach of Jehl et al. (2014) and our NN-based preorderer. We report two baselines: one with distortion limit d = 10 and another with d = 3. For systems with preordering we only report d = 3, as increasing d does not improve performance.
    d   system              speed   eng-jpn                eng-kor
                            ratio   in         mixed       in         mixed
    10  baseline            1x      54.5 ±0.2  26.2 ±0.2   33.5 ±0.3   9.7 ±0.2
    3   baseline            3.2x    50.9 ±0.2  25.0 ±0.2   28.7 ±0.1   8.2 ±0.1
    3   Genzel (2010)       2.7x    54.0 ±0.1  26.4 ±0.2   30.5 ±0.2   9.8 ±0.2
    3   Jehl et al. (2014)  2.3x    55.0 ±0.1  26.9 ±0.2   33.1 ±0.1  10.4 ±0.1
    3   this work           2.7x    55.6 ±0.2  27.2 ±0.1   33.4 ±0.1  10.6 ±0.2

    d   system              eng-chi                eng-ara                eng-hin
                            in         mixed       in         mixed       mixed      wmt14
    10  baseline            46.9 ±0.5  18.4 ±0.6   25.1 ±0.1  22.7 ±0.2   10.1 ±0.3  11.7 ±0.1
    3   baseline            44.8 ±0.7  18.3 ±0.4   24.6 ±0.1  21.9 ±0.2    8.3 ±0.2   9.3 ±0.3
    3   Genzel (2010)       45.4 ±0.2  17.9 ±0.2   24.8 ±0.1  21.6 ±0.3    9.6 ±0.2  11.4 ±0.3
    3   Jehl et al. (2014)  45.8 ±0.1  18.5 ±0.3   25.1 ±0.2  22.4 ±0.2   10.0 ±0.1  12.7 ±0.3
    3   this work           46.5 ±0.4  19.2 ±0.2   25.5 ±0.2  22.6 ±0.1   10.6 ±0.1  12.6 ±0.3
        best WMT14 constrained system                                               11.1

Table 1: Translation performance for various language pairs using no preordering (baseline), and three alternative preordering systems. Average test BLEU score and standard deviation across 3 independent tuning runs. Speed ratio is calculated with respect to the speed of the slower baseline that uses d = 10. Stack size is 1000. For eng-jpn and eng-chi, character-based BLEU is used.

As shown, our preorderer obtains the best BLEU scores for all reported languages and domains, proving that the neural network models the dependency-tree node-swapping phenomenon more accurately, and that the reductions in crossing score reported in the previous section have a positive impact on the final translation performance.

The bottom right-most column reports results on the WMT 2014 English-to-Hindi task. Our system achieves better results than the best score reported for this task (details at matrix.statmt.org/matrix/systems_list/1749). In this case, the two logistic regression preorderers perform similarly, as the standard deviation is higher, possibly due to the small size of the dev set.

3.4 Decoding Speed

The main purpose of preordering is to achieve better translation performance in fast decoding conditions. In other words, by preordering the text we expect to be able to decode with less distortion or phrase reordering, resulting in faster decoding. This is shown in Table 1, which reports the speed ratio between each system and the speed of the top-most baseline, as measured in English-to-Japanese in-domain (similar speed ratios were observed for other language pairs, not reported here for space). We find that decoding with d = 3 is about 3 times faster than with d = 10.

We now take this further by reducing the stack size from 1000 to 50; see results in Table 2.

    d   system              speed   eng-jpn                  eng-kor                  eng-chi
        (stack=50)          ratio   in          mixed        in          mixed        mixed
    10  baseline            22x     53.6 (-0.9) 25.4 (-0.8)  32.8 (-0.7)  9.3 (-0.4)  17.9 (-0.5)
    3   baseline            66x     50.5 (-0.4) 24.8 (-0.2)  28.8 (+0.1)  8.1 (-0.1)  18.0 (-0.3)
    3   Genzel (2010)       64x     53.8 (-0.2) 26.3 (-0.1)  30.4 (-0.1)  9.8 (0.0)   18.1 (+0.2)
    3   Jehl et al. (2014)  61x     55.0 (0.0)  26.5 (-0.4)  33.0 (-0.1) 10.4 (0.0)   18.3 (-0.2)
    3   this work           65x     55.7 (+0.1) 27.2 (0.0)   33.2 (-0.2) 10.8 (+0.2)  19.1 (-0.1)

Table 2: Translation performance for a maximum stack size of 50. The figures in parentheses indicate the difference in BLEU score due to using a smaller stack size, i.e. compared to the same systems in Table 1. Speed ratio is calculated with respect to the speed of the slower baseline that uses a stack of 1000, e.g. the first row in Table 1.

As expected, all systems accelerate with respect to our initial baseline. However, this usually comes at a cost in BLEU with respect to using a wider beam, unless preordering is used. In fact, the logistic regression preorderers achieve the same performance while decoding over 60 times faster than the baseline.

Interestingly, the NN-based preorderer turns out to be slightly faster than any of the other preordering approaches. This is because there is no need to explicitly create thousands of feature combinations for each node pair; simply performing the forward matrix multiplications given the input sequence of ∼20 features is more efficient. Similar observations have been noted recently in the context of dependency parsing with NNs (Chen and Manning, 2014). Note also that, on average, only 25 pair-wise probabilities are queried from the logistic regression model per source sentence.
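To see why, compare the work per node pair under the two models (an illustrative sketch of our own, using the ∼20-feature setting and the 2- and 3-way conjunctions from Section 2): the linear model must materialise every conjunction as its own indicator feature before scoring, while the NN only embeds the 20 atomic features and multiplies through fixed-size layers.

    from itertools import combinations

    def conjoined_features(atomic):
        # linear logistic regression: conjunctions of 2 or 3 atomic features
        # are materialised as explicit indicator strings before scoring
        out = list(atomic)
        for arity in (2, 3):
            out += ["^".join(c) for c in combinations(atomic, arity)]
        return out

    atomic = ["f%d" % k for k in range(20)]
    print(len(conjoined_features(atomic)))   # 20 + 190 + 1140 = 1350 features
    # the NN instead performs fixed-size matrix products, (20*50)x100,
    # 100x50 and 50x2, regardless of which conjunctions turn out to matter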
Overall, we incorporate the benefits of neural networks for preordering at no computational cost.
Currently, our preorderer takes 6.3% of the total decoding time (including 2.6% for parsing and 3.7% for actually preordering). We believe that further improvements in preordering will result in more translation gains and faster decoding, as the distortion limit is lowered.

4 Conclusions

To the best of our knowledge, this is the first paper to describe the usage of NNs in preordering for SMT. We show that simply replacing the logistic regression node-swapping model with an NN model improves both crossing scores and translation performance across various language pairs. Feature combination engineering is avoided, which also results in even faster decoding times.

References

Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. CoRR, abs/1409.0473.

Yoshua Bengio, Réjean Ducharme, Pascal Vincent, and Christian Jauvin. 2003. A neural probabilistic language model. Journal of Machine Learning Research, 3:1137–1155.

Ondřej Bojar, Christian Buck, Christian Federmann, Barry Haddow, Philipp Koehn, Johannes Leveling, Christof Monz, Pavel Pecina, Matt Post, Herve Saint-Amand, Radu Soricut, Lucia Specia, and Aleš Tamchyna. 2014a. Findings of the 2014 workshop on statistical machine translation. In Proceedings of the Ninth Workshop on Statistical Machine Translation, pages 12–58.

Ondřej Bojar, Vojtěch Diatka, Pavel Rychlý, Pavel Straňák, Vít Suchomel, Aleš Tamchyna, and Daniel Zeman. 2014b. HindEnCorp - Hindi-English and Hindi-only Corpus for Machine Translation. In Proceedings of LREC, pages 3550–3555.

Danqi Chen and Christopher Manning. 2014. A fast and accurate dependency parser using neural networks. In Proceedings of EMNLP, pages 740–750.

Michael Collins, Philipp Koehn, and Ivona Kucerova. 2005. Clause Restructuring for Statistical Machine Translation. In Proceedings of ACL, pages 531–540.

Ronan Collobert, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel Kuksa. 2011. Natural language processing (almost) from scratch. Journal of Machine Learning Research, 12:2493–2537.

John DeNero and Jakob Uszkoreit. 2011. Inducing Sentence Structure from Parallel Corpora for Reordering. In Proceedings of EMNLP, pages 193–203.

Jacob Devlin, Rabih Zbib, Zhongqiang Huang, Thomas Lamar, Richard Schwartz, and John Makhoul. 2014. Fast and robust neural network joint models for statistical machine translation. In Proceedings of ACL, pages 1370–1380.

Rong-En Fan, Kai-Wei Chang, Cho-Jui Hsieh, Xiang-Rui Wang, and Chih-Jen Lin. 2008. LIBLINEAR: A Library for Large Linear Classification. Journal of Machine Learning Research, 9:1871–1874.

Michel Galley and Christopher D. Manning. 2008. A Simple and Effective Hierarchical Phrase Reordering Model. In Proceedings of EMNLP, pages 847–855.
Dmitriy Genzel. 2010. Automatically learning source-side reordering rules for large scale machine translation. In Proceedings of COLING, pages 376–384.

Jesús Giménez and Lluís Màrquez. 2004. SVMTool: A general POS tagger generator based on Support Vector Machines. In Proceedings of LREC, pages 43–46.

Laura Jehl, Adrià de Gispert, Mark Hopkins, and Bill Byrne. 2014. Source-side preordering for translation using logistic regression and depth-first branch-and-bound search. In Proceedings of EACL, pages 239–248.
Nal Kalchbrenner and Phil Blunsom. 2013. Recurrent continuous translation models. In Proceedings of EMNLP, pages 1700–1709.

Uri Lerner and Slav Petrov. 2013. Source-Side Classifier Preordering for Machine Translation. In Proceedings of EMNLP, pages 513–523.

Peng Li, Yang Liu, and Maosong Sun. 2013. Recursive autoencoders for ITG-based translation. In Proceedings of EMNLP, pages 567–577.

Kevin P. Murphy. 2012. Machine Learning: A Probabilistic Perspective. MIT Press, Cambridge, MA.

Graham Neubig, Taro Watanabe, and Shinsuke Mori. 2012. Inducing a Discriminative Parser to Optimize Machine Translation Reordering. In Proceedings of EMNLP-CoNLL, pages 843–853.

Joakim Nivre, Johan Hall, Jens Nilsson, Atanas Chanev, Gülşen Eryiğit, Sandra Kübler, Svetoslav Marinov, and Erwin Marsi. 2007. MaltParser: A language-independent system for data-driven dependency parsing. Natural Language Engineering, 13(2):95–135.

Theerawat Songyot and David Chiang. 2014. Improving word alignment using word similarity. In Proceedings of EMNLP, pages 1840–1845.

Martin Sundermeyer, Tamer Alkhouli, Joern Wuebker, and Hermann Ney. 2014. Translation modeling with bidirectional recurrent neural networks. In Proceedings of EMNLP, pages 14–25.

Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. Sequence to sequence learning with neural networks. CoRR, abs/1409.3215.

Akihiro Tamura, Taro Watanabe, and Eiichiro Sumita. 2014. Recurrent neural networks for word alignment model. In Proceedings of ACL, pages 1470–1480.

Ashish Vaswani, Yinggong Zhao, Victoria Fossum, and David Chiang. 2013. Decoding with large-scale neural language models improves translation. In Proceedings of EMNLP, pages 1387–1392.

Peng Xu, Jaeho Kang, Michael Ringgaard, and Franz Och. 2009. Using a Dependency Parser to Improve SMT for Subject-Object-Verb Languages. In Proceedings of HLT-NAACL, pages 245–253.

Nan Yang, Mu Li, Dongdong Zhang, and Nenghai Yu. 2012. A ranking-based approach to word reordering for statistical machine translation. In Proceedings of ACL, pages 912–920.

Nan Yang, Shujie Liu, Mu Li, Ming Zhou, and Nenghai Yu. 2013. Word alignment modeling with context dependent deep neural network. In Proceedings of ACL, pages 166–175.