Table 2 Main path based on hybrid weights

| Node ID | Node type | Node color | Summary | Knowledge unit |
| --- | --- | --- | --- | --- |
| J90-2002&J93-1006 | Research method/process | Purple | Statistical approach to translation | takes a large corpus of text with aligned translations; statistical approach to translation; A Statistical Approach to Machine Translation |
| C90-3044&J93-1006 | Research method/process | Purple | Proposes a statistical method | The method proposed there requires a database to be maintained of the syntactic structures of sentences together with the structures of the corresponding translations; This paper proposes a method to solve this problem |
| J93-1006&J93-2003 | Research method/process | Purple | Sentence alignment | automatically obtaining pairs of aligned sentences from parallel corpora; Text-Translation Alignment |
| J93-2003&J99-4005 | Dataset | Green | Source-channel model parameter estimation | built a source-channel model of translation between English and French; The Mathematics of Statistical Machine Translation: Parameter Estimation |
| J99-4005&10.1162_089120103321337458 | Model | Red | Decoding is NP-complete | decoding problem for the IBM-4 translation model is provably NP-complete; this problem is NP-complete; Decoding complexity in word-replacement translation models |
| 10.1162_089120103321337458&10.1162_0891201042544884 | Research method/process | Purple | Word reordering and beam search | similar algorithm; Word reordering and a dynamic programming beam search algorithm for statistical machine translation |
| 10.1162_0891201042544884&J07-2003 | Model | Red | Alignment template approach | alignment template translation model; phrase-based systems; A phrase-based statistical machine translation approach (the alignment template approach) is described; The alignment template approach to statistical machine translation |
| J07-2003&P08-1066 | Evaluation method | Pink | Hiero hierarchical phrase-based model | Hiero decoder does not require explicit syntactic representation; Hiero system achieved about 1 to 3 point improvement in BLEU; similar to the Hiero system; Hierarchical Phrase-Based Translation |
| P08-1066&E09-1044 | Model | Red | Target-side dependency language model | target dependency trees; target dependency language model during decoding; A New String-to-Dependency Machine Translation Algorithm with a Target Dependency Language Model |
| E09-1044&W11-2211 | Research method/process | Purple | Shallow-rule pruning | using shallow rules; reductions in the size of the rule sets used in translation; modifications to hypothesis expansion in cube pruning |
| W11-2211&10.1186_1471-2105-14-146 | Research method/process | Purple | Exploiting automatically translated corpora | using a corpus of automatic translations in addition to human translations provides a (small) improvement of translation quality; We employ bitexts that have been built by automatically translating large amounts of monolingual data as additional parallel training corpora; Our results show that integrating a second translation model with only non-hierarchical phrases extracted from the automatically generated bitexts is a reasonable approach |
| 10.1186_1471-2105-14-146&10.1016_j.artmed.2014.01.004 | Research method/process | Purple | Building MEDLINE parallel corpora | propose a method for obtaining in-domain parallel corpora from titles and abstracts of publications in the MEDLINE database; Combining MEDLINE and publisher data to create parallel corpora for the automatic translation of biomedical text |
| 10.1016_j.artmed.2014.01.004&W14-3328 | Research method/process | Purple | Selecting data from web resources | use of web resources to complement the training resources; in-domain training and tuning; intelligent training data selection |
| W14-3328&W14-3302 | Research method/process | Purple | WMT14 medical translation task | WMT14 Medical Translation Task; attend the medical summary sentence unconstrained translation task of the Ninth Workshop on Statistical Machine Translation (WMT 2014) |
| W14-3302&Q15-1013 | Dataset | Green | WMT 2014 shared task | WMT 2014 shared translation task; Findings of the 2014 Workshop on Statistical Machine Translation |
| Q15-1013&D15-1248 | Model | Red | Relational dependency language model | relational dependency language model (RDLM); We propose a language model for dependency structures that is relational rather than configurational; as a feature function in string-to-tree SMT from English to German and Russian |
| D15-1248&000493806800162 | Model | Red | Joint morphological and syntactic model | syntax-based SMT results; improvements in translation quality of 1.4-1.8 BLEU; We propose to model syntactic and morphological structure jointly in a dependency translation model; A Joint Dependency Model of Morphological and Syntactic Structure for Statistical Machine Translation |
| 000493806800162&000493806800009 | Tool/software library | Blue | BPE subword segmentation (see the BPE sketch after this table) | represent unseen words as sequences of subword units; The network vocabulary size is 90 000; represent rare words via BPE; follow the settings and training procedure described by; represent rare words (or morphemes in the case of Turkish) as character bigram sequences; segmentation based on the byte pair encoding compression algorithm; encoding rare and unknown words as sequences of subword units; simple character n-gram models; Neural Machine Translation of Rare Words with Subword Units |
| 000493806800009&D16-1162 | Evaluation method | Pink | BLEU score gains | gains in BLEU; gains made by the proposed method; state-of-the-art performance on several language pairs; we obtain substantial improvements on the WMT 15 task English <-> German (+2.8-3.7 BLEU), obtaining new state-of-the-art results |
| D16-1162&000485630703053 | Research method/process | Purple | Translation errors on low-frequency words | NMT is prone to generate words that seem to be natural in the target sentence, but do not reflect the original meaning of the source sentence; Neural machine translation (NMT) often makes mistakes in translating low-frequency content words that are essential to understanding the meaning of the sentence |
| 000485630703053&000493984800064 | Background/general knowledge | Brown | Gains on Chinese-to-English translation | Chinese-to-English translation; Experimental results on Chinese-English translation show that the proposed approach achieves significant and consistent improvements over state-of-the-art NMT and SMT systems on multiple NIST test sets |
| 000493984800064&000485488905004 | Research method/process | Purple | Modeling source-side syntax | syntax information; Modeling Source Syntax for Neural Machine Translation |
| 000485488905004&000386658300285 | Dataset | Green | Multi-domain parallel corpora | discourse parallel corpus; 500 thousand Portuguese-English sentence pairs in various domains such as news, laws, microblogs etc.; 2 million Chinese-English; experimental results on both Chinese-English |
| 000386658300285&W11-2132 | Research method/process | Purple | SMT domain adaptation | approach; Domain Adaptation for Statistical Machine Translation |
| W11-2132&E12-1014 | Research method/process | Purple | Exploiting monolingual data | use an existing SMT system to discover parallel sentences within independent monolingual texts, and use them to re-train and enhance the system; Investigations on Translation Model Adaptation Using Monolingual Data |
| E12-1014&W13-2233 | Dataset | Green | Exploiting comparable corpora | use comparable corpora to score an existing Spanish-English phrase table extracted from the Europarl corpus; compute both phrasal features and lexically smoothed features (using word alignments, like the Moses lexical translation probabilities); we use time-stamped web crawls as well as interlingually linked Wikipedia documents; the techniques proposed by and; We extend existing research on bilingual lexicon induction to estimate both lexical and phrasal translation probabilities for MT-scale phrase tables; we examine an idealization where a phrase table is given; We propose a novel algorithm to estimate reordering probabilities from monolingual data; Toward Statistical Machine Translation without Parallel Corpora |
| W13-2233&D14-1061 | Research method/process | Purple | Low-resource lexicon construction | build a translation lexicon from non-parallel or comparable data; Combining Bilingual and Comparable Corpora for Low Resource Machine Translation |
| D14-1061&000493806800185 | Research method/process | Purple | Joint word alignment and decipherment | decipherment problem; Beyond Parallel Data: Joint Word Alignment and Decipherment Improves Machine Translation |
| 000493806800185&D16-1160 | Research method/process | Purple | Semi-supervised learning | propose a similar semi-supervised framework; similar semi-supervised reconstruction method; Semi-Supervised Learning for Neural Machine Translation |
| D16-1160&D16-1249 | Research method/process | Purple | Exploiting source-side monolingual data | achieve better results; monolingual corpora; Exploiting Source-side Monolingual Data in Neural Machine Translation |
| D16-1249&D16-1096 | Research method/process | Purple | Supervised attention | supervised alignments; Supervised Attentions for Neural Machine Translation |
| D16-1096&000493984800177 | Model | Red | Coverage model | coverage model; Coverage Embedding Models for Neural Machine Translation |
| 000493984800177&D17-1012 | Model | Red | Introducing syntax into NMT | use of syntactic structures in NMT models; Improved Neural Machine Translation with a Syntax-Aware Encoder and Decoder |
| D17-1012&000493992300012 | Model | Red | Shared negative sampling | shared the negative samples of each target word in a sentence at training time |
| 000493992300012&W17-5708 | Model | Red | Fusing RNNG into NMT | modifying the encoder or the decoder; combining the recurrent neural network grammar into the attention-based neural machine translation |
| W17-5708&N18-2084 | Research method/process | Purple | Pre-trained embedding initialization | pre-training is properly integrated into the NMT system; using pre-trained embeddings in NMT; pre-trained word embeddings have been used either in standard translation systems; pre-training is useful; A Bag of Useful Tricks for Practical Neural Machine Translation: Embedding Layer Initialization and Large Batch Size |
| N18-2084&000865723400103 | Dataset | Green | Low-resource translation challenge | multilingual training for low-resource translation; The performance of Neural Machine Translation (NMT) systems often suffers in low-resource scenarios where sufficiently large-scale parallel corpora cannot be obtained |
| 000865723400103&000900116903092 | Evaluation method | Pink | Rapid adaptation of NMT to new languages | same four languages as - Azerbaijani (Az), Belarusian (Be), Galician (Gl) and Slovak (Sk); trained many-to-one models from 58 languages into English; use tokenized BLEU in order to be comparable with recently published results on this dataset; best fine-tuned many-to-one models of Rapid Adaptation of NMT to New Languages; by training multilingual models; the 59-language TED Talks corpus; best many-to-one models in massively multilingual many-to-many models with up to 58 languages to-and-from English; improves over other adaptation methods by 1.7 BLEU points on average over 4 LRL settings; jointly train on both a LRL of interest and a similar high-resourced language to prevent over-fitting; massively multilingual models, even without any explicit adaptation, are surprisingly effective; massively multilingual seed models achieving BLEU scores of up to 15.5; Rapid Adaptation of Neural Machine Translation to New Languages |
| 000900116903092&000736531900047 | Model | Red | Massively multilingual many-to-many models | multilingual translation; jointly train one translation model that translates multiple language directions at the same time; shares representations to improve the translation performance on low-resource languages; massively multilingual many-to-many models are effective in low-resource settings; training a single model that supports translation from multiple source languages into multiple target languages; Massively Multilingual Neural Machine Translation |
| 000736531900047&001181866502027 | Model | Red | mBART denoising pre-training | mBART fine-tunes with traditional on-the-fly backtranslation; pre-trains on a variety of language configurations; Multilingual Denoising Pre-training for Neural Machine Translation |
| 001181866502027&000570978201083 | Research method/process | Purple | Unified probabilistic framework | generative modeling; our approach results in higher BLEU scores over state-of-the-art unsupervised models; We present a probabilistic framework for multilingual neural machine translation that encompasses supervised and unsupervised setups, focusing on unsupervised translation |
| 000570978201083&000663162000001 | Dataset | Green | Online backtranslation (see the backtranslation sketch after this table) | online backtranslation to improve the performance of non-English pairs; multilingual translation models; factorize computation when translating to many languages and share information between similar languages; English-Centric multilingual model on the OPUS-100 corpus; using backtranslation thus requires the ability to translate in both directions; mining data for each and every language pair is prohibitive - previous work circumvents this issue by focusing only on the 99 pairs that go through English; train on 100 directions; increasing model capacity; stronger modeling capacity; experiments on OPUS-100 (a novel multilingual dataset with 100 languages); propose random online backtranslation; Improving Massively Multilingual Neural Machine Translation and Zero-Shot Translation |
| 000663162000001&000923421200003 | Evaluation method | Pink | Beyond English-centric multilingual translation | multilingual machine translation; Beyond English-Centric Multilingual Machine Translation |
| 000923421200003&2023.acl-long.524 | Dataset | Green | FLORES-101 evaluation benchmark | FLORES-101 evaluation set; FLORES-101 evaluation data; The Flores-101 Evaluation Benchmark for Low-Resource and Multilingual Machine Translation |
| 2023.acl-long.524&2023.acl-long.859 | Research method/process | Purple | Incidental bilingualism | present in the training data as a marker for multilingual content; attributing LLM MT capabilities to the presence of incidental bilingual examples; PaLM is exposed to over 30 million translation pairs across at least 44 languages; incidental bilingualism - the unintentional consumption of bilingual signals, including translation examples; Searching for Needles in a Haystack: On the Role of Incidental Bilingualism in PaLM's Translation Capability |
| 2023.acl-long.859&001371932502043 | Evaluation method | Pink | Prompting large language models | large-scale models with parameters in the hundreds of billions; Prompting PaLM for Translation: Assessing Strategies and Performance |
| 001371932502043&001156229800010 | Model | Red | LLM translation quality and error analysis | LLMs can produce fluent and adequate translations, especially for high-resource English-centric language pairs, that are competitive with those of dedicated supervised translation models; translation errors, even severely critical ones, obtained via prompting an LLM are different from those produced by traditional machine translation models; a transformer decoder-only model trained solely with self-supervised learning is able to match specialized supervised state-of-the-art models |
| 001156229800010&2023.emnlp-main.733 | Research method/process | Purple | Hallucinations in LLM translation | apply the minimal perturbations of , including misspelling and title-casing words, and inserting frequent tokens at the beginning of the source sentence; Hallucinations in Large Multilingual Translation Models |
| 2023.emnlp-main.733&2024.emnlp-main.802 | Research method/process | Purple | MBR decoding and reranking (see the MBR sketch after this table) | reference-based and QE metrics to rerank multiple hypotheses generated by dedicated MT models or large language models (LLMs), aiming to improve translation quality; the strategy to produce the final translation (instruction-based, quality-based reranking, and minimum Bayes risk (MBR) decoding); Our results show that MBR decoding is a very effective method |
| 2024.emnlp-main.802&2024.emnlp-main.1152 | Evaluation method | Pink | Metrics vs. human judgments | examining their correlation with human judgments, as well as their Precision, Recall, and F-score; evaluated MT metrics' ability to assess high-quality translations; Can Automatic Metrics Assess High-Quality Translations? |
| 2024.emnlp-main.1152&2024.emnlp-main.914 | Background/general knowledge | Brown | Interpretability of evaluation metrics | not easy to interpret; Beyond Correlation: Interpretable Evaluation of Machine Translation Metrics |
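
The byte pair encoding segmentation named in the 000493806800162&000493806800009 link works by repeatedly merging the most frequent adjacent symbol pair in a character-split vocabulary. Below is a minimal Python sketch of that merge-learning loop; `learn_bpe`, the toy corpus, and `num_merges` are illustrative, not the reference subword-nmt implementation.

```python
from collections import Counter

def learn_bpe(corpus, num_merges):
    """Learn BPE merge operations from a whitespace-tokenized corpus."""
    # Represent each word as a tuple of characters plus an end-of-word mark.
    vocab = Counter()
    for line in corpus:
        for word in line.split():
            vocab[tuple(word) + ("</w>",)] += 1

    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Apply the winning merge everywhere it occurs.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges

# Toy corpus: frequent sequences such as "low" and "est</w>" merge first.
print(learn_bpe(["low lower lowest", "new newer newest"], 10))
```

At test time the learned merges are replayed in order on any word, so rare and unseen words decompose into known subword units rather than a single unknown token.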
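The random online backtranslation proposed in the 000570978201083&000663162000001 link pairs target-side monolingual text with synthetic sources produced by the current model itself. The following is a hedged sketch of only the sampling loop; `translate`, `train_step`, the data layout, and all names are hypothetical stubs, not an actual multilingual NMT API.

```python
import random

def translate(model, sentence, src_lang, tgt_lang):
    """Stub: decode `sentence` from src_lang into tgt_lang with the current model."""
    return f"<{tgt_lang}> hypothesis for: {sentence}"

def train_step(model, src_sentence, tgt_sentence, tgt_lang):
    """Stub: one gradient update on a (source, target) training pair."""
    pass

def online_backtranslation(model, monolingual, languages, steps):
    """Sample a monolingual sentence, back-translate it into a randomly
    chosen other language with the current model, then train on the
    synthetic pair in the reverse direction."""
    for _ in range(steps):
        tgt_lang, sentence = random.choice(monolingual)
        src_lang = random.choice([l for l in languages if l != tgt_lang])
        synthetic_source = translate(model, sentence, tgt_lang, src_lang)
        train_step(model, synthetic_source, sentence, tgt_lang)

online_backtranslation(model=None,
                       monolingual=[("de", "ein Satz"), ("fr", "une phrase")],
                       languages=["en", "de", "fr"],
                       steps=4)
```

Because the source language is resampled at every step, the loop covers many non-English directions without mining parallel data for each pair.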
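Minimum Bayes risk decoding, named in the 2023.emnlp-main.733&2024.emnlp-main.802 link, selects from a pool of sampled hypotheses the one with the highest expected utility, using the other samples as pseudo-references. A minimal sketch follows, assuming a toy unigram-F1 utility; `token_f1`, `mbr_decode`, and the sample list are illustrative, and production systems use stronger neural metrics as the utility.

```python
from collections import Counter

def token_f1(hyp, ref):
    """Toy utility: unigram F1 between a hypothesis and a pseudo-reference."""
    h, r = Counter(hyp.split()), Counter(ref.split())
    overlap = sum((h & r).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(h.values())
    recall = overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)

def mbr_decode(candidates, utility=token_f1):
    """Return the candidate maximizing expected utility against the
    remaining candidates, which serve as pseudo-references."""
    if len(candidates) == 1:
        return candidates[0]
    def expected_utility(i):
        others = [r for j, r in enumerate(candidates) if j != i]
        return sum(utility(candidates[i], r) for r in others) / len(others)
    best = max(range(len(candidates)), key=expected_utility)
    return candidates[best]

samples = [
    "the cat sat on the mat",
    "the cat is sitting on the mat",
    "a cat sat on a mat",
]
print(mbr_decode(samples))  # picks the hypothesis closest on average to the others
```

Unlike reranking with a single model score, MBR rewards consensus: a hypothesis wins by being similar to many plausible translations rather than by its own probability alone.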