| J90-2002&J93-1006 | Research method/Workflow | Purple | Statistical translation method | takes a large corpus of text with aligned translations; statistical approach to translation; "A Statistical Approach to Machine Translation" |
| C90-3044&J93-1006 | Research method/Workflow | Purple | Proposes statistical method | The method proposed there requires a database to be maintained of the syntactic structures of sentences together with the structures of the corresponding translations. This paper proposes a method to solve this problem. |
| J93-1006&J93-2003 | Research method/Workflow | Purple | Sentence alignment | automatically obtaining pairs of aligned sentences from parallel corpora; "Text-Translation Alignment" |
| J93-2003&J99-4005 | Dataset | Green | Source-channel model parameter estimation | built a source-channel model of translation between English and French; "The Mathematics of Statistical Machine Translation: Parameter Estimation" |
| J99-4005&10.1162_089120103321337458 | Model | Red | Decoding problem is NP-complete | decoding problem for the IBM-4 translation model is provably NP-complete; this problem is NP-complete; "Decoding complexity in word-replacement translation models" |
| 10.1162_089120103321337458&10.1162_0891201042544884 | Research method/Workflow | Purple | Word reordering and beam search | similar algorithm; "Word reordering and a dynamic programming beam search algorithm for statistical machine translation" |
| 10.1162_0891201042544884&J07-2003 | Model | Red | Alignment template approach | alignment template translation model; phrase-based systems; A phrase-based statistical machine translation approach, the alignment template approach, is described; "The alignment template approach to statistical machine translation" |
| J07-2003&P08-1066 | Evaluation method | Pink | Hiero hierarchical phrase model | Hiero decoder does not require explicit syntactic representation; Hiero system achieved about 1 to 3 point improvement in BLEU; similar to the Hiero system; "Hierarchical Phrase-Based Translation" |
| P08-1066&E09-1044 | Model | Red | Target-side dependency language model | target dependency trees; target dependency language model during decoding; "A New String-to-Dependency Machine Translation Algorithm with a Target Dependency Language Model" |
| E09-1044&W11-2211 | Research method/Workflow | Purple | Shallow-rule pruning | using shallow rules; reductions in the size of the rule sets used in translation; modifications to hypothesis expansion in cube pruning |
| W11-2211&10.1186_1471-2105-14-146 | Research method/Workflow | Purple | Using automatically translated corpora | using a corpus of automatic translations in addition to human translations provides a (small) improvement of translation quality; We employ bitexts that have been built by automatically translating large amounts of monolingual data as additional parallel training corpora; Our results show that integrating a second translation model with only non-hierarchical phrases extracted from the automatically generated bitexts is a reasonable approach |
| 10.1186_1471-2105-14-146&10.1016_j.artmed.2014.01.004 | Research method/Workflow | Purple | Building MEDLINE parallel corpora | propose a method for obtaining in-domain parallel corpora from titles and abstracts of publications in the MEDLINE database; "Combining MEDLINE and publisher data to create parallel corpora for the automatic translation of biomedical text" |
| 10.1016_j.artmed.2014.01.004&W14-3328 | Research method/Workflow | Purple | Using web resources for data selection | use of web resources to complement the training resources; in-domain training and tuning; intelligent training data selection |
| W14-3328&W14-3302 | Research method/Workflow | Purple | WMT14 medical translation task | "WMT14 Medical Translation Task"; attended the medical summary sentence unconstrained translation task of the Ninth Workshop on Statistical Machine Translation (WMT 2014) |
| W14-3302&Q15-1013 | Dataset | Green | WMT 2014 evaluation | WMT 2014 shared translation task; "Findings of the 2014 Workshop on Statistical Machine Translation" |
| Q15-1013&D15-1248 | Model | Red | Relational dependency language model | relational dependency language model (RDLM); We propose a language model for dependency structures that is relational rather than configurational; as a feature function in string-to-tree SMT from English to German and Russian |
| D15-1248&000493806800162 | Model | Red | Joint morphological-syntactic model | syntax-based SMT results; improvements in translation quality of 1.4-1.8 BLEU; We propose to model syntactic and morphological structure jointly in a dependency translation model; "A Joint Dependency Model of Morphological and Syntactic Structure for Statistical Machine Translation" |
| 000493806800162&000493806800009 | Tool or software library | Blue | BPE subword segmentation | represent unseen words as sequences of subword units; follow the settings and training procedure described by; the network vocabulary size is 90 000; represent rare words via BPE; represent rare words (or morphemes in the case of Turkish) as character bigram sequences; segmentation based on the byte pair encoding compression algorithm; encoding rare and unknown words as sequences of subword units; simple character n-gram models; "Neural Machine Translation of Rare Words with Subword Units" |
| 000493806800009&D16-1162 | Evaluation method | Pink | BLEU score gains | gains in BLEU; gains made by the proposed method; state-of-the-art performance on several language pairs; we obtain substantial improvements on the WMT 15 task English <-> German (+2.8-3.7 BLEU); obtaining new state-of-the-art results |
| D16-1162&000485630703053 | Research method/Workflow | Purple | Low-frequency word translation errors | NMT is prone to generate words that seem to be natural in the target sentence, but do not reflect the original meaning of the source sentence. Neural machine translation (NMT) often makes mistakes in translating low-frequency content words that are essential to understanding the meaning of the sentence. |
| 000485630703053&000493984800064 | Background/General knowledge | Brown | Chinese-to-English performance gains | Chinese-to-English translation; Experimental results on Chinese-English translation show that the proposed approach achieves significant and consistent improvements over state-of-the-art NMT and SMT systems on multiple NIST test sets. |
| 000493984800064&000485488905004 | Research method/Workflow | Purple | Modeling source-side syntax | syntax information; "Modeling Source Syntax for Neural Machine Translation" |
| 000485488905004&000386658300285 | Dataset | Green | Multi-domain parallel corpora | discourse parallel corpus; 500 thousand Portuguese-English sentence pairs in various domains such as news, laws, microblogs etc.; 2 million Chinese-English; Experimental results on both Chinese-English |
| 000386658300285&W11-2132 | Research method/Workflow | Purple | SMT domain adaptation | approach; "Domain Adaptation for Statistical Machine Translation" |
| W11-2132&E12-1014 | Research method/Workflow | Purple | Using monolingual data | use an existing SMT system to discover parallel sentences within independent monolingual texts, and use them to re-train and enhance the system; "Investigations on Translation Model Adaptation Using Monolingual Data" |
| E12-1014&W13-2233 | Dataset | Green | Using comparable corpora | use comparable corpora to score an existing Spanish-English phrase table extracted from the Europarl corpus; compute both phrasal features and lexically smoothed features (using word alignments, like the Moses lexical translation probabilities); we use time-stamped web crawls as well as interlingually linked Wikipedia documents; the techniques proposed by and; We extend existing research on bilingual lexicon induction to estimate both lexical and phrasal translation probabilities for MT-scale phrase tables; we examine an idealization where a phrase table is given; We propose a novel algorithm to estimate reordering probabilities from monolingual data; "Toward Statistical Machine Translation without Parallel Corpora" |
| W13-2233&D14-1061 | Research method/Workflow | Purple | Low-resource lexicon construction | build a translation lexicon from non-parallel or comparable data; "Combining Bilingual and Comparable Corpora for Low Resource Machine Translation" |
| D14-1061&000493806800185 | Research method/Workflow | Purple | Joint alignment and decipherment | decipherment problem; "Beyond Parallel Data: Joint Word Alignment and Decipherment Improves Machine Translation" |
| 000493806800185&D16-1160 | Research method/Workflow | Purple | Semi-supervised learning | propose a similar semi-supervised framework; similar semi-supervised reconstruction method; "Semi-Supervised Learning for Neural Machine Translation" |
| D16-1160&D16-1249 | Research method/Workflow | Purple | Exploiting source-side monolingual data | achieve better results; monolingual corpora; "Exploiting Source-side Monolingual Data in Neural Machine Translation" |
| D16-1249&D16-1096 | Research method/Workflow | Purple | Supervised attention mechanism | supervised alignments; "Supervised Attentions for Neural Machine Translation" |
| D16-1096&000493984800177 | Model | Red | Coverage model | coverage model; "Coverage Embedding Models for Neural Machine Translation" |
| 000493984800177&D17-1012 | Model | Red | Introducing syntax into NMT | use of syntactic structures in NMT models; "Improved Neural Machine Translation with a Syntax-Aware Encoder and Decoder" |
| D17-1012&000493992300012 | Model | Red | Shared negative sampling | shared the negative samples of each target word in a sentence at training time |
| 000493992300012&W17-5708 | Model | Red | Integrating RNNG into NMT | modifying the encoder or the decoder; combining the recurrent neural network grammar into the attention-based neural machine translation |
| W17-5708&N18-2084 | Research method/Workflow | Purple | Pre-trained word embedding initialization | pre-training is properly integrated into the NMT system; using pre-trained embeddings in NMT; pre-trained word embeddings have been used either in standard translation systems; pre-training is useful; "A Bag of Useful Tricks for Practical Neural Machine Translation: Embedding Layer Initialization and Large Batch Size" |
| N18-2084&000865723400103 | Dataset | Green | Low-resource translation challenges | multilingual training for low-resource translation; The performance of Neural Machine Translation (NMT) systems often suffers in low-resource scenarios where sufficiently large-scale parallel corpora cannot be obtained. |
| 000865723400103&000900116903092 | Evaluation method | Pink | Rapid adaptation of NMT to new languages | same four languages as: Azerbaijani (Az), Belarusian (Be), Galician (Gl) and Slovak (Sk); trained many-to-one models from 58 languages into English; use tokenized BLEU in order to be comparable with recently published results on this dataset; best fine-tuned many-to-one models of; by training multilingual models; the 59-language TED Talks corpus; best many-to-one models in massively multilingual many-to-many models with up to 58 languages to and from English; improves over other adaptation methods by 1.7 BLEU points average over 4 LRL settings; jointly train on both a LRL of interest and a similar high-resourced language to prevent over-fitting; massively multilingual models, even without any explicit adaptation, are surprisingly effective; massively multilingual seed models achieving BLEU scores of up to 15.5; "Rapid Adaptation of Neural Machine Translation to New Languages" |
| 000900116903092&000736531900047 | Model | Red | Massively multilingual many-to-many models | multilingual translation; jointly train one translation model that translates multiple language directions at the same time; shares representations to improve the translation performance on low-resource languages; massively multilingual many-to-many models are effective in low-resource settings; training a single model that supports translation from multiple source languages into multiple target languages; "Massively Multilingual Neural Machine Translation" |
| 000736531900047&001181866502027 | Model | Red | mBART denoising pre-training | mBART fine-tunes with traditional on-the-fly backtranslation; pre-trains on a variety of language configurations; "Multilingual Denoising Pre-training for Neural Machine Translation" |
| 001181866502027&000570978201083 | Research method/Workflow | Purple | Unified probabilistic framework | generative modeling; our approach results in higher BLEU scores over state-of-the-art unsupervised models; We present a probabilistic framework for multilingual neural machine translation that encompasses supervised and unsupervised setups, focusing on unsupervised translation |
| 000570978201083&000663162000001 | Dataset | Green | Online backtranslation | online backtranslation to improve the performance of non-English pairs; multilingual translation models; factorize computation when translating to many languages and share information between similar languages; English-centric multilingual model on the OPUS-100 corpus; Using backtranslation thus requires the ability to translate in both directions; Mining data for each and every language pair is prohibitive; previous work circumvents this issue by focusing only on the 99 pairs that go through English; train on 100 directions; increasing model capacity; stronger modeling capacity; Experiments on OPUS-100 (a novel multilingual dataset with 100 languages); propose random online backtranslation; "Improving Massively Multilingual Neural Machine Translation and Zero-Shot Translation" |
| 000663162000001&000923421200003 | Evaluation method | Pink | Non-English-centric multilingual translation | multilingual machine translation; "Beyond English-Centric Multilingual Machine Translation" |
| 000923421200003&2023.acl-long.524 | Dataset | Green | FLORES-101 evaluation benchmark | FLORES-101 evaluation set; FLORES-101 evaluation data; "The Flores-101 Evaluation Benchmark for Low-Resource and Multilingual Machine Translation" |
| 2023.acl-long.524&2023.acl-long.859 | Research method/Workflow | Purple | Incidental bilingualism | present in the training data as a marker for multilingual content; attributing LLM MT capabilities to the presence of incidental bilingual examples; PaLM is exposed to over 30 million translation pairs across at least 44 languages; incidental bilingualism (the unintentional consumption of bilingual signals, including translation examples); "Searching for Needles in a Haystack: On the Role of Incidental Bilingualism in PaLM's Translation Capability" |
| 2023.acl-long.859&001371932502043 | Evaluation method | Pink | Prompting large models | large-scale models with parameters in the hundreds of billions; "Prompting PaLM for Translation: Assessing Strategies and Performance" |
| 001371932502043&001156229800010 | Model | Red | LLM translation capability and error analysis | LLMs can produce fluent and adequate translations, especially for high-resource English-centric language pairs, that are competitive with those of dedicated supervised translation models; translation errors, even severely critical ones, obtained via prompting an LLM are different from those produced by traditional machine translation models; a transformer decoder-only model trained solely with self-supervised learning is able to match specialized supervised state-of-the-art models |
| 001156229800010&2023.emnlp-main.733 | Research method/Workflow | Purple | LLM hallucination study | apply the minimal perturbations of, including misspelling and title-casing words, and inserting frequent tokens at the beginning of the source sentence; "Hallucinations in Large Multilingual Translation Models" |
| 2023.emnlp-main.733&2024.emnlp-main.802 | Research method/Workflow | Purple | MBR decoding and reranking | reference-based and QE metrics to rerank multiple hypotheses generated by dedicated MT models or large language models (LLMs), aiming to improve translation quality; the strategy to produce the final translation (instruction-based, quality-based reranking, and minimum Bayes risk (MBR) decoding); Our results show that MBR decoding is a very effective method |
| 2024.emnlp-main.802&2024.emnlp-main.1152 | Evaluation method | Pink | Evaluation metrics and human judgment | examining their correlation with human judgments, as well as their Precision, Recall, and F-score; evaluated MT metrics' ability to assess high-quality translations; "Can Automatic Metrics Assess High-Quality Translations?" |
| 2024.emnlp-main.1152&2024.emnlp-main.914 | Background/General knowledge | Brown | Interpretability of evaluation metrics | not easy to interpret; "Beyond Correlation: Interpretable Evaluation of Machine Translation Metrics" |
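Most rows above are annotation data, but the byte pair encoding segmentation cited in the 000493806800162&000493806800009 row ("Neural Machine Translation of Rare Words with Subword Units") is concrete enough to sketch. Below is a minimal BPE merge-learning sketch in Python following the standard algorithm: repeatedly merge the most frequent adjacent symbol pair. The toy vocabulary and the number of merges are illustrative assumptions, not values from any cited paper.

```python
import re
from collections import Counter

def get_pair_stats(vocab):
    """Count frequencies of adjacent symbol pairs across the vocabulary."""
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for i in range(len(symbols) - 1):
            pairs[(symbols[i], symbols[i + 1])] += freq
    return pairs

def merge_pair(pair, vocab):
    """Replace every occurrence of the symbol pair with its concatenation."""
    bigram = re.escape(' '.join(pair))
    pattern = re.compile(r'(?<!\S)' + bigram + r'(?!\S)')
    return {pattern.sub(''.join(pair), word): freq for word, freq in vocab.items()}

def learn_bpe(vocab, num_merges):
    """Learn up to num_merges BPE merge operations from a word-frequency dict."""
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_stats(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges, vocab

# Words are written as space-separated characters with an end-of-word marker.
vocab = {'l o w </w>': 5, 'l o w e r </w>': 2,
         'n e w e s t </w>': 6, 'w i d e s t </w>': 3}
merges, vocab = learn_bpe(vocab, 10)
```

After a handful of merges on this toy corpus, frequent words like "newest" and "low" collapse into single subword tokens, while rarer words remain split into smaller units, which is how the approach represents rare and unseen words as subword sequences.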