Academic research has paid considerable attention to statistical approaches to machine translation (MT), but in practice traditional rule-based machine translation (RBMT) architectures continue to dominate. The reasons lie in shortcomings on both sides.
To address this, the HYGHTRA project (A hybrid high quality translation system) proposed a hybrid architecture that combines the strengths of both approaches and minimises their weaknesses for high-quality MT. The undertaking was carried out as a collaborative project between a university translation studies centre and a language engineering company, both German.
Project members devised a methodology and created a set of computational tools and resources that rapidly integrate new languages and translation directions into a rule-based MT system using statistical MT techniques.
The work also led to a modular development infrastructure. This opened the way to a new range of products and services that the industrial partner has since offered to new markets beyond traditional MT users. These include modules for rich linguistic analysis and generation, terminology extraction and support for the collaborative translation process.
The team also explored novel uses of MT technology in areas such as language learning and translator training. They proposed a pedagogically grounded method and scenarios in which MT supports the learning process of advanced language learners.
The new methodologies are relevant to the induction of richly annotated dictionaries and grammars from large text collections, the extraction of databases of translation equivalents, the bootstrapping of electronic dictionaries and grammars for new closely related languages, and the statistical disambiguation of competing applications of parsing rules. Pedagogically motivated scenarios of using MT to generate negative linguistic evidence for advanced language learning and translator training were tested in the teaching of a university-level module, 'English for Translators'.
Other project activities include a series of workshops at leading international conferences on Computational Linguistics. These have brought together a community of MT researchers and industrial MT developers who are interested in hybrid approaches to MT.
HYGHTRA's major scientific contribution is the development of a new way of building hybrid MT systems by adding statistical techniques to an existing wide-coverage RBMT system. The core system architecture remains rule-based, but statistical methods support the rapid development of grammars and dictionaries for new translation directions. This helps to maintain the accuracy of the linguistic analysis while keeping a smaller computational footprint.