SYSTRAN's MT ARCHITECTURE



Overview

The general framework that SYSTRAN uses in all its MT systems has proven powerful and effective. Over its long history, many improvements have been made to the original design, resulting in a highly modular architecture.

Reusing existing modules, and applying similar methods consistently across languages where applicable, allows quick and efficient development of a functional prototype system for any new language pair.

SYSTRAN's architecture is also very flexible and allows the introduction of innovative methods. In fact, with every new language added to the SYSTRAN inventory, some new techniques have been tried in response to the particular challenges of that language. Such innovations are often later found to apply to other language-pair systems as well.

Methodology

SYSTRAN's methodology is a sentence-by-sentence approach: it concentrates first on individual words and their dictionary data, then on the parse of the sentence unit, and finally on the translation of the parsed sentence.
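The three stages described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy dictionary, the `Token` class, and the function names `lookup`, `parse`, `transfer`, and `translate_sentence` are hypothetical and do not reflect SYSTRAN's actual code or data formats. The "parse" here is only a stand-in that splits a sentence at its first verb.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical toy dictionary: source word -> (part of speech, target word).
TOY_DICTIONARY: Dict[str, Tuple[str, str]] = {
    "the": ("DET", "le"),
    "cat": ("NOUN", "chat"),
    "sleeps": ("VERB", "dort"),
}


@dataclass
class Token:
    text: str    # source word as it appeared (lowercased)
    pos: str     # part of speech from the dictionary, or "UNKNOWN"
    target: str  # target-language equivalent, or the source word if not found


def lookup(sentence: str) -> List[Token]:
    """Stage 1: attach dictionary data to each individual word."""
    tokens = []
    for word in sentence.lower().rstrip(".").split():
        pos, target = TOY_DICTIONARY.get(word, ("UNKNOWN", word))
        tokens.append(Token(word, pos, target))
    return tokens


def parse(tokens: List[Token]) -> Dict[str, List[Token]]:
    """Stage 2: stand-in parse that splits the sentence at the first verb."""
    verb_index = next(
        (i for i, tok in enumerate(tokens) if tok.pos == "VERB"), len(tokens)
    )
    return {"subject": tokens[:verb_index], "predicate": tokens[verb_index:]}


def transfer(parsed: Dict[str, List[Token]]) -> str:
    """Stage 3: produce target-language text from the parsed sentence."""
    ordered = parsed["subject"] + parsed["predicate"]
    return " ".join(tok.target for tok in ordered)


def translate_sentence(sentence: str) -> str:
    """Run the three stages in order on a single sentence."""
    return transfer(parse(lookup(sentence)))


if __name__ == "__main__":
    print(translate_sentence("The cat sleeps."))
```

A real system would of course use far richer dictionary data and a genuine parser; the point of the sketch is only the ordering of the stages, with each stage consuming the output of the previous one.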

Modularity

Three major groups make up the SYSTRAN architecture: Dictionary, Systems Software, and Linguistic Software. Each consists of a large number of modules, which all work together to create a fully automatic machine translation (MT) system.

Dictionary

SYSTRAN traditionally employs three distinct but interconnected types of dictionaries for the MT systems of all languages.

Systems Software

A body of systems software, consistent across the various SYSTRAN language pairs, handles formatting, character conversion, user interface, sentence and word boundary determination, dictionary and morphology lookup, and not-found word treatment. It controls the flow of linguistic modules and creates final formatted output. Also supported are a variety of tools for dictionary preparation, quality assurance, corpus manipulation, and parsing diagnostics.
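Three of the tasks listed above (sentence boundary determination, word boundary determination, and dictionary lookup with not-found word treatment) can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not SYSTRAN's implementation: the function names, the regular expressions, and the policy of simply collecting not-found words in a list are all hypothetical.

```python
import re
from typing import Dict, List, Tuple


def split_sentences(text: str) -> List[str]:
    """Naive sentence boundary rule: '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def split_words(sentence: str) -> List[str]:
    """Naive word boundary rule: runs of ASCII letters and apostrophes."""
    return re.findall(r"[A-Za-z']+", sentence)


def lookup_words(
    words: List[str], dictionary: Dict[str, str]
) -> Tuple[Dict[str, str], List[str]]:
    """Look up each word; collect words that have no dictionary entry."""
    found: Dict[str, str] = {}
    not_found: List[str] = []
    for word in words:
        entry = dictionary.get(word.lower())
        if entry is None:
            # Assumed policy: record the word so it can be handled separately.
            not_found.append(word)
        else:
            found[word] = entry
    return found, not_found


if __name__ == "__main__":
    toy_dictionary = {"hello": "bonjour"}  # hypothetical entries
    for sentence in split_sentences("Hello world. How are you?"):
        print(sentence, lookup_words(split_words(sentence), toy_dictionary))
```

Production boundary detection has to cope with abbreviations, numbers, and non-Latin scripts, which these naive rules ignore.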

Linguistic Software

Development of Additional Languages

Developing a new language-pair translation capability between languages for which SYSTRAN already has source and target modules is the easiest case. Only a new transfer module and the transfer/target dictionaries need to be created.

Developing an additional target language capability for an existing source system is feasible and quite economical because SYSTRAN systems are set up as "multi-target" systems. Adding another target language requires only a new transfer module and a new synthesis module, as well as building up the transfer/target dictionaries.

Developing an additional source language capability for an existing target system is more difficult if a completely new parser has to be created. However, if the new source language is closely related to one of the existing SYSTRAN source languages, the new parser can take advantage of rules common to a language family by building on existing "Trunk Parsers" (such as Romance Trunk, Slavic Trunk, ...).
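The three cases above amount to a simple rule for deciding which components a new language pair requires. The Python sketch below encodes that rule; the function name `modules_to_build`, its parameters, and the example language inventories are hypothetical and do not describe SYSTRAN's actual inventory. It also assumes that a transfer module and transfer/target dictionaries are needed for every new pair, including the new-source case, which the text above states only for the first two cases.

```python
from typing import List, Optional, Set


def modules_to_build(
    source: str,
    target: str,
    existing_sources: Set[str],
    existing_targets: Set[str],
    trunk: Optional[str] = None,
) -> List[str]:
    """List the components to develop for a new source->target pair."""
    # Assumption: every new pair needs its own transfer module and dictionaries.
    needed = ["transfer module", "transfer/target dictionaries"]

    # A target language not yet supported also needs a synthesis module.
    if target not in existing_targets:
        needed.append("synthesis module")

    # A source language not yet supported needs a parser, which may build on
    # an existing trunk parser when the language belongs to a covered family.
    if source not in existing_sources:
        if trunk is not None:
            needed.append(f"source parser (built on {trunk})")
        else:
            needed.append("source parser (new)")

    return needed


if __name__ == "__main__":
    # Illustrative inventories only.
    sources = {"English", "Russian"}
    targets = {"English", "French"}
    print(modules_to_build("Russian", "French", sources, targets))
    print(modules_to_build("English", "German", sources, targets))
    print(modules_to_build("Polish", "English", sources, targets, "Slavic Trunk"))
```

The output lists grow from the first call to the third, mirroring the increasing development effort described above.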



Copyright © 1995 - 2001 GY.com Inc. All Rights Reserved.