Vision of multilingual document engineering

Author: Chris Turner,


Multilingual documents are the result of a team of people collaborating, each contributing a particular skill or expertise. The collaborators share common documents and terminology databanks (termbanks), and most can read and write to both. Good communications and the CyTerm software support tools ensure that read and write access to the documents is coordinated to minimise conflicts and detect inconsistencies. The software provides automatic notification of changes in the documents so that new workflows can be initiated with minimal delay. The termbanks provide a resource which ensures that language-specific terms are mapped to the correct concepts unambiguously and consistently.

The team of collaborators includes individuals and members of different companies. It includes the monolingual source language document authors as well as the monolingual target language document adapters.

Documents change with time, and the CyTerm software tools ensure that small changes to the source documents result in proportionately small changes to the derived or dependent target documents, automating the implementation of derived document changes as much as possible. The software system has a memory so that information entered once does not need to be re-entered.

The actors, processes and supporting resources will now be described in an approximately linear flow but the real system will be iterative and some processes will be performed concurrently. The actors are described by their responsibilities and roles. There is not necessarily one person for each role and sometimes one person may fulfil several roles.

Concepts, Terms, and Mark-up

Terms are words in a particular language that are used to express a concept. The concept itself is ideally universal and independent of language. The Concept is an artificial device that is used to group several language-dependent Terms that all express the same concept. It is convenient to use this Concept device since it provides a place to store information (e.g. a picture, or a link to a related concept) that is common to all terms that express the concept, and it provides a pathway to navigate from a term in one language to a term in another language. Mark-up is annotation that has been added to a part of the text to provide extra clarification of the meaning of the main text. This mark-up is normally invisible to expert audiences that do not need to see it, but it can be made visible to less expert audiences, such as non-specialists and computer software, which would have difficulty decoding the text without this supplementary mark-up.
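The relationship between Concepts and Terms described above can be sketched as a small data structure. This is only an illustrative sketch, not the CyTerm design: the names Concept, Termbank, concepts_for and translate are hypothetical, and the sample entry is invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """Language-independent device that groups terms sharing one meaning."""
    concept_id: str
    # Information common to all terms, e.g. a picture or links to related concepts.
    shared_info: dict = field(default_factory=dict)
    # Language code -> list of terms that express this concept in that language.
    terms: dict = field(default_factory=dict)


class Termbank:
    """Hypothetical in-memory termbank keyed by concept id."""

    def __init__(self):
        self.concepts = {}

    def add_concept(self, concept):
        self.concepts[concept.concept_id] = concept

    def concepts_for(self, term, lang):
        """Return every concept that the term may express in the given language."""
        return [c for c in self.concepts.values() if term in c.terms.get(lang, [])]

    def translate(self, term, lang, target_lang):
        """Navigate term -> concept -> terms in another language."""
        return {
            c.concept_id: c.terms.get(target_lang, [])
            for c in self.concepts_for(term, lang)
        }


# Usage with an invented entry.
bank = Termbank()
bank.add_concept(Concept("C1", {"note": "metal"}, {"en": ["zinc"], "de": ["Zink"]}))
german_terms = bank.translate("zinc", "en", "de")
```

Routing every lookup through the Concept, rather than storing direct term-to-term pairs, keeps the shared information in one place and means a new language only has to be linked to concepts, not to every other language.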

The Original Source Author

The Original Source Author has a critically important role to play in the document engineering process. The source author is the highest authority on the message that he intends to convey to the readers and on the concepts he intends to convey when he uses a particular term, phrase, or sentence structure. His understanding of what is meant by a term or sentence might not be shared by others. When composing the source document he should ensure that every term and sentence that he uses is unambiguously mapped to a concept or concepts stored in a terminology databank. If he finds that a term is not in the termbank, he shall provide an entry and a definition. If he finds that a term maps to several different concepts in the termbank, he shall mark up his document to indicate which concept he intends. If he finds that a sentence can be interpreted ambiguously, he shall either rewrite the sentence to resolve the ambiguity or provide mark-up that resolves it.
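The author's term checks described above could be supported by a tool along the following lines. This is a minimal sketch under stated assumptions: the function name check_terms, the dictionary shapes, and the action strings are all hypothetical, and detecting ambiguous sentences is left out because it needs human judgement.

```python
def check_terms(used_terms, termbank, markup):
    """Report the action the source author still owes for each term.

    used_terms: terms found in the source document
    termbank:   term -> set of concept ids the term can express (assumed shape)
    markup:     term -> concept id the author has marked as intended (assumed shape)
    """
    actions = {}
    for term in used_terms:
        concepts = termbank.get(term, set())
        if not concepts:
            # Term is unknown to the termbank: the author must supply it.
            actions[term] = "add entry and definition"
        elif len(concepts) > 1 and markup.get(term) not in concepts:
            # Term is ambiguous and the author has not yet chosen a concept.
            actions[term] = "mark up intended concept"
    return actions


# Usage with invented data: "bath" is ambiguous, "flux" is missing.
termbank = {"bath": {"C1", "C2"}, "zinc": {"C3"}}
todo = check_terms(["bath", "zinc", "flux"], termbank, markup={})
```

Terms that map to exactly one concept produce no action, so the report shrinks to nothing once the document is fully mapped.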

If the Original Source Author cannot be persuaded to fulfil all these tasks, then a proxy for the original source author must be found. The Author Proxy will be a subject specialist and be monolingual. He will be capable of unambiguously decoding all the writing of the Original Source Author.

The Target Subject Specialist

The Target Subject Specialist knows the terms used for his specialist subject in his native (target) language. He is substantially monolingual but is able to read sufficiently well in a foreign source language to recognise a concept when it has been verbosely defined in that language. Note that he does not need to recognise specialist terms in the source language, nor does he need to be able to decode complex specialist language constructs in the source language. His role is to contribute terms, and definitions of those terms, to the termbank for his specialism and native language. He may work reactively, being notified of new concepts that have been added to the termbank and for which no term has yet been entered in his language.
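The reactive mode of working described above amounts to a query for concepts that have no term in the specialist's language. A minimal sketch follows, assuming the termbank is held as a mapping from concept id to per-language term lists; the function name and the data are hypothetical.

```python
def concepts_missing_term(termbank, lang):
    """Return ids of concepts that have no term entered for the given language.

    termbank: concept id -> {language code: list of terms} (assumed shape)
    """
    return sorted(
        concept_id
        for concept_id, terms_by_lang in termbank.items()
        if not terms_by_lang.get(lang)
    )


# Usage with invented data: concept C2 has no German term yet.
termbank = {
    "C1": {"en": ["zinc"], "de": ["Zink"]},
    "C2": {"en": ["flux"]},
}
pending_for_german = concepts_missing_term(termbank, "de")
```

A notification service could run such a query whenever a concept is added and send the resulting list to the specialists registered for that language.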

The Translator

The Translator is bilingual: he can read and understand texts in the source language and can accurately express them in the target (native) language for a range of specialist subjects. Note that the Translator does not need to disambiguate complex language constructs, since the Original Source Author (or his Author Proxy) has already done that by marking up the document. Neither does the Translator need to know the meaning of all terms in the source document, since the meaning can be looked up in the termbank. Neither does the Translator need to know the specialist target language term, since that can also be looked up in the termbank (it was placed there by the Target Subject Specialist).

The Target Document Adapter

The Target Document Adapter is monolingual and knows the needs and abilities of a target audience for a document. His role is to adapt the message of the author to suit a particular audience and purpose that could not be fully considered by the author. He only needs to be monolingual since an accurate rendering of the author's message has been provided by the Translator.

Changes to Documents

Where changes are made to a document, the entire process involving all collaborators can be repeated, treating only the changed part of the text plus sufficient locating context as input to the process. Software tools and the information provided by mark-up give sufficient means for locating and managing the edits required in all documents.
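The idea of re-running the process on only the changed text plus locating context can be sketched with Python's standard difflib module. This is an illustration under assumptions, not the CyTerm mechanism: the document is assumed to be already split into sentences, and the function name and the context parameter are hypothetical.

```python
import difflib


def changed_segments(old_sentences, new_sentences, context=1):
    """Return the changed parts of the new version with surrounding context.

    Each result is (change_type, sentences), where sentences holds the changed
    sentences plus up to `context` unchanged neighbours on each side so that
    collaborators can locate the change in their own documents.
    """
    matcher = difflib.SequenceMatcher(a=old_sentences, b=new_sentences, autojunk=False)
    work_items = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # Unchanged text does not re-enter the workflow.
        start = max(0, j1 - context)
        end = min(len(new_sentences), j2 + context)
        work_items.append((tag, new_sentences[start:end]))
    return work_items


# Usage with invented sentences: only the second sentence changed.
old = ["A.", "B.", "C.", "D."]
new = ["A.", "B changed.", "C.", "D."]
items = changed_segments(old, new)
```

Only the returned work items need to pass through the author, specialist, translator and adapter again, which is what keeps a small source change proportionately small downstream.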

Why this vision is likely to become reality

In the current translation market, an individual translator takes on all roles to a greater or lesser extent. Requiring one person to combine every ability makes it less likely that a customer can find an available individual translator who can satisfy all the roles. An example might be an Icelandic-to-Chinese specialist in hot metal galvanising machines. If such a specialist did exist, he might not be able to find enough regular work to support himself and would probably be doing something else. In contrast, there is a much better chance of finding the abilities in different persons. The reason that this has not happened on a large scale up to the present time is that the communication overhead between team members is not efficiently supported and there is no economic framework which allows all collaborators to benefit. I believe that communications, document mark-up and electronic trading technologies are now sufficiently advanced to make such collaborative work practically efficient. This efficiency will reduce the costs of the suppliers. The separation of skills will also increase the capacity of the suppliers. This gives such suppliers a competitive advantage which will lead to increasing market share.

Missing components

The following components are poorly implemented at present. The CyTerm project aims to provide good implementations.

Deliverables and funding

The CyTerm project will deliver software tools implementing the missing components listed above. Funding will be by forming a club and requiring a member subscription.

There will also be an on-line service for coordinating software and termbank updates and for providing a trading mechanism for member-to-member trade. This will be funded and managed by Cycom Limited.

The deliverables will be determined in consultation with the members but the infrastructure components already identified include:-

