「2023.004|EN」Best Practices Series: Multi-Language Support for Technical Documents

Why Do Technical Documents Need to Support Multiple Languages?

  • Problem one: Most mainstream development languages and frameworks originated from Europe and the United States, so their initial documentation is mostly in English. As a developer, even if your native language is not English, reading English documents is gradually becoming a necessary skill. However, the reading efficiency of developers when reading the English version and the native language version varies from person to person. Fortunately, many volunteers usually emerge in these non-English countries to help translate the official English documents.
  • Problem two: Many people have personal blogs, especially technology personnel, they are willing to record the problems they encountered and solutions in their own blog. This is not only to summarize their experience, but also to share their knowledge with others. But due to the language used to write blog articles, people whose mother tongue is other languages may not be able to search or understand his articles. This limits the spread of articles and the sharing of knowledge, as well as the convenience of finding solutions to problems for others.

What Kind of Multi-Language Support Do We Want?

  • High translation quality: This will always be the most important
  • Easy to reference the original text
    • It is best to display the translation and original text on the same page
      • This makes it easy for us to view the original text to confirm when we find that a sentence is not translated properly
    • It is best not to change the original web page structure
  • Minimize manual intervention workload
    • As mentioned earlier, there are usually volunteers who help translate the official documents of the software, but the dilemma still exists is that due to lack of volunteers or limited time, the latest official documents have not been translated timely
    • Some multi-language support solutions maintain pages or websites in other languages, which may increase development and maintenance costs
  • Good feedback and improvement mechanism
    • Even ChatGPT, which currently has high translation quality, sometimes translates unsatisfactorily
    • When the translation quality of a word or a sentence is low, there should be a mechanism for manual intervention, modification, and feedback
    • The translator needs to be able to learn from user feedback on translation content and improve its translation ability

Current Mainstream Multi-language Support Ways

Google Translate

Google Translate can be directly integrated in the website:

Overall Evaluation (score 0-5):

Evaluation Score Remarks
High Translation Quality 2 The quality of Google Translate is not high at present stage
Convenient to reference original text 3 Need to switch through menu, or open two webpages
Low workload of human involvement 4 This way is the simplest and most primitive with lowest cost, and that's why it is the most common
Feedback and improvement mechanism 0.5 Current users have almost no way to feedback translation results

Multi-language Page Switching

Switching to other language versions of current page via language switching button, is actually a webpage with a different URL. This way is actually similar to integrating Google Translate in the website.

Overall Evaluation (score 0-5):

Criteria Score Remarks
High Translation Quality 4.5+ As in the example above, this type of official documentation should be translated with machine translation plus manual proofreading
Convenient to reference original text 3 Need to switch through menu, or open two webpages
Low workload of human involvement 1
  • As mentioned before, machine translation plus proofreading needs certain human involvement
  • In the example above, Python 3.12.0a7 is the latest official dev version, in which new content will be added and contents from previous version will be modified. Such contents cannot be translated timely due to limited number and strength of volunteers.
  • Also maintaining independent pages require certain workload
Feedback and improvement mechanism 1.5 For open source software like Python, its documentation is usually maintained as a separate project. If the user wants to modify the translation content, they usually need to feedback through issue tracker, refer to section Dealing with Bugs of official Python documentation.

Establish Different Language Websites

Some software documents of different languages are located in different domain names, and some are placed in different subdomains. Compared to the switching of multi-language pages above, this method has a higher workload for maintaining web servers.

Overall Evaluation (score 0-5):

Criteria Score Remark
High quality translation 4.5+ Similar to the situation mentioned in the last section
Easy reference to the original document 2? According to the example of the Angular official website, in a certain document page, you cannot switch directly to the corresponding page of other language websites (Angular.cn has its unique solution, which will be explained in detail later)
Less manual involvement 0.5 The workload is similar to that mentioned in the last section, but with additional costs for maintaining extra web servers
Feedback and improvement mechanism 1.5 Similar to the situation mentioned in the last section

Digression - Auxiliary Translation Tools for Reading Foreign Documents

The previous article was from the perspective of the developer of technical software documentation websites to discuss various optional technical solutions to support multiple languages.

On the other hand, as document readers, we always advocate reading the original foreign documents, and at this time we can use auxiliary translation tools to enhance our reading efficiency.

Although this is a digression considering the title of this article, it is still necessary to talk about it.

Google Translate in Chrome

Using built-in functions/plugins in the client (such as browsers) to call, its implementation, translation results and website-side introduction of Google translation are similar.

Word-Selection Translation

When I read the official documents in the original version, study Japanese or English, I often use word selection translation tools to look up words that I don't know.

Combining some word selection tools with Anki, you can directly make Anki cards after word selection query for word memorization. I will write an article specially introducing this powerful memory software - Anki in the future.

Online Dictionary Helper

Online Dictionary Helper has built-in many dictionaries and supports translations between various mainstream languages and English and Chinese. In addition to words, it also supports the word-selection translation of sentences, but the translation quality is not high. It can also word-select to directly make Anki cards.

Yomichan

Yomichan is mainly used for Japanese word queries and can download various dictionaries, but all definitions are in English. It does not support word-selection translation of sentences. It also can word-select to directly make an Anki card.

openai-translator

openai-translator uses the recently popular ChatGPT for translation, which has the following characteristics:

  • Advantages
    • Relatively high translation quality: because ChatGPT has learned a large amount of literature, documents, websites, etc., its translation accuracy is relatively high.
    • Certain feedback and improvement mechanisms: when the translation quality is poor, you can use the polishing button to request a re-translation, and usually you can get a better version.
  • Disadvantages
    • Privacy and security issues: this is an old problem, word selection translation will leak the content we are looking at to ChatGPT
    • Word-selection translation limits translation capabilities: when ChatGPT works as a translator, it can most powerfully increase the learning quality of the entire article based on analysis and learning of the context of the article. But word-selection translation is equivalent to only providing it with partial content. Although it can translate based on historical learning experience in the cloud, the effect is certainly not as good as if the entire document is provided.
    • Feedback and improvement mechanisms need to be improved: if official ChatGPT is used to translate certain text, and the translation quality is not high, we can feedback to it and give better translation suggestions, which helps improve its translation ability. However, openai-translator can only use the polishing button to let ChatGPT "re-translate", and cannot "tell it how to translate better".

Best Practices for Supporting Multiple Languages in Technical Documentation

Coming back to the topic, what is the best practice to make a technical software documentation website support multiple languages from the point of view of a website developer?

Based on the previous part, we can understand that it needs to meet the following requirements as much as possible:

  1. High quality of translation
  2. Convenient reference to original text
  3. Minimizing manual workload
  4. Good feedback and improvement mechanism

Let's talk about what existing products have solved these problems well.

The best practice of angular.cn

It is difficult to solve 2. Convenient reference to original text. In all kinds of translation schemes I have seen so far, only angular.cn has solved this problem ingeniously.

From the figure we can see that when you click a paragraph in the document of angular.cn, the original English text will be displayed, which is very convenient for comparison and reference. When the translation of Chinese is ambiguous, it is particularly useful.

Angular.jp does not implement this function, so I guess it is the inspiration of the maintainer of angular.cn. After checking the issue of the angular-cn document project, it is found that it is indeed the case, and the author also introduced the implementation idea. After viewing a source file of markdown, it is found that the original and translated versions are indeed as the author intended, written in one document.

The Hope Brought by ChatGPT

Angular.cn (https://angular.cn/docs) is a good way to solve the problem of 2. Conveniently Refer to the Original, and ChatGPT suddenly appeared, which can better solve the problems of 1. High Translation Quality and 3. Minimize Manual Intervention, but it has certain limitations in the point of 4. Good Feedback and Improvement Mechanism. We have already talked about this when introducing openai-translator.

My Opinion on the Best Practices

To achieve the best practices, the four points mentioned above need to be done well. No products that I have used can do them all at present. If you know, welcome to comment and share.

Currently, the overall more ideal option seems to be openai-translator, although it is a user-side plugin. However, as discussed earlier regarding the integration of Google Translate on the website and browser plugins, it can be seen that there is little difference in their technical implementation. The only difference is whether the external API translation call is made on the web server-side or on the user's browser-side.

Now, let's discuss my detailed views on the best practices. Here, we will no longer distinguish between server side and user side:

  1. High quality translation: Advanced tools such as ChatGPT API are called to translate.
  2. Easy reference of the original text: Combined with Google Translate and angular.cn way.
    • The strength of Google Translate is that it can accurately replace only the text in the web page without modifying the HTML elements or Javascript in the web page source code.
      • Google has the innate technology advantage of marking and extracting text from the web page, since it started as a search engine.
        • Google was originally a search engine, so there is strong crawler support behind it, which gives Google rich experience in extracting text from web pages.
        • Google developed the Chrome browser, whose core function is to do DOM parsing and rendering, which also gives Google rich experience in marking and extracting text from web pages.
      • If we want to achieve this effect ourselves, the technical difficulty cannot be ignored, fortunately Chromium is open source and can be referenced. In addition, I guess there should be other open source libraries specially designed to solve this problem.
    • The mouse clicking of angular.cn can switch between hiding and displaying the original text, which is very advanced.
    • How to combine the above two ways
      1. Use Chrome-like ability to parse DOM, extract text
      2. (Optional) Send all relevant webpages directly to ChatGPT for learning
        • It is necessary for improving the overall translation effect of ChatGPT, and the previous sections have discussed it.
        • Generally, the entire website can be sent to ChatGPT, so that it can read all the content on the website and prepare to translate contents on a specific page
      3. Extract the text of the current webpage and call ChatGPT for translation
      4. Copy the original DOM elements and add them before the original DOM elements, replacing the text with the translated text
      5. Use CSS to hide the original DOM elements
      6. Use CSS and Javascript to show the original DOM when user clicks on the translated DOM element
    • The above steps 1-6 can actually be completely done by ChatGPT, but it requires appropriate training for it.
      • As with Google, ChatGPT must also have powerful web page DOM parsing and text extraction abilities
  3. Minimize manual workload: If points 1, 2, 4 can be done well, this point will naturally become true.
  4. Good feedback and improvement mechanism: Make full use of ChatGPT’s learning and self-improvement capabilities. Here we continue the discussion of the technical implementation in “Combine two ways” in 2
    1. When a translation in the DOM element contains an inappropriate translation, a user should be able to select the relevant words and call out the ChatGPT dialog box to report the problem found, or even further tell it the correct translation, which will help improve it
    2. After we point out problems to ChatGPT and give it translation guidance, we need ChatGPT to re-translate and then update the relevant DOM elements with the new translations, wherein the optimized translated parts can be highlighted

Conclusion

The advent of ChatGPT has brought great convenience to developers and greatly changed the translation of documents and reading of foreign documents. However, because it can do far more than that, some people are worried that it will make some occupations unemployed. I think there is no need to worry at present. Familiarizing with it and making good use of it will only make us more powerful.

If one day AI can also think out the best practices and automatically implement them, then it may not just replace humans, but eliminate humanity altogether :)