「2023.004|EN」Best Practices Series: Multi-Language Support for Technical Documents
Why Do Technical Documents Need to Support Multiple Languages?
- Problem one: Most mainstream programming languages and frameworks originated in Europe and the United States, so their initial documentation is mostly in English. For developers, even those whose native language is not English, reading English documentation is gradually becoming a necessary skill. However, how efficiently a developer reads the English version compared with a native-language version varies from person to person. Fortunately, volunteers in non-English-speaking countries often step up to help translate the official English documentation.
- Problem two: Many people, especially those working in technology, keep personal blogs where they record the problems they have encountered and the solutions they found. This serves both to summarize their own experience and to share knowledge with others. But because of the language a blog is written in, people with a different mother tongue may be unable to find or understand its articles. This limits the spread of the articles and the sharing of knowledge, and makes it harder for others to find solutions to their problems.
What Kind of Multi-Language Support Do We Want?
- High translation quality: This will always be the most important requirement
- Easy to reference the original text
- It is best to display the translation and original text on the same page
- This makes it easy to check the original text when we find that a sentence has not been translated properly
- It is best not to change the original web page structure
- Minimize manual intervention workload
- As mentioned earlier, volunteers usually help translate a software project's official documentation, but a dilemma remains: because of a shortage of volunteers or limited time, the latest official documentation is often not translated in time
- Some multi-language support solutions maintain separate pages or websites for other languages, which may increase development and maintenance costs
- Good feedback and improvement mechanism
- Even ChatGPT, whose translation quality is currently high, sometimes produces unsatisfactory translations
- When a word or a sentence is translated poorly, there should be a mechanism for manual intervention, correction, and feedback
- The translator needs to be able to learn from user feedback on translation content and improve its translation ability
Current Mainstream Approaches to Multi-Language Support
Google Translate
Google Translate can be integrated directly into a website:
![website multiple language support using Google Translate](/zh-CN/post/Best-Practices-Series-Multiple-Language-Support-for-Technical-Documentation/1.gif)
Overall Evaluation (score 0-5):
Criteria | Score | Remarks |
---|---|---|
High Translation Quality | 2 | The quality of Google Translate is not high at present |
Convenient to reference original text | 3 | Requires switching through a menu or opening two web pages |
Low workload of human involvement | 4 | This is the simplest, most basic, and cheapest approach, which is why it is the most common |
Feedback and improvement mechanism | 0.5 | Users currently have almost no way to give feedback on translation results |
Multi-language Page Switching
Switching to another language version of the current page through a language-switch button actually opens a web page with a different URL. This approach is similar to integrating Google Translate into the website.
![different language on different page](/zh-CN/post/Best-Practices-Series-Multiple-Language-Support-for-Technical-Documentation/2.gif)
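As a rough illustration of what such a language-switch button does, the sketch below maps a page URL to its counterpart in another language. It assumes the locale is the first path segment of the URL; `SUPPORTED_LOCALES`, `switch_language`, and the example domain are hypothetical names chosen for this sketch, not taken from any particular site.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical set of locales the site publishes (an assumption for this sketch).
SUPPORTED_LOCALES = {"en", "zh-cn", "ja"}


def switch_language(url: str, target: str) -> str:
    """Return the URL of the same page in another language.

    Assumes the locale is the first path segment, e.g. /zh-cn/guide/intro.html.
    """
    if target not in SUPPORTED_LOCALES:
        raise ValueError(f"unsupported locale: {target}")
    parts = urlsplit(url)
    segments = parts.path.split("/")  # "/en/a/b" -> ["", "en", "a", "b"]
    if len(segments) > 1 and segments[1].lower() in SUPPORTED_LOCALES:
        segments[1] = target  # replace the existing locale prefix
    else:
        segments.insert(1, target)  # no locale prefix yet: add one
    return urlunsplit(parts._replace(path="/".join(segments)))
```

For example, `switch_language("https://docs.example.org/en/guide/intro.html", "zh-cn")` points at the same page under the `/zh-cn/` prefix. Each language still needs its own translated page behind that URL, which is where the manual workload comes from.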
Overall Evaluation (score 0-5):
Criteria | Score | Remarks |
---|---|---|
High Translation Quality | 4.5+ | As in the example above, this type of official documentation is typically translated by machine and then proofread manually |
Convenient to reference original text | 3 | Requires switching through a menu or opening two web pages |
Low workload of human involvement | 1 | |
Feedback and improvement mechanism | 1.5 | For open source software like Python, the documentation is usually maintained as a separate project. Users who want to change the translation usually need to give feedback through the issue tracker; see the "Dealing with Bugs" section of the official Python documentation. |
Establish Different Language Websites
Some software projects host the documentation for each language under a different domain name, and others place it under different subdomains. Compared with the multi-language page switching described above, this method adds the workload of maintaining more web servers.
![different language on different site](/zh-CN/post/Best-Practices-Series-Multiple-Language-Support-for-Technical-Documentation/3.gif)
Overall Evaluation (score 0-5):
Criteria | Score | Remarks |
---|---|---|
High Translation Quality | 4.5+ | Similar to the situation described in the previous section |
Convenient to reference original text | 2? | Judging by the Angular official website, a documentation page offers no direct switch to the corresponding page on the other language websites (angular.cn has its own unique solution, which is explained in detail later) |
Low workload of human involvement | 0.5 | The workload is similar to that described in the previous section, plus the cost of maintaining extra web servers |
Feedback and improvement mechanism | 1.5 | Similar to the situation described in the previous section |
Digression - Auxiliary Translation Tools for Reading Foreign Documents
The sections above discussed the technical options for supporting multiple languages from the perspective of the developers of documentation websites.
As readers, on the other hand, we always advocate reading the original foreign-language documentation, and here auxiliary translation tools can improve our reading efficiency.
Given the title of this article this is a digression, but it is still worth covering.
Google Translate in Chrome
Translation can also be invoked from built-in features or plugins on the client side (such as the browser). The implementation and the translation results are similar to integrating Google Translate on the website side.
![translate website into another language using Chrome embedded Google Translate](/zh-CN/post/Best-Practices-Series-Multiple-Language-Support-for-Technical-Documentation/4.gif)
Word-Selection Translation
When I read official documentation in its original language, or study Japanese or English, I often use word-selection translation tools to look up words I don't know.
Some word-selection tools can be combined with Anki, so that after looking up a selected word you can directly create an Anki card for memorizing it. I will write a separate article introducing Anki, a powerful memorization tool, in the future.
Online Dictionary Helper
Online Dictionary Helper has many built-in dictionaries and supports translation between various mainstream languages and English or Chinese. In addition to words, it also supports word-selection translation of sentences, but the translation quality is not high. It can also create Anki cards directly from a selection.
![Word-Selection Translation using Online Dictionary Helper](/zh-CN/post/Best-Practices-Series-Multiple-Language-Support-for-Technical-Documentation/5.gif)
Yomichan
Yomichan is mainly used for looking up Japanese words and can download various dictionaries, but all definitions are in English. It does not support word-selection translation of sentences. It can also create Anki cards directly from a selection.
![Word-Selection Translation using Yomichan](/zh-CN/post/Best-Practices-Series-Multiple-Language-Support-for-Technical-Documentation/6.gif)
openai-translator
openai-translator uses the recently popular ChatGPT for translation, which has the following characteristics:
- Advantages
- Relatively high translation quality: because ChatGPT has learned from a large amount of literature, documents, websites, etc., its translation accuracy is relatively high.
- Certain feedback and improvement mechanisms: when the translation quality is poor, you can use the "polishing" button to request a re-translation, and usually you get a better version.
- Disadvantages
- Privacy and security issues: this is an old problem. Word-selection translation leaks the content we are reading to ChatGPT.
- Word-selection translation limits translation capabilities: ChatGPT translates best when it can analyze the context of the entire article, but word-selection translation gives it only a fragment. Although it can still translate based on what it has already learned, the result is certainly not as good as when the entire document is provided.
- Feedback and improvement mechanisms need to be improved: when we use the official ChatGPT to translate some text and the quality is not high, we can tell it so and suggest a better translation, which helps it improve. With openai-translator, however, we can only use the "polishing" button to let ChatGPT "re-translate", and cannot "tell it how to translate better".
![Word-Selection Translation using openai-translator](/zh-CN/post/Best-Practices-Series-Multiple-Language-Support-for-Technical-Documentation/7.gif)
Best Practices for Supporting Multiple Languages in Technical Documentation
Coming back to the topic: from the point of view of a website developer, what is the best practice for making a technical documentation website support multiple languages?
Based on the earlier sections, it needs to meet the following requirements as far as possible:
- High quality of translation
- Convenient reference to original text
- Minimizing manual workload
- Good feedback and improvement mechanism
Let's look at which existing products have solved these problems well.
The Best Practice of angular.cn
Requirement 2, "Convenient reference to original text", is difficult to solve. Of all the translation schemes I have seen so far, only angular.cn has solved this problem ingeniously.
![angular.cn made it easy to refer to original text](/zh-CN/post/Best-Practices-Series-Multiple-Language-Support-for-Technical-Documentation/8.gif)
From the figure we can see that when you click a paragraph in the angular.cn documentation, the original English text is displayed, which is very convenient for comparison and reference. This is particularly useful when the Chinese translation is ambiguous.
Angular.jp does not implement this function, so I guessed that it was the angular.cn maintainer's own idea. The issues of the angular-cn documentation project confirm this, and the author also describes the implementation idea there. A look at one of the markdown source files shows that the original and the translation are indeed written in the same document, as the author describes.
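To make the idea of keeping the original and the translation in one source file concrete, here is a minimal sketch. It is not the angular-cn build code: it simply assumes that paragraphs are separated by blank lines and that every original paragraph is immediately followed by its translation. The function names and CSS class names are hypothetical.

```python
from __future__ import annotations

from html import escape


def pair_paragraphs(source: str) -> list[tuple[str, str]]:
    """Split blank-line-separated paragraphs into (original, translation) pairs."""
    paragraphs = [p.strip() for p in source.split("\n\n") if p.strip()]
    if len(paragraphs) % 2 != 0:
        raise ValueError("every original paragraph needs a translation")
    return list(zip(paragraphs[0::2], paragraphs[1::2]))


def render_bilingual(source: str) -> str:
    """Render each pair as a visible translation followed by a hidden original."""
    blocks = []
    for original, translation in pair_paragraphs(source):
        blocks.append(
            f'<p class="translated">{escape(translation)}</p>\n'
            f'<p class="original" hidden>{escape(original)}</p>'
        )
    return "\n".join(blocks)
```

Because both versions live in the same file, a translator sees immediately when an original paragraph changes, and the page can reveal the hidden original with a small click handler.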
The Hope Brought by ChatGPT
Angular.cn (https://angular.cn/docs) offers a good solution to requirement 2, "Convenient reference to original text". Then ChatGPT suddenly appeared, which can better address requirement 1, "High quality of translation", and requirement 3, "Minimizing manual workload", but has certain limitations regarding requirement 4, "Good feedback and improvement mechanism". We already discussed this when introducing openai-translator.
My Opinion on the Best Practices
To achieve the best practice, all four points mentioned above need to be done well. None of the products I have used can do them all at present. If you know of one, you are welcome to comment and share.
Currently, the option closest to the ideal overall seems to be openai-translator, even though it is a user-side plugin. As discussed earlier when comparing Google Translate integrated into a website with browser plugins, there is little difference in their technical implementation. The only difference is whether the call to the external translation API is made on the web server side or in the user's browser.
Now, let's discuss my detailed views on the best practices. Here, we will no longer distinguish between server side and user side:
- High quality translation: Call an advanced tool such as the ChatGPT API to translate.
- Easy reference of the original text: Combine the approaches of Google Translate and angular.cn.
- The strength of Google Translate is that it can accurately replace only the text in the web page without modifying the HTML elements or JavaScript in the page source.
- Google has the innate technology advantage of marking and extracting text from the web page, since it started as a search engine.
- Google was originally a search engine, so there is strong crawler support behind it, which gives Google rich experience in extracting text from web pages.
- Google developed the Chrome browser, whose core function is to do DOM parsing and rendering, which also gives Google rich experience in marking and extracting text from web pages.
- If we want to achieve this effect ourselves, the technical difficulty cannot be ignored. Fortunately, Chromium is open source and can serve as a reference. In addition, I guess there are other open source libraries designed specifically to solve this problem.
- On angular.cn, a mouse click toggles the original text between hidden and shown, which is very advanced.
- How to combine the above two ways
- Use a Chrome-like ability to parse the DOM and extract the text
- (Optional) Send all relevant web pages directly to ChatGPT for learning
- This is needed to improve ChatGPT's overall translation quality, as discussed in the earlier sections.
- Generally, the entire website can be sent to ChatGPT, so that it can read all the content on the website before translating a specific page
- Extract the text of the current web page and call ChatGPT for translation
- Copy each original DOM element, insert the copy before the original, and replace the copy's text with the translated text
- Use CSS to hide the original DOM elements
- Use CSS and JavaScript to show the original DOM element when the user clicks on the translated DOM element
- The six steps above (from parsing the DOM to showing the original on click) could actually be done entirely by ChatGPT, but this requires appropriate training.
- As with Google, ChatGPT must also have powerful web page DOM parsing and text extraction abilities
- Minimize manual workload: If points 1, 2, and 4 are done well, this point follows naturally.
- Good feedback and improvement mechanism: Make full use of ChatGPT's learning and self-improvement capabilities. Here we continue the technical discussion from "How to combine the above two ways" in point 2.
- When the translation in a DOM element is inappropriate, the user should be able to select the relevant words and call up a ChatGPT dialog box to report the problem, or even tell it the correct translation, which helps it improve
- After we point out a problem to ChatGPT and give it translation guidance, ChatGPT needs to re-translate and update the relevant DOM elements with the new translation, and the improved parts can be highlighted
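The combined approach above can be sketched roughly as follows. This is only an illustration under several assumptions: it uses Python's standard `html.parser` in place of a Chrome-like DOM parser, it treats only `p`, `h1`, `h2`, and `h3` as translatable units, it drops the attributes of the original block tags, and the `translate` argument is a placeholder callable standing in for a call to a translation service such as ChatGPT. The names `BilingualRewriter`, `make_bilingual`, `with_corrections`, and `TOGGLE_SCRIPT` are hypothetical. The corrections table is a simple stand-in for the feedback loop, not a way of teaching the model.

```python
from html import escape
from html.parser import HTMLParser

BLOCK_TAGS = {"p", "h1", "h2", "h3"}  # units translated as a whole (assumption)

# Click handler: clicking a translated block toggles the original next to it.
TOGGLE_SCRIPT = """<script>
document.addEventListener("click", (event) => {
  const translated = event.target.closest(".translated");
  if (!translated) return;
  const original = translated.nextElementSibling;
  if (original && original.classList.contains("original")) {
    original.hidden = !original.hidden;
  }
});
</script>"""


class BilingualRewriter(HTMLParser):
    """Emit a translated copy before each block and hide the original."""

    def __init__(self, translate):
        super().__init__(convert_charrefs=True)
        self.translate = translate  # placeholder for the real translation call
        self.out = []      # rewritten HTML fragments
        self.block = None  # tag name of the block being collected, if any
        self.inner = []    # inner HTML of that block
        self.text = []     # plain text of that block

    def _sink(self):
        # Markup goes to the page output, or to the current block's inner HTML.
        return self.out if self.block is None else self.inner

    def handle_starttag(self, tag, attrs):
        if self.block is None and tag in BLOCK_TAGS:
            self.block, self.inner, self.text = tag, [], []
        else:
            self._sink().append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self._sink().append(self.get_starttag_text())  # e.g. <br/>

    def handle_endtag(self, tag):
        if tag == self.block:
            translated = escape(self.translate("".join(self.text)))
            original = "".join(self.inner)
            self.out.append(f'<{tag} class="translated">{translated}</{tag}>')
            self.out.append(f'<{tag} class="original" hidden>{original}</{tag}>')
            self.block = None
        else:
            self._sink().append(f"</{tag}>")

    def handle_data(self, data):
        self._sink().append(escape(data, quote=False))
        if self.block is not None:
            self.text.append(data)


def with_corrections(translate, corrections):
    """Prefer reader-supplied corrections over the machine translation."""
    def corrected(text):
        if text in corrections:
            return corrections[text]
        return translate(text)
    return corrected


def make_bilingual(html, translate):
    """Return the page with translated blocks shown and originals hidden."""
    rewriter = BilingualRewriter(translate)
    rewriter.feed(html)
    rewriter.close()
    return "".join(rewriter.out) + TOGGLE_SCRIPT
```

Calling `make_bilingual(page_html, with_corrections(translate, corrections))` keeps the page structure outside the translated blocks unchanged, and a reader's correction takes effect the next time the page is rewritten. A real implementation would also need to preserve inline markup and attributes in the translated copy, which this sketch does not attempt.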
Conclusion
The advent of ChatGPT has brought great convenience to developers and greatly changed how documents are translated and how foreign-language documents are read. However, because it can do far more than that, some people worry that it will put people in some occupations out of work. I think there is no need to worry at present. Becoming familiar with it and making good use of it will only make us more powerful.
If one day AI can also work out the best practices and implement them automatically, then it may not just replace humans, but eliminate humanity altogether :)