Translation: Give more context for auto-translate #447

benbucksch · 2025-02-19T06:33:11Z

Problem

When sending strings to the auto-translator, we need to give more context about each string. Otherwise, the translator cannot translate it correctly. The translations are generally very good, but 1-word strings are often mistranslated.

For example:

"End time" -> translated as "Apocalypse" in German
"Decline" -> translated as "Going down" in German. The intended meaning, in the context of a calendar invitation, was "Refuse".
"Quote" -> translated in many languages as "to give a price". The meaning was "to cite".

In most other cases, where the string contained more words and therefore carried its own context, it was auto-translated correctly. So context makes a big difference.

I understand that it's difficult to provide the other strings of the same dialog as context, because string extraction and the hash IDs have already stripped away all of that context.

Approach

Is there a way to tell json-autotranslate to add some context to each string, e.g. the other strings around it?

Solutions - Ideas

1. IDs with code file name

It might be sufficient to generate IDs that include the source file name and path, to include that ID in the text sent for translation, and then to remove it from the translated result. The file name and path alone may give the translator enough context. E.g. instead of sending "Decline" and getting back "Niedergang", send "Calendar/Invitation/Display#hzlsh0" : "Decline", which is translated as "Kalender/Einladung/Anzeige#something" : "Ablehnen" -> (remove context) "Ablehnen" -> store that as the translation.

Disadvantage: Changing the string IDs will significantly increase the size of the translation files in the distribution, as well as the number of characters to be translated, and therefore the price. I worry more about the size of the app distribution.
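A minimal sketch of the wrap-and-strip step in Solution 1, assuming the context is sent as a plain "path#id : text" prefix. The helper names `withContext` and `stripContext` and the " : " separator are hypothetical, not part of json-autotranslate. Because the translator may also translate the path (as in the "Kalender/Einladung/Anzeige" example), the sketch relies only on the separator when stripping.

```typescript
// Hypothetical separator between the "<path>#<id>" context and the string.
const CONTEXT_SEPARATOR = " : ";

// Build the text sent to the auto-translator, e.g.
// withContext("Calendar/Invitation/Display", "hzlsh0", "Decline")
//   -> "Calendar/Invitation/Display#hzlsh0 : Decline"
function withContext(path: string, id: string, text: string): string {
  return `${path}#${id}${CONTEXT_SEPARATOR}${text}`;
}

// Remove the (possibly translated) context prefix from the result.
// Returns null if the separator was lost, so the caller can fall back
// to translating the bare string.
function stripContext(translated: string): string | null {
  const index = translated.indexOf(CONTEXT_SEPARATOR);
  if (index < 0) {
    return null;
  }
  return translated.slice(index + CONTEXT_SEPARATOR.length).trim();
}
```

For example, `stripContext("Kalender/Einladung/Anzeige#something : Ablehnen")` yields "Ablehnen", which is what gets stored as the translation.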

2. Find source code file and context from ID

It would be much better if the auto-translation engine could take the IDs as we have them right now, look up the English string, find that string in the source code, then find the other strings in that same source file, and send these strings as context. For example:

German: "hztle": ""
-> English: "hztle": "Decline"
Find source file: Search for t`Decline` and gt`Decline` in the source code, and note the source code file name.
Find context: Find other t and gt strings in the dialog.
Build translation source string: "Decline" ### "Accept" "Maybe" "Time" "Topic" "Description"
Send that to auto-translation
Get back "Ablehnen" ### "Annehmen" "Vielleicht" "Zeit" "Thema" "Beschreibung"
Extract the first string "Ablehnen"
Put that into translation: German: "hztle": "Ablehnen"

As an optimization, multiple untranslated strings can be combined into one request.
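The steps of Solution 2 can be sketched as three small functions. This assumes UI strings appear in the source as tagged templates such as t`Decline` or gt`Decline` without `${}` placeholders; the function names and the regex are hypothetical, and reading files from disk is left out.

```typescript
// Collect all t`...` and gt`...` strings from the text of one source file.
// Strings containing "${...}" placeholders are skipped by this simple regex.
function extractUIStrings(source: string): string[] {
  const pattern = /\bg?t`([^`$]*)`/g;
  const found: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(source)) !== null) {
    found.push(match[1]);
  }
  return found;
}

const CONTEXT_MARKER = " ### ";

// Build the text to translate: the target string first, then its neighbours
// from the same file, e.g. "Decline" ### "Accept" "Maybe".
function buildContextText(target: string, siblings: string[]): string {
  const others = siblings.filter((s) => s !== target).map((s) => `"${s}"`);
  return `"${target}"${CONTEXT_MARKER}${others.join(" ")}`;
}

// Keep only the translation of the target string (the part before "###")
// and drop the surrounding quotes.
function extractTarget(translated: string): string {
  const first = translated.split("###")[0].trim();
  return first.replace(/^"|"$/g, "");
}
```

Given `"Ablehnen" ### "Annehmen" "Vielleicht"` back from the translator, `extractTarget` returns "Ablehnen", which would then be stored under "hztle".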

3. Send multiple strings at once

Given that we likely add all the strings of a dialog at the same time, a simple solution might be to send multiple strings to the auto-translator at once, instead of sending them one by one.

We have new untranslated strings "Decline" "Accept" "Maybe" "Time" "Topic" "Description"
Concatenate them and send Decline | Accept | Maybe | Time | Topic | Description (with exactly these separators)
Get back Ablehnen | Annehmen | Vielleicht | Zeit | Thema | Beschreibung
Parse the separators, and verify that we get back the same number of strings that we sent. If not, fall back to translating the strings individually.
Put them into the translation.
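The batch-and-verify steps of Solution 3 could look roughly like this. `Translate` is a hypothetical stand-in for whichever service call json-autotranslate makes; the function only implements the join, split, count check and fallback described above.

```typescript
// Hypothetical stand-in for the actual auto-translate service call.
type Translate = (text: string) => Promise<string>;

const SEPARATOR = " | ";

async function translateBatch(texts: string[], translate: Translate): Promise<string[]> {
  // A "|" inside a source string would break the split, so don't batch then.
  const canBatch = texts.length > 1 && !texts.some((t) => t.includes("|"));
  if (canBatch) {
    const result = await translate(texts.join(SEPARATOR));
    // Split on "|" and trim, so slightly altered spacing still parses.
    const parts = result.split("|").map((part) => part.trim());
    if (parts.length === texts.length) {
      return parts;
    }
  }
  // Fallback: the separator count changed, so translate one string at a time.
  const individual: string[] = [];
  for (const text of texts) {
    individual.push(await translate(text));
  }
  return individual;
}
```

The count check is the only safeguard here: if the translator merges or reorders segments while keeping the same number of separators, the mismatch would go unnoticed.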


benbucksch · 2025-02-19T06:33:52Z

Please try Solution 3 first, then 2, and 1 only as a fallback when everything else fails.

jermy-c · 2025-02-20T00:47:38Z

Problems

Even if you manage to translate the different variations of a string, the hash IDs would be the same for all of them, because the ID is derived from the string alone
You would need a custom function that records the fileName during extraction, and whether the string is a single word
The context format would need to be supported by the script that submits the strings for translation
The script would need a feature to group single-word strings that appear in the same dialog
The extra context would make the file much larger in the repo, but it can be removed at build time
If every string had its own hash with fileName context, the file would become really large, because the same string is often used in multiple places where the translation is exactly the same

Solutions

Tolgee
Tolgee can group the strings of the same dialog and submit them together for translation

Custom Lib
We would need the t function to detect when the string is a single word, so that it generates a hash based on string + fileName.
The extractor would then also need to compute that hash and output a format compatible with the auto-translate script, and the auto-translate script would need a feature to make use of it.
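A sketch of the ID rule for the "Custom Lib" idea: single-word strings get an ID from string + fileName, longer strings keep a string-only ID, so identical sentences still share one entry. The small FNV-1a-style hash and the base-36 encoding are placeholders, not the hash the project's extractor actually uses, and `messageId` is a hypothetical name.

```typescript
// 32-bit FNV-1a-style hash over UTF-16 code units, encoded in base 36.
// Placeholder for whatever hash the real extractor uses.
function fnv1a(input: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(36);
}

// Single-word strings are ambiguous ("Decline", "Read"), so their ID also
// depends on the file they appear in; longer strings are keyed by text only.
function messageId(text: string, fileName: string): string {
  const isSingleWord = !/\s/.test(text.trim());
  const input = isSingleWord ? `${text}\u0000${fileName}` : text;
  return fnv1a(input);
}
```

With this rule, "Decline" in a calendar dialog and "Decline" in a mail dialog get different IDs and can be translated differently, at the cost of the larger message files mentioned above.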

Tweak Settings
Use the Lingui-style output, which already includes the fileName, but the hash ID is still an issue. Alternatively, add the fileName as a comment in the t function for single-word strings, so that the hash differs. Either way, we'd still need to modify the auto-translate script to support grouping the strings together.

Merge the Extractor and Auto Translate scripts together
json-autotranslate would need to include the extractor as well. It could then search the source code for each string itself, and there would be no need for the extra context in the message files.

jermy-c · 2025-02-26T02:38:10Z

Solution 3 seems to be working well.

benbucksch · 2025-02-26T02:41:56Z

Solution 3 = "Send multiple strings at once" or "Tweak Settings" ?

jermy-c · 2025-02-26T16:20:19Z

Solution 3 = "Send multiple strings at once" or "Tweak Settings" ?

Send multiple strings at once

jermy-c · 2025-02-26T16:36:52Z

The process would be:

Extract strings from the source code into messages.template.json, using this format:

{
  "Options": {
    "abcd": "A user-readable string"
  }
}

Auto-translate uses the above format to translate with context, and outputs messages.json:

{
  "abcd": "A user-readable string"
}

Since messages.template.json is not imported, it is not bundled, which keeps the app smaller.

So the files in the repo would be:

en/messages.template.json - the template file with format grouping by dialog
en/messages.json - the source language file that would be imported and bundled
{locale}/messages.json - the translations for target languages, it would be imported and bundled
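The template-to-messages step could be sketched like this, assuming messages.template.json maps dialog name to { id: English string }, as in the "Options" example. `flattenTemplate` produces the flat messages.json shape that gets bundled, and `dialogBatches` yields the per-dialog groups that would be sent together for translation; both names are hypothetical.

```typescript
// dialog name -> { id: English string }, as in messages.template.json
type Template = Record<string, Record<string, string>>;
// id -> string, as in messages.json
type Messages = Record<string, string>;

// Drop the dialog grouping to produce the bundled messages.json content.
function flattenTemplate(template: Template): Messages {
  const messages: Messages = {};
  for (const [dialog, strings] of Object.entries(template)) {
    for (const [id, text] of Object.entries(strings)) {
      // The same string in several dialogs has the same ID and text, which is fine;
      // the same ID with a different text would indicate a hash collision.
      if (id in messages && messages[id] !== text) {
        throw new Error(`ID ${id} in dialog ${dialog} conflicts with an earlier entry`);
      }
      messages[id] = text;
    }
  }
  return messages;
}

// One group of strings per dialog, to be translated together for context.
function dialogBatches(template: Template): Record<string, string[]> {
  const batches: Record<string, string[]> = {};
  for (const [dialog, strings] of Object.entries(template)) {
    batches[dialog] = Object.values(strings);
  }
  return batches;
}
```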

benbucksch · 2025-02-26T16:42:57Z

That's great! Do it.

benbucksch · 2025-03-12T23:40:46Z

https://developers.deepl.com/docs/best-practices/working-with-context#text-translation-feature-context-parameter
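For reference, a sketch of a DeepL /v2/translate request that uses the `context` parameter from the page linked above. The context text influences the translation but is not translated itself, and it applies to all texts in the request. `buildDeepLRequest` and `translateWithContext` are hypothetical helpers; how json-autotranslate's DeepL service would pass this through is not shown, and the free-tier endpoint is assumed.

```typescript
interface DeepLTranslateBody {
  text: string[];
  target_lang: string;
  source_lang?: string;
  context?: string; // applies to every entry in `text`
}

function buildDeepLRequest(texts: string[], targetLang: string, context?: string): DeepLTranslateBody {
  const body: DeepLTranslateBody = { text: texts, target_lang: targetLang, source_lang: "EN" };
  if (context) {
    body.context = context;
  }
  return body;
}

// Sends the request (assumes Node 18+ global fetch and a DeepL Free API key).
async function translateWithContext(authKey: string, body: DeepLTranslateBody): Promise<string[]> {
  const response = await fetch("https://api-free.deepl.com/v2/translate", {
    method: "POST",
    headers: { Authorization: `DeepL-Auth-Key ${authKey}`, "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!response.ok) {
    throw new Error(`DeepL request failed with status ${response.status}`);
  }
  const data = (await response.json()) as { translations: { text: string }[] };
  return data.translations.map((t) => t.text);
}
```

For example, `buildDeepLRequest(["Decline"], "DE", "Button in a calendar meeting invitation")` would give the translator the meaning "Refuse" rather than "Going down".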

benbucksch · 2025-03-14T08:58:28Z

@jermy-c What's the status here? Is this done? Or still TODO?

benbucksch · 2025-03-14T09:00:11Z

BTW: Your fix to not send huge files to auto-translate worked. I was able to translate only a few strings, using a minimal number of characters in the API. That's great, thank you! This means that we can run the Auto-Translate script more often.

jermy-c · 2025-03-14T15:30:10Z

@jermy-c What's the status here? Is this done? Or still TODO?

No, it is not done. It might require a big change to json-autotranslate, because it already supports context, but only one context for the entire translation file/app. It also requires changes at the json-autotranslate script level and then at the DeepL API service level. I've started it, but it's not finished yet.

benbucksch · 2025-03-14T15:43:09Z

OK, thanks for the update. Let's put this on hold for the moment. This is important long-term, but not short-term.

Is there something valuable in what you already did? If so, can you push what you did to a branch and post it here? And briefly describe (mostly for yourself, not for me) your implementation plan and what you have already finished, so it's easier for you to pick up later? If you didn't do much yet (that's OK), then never mind.

jermy-c · 2025-03-14T15:55:45Z

You're welcome.

Sorry, I implemented the plan to group strings together, not the comment/context one.

Implementation Plan

Batch strings without context together for translation (as it works now; keep it that way), to make it faster and avoid hitting the rate limit
Send strings with context individually, because a request can only have one context
Be careful not to break the script, since all services use the same request structure
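The first two points of this plan amount to partitioning the strings into requests. A minimal sketch, with hypothetical types and a caller-chosen `batchSize`: strings without context are chunked into batches, and each string with context gets its own request, since a request carries a single context.

```typescript
interface SourceString {
  id: string;
  text: string;
  context?: string;
}

interface TranslationRequest {
  ids: string[];
  texts: string[];
  context?: string; // one context per request
}

function planRequests(strings: SourceString[], batchSize: number): TranslationRequest[] {
  if (batchSize < 1) {
    throw new Error("batchSize must be at least 1");
  }
  const requests: TranslationRequest[] = [];
  // Strings without context: batched to reduce requests and rate-limit pressure.
  const withoutContext = strings.filter((s) => !s.context);
  for (let i = 0; i < withoutContext.length; i += batchSize) {
    const chunk = withoutContext.slice(i, i + batchSize);
    requests.push({ ids: chunk.map((s) => s.id), texts: chunk.map((s) => s.text) });
  }
  // Strings with context: one request each.
  for (const s of strings) {
    if (s.context) {
      requests.push({ ids: [s.id], texts: [s.text], context: s.context });
    }
  }
  return requests;
}
```

Keeping the plan as plain data, separate from the service call, is one way to leave the shared request structure of the other services untouched.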

benbucksch · 2025-03-26T06:40:06Z

Hey Jeremy, you have implemented commentSeparator = *=>. This is useful to disambiguate terms, e.g. an email that has been "read" vs. "Read"ing an email. I've committed some such comments.

Unfortunately, the comment doesn't seem to be considered by the auto-translation.

I see that messages.template.json has these comments as a description property. Is that submitted to the machine translation? If not, could you make that happen? If yes, why does it have no effect on the translation?

Another way is to simply submit our string including the *=> comment to the machine translation, and strip the comment after translation, before the string is written into the translated messages.json file. Then, at runtime, when reading messages.json, we no longer need to strip the comment, saving a tiny bit of runtime.
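A sketch of the *=> handling, assuming a source string is written as "<text> *=> <comment>", e.g. "Read *=> the email has been read". The placement of the comment after the separator and the helper names are assumptions, not confirmed by the thread. `splitComment` supports both options: sending only `text` with the comment as separate context, or sending the whole string and stripping the comment from the translation afterwards.

```typescript
const COMMENT_SEPARATOR = "*=>";

// Assumed format: "<text> *=> <comment>". Strings without the separator
// have no comment.
function splitComment(source: string): { text: string; comment?: string } {
  const index = source.indexOf(COMMENT_SEPARATOR);
  if (index < 0) {
    return { text: source };
  }
  return {
    text: source.slice(0, index).trim(),
    comment: source.slice(index + COMMENT_SEPARATOR.length).trim(),
  };
}

// Option "submit with comment, strip afterwards": keep only the part before
// the separator, so messages.json never contains the comment.
function stripTranslatedComment(translated: string): string {
  return splitComment(translated).text;
}
```

If the machine translation alters the separator itself (e.g. "* =>"), stripping silently fails, so the result should be checked before it is written to messages.json.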

benbucksch · 2025-03-26T07:14:19Z

@jermy-c This is important. I have huge problems in the translations, due to lack of context.

"Read" (message is read) was translated in German as 'You shall read it'
"Decline" (the meeting) was translated in many languages as "Going down"
Merely to work around the bad translations, I changed "Decline" to "Refuse" (the meeting), and that was translated in a few languages as "Trash" 🤦 👊🤡 🤦

Without context or manual human translators, there's no way out of this. This is urgent.

Bad translations are highly embarrassing and give the impression of a very low quality product.

benbucksch assigned jermy-c Feb 19, 2025

benbucksch mentioned this issue Feb 19, 2025

Translations #444

Merged

benbucksch added the major label Mar 26, 2025
