public inbox for git@vger.kernel.org 
From: Jiang Xin <worldhello.net@gmail•com>
To: Junio C Hamano <gitster@pobox•com>, Git List <git@vger•kernel.org>
Cc: "Jiang Xin" <worldhello.net@gmail•com>,
	"Alexander Shopov" <ash@kambanaria•org>,
	"Mikel Forcada" <mikel.forcada@gmail•com>,
	"Ralf Thielow" <ralf.thielow@gmail•com>,
	"Jean-Noël Avila" <jn.avila@free•fr>,
	"Bagas Sanjaya" <bagasdotme@gmail•com>,
	"Dimitriy Ryazantcev" <DJm00n@mail•ru>,
	"Peter Krefting" <peter@softwolves•pp.se>,
	"Emir SARI" <bitigchi@me•com>, "Arkadii Yakovets" <ark@cho•red>,
	"Vũ Tiến Hưng" <newcomerminecraft@gmail•com>,
	"Teng Long" <dyroneteng@gmail•com>,
	"Yi-Jyun Pan" <pan93412@gmail•com>
Subject: [PATCH v2 5/5] docs(l10n): add AI agent instructions to review translations
Date: Tue,  3 Mar 2026 23:33:32 +0800
Message-ID: <d7a7a07acdcf15520019fc58be5e6a1a1e24791a.1772551123.git.worldhello.net@gmail.com>
In-Reply-To: <cover.1772551123.git.worldhello.net@gmail.com>

Add a new "Task 4: Review translation quality" section to po/AGENTS.md
that provides comprehensive guidance for AI agents reviewing translation
files.

Plain diffs of PO files (for example, from git diff or git show) lose
context, especially for multi-line msgid and msgstr entries. Some LLMs
ignore the missing context and cannot evaluate translations accurately;
others rely on scripts to search the source files for context, which
makes the review process time-consuming. To address this, git-po-helper
implements a compare subcommand that extracts new or modified
translations with full context (complete msgid/msgstr pairs), which
significantly improves review efficiency.
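The following toy illustration shows why whole entries matter. It is
not how git-po-helper compare is implemented. The hypothetical shell
function below uses awk's paragraph mode to treat each
blank-line-separated block of a PO file as one entry. It then prints
every entry of the new file that does not appear verbatim in the old
one.

Unlike a real PO parser, it also reports entries whose only change is
in comments or source references. It also assumes that entries are
separated by blank lines.

```shell
# Hypothetical helper, for illustration only: print each entry of the
# new PO file ($2) that does not occur verbatim in the old PO file ($1).
# With RS = "", awk reads one blank-line-separated block per record, so
# a multi-line msgid/msgstr pair stays together as a single entry.
po_changed_entries () {
    awk '
        BEGIN { RS = "" }
        # First file: remember every entry of the old version.
        FILENAME == ARGV[1] { seen[$0] = 1; next }
        # Second file: print new or modified entries in full.
        !($0 in seen) { printf "%s\n\n", $0 }
    ' "$1" "$2"
}
```

The output keeps each changed multi-line msgid/msgstr pair together. A
line-based diff with limited context may show only the modified
continuation lines.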

A limitation is that the extracted file lacks the other, already
translated entries that could serve as reference, which may affect
terminology consistency. This is mitigated by including a glossary in
the PO file header: review files generated by git-po-helper include the
header entry and the glossary (if present) by default.
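The review file starts with that header entry: an empty msgid with the
metadata in msgstr. Scripts that count the entries to review must
therefore leave it out. The batch-splitting script this patch adds to
po/AGENTS.md subtracts one from the number of "msgid " lines
unconditionally.

The hypothetical helper below is a slightly more defensive sketch. It
subtracts one only when a header entry is actually present. It assumes
that a multi-line msgid, which also begins with 'msgid ""', is always
followed by a string continuation line. The header's 'msgid ""', by
contrast, is followed directly by msgstr.

```shell
# Hypothetical helper: count the messages in a PO file ($1) without the
# header entry. Each entry has exactly one line starting with 'msgid '
# (msgid_plural does not match). The header is the entry whose
# 'msgid ""' line is immediately followed by a msgstr line; a multi-line
# msgid also starts with 'msgid ""' but is followed by a "..." line.
po_entry_count () {
    awk '
        /^msgid / { n++; empty_msgid = ($0 == "msgid \"\""); next }
        /^msgstr/ && empty_msgid { header = 1 }
        { empty_msgid = 0 }
        END { print n - header }
    ' "$1"
}
```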

The review workflow leverages git-po-helper subcommands:

- git-po-helper compare: Extract new or changed entries between two PO
  file versions into a valid PO file for review. Supports multiple modes:

  * Compare HEAD with the working tree (local changes)
  * Compare the parent of a commit with that commit (--commit)
  * Compare a commit with the working tree (--since)
  * Compare two arbitrary revisions (-r)

- git-po-helper msg-select: Split large review files into smaller
  batches by entry index range for manageable review sessions. Supports
  range formats like "-50" (first 50), "51-100", "101-" (to end).
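The range formats follow the batching rule of the review_split_batches
script in the patch below. The batch size starts at the given minimum.
It rises to 1.5 times that minimum when there are more than 4 times as
many entries, and to 2 times when there are more than 8 times as many.

The hypothetical helper below only prints the --range values that would
be passed to git-po-helper msg-select. This lets the arithmetic be
checked without git-po-helper installed.

```shell
# Hypothetical helper: print the msg-select --range values for $1
# entries and a minimum batch size of $2 (default 50), following the
# batching rule of review_split_batches. Prints nothing when a single
# batch suffices (the script then converts the whole file with msg-cat).
review_batch_ranges () {
    entries=$1
    min=${2:-50}
    test "$entries" -gt "$min" || return 0

    # Larger files get larger batches.
    if test "$entries" -gt $((min * 8))
    then
        size=$((min * 2))
    elif test "$entries" -gt $((min * 4))
    then
        size=$((min + min / 2))
    else
        size=$min
    fi

    start=1
    while test "$start" -le "$entries"
    do
        end=$((start + size - 1))
        if test "$start" -eq 1
        then
            printf '%s\n' "-$end"         # first batch: "-N"
        elif test "$end" -ge "$entries"
        then
            printf '%s\n' "$start-"       # last batch: "N-" (to end)
        else
            printf '%s\n' "$start-$end"   # middle batch: "N-M"
        fi
        start=$((end + 1))
    done
}
```

For example, "review_batch_ranges 120 50" prints -50, 51-100 and 101-,
which are the three formats listed above. With 63 entries and a minimum
of 20, it prints -20, 21-40, 41-60 and 61-.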

Evaluation test using the qwen model:

    git-po-helper agent-run review --commit 2000abefba --agent qwen

Benchmark results:

    | Metric           | Value                            |
    |------------------|----------------------------------|
    | Num turns        | 22                               |
    | Input tokens     | 537263                           |
    | Output tokens    | 4397                             |
    | API duration     | 167.84 s                         |
    | Review score     | 96/100                           |
    | Total entries    | 63                               |
    | With issues      | 4 (1 critical, 2 major, 1 minor) |

Signed-off-by: Jiang Xin <worldhello.net@gmail•com>
---
 po/AGENTS.md | 194 ++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 193 insertions(+), 1 deletion(-)

diff --git a/po/AGENTS.md b/po/AGENTS.md
index 3bb8fb3858..08be73ada5 100644
--- a/po/AGENTS.md
+++ b/po/AGENTS.md
@@ -10,6 +10,7 @@ most commonly used housekeeping tasks:
 1. Generating or updating po/git.pot
 2. Updating po/XX.po
 3. Translating po/XX.po
+4. Reviewing translation quality
 
 
 ## Background knowledge for localization workflows
@@ -729,6 +730,191 @@ and fuzzy entry; do not stop before the loop completes.
    ```
 
 
+### Task 4: Review translation quality
+
+Review may target the full `po/XX.po`, a specific commit, or changes since a
+commit. When asked to review, follow the steps below. **Note**: This task uses
+`git-po-helper compare`; if `git-po-helper` is not available, the task
+cannot be performed.
+
+1. **Check for existing review**: Evaluate the following in order:
+
+   - If `po/review-input.po` does **not** exist, proceed to step 2 regardless
+     of any other files (e.g., batch or JSON files).
+   - If both `po/review-input.po` and `po/review-result.json` exist, go
+     directly to step 5 (Merge and summary) and display the report.
+     Do **not** check for batch or other temporary files; no further review
+     steps are needed.
+   - If `po/review-input.po` exists but `po/review-result.json` does not,
+     go to step 4 (Process one batch) to continue the previous review.
+
+2. **Extract entries**: Run `git-po-helper compare` with the desired range and
+   redirect the output to `po/review-input.po`. Do not use `git show` or
+   `git diff`—they can fragment or lose PO context (see "Comparing PO files
+   for translation and review" under git-po-helper).
+
+3. **Prepare review batches**: Run the script below to clean up any leftover
+   files from previous reviews and split `po/review-input.po` into one or
+   more `po/review-input-<N>.json` files (batch size scales with the entry
+   count). Run it as one script that defines and then calls the function:
+
+   ```shell
+   review_split_batches () {
+       min_batch_size=${1:-50}
+       rm -f po/review-input-*.json
+       rm -f po/review-result-*.json
+       rm -f po/review-result.json
+       rm -f po/review-output.po
+
+       ENTRY_COUNT=$(grep -c '^msgid ' po/review-input.po 2>/dev/null || true)
+       ENTRY_COUNT=$((ENTRY_COUNT > 0 ? ENTRY_COUNT - 1 : 0))
+
+       if test "$ENTRY_COUNT" -gt $min_batch_size
+       then
+           if test "$ENTRY_COUNT" -gt $((min_batch_size * 8))
+           then
+               NUM=$((min_batch_size * 2))
+           elif test "$ENTRY_COUNT" -gt $((min_batch_size * 4))
+           then
+               NUM=$((min_batch_size + min_batch_size / 2))
+           else
+               NUM=$min_batch_size
+           fi
+           BATCH_COUNT=$(( (ENTRY_COUNT + NUM - 1) / NUM ))
+           for i in $(seq 1 "$BATCH_COUNT")
+           do
+               START=$(((i - 1) * NUM + 1))
+               END=$((i * NUM))
+               if test "$END" -gt "$ENTRY_COUNT"
+               then
+                   END=$ENTRY_COUNT
+               fi
+               if test "$i" -eq 1
+               then
+                   git-po-helper msg-select --json --range "-$NUM" \
+                       -o "po/review-input-$i.json" po/review-input.po
+               elif test "$END" -ge "$ENTRY_COUNT"
+               then
+                   git-po-helper msg-select --json --range "$START-" \
+                       -o "po/review-input-$i.json" po/review-input.po
+               else
+                   git-po-helper msg-select --json --range "$START-$END" \
+                       -o "po/review-input-$i.json" po/review-input.po 
+               fi
+           done
+       else
+           git-po-helper msg-cat --json \
+               -o po/review-input-1.json po/review-input.po
+       fi
+   }
+   # The argument is the minimum batch size; reduce it if a batch file is
+   # too large for the agent to process.
+   review_split_batches 20
+   ```
+
+4. **Process one batch (repeat until none left)**:
+
+   a. If no `po/review-input-*.json` files exist, proceed to step 5.
+
+   b. Select the smallest remaining index N (e.g. `po/review-input-1.json`).
+      The current batch is `po/review-input-<N>.json`.
+
+   c. Review translation quality in the current batch: Read the current
+      batch file (`po/review-input-<N>.json`) and:
+      - Consult the "Background knowledge for localization workflows" section
+        for PO format, JSON format, placeholder rules, and terminology. If the
+        current batch file has a glossary in the `header_comment` field, add
+        it to your context for consistent terminology.
+      - Do not review or modify the header entry (in PO format: empty `msgid`
+        with metadata in `msgstr`; in JSON format: `header_comment` and
+        `header_meta`).
+      - For all other entries, check the quality of translations in `msgstr`
+        (singular form) and `msgstr_plural` (plural forms) against `msgid` and
+        `msgid_plural`. See the "Quality checklist" above for criteria.
+
+   d. After reviewing all entries in the current batch, write the issues you
+      found to `po/review-result-<N>.json` using the format described in the
+      "Review result JSON format" section below. If no issues are found, write
+      `{"issues": []}` to `po/review-result-<N>.json`. Always write this file;
+      it marks the batch as complete.
+
+   e. Delete the current batch file (`po/review-input-<N>.json`).
+
+   f. Return to step 4a.
+
+   This loop is resumable: remaining `po/review-input-*.json` files indicate
+   batches still to process.
+
+5. **Merge and summary**: Run the command below to merge all
+   `po/review-result-*.json` files into `po/review-result.json`, apply the
+   result to `po/review-output.po`, and display the report.
+
+   ```shell
+   git-po-helper agent-run report
+   ```
+
+   **Do not delete** `po/review-result.json`, `po/review-output.po`, or
+   `po/review-input.po`.
+
+**Review result JSON format**:
+
+The **Review result JSON** format defines the structure for translation
+review reports. For each entry with translation issues, create an issue
+object as follows:
+
+- Copy the original entry's `msgid`, `msgstr`, `msgid_plural` and
+  `msgstr_plural` (if present) to the corresponding fields in the
+  result issue object.
+- Write a summary of all issues found for this entry in `description`.
+- Set `score` according to the severity of issues found for this entry,
+  from 0 to 3 (3 = perfect, no issues; 0 = critical, 1 = major, 2 = minor).
+- Place the suggested translation in `suggest_msgstr` (singular) or
+  `suggest_msgstr_plural` (plural).
+- Include only entries with issues (score less than 3). When no issues
+  are found in the batch, write `{"issues": []}`.
+
+Example review result (with issues):
+
+```json
+{
+  "issues": [
+    {
+      "msgid": "commit",
+      "msgid_plural": "",
+      "msgstr": "委托",
+      "msgstr_plural": [],
+      "suggest_msgstr": "提交",
+      "suggest_msgstr_plural": [],
+      "score": 0,
+      "description": "Terminology error: 'commit' should be translated as '提交'"
+    },
+    {
+      "msgid": "repository",
+      "msgid_plural": "repositories",
+      "msgstr": "",
+      "msgstr_plural": ["版本库", "版本库"],
+      "suggest_msgstr": "",
+      "suggest_msgstr_plural": ["仓库", "仓库"],
+      "score": 2,
+      "description": "Consistency issue: '版本库' and '仓库' are used interchangeably; suggest using '仓库' consistently"
+    }
+  ]
+}
+```
+
+Field descriptions for each issue object (element of the `issues` array):
+
+- `msgid` (and `msgid_plural` for plural entries): Original source text.
+- `msgstr` (and `msgstr_plural` for plural entries): Original translation.
+- `suggest_msgstr`: Suggested translation for the singular form.
+- `suggest_msgstr_plural`: Array of suggested translations for plural forms;
+  `suggest_msgstr` is empty for plural-only entries.
+- `score`: 0–3 (see scale below).
+- `description`: Brief summary of the issues found for this entry.
+- Score scale: 0 = critical (must fix before release), 1 = major (should fix),
+  2 = minor (improve later), 3 = perfect.
+
+
 ## Human translators remain in control
 
 Git translation is human-driven; language team leaders and contributors are
@@ -741,7 +927,13 @@ responsible for:
 - Building and maintaining language glossaries
 - Reviewing and approving all changes before submission
 
-AI tools, if used, only accelerate routine tasks.
+AI tools, if used, only accelerate routine tasks:
+
+- First-draft translations for new or updated messages
+- Finding untranslated or fuzzy entries
+- Checking consistency with glossary and existing translations
+- Detecting technical errors (placeholders, formatting)
+- Reviewing against quality criteria
 
 AI-generated output should always be treated as rough drafts requiring human
 review, editing, and approval by someone who understands both the technical
-- 
2.53.0.rc2.20.g532543fa46

