zohar-translator

A long-form LLM translation system: tens of thousands of paragraphs, subscription-limit bypass, automatic publishing. Deployment runs through RUN_ME.md, which is read by the operator's LLM agent.

What it is and why it exists

Translating the 1700-page commentary "Perush ha-Sulam" on the Book of Zohar from Hebrew and Aramaic into Russian turned out to be possible within the limits of a single personal Claude subscription (using the Opus 4.7 model). zohar-translator is the core of the system that performed that translation; it now works with any long corpus and any language pair.

The input is a text catalog (chapters → articles → paragraphs) and a Claude subscription. The output is a static site with the translated corpus, continuous paragraph numbering, chapters, and footnotes. Between them sits the orchestrator: it slices articles into chunks by character budget, runs translator agents in parallel, waits out the 5-hour and weekly subscription windows instead of stopping, and pushes the result to GitHub Pages as each chapter closes.

A working reference is our translation of the Book of Zohar: imyavel.github.io/zohar-sulam (CC BY 4.0 license, stated on the site). It runs on exactly the same bundle that you can deploy yourself.

To deploy the system for your own corpus, the operator installs Claude Code and tells it "read RUN_ME.md and walk me through it step by step". From there the LLM agent guides the operator through 8 adaptation stages; no technical background is required.

Deployment stages

Each stage lives in its own file, stages/NN_*.md. The operator's LLM agent loads them one by one, asks the operator questions of the form "(Q N of NN: …)", and records the answers in progress.json, so a session can be interrupted at any point and resumed in a new one from the same place.
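
The resume behaviour can be pictured with a minimal sketch. The schema used here (a current stage number plus a dictionary of answers per stage) and the helper names load_progress and record_answer are assumptions made for illustration; the real layout of progress.json is whatever RUN_ME.md and the stage files define.

```python
import json
import tempfile
from pathlib import Path


def load_progress(path: Path) -> dict:
    """Return saved progress, or a fresh record if no file exists yet."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {"stage": 1, "answers": {}}  # hypothetical schema


def record_answer(path: Path, stage: int, question: str, answer: str) -> dict:
    """Store one answer and write the file immediately, so an interrupted
    session keeps everything that was already answered."""
    progress = load_progress(path)
    progress["stage"] = stage
    progress["answers"].setdefault(str(stage), {})[question] = answer
    path.write_text(
        json.dumps(progress, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return progress


def demo() -> dict:
    """Simulate one session writing an answer and a later session resuming."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "progress.json"
        record_answer(path, 2, "source", "sefaria")
        # A new session re-reads the file and continues from stage 2.
        return load_progress(path)
```

Writing the file after every answer, rather than once per stage, is what lets a session stop at any question and pick up from the same place.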

  1. Environment: install Python dependencies and verify that the stock GUI launches. Without this, the later stages cannot run. See stages/01_setup.md.
  2. Source loader: where the corpus is loaded from. For texts on Sefaria there is a ready fast path through reference/source_loader/download_sefaria.py. For any other source, the operator writes a loader that produces the same format (JSON paragraphs grouped by chapter). See stages/02_source_loader.md.
  3. Text structure: the chunking units, that is, what counts as an "article" (the unit of translation) and how paragraphs are sliced into chunks by character budget. For Zohar the hierarchy is chapters → articles → Sulam paragraphs; another corpus needs an analogous three-level hierarchy. See stages/03_text_structure.md.
  4. Glossary: the glossary of terms. You can take our Zohar glossary as a starting point (when translating Zohar itself), or reuse only the file structure and methodology. The translator agent queries the glossary through a CLI tool rather than being given the entire content. See stages/04_glossary.md.
  5. Prompt template: translation style (literal / literary / mixed), formatting rules, and how to mark "creative" passages with translator footnotes. The LLM agent adapts the template in templates/translation_prompt.md according to the operator's choices. See stages/05_prompt.md.
  6. Publish target: where the result is published. The options are GitHub Pages via our template (auto-deploy through src/gh_deploy.py), your own channel (S3 / GitLab / your own server), or local-only without publishing. See stages/06_publish.md.
  7. Smoke run: a short end-to-end run on a synthetic mini-corpus. It verifies that the whole pipeline (chunking → translator → resume → commit) works on the adapted system in minutes, without spending real subscription quota on the full corpus. See stages/07_smoke.md.
  8. Hand-off: the operator launches the GUI on the full corpus and monitors it through the Telegram bot. From this point the deployment LLM agent steps away and the system runs on its own. See stages/08_handoff.md.

Translator architecture (GUI + Telegram)

A detailed description is in ARCHITECTURE.md (9 sections, including orchestrator FSM, parallelism, limit bypass, chunking+resume, gh_deploy, extension points, and recovery scripts). The essentials are summarized here.

  • GUI (src/gui.pyw): the main window with the batch queue, article statuses, chunking budget, and start/stop controls. This is the operator's entry point.
  • Telegram bot (src/bot.py): sends notifications about chapter completion, the 5-hour limit, the weekly limit, and errors, and accepts commands for resume and status checks. Optional (launch with --no-bot to disable it).
  • Orchestrator (src/orchestrator.py): an FSM with states PREPARING → RUNNING → COMPLETED / HIT_LIMIT / WEEKLY_LIMIT / FAILED. It handles retries, restores state after crashes, and manages the parallelism of translator agents.
  • Chunking: paragraphs are grouped into chunks by a character budget measured on the source text (about 7500 characters by default). A paragraph is never cut in the middle; a paragraph larger than the budget becomes a chunk of its own, in full.
  • Resume: if a translator dies in the middle of an article (5-hour limit, network failure, OOM), the next run reads what was already translated, finds the last fully written paragraph, and continues from the next one. No duplicates are written and numbering stays continuous.
  • Limit bypass: when the 5-hour subscription window is exhausted, the orchestrator puts the batch into WAITING, sleeps until the window resets, and continues. When the weekly limit is hit, it pauses until the reset and sends a Telegram notification. The operator does not need to do anything manually between windows.
  • gh_deploy (src/gh_deploy.py): after each closed chapter, performs a commit and push to main; GitHub Pages picks it up and updates the public site. Finished chapters appear on the site as the translation progresses, without waiting for the full corpus.
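
The chunking and resume rules above can be sketched in a few lines. This is an illustration under stated assumptions, not the code in src/orchestrator.py: the function names are hypothetical, the grouping is a simple greedy pass, and "fully written" is approximated as "has a non-empty translation stored", which may be weaker than the check the real system performs.

```python
from typing import Dict, List

DEFAULT_BUDGET = 7500  # characters of source text per chunk (default quoted above)


def chunk_paragraphs(paragraphs: List[str], budget: int = DEFAULT_BUDGET) -> List[List[int]]:
    """Group paragraph indices into chunks whose total length fits the budget.

    A paragraph is never split: one that exceeds the budget on its own
    becomes a single-paragraph chunk.
    """
    chunks: List[List[int]] = []
    current: List[int] = []
    used = 0
    for index, text in enumerate(paragraphs):
        size = len(text)
        # Close the current chunk if this paragraph would overflow it.
        if current and used + size > budget:
            chunks.append(current)
            current, used = [], 0
        current.append(index)
        used += size
    if current:
        chunks.append(current)
    return chunks


def resume_index(translated: Dict[int, str], total: int) -> int:
    """Return the index of the first paragraph that still needs translating.

    `translated` maps paragraph index to finished text (assumed layout).
    The scan stops at the first gap, so anything after a gap is redone
    rather than trusted.
    """
    index = 0
    while index < total and translated.get(index, "").strip():
        index += 1
    return index
```

For example, four paragraphs of 3000, 9000, 2000, and 5000 characters yield three chunks: the first paragraph alone, the oversized second paragraph alone, and the last two together. Continuing from resume_index rather than from the chunk boundary is what keeps the output free of duplicates and the numbering continuous.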

Feedback

This deployment mechanism has not yet been tested on other machines or by other people, so I would be grateful to the first volunteers who go through installation and adaptation to their own corpus by themselves. Please send feedback on any rough edges, omissions, or outright errors in the instructions to imyavel@gmail.com.

Sources, RUN_ME, and issues are on github.com/imyavel/zohar-translator. License: MIT for code and documentation; the Zohar reference translation is CC BY 4.0.