Ten languages, one answer

Last updated: March 23, 2026

Board Game Librarian answers rules questions in the language you ask them in. No language selector, no profile setting, no dropdown. Type in Polish, get Polish back. Type in Japanese, Japanese comes out.

Under the hood, this involves three distinct problems: detecting what language you're using, finding the right answer (almost always from English rulebooks), and writing that answer fluently in your language. Getting any one of them wrong breaks the whole thing.

How language detection works

Every question passes through language detection before anything else happens. The orchestrator runs it before any game lookup, before any database query, before any AI call. Language first. Everything else second.

Detection runs via lingua-js, a Node.js library that uses statistical models trained on actual text corpora. It doesn't just check character sets — it analyses grammatical patterns, common word sequences, and phonotactic signatures to assign a confidence score across all 10 supported languages: English, Italian, German, French, Spanish, Portuguese, Russian, Japanese, Polish, and Chinese (simplified).

Once the language is detected, it travels through the entire pipeline as a tagged property on the request. Every prompt template receives it. The synthesis instruction at the end always includes an explicit directive: write the response in {language}. Not implied. Not hoped for. Required.
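As a sketch, the tagging might look like this. The names (`RulesQuery`, `buildSynthesisPrompt`) are illustrative, not the actual codebase:

```typescript
// Illustrative sketch: the detected language rides along on the request
// object, and the synthesis prompt always ends with an explicit
// language directive. All names here are hypothetical.
interface RulesQuery {
  text: string;
  language: string; // ISO 639-1 code, set before any other processing
}

function buildSynthesisPrompt(chunks: string[], language: string): string {
  // Retrieved rulebook chunks first, explicit output directive last.
  return `${chunks.join("\n\n")}\n\nWrite the response in ${language}.`;
}

const query: RulesQuery = {
  text: "Wie funktioniert der Kampf?",
  language: "de", // set by the detector before anything else runs
};

const prompt = buildSynthesisPrompt(
  ["Combat is resolved by comparing..."],
  query.language,
);
```

Because the directive is appended unconditionally, no downstream step can silently drop the language requirement.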

Why structural words, not content words

Board game names are language-neutral. "Root" is "Root" in Italian. "Wingspan" is "Wingspan" in French. Players use the original title regardless of the language they're asking in. If detection tried to work from the nouns and proper nouns in a question, it'd constantly misfire on game-specific vocabulary.

Structural words are different. Articles, pronouns, prepositions, and conjunctions are language-specific in a way that content words aren't. "Come funziona il combattimento?" has "come" and "il" — Italian structure words. "Wie funktioniert der Kampf?" has "wie" and "der" — German. Both questions ask the same thing about combat mechanics, but the structure words nail the language unambiguously.
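A toy scorer over structural words illustrates the idea. This is a deliberately simplified sketch: the real lingua-js models are statistical, not hand-written word lists.

```typescript
// Toy illustration: score a question against small lists of structural
// words. The real detector uses statistical models over text corpora,
// not lookup lists like these.
const STRUCTURE_WORDS: Record<string, string[]> = {
  it: ["come", "il", "la", "quando", "posso", "non"],
  de: ["wie", "der", "die", "wann", "darf", "nicht"],
  en: ["how", "the", "when", "can", "not", "does"],
};

function guessLanguage(question: string): string {
  const words = question.toLowerCase().split(/[^\p{L}]+/u).filter(Boolean);
  let best = "en";
  let bestHits = 0;
  for (const [lang, markers] of Object.entries(STRUCTURE_WORDS)) {
    const hits = words.filter((w) => markers.includes(w)).length;
    if (hits > bestHits) {
      best = lang;
      bestHits = hits;
    }
  }
  return best;
}

// Game terms don't matter; "come"/"il" vs "wie"/"der" decide it.
guessLanguage("Come funziona il combattimento?"); // "it"
guessLanguage("Wie funktioniert der Kampf?");     // "de"
```

Note that swapping "combattimento" for an English game title like "Root" wouldn't change the result, which is exactly the property the detector needs.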

The implication: lingua-js isn't looking at what you're asking about. It's looking at how you're asking it. That distinction matters when your corpus is full of English game terms regardless of the question language.

Short questions are the hard case. A two-word query like "fog combat?" gives the detector very little to work with. In practice, the system falls back to the user's saved language preference when confidence is low. Longer, more grammatically complete questions get more reliable detection.
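That fallback can be sketched as a simple confidence threshold. The 0.7 cutoff and the function shape are assumptions for illustration; the real threshold isn't documented here.

```typescript
// Hypothetical sketch of the low-confidence fallback. The threshold
// value is illustrative only.
interface Detection {
  language: string;
  confidence: number; // 0..1
}

function resolveDetection(
  detected: Detection,
  savedPreference: string,
  threshold = 0.7,
): string {
  return detected.confidence >= threshold
    ? detected.language
    : savedPreference;
}

// "fog combat?" gives the detector almost nothing, so we fall back.
resolveDetection({ language: "en", confidence: 0.35 }, "pl"); // "pl"
resolveDetection({ language: "it", confidence: 0.96 }, "pl"); // "it"
```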

The cross-lingual embedding trick

The rulebooks are almost entirely in English. German players ask in German, the answer gets retrieved from an English PDF, and the response comes back in German. This works because of how the embedding model operates.

The model in use is jina-v2-small-en, a 768-dimensional sentence transformer trained on multilingual data despite its name. A French question about resource production and an English rulebook paragraph about resource production will produce similar vectors. Not identical — but close enough that vector search retrieves the right content.

The model has learned that concepts map to similar regions of the vector space regardless of the surface language. "Wie viele Ressourcen produziert ein Wald?" and "How many resources does a forest produce?" point at roughly the same neighbourhood in embedding space.
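The intuition can be shown with cosine similarity on toy vectors. Real jina-v2-small-en embeddings are 768-dimensional; these made-up three-dimensional ones just illustrate "same neighbourhood":

```typescript
// Toy illustration of cross-lingual alignment: the German and English
// versions of the same question land near each other in vector space.
// These 3-d vectors are invented for the example.
function cosine(a: number[], b: number[]): number {
  const dot = a.reduce((s, v, i) => s + v * b[i], 0);
  const norm = (v: number[]) => Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  return dot / (norm(a) * norm(b));
}

const deQuestion = [0.81, 0.58, 0.09]; // "Wie viele Ressourcen produziert ein Wald?"
const enQuestion = [0.79, 0.60, 0.12]; // "How many resources does a forest produce?"
const enOffTopic = [0.10, 0.15, 0.98]; // unrelated rulebook chunk

cosine(deQuestion, enQuestion); // close to 1.0
cosine(deQuestion, enOffTopic); // much lower
```

Vector search only needs the cross-lingual pair to be closer than any off-topic chunk, not identical, which is why "roughly the same neighbourhood" is good enough.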

The quality of this transfer depends on how much multilingual data the model saw during training. English, German, French, Spanish, and Italian are well-represented. Russian, Polish, Japanese, and Chinese work, but the vector alignment is somewhat looser. Retrieval is still reliable — just not quite as tight.

One thing that doesn't change: the retrieved content is always in English (from English rulebooks), and the answer-writing step is always in the detected question language. The cross-lingual bridge handles retrieval. Explicit language instruction in the prompt handles output.

Priority: question language always wins

Three inputs can influence the output language:

  1. The language of the current question
  2. The user's saved language preference
  3. The widget's configured locale (if accessed through a partner widget)

The question language is always primary. If you've saved Italian as your preference but ask a question in German, you get a German answer. The system trusts what you're doing right now over what you usually do.

Widget locale comes last. A publisher might configure their widget for an Italian-speaking audience, but if a user types in English, they still get English. The widget's locale setting acts as a fallback, not an override.

Why this ordering? Language is a real-time signal, not a stored attribute. People switch languages. Non-native speakers sometimes ask in English for precision even if they'd normally use their mother tongue. A system that ignores what you actually typed in favour of a stored preference would be actively annoying. The saved preference only activates when the current question doesn't give enough signal — typically very short queries where detection confidence falls below the threshold.
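The priority order described above collapses into one small function. This is a sketch with hypothetical names, not the actual implementation:

```typescript
// Hypothetical sketch of the priority chain: detected question language
// wins, then saved preference, then widget locale, then English.
function pickOutputLanguage(
  detected: string | null, // null when detection confidence is too low
  savedPreference: string | null,
  widgetLocale: string | null,
): string {
  return detected ?? savedPreference ?? widgetLocale ?? "en";
}

pickOutputLanguage("de", "it", "it"); // "de": the question always wins
pickOutputLanguage(null, "it", "en"); // "it": saved preference as fallback
pickOutputLanguage(null, null, "it"); // "it": widget locale comes last
```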

Which languages work best

Not all 10 languages are equal.

English, Italian, German, French, and Spanish produce the most fluent, natural-sounding answers. This reflects the AI model's training data distribution — these languages have the most text in large language model training corpora, which means more examples of grammatical structures, idiomatic expressions, and domain-specific vocabulary.

Portuguese is close behind. Slightly less natural in very technical rule explanations, but solid for most questions.

Russian works well in terms of accuracy but occasionally shows minor stylistic awkwardness. The Cyrillic script is handled correctly throughout the pipeline and in the frontend rendering.

Polish, Japanese, and Chinese are genuinely supported — the answers are correct and the content is accurately conveyed — but grammatical precision can slip on complex nested rule explanations. The facts will be right. The phrasing might not be perfect.

This isn't a flaw in detection or retrieval. Supporting a language and producing native-quality prose in that language are different bars, and current AI models don't clear both bars equally for all 10 languages.

A walkthrough

Say a Spanish-speaking player asks: "¿Puedo jugar una acción cuando no es mi turno?"

The question hits the orchestrator. Before anything else, lingua-js analyses the structural words — "¿Puedo", "no es", "mi" are unambiguous Spanish markers. Language set to es.

The question gets embedded by jina-v2-small-en. The resulting 768-dimensional vector captures the meaning — a question about out-of-turn actions.

Vector search runs against the game's PDF chunks. English chunks about turn order, reactions, and interrupt actions are retrieved based on semantic similarity. The model's multilingual training means the Spanish question vector is close enough to those English chunks.
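Retrieval at this step can be sketched as nearest-neighbour search over chunk vectors. The 2-d vectors and chunk store below are assumptions for illustration; real vectors come from jina-v2-small-en.

```typescript
// Toy top-k vector search: rank English chunk vectors by cosine
// similarity to the (Spanish) question vector. Vectors are invented.
interface Chunk {
  text: string;
  vector: number[];
}

function cosineSim(a: number[], b: number[]): number {
  const dot = a.reduce((s, v, i) => s + v * b[i], 0);
  const mag = (v: number[]) => Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  return dot / (mag(a) * mag(b));
}

function topK(query: number[], chunks: Chunk[], k: number): Chunk[] {
  return [...chunks]
    .sort((x, y) => cosineSim(query, y.vector) - cosineSim(query, x.vector))
    .slice(0, k);
}

const questionVec = [0.9, 0.1]; // "¿Puedo jugar una acción cuando no es mi turno?"
const chunks: Chunk[] = [
  { text: "Reactions may be played out of turn...", vector: [0.88, 0.15] },
  { text: "Setup: place the board in the centre...", vector: [0.05, 0.99] },
];

topK(questionVec, chunks, 1)[0].text; // the out-of-turn reactions chunk
```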

The top relevant chunks go into the synthesis prompt. The prompt ends with: "Write your response in Spanish."

The AI writes the answer in Spanish, citing the relevant rulebook pages. Total detection overhead: milliseconds.

Known limitations

Very short questions are sometimes misdetected. A single-word query like "simultaneity?" gives the detector almost nothing to work with.

Code-switching — mixing languages in a single question — can confuse detection. A player who writes "In Root, wann darf ich Dominance Cards spielen?" (mixing English game terms with German structure) will probably get tagged as German, but it's not guaranteed.

Less-resourced languages produce answers that are correct but occasionally grammatically imprecise, as noted in the language quality section above. It's a limitation of current language model capabilities, not a bug.

Traditional Chinese isn't supported. The system uses simplified Chinese (zh-Hans). Players writing in traditional characters may get responses in simplified form.

Rulebook accuracy is language-independent. If the English rulebook is ambiguous, the answer in any language will reflect that ambiguity. Translation doesn't fix unclear source material.

Questions

Can I override the detected language?

No. The system always uses the language of your current question. If you want a different language, ask your question in that language.

Does it work if I mix English and another language?

Usually. The structural words in the non-English part of a mixed question typically dominate detection. But it's not guaranteed for very mixed or very short queries.

Why doesn't it detect from my browser's language settings?

Browser locale is a poor signal. Many people use browsers set to one language while doing other things in another. The question itself is always the more reliable indicator.

Will more languages be added?

The system can technically support any language the underlying AI models cover. New languages require validation of detection accuracy and output fluency before being added.

Does it work for non-Latin scripts?

Yes. Cyrillic (Russian), hiragana/katakana/kanji (Japanese), and Chinese characters are all handled correctly in detection, storage, retrieval, and display.