How This Started

Last updated: March 24, 2026

The problem I was tired of

I play a lot of board games. Complex ones -- the kind where mid-game rule disputes stop the table for 10 minutes while someone excavates a 60-page rulebook looking for the one sentence that covers this exact edge case.

I got tired of it.

Not the complexity -- I love complex games. I got tired of the friction. The rulebook was right there, the answer was somewhere in it, and yet finding it took longer than just playing the disputed move and moving on.

The first version

The first version of Board Game Librarian was a Telegram bot. I built it in late 2025. It could answer questions about exactly one game -- the one I was playing most at the time. The pipeline was simple: upload PDF, chunk it, embed it, query it. The bot lived in our gaming group's Telegram channel.
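That pipeline can be sketched in a few lines. This is an illustration, not the bot's actual code: the chunking parameters, the sample rulebook text, and the bag-of-words "embedding" (a toy stand-in for a real embedding model) are all assumptions made for the example.

```python
import math
from collections import Counter

def chunk(text, size=12, overlap=4):
    """Split rulebook text into overlapping word-window chunks."""
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]

def embed(text):
    """Toy bag-of-words 'embedding' -- a stand-in for a real embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def query(question, index):
    """Return the chunk most similar to the question."""
    q = embed(question)
    return max(index, key=lambda pair: cosine(q, pair[1]))[0]

# Build a tiny index from an invented rulebook snippet, then query it.
rules = ("Players draw two cards each turn. A player may discard one card "
         "to gain a coin. The game ends when the deck is empty.")
index = [(c, embed(c)) for c in chunk(rules)]
print(query("when does the game end?", index))
```

A real deployment would swap the toy embedding for a model and the linear scan for a vector store, but the upload-chunk-embed-query shape stays the same.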

It worked well enough that I kept improving it.

From one game to a library

The jump from "works for one game" to "works for any game" required solving the game detection problem. Not every question includes the game name. "Can I take back a move?" could be about any of a hundred games. I built the game context system to track which game the conversation is about, and the BGG integration to discover new games.
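The core of the context idea is small enough to sketch. The three-game library and substring matching here are toy assumptions -- the real system resolves titles via BGG and has a full library behind it -- but the fallback behavior is the point: a question that names no game inherits the conversation's current game.

```python
# Hypothetical subset of the library, used only for this sketch.
KNOWN_GAMES = {"wingspan", "terraforming mars", "root"}

class GameContext:
    """Tracks which game a conversation is currently about."""

    def __init__(self):
        self.current = None

    def resolve(self, message):
        """Switch context when a message names a known game; otherwise
        assume the question is about the game already under discussion."""
        text = message.lower()
        for game in KNOWN_GAMES:
            if game in text:
                self.current = game
                break
        return self.current

ctx = GameContext()
print(ctx.resolve("How does end-of-round scoring work in Wingspan?"))
print(ctx.resolve("Can I take back a move?"))  # no game named: stays on Wingspan
```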

By the time I had 100 games in the library, the architecture had evolved significantly:

  • On-demand chunking -- batch-processing 3,000 PDFs nobody has asked about is wasteful, so rulebooks are chunked only when someone first asks about them
  • Two-tier escalation -- queries the rulebook can't answer confidently escalate to forum knowledge, which handles 95% of edge cases
  • Multi-language support -- most of my users were not English speakers
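The escalation tier reduces to a simple rule: answer from the rulebook when retrieval confidence is high, fall back to forum knowledge otherwise. A minimal sketch -- the threshold value, tier functions, and sample answers are all stand-ins, not the production pipeline:

```python
THRESHOLD = 0.7  # assumed confidence cutoff; a real value would be tuned

def answer(question, rulebook_tier, forum_tier, threshold=THRESHOLD):
    """Tier 1: rulebook retrieval. Escalate to tier 2 (forum knowledge)
    only when the rulebook answer's confidence falls below the threshold."""
    text, confidence = rulebook_tier(question)
    if confidence >= threshold:
        return text, "rulebook"
    return forum_tier(question), "forum"

# Stub tiers standing in for the real retrieval pipeline.
def rulebook_tier(q):
    # Pretend the rulebook covers setup questions well, edge cases poorly.
    if "start" in q:
        return "Each player starts with 5 cards.", 0.9
    return "(unsure)", 0.3

def forum_tier(q):
    return "Designer ruling: yes, but only once per round."

print(answer("How many cards do I start with?", rulebook_tier, forum_tier))
print(answer("Can I chain two discard effects?", rulebook_tier, forum_tier))
```

The design choice worth noting is that the rulebook tier reports confidence rather than deciding for itself, so the escalation policy stays in one place and the threshold can be tuned without touching either tier.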

What it is now

Board Game Librarian is now a multi-channel system: a Telegram bot, a web chat, and an embeddable widget for publisher websites. The library holds 3,300+ rulebooks across ten languages.

The architecture is more complex than I originally intended -- 20 PM2 services, a custom RAG pipeline, vector search, BERTopic corpus analysis. But each piece was added because the simpler version had a real failure mode that needed addressing.

What I learned

The hardest problem was not AI or vector search. It was data quality. Getting clean text out of PDFs with two-column layouts, footnotes, and tables-within-tables is genuinely hard. Apache Tika handles it better than I expected, but it still produces errors on edge cases.

The second hardest problem was language. Not adding language support -- that was straightforward. The hard part was ensuring that the system never confidently answers in the wrong language, never mixes languages mid-response, and never uses game-specific vocabulary to guess the language (because "Wingspan" appears in Italian sentences too).
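One way to sketch that last constraint: score candidate languages by stopwords only, with game titles stripped out of the signal entirely, and refuse to guess when no language clearly wins. The two stopword lists and the title set below are toy assumptions for the example, not the system's actual detector:

```python
# Toy stopword lists for two of the supported languages.
STOPWORDS = {
    "en": {"the", "and", "how", "does", "when", "can", "is"},
    "it": {"il", "e", "come", "quando", "può", "è", "si"},
}
# Game titles carry no language signal: "Wingspan" appears in Italian too.
GAME_TITLES = {"wingspan", "root", "azul"}

def _norm(token):
    return token.strip("?.,!¿¡\"'").lower()

def guess_language(message):
    """Score each language by stopword hits, ignoring game titles.
    Returns None when no language clearly wins -- refusing to guess
    beats answering confidently in the wrong language."""
    tokens = [_norm(t) for t in message.split()]
    tokens = [t for t in tokens if t and t not in GAME_TITLES]
    scores = {lang: sum(t in words for t in tokens)
              for lang, words in STOPWORDS.items()}
    ranked = sorted(scores.values(), reverse=True)
    best = max(scores, key=scores.get)
    return best if ranked[0] > ranked[1] else None

print(guess_language("Come funziona Wingspan quando il mazzo è vuoto?"))
print(guess_language("Wingspan?"))  # no stopwords left: refuse to guess
```

The "return None on a tie" branch is the part that matters: a title-only message like "Wingspan?" yields no usable signal once the title is excluded, and the safe behavior is to ask rather than answer in a guessed language.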