AI-Powered D&D Campaign Manager & Session Archival Pipeline
Spitwater Campaign: 4 Sessions Archived
GygaxBot turns raw Discord D&D sessions into illustrated, narrated adventure archives, from voice capture to searchable archive, fully automated.
Native Discord voice recording captures per-speaker WAV files from the session
Discord.js
Speaker-tagged audio is transcribed into a full session transcript with timestamps
faster-whisper
Local LLM analyzes the transcript and extracts 4-6 key dramatic scenes with visual descriptions
Nemotron / Ollama
Each scene gets an AI illustration using character reference images for visual consistency
ComfyUI / SDXL Lightning
Dramatic narration generated with campaign-specific voice profiles across 10 available voices
Kokoro TTS
Sessions are indexed into a vector database for semantic search across campaign history
ChromaDB
Character reference images are fed to the AI image generator to maintain visual consistency across all scene illustrations.
The party explores an ancient airship wreck, battles turquoise raptors, discovers a targeting system, and makes a very democratic decision involving a very large cannon. Each scene below was automatically extracted, illustrated, and narrated by the pipeline.
As the party explores the airship wreck, a sudden burst of light illuminates the observation dome. Three raptors emerge from the shadows, their turquoise scales glistening in the fading light. Threx stands firm against the onslaught, but the team's coordination is put to the test as they work together to take down the attackers.
Airship Observation Dome
As they venture deeper into the wreck, the party stumbles upon a mysterious targeting system. The airship's advanced technology lies dormant, its blue energy sphere pulsating with an otherworldly glow. Crazy88's skilled hands work to repair the malfunctioning circuit board, and the team holds their breath as he rolls a 23.
Airship Observation Dome
The party makes a democratic decision to fire the cannon at Gus's oasis, motivated by his previous extortion attempts. Chad enters the coordinates with precision, and the cannon roars to life as it unleashes a devastating blast that obliterates the oasis. The recoil is so immense that the cannon destroys itself.
Airship Observation Deck
As the night watch begins, Threx takes his turn to gaze out into the darkness. Suddenly, he notices a series of dust silhouettes pointing towards the deeper wreck entrance. A figure emerges from the shadows, gesturing wildly before vanishing into thin air. Was it real or just a mirage?
Airship Wreck
As Chad takes his turn on night watch, he's tasked with keeping an eye out for any potential threats. His attention is drawn to a rust rat scurrying across the hangar floor, caught in the faint light of his lantern. The party breathes a collective sigh of relief as they realize there are no immediate dangers lurking in the shadows.
Airship Hangar
Crazy88 takes his turn on night watch, but it's a quiet and uneventful shift. He stands vigilant, scanning the shadows for any signs of danger, but the only sound is the gentle creaking of metal as the airship adjusts to its new surroundings.
Airship Wreck
The full technology stack powering automated D&D session archival.
GygaxBot joins the Discord voice channel and records each speaker to separate WAV files. Per-speaker audio isolation ensures clean transcription even with overlapping voices, crosstalk, and background noise. Recording starts and stops with simple bot commands, capturing the entire session as raw audio.
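The per-speaker routing described above can be sketched in a few lines. The bot itself records through Discord.js; the Python below only illustrates the idea of keeping one WAV file per speaker. The `SpeakerRecorder` class, the file naming, and the 48 kHz 16-bit stereo format are assumptions for illustration, not details taken from GygaxBot.

```python
import wave
from pathlib import Path

# Assumed PCM format: 48 kHz, 16-bit, stereo. Adjust to whatever the
# capture layer actually delivers after decoding.
SAMPLE_RATE = 48_000
CHANNELS = 2
SAMPLE_WIDTH = 2  # bytes per sample


class SpeakerRecorder:
    """Hypothetical helper that routes PCM chunks into one WAV per speaker."""

    def __init__(self, session_dir: Path) -> None:
        self.session_dir = session_dir
        self.session_dir.mkdir(parents=True, exist_ok=True)
        self._writers = {}

    def write(self, speaker_id: str, pcm: bytes) -> None:
        """Append a PCM chunk to the speaker's file, creating it on first use."""
        writer = self._writers.get(speaker_id)
        if writer is None:
            path = self.session_dir / f"{speaker_id}.wav"
            writer = wave.open(str(path), "wb")
            writer.setnchannels(CHANNELS)
            writer.setsampwidth(SAMPLE_WIDTH)
            writer.setframerate(SAMPLE_RATE)
            self._writers[speaker_id] = writer
        writer.writeframes(pcm)

    def close(self) -> list[Path]:
        """Finalize every WAV header and return the recorded files."""
        for writer in self._writers.values():
            writer.close()
        self._writers.clear()
        return sorted(self.session_dir.glob("*.wav"))
```

Because each speaker has a separate file, overlapping speech never has to be separated after the fact; the transcription step can treat every file as a single-voice recording.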
Per-speaker audio files are processed through faster-whisper, a CTranslate2-optimized Whisper implementation that runs locally. Speaker tags are preserved throughout, producing a full session transcript with timestamps and speaker attribution. The result is a structured document ready for scene extraction.
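Once each speaker's file has been transcribed, the separate segment lists have to be interleaved into one timeline. The sketch below shows one way to do that with the standard library only. The `Segment` type and function names are hypothetical, and the sketch assumes every per-speaker recording starts at the same moment, so timestamps are directly comparable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    speaker: str
    start: float  # seconds from session start (assumed shared clock)
    end: float
    text: str


def merge_segments(per_speaker: dict[str, list[Segment]]) -> list[Segment]:
    """Interleave every speaker's segments into a single timeline."""
    merged = [seg for segs in per_speaker.values() for seg in segs]
    # Ties on start time are broken by speaker name to keep output stable.
    return sorted(merged, key=lambda s: (s.start, s.speaker))


def format_timestamp(seconds: float) -> str:
    """Render seconds as HH:MM:SS, truncating fractions."""
    total = int(seconds)
    return f"{total // 3600:02d}:{total % 3600 // 60:02d}:{total % 60:02d}"


def render_transcript(segments: list[Segment]) -> str:
    """Produce one line per segment with timestamp and speaker attribution."""
    return "\n".join(
        f"[{format_timestamp(s.start)}] {s.speaker}: {s.text.strip()}"
        for s in segments
    )
```

In a real run, the `start`, `end`, and `text` values would come from the segments that faster-whisper yields for each file, with the speaker name taken from the file the segment came from.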
The transcribed session is sent to a locally-hosted LLM via Ollama (currently Nemotron). A carefully crafted prompt instructs the model to identify 4-6 key dramatic moments, prioritizing combat, dramatic events, exploration, and atmospheric scenes. The model outputs structured JSON with scene titles, visual descriptions for image generation, narration scripts for TTS, character lists, and locations.
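LLM output is not guaranteed to match the requested shape, so it is worth validating the JSON before it reaches the image and TTS stages. The following is a minimal sketch of such a check. The field names (`title`, `visual_description`, `narration`, `characters`, `location`) and the optional top-level `scenes` key are assumptions that mirror the description above; the actual schema used by GygaxBot is not shown in this document.

```python
import json
from dataclasses import dataclass

# Assumed field names, based on the outputs listed in the text.
REQUIRED_FIELDS = ("title", "visual_description", "narration", "characters", "location")


@dataclass
class Scene:
    title: str
    visual_description: str
    narration: str
    characters: list
    location: str


def parse_scenes(raw: str, min_scenes: int = 4, max_scenes: int = 6) -> list:
    """Parse model output and reject anything outside the expected shape."""
    data = json.loads(raw)
    # Accept either a bare list or an object wrapping the list under "scenes".
    items = data.get("scenes") if isinstance(data, dict) else data
    if not isinstance(items, list):
        raise ValueError("expected a list of scenes")
    if not min_scenes <= len(items) <= max_scenes:
        raise ValueError(f"expected {min_scenes}-{max_scenes} scenes, got {len(items)}")
    scenes = []
    for index, item in enumerate(items):
        missing = [f for f in REQUIRED_FIELDS if f not in item]
        if missing:
            raise ValueError(f"scene {index} is missing fields: {missing}")
        scenes.append(Scene(**{f: item[f] for f in REQUIRED_FIELDS}))
    return scenes
```

A failed validation is a natural point to retry the model call rather than pass a malformed scene downstream.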
Each scene's visual description is enhanced with campaign-specific style tags (western frontier aesthetic, sepia tones, warm sunset lighting) and sent to ComfyUI running locally on GPU with SDXL Lightning. The system automatically finds character reference images on disk and includes them for visual consistency: Chad always looks like a centaur, and Threx always looks like a red dragonborn. Alternative backends, including Gemini image generation, are available as a fallback.
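The prompt enrichment and reference lookup can be illustrated without touching ComfyUI at all. The sketch below appends the style tags named above to a scene description and matches character names to image files by file name. The matching rule (file stem equals character name, case-insensitive), the accepted suffixes, and both function names are assumptions for illustration.

```python
from pathlib import Path

# Style tags as listed in the text; the tuple itself is illustrative.
STYLE_TAGS = ("western frontier aesthetic", "sepia tones", "warm sunset lighting")
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp"}


def build_prompt(visual_description: str, style_tags=STYLE_TAGS) -> str:
    """Append campaign style tags to a scene's visual description."""
    base = visual_description.strip().rstrip(".")
    return ", ".join([base, *style_tags])


def find_reference_images(characters: list, ref_dir: Path) -> dict:
    """Map each character to an image whose file stem matches its name."""
    if not ref_dir.is_dir():
        return {}
    by_stem = {}
    for path in sorted(ref_dir.iterdir()):
        if path.suffix.lower() in IMAGE_SUFFIXES:
            by_stem.setdefault(path.stem.lower(), path)
    # Characters without a reference image are simply left out.
    return {
        name: by_stem[name.lower()]
        for name in characters
        if name.lower() in by_stem
    }
```

Characters with no match are omitted rather than treated as errors, so a scene with a new NPC can still be illustrated from the text prompt alone.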
Kokoro TTS provides high-quality dramatic narration with 10 available voice profiles. Each campaign can be assigned a specific narrator voice for consistency across sessions. The TTS service runs locally on CPU, outputs 24kHz WAV files, and integrates directly with the pipeline for fully automated narration generation.
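Two small pieces of glue follow from this description: choosing the narrator voice for a campaign and sanity-checking the generated audio. The sketch below covers both with the standard library. The voice IDs and the campaign table are placeholders, not real Kokoro voice names, and the TTS call itself is deliberately left out.

```python
import wave
from pathlib import Path

EXPECTED_SAMPLE_RATE = 24_000  # the text states 24 kHz WAV output

# Hypothetical campaign-to-voice table; the voice IDs are placeholders.
CAMPAIGN_VOICES = {"spitwater": "voice_03"}
DEFAULT_VOICE = "voice_01"


def narrator_voice(campaign: str) -> str:
    """Return the campaign's assigned voice, or the default if none is set."""
    return CAMPAIGN_VOICES.get(campaign.strip().lower(), DEFAULT_VOICE)


def narration_duration(path: Path) -> float:
    """Return clip length in seconds, rejecting an unexpected sample rate."""
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        if rate != EXPECTED_SAMPLE_RATE:
            raise ValueError(f"{path.name}: expected {EXPECTED_SAMPLE_RATE} Hz, got {rate} Hz")
        return wav.getnframes() / rate
```

Keeping the voice lookup in one place means every session in a campaign gets the same narrator without the caller having to remember which voice was used last time.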
Every session is chunked and indexed into a ChromaDB vector store using sentence transformer embeddings. This enables semantic search across all campaign history — ask "when did the party find the cannon?" and get relevant transcript sections, scene descriptions, and narration from any session. The RAG system powers campaign context recall and cross-session narrative continuity.
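The chunk-then-search flow can be shown in miniature with the standard library. In the sketch below, a toy bag-of-words vector stands in for the sentence transformer embeddings and a plain list stands in for ChromaDB; neither reflects the real models or storage. The chunk size, overlap, and all function names are assumptions.

```python
import math
import re
from collections import Counter


def chunk_lines(lines: list, size: int = 4, overlap: int = 1) -> list:
    """Split transcript lines into overlapping windows of `size` lines."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    step = size - overlap
    chunks = []
    for start in range(0, len(lines), step):
        chunks.append("\n".join(lines[start:start + size]))
        if start + size >= len(lines):
            break  # the last window already reached the end
    return chunks


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def search(query: str, chunks: list, top_k: int = 3) -> list:
    """Return the top_k (score, chunk) pairs, best match first."""
    q = embed(query)
    scored = [(cosine(q, embed(c)), c) for c in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:top_k]
```

Word counts only match exact tokens, so this toy version would miss a query about "the big gun" even when a chunk mentions the cannon. Learned embeddings are meant to close that gap.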
The entire pipeline is orchestrated by a FastAPI server. When GygaxBot triggers archival, the server processes everything asynchronously: transcription, scene extraction, TTS narration (parallel), and image generation (GPU, sequential). Results are posted back to Discord and published to the web archive. Campaign management, party rosters, dice rolling, and session history are all handled through the bot.
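The stage ordering can be sketched with plain `asyncio`, leaving FastAPI and the real services out. Every stage function below is a stub with a placeholder body, and the names are hypothetical. The sketch reads "parallel" as narration clips being generated concurrently while images are produced one at a time, which is one interpretation of the description above.

```python
import asyncio


async def transcribe(session_id: str) -> str:
    await asyncio.sleep(0)  # placeholder for the real transcription call
    return f"transcript for {session_id}"


async def extract_scenes(transcript: str) -> list:
    await asyncio.sleep(0)  # placeholder for the LLM call
    return ["scene-1", "scene-2", "scene-3", "scene-4"]


async def narrate(scene: str) -> str:
    await asyncio.sleep(0)  # placeholder for the TTS call
    return f"{scene}.wav"


async def illustrate(scene: str) -> str:
    await asyncio.sleep(0)  # placeholder for the image generation call
    return f"{scene}.png"


async def illustrate_all(scenes: list) -> list:
    images = []
    for scene in scenes:  # one at a time, so a single GPU is not oversubscribed
        images.append(await illustrate(scene))
    return images


async def archive_session(session_id: str) -> dict:
    transcript = await transcribe(session_id)
    scenes = await extract_scenes(transcript)
    # Start all narration jobs now; they run while images render in sequence.
    narration = asyncio.gather(*(narrate(s) for s in scenes))
    images = await illustrate_all(scenes)
    audio = await narration
    return {"transcript": transcript, "scenes": scenes, "audio": audio, "images": images}
```

Calling `asyncio.run(archive_session("session-04"))` runs the whole chain; in a server setting the same coroutine could be scheduled as a background job so the triggering request returns immediately.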