Reel.
Short-form video production is repetitive, expensive, and slow when done manually. Reel is a seven-stage automated pipeline that takes a topic brief and produces a fully assembled, published short-form video. Script generation, image synthesis, video rendering, voice synthesis, audio mixing, thumbnail creation, and YouTube upload all happen in sequence with two human review gates and zero manual production work.
The Problem
Content at volume
costs a fortune.
A single short-form video from a production agency costs between £150 and £400. A freelancer charges £50 to £100. Either way, producing daily content at any real volume is financially unsustainable for most operations.
The underlying work is not creative in the way that justifies that cost. Script structure follows patterns. Visual styles follow templates. Voice performance follows tone guides. Audio mixing follows formulas. Every step is describable, repeatable, and therefore automatable.
Reel was built on the premise that the production layer of content creation should cost almost nothing, leaving human creative energy for strategy, not execution.
How It Works
Seven stages.
Two human gates.
Claude generates a structured script in five segments: hook, setup, build, tension, and cliffhanger. Each segment has word count constraints, a visual bible defining the colour and camera arc, and image prompt instructions per scene. The script is capped at 145 words, targeting a 55 to 60 second final video.
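The five-segment structure can be sketched as a simple schema. The field names, per-segment word budgets, and palette labels below are illustrative assumptions, not Reel's actual spec; only the segment names, the 145-word cap, and the hook/tension/cliffhanger visual notes come from the description above.

```python
# Five narrative segments, in playback order.
SEGMENTS = ["hook", "setup", "build", "tension", "cliffhanger"]

# Illustrative per-segment word budgets summing to the 145-word cap.
WORD_BUDGET = {
    "hook": 20, "setup": 30, "build": 40, "tension": 35, "cliffhanger": 20,
}

# Visual bible: palette and camera arc per segment. Hook is wide and
# high contrast, tension is extreme close-up, cliffhanger is near-black,
# as described; the other entries are placeholders.
VISUAL_BIBLE = {
    "hook":        {"palette": "high contrast", "camera": "wide"},
    "setup":       {"palette": "neutral",       "camera": "medium"},
    "build":       {"palette": "saturated",     "camera": "medium close"},
    "tension":     {"palette": "harsh",         "camera": "extreme close-up"},
    "cliffhanger": {"palette": "near-black",    "camera": "close"},
}

assert sum(WORD_BUDGET.values()) == 145
```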
The script is sent to Telegram for human review before any API spend on visuals or voice. Approve or regenerate with one tap. This gate prevents wasted spend on bad scripts and keeps quality high without blocking automation.
Imagen generates 19 images across the five narrative segments using the visual bible from the script. Each image uses the segment-specific colour palette, camera distance, and lighting defined at script time. Hook images are wide and high contrast. Tension images are extreme close-up. Cliffhanger images are near-black.
Veo animates the hook image into a short video clip using a purpose-built motion prompt rather than the full script description. The pipeline extracts motion-relevant verbs and nouns from the scene description and formats them into a 30-word Veo-executable brief. Still images for remaining segments get Ken Burns motion treatment scaled by narrative intensity.
ElevenLabs generates the voiceover from the approved script. Voice casting is pillar-aware: cosmos topics use different voice profiles from body or modern life topics. Playback speed is set at 0.85x to 0.9x for clarity and pacing.
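Pillar-aware casting reduces to a lookup table. A minimal sketch, assuming a table keyed by content pillar; the voice names and per-pillar speeds are placeholders, not real ElevenLabs voice IDs, and only the 0.85x to 0.9x speed range comes from the description above.

```python
# Hypothetical pillar-to-voice casting table (voice names are placeholders).
VOICE_CASTING = {
    "cosmos":      {"voice": "deep-narrator",  "speed": 0.85},
    "body":        {"voice": "warm-explainer", "speed": 0.90},
    "modern-life": {"voice": "crisp-casual",   "speed": 0.90},
}

def cast_voice(pillar: str) -> dict:
    """Return the voice profile for a topic pillar, defaulting to cosmos."""
    return VOICE_CASTING.get(pillar, VOICE_CASTING["cosmos"])
```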
A four-layer audio mix is assembled programmatically. Music volume follows a segment envelope (silence on hook, creeping through build, near-silent on cliffhanger). Hard silence punches are inserted at the build-to-tension and tension-to-cliffhanger transitions. Voice is ducked slightly on the cliffhanger to create the lean-in effect.
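The segment envelope and silence punches can be expressed as a list of (time, music gain) keyframes, which is roughly what a programmatic mix hands to the audio engine. A minimal sketch: the gain values and the 0.4-second settle time are assumptions; the silent hook, creeping build, near-silent cliffhanger, and punches at build-to-tension and tension-to-cliffhanger follow the description above.

```python
def mix_keyframes(boundaries: dict) -> list:
    """Build (time_seconds, music_gain) keyframes from segment start times.

    `boundaries` maps each segment name to its start time. Segments
    entered via a hard silence punch get a 0.0-gain keyframe on the cut,
    then settle to their envelope level shortly after.
    """
    envelope = {"hook": 0.0, "setup": 0.10, "build": 0.25,
                "tension": 0.35, "cliffhanger": 0.05}
    punched = {"tension", "cliffhanger"}  # entered via a silence punch
    keys = []
    for seg, t in boundaries.items():
        if seg in punched:
            keys.append((t, 0.0))                   # hard silence on the cut
            keys.append((t + 0.4, envelope[seg]))   # then settle (assumed 0.4s)
        else:
            keys.append((t, envelope[seg]))
    return keys
```

Voice ducking on the cliffhanger would be a second, inverted envelope over the voice track built the same way.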
The assembled video is sent to Telegram for final review before upload. A second human gate catches any assembly errors or visual failures before they reach the channel. Archive on skip preserves the run record regardless of outcome.
Approved videos upload automatically via the YouTube API with metadata, tags, and scheduling. Run IDs are linked to upload records. Analytics are pulled back and logged against each run for performance analysis.
The Prompt Bridge Problem
The most significant engineering challenge in Reel was a disconnect between the script model and the video generation model. Claude writes cinematic scene descriptions of 150 words or more. Veo, the video model, responds to 30-word motion verb prompts and ignores narrative description.
The solution was a purpose-built prompt bridge: a template library of proven Veo motion formulas organised by domain, scene type, and camera direction. The bridge extracts key nouns from the Claude description and fills the template slots. Claude writes for humans. The bridge translates for Veo. The two models never need to understand each other directly.
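The bridge's core move can be sketched in a few lines. The verb list, the template wording, and the fallback verb below are assumptions standing in for Reel's template library; the 30-word cap and the extract-and-fill mechanism come from the description above.

```python
# Motion-relevant verbs a Veo-style prompt responds to (illustrative list).
MOTION_VERBS = {"drifts", "sweeps", "pans", "rises", "collapses",
                "glides", "spins", "zooms", "tilts", "orbits"}

# One slot-filled template standing in for the full formula library.
TEMPLATE = ("Slow camera {verb} across {subject}, "
            "cinematic lighting, smooth continuous motion")

def bridge(scene_description: str, subject: str) -> str:
    """Translate a long narrative description into a short Veo motion brief."""
    words = (w.strip(".,").lower() for w in scene_description.split())
    verb = next((w for w in words if w in MOTION_VERBS), "pans")  # fallback
    prompt = TEMPLATE.format(verb=verb, subject=subject)
    return " ".join(prompt.split()[:30])  # enforce the 30-word cap
```

The real library would select the template by domain, scene type, and camera direction rather than use one fixed string.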
Domain Collision Fix
When a cosmic character's story involves biological vocabulary (skin cells, melanin, cellular landscapes), the image model routes to medical microscopy rather than alien terrain. The anti-collision rules layer detects these domain crossings and reframes the visual prompt with explicit geological terrain framing before the image is generated.
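A minimal sketch of one anti-collision rule, assuming a keyword-based detector; the vocabulary list and the geological framing text are placeholders, not Reel's actual rule set.

```python
# Biological vocabulary that pulls the image model toward microscopy.
BIO_TERMS = {"skin", "cells", "cellular", "melanin", "membrane", "tissue"}

# Explicit geological terrain framing prepended on a detected collision.
GEO_FRAME = ("Vast alien geological terrain, aerial landscape photography, "
             "no microscopy, no medical imagery. ")

def reframe_if_collision(prompt: str) -> str:
    """Prepend geological framing when biological terms are detected."""
    tokens = {w.strip(".,").lower() for w in prompt.split()}
    if tokens & BIO_TERMS:
        return GEO_FRAME + prompt
    return prompt
```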
Technology
Five models.
One coherent pipeline.
Reel orchestrates five different AI models in a single production pipeline. Each model handles the task it is actually best at. Claude handles language and structure. Imagen handles photorealistic image generation. Veo handles video motion. ElevenLabs handles voice performance. FFmpeg handles assembly and audio engineering.
The pipeline architecture separates concerns completely. Adding a new model or replacing an existing one does not require rewriting the adjacent stages. Each stage has a defined input and output schema.
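The stage contract described above can be sketched as follows. The names and the dict-based run record are assumptions; the point is that the orchestrator validates schemas at each boundary, so a stage can be swapped without touching its neighbours.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Stage:
    name: str
    input_keys: frozenset    # fields the stage requires from the run record
    output_keys: frozenset   # fields the stage adds to the run record
    run: Callable[[dict], dict]

def execute(stages: list, record: dict) -> dict:
    """Run stages in order, validating the input schema at each boundary."""
    for stage in stages:
        missing = stage.input_keys - record.keys()
        if missing:
            raise ValueError(f"{stage.name}: missing inputs {missing}")
        record = {**record, **stage.run(record)}
    return record
```

Replacing, say, the image stage then means supplying a new `run` callable with the same input and output keys.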
Full Stack
Proof of Work
Published output.
Real channel.
Gate 1 and Gate 2 Telegram approval flows, assembled video examples, and cost-per-run breakdowns are being added shortly.
Links to videos produced entirely by Reel, with no manual production work. Channel link being added shortly.
What We Learned
The failures that
taught the most.
Different models need different prompt formats. Writing prompts for Claude and then passing them to Veo does not work. Each model has its own vocabulary of instructions. The prompt bridge was the most valuable piece of engineering in the entire pipeline.
Video length determines performance more than content. Analysis of published videos showed that videos under 62 seconds consistently outperformed videos over 90 seconds, regardless of topic quality. The word count cap at 145 words came directly from this data.
Posting 3 to 4 videos per day cannibalises your own distribution. YouTube gives each Short an initial algorithmic test audience. Flooding the channel dilutes that test across too many videos. One video per day at the right time outperforms four videos posted overnight.
Human gates are not a failure of automation; they are a design choice. The two approval gates add five minutes to total production time and catch whole classes of catastrophic quality failure before anything ships. They exist by design, not because the automation is incomplete.