What each scene entry contains
visual_prompt — describes the frame for an image model. Includes composition, lighting, subject framing, and visual style in generative terms. Drop into Midjourney, DALL·E, Adobe Firefly, or any image model.
animation_prompt — extends the visual prompt with camera direction and motion type for animation tools. Designed for Kling I2V or SeedAnce.
style_anchor — a short fingerprint of the visual language. Keep this consistent across scenes to maintain cohesion in your remade video.
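Each scene entry can be thought of as one record holding these three fields. The Python sketch below assumes the recipe is exported as JSON with a top-level "scenes" array. That layout, the `SceneEntry` class, and the `load_scenes` helper are hypothetical illustrations; only the three field names come from this page.

```python
import json
from dataclasses import dataclass


@dataclass
class SceneEntry:
    """One scene from the recipe. Field names follow this guide; the JSON layout is an assumption."""
    visual_prompt: str     # frame description for an image model
    animation_prompt: str  # visual prompt plus camera direction and motion type
    style_anchor: str      # short fingerprint of the visual language


def load_scenes(raw: str) -> list[SceneEntry]:
    """Parse a hypothetical export shaped like {"scenes": [{...}, ...]}."""
    data = json.loads(raw)
    return [
        SceneEntry(
            visual_prompt=s["visual_prompt"],
            animation_prompt=s["animation_prompt"],
            style_anchor=s["style_anchor"],
        )
        for s in data["scenes"]
    ]


example = json.dumps({"scenes": [{
    "visual_prompt": "Close-up of hands on a laptop keyboard, soft window light",
    "animation_prompt": "Close-up of hands on a laptop keyboard, soft window light; slow push-in",
    "style_anchor": "UGC-realist, warm desaturated",
}]})
scenes = load_scenes(example)
print(scenes[0].style_anchor)  # UGC-realist, warm desaturated
```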
Where to use them
| Tool | What to paste | Notes |
|---|---|---|
| Midjourney | visual_prompt | Add --ar 16:9 --style raw for video-oriented output |
| DALL·E / ChatGPT | visual_prompt | Works well for realistic styles |
| Adobe Firefly | visual_prompt | Good for brand-safe stock-style outputs |
| Kling I2V | Image from above + animation_prompt | 5–10s clips; use V3 for realistic, Omni for stylised |
| SeedAnce | Image from above + animation_prompt | Better for longer motion arcs |
| A production pipeline | The full recipe | Hand it off to handle image, animation, and assembly automatically |
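The table boils down to a small lookup: image tools take the visual_prompt, animation tools take the animation_prompt alongside the uploaded keyframe. A minimal sketch of that mapping, assuming each scene is a dict keyed by the field names above; the lowercase tool keys and the `paste_text` function are hypothetical, and the Midjourney suffix is the one suggested in the table.

```python
# Parameters suggested in the Midjourney row of the table.
MIDJOURNEY_SUFFIX = "--ar 16:9 --style raw"

# Hypothetical tool keys, for illustration only.
IMAGE_TOOLS = {"midjourney", "dalle", "firefly"}
VIDEO_TOOLS = {"kling", "seedance"}


def paste_text(tool: str, scene: dict) -> str:
    """Return the prompt text to paste into `tool` for one scene entry."""
    tool = tool.lower()
    if tool == "midjourney":
        return f"{scene['visual_prompt']} {MIDJOURNEY_SUFFIX}"
    if tool in IMAGE_TOOLS:
        return scene["visual_prompt"]
    if tool in VIDEO_TOOLS:
        # The keyframe image is uploaded separately; only the motion text is pasted.
        return scene["animation_prompt"]
    raise ValueError(f"unknown tool: {tool!r}")


scene = {
    "visual_prompt": "Founder talking to camera in a bright office, shallow depth of field",
    "animation_prompt": "Founder talking to camera in a bright office, shallow depth of field; "
                        "slow push-in, subtle handheld sway",
}
print(paste_text("midjourney", scene))
```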
Production workflow
Option A — Manual (scene by scene)
- Pick the 3–5 scenes that carry the most visual weight (hook, proof moment, CTA)
- Generate a keyframe image for each using the visual_prompt
- Animate each image using the animation_prompt in Kling or SeedAnce
- Assemble the clips with your VO in your editor
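The manual steps above can be expanded into a per-scene to-do list once you have chosen which scenes to recreate. This is a sketch under stated assumptions: scenes are dicts keyed by the recipe's field names, you pick scenes by their index in the recipe, and `Step` and `manual_plan` are hypothetical names.

```python
from typing import NamedTuple, Optional


class Step(NamedTuple):
    scene: Optional[int]  # index into the recipe; None for the final assembly step
    action: str
    prompt: str


def manual_plan(scenes: list[dict], picks: list[int]) -> list[Step]:
    """Expand the picked scenes into Option A steps, kept in story order."""
    steps = []
    for i in sorted(picks):
        steps.append(Step(i, "generate keyframe (image model)", scenes[i]["visual_prompt"]))
        steps.append(Step(i, "animate keyframe (Kling or SeedAnce)", scenes[i]["animation_prompt"]))
    steps.append(Step(None, "assemble clips with VO in your editor", ""))
    return steps


# Hypothetical 20-scene recipe; picks are hook (0), proof moment (11), CTA close (19).
recipe = [{"visual_prompt": f"scene {n} frame", "animation_prompt": f"scene {n} motion"}
          for n in range(20)]
plan = manual_plan(recipe, [19, 0, 11])
for step in plan:
    print(step.scene, step.action, step.prompt, sep=" | ")
```

Sorting the picks keeps the clips in the original story order even if you choose them out of sequence.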
Option B — Hand it to a production pipeline
Export the full recipe and pass it to a connected production pipeline. It receives the entire analysis and runs image generation, animation, and assembly automatically, with no scene-by-scene work on your side.
Tips
You don’t need all the scenes. Most videos have 15–25 scenes. Pick the ones that carry visual weight: the hook scene, the proof moment, and the CTA close. Usually 5–8 scenes are enough for a 30–60 second video.
Keep the style anchor consistent. If the original was “UGC-realist, warm desaturated”, use that fingerprint across all your generated scenes. Visual style drift across cuts reads as low-effort production.
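One way to keep the style anchor consistent is to take the anchor most scenes share and apply it to every scene you pick. This sketch assumes scenes are dicts keyed by the recipe's field names; `dominant_anchor` and `apply_anchor` are hypothetical helpers, and appending the anchor text to each prompt is just one possible approach, not a documented feature of any tool.

```python
from collections import Counter


def dominant_anchor(scenes: list[dict]) -> str:
    """Most common style_anchor in the recipe (ties go to the first one seen)."""
    counts = Counter(s["style_anchor"].strip() for s in scenes)
    return counts.most_common(1)[0][0]


def apply_anchor(scenes: list[dict], picks: list[int], anchor: str) -> list[dict]:
    """Copy the picked scenes, set one shared anchor, and append it to prompts that lack it."""
    out = []
    for i in sorted(picks):
        s = dict(scenes[i])  # shallow copy, so the original recipe is left untouched
        s["style_anchor"] = anchor
        for key in ("visual_prompt", "animation_prompt"):
            if anchor not in s[key]:
                s[key] = f"{s[key]}, {anchor}"
        out.append(s)
    return out


recipe = [
    {"visual_prompt": "Hand holding phone, kitchen counter",
     "animation_prompt": "Hand holding phone, kitchen counter; slow tilt up",
     "style_anchor": "UGC-realist, warm desaturated"},
    {"visual_prompt": "Dashboard on laptop screen",
     "animation_prompt": "Dashboard on laptop screen; slow push-in",
     "style_anchor": "UGC-realist, warm desaturated "},
    {"visual_prompt": "Logo end card",
     "animation_prompt": "Logo end card; gentle zoom out",
     "style_anchor": "clean studio, high key"},
]
anchor = dominant_anchor(recipe)
shots = apply_anchor(recipe, [2, 0], anchor)
for s in shots:
    print(s["visual_prompt"])
```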
Adapt, don’t copy. The prompts describe the type of shot, not the exact content. “Close-up of hands on a laptop keyboard” is a shot type: keep the framing, but replace what is on the laptop screen with whatever your product interface looks like.
For B2B ads: The prompts often lean into tight talking-head + product-screen combinations. Those are the two workhorses of B2B video. If the recreation prompts confirm that pattern, you have your shot list.