Intermediate · 8 min read

Idea → Talking Head

Type a topic — get a video of you talking about it in your voice.

What you’ll build

Every step runs locally on your own GPU. No subscription. No per-render meter. The chain is yours, the file is yours, and the render lives on your drive forever.

The chain

Four pipeline nodes, in order:

  1. Flux + Hybrig LoRA · Local · ~30s–1 min

    Renders a still photo of you using your trained LoRA.

  2. Wan 2.2 I2V · Local · ~3–4 min

    Animates the still photo into a short video clip.

  3. F5-TTS (chunked) · Local · ~8–20s

    Reads your script in your cloned voice.

  4. LatentSync · Local · ~20–45s

    Syncs the on-screen mouth to the spoken audio.
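The stage timings above add up to a rough end-to-end estimate. The sketch below is a minimal illustration, assuming the four nodes run strictly one after another and that each listed range is per-render wall-clock time on a supported GPU. The `Stage` type and the `estimate_total` and `fmt` helpers are hypothetical, not part of Hybrig.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One pipeline node with its approximate wall-clock range in seconds."""
    name: str
    min_s: int
    max_s: int


# Ranges copied from the chain above. Assumption: each stage waits for the previous one.
CHAIN = [
    Stage("Flux + Hybrig LoRA", 30, 60),
    Stage("Wan 2.2 I2V", 180, 240),
    Stage("F5-TTS (chunked)", 8, 20),
    Stage("LatentSync", 20, 45),
]


def estimate_total(stages):
    """Sum the per-stage bounds, giving (fastest, slowest) totals in seconds."""
    return sum(s.min_s for s in stages), sum(s.max_s for s in stages)


def fmt(seconds):
    """Format a second count as 'Xm YYs'."""
    minutes, secs = divmod(seconds, 60)
    return f"{minutes}m {secs:02d}s"


if __name__ == "__main__":
    low, high = estimate_total(CHAIN)
    print(f"Estimated render: {fmt(low)} to {fmt(high)}")  # 3m 58s to 6m 05s
```

Under the sequential assumption, a full render lands at roughly 4 to 6 minutes, dominated by the Wan 2.2 animation step. F5-TTS does not need the video, so if a worker ran speech synthesis alongside the animation step, the total would shrink by at most the F5-TTS time. The sketch does not assume Hybrig does this.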

Prerequisites

  • An NVIDIA RTX GPU with 12 GB+ of VRAM; 24 GB is recommended for the full pipeline. Check the system requirements.
  • The Hybrig desktop app installed, with your worker connected. Get the desktop app.
  • A trained character LoRA. Train one from 4–10 reference photos at /characters.
  • A clean 10–30 second sample of your own voice, recorded somewhere quiet. F5-TTS clones your voice from this reference clip locally, so the audio never leaves your machine.
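Before using a voice sample, it can help to confirm that the clip falls inside the 10–30 second window. The sketch below is a minimal illustration using Python's standard-library `wave` module. It assumes the sample is saved as an uncompressed PCM `.wav` file. The `clip_duration` and `check_voice_sample` helpers are hypothetical, not a Hybrig or F5-TTS API.

```python
import wave

MIN_SECONDS = 10.0  # lower bound from the prerequisites list
MAX_SECONDS = 30.0  # upper bound from the prerequisites list


def clip_duration(path: str) -> float:
    """Return the duration of an uncompressed PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        # Frame count divided by frames per second gives the clip length.
        return wav.getnframes() / wav.getframerate()


def check_voice_sample(path: str) -> tuple[bool, str]:
    """Return (ok, message) describing whether the clip length is in range."""
    seconds = clip_duration(path)
    if seconds < MIN_SECONDS:
        return False, f"{seconds:.1f}s is too short; record at least {MIN_SECONDS:.0f}s"
    if seconds > MAX_SECONDS:
        return False, f"{seconds:.1f}s is too long; trim to {MAX_SECONDS:.0f}s or less"
    return True, f"{seconds:.1f}s is within range"
```

For example, for a 20-second recording saved as `my_voice.wav` (a hypothetical file name), `check_voice_sample("my_voice.wav")` returns `(True, "20.0s is within range")`. The `wave` module only reads WAV files, so a sample recorded as MP3 or M4A would need converting first.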

Try it

The fastest way to learn this chain is to drop it onto the Studio canvas and run it on your own rig. The whole graph is pre-wired.

Watch

A video walkthrough is coming soon. For now, run the workflow and watch the pipeline timeline: every stage exposes its intermediate artifact and a plain-language description of what that node did.