Cohort Ad Pipeline (N prospects, 1 click)
Upload a prospect CSV + Deep Intel + voice ref. Get one personalized ad per row, voice-cloned and leak-stripped. The pipeline behind Roof Radar's Peak Custom render.
What you’ll build
Every step runs locally on your own GPU. No subscription. No per-render meter. The chain is yours, the file is yours, the render lives on your drive forever.
The chain
3 pipeline nodes, in order:
- 1. Deep Intel lookup (Local)
Match each CSV row to a Deep Intel record by company name (case-insensitive). Adds painPoint, goldDetail, paidLeadsOn, etc. — the per-prospect ammo that personalizes the script. Misses pass through with empty intel fields.
- 2. F5-TTS chunked + leak strip (Local)
Per-line F5-TTS voice clone with chunked generation and Whisper-based leak stripping.
WHAT: Generates one WAV per script line, transcribes each chunk with faster-whisper, and trims any phantom prefix the model hallucinates from the reference clip (e.g. 'AI pipeline' bleeding from the ref transcript). Concatenates the clean chunks with 250 ms pads, then probes the final WAV duration so downstream nodes can size visuals to fit.
WHY: F5-TTS bleeds reference audio into chunk starts ~80% of the time. Whisper detection + trim catches it deterministically.
HOW: voice_ref + voice_ref_text describe the speaker; lines is an array of script lines (one per chunk). The output audio is the concatenated chunks; the output duration is the exact length in seconds.
WHERE: Use anywhere you need a multi-line voice clone (cohort ads, narration, long-form). The single-line F5-TTS node skips the leak strip, so use this one when the reference text contains industry or brand words.
- 3. VO-driven composition (Local)
Remotion composition with VO-driven beat sizing.
WHAT: Renders a beat-by-beat slideshow where the final beat's duration tracks the actual VO length. No silent tails, no audio cutoffs.
WHY: James's doctrine (2026-05-13): 'Every scene needs a reason to exist; if there's no audio paired with it, the shot doesn't need to keep going.' Hardcoded durations break this when VO length varies per prospect. VO-driven sizing makes it structural.
HOW: Probes the voiceover WAV duration, then expands or shrinks the elastic beat (Beat 20 yard-sign in SKB) so total visual length = VO length + 0.5 s breathing room. Clamped to [3, 30] for safety.
WHERE: Anywhere a slideshow ad has variable VO length across prospects.
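The logic of the three nodes above can be sketched in plain Python. This is an illustration under stated assumptions, not the pipeline's actual code: every function name, the `company` column name, the three-word match probe, and the choice to place pads only between chunks are hypothetical. The sketch also assumes the clamp applies to the elastic beat's duration in seconds, and that a Whisper transcription supplies word-level start times as `(text, start_seconds)` pairs.

```python
import re

# Node 1 (sketch): case-insensitive company-name match; misses get empty fields.
# Field names come from the text above; the real record likely has more.
INTEL_FIELDS = ("painPoint", "goldDetail", "paidLeadsOn")

def attach_intel(rows, intel_records, key="company"):  # "company" is an assumed column name
    index = {rec[key].casefold(): rec for rec in intel_records}
    enriched = []
    for row in rows:
        rec = index.get(row[key].casefold(), {})  # miss -> empty record
        enriched.append({**row, **{f: rec.get(f, "") for f in INTEL_FIELDS}})
    return enriched

def _norm(word):
    # Lowercase and drop punctuation so "Old." and " old" compare equal.
    return re.sub(r"[^\w']+", "", word).casefold()

# Node 2 (sketch): find where the expected script line starts in the transcript.
# Everything before the returned time is treated as leaked prefix and trimmed.
def leak_trim_offset(words, expected_line, probe=3):
    expected = [w for w in map(_norm, expected_line.split()) if w][:probe]
    heard = [_norm(text) for text, _start in words]
    if not expected:
        return 0.0
    for i in range(len(heard) - len(expected) + 1):
        if heard[i:i + len(expected)] == expected:
            return words[i][1]  # start time of the first matching word
    return 0.0  # no match found: leave the chunk untouched

# Node 2 (sketch): join cleaned chunks with silence between them (assumed placement).
def concat_with_pads(chunks, sample_rate, pad_ms=250):
    pad = [0.0] * (sample_rate * pad_ms // 1000)
    out = []
    for i, chunk in enumerate(chunks):
        if i:
            out.extend(pad)
        out.extend(chunk)
    return out

# Node 3 (sketch): size the elastic beat so visuals total VO length + breathing room.
def elastic_beat_seconds(vo_seconds, fixed_beats, breathing=0.5, lo=3.0, hi=30.0):
    wanted = vo_seconds + breathing - sum(fixed_beats)
    return max(lo, min(hi, wanted))  # assumed: clamp applies to the elastic beat
```

For example, a 20-second voiceover with two fixed 5-second beats gives the elastic beat 10.5 seconds, while a 10-second voiceover would ask for 0.5 seconds and gets clamped up to 3. When a chunk has no leak, `leak_trim_offset` returns the start time of the first spoken word, so at most the leading silence is trimmed.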
Prerequisites
- An NVIDIA RTX GPU with 12 GB+ VRAM. 24 GB recommended for the full pipeline. Check system requirements →
- Hybrig desktop installed and your worker connected. Get the desktop app →
Try it
The fastest way to learn this chain is to drop it onto the Studio canvas and run it on your own rig. The whole graph is pre-wired.
Watch
Video walkthrough coming. For now, run the workflow and watch the pipeline timeline — every stage exposes its intermediate artifact and a plain-language description of what that node did.