Beginner · 3 min read

Cohort Ad Pipeline (N prospects, 1 click)

Upload a prospect CSV, your Deep Intel records, and a voice reference clip. Get one personalized ad per row, voice-cloned and leak-stripped. This is the pipeline behind Roof Radar's Peak Custom render.

What you’ll build

Every step runs locally on your own GPU. No subscription. No per-render meter. The chain is yours, the file is yours, the render lives on your drive forever.

The chain

3 pipeline nodes, in order:

  1. Deep Intel lookup (Local)

    Matches each CSV row to a Deep Intel record by company name (case-insensitive) and adds fields such as painPoint, goldDetail, and paidLeadsOn: the per-prospect ammo that personalizes the script. Rows with no match pass through with empty intel fields.

  2. F5-TTS chunked + leak strip (Local)

    Per-line F5-TTS voice clone with chunked generation and Whisper-based leak stripping.

    WHAT: Generates one WAV per script line, transcribes each chunk with faster-whisper, and trims any phantom prefix the model hallucinates from the reference clip (e.g. 'AI pipeline' bleeding in from the reference transcript). Clean chunks are concatenated with 250 ms pads, and the final WAV's duration is probed so downstream nodes can size visuals to fit.

    WHY: F5-TTS bleeds reference audio into chunk starts roughly 80% of the time. Whisper detection plus trimming catches it deterministically.

    HOW: voice_ref and voice_ref_text describe the speaker (the reference clip and its transcript); lines is an array of script lines, one per chunk. The output audio is the concatenated WAV, and the output duration is its exact length in seconds.

    WHERE: Use it anywhere you need a multi-line voice clone (cohort ads, narration, long-form). The single-line F5-TTS node skips the leak strip, so use this one when the reference text contains industry or brand words.

  3. VO-driven composition (Local)

    Remotion composition with VO-driven beat sizing.

    WHAT: Renders a beat-by-beat slideshow in which the final beat's duration tracks the actual VO length: no silent tails, no audio cutoffs.

    WHY: James's doctrine (2026-05-13): 'Every scene needs a reason to exist; if there's no audio paired with it, the shot doesn't need to keep going.' Hardcoded durations break this rule when VO length varies per prospect; VO-driven sizing makes it structural.

    HOW: Probes the voiceover WAV duration, then expands or shrinks the elastic beat (Beat 20, the yard sign, in SKB) so that total visual length = VO length + 0.5 s of breathing room. The sizing is clamped to [3, 30] for safety.

    WHERE: Anywhere a slideshow ad has variable VO length across prospects.
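The Deep Intel lookup in step 1 amounts to a case-insensitive left join on company name. Here is a minimal Python sketch, assuming both inputs are already loaded as lists of dicts and that the company name sits in a column called `company` (a hypothetical name; use whatever your CSV and Deep Intel export actually call it). Only the three fields named in the step are copied; real records may carry more. `read_rows` and `enrich` are illustrative names, not the node's API.

```python
import csv

# Fields named in the step description; real Deep Intel records may carry more.
INTEL_FIELDS = ("painPoint", "goldDetail", "paidLeadsOn")


def read_rows(path):
    """Load a CSV file as a list of dicts keyed by the header row."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def enrich(prospects, intel_records, company_field="company"):
    """Attach intel fields to each prospect row (hypothetical helper).

    Matching is case-insensitive via str.casefold(). Rows with no match
    pass through with empty intel fields instead of being dropped.
    """
    # Index intel by normalized company name; a later duplicate wins.
    index = {
        rec[company_field].casefold(): rec
        for rec in intel_records
        if rec.get(company_field)
    }
    enriched = []
    for row in prospects:
        match = index.get((row.get(company_field) or "").casefold(), {})
        merged = dict(row)  # copy so the caller's rows stay untouched
        for field in INTEL_FIELDS:
            merged[field] = match.get(field) or ""
        enriched.append(merged)
    return enriched
```

If your Deep Intel export is also a CSV (an assumption), usage would be `enrich(read_rows("prospects.csv"), read_rows("deep_intel.csv"))`. Keeping misses as rows with blank fields preserves the one-ad-per-row guarantee even when intel coverage is partial.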
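Step 2's leak strip can be pictured as follows: transcribe a chunk with word timestamps, find where the expected script line actually begins, and treat everything heard before that as a phantom prefix to trim. The sketch below is one plausible way to do this, not necessarily the node's actual logic. It works on a plain list of timed words so it runs without a GPU; in practice those words would come from faster-whisper's `WhisperModel.transcribe(path, word_timestamps=True)`, whose segments expose `.words` entries with `word`, `start`, and `end`. `TimedWord`, `leak_trim_point`, and the `anchor_words` parameter are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class TimedWord:
    text: str     # word as transcribed, possibly with punctuation
    start: float  # seconds from chunk start
    end: float


def _norm(token):
    """Lowercase and drop punctuation so 'Mike,' matches 'mike'."""
    return re.sub(r"[^a-z0-9']", "", token.lower())


def leak_trim_point(words, expected_line, anchor_words=2):
    """Return how many seconds to cut from the start of a chunk.

    Looks for the first place the transcript contains the opening
    `anchor_words` words of the expected line; anything heard before it
    is treated as reference-audio bleed. Returns 0.0 when the chunk is
    clean or the opening words can't be found (the chunk is then left
    untouched rather than guessed at).
    """
    target = [t for t in (_norm(w) for w in expected_line.split()) if t]
    target = target[:anchor_words]
    # Drop pure-punctuation tokens but keep each token's timestamp.
    pairs = [(n, w) for w in words if (n := _norm(w.text))]
    heard = [n for n, _ in pairs]
    for i in range(len(heard) - len(target) + 1):
        if heard[i:i + len(target)] == target:
            return pairs[i][1].start if i > 0 else 0.0
    return 0.0
```

Requiring more than one anchor word makes a false match on a common opener ("the", "hey") less likely; raising `anchor_words` trades recall for precision.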
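The concatenation and duration probe at the end of step 2 need only Python's standard `wave` module. A minimal sketch, assuming every chunk is a signed-PCM WAV (e.g. 16-bit) with the same sample rate and channel count, that the 250 ms pads go between chunks rather than at the ends, and that per-chunk trim offsets in seconds come from a leak-detection pass. `concat_chunks` and `probe_duration` are hypothetical names.

```python
import wave

PAD_MS = 250  # gap between chunks, per the step description


def probe_duration(path):
    """Exact WAV length in seconds: frame count / sample rate."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / w.getframerate()


def concat_chunks(chunk_paths, out_path, trims=None, pad_ms=PAD_MS):
    """Join WAV chunks with silent pads and return the output duration.

    trims[i] is how many seconds to drop from the start of chunk i
    (e.g. a detected phantom prefix). Assumes signed PCM, where zero
    bytes are silence; 8-bit unsigned WAVs would need 0x80 instead.
    """
    if not chunk_paths:
        raise ValueError("no chunks to concatenate")
    trims = trims if trims is not None else [0.0] * len(chunk_paths)
    if len(trims) != len(chunk_paths):
        raise ValueError("need one trim value per chunk")
    fmt = None
    pieces = []
    for path, trim_s in zip(chunk_paths, trims):
        with wave.open(path, "rb") as w:
            this_fmt = (w.getnchannels(), w.getsampwidth(), w.getframerate())
            if fmt is None:
                fmt = this_fmt
            elif this_fmt != fmt:
                raise ValueError(f"{path}: format {this_fmt} != {fmt}")
            # Skip the trimmed prefix, never past the end of the chunk.
            skip = min(int(trim_s * w.getframerate()), w.getnframes())
            w.setpos(skip)
            pieces.append(w.readframes(w.getnframes() - skip))
    nchannels, sampwidth, rate = fmt
    pad_frames = int(rate * pad_ms / 1000)
    silence = b"\x00" * (pad_frames * nchannels * sampwidth)
    with wave.open(out_path, "wb") as out:
        out.setnchannels(nchannels)
        out.setsampwidth(sampwidth)
        out.setframerate(rate)
        out.writeframes(silence.join(pieces))
    return probe_duration(out_path)
```

Returning the probed duration from the written file, rather than summing chunk lengths, means the number handed to the composition reflects exactly what landed on disk.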
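The sizing rule in step 3 is plain arithmetic: elastic beat = (VO length + 0.5 s) minus the sum of the fixed beats. The composition itself is Remotion, so treat this Python sketch as the math only. It assumes the [3, 30] clamp applies to the elastic beat's duration in seconds and that the timeline runs at 30 fps; both are assumptions, as are the function names.

```python
import math

BREATHING_ROOM_S = 0.5  # from the step description
# Assumed: the [3, 30] clamp is on the elastic beat, in seconds.
ELASTIC_MIN_S, ELASTIC_MAX_S = 3.0, 30.0


def elastic_beat_seconds(vo_seconds, fixed_beat_seconds):
    """Size the elastic beat so visuals end 0.5 s after the VO.

    When the clamp kicks in, very short or very long VOs can't hit
    the target exactly, and the total drifts from vo + 0.5 s.
    """
    target_total = vo_seconds + BREATHING_ROOM_S
    raw = target_total - sum(fixed_beat_seconds)
    return min(max(raw, ELASTIC_MIN_S), ELASTIC_MAX_S)


def seconds_to_frames(seconds, fps=30):
    """Round up so the last frame never cuts the audio short."""
    return math.ceil(seconds * fps)
```

For example, if the fixed beats total 20 s and a prospect's VO runs 25 s, the elastic beat becomes 5.5 s, or 165 frames at 30 fps.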

Prerequisites

Try it

The fastest way to learn this chain is to drop it onto the Studio canvas and run it on your own rig. The whole graph is pre-wired.

Watch


Video walkthrough coming. For now, run the workflow and watch the pipeline timeline — every stage exposes its intermediate artifact and a plain-language description of what that node did.