Transcript-Based Assembly From A Text Script
This workflow creates a new sequence from the exact lines or passages you want in the final edit.
Instead of manually searching long source recordings for every sentence, you provide a target text file that lists the transcript passages you want to keep. The agent then matches those passages against source transcripts, locates the corresponding source time ranges, and assembles a new sequence from the matching clips.
This recipe uses Automation Agent for Adobe Premiere, driven through an agent workflow over MCP. If you came here for the workflow itself, you can follow the steps below and consult the product overview and setup pages separately.
When This Is Useful
Use this workflow when:
- you already know the lines, answers, or story beats you want in the edit
- you have a text document that describes the desired spoken result
- you want a transcript-driven assembly pass before manual fine cutting
- you need to pull moments from a long interview, presenter recording, podcast, or multiclip timeline
- you want likely alternate takes preserved for review instead of silently discarded
Typical examples:
- interview paper edits
- documentary selects based on an editor's text outline
- podcast excerpts assembled from written pull quotes
- presenter videos where a cleaned-up script should be rebuilt from raw takes
- rough cuts built from client-approved transcript passages
What You Need First
Before you run the prompt:
- Prepare a plain text file with the transcript passages you want in the final assembly.
- Attach that text file to your MCP client message.
- Choose the source scope: one selected source clip in the Project panel, or all relevant clips in the active sequence.
- Make sure Automation Agent is connected through your MCP workflow.
- Make sure the source clips have, or can get, usable transcripts.
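A usable transcript means the exported words and timings actually exist, not just that a readiness flag is set. The check below is a minimal sketch, assuming the export is a list of word records with hypothetical `text`, `start`, and `end` fields in seconds; the real export format may differ, so adapt the field names to what your transcript export contains.

```python
from typing import Any


def transcript_problems(words: list[dict[str, Any]], min_words: int = 1) -> list[str]:
    """Return a list of problems; an empty list means the transcript looks usable.

    Assumes hypothetical word records like {"text": "Hello", "start": 0.0, "end": 0.4}.
    """
    problems: list[str] = []
    if len(words) < min_words:
        problems.append(f"only {len(words)} words, expected at least {min_words}")
    prev_start = float("-inf")
    for i, w in enumerate(words):
        text = str(w.get("text", "")).strip()
        start, end = w.get("start"), w.get("end")
        if not text:
            problems.append(f"word {i}: empty text")
        if not isinstance(start, (int, float)) or not isinstance(end, (int, float)):
            problems.append(f"word {i}: missing timing")
            continue  # ordering checks need real numbers
        if end < start:
            problems.append(f"word {i}: ends before it starts")
        if start < prev_start:
            problems.append(f"word {i}: out of order")
        prev_start = start
    return problems
```

An empty result is necessary but not sufficient: it says nothing about whether the words are accurate, only that real words with plausible timings are present.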
Write the target text file in the order you want the final sequence to follow. Short paragraphs or one passage per line usually work better than a single large block of text.
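To see why one passage per line works well, here is a minimal sketch of turning such a file into ordered target units. It assumes one passage per non-blank line and an optional all-caps speaker label such as `ANNA: ` or `Q: ` at the start of a line; both conventions, and the `TargetUnit` type, are illustrative assumptions rather than how the agent parses the file.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed convention: an optional all-caps speaker label such as "ANNA: " or "Q: ".
SPEAKER_LABEL = re.compile(r"^([A-Z][A-Z0-9 .'-]{0,30}):\s+(.+)$")


@dataclass
class TargetUnit:
    number: int                    # 1-based position in the target script
    text: str                      # passage to match against the source transcript
    speaker: Optional[str] = None  # optional hint only, never a required key


def segment_target(script: str) -> list[TargetUnit]:
    """Split a target script into ordered units, one per non-blank line."""
    units: list[TargetUnit] = []
    for raw in script.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines only separate passages
        speaker = None
        label = SPEAKER_LABEL.match(line)
        if label:
            speaker, line = label.group(1), label.group(2)
        units.append(TargetUnit(len(units) + 1, line, speaker))
    return units
```

A single large block of text would come out as one oversized unit here, which is exactly the case the prompt asks the agent to re-split dynamically.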
Recommended Execution Permissions
This workflow creates a new sequence and places source ranges into it. `prSequencePlaceItem` places each explicit source range by temporarily setting source in/out on the source clip and then restoring the previous values, so the run needs write access to every source clip it places as well as to the output bin or sequence. It should not need to modify unrelated existing edits.
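The set-and-restore pattern is why source clips need write access even though they end up unchanged. The sketch below illustrates that pattern only; `SourceClip` and `place_range` are hypothetical stand-ins, not Premiere objects and not the actual implementation of `prSequencePlaceItem`.

```python
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class SourceClip:
    """Hypothetical stand-in for a Project panel clip; not a Premiere API object."""
    name: str
    in_point: Optional[float] = None   # source in, seconds (None = unset)
    out_point: Optional[float] = None  # source out, seconds (None = unset)


@contextmanager
def temporary_in_out(clip: SourceClip, start: float, end: float) -> Iterator[SourceClip]:
    """Set source in/out for one placement, then restore whatever was there before."""
    saved = (clip.in_point, clip.out_point)
    clip.in_point, clip.out_point = start, end
    try:
        yield clip
    finally:
        # Runs even if the placement fails, so the clip is never left modified.
        clip.in_point, clip.out_point = saved


def place_range(clip: SourceClip, start: float, end: float, timeline: list) -> None:
    """Record a placement of clip[start:end]; a stand-in for the real placement call."""
    with temporary_in_out(clip, start, end) as c:
        timeline.append((c.name, c.in_point, c.out_point))
```

After `place_range(SourceClip("Interview_A", 5.0, 60.0), 12.5, 18.0, timeline)`, the timeline holds the 12.5 to 18.0 s range while the clip's own in/out points are back at 5.0 and 60.0.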
For the safest setup:
- Create a new empty Project bin for the output, such as `Automation Agent Output`.
- Open the Execution Permissions tab.
- Enable Restrict write access in the Project section.
- Add that empty Project bin as a write-access exception.
- Add the selected source clip, or the narrow source bin that contains the clips you want the agent to place, as an additional write-access exception.
- In the prompt, tell the agent to create the new sequence inside the output bin and to use only the approved source clip or source bin.
With that setup, the agent can create the assembly sequence in the approved output bin and temporarily set and restore source in/out on the approved source clips. It cannot write elsewhere in the project. If the output bin or source clip is missing or not writable, the agent should stop instead of expanding the permission scope itself.
If you want the most conservative first pass, ask the agent to stop after the matching report and wait for approval before creating the sequence.
Copy And Paste Prompt
Paste this prompt into your MCP client and attach the target transcript text file in the same message:
Using Automation Agent in Adobe Premiere, create a transcript-based assembly sequence from the attached target transcript text file.
Source scope:
- Use the currently selected clip project item in the Project panel as the source.
- If no single source clip is selected, stop and ask me to select the source clip.
Goal:
Build a new non-destructive sequence containing the source ranges that best match the target transcript text, in the same order as the target text.
The target text may contain:
- clean exact transcript excerpts
- near matches with minor wording differences
- fuzzy matches that are similar but not identical
- intentionally missing or invented text that should not be forced into a source range
- repeated or ambiguous phrases that may correspond to multiple source locations
Output location:
- Create the new sequence inside the Project bin named `Automation Agent Output`.
- If that bin does not exist or is not writable, stop and ask me to create or approve the output bin before continuing.
Follow these steps carefully:
1. Read and segment the target transcript text.
* Treat the attached text file as the intended assembly script.
* Split it into ordered target units.
* Use paragraph breaks, line breaks, sentence boundaries, and speaker changes as initial hints only.
* Preserve the target order.
* Ignore blank lines and obvious section labels unless they help interpret the edit.
* Do not rely on the initial segmentation too rigidly.
* If a target unit is too long, too short, or only partially matchable, dynamically split it into smaller meaningful sub-units for matching.
* If consecutive small target units clearly belong to one continuous source range, you may match and place them as one combined range.
* Do not manufacture a sentence by combining unrelated fragments from different parts of the source.
* Do not require speaker labels in the target text. If speaker labels are present, use them only as optional matching hints when they clearly align with the source transcript.
2. Prepare the source transcript.
* Inspect the selected project item.
* Check whether it has a usable transcript.
* If the transcript is not ready, use the safest available workflow to make it available, then export and verify the transcript before continuing.
* Do not rely on `HAS_TRANSCRIPT` alone.
* Export the transcript data and verify that it contains real usable words and timings.
3. Match target units to source transcript ranges.
* For each target unit or sub-unit, find the best matching source transcript range.
* Prefer exact or near-exact wording matches.
* Base matching primarily on spoken text and timing, not on speaker labels.
* Use speaker labels only as secondary evidence when both the target text and source transcript contain reliable labels.
* Do not reject an otherwise strong text match just because speaker labels are missing or unavailable.
* Allow small differences such as punctuation changes, capitalization, filler-word removal, contractions, minor transcription errors, or obvious typos.
* Treat short prefix differences such as `And`, `Okay`, or similar interviewer lead-ins as minor differences when the core spoken phrase is the same.
* If the target text spans multiple adjacent transcript segments, combine adjacent source transcript segments into one source range.
* Keep a small natural handle before and after each selected range when the source timing allows it.
* Do not stretch a source range across unrelated speech just to force a match.
4. Classify match quality.
Use the following match types:
* `Exact`: the target wording is essentially present in the source transcript, allowing only punctuation, capitalization, or speaker label differences.
* `Near`: the target wording is very close, with only small harmless differences such as contractions, filler words, or obvious transcript errors.
* `Fuzzy`: the target wording is similar enough to be useful for review, but not close enough to treat as a clean match.
* `Ambiguous`: multiple source ranges plausibly match the same target unit.
* `Partial`: only part of the target unit can be matched reliably.
* `Missing`: no reliable source wording exists.
Important:
* Do not treat a passage as matched merely because it is topically related.
* If the concrete target wording was not actually said and only a related idea exists in the source, classify it as `Missing` or, at most, `Fuzzy` if it is useful for editorial review.
* Do not silently correct target text to make it easier to match.
* Do not invent source ranges for invented target text.
5. Handle ambiguity.
* If multiple source ranges plausibly match the same target unit, treat them as alternate takes.
* Create one take group for that target unit.
* Place the best or closest match on the lowest video/audio tracks, usually V1/A1.
* Place every plausible alternate source range in the sequence too.
* Stack each alternate above the primary on its own higher video/audio tracks, such as V2/A2, V3/A3, at the same timeline start time.
* Do not resolve ambiguity with marker comments or report notes alone; those notes document the take group but are not a substitute for physically placing the alternate takes.
* Do not discard an alternate merely because one take seems slightly cleaner.
* Add a review marker at the take group start explaining why the unit is ambiguous and listing all candidate source ranges.
* Include the primary range and all alternate ranges in the final report.
* This is especially important for short repeated interviewer questions or repeated phrases.
6. Handle imperfect and missing matches.
* If a target unit does not match exactly, choose the closest usable source range only when the wording or meaning is clearly close enough to be useful.
* If only part of a target unit can be matched reliably, place the reliable part only when it is editorially useful, classify it as `Partial`, and warn about the missing text.
* If the difference is more than a typo, filler-word difference, contraction difference, or harmless phrasing shift, add a review marker and include a warning in the final report.
* If no reliable source match exists, do not invent one.
* Leave a visible gap in the sequence for the missing target unit if practical, or skip it and report it clearly.
7. Build the assembly sequence.
* Create a new sequence named `Transcript Assembly - [source name]`.
* Create or place that sequence inside the Project bin named `Automation Agent Output`.
* Place matched source ranges in target-script order.
* Use `prSequencePlaceItem` with explicit `sourceInSeconds` and `sourceOutSeconds` for each placement.
* Preserve the source clip non-destructively.
* Add a small fixed gap between target units.
* For ambiguous matches, stack every plausible alternate take vertically at the same timeline position as the primary.
* Keep the primary match on V1/A1 when possible, with alternates on higher tracks.
8. Add review markers for uncertainty.
* Add sequence markers wherever the assembly needs editorial attention.
* Use markers for fuzzy matches, near matches with meaningful wording differences, partial matches, ambiguous take groups, missing passages, and skipped target text.
* Place the marker at the start of the affected placed range, take group, or gap.
* If a missing passage produces a visible gap, create a duration marker that covers the gap.
* Keep the marker name short and scannable, such as:
* `Fuzzy match - Unit 4`
* `Partial match - Unit 7`
* `Ambiguous takes - Unit 9`
* `Missing text - Unit 12`
* Put the detailed explanation in the marker comment, including:
* the target unit number
* the missing, changed, or uncertain text
* the source range or candidate ranges involved
* why the match was not exact
* any suggested editor action
* Do not create markers for clean exact matches unless there is another reason for review.
9. Output a matching report in the chat.
* If the matching report looks too uncertain to build a useful edit, output the report before creating the sequence and wait for approval.
* Otherwise, output the report after building the sequence.
* For every target unit, list:
- target unit number
- short target quote
- chosen source time range
- match type: Exact, Near, Fuzzy, Ambiguous, Partial, or Missing
- whether the unit was matched as one range or split into sub-matches
- any alternate take ranges
- whether a review marker was added
- warning notes for imperfect, uncertain, or missing matches
* Mention the name of the new sequence.
* If transcript readiness or transcript quality limited the result, mention that clearly.
Important quality guidelines:
* Transcript timing is the source of truth for placement.
* Prefer a reviewable assembly over a falsely confident one.
* Preserve ambiguity structurally by stacking candidate takes; marker comments and reports are required review metadata, but they do not replace timeline placement.
* Use split matches only to recover real spoken sub-passages, not to manufacture a sentence from unrelated fragments.
* Treat speaker labels as optional hints, not as required matching keys.
* Put important uncertainty directly into sequence markers so review notes stay attached to the relevant timeline position.
* Warn about semantic mismatches instead of silently accepting weak matches.
* Stop before creating the sequence if the matching report looks too uncertain to build a useful edit.
What The Agent Will Do
In practical terms, a good run of this workflow should:
- read the attached target transcript file
- split the target text into ordered assembly units and adjust that segmentation when matching requires it
- inspect the selected source clip and verify transcript readiness
- export and verify source transcript content before matching
- locate exact, near, fuzzy, ambiguous, partial, and missing matches
- create a new sequence with the matched source ranges in target order
- stack alternate takes vertically when several source ranges match the same target unit
- preserve small handles around spoken sections
- add sequence markers for fuzzy, partial, ambiguous, missing, or skipped passages
- return a unit-by-unit matching report with warnings
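The match types the agent reports can be approximated with token normalization and a similarity ratio. This is a minimal sketch using Python's standard `difflib.SequenceMatcher`; the filler list, the lead-in list, and the 0.9 / 0.6 thresholds are illustrative assumptions, not values the agent is known to use. `Ambiguous` is not decided here, because it depends on how many source ranges match a unit rather than on a single comparison.

```python
import re
from difflib import SequenceMatcher

FILLERS = {"um", "uh", "er", "ah", "hmm"}  # assumed filler list
LEAD_INS = {"and", "okay", "so"}          # assumed short lead-ins treated as minor


def words(text: str) -> list[str]:
    """Lowercase word tokens, so punctuation and capitalization are ignored."""
    return re.findall(r"[a-z0-9']+", text.lower())


def cleaned(tokens: list[str]) -> list[str]:
    """Drop filler words, then any leading lead-ins such as 'and' or 'okay'."""
    kept = [t for t in tokens if t not in FILLERS]
    while kept and kept[0] in LEAD_INS:
        kept = kept[1:]
    return kept


def classify(target: str, source: str, near: float = 0.9, fuzzy: float = 0.6) -> str:
    """Classify one target unit against one candidate source passage."""
    t_raw, s_raw = words(target), words(source)
    if t_raw and t_raw == s_raw:
        return "Exact"  # only punctuation or capitalization differ
    t, s = cleaned(t_raw), cleaned(s_raw)
    if not t:
        return "Missing"
    matcher = SequenceMatcher(None, t, s, autojunk=False)
    ratio = matcher.ratio()
    if ratio >= near:
        return "Near"
    if ratio >= fuzzy:
        return "Fuzzy"
    # Partial: one contiguous run covers at least half of the target wording.
    run = matcher.find_longest_match(0, len(t), 0, len(s))
    if run.size >= 3 and run.size >= len(t) / 2:
        return "Partial"
    return "Missing"
```

For example, `"And we started in a garage."` against `"Um, we started in a garage."` comes out `Near`, while `"We started in a garage and moved to Berlin."` against `"we started in a garage in nineteen ninety"` comes out `Partial`. A purely lexical ratio cannot tell a paraphrase from a different statement, which is why the prompt treats topical similarity alone as `Missing`.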
Expected Result In Premiere
The result should be a new non-destructive assembly sequence where:
- the target transcript order becomes the sequence order
- each matched passage is placed as a trimmed source range
- alternate takes are aligned vertically for fast review
- markers and gaps make missing and uncertain passages visible
- the original source clip remains untouched
This is meant as an editorial assembly pass. The editor should still review match quality, trim handles, choose alternate takes, and fix any transcript-driven timing issues.
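The stacking and gap rules behind this layout reduce to simple arithmetic over timeline positions. The sketch below computes positions only and does not call Premiere; the `UnitMatch`, `Placement`, and `Marker` types are hypothetical, and the 1.0 s gap and 2.0 s hold for missing units are assumed values. For brevity it marks only ambiguous, fuzzy, partial, and missing units, whereas the prompt also asks for markers on near matches with meaningful wording differences.

```python
from dataclasses import dataclass, field
from typing import Optional

Range = tuple[float, float]  # (source in, source out) in seconds


@dataclass
class UnitMatch:
    number: int
    match_type: str                   # Exact, Near, Fuzzy, Ambiguous, Partial, Missing
    primary: Optional[Range] = None   # None when no reliable source range exists
    alternates: list[Range] = field(default_factory=list)


@dataclass
class Placement:
    unit: int
    track: int            # 1 = V1/A1 (primary), 2 = V2/A2, and so on
    timeline_start: float
    source: Range


@dataclass
class Marker:
    name: str
    start: float
    duration: float = 0.0  # non-zero only for markers that cover a gap


def layout(units: list[UnitMatch], gap: float = 1.0, missing_hold: float = 2.0):
    """Compute timeline positions for each unit in target-script order."""
    placements: list[Placement] = []
    markers: list[Marker] = []
    t = 0.0
    for u in units:
        if u.primary is None:
            # Leave a visible gap and cover it with a duration marker.
            markers.append(Marker(f"Missing text - Unit {u.number}", t, missing_hold))
            t += missing_hold + gap
            continue
        takes = [u.primary] + u.alternates
        # Every take in the group starts at the same time, one track pair each.
        for track, rng in enumerate(takes, start=1):
            placements.append(Placement(u.number, track, t, rng))
        if u.alternates:
            markers.append(Marker(f"Ambiguous takes - Unit {u.number}", t))
        elif u.match_type in ("Fuzzy", "Partial"):
            markers.append(Marker(f"{u.match_type} match - Unit {u.number}", t))
        # The next unit starts after the longest take in this group, plus the gap.
        t += max(end - start for start, end in takes) + gap
    return placements, markers
```

Advancing by the longest take in a group keeps a long alternate on V2/A2 from overlapping the next unit's primary on V1/A1.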
Prompt Variants
You can adapt this workflow depending on the source material and how much risk you want the agent to take.
1. Work from all clips in the active sequence
Use this variant when the raw material is already organized in a timeline and you want the agent to search only the source clips that appear there:
Change the source scope:
- Use all clips in the currently active sequence as the source pool.
- For each timeline clip, obtain the transcript of its source media.
- Map transcript ranges back to the source clip timing, then place the matching source ranges into a new assembly sequence.
- If the same passage appears in multiple clips, treat the matches as alternate takes and stack them vertically.
- If a clip is already trimmed in the active sequence, prefer searching only the source transcript portion represented by that timeline clip unless I explicitly ask you to search the full underlying source clip.
That is useful when:
- you already made a loose selects timeline
- multiple source clips may contain matching passages
- you want to avoid searching unrelated media in the whole project
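Restricting the search to the portion of a source used by a trimmed timeline clip, and keeping handles inside that portion, can be sketched as follows. It assumes transcript word times are already in the same source time base as the clip's source in/out points (convert first if your export is offset, for example by a start timecode); the `Word` type and the 0.25 s handle are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds, in source media time
    end: float


def words_in_clip(words: list[Word], src_in: float, src_out: float) -> list[Word]:
    """Keep only words that lie fully inside the source range a timeline clip uses."""
    return [w for w in words if w.start >= src_in and w.end <= src_out]


def with_handles(start: float, end: float, src_in: float, src_out: float,
                 handle: float = 0.25) -> tuple[float, float]:
    """Pad a matched range by a small handle without leaving the clip's source range."""
    return max(src_in, start - handle), min(src_out, end + handle)
```

Words that straddle the clip's in or out point are dropped here; a real run might instead keep them and flag the unit for review.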
2. Stop after the matching report
If you want to approve the transcript matching before Premiere is modified, add:
First produce only the matching report. Do not create or modify any sequence yet. Wait for my approval before building the assembly sequence.
That is useful when:
- the target transcript is rough
- the source contains many similar takes
- client-approved wording must be matched carefully
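A report is easiest to approve when every target unit yields one line carrying the fields the prompt requests. This is a minimal formatting sketch; the `ReportRow` type, the seconds formatting, and the 40-character quote limit are illustrative assumptions, not the agent's actual output format.

```python
from dataclasses import dataclass, field
from typing import Optional

Range = tuple[float, float]  # (source in, source out) in seconds


@dataclass
class ReportRow:
    unit: int
    quote: str
    source: Optional[Range]   # None for Missing units
    match_type: str           # Exact, Near, Fuzzy, Ambiguous, Partial, Missing
    split: bool = False       # True when the unit was split into sub-matches
    alternates: list[Range] = field(default_factory=list)
    marker: bool = False
    warning: str = ""


def fmt_range(r: Range) -> str:
    return f"{r[0]:.2f}-{r[1]:.2f}s"


def format_row(row: ReportRow, max_quote: int = 40) -> str:
    """Render one report entry as a single scannable line."""
    quote = row.quote if len(row.quote) <= max_quote else row.quote[:max_quote - 3] + "..."
    src = fmt_range(row.source) if row.source is not None else "none"
    alts = ", ".join(fmt_range(a) for a in row.alternates) or "none"
    parts = [f"Unit {row.unit}", f'"{quote}"', src, row.match_type,
             "split" if row.split else "single range", f"alternates: {alts}",
             "marker" if row.marker else "no marker"]
    if row.warning:
        parts.append(f"warning: {row.warning}")
    return " | ".join(parts)
```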
3. Be stricter about fuzzy matches
If exact wording matters, add:
Only use fuzzy matches when the spoken meaning is clearly the same and the difference is limited to transcription noise, filler words, or tiny phrasing changes. Mark anything more different as Missing instead of placing it.
That is useful when:
- legal, compliance, or quote accuracy matters
- the target text is a verbatim paper edit
- similar but not identical statements would be misleading
4. Prefer editorial continuity over verbatim matching
If you are making a rough story assembly and can tolerate paraphrases, add:
You may use semantically equivalent paraphrases when no exact wording exists, but every paraphrase match must be labeled as Fuzzy and explained in the final report.
That is useful when:
- the target text is an outline rather than a verbatim transcript
- interview answers repeat the same idea with different wording
- speed of assembly matters more than exact wording
Limitations And Review Points
Transcript-based assembly is only as reliable as the transcript timing and the matching rules. Watch for:
- repeated takes with nearly identical wording
- missing, inconsistent, or wrong speaker labels, if you let the agent use them as optional matching hints
- transcript segments that start or end too late
- target passages that combine words from different source moments
- words that were cleaned up in the target text but never spoken exactly that way
- overlapping dialogue, interruptions, and crosstalk
- translated or heavily rewritten target text
The safest pattern is to let the agent make uncertainty visible: exact matches can be placed confidently, ambiguous matches should be stacked for review, and weak matches should produce warnings instead of quiet editorial decisions.
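Repeated takes are where ambiguity usually comes from. One way to surface it is to slide a window the length of the target over the transcript words, keep every non-overlapping window above a similarity threshold, and treat more than one survivor as an `Ambiguous` take group with the best score as the primary. This sketch assumes words arrive as `(text, start, end)` tuples in seconds; the 0.8 threshold is an illustrative assumption.

```python
import re
from difflib import SequenceMatcher

Word = tuple[str, float, float]  # (text, start seconds, end seconds), assumed format


def norm(text: str) -> list[str]:
    return re.findall(r"[a-z0-9']+", text.lower())


def candidate_ranges(target: str, words: list[Word],
                     threshold: float = 0.8) -> list[tuple[float, float, float]]:
    """Return (score, start, end) for each non-overlapping window that matches target.

    More than one result means the unit is Ambiguous and the results are alternate takes.
    """
    t = norm(target)
    n = len(t)
    if n == 0:
        return []
    toks = [" ".join(norm(text)) for text, _, _ in words]
    scored = []
    for i in range(len(words) - n + 1):
        score = SequenceMatcher(None, t, toks[i:i + n], autojunk=False).ratio()
        if score >= threshold:
            scored.append((score, i))
    # Keep the best windows first and drop any window that overlaps a kept one.
    kept: list[tuple[float, int]] = []
    for score, i in sorted(scored, key=lambda s: (-s[0], s[1])):
        if all(abs(i - j) >= n for _, j in kept):
            kept.append((score, i))
    return [(score, words[i][1], words[i + n - 1][2]) for score, i in kept]
```

For an interviewer who asks "Where did you grow up?" twice, this returns both ranges with equal scores, so both should be placed and stacked rather than one being silently dropped.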