Video Intake Guard

The Video Intake Guard runs automatically when you pass --url to the generate command. It validates the video URL, classifies whether it's a technical tutorial, and confirms you've provided your own transcript.

URL-only generation is blocked

A YouTube URL alone is not enough. You must always provide your own transcript file with --file. DevDocs Forge Agent never scrapes transcripts, downloads captions, or downloads videos.

Why it exists

DevDocs Forge Agent is designed for processing content you own or have permission to use. The Video Intake Guard helps enforce this by:

  • Verifying the URL points to a real, supported video
  • Checking that the video looks like a technical tutorial (not a music video, vlog, etc.)
  • Requiring that you supply your own transcript — no scraping allowed

What the guard never does

  • Scrapes YouTube transcripts or caption tracks
  • Downloads video files or audio
  • Accesses the YouTube captions API
  • Auto-fetches transcript text from any URL
  • Generates docs from a URL alone without a user-provided file

The three checks

1. URL validation

Only these domains are accepted:

  • youtube.com (format: youtube.com/watch?v=VIDEO_ID)
  • youtu.be (format: youtu.be/VIDEO_ID)
  • vimeo.com (format: vimeo.com/NUMERIC_ID)

Malformed URLs, unsupported platforms, and YouTube channel URLs (without a video ID) are rejected immediately.
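
The rules above can be sketched in a few lines of Python. This is an illustrative sketch only, not the tool's actual implementation: the function name classify_video_url is hypothetical, and accepting an optional "www." prefix is an assumption.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def classify_video_url(url: str) -> Optional[str]:
    """Return 'youtube' or 'vimeo' for an accepted video URL, else None."""
    try:
        parsed = urlparse(url)
        host = (parsed.hostname or "").lower()
    except ValueError:
        # urlparse raises ValueError on some malformed inputs (e.g. bad IPv6 hosts).
        return None
    if parsed.scheme not in ("http", "https"):
        return None
    if host.startswith("www."):
        host = host[4:]  # assumption: a leading "www." is tolerated
    path = parsed.path.strip("/")
    if host == "youtube.com":
        # Only watch URLs that carry a video ID pass; channel URLs do not.
        video_id = parse_qs(parsed.query).get("v", [""])[0]
        return "youtube" if parsed.path == "/watch" and video_id else None
    if host == "youtu.be":
        # Short links: a single path segment holding the video ID.
        return "youtube" if path and "/" not in path else None
    if host == "vimeo.com":
        # Vimeo: the path must be a purely numeric ID.
        return "vimeo" if path.isascii() and path.isdigit() else None
    return None
```

With this sketch, a channel URL such as https://www.youtube.com/@somechannel returns None because it has no /watch path and no v parameter.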

2. Tech video classification (0–100 score)

The classifier scores the video based on metadata signals:

Signal                                                         Points
Title has a strong tech keyword (angular, docker, typescript)  +30
Description mentions technical terms                           +20
Tags contain programming terms                                 +15
Category is Science & Technology / Education / Howto & Style   +20
Duration > 3 minutes                                           +10
Title contains blocked keyword (song, vlog, gaming)            −30
Category is Music / Entertainment / Sports / Gaming            −20

Thresholds:

  • Score ≥ 60 → High confidence — allowed
  • Score 35–59 → Medium confidence — allowed with warning
  • Score < 35 → Low confidence — blocked (unless --force)

Optional YouTube API key

Full metadata (title, description, tags, category, duration) requires YOUTUBE_API_KEY in your .env. Without it, classification uses URL and filename heuristics only and typically gives a low-confidence score.
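
To make the scoring table and thresholds concrete, here is a rough Python sketch of how such a classifier could add up the signals. It is not the tool's real code. The names VideoMeta, tech_score, and confidence are hypothetical, the same small keyword set is reused for title, description, and tags as a simplification, and clamping the result to 0–100 is an assumption based on the stated score range.

```python
import re
from dataclasses import dataclass, field

# Example keyword and category sets taken from the table above.
TECH_KEYWORDS = {"angular", "docker", "typescript"}
BLOCKED_KEYWORDS = {"song", "vlog", "gaming"}
TECH_CATEGORIES = {"Science & Technology", "Education", "Howto & Style"}
NON_TECH_CATEGORIES = {"Music", "Entertainment", "Sports", "Gaming"}


@dataclass
class VideoMeta:
    title: str = ""
    description: str = ""
    tags: list = field(default_factory=list)
    category: str = ""
    duration_seconds: int = 0


def _mentions(text: str, keywords: set) -> bool:
    """True if any keyword appears as a whole word in the text."""
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    return bool(words & keywords)


def tech_score(meta: VideoMeta) -> int:
    """Sum the signal points from the table, clamped to 0-100 (assumed)."""
    score = 0
    if _mentions(meta.title, TECH_KEYWORDS):
        score += 30
    if _mentions(meta.description, TECH_KEYWORDS):
        score += 20
    if _mentions(" ".join(meta.tags), TECH_KEYWORDS):
        score += 15
    if meta.category in TECH_CATEGORIES:
        score += 20
    if meta.duration_seconds > 180:  # longer than 3 minutes
        score += 10
    if _mentions(meta.title, BLOCKED_KEYWORDS):
        score -= 30
    if meta.category in NON_TECH_CATEGORIES:
        score -= 20
    return max(0, min(100, score))


def confidence(score: int) -> str:
    """Map a score to the documented confidence bands."""
    if score >= 60:
        return "high"
    if score >= 35:
        return "medium"
    return "low"
```

Under these assumptions, a video titled "Angular crash course" with no other metadata and a five-minute duration scores 30 + 10 = 40, which falls in the medium band.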

3. Transcript requirement

You must always provide --file with your own transcript. URL-only generation is blocked:

Blocked — no transcript provided
# This fails by design
npm run generate -- --url "https://www.youtube.com/watch?v=W6NZfCO5SIk"
Correct — transcript provided
npm run generate -- \
--url "https://www.youtube.com/watch?v=W6NZfCO5SIk" \
--file input/my-transcript.md \
--type docusaurus

Test the guard with a real URL

Step 1 — Inspect a video URL

Check how a video URL would be classified without providing a transcript:

Inspect a video URL
npm run devdocs-forge-agent -- inspect-url "https://www.youtube.com/watch?v=W6NZfCO5SIk"

Step 2 — Validate URL and transcript together

Run all intake checks on a URL + transcript pair:

Validate URL and transcript
npm run devdocs-forge-agent -- validate-source \
--url "https://www.youtube.com/watch?v=W6NZfCO5SIk" \
--file input/my-transcript.md

Expected output:

devdocs-forge-agent validate-source
──────────────────────────────────────
OK URL is valid (youtube)
OK Tech classification: 75/100 high confidence
OK Transcript: input/my-transcript.md 842 words

✓ Source is valid. Ready to generate.

Step 3 — Generate docs safely

Generate docs with Video Intake Guard
npm run generate -- \
--url "https://www.youtube.com/watch?v=W6NZfCO5SIk" \
--file input/my-transcript.md \
--type docusaurus

--force flag

Bypass low-confidence classification if you own the content:

Force bypass low-confidence classification
npm run generate -- \
--url "https://www.youtube.com/watch?v=W6NZfCO5SIk" \
--file input/my-transcript.md \
--force

Using --force appends a warning to metadata.json and review-checklist.md so you have a record of the bypass.
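
The interaction between the confidence bands and --force can be pictured with a short Python sketch. This only illustrates the documented behaviour; the function name intake_decision and the warning wording are hypothetical, and how the real tool writes warnings into metadata.json and review-checklist.md is not shown here.

```python
from typing import List, Tuple


def intake_decision(score: int, force: bool = False) -> Tuple[bool, List[str]]:
    """Return (allowed, messages) for a classification score.

    Bands follow the documented thresholds: >= 60 high, 35-59 medium, < 35 low.
    """
    if score >= 60:
        return True, []
    if score >= 35:
        # Medium confidence is allowed, but with a warning.
        return True, [f"Medium confidence ({score}/100): review before publishing"]
    if force:
        # Low confidence is allowed only with an explicit bypass,
        # and the bypass itself is recorded as a warning.
        return True, [f"Low confidence ({score}/100): bypassed with --force"]
    return False, [f"Low confidence ({score}/100): blocked"]
```

A caller could then persist any returned messages alongside the generated docs, so that the bypass stays visible during review.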

Configuration

Customize classification in config/devdocs-forge-agent.yml:

config/devdocs-forge-agent.yml
video_intake:
  enabled: true
  min_tech_confidence_score: 60 # lower to be more permissive
  min_transcript_words: 150
  technical_keywords:
    - your-framework
    - your-library
  blocked_keywords:
    - song
    - vlog
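
As a rough illustration of what min_transcript_words could mean in practice, the Python sketch below checks a transcript's length against a minimum. How the tool actually counts words is not documented here, so splitting on whitespace is an assumption, and the names count_words and transcript_long_enough are hypothetical.

```python
def count_words(text: str) -> int:
    """Count whitespace-separated tokens (an assumed definition of 'word')."""
    return len(text.split())


def transcript_long_enough(text: str, min_words: int = 150) -> bool:
    """True if the transcript meets the configured minimum word count."""
    return count_words(text) >= min_words
```

With the default of 150, a transcript reported as 842 words would pass this check, while a two-line stub would not.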