Undertow Engine API: Headless Short-Form Video at the API Layer
A standalone Python microservice that takes a JSON brief and outputs a fully composited, captioned, posted short-form video — built on FastAPI, Celery, MoviePy, and Playwright.
Undertow Engine API is a video pipeline you call with a POST. Submit a brief — assets, template, captions, target platforms — and it returns a job ID. The service composites the video, renders it, posts (or schedules) it to TikTok / Instagram Reels / YouTube Shorts, and webhooks you when each stage completes.
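For illustration, a brief could be modeled roughly as follows. This is a minimal sketch, not the service's documented schema: the field names (`assets`, `template`, `captions`, `targets`), the platform identifiers, and the example S3 path are all assumptions chosen to mirror the description above.

```python
import json
from dataclasses import dataclass, field, asdict

# Hypothetical platform identifiers; the real service may name them differently.
SUPPORTED_TARGETS = {"tiktok", "reels", "shorts"}


@dataclass
class Brief:
    """Illustrative shape of a job brief (all field names are assumptions)."""
    assets: list[str]            # e.g. S3 URIs of the raw clips
    template: str                # name of the composition template
    captions: bool = True        # whether to burn in captions
    targets: list[str] = field(default_factory=list)

    def validate(self) -> None:
        """Reject briefs that could never produce a postable video."""
        if not self.assets:
            raise ValueError("brief needs at least one asset")
        unknown = set(self.targets) - SUPPORTED_TARGETS
        if unknown:
            raise ValueError(f"unsupported targets: {sorted(unknown)}")


brief = Brief(
    assets=["s3://example-bucket/raw/clip-01.mp4"],
    template="default",
    targets=["tiktok", "shorts"],
)
brief.validate()

# JSON body a caller would send with POST /jobs.
payload = json.dumps(asdict(brief))
```

Validating the brief before it is queued lets the API reject a bad request immediately instead of failing minutes later inside a render task.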
It's deliberately a standalone microservice, not an embedded library: the upstream caller stays clean (a Next.js app, a scheduler, an n8n flow, another Python service), and all the heavy MoviePy + Chromium dependencies live in one place.
End-to-end
1. POST /jobs ── brief JSON (assets, template, captions, targets)
│
▼
2. FastAPI queues ── Celery task, returns job_id immediately
│
▼
3. MoviePy renders ── intro / B-roll / overlay / outro composited to MP4
│
▼
4. Captioning ── Whisper transcription → styled caption burn-in
│
▼
5. Playwright posts ── headless browser uploads to each target platform
│ (TikTok, Reels, Shorts) with the right metadata
▼
6. Webhook callback ── per-stage status + final platform URLs
The split matters. MoviePy handles rendering (CPU-bound, ffmpeg under the hood). Playwright handles posting (network- and DOM-bound). Celery keeps both off the FastAPI request path, so the API stays responsive and queued work survives an API restart.
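The stage ordering and per-stage callbacks in the flow above can be sketched with plain functions. This is a simplified stand-in, not the service's code: in the real pipeline each stage would run as a Celery task and `notify` would send an HTTP webhook. The stage names, the event fields, and the stub return values are assumptions.

```python
from typing import Callable

Stage = Callable[[dict], dict]

# Order follows the flow above: render, then caption, then post.
STAGE_ORDER = ["render", "caption", "post"]


def run_pipeline(
    job_id: str,
    stages: dict[str, Stage],
    notify: Callable[[dict], None],
) -> dict:
    """Run stages in order, emitting one status event per stage."""
    ctx = {"job_id": job_id}
    for name in STAGE_ORDER:
        try:
            ctx = stages[name](ctx)
        except Exception as exc:
            # Report the failing stage and stop; later stages never run.
            notify({"job_id": job_id, "stage": name,
                    "status": "failed", "error": str(exc)})
            return ctx
        notify({"job_id": job_id, "stage": name, "status": "done"})
    return ctx


# Stub stages standing in for MoviePy, Whisper, and Playwright work.
def render(ctx: dict) -> dict:
    return {**ctx, "video": "out.mp4"}


def caption(ctx: dict) -> dict:
    return {**ctx, "captioned": True}


def post(ctx: dict) -> dict:
    return {**ctx, "urls": {"tiktok": "https://example.invalid/v/1"}}


events: list[dict] = []
result = run_pipeline(
    "job-123",
    {"render": render, "caption": caption, "post": post},
    events.append,  # a real notifier would POST each event to the caller's webhook
)
```

Passing `notify` in as a callable keeps the pipeline logic testable without a network, and makes it clear that a failure in one stage is reported before the job stops.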
Stack
| Layer | Tech |
|---|---|
| API | FastAPI (Python 3.11), async request handlers |
| Job queue | Celery + Redis broker |
| Video rendering | MoviePy + ffmpeg |
| Captioning | OpenAI Whisper (transcription) → Pillow caption layer |
| Headless posting | Playwright (Chromium), per-platform DOM automation |
| Storage | AWS S3 — raw assets in, rendered MP4 out |
| Deploy | Docker → ECR → SSM-managed EC2 (test + prod environments) |
| CI | pytest + ruff; images tagged test-{SHA} / prod-{SHA} so prod images are immutable |
Why headless Playwright for the post step
Every major short-form platform has a posting API… on paper. In practice, the official APIs are slow to gain features (TikTok captions, music tracks, scheduling) and have rate limits that don't match creator-tool needs. A Playwright-driven post — using a signed-in session — matches what a human poster would do, including the platform-specific niceties the API lags on.
This approach is fragile, and the service is built to contain that fragility: when a platform changes its uploader UI, that platform's script fails loudly rather than silently. Each per-platform poster is a swappable adapter, so a broken TikTok flow doesn't take down Reels.
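One way to picture the adapter idea is a registry of per-platform posters where each call is isolated from the others. The sketch below is hypothetical: the `Poster` protocol, the class names, the registry, and the result shape are illustrative, and the Playwright automation is replaced by stubs so the isolation logic is visible on its own.

```python
from typing import Protocol


class Poster(Protocol):
    """Interface every platform adapter is assumed to implement."""
    def post(self, video_path: str, metadata: dict) -> str: ...


class TikTokPoster:
    def post(self, video_path: str, metadata: dict) -> str:
        # Simulates the uploader UI having changed under the script.
        raise RuntimeError("upload button selector not found")


class ReelsPoster:
    def post(self, video_path: str, metadata: dict) -> str:
        return "https://example.invalid/reels/123"  # placeholder URL


# Swapping or disabling a platform means changing one registry entry.
ADAPTERS: dict[str, Poster] = {
    "tiktok": TikTokPoster(),
    "reels": ReelsPoster(),
}


def post_to_all(video_path: str, metadata: dict, targets: list[str]) -> dict:
    """Post to each target independently; one failure never blocks the rest."""
    results: dict[str, dict] = {}
    for target in targets:
        try:
            url = ADAPTERS[target].post(video_path, metadata)
            results[target] = {"status": "posted", "url": url}
        except Exception as exc:
            # Fail loudly for this platform only, and keep going.
            results[target] = {"status": "failed", "error": str(exc)}
    return results


results = post_to_all("out.mp4", {"title": "demo"}, ["tiktok", "reels"])
```

Here the TikTok stub fails while the Reels stub still succeeds, which is the behavior the adapter boundary is meant to guarantee.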
Why a microservice vs. an in-process module
- Heavy deps in one place. MoviePy pulls in ffmpeg; Playwright pulls in Chromium. Callers shouldn't have to.
- Independent scale. Render jobs spike at content-publish time. The rest of the system shouldn't auto-scale on that signal.
- Versioned API. The video pipeline can evolve (new captioning model, new platform adapter) without forcing every caller to redeploy.
Deployment
Test and prod each have their own ECR repository and EC2 instance, with images tagged test-{SHA} and prod-{SHA} respectively. Environment configuration is read from SSM; nothing is hard-coded. Deploys are immutable: every prod release is a new image, never an in-place mutation.
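The tagging convention is simple enough to express as a small helper. This sketch assumes `{SHA}` is a git commit hash (7 to 40 lowercase hex characters); the function name and the validation rules are illustrative, not taken from the project's CI.

```python
import re

ENVIRONMENTS = {"test", "prod"}


def image_tag(env: str, sha: str) -> str:
    """Build an image tag of the form test-{SHA} or prod-{SHA}."""
    if env not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {env!r}")
    # Assumption: SHA is an abbreviated or full git commit hash.
    if not re.fullmatch(r"[0-9a-f]{7,40}", sha):
        raise ValueError(f"not a commit SHA: {sha!r}")
    return f"{env}-{sha}"


tag = image_tag("prod", "3f9c2ab")
```

Because the tag is derived from the commit, rebuilding the same commit yields the same tag, and a new release always means a new tag rather than an overwritten one.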