Computer Vision Stack
FastAPI + PyTorch + an analytics dashboard for video and image pipelines
The Computer Vision Stack is the foundation under AppLiaison products that ingest video or image streams and turn them into business signals — retail foot-traffic counts, dwell-time heatmaps, defect detection on assembly lines, perimeter monitoring. It pairs Python where the math lives (FastAPI + PyTorch) with a Next.js dashboard where the operators live, and supports both cloud and on-device inference so latency-sensitive workloads do not have to round-trip frames over the internet.
Architecture
When to choose this stack
- The product input is video or images, not events from a database
- Latency requirements drive on-device inference at the edge
- The operator surface is a live monitoring dashboard, not a CRUD app
- Customers integrate with existing camera infrastructure, not buy new
- The output is business analytics (dwell time, throughput) rather than a chat
What's NOT included
- Custom training of customer-private models (available as a paid engagement)
- Camera hardware sourcing or installation
- Long-term video archival beyond 90 days (S3-IA bucket lifecycle, customer-paid; see the retention sketch after this list)
- Real-time human review of every event — the system is autonomous by design
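The 90-day limit is the one number the retention sketch below takes from this list; the bucket name, key prefix, and the 30-day move to Infrequent Access are illustrative assumptions about how such a lifecycle rule might be configured, not the deployed policy.

```python
# A minimal sketch of a 90-day retention policy on the customer-paid archive
# bucket: footage moves to S3 Infrequent Access after 30 days and is deleted
# at 90. Bucket name, prefix, and the 30-day transition are assumptions.
import boto3

s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="customer-video-archive",  # hypothetical bucket name
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-raw-footage-90d",
                "Filter": {"Prefix": "raw-video/"},  # hypothetical key prefix
                "Status": "Enabled",
                "Transitions": [{"Days": 30, "StorageClass": "STANDARD_IA"}],
                "Expiration": {"Days": 90},
            }
        ]
    },
)
```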
How the pieces fit
Video frames flow from RTSP-capable cameras into a FastAPI ingest service. Inference runs in one of two places — on a GPU pool in the cloud for cameras whose latency budget can absorb a 200ms round-trip, or on an edge runtime (NVIDIA Jetson, an Intel NUC with OpenVINO, or an Apple Mac mini with CoreML) for cameras that can’t.
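The cloud ingest path looks roughly like the sketch below: a FastAPI endpoint that accepts a decoded frame and runs a PyTorch detector on it. The route, the X-Camera-Id header, the torchvision model, and the 0.6 confidence threshold are illustrative assumptions, as is the push-style gateway that decodes the RTSP stream and POSTs individual frames; none of this is the production API.

```python
# Minimal sketch of the cloud ingest path, assuming a gateway that decodes the
# RTSP stream and POSTs JPEG frames to this endpoint (requires python-multipart).
import io

import torch
import torchvision
from fastapi import FastAPI, File, Header, UploadFile
from PIL import Image

app = FastAPI()

# Load a pretrained detector once at startup; the production service would
# load the customer's own exported weights instead of the torchvision default.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

to_tensor = torchvision.transforms.ToTensor()


@app.post("/ingest/frame")  # hypothetical route
async def ingest_frame(
    frame: UploadFile = File(...),
    camera_id: str = Header(..., alias="X-Camera-Id"),  # assumed header name
):
    # Decode the pushed frame and run one inference pass.
    image = Image.open(io.BytesIO(await frame.read())).convert("RGB")
    tensor = to_tensor(image).to(device)

    with torch.no_grad():
        detections = model([tensor])[0]

    # Keep only confident detections; 0.6 is an illustrative threshold.
    keep = detections["scores"] > 0.6
    events = [
        {"camera_id": camera_id, "label": int(label), "score": float(score)}
        for label, score in zip(detections["labels"][keep], detections["scores"][keep])
    ]
    # The real service writes these events to Postgres and publishes them to
    # Redis here; returning them keeps the sketch self-contained.
    return {"camera_id": camera_id, "events": events}
```

The edge runtime runs the same loop locally, in effect, against the exported model and ships only the resulting events upstream, which is what keeps latency-sensitive cameras off the internet round-trip.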
Detected events post to Postgres and to a Redis channel; the dashboard subscribes via WebSocket, so a count or alert appears within a second of the event happening. Hourly and daily rollups precompute the analytics most customers actually look at.
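A minimal sketch of that fan-out path, assuming a Redis channel named cv:events and a /ws/events WebSocket endpoint (both names are placeholders): the ingest side publishes each event after the Postgres insert, and the dashboard endpoint relays messages from the channel to connected browsers.

```python
# Sketch of the event fan-out: ingest publishes to a Redis channel, the
# dashboard WebSocket endpoint relays it. Channel and route names are assumed.
import json

import redis.asyncio as redis
from fastapi import FastAPI, WebSocket

app = FastAPI()
r = redis.Redis(host="localhost", port=6379)  # assumed local Redis


async def publish_event(event: dict) -> None:
    # Called by the ingest service once the Postgres insert has succeeded.
    await r.publish("cv:events", json.dumps(event))


@app.websocket("/ws/events")  # hypothetical route
async def stream_events(ws: WebSocket) -> None:
    await ws.accept()
    pubsub = r.pubsub()
    await pubsub.subscribe("cv:events")
    try:
        # Relay each published event to the dashboard as it arrives; this is
        # the hop that keeps counts and alerts under a second end to end.
        async for message in pubsub.listen():
            if message["type"] == "message":
                await ws.send_text(message["data"].decode())
    finally:
        await pubsub.unsubscribe("cv:events")
        await ws.close()
```

Pub/sub keeps ingest decoupled from however many dashboard connections happen to be open, and the hourly and daily rollups can then be ordinary scheduled aggregation queries over the same events table rather than anything the live path has to compute.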
Why these choices
FastAPI + PyTorch over a Node-only stack: the model ecosystem is in Python. A Node-only stack would still end up calling out to a Python inference service, which just adds another network hop.
ONNX export for on-device inference: lets the same trained model run in the cloud (PyTorch) and on the edge (CoreML / TFLite / OpenVINO) without retraining; see the export sketch below.
Two inference locations instead of one: most workloads are happy with cloud inference; the ones that aren’t are deal-breakers. Supporting both at the platform level means we don’t lose the deal over latency.
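The export step behind the ONNX point above is roughly the sketch below, using a small torchvision classifier as a stand-in for the trained detector; the input resolution, file name, output names, and opset are illustrative assumptions.

```python
# Sketch of the single export step that feeds every edge target.
import torch
import torchvision

# Stand-in for the trained detector; the real export starts from the
# customer's fine-tuned PyTorch checkpoint.
model = torchvision.models.mobilenet_v3_small(weights="DEFAULT")
model.eval()

dummy = torch.randn(1, 3, 224, 224)  # assumed input resolution
torch.onnx.export(
    model,
    dummy,
    "detector.onnx",                        # illustrative file name
    input_names=["frames"],
    output_names=["logits"],
    opset_version=17,                       # pick per target runtime
    dynamic_axes={"frames": {0: "batch"}},  # allow variable batch size
)
```

From the exported file, each edge runtime is then handled offline by its own conversion tooling (OpenVINO, CoreML, TFLite), while the cloud GPU pool keeps serving the original PyTorch weights.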