VLM Evaluation
Strong general-purpose VLMs can still fail on downstream vision tasks
A small chessboard-to-FEN experiment showed Claude Opus 4.7 struggling with exact square-level localization even when it understood the board broadly.
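One way to separate "understood the board broadly" from "localized every square exactly" is to score a predicted FEN per square rather than only as an exact string match. The sketch below is a minimal illustration, not the experiment's actual scoring code: the 8x8 grid representation and the function names `board_to_fen_placement`, `fen_placement_to_board`, and `square_accuracy` are assumptions made up for this example. It handles only the piece-placement field of a FEN string.

```python
from typing import List

# Hypothetical representation (an assumption for this sketch): 8 strings,
# rank 8 first, each 8 characters long, with '.' marking an empty square.
Board = List[str]


def board_to_fen_placement(board: Board) -> str:
    """Encode an 8x8 grid as the piece-placement field of a FEN string."""
    ranks = []
    for row in board:
        out, empty = "", 0
        for ch in row:
            if ch == ".":
                empty += 1  # runs of empty squares collapse into a digit
            else:
                if empty:
                    out += str(empty)
                    empty = 0
                out += ch
        if empty:
            out += str(empty)
        ranks.append(out)
    return "/".join(ranks)


def fen_placement_to_board(placement: str) -> Board:
    """Expand a FEN piece-placement field back into an 8x8 grid."""
    rows = []
    for rank in placement.split("/"):
        row = ""
        for ch in rank:
            row += "." * int(ch) if ch.isdigit() else ch
        rows.append(row)
    return rows


def square_accuracy(predicted: str, expected: str) -> float:
    """Fraction of the 64 squares on which two placements agree.

    Squares missing from a malformed prediction count as wrong,
    because the denominator is always 64.
    """
    pred = fen_placement_to_board(predicted)
    exp = fen_placement_to_board(expected)
    matches = sum(
        p == e
        for pred_row, exp_row in zip(pred, exp)
        for p, e in zip(pred_row, exp_row)
    )
    return matches / 64


START = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR"
AFTER_E4 = "rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR"

exact_match = START == AFTER_E4
per_square = square_accuracy(START, AFTER_E4)
```

Here `exact_match` is false while `per_square` stays high, because only two squares differ. Reporting both numbers makes it visible when a model gets the overall position right but misplaces individual pieces.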
Field notes, build stories, research, and working projects from video infrastructure, retrieval, streaming, and agent tooling.
Browse short notes on infrastructure, reliability, latency, retrieval, and the small fixes that matter in production.
Operations
A short field note on replacing repeated bare requests calls with a shared session so Python services can reuse HTTP connections.
Frontend Performance
A practical note on preflight caching, custom auth headers, and why OPTIONS requests quietly dominate app latency on slower connections.
Operations
A field note from a simple offline Postgres migration: why pg_dump was enough, where read-only mode fits, and why the restore drill matters more than the dump command.
Follow the architecture choices, API decisions, and tradeoffs behind apps, demos, and agent tools built with VideoDB.
Build Note · Lalit Gupta
How I built a local-first screen recorder on top of the VideoDB SDK. It automatically indexes your audio and screen, making everything searchable by the time you paste the share link.
Build Note · VideoDB Team
How we built Focusd as a local-first desktop app that records work sessions, indexes screen activity with VideoDB, and turns raw events into useful productivity coaching.
Build Note · VideoDB Team
How we built a VideoDB-powered OpenClaw skill that records, indexes, searches, summarizes, and clips an agent's remote desktop without changing the agent itself.
Build Note · Sankalp Nagaonkar
How we built Deep Search as a retrieval loop for finding exact moments in video using planning, indexing, validation, recovery, and follow-up state.
Build Note · Rohit Garg
A short engineering note on turning a developer session into searchable context with VideoDB Capture, RTStreams, local events, and agent-side retrieval.
Build Note · Om
A practical build story for a local-first call intelligence app that records calls, transcribes speakers, generates live nudges, and exports structured Markdown.
Read concise updates, technical notes, and build context from the team working on video infrastructure and agents.
Newsletter · May 8, 2026
Why agent runs need playable evidence, thinking tokens in vision models, and agentic video streams.
Technical notes, build stories, and updates from our team.
Papers, evaluations, talks, and notes on video understanding, retrieval, multimodal models, and agent systems.
Research note · May 15, 2026
A practical workflow for evaluating video VLM setups with VideoDB and Langfuse, from task definition and dataset design to tracing, scoring, and deployment decisions.
arXiv preprint arXiv:2604.11177 · 2026/4/13
Benchmarks how internal reasoning traces affect video scene understanding in Gemini models, including where quality gains plateau and how tight budgets increase compression-step hallucination.
arXiv preprint arXiv:2502.06445 · 2025/2/10
Introduces an open-source benchmark for evaluating VLMs on OCR tasks in dynamic video environments across 1,477 manually annotated frames.
Apps, agent skills, hackathon submissions, and side projects.

Turn meetings into live agent loops.

An agentic Loom alternative where recordings become inputs for AI.

Give coding agents live screen, voice, and system-audio context.
Agents research the internet and stream clean video briefings.