VideoDB Labs

01 / Highlights

Selected work from across VideoDB Labs.

Research, field notes, and build stories worth a closer look.

Research · Jul 7, 2026

JEPA: From Language Models to World Models

Why JEPA’s latent-prediction objective may shift AI systems from token prediction toward predictive world models for VLMs, VLAs, and embodied agents.

Sankalp Nagaonkar · 20 min

Build note · May 8, 2026

How I built Call.md: AI call intelligence on top of VideoDB

A practical build story for a local-first call intelligence app that records calls, transcribes speakers, generates live nudges, and exports structured Markdown.

Om · 20 min

Field note · Jul 24, 2026

What video retrieval benchmarks taught us about ground truth

Manual review of MSRVTT, MSVD, VATEX, DiDeMo, and QVHighlights shows cases where the benchmark ground truth is too narrow, shifted, or ambiguous, so valid retrieved clips are scored as misses.

Samuel Alexander · 10 min

02 / Field Notes

Recent production lessons and technical notes.

Browse short notes on infrastructure, reliability, latency, retrieval, and the small fixes that matter in production.

VLM Evaluation

Strong general-purpose VLMs can still fail on downstream vision tasks

A small chessboard-to-FEN experiment showed Claude Opus 4.7 struggling with exact square-level localization even when it understood the board broadly.

Sankalp Nagaonkar · Field note · May 19, 2026 · 6 min

Operations

Reuse HTTP connections in Python services

A short field note on replacing repeated bare calls to the requests library with a shared session, so Python services can reuse HTTP connections.

Rohit Garg · Field note · Apr 20, 2026 · 3 min

Frontend Performance

One CORS header that removed hundreds of milliseconds from repeated API calls

A practical note on preflight caching, custom auth headers, and why OPTIONS requests quietly dominate app latency on slower connections.

Om Gate · Field note · Apr 17, 2026 · 2 min

Operations

A Postgres backup is not real until you restore it

A field note from a simple offline Postgres migration: why pg_dump was enough, where read-only mode fits, and why the restore drill matters more than the dump command.

Rohit Garg · Field note · Jan 2, 2026 · 9 min

03 / Build Notes

Build notes from working VideoDB projects.

Follow the architecture choices, API decisions, and tradeoffs behind apps, demos, and agent tools built with VideoDB.

Build Note · Lalit Gupta

Bloom: Record and Search Your Screen

How I built a local-first screen recorder on top of the VideoDB SDK. It automatically indexes your audio and screen, making everything searchable by the time you paste the share link.

Build Note · VideoDB Team

Focusd: A Coach for Your Workday, Not a Time-Tracking Report Card

How we built Focusd as a local-first desktop app that records work sessions, indexes screen activity with VideoDB, and turns raw events into useful productivity coaching.

Build Note · VideoDB Team

Build a CCTV Camera for Your OpenClaw Agent

How we built a VideoDB-powered OpenClaw skill that records, indexes, searches, summarizes, and clips an agent's remote desktop without changing the agent itself.

Build Note · Sankalp Nagaonkar

Deep Search: How We Built an Engine for Finding Exact Moments in Video

How we built Deep Search as a retrieval loop for finding exact moments in video using planning, indexing, validation, recovery, and follow-up state.

Build Note · Rohit Garg

Pair Programmer: live screen and audio context for coding agents

A short engineering note on turning a developer session into searchable context with VideoDB Capture, RTStreams, local events, and agent-side retrieval.

04 / Newsletter

Newsletters from the VideoDB engineering team.

Read concise updates, technical notes, and build context from the team working on video infrastructure and agents.

#001 · Dispatch

Newsletter · May 8, 2026

The VideoDB Dispatch #001

Topics: why agent runs need playable evidence, thinking tokens in vision models, and agentic video streams.

newsletter · video-ai · agents

05 / Research

Research at the edge of video and agents.

Papers, evaluations, talks, and notes on video understanding, retrieval, multimodal models, and agent systems.

Research note · May 15, 2026

How to Evaluate Multimodal VLMs for Your Video Use Case

Sankalp Nagaonkar

A practical workflow for evaluating video VLM setups with VideoDB and Langfuse, from task definition and dataset design to tracing, scoring, and deployment decisions.

arXiv preprint arXiv:2604.11177 · 2026/4/13

Do Thought Streams Matter? Evaluating Reasoning in Gemini Vision-Language Models for Video Scene Understanding

Shivam Sharma, Sankalp Nagaonkar, Ashish Choithani, Ashutosh Trivedi

Benchmarks how internal reasoning traces affect video scene understanding in Gemini models, including where quality gains plateau and how tight budgets increase compression-step hallucination.

arXiv preprint arXiv:2502.06445 · 2025/2/10