NRT Vector Search — A Six-Part Series on Near Real Time Vector Search in Production

Most RAG pipeline content stops at batch. Embed your corpus, build your index, query it. Clean, simple, done.

That's not production. Production has data arriving continuously. Source systems changing. Vectors going stale while users are querying them. The moment you need freshness in your index — real freshness, not a nightly rebuild — you're in near real time territory, and the architecture gets meaningfully more complex.

This series is about that complexity. Not the happy path — the seams. Auto Loader to Structured Streaming to foreachBatch to LanceDB. Each arrow on the whiteboard is a handoff. Each handoff is where the friction actually lives. Six parts covering the spike that validated the approach, the implementation decisions that made it work in production, and the honest retrospective on what to do differently next time.

Part 1. The Architecture Decision
The full stack overview: the replication boundary, three friction points at the seams, and the proposed solution. The map before the territory.

Part 2. The Spike: What We Actually Found
What the investigation uncovered, what got ruled out and why, and why the existing stack already had everything needed before a line of code was written.

Part 3. The foreachBatch Seam: Where Spark's Guarantees End
Where Spark's guarantees end and your responsibilities begin: idempotency, embedding throughput, failure handling, and what to instrument from day one.

Part 4. LanceDB in Production: What the Docs Don't Prepare You For
Append-only realities, fragment accumulation, optimize cadence tuning, and flat-scan performance. What you actually hit versus what the documentation describes.

Part 5. The Full Pipeline End to End: Where Theory Meets Production
Auto Loader to LanceDB, wired together. End-to-end latency, trigger interval alignment, catch-up scenarios, and what production observability actually looks like.

Part 6. What I'd Do Differently: The Honest Retrospective
What surprised me, what I'd build earlier, what the happy-path tutorials miss entirely, and where the real engineering lives.
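To make the idempotency point concrete before Part 3: Spark documents foreachBatch as at-least-once by default, so the same micro-batch can be handed to your function again after a failure, along with the same batch ID. The library-free sketch below is a toy illustration of that seam under stated assumptions, not the series' implementation. The class InMemoryVectorSink and the function toy_embed are hypothetical stand-ins for a real vector table (such as a LanceDB table) and a real embedding model. The sink skips batch IDs it has already committed and keys rows by document ID, so a re-embedded document replaces its stale vector instead of duplicating it.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryVectorSink:
    """Hypothetical stand-in for a vector table (e.g. a LanceDB table).

    Assumption: in a real sink, the committed batch IDs would have to be
    stored durably and updated together with the data. Here everything lives
    in memory, so a crash simply loses both.
    """

    rows: dict = field(default_factory=dict)            # doc_id -> vector
    committed_batches: set = field(default_factory=set)  # batch IDs already written

    def write_batch(self, batch_id, records, embed):
        """Write one micro-batch of (doc_id, text) pairs; return rows written."""
        # A replayed batch (at-least-once delivery) has already been applied: skip it.
        if batch_id in self.committed_batches:
            return 0
        # Embed everything first, so a failure here leaves the sink untouched.
        staged = {doc_id: embed(text) for doc_id, text in records}
        # Upsert by key: a newer version of a document replaces its stale vector.
        self.rows.update(staged)
        self.committed_batches.add(batch_id)
        return len(staged)


def toy_embed(text):
    """Hypothetical embedding function: a tiny deterministic 2-d vector."""
    return [float(len(text)), float(sum(map(ord, text)) % 97)]


if __name__ == "__main__":
    sink = InMemoryVectorSink()
    batch = [("a", "hello"), ("b", "world")]
    print(sink.write_batch(0, batch, toy_embed))  # first delivery writes 2 rows
    print(sink.write_batch(0, batch, toy_embed))  # replay of batch 0 writes nothing
    print(sink.write_batch(1, [("a", "hello again")], toy_embed))  # update to "a"
    print(len(sink.rows))  # still 2 documents, "a" now has a fresh vector
```

The design choice worth noting is that both guards are needed. Skipping committed batch IDs handles replays of the same micro-batch. Keying by document ID handles the separate case where a source record legitimately changes and arrives in a later batch. Batch IDs are only meaningful within one streaming query and its checkpoint, so a production version would need to account for that as well.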

Clarity through the chaos.

Arjun Krishnamoorthi is the founder of LogicLens LLC, a fractional data architecture and AI consulting practice. If you have a data infrastructure problem or an AI project that needs senior hands — let's talk.