Sfoglia il codice sorgente

refactor(embedding): skip withLLMSessionForLlm when embedProvider is supplied (i-08ovbvtb)

Finishes the unmet half of i-qkarfffa DoD #9 ("No node-llama-cpp warm-up
needed when remote-only"). Stage-3 (i-qkarfffa) shipped the
`embedProvider` option that short-circuits `session.embed()` calls, but
the OUTER `withLLMSessionForLlm(llm, ...)` wrapper still constructed an
LLMSession against `getLlm(store)` — accessing `embedModelName` and
holding a session lease for the duration of the embedding loop, even
when the local LLM was never going to be used.

This commit introduces a `withEmbedSession(store, provider, body)`
helper that branches:

  * provider supplied  -> creates a lightweight AbortController-backed
                          fake session; `getLlm(store)` is NEVER called,
                          `withLLMSessionForLlm` is bypassed entirely.
                          Fake session's LLM-only methods (embed,
                          embedBatch, expandQuery, rerank) throw with a
                          clear message — they MUST NOT be reached
                          because `embedOne`/`embedMany` route through
                          `provider.embed()` instead.
  * provider undefined -> calls `withLLMSessionForLlm(getLlm(store), ...)`
                          with the same `LLMSessionOptions` as before
                          (maxDuration: 30 min, name: 'generateEmbeddings').
                          Local-only path is byte-identical.

`embedModelUri` resolution is also moved behind the same branch — when
provider is set, the model id comes from `provider.getModelId()`;
otherwise it falls back to `getLlm(store).embedModelName`. So
remote-only deployments no longer construct a `LlamaCpp` instance just
to read its `embedModelName`.

Test changes:
  * Removed the inline `, 30000` timeout override on the
    "uses provider.embedBatch when supplied" test (added in 058ec1d as
    a workaround). The global `testTimeout` (30s in vitest.config.ts)
    still applies, but the per-test bump and its DoD-#9 follow-up note
    are no longer needed.
  * Added a new test: "provider mode does not access store.llm" — sets
    `store.llm` to a Proxy that throws on ANY property access, then
    runs `generateEmbeddings({embedProvider})`. Catches future
    regressions of accidentally re-introducing `getLlm(store)` /
    `embedModelName` reads in the provider path.

Verification:
  * vendor/qmd integration suite: 9/9 PASS (test/embedding-store-integration.test.ts).
  * vendor/qmd full suite: 897/907 PASS — 10 pre-existing
    `test/sdk.test.ts > with LLM query expansion` timeouts on master
    (flaky LLM-bound tests, unaffected by this refactor).
  * Typecheck: only 1 pre-existing error in `src/cli/qmd.ts` (unrelated
    palette `red` field). Zero new errors in `src/store.ts`.

Out of scope (explicit per issue): `chunkDocumentByTokens` still calls
`getDefaultLlamaCpp().tokenize(...)` for token counting — that's a
separate cold-cache load path that an llm-free remote-only embed flow
would need to address. DoD #1 ("does NOT load node-llama-cpp") is
therefore only fully met if/when chunkDocumentByTokens is also taught
about provider-supplied tokenization. Tracked as a follow-up to this
issue (out-of-scope here per "Files affected: src/store.ts ~30 LOC").

Generated with [Claude Code](https://claude.ai/code)
via [Oivo](https://oivo.com)

Co-Authored-By: Claude <noreply@anthropic.com>
Session-Id: 6309e407
root 3 settimane fa
parent
commit
6ebfc54cc7
2 ha cambiato i file con 91 aggiunte e 10 eliminazioni
  1. 61 6
      src/store.ts
  2. 30 4
      test/embedding-store-integration.test.ts

+ 61 - 6
src/store.ts

@@ -24,6 +24,7 @@ import {
   formatQueryForEmbedding,
   formatDocForEmbedding,
   withLLMSessionForLlm,
+  type LLMSessionOptions,
   type RerankDocument,
   type ILLMSession,
 } from "./llm.js";
@@ -1409,6 +1410,54 @@ function getEmbeddingDocsForBatch(db: Database, batch: PendingEmbeddingDoc[]): E
   }));
 }
 
+/**
+ * Run `body` with a session-shaped argument that supplies an AbortSignal +
+ * isValid flag. When `provider` is supplied, the session is a lightweight
+ * AbortController-backed stub — `getLlm(store)` is never called and
+ * `withLLMSessionForLlm` is bypassed entirely, so node-llama-cpp is not
+ * warmed up on remote-only deployments (i-08ovbvtb, follow-up to i-qkarfffa).
+ *
+ * When `provider` is undefined, behavior is unchanged: a real `LLMSession`
+ * is created via `withLLMSessionForLlm(getLlm(store), ...)` so that the
+ * body can use `session.embed`/`session.embedBatch` for the local path.
+ *
+ * The fake session's LLM-only methods (embed/embedBatch/expandQuery/rerank)
+ * throw if called — they MUST NOT be reached when `provider` is set, since
+ * the embed path is supposed to route through the provider instead.
+ */
+async function withEmbedSession<T>(
+  store: Store,
+  provider: EmbeddingProvider | undefined,
+  body: (session: ILLMSession) => Promise<T>,
+  options?: LLMSessionOptions,
+): Promise<T> {
+  if (provider) {
+    const ac = new AbortController();
+    const fakeSession: ILLMSession = {
+      get signal() { return ac.signal; },
+      get isValid() { return !ac.signal.aborted; },
+      embed: async () => {
+        throw new Error("withEmbedSession: provider supplied — session.embed must not be called");
+      },
+      embedBatch: async () => {
+        throw new Error("withEmbedSession: provider supplied — session.embedBatch must not be called");
+      },
+      expandQuery: async () => {
+        throw new Error("withEmbedSession: provider supplied — session.expandQuery must not be called");
+      },
+      rerank: async () => {
+        throw new Error("withEmbedSession: provider supplied — session.rerank must not be called");
+      },
+    };
+    try {
+      return await body(fakeSession);
+    } finally {
+      ac.abort();
+    }
+  }
+  return withLLMSessionForLlm(getLlm(store), body, options);
+}
+
 /**
  * Generate vector embeddings for documents that need them.
  * Pure function — no console output, no db lifecycle management.
@@ -1463,10 +1512,6 @@ export async function generateEmbeddings(
     // callers that never touch ~/.config/qmd working.
   }
 
-  // Use store's LlamaCpp or global singleton, wrapped in a session
-  const llm = getLlm(store);
-  const embedModelUri = options?.embedProvider?.getModelId() ?? llm.embedModelName;
-
   // Provider routing — when an EmbeddingProvider is supplied, embed calls go
   // through it (HTTP, GPU worker, etc.). Otherwise, use the LLM session path.
   // The outer session is still created for its abort signal (chunking uses
@@ -1474,8 +1519,18 @@ export async function generateEmbeddings(
   const provider = options?.embedProvider;
   const providerModel = provider?.getModelId() ?? model;
 
-  // Create a session manager for this llm instance
-  const result = await withLLMSessionForLlm(llm, async (session) => {
+  // Resolve `embedModelUri` (used for formatting prefixes etc.) lazily —
+  // when `provider` is set, take it from the provider; otherwise fall back
+  // to the local LlamaCpp's embed model name. Accessing `getLlm(store)` is
+  // deferred to the non-provider branch so remote-only deployments do not
+  // construct a `LlamaCpp` instance just to read its embedModelName.
+  const embedModelUri = provider
+    ? provider.getModelId()
+    : getLlm(store).embedModelName;
+
+  // Run the embedding loop inside a session-scoped wrapper. When `provider`
+  // is set, this short-circuits the local LLM warm-up entirely (i-08ovbvtb).
+  const result = await withEmbedSession(store, provider, async (session) => {
     let chunksEmbedded = 0;
     let errors = 0;
     let bytesProcessed = 0;

+ 30 - 4
test/embedding-store-integration.test.ts

@@ -146,10 +146,36 @@ describe("generateEmbeddings with EmbeddingProvider", () => {
     expect(result.errors).toBe(0);
     expect(provider.embedBatchCalls + provider.embedCalls).toBeGreaterThan(0);
     expect(provider.totalTextsEmbedded).toBeGreaterThan(0);
-  }, 30000); // Cold-cache llama-cpp init can take >5s on first session call.
-  // Provider short-circuits embed calls (line 1494-1499 of store.ts) but the
-  // outer `withLLMSessionForLlm` wrapper still warms the LLM. DoD #9 (skip
-  // LLM init when provider supplied) is a follow-up refactor.
+  });
+  // Default 5s timeout restored after i-08ovbvtb removed the
+  // `withLLMSessionForLlm` wrapper from the provider path. The previous
+  // 30s bump (commit 058ec1d) was a workaround for the cold-cache LLM
+  // warm-up that the refactor now skips entirely.
+
+  test("provider mode does not access store.llm (DoD #2, #5 — i-08ovbvtb)", async () => {
+    // When `embedProvider` is supplied, the refactor must NOT consult the
+    // local LlamaCpp at all — neither `embedModelName` nor any other field.
+    // We assert this by setting `store.llm` to a Proxy that throws on any
+    // property access. If `getLlm(store).embedModelName` (or any sibling
+    // call site) regressed back into the provider path, the test would
+    // fail with a clear error message.
+    const throwingLlm = new Proxy({}, {
+      get(_target, prop) {
+        throw new Error(
+          `store.llm.${String(prop)} accessed when embedProvider was supplied — DoD violation`,
+        );
+      },
+    }) as never;
+    store.llm = throwingLlm;
+
+    const provider = new StubProvider("embeddinggemma", 4);
+    const result = await generateEmbeddings(store, { embedProvider: provider });
+
+    expect(result.docsProcessed).toBe(2);
+    expect(result.chunksEmbedded).toBeGreaterThan(0);
+    expect(result.errors).toBe(0);
+    expect(provider.totalTextsEmbedded).toBeGreaterThan(0);
+  });
 
   test("model-id guard throws ModelMismatchError on mismatch", async () => {
     // Pre-populate content_vectors with a different model id