Explorar el Código

Fix Metal backend crash by properly disposing llama resources

The Metal crash was caused by not disposing resources in the right order
at the right time. The fix:

1. Restore proper dispose() that disposes contexts → models → llama in order
2. Move disposeDefaultLlamaCpp() to global afterAll (after all tests complete)
3. Keep disposed flag to prevent double-dispose

The issue was that disposing per-suite broke tests that share llama,
and not disposing at all left orphaned Metal resources at process exit.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Tobi Lutke hace 5 meses
padre
commit
e8f4dce0b7
Se han modificado 2 ficheros con 38 adiciones y 17 borrados
  1. 6 4
      src/eval.test.ts
  2. 32 13
      src/llm.ts

+ 6 - 4
src/eval.test.ts

@@ -34,7 +34,7 @@ import {
   DEFAULT_EMBED_MODEL,
   type RankedResult,
 } from "./store";
-import { getDefaultLlamaCpp, formatDocForEmbedding } from "./llm";
+import { getDefaultLlamaCpp, disposeDefaultLlamaCpp, formatDocForEmbedding } from "./llm";
 
 // Eval queries with expected documents
 const evalQueries: {
@@ -201,8 +201,8 @@ describe("Vector Search", () => {
     hasEmbeddings = true;
   }, 120000); // 2 minute timeout for embedding generation
 
-  // Note: Don't call disposeDefaultLlamaCpp() here - it causes Metal backend
-  // assertion failures during process exit. Let the process exit handle cleanup.
+  // Note: Don't dispose here - Hybrid tests also use llama.
+  // Dispose happens in the global afterAll.
 
   test("easy queries: ≥60% Hit@3 (vector should match keywords too)", async () => {
     if (!hasEmbeddings) return; // Skip if embedding failed
@@ -392,6 +392,8 @@ describe("Hybrid Search (RRF)", () => {
 // Cleanup
 // =============================================================================
 
-afterAll(() => {
+afterAll(async () => {
+  // Dispose llama before process exit to properly free Metal resources
+  await disposeDefaultLlamaCpp();
   rmSync(tempDir, { recursive: true, force: true });
 });

+ 32 - 13
src/llm.ts

@@ -704,19 +704,38 @@ Generate the structured expansion:`;
       this.inactivityTimer = null;
     }
 
-    // Don't explicitly dispose llama resources - it causes Metal backend
-    // assertion failures during process cleanup. The Metal device cleanup
-    // in ggml-metal expects resources to be freed in a specific order that
-    // we can't control. Just clear references and let the process exit
-    // handle cleanup naturally.
-    // See: https://github.com/ggml-org/llama.cpp/pull/17869
-    this.embedContext = null;
-    this.generateContext = null;
-    this.rerankContext = null;
-    this.embedModel = null;
-    this.generateModel = null;
-    this.rerankModel = null;
-    this.llama = null;
+    // Dispose in order: contexts -> models -> llama
+    // Contexts depend on models, models depend on llama
+    if (this.embedContext) {
+      await this.embedContext.dispose();
+      this.embedContext = null;
+    }
+    if (this.generateContext) {
+      await this.generateContext.dispose();
+      this.generateContext = null;
+    }
+    if (this.rerankContext) {
+      await this.rerankContext.dispose();
+      this.rerankContext = null;
+    }
+
+    if (this.embedModel) {
+      await this.embedModel.dispose();
+      this.embedModel = null;
+    }
+    if (this.generateModel) {
+      await this.generateModel.dispose();
+      this.generateModel = null;
+    }
+    if (this.rerankModel) {
+      await this.rerankModel.dispose();
+      this.rerankModel = null;
+    }
+
+    if (this.llama) {
+      await this.llama.dispose();
+      this.llama = null;
+    }
   }
 }