
Use default createContext() options for better VRAM management

Let node-llama-cpp handle context size and sequences automatically.
The mutex still serializes generation calls for safety.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Tobi Lutke, 5 months ago
commit c9ac3c1463
1 changed file with 1 addition and 2 deletions

src/llm.ts  +1 -2

@@ -361,8 +361,7 @@ export class LlamaCpp implements LLM {
       const llama = await this.ensureLlama();
       const modelPath = await this.resolveModel(this.generateModelUri);
       this.generateModel = await llama.loadModel({ modelPath });
-      // Use single sequence to minimize VRAM when multiple models are loaded
-      this.generateContext = await this.generateModel.createContext({ sequences: 1 });
+      this.generateContext = await this.generateModel.createContext();
     }
     this.touchActivity();
     return this.generateContext;
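The commit message says a mutex still serializes generation calls, so removing `{ sequences: 1 }` does not let two generations run on the shared context at once. The mutex itself is not part of this diff. What follows is a minimal sketch that assumes a promise-chain design. `Mutex`, `runExclusive`, and `fakeGenerate` are hypothetical names, and the sketch does not use any node-llama-cpp API. Each queued call waits for the previous one to settle before it starts.

```typescript
// Hypothetical promise-chain mutex (not the implementation in src/llm.ts).
class Mutex {
  // Tail of the queue: settles when the most recently queued task settles.
  private tail: Promise<void> = Promise.resolve();

  // Runs `fn` only after every previously queued task has settled,
  // so at most one task executes at a time.
  runExclusive<T>(fn: () => Promise<T>): Promise<T> {
    const result = this.tail.then(fn);
    // Keep the chain usable even if `fn` rejects.
    this.tail = result.then(
      () => undefined,
      () => undefined,
    );
    return result;
  }
}

const sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms));

// Hypothetical stand-in for a generation call on a shared context.
async function fakeGenerate(id: number, log: string[]): Promise<string> {
  log.push(`start ${id}`);
  await sleep(10);
  log.push(`end ${id}`);
  return `output ${id}`;
}

const mutex = new Mutex();
const log: string[] = [];

async function main(): Promise<string[]> {
  // All three calls are issued concurrently, but the mutex runs them one by one.
  const outputs = await Promise.all(
    [1, 2, 3].map((id) => mutex.runExclusive(() => fakeGenerate(id, log))),
  );
  // prints "start 1, end 1, start 2, end 2, start 3, end 3"
  console.log(log.join(", "));
  return outputs;
}

const done = main();
```

No "start" entry appears before the previous "end", so the calls never overlap. That holds even though the context no longer limits itself to a single sequence.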