
Use default createContext() options for better VRAM management

Let node-llama-cpp handle context size and sequences automatically.
The mutex still serializes generation calls for safety.
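The mutex itself is not part of this diff. As a rough illustration of what "serializes generation calls" means, here is a minimal promise-based mutex sketch. The `Mutex` class, its `runExclusive` method and the `fakeGenerate` stand-in are all hypothetical and are not taken from `src/llm.ts`. Each caller waits for the previously queued call to settle before it runs, so generation calls never overlap, even when they are fired concurrently.

```typescript
// Hypothetical sketch: a minimal promise-based mutex. The real class in
// src/llm.ts may differ; these names are illustrative only.
class Mutex {
  // Promise that settles when the most recently queued call releases the lock.
  private tail: Promise<void> = Promise.resolve();

  // Runs `fn` only after every previously queued call has finished.
  async runExclusive<T>(fn: () => Promise<T>): Promise<T> {
    let release!: () => void;
    const next = new Promise<void>((resolve) => {
      release = resolve;
    });
    const previous = this.tail;
    this.tail = previous.then(() => next); // later callers queue behind us
    await previous; // wait for our turn
    try {
      return await fn();
    } finally {
      release(); // let the next queued call proceed, even if fn threw
    }
  }
}

const sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms));

// Hypothetical stand-in for a generation call; records start/end events.
const events: string[] = [];
async function fakeGenerate(id: number, ms: number): Promise<number> {
  events.push(`start ${id}`);
  await sleep(ms);
  events.push(`end ${id}`);
  return id;
}

const mutex = new Mutex();
// Fire three calls at once; the mutex makes them run strictly one at a time.
const done = Promise.all([
  mutex.runExclusive(() => fakeGenerate(1, 30)),
  mutex.runExclusive(() => fakeGenerate(2, 10)),
  mutex.runExclusive(() => fakeGenerate(3, 0)),
]);
```

Because each call starts only after the previous one ends, the shorter calls 2 and 3 still finish after call 1.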

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Tobi Lutke, 5 months ago
Commit c9ac3c1463
1 changed file with 1 addition and 2 deletions

src/llm.ts (+1, -2)

@@ -361,8 +361,7 @@ export class LlamaCpp implements LLM {
       const llama = await this.ensureLlama();
       const modelPath = await this.resolveModel(this.generateModelUri);
       this.generateModel = await llama.loadModel({ modelPath });
-      // Use single sequence to minimize VRAM when multiple models are loaded
-      this.generateContext = await this.generateModel.createContext({ sequences: 1 });
+      this.generateContext = await this.generateModel.createContext();
     }
     this.touchActivity();
     return this.generateContext;
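The hunk shows a lazy-initialization pattern: the generation context is created on first use, reused afterwards, and every access calls `touchActivity()`. The sketch below illustrates that pattern in isolation. `LazyContext` and the injected factory are hypothetical. The factory stands in for the `ensureLlama()` / `resolveModel()` / `loadModel()` / `createContext()` chain and is not the node-llama-cpp API. Unlike the hunk, which does not show how concurrent first calls are handled, this sketch shares one in-flight creation between concurrent callers. That is an assumption added for illustration.

```typescript
// Hypothetical sketch of the lazy-initialization pattern in the hunk above.
// The factory stands in for loadModel() + createContext(); it is NOT the
// node-llama-cpp API.
type ContextFactory<C> = () => Promise<C>;

class LazyContext<C> {
  private context: C | null = null;
  private pending: Promise<C> | null = null;
  private readonly factory: ContextFactory<C>;
  lastActivity = 0; // analogous to the timestamp touchActivity() updates

  constructor(factory: ContextFactory<C>) {
    this.factory = factory;
  }

  // Creates the context on first use, reuses it afterwards, and records
  // activity on every call.
  async get(): Promise<C> {
    if (this.context === null) {
      // Concurrent first callers share the same in-flight creation.
      if (this.pending === null) this.pending = this.factory();
      this.context = await this.pending;
    }
    this.lastActivity = Date.now();
    return this.context;
  }
}

// Usage with a fake factory that counts how often a context is created.
let created = 0;
const lazy = new LazyContext(async () => {
  created += 1;
  return { id: created };
});
```

Because the factory runs at most once, the expensive model and context allocation happens only on the first request, regardless of how many callers ask for the context at the same time.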