## Symptom
On Apple Silicon (M-series) Macs, when using "provider": "local" for memory search embeddings (via node-llama-cpp), the application crashes with a GGML Metal assertion error during graceful shutdown (Ctrl+C / SIGINT) or during auto-update restarts.
The crash produces the following error in Console.app:

```
/Users/runner/work/node-llama-cpp/node-llama-cpp/llama/llama.cpp/ggml/src/ggml-metal/ggml-metal-device.m:608: GGML_ASSERT([rsets->data count] == 0) failed
WARNING: Using native backtrace. Set GGML_BACKTRACE_LLDB for more info.
```

The stack trace shows the crash occurs during process exit:

```
libggml-metal.so   ggml_metal_device_init
libggml-metal.so   ggml_metal_device_free
libsystem_c.dylib  exit
```
Additional symptoms include:
- Gateway stops during auto-update and requires manual restart
- The crash is misinterpreted as a graceful exit
- Network interface change errors may also appear in logs before the crash
## Root Cause Analysis
The root cause is a resource leak in the llama.cpp / node-llama-cpp integration with Apple Metal:

- When using local embeddings with `"provider": "local"`, embedding contexts are created via node-llama-cpp
- These embedding contexts hold Metal GPU resources (managed by `ggml_metal_device`)
- During process shutdown (SIGINT, SIGTERM, or auto-update restart), these contexts are not explicitly disposed
- When the Node.js process exits, `ggml_metal_device_free()` is called by libsystem_c.dylib's `__cxa_finalize_ranges`
- Metal asserts that all resources have already been released (`rsets->data count == 0`)
- Since embedding contexts still hold references, the assertion fails and crashes the process
This is a regression — the functionality worked before but now fails after recent changes to how local embeddings are handled.
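The lifecycle mismatch described above can be illustrated with a small self-contained sketch. `FakeMetalDevice` and `FakeMetalContext` are hypothetical stand-ins that only model the `rsets` bookkeeping, not the real ggml API:

```typescript
// Sketch of the leak: an exit-time assertion fires unless every context
// holding device resources is disposed first.
class FakeMetalDevice {
  private liveContexts = new Set<object>();
  register(ctx: object) { this.liveContexts.add(ctx); }
  unregister(ctx: object) { this.liveContexts.delete(ctx); }
  // Mirrors GGML_ASSERT([rsets->data count] == 0) during device teardown.
  assertDrained() {
    if (this.liveContexts.size !== 0) {
      throw new Error(`assertion failed: ${this.liveContexts.size} context(s) still alive`);
    }
  }
}

class FakeMetalContext {
  constructor(private device: FakeMetalDevice) { device.register(this); }
  dispose() { this.device.unregister(this); }
}

const device = new FakeMetalDevice();
const ctx = new FakeMetalContext(device);

// Without dispose(), the exit-time check throws — the crash in this issue:
let crashed = false;
try { device.assertDrained(); } catch { crashed = true; }

// With explicit disposal, teardown is clean:
ctx.dispose();
device.assertDrained();
console.log(`crashed without dispose: ${crashed}`); // → crashed without dispose: true
```

The real fix below applies the same idea: track every live context and dispose it before `ggml_metal_device_free()` runs.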
## Solution

### Temporary Workaround (Environment Variables)
If immediate mitigation is needed, disable Metal GPU acceleration for embeddings.

Option 1: keep all layers on the CPU:

```sh
export NODE_LLAMA_CPP_GPU_LAYERS=0
```

Option 2: disable Metal entirely (depending on node-llama-cpp version):

```sh
export NODE_LLAMA_CPP_METAL=false
```

Add these to your shell profile (`~/.zshrc` or `~/.bashrc`) for persistence.
### Permanent Fix (Recommended)
Create a cleanup handler to ensure embedding contexts are properly disposed before process exit:
Step 1: Create a new file `src/memory/local-cleanup-patch.ts`:

```ts
// Note: in node-llama-cpp v3, createEmbeddingContext() lives on the model
// object (LlamaModel), not on getLlama(); adjust the patch point to match
// the version you ship.
import { LlamaModel } from 'node-llama-cpp';

const trackedContexts: any[] = [];

const originalCreate = LlamaModel.prototype.createEmbeddingContext;
LlamaModel.prototype.createEmbeddingContext = async function (...args: any[]) {
  const ctx = await originalCreate.apply(this, args as any);
  trackedContexts.push(ctx);
  return ctx;
};

async function cleanup() {
  if (trackedContexts.length === 0) return;
  console.log(`[cleanup] Disposing ${trackedContexts.length} embedding context(s)`);
  for (const ctx of trackedContexts) {
    if (ctx?.dispose) {
      await ctx.dispose().catch((e: unknown) => console.warn('[cleanup] Dispose failed:', e));
    }
  }
  trackedContexts.length = 0;
}

// After cleanup, re-raise the signal so any other shutdown handlers still
// run; since our handlers are registered with once(), the default
// disposition then terminates the process.
function onSignal(signal: NodeJS.Signals) {
  void cleanup().finally(() => process.kill(process.pid, signal));
}

process.once('SIGINT', () => onSignal('SIGINT'));
process.once('SIGTERM', () => onSignal('SIGTERM'));
process.on('beforeExit', () => void cleanup());
```
Step 2: Update `src/memory/node-llama.ts` to import the cleanup patch:

```ts
export async function importNodeLlamaCpp() {
  // Automatically apply our shutdown fix when local embeddings are used
  await import('./local-cleanup-patch');
  return import('node-llama-cpp');
}
```
This ensures that:

- All embedding contexts created via `createEmbeddingContext` are tracked
- On SIGINT, SIGTERM, or `beforeExit`, all contexts are properly disposed
- Metal resources are released before the process exits, preventing the assertion
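The tracking-and-dispose logic in Step 1 can be sanity-checked without loading a model by running the same `cleanup()` loop against mock contexts. Everything here is test scaffolding; only the `dispose()` shape matches what the patch expects:

```typescript
// Mock contexts that match the ctx?.dispose() shape used by cleanup().
const tracked: Array<{ dispose: () => Promise<void>; disposed: boolean }> = [];

function makeMockContext() {
  const ctx = {
    disposed: false,
    dispose: async () => { ctx.disposed = true; },
  };
  tracked.push(ctx);
  return ctx;
}

// Same shape as the cleanup() in the patch: dispose everything, then
// clear the tracking list so repeated invocations are no-ops.
async function cleanup() {
  for (const ctx of tracked) {
    if (ctx?.dispose) {
      await ctx.dispose().catch((e) => console.warn('[cleanup] dispose failed:', e));
    }
  }
  tracked.length = 0;
}

const a = makeMockContext();
const b = makeMockContext();
await cleanup();
console.log(a.disposed, b.disposed, tracked.length); // → true true 0
```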
## Prevention

To prevent similar issues in the future:

- **Always dispose GPU resources explicitly.** When working with Metal/CUDA contexts, ensure all resources are released before process exit, especially in shutdown handlers.
- **Add shutdown hooks for GPU-accelerated modules.** Register cleanup handlers using `process.on('beforeExit')`, `process.once('SIGINT')`, and `process.once('SIGTERM')` for any module that manages GPU resources.
- **Test graceful shutdown.** Include automated tests that trigger SIGINT/SIGTERM and verify the process exits cleanly, without assertions or crashes.
- **Monitor for resource leaks.** Use tools like `leaks` on macOS or Valgrind on Linux to detect unreleased resources during development.
- **Document GPU cleanup requirements.** When adding features that use Metal or CUDA, document any cleanup requirements in code comments and troubleshooting guides.
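The graceful-shutdown test suggested above can be automated with a small harness that spawns a child process, sends it SIGTERM, and asserts a clean exit. The inline child script is a stand-in for the real gateway entry point, and POSIX signal semantics are assumed:

```typescript
import { spawn } from "node:child_process";

// Spawn a command, send SIGTERM after a short delay, and resolve with the
// exit code (or null if the process died from the signal instead of exiting).
function shutsDownCleanly(cmd: string, args: string[]): Promise<number | null> {
  return new Promise((resolve, reject) => {
    const child = spawn(cmd, args, { stdio: "ignore" });
    child.on("error", reject);
    // Give the child time to install its signal handlers before interrupting it.
    setTimeout(() => child.kill("SIGTERM"), 500);
    child.on("exit", (code, signal) => resolve(signal === null ? code : null));
  });
}

// The inline child exits 0 from its own SIGTERM handler, as an app with the
// cleanup patch installed should; a Metal assertion would surface as a
// non-zero exit code or a signal death (null).
const script = 'process.on("SIGTERM", () => process.exit(0)); setInterval(() => {}, 1000);';
const exitCode = await shutsDownCleanly(process.execPath, ["-e", script]);
console.log("exit code:", exitCode); // → exit code: 0
```

In CI, replacing the inline script with the application's real start command turns this into a regression test for the crash described in this document.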
## Additional Information
| Item | Details |
|---|---|
| Affected Platform | macOS on Apple Silicon (M1/M2/M3 series) |
| Affected Components | node-llama-cpp, ggml-metal |
| Trigger Condition | Using "provider": "local" for memory search with graceful shutdown |
| Related llama.cpp PR | ggml-org/llama.cpp#17869 |
| Issue Classification | Regression (previously worked) |
| Severity | High (blocks auto-update feature) |
This issue specifically affects users who:
- Run OpenClaw on macOS with Apple Silicon
- Use local embeddings for memory search functionality
- Rely on the auto-update feature for continuous operation
The permanent fix should be implemented to ensure reliable operation of the auto-update system when local embeddings are in use.