## Symptom
On Apple Silicon (M-series) Macs, when using "provider": "local" for memory search embeddings (via node-llama-cpp), the application crashes with a GGML Metal assertion error during graceful shutdown (Ctrl+C / SIGINT) or during auto-update restarts.
The crash produces the following error in Console.app:

```
/Users/runner/work/node-llama-cpp/node-llama-cpp/llama/llama.cpp/ggml/src/ggml-metal/ggml-metal-device.m:608: GGML_ASSERT([rsets->data count] == 0) failed
WARNING: Using native backtrace. Set GGML_BACKTRACE_LLDB for more info.
```

The stack trace shows the crash occurs during process exit:

```
libggml-metal.so   ggml_metal_device_init
libggml-metal.so   ggml_metal_device_free
libsystem_c.dylib  exit
```
Additional symptoms include:
- Gateway stops during auto-update and requires manual restart
- The crash is misinterpreted as a graceful exit
- Network interface change errors may also appear in logs before the crash
## Root Cause Analysis
The root cause is a resource leak in the llama.cpp / node-llama-cpp integration with Apple Metal:

- When using local embeddings with `"provider": "local"`, embedding contexts are created via node-llama-cpp
- These embedding contexts hold Metal GPU resources (managed by `ggml_metal_device`)
- During process shutdown (SIGINT, SIGTERM, or auto-update restart), these contexts are not explicitly disposed
- When the Node.js process exits, `ggml_metal_device_free()` is called by libsystem_c.dylib's `__cxa_finalize_ranges`
- Metal asserts that all resources have already been released (`rsets->data count == 0`)
- Since embedding contexts still hold references, the assertion fails and crashes the process
This is a regression — the functionality worked before but now fails after recent changes to how local embeddings are handled.
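The lifecycle mismatch described above can be illustrated with a small self-contained sketch. `FakeMetalDevice` and `FakeMetalContext` are hypothetical stand-ins that only model the `rsets` bookkeeping, not the real ggml API:

```typescript
// Sketch of the leak: an exit-time assertion fires unless every context
// holding device resources is disposed first.
class FakeMetalDevice {
  private liveContexts = new Set<object>();
  register(ctx: object) { this.liveContexts.add(ctx); }
  unregister(ctx: object) { this.liveContexts.delete(ctx); }
  // Mirrors GGML_ASSERT([rsets->data count] == 0) during device teardown.
  assertDrained() {
    if (this.liveContexts.size !== 0) {
      throw new Error(`assertion failed: ${this.liveContexts.size} context(s) still alive`);
    }
  }
}

class FakeMetalContext {
  constructor(private device: FakeMetalDevice) { device.register(this); }
  dispose() { this.device.unregister(this); }
}

const device = new FakeMetalDevice();
const ctx = new FakeMetalContext(device);

// Without dispose(), the exit-time check throws — the crash in this issue:
let crashed = false;
try { device.assertDrained(); } catch { crashed = true; }

// With explicit disposal, teardown is clean:
ctx.dispose();
device.assertDrained();
console.log(`crashed without dispose: ${crashed}`); // → crashed without dispose: true
```

The real fix below applies the same idea: track every live context and dispose it before `ggml_metal_device_free()` runs.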
## Solution

### Temporary Workaround (Environment Variables)
If immediate mitigation is needed, disable Metal GPU acceleration for embeddings.

Option 1: keep all layers on the CPU:

```sh
export NODE_LLAMA_CPP_GPU_LAYERS=0
```

Option 2: disable Metal entirely (depending on node-llama-cpp version):

```sh
export NODE_LLAMA_CPP_METAL=false
```

Add these to your shell profile (`~/.zshrc` or `~/.bashrc`) for persistence.
### Permanent Fix (Recommended)
Create a cleanup handler to ensure embedding contexts are properly disposed before process exit:
Step 1: Create a new file `src/memory/local-cleanup-patch.ts`:

```ts
// Note: in node-llama-cpp v3, createEmbeddingContext() lives on the model
// object (LlamaModel), not on getLlama(); adjust the patch point to match
// the version you ship.
import { LlamaModel } from 'node-llama-cpp';

const trackedContexts: any[] = [];

const originalCreate = LlamaModel.prototype.createEmbeddingContext;
LlamaModel.prototype.createEmbeddingContext = async function (...args: any[]) {
  const ctx = await originalCreate.apply(this, args as any);
  trackedContexts.push(ctx);
  return ctx;
};

async function cleanup() {
  if (trackedContexts.length === 0) return;
  console.log(`[cleanup] Disposing ${trackedContexts.length} embedding context(s)`);
  for (const ctx of trackedContexts) {
    if (ctx?.dispose) {
      await ctx.dispose().catch((e: unknown) => console.warn('[cleanup] Dispose failed:', e));
    }
  }
  trackedContexts.length = 0;
}

// After cleanup, re-raise the signal so any other shutdown handlers still
// run; since our handlers are registered with once(), the default
// disposition then terminates the process.
function onSignal(signal: NodeJS.Signals) {
  void cleanup().finally(() => process.kill(process.pid, signal));
}

process.once('SIGINT', () => onSignal('SIGINT'));
process.once('SIGTERM', () => onSignal('SIGTERM'));
process.on('beforeExit', () => void cleanup());
```
Step 2: Update `src/memory/node-llama.ts` to import the cleanup patch:

```ts
export async function importNodeLlamaCpp() {
  // Automatically apply our shutdown fix when local embeddings are used
  await import('./local-cleanup-patch');
  return import('node-llama-cpp');
}
```
This ensures that:

- All embedding contexts created via `createEmbeddingContext` are tracked
- On SIGINT, SIGTERM, or `beforeExit`, all contexts are properly disposed
- Metal resources are released before the process exits, preventing the assertion
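The tracking-and-dispose logic in Step 1 can be sanity-checked without loading a model by running the same `cleanup()` loop against mock contexts. Everything here is test scaffolding; only the `dispose()` shape matches what the patch expects:

```typescript
// Mock contexts that match the ctx?.dispose() shape used by cleanup().
const tracked: Array<{ dispose: () => Promise<void>; disposed: boolean }> = [];

function makeMockContext() {
  const ctx = {
    disposed: false,
    dispose: async () => { ctx.disposed = true; },
  };
  tracked.push(ctx);
  return ctx;
}

// Same shape as the cleanup() in the patch: dispose everything, then
// clear the tracking list so repeated invocations are no-ops.
async function cleanup() {
  for (const ctx of tracked) {
    if (ctx?.dispose) {
      await ctx.dispose().catch((e) => console.warn('[cleanup] dispose failed:', e));
    }
  }
  tracked.length = 0;
}

const a = makeMockContext();
const b = makeMockContext();
await cleanup();
console.log(a.disposed, b.disposed, tracked.length); // → true true 0
```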
## Prevention

To prevent similar issues in the future:

- **Always dispose GPU resources explicitly.** When working with Metal/CUDA contexts, ensure all resources are released before process exit, especially in shutdown handlers.
- **Add shutdown hooks for GPU-accelerated modules.** Register cleanup handlers using `process.on('beforeExit')`, `process.once('SIGINT')`, and `process.once('SIGTERM')` for any module that manages GPU resources.
- **Test graceful shutdown.** Include automated tests that trigger SIGINT/SIGTERM and verify the process exits cleanly, without assertions or crashes.
- **Monitor for resource leaks.** Use tools like `leaks` on macOS or Valgrind on Linux to detect unreleased resources during development.
- **Document GPU cleanup requirements.** When adding features that use Metal or CUDA, document any cleanup requirements in code comments and troubleshooting guides.
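The graceful-shutdown test suggested above can be automated with a small harness that spawns a child process, sends it SIGTERM, and asserts a clean exit. The inline child script is a stand-in for the real gateway entry point, and POSIX signal semantics are assumed:

```typescript
import { spawn } from "node:child_process";

// Spawn a command, send SIGTERM after a short delay, and resolve with the
// exit code (or null if the process died from the signal instead of exiting).
function shutsDownCleanly(cmd: string, args: string[]): Promise<number | null> {
  return new Promise((resolve, reject) => {
    const child = spawn(cmd, args, { stdio: "ignore" });
    child.on("error", reject);
    // Give the child time to install its signal handlers before interrupting it.
    setTimeout(() => child.kill("SIGTERM"), 500);
    child.on("exit", (code, signal) => resolve(signal === null ? code : null));
  });
}

// The inline child exits 0 from its own SIGTERM handler, as an app with the
// cleanup patch installed should; a Metal assertion would surface as a
// non-zero exit code or a signal death (null).
const script = 'process.on("SIGTERM", () => process.exit(0)); setInterval(() => {}, 1000);';
const exitCode = await shutsDownCleanly(process.execPath, ["-e", script]);
console.log("exit code:", exitCode); // → exit code: 0
```

In CI, replacing the inline script with the application's real start command turns this into a regression test for the crash described in this document.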
## Additional Information
| Item | Details |
|---|---|
| Affected Platform | macOS on Apple Silicon (M1/M2/M3 series) |
| Affected Components | node-llama-cpp, ggml-metal |
| Trigger Condition | Using "provider": "local" for memory search with graceful shutdown |
| Related llama.cpp PR | ggml-org/llama.cpp#17869 |
| Issue Classification | Regression (previously worked) |
| Severity | High (blocks auto-update feature) |
This issue specifically affects users who:
- Run OpenClaw on macOS with Apple Silicon
- Use local embeddings for memory search functionality
- Rely on the auto-update feature for continuous operation
The permanent fix should be implemented to ensure reliable operation of the auto-update system when local embeddings are in use.