
feat: add MLX local model support for Apple Silicon#544

Open
wangericx wants to merge 2 commits into virattt:main from wangericx:feat/mlx-local-model-support

Conversation

@wangericx

@wangericx wangericx commented Mar 15, 2026

Enables running LLMs locally on Apple Silicon (M1/M2/M3/M4) via the mlx-lm library, with no API key required.

  • src/utils/mlx_lm.py: MLX inference engine wrapping mlx-lm generate, exposes a LangChain-compatible chat interface

  • src/llm/mlx_models.json: curated list of MLX-compatible HuggingFace model IDs (Llama, Mistral, Gemma, Qwen, Phi families)

  • src/llm/models.py: register 'mlx' as a provider; get_model() returns MLX chat model when provider is 'mlx'

  • src/utils/llm.py: pass MLX_API_KEY and the MLX base URL through LangChain call_options so the inference server URL is configurable

  • app/backend/routes/mlx.py: GET /mlx/models endpoint returns available MLX models for the frontend model selector

  • app/frontend/src/components/settings/models/mlx.tsx: UI panel to configure the MLX server URL and browse available models

  • app/frontend/src/data/models.ts: fetch and merge MLX models into the global model list used by all node selectors

  • docker/run.sh: pass MLX_API_KEY env var through to the container
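
The provider registration described above can be sketched roughly as follows. Names like `register_provider` and `MLXChatModel` are illustrative assumptions, not the PR's actual identifiers; the real code wires an mlx-lm-backed model into LangChain's chat interface via `src/llm/models.py`.

```python
# Sketch of the 'mlx' provider registration, under assumed names
# (register_provider, MLXChatModel). The PR's actual code integrates with
# LangChain's chat-model interface in src/llm/models.py.

PROVIDERS: dict[str, type] = {}

def register_provider(name: str):
    """Decorator that records a chat-model class under a provider name."""
    def decorator(cls):
        PROVIDERS[name] = cls
        return cls
    return decorator

@register_provider("mlx")
class MLXChatModel:
    """LangChain-style chat interface over a local MLX inference server."""

    def __init__(self, model_id: str, base_url: str = "http://localhost:8080"):
        self.model_id = model_id
        self.base_url = base_url  # configurable server URL; no API key needed

    def invoke(self, messages: list[dict]) -> str:
        # In the real implementation this would call mlx_lm generate (or the
        # local MLX server); stubbed here so the sketch runs anywhere.
        prompt = "\n".join(m["content"] for m in messages)
        return f"[mlx:{self.model_id}] echo: {prompt}"

def get_model(model_name: str, provider: str):
    """Dispatch on provider, mirroring the get_model() behavior in the PR."""
    if provider not in PROVIDERS:
        raise ValueError(f"unknown provider: {provider}")
    return PROVIDERS[provider](model_name)
```

With this shape, `get_model("mlx-community/some-model", "mlx")` returns an MLX chat model while every other provider keeps its existing path.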

    Summary

    • Run LLMs locally on Apple Silicon (M1/M2/M3/M4) via mlx-lm — no API key required
    • Add src/utils/mlx_lm.py: MLX inference engine with a LangChain-compatible chat interface
    • Add src/llm/mlx_models.json: curated list of MLX-compatible models (Llama, Mistral, Gemma, Qwen, Phi)
    • Register mlx as a provider in src/llm/models.py; get_model() returns MLX model when provider is mlx
    • Add GET /mlx/models backend endpoint to serve available models to the frontend
    • Add MLX settings panel in the web UI to configure the server URL and browse models
    • Frontend model selector automatically includes MLX models when the MLX server is reachable
    • Pass MLX_API_KEY env var through Docker
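
    The fetch-and-merge step lives in app/frontend/src/data/models.ts (TypeScript); a minimal Python sketch of the same dedupe-by-id merge, with assumed field names ("id", "provider"), looks like this:

```python
# Sketch of merging MLX models into the global model list, keyed by model id.
# Field names ("id", "provider") are assumptions for illustration; the real
# logic is in app/frontend/src/data/models.ts.

def merge_models(global_models: list[dict], mlx_models: list[dict]) -> list[dict]:
    """Append MLX models that are not already present, preserving order."""
    seen = {m["id"] for m in global_models}
    merged = list(global_models)
    for m in mlx_models:
        if m["id"] not in seen:
            merged.append({**m, "provider": "mlx"})
            seen.add(m["id"])
    return merged
```

Because the merge only appends unseen ids, an unreachable MLX server (empty list) leaves the global selector unchanged.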

    Test plan

    • Install mlx-lm and start a local MLX server
    • Verify MLX models appear in the web UI model selector
    • Run a full analysis using an MLX model end-to-end
    • Verify non-Apple-Silicon machines gracefully skip MLX (no crash)
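
    The last test-plan item (graceful skip on non-Apple-Silicon machines) can be checked with a guard like the one below; the function name is illustrative, not necessarily what the PR uses.

```python
import importlib.util
import platform
import sys

def mlx_supported() -> bool:
    """True only on Apple Silicon macOS with mlx-lm importable.

    Lets callers skip the MLX provider gracefully (no crash) on other
    machines. The name mlx_supported is an assumption for this sketch.
    """
    if sys.platform != "darwin" or platform.machine() != "arm64":
        return False
    return importlib.util.find_spec("mlx_lm") is not None
```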

Eric and others added 2 commits March 15, 2026 10:22
Enables running LLMs locally on Apple Silicon (M1/M2/M3/M4) via the
mlx-lm library, with no API key required.

- src/utils/mlx_lm.py: MLX inference engine wrapping mlx-lm generate,
  exposes a LangChain-compatible chat interface
- src/llm/mlx_models.json: curated list of MLX-compatible HuggingFace
  model IDs (Llama, Mistral, Gemma, Qwen, Phi families)
- src/llm/models.py: register 'mlx' as a provider; get_model() returns
  MLX chat model when provider is 'mlx'
- src/utils/llm.py: pass MLX_API_KEY / mlx base URL through LangChain
  call_options so the inference server URL is configurable
- app/backend/routes/mlx.py: GET /mlx/models endpoint returns available
  MLX models for the frontend model selector
- app/frontend/src/components/settings/models/mlx.tsx: UI panel to
  configure the MLX server URL and browse available models
- app/frontend/src/data/models.ts: fetch and merge MLX models into the
  global model list used by all node selectors
- docker/run.sh: pass MLX_API_KEY env var through to the container

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Prevents a backend crash when a previously selected MLX (or any removed)
model is sent to a backend that no longer supports it.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
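
The fix in this second commit amounts to falling back instead of crashing when a saved model id is no longer supported; a minimal sketch (with an assumed helper name, not the PR's actual code):

```python
def resolve_model(requested: str, available: set[str], default: str) -> str:
    """Return the requested model if the backend still supports it,
    otherwise fall back to a default instead of raising.

    Guards against stale ids, e.g. an MLX model selected in a prior
    session whose server is no longer running. Name is illustrative.
    """
    if requested in available:
        return requested
    return default
```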
