Help: Choosing an AI Model
The exact model lists exposed by the plugin come from the backend at runtime. The names below reflect the curated lists shipped with the current backend (`server/src/providers/`). Pricing and availability change over time, so verify them with each provider before relying on cost estimates in production.
Decision factors
Choose based on:
- privacy requirements (cloud vs. local)
- quality expectations (description detail, keyword accuracy, edit recipe sanity)
- runtime per image and batch throughput
- per-image cost (cloud) or hardware cost (local)
- available local hardware (VRAM/RAM, Apple Silicon vs. discrete GPU)
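For cloud providers, a rough per-image cost is the input tokens (prompt plus image) and output tokens multiplied by the provider's per-million-token prices. The sketch below shows only that arithmetic. The function names are hypothetical, and the token counts and prices in the example are placeholders rather than real rates, so substitute figures from your provider's pricing page and your own usage reports.

```python
def cost_per_image(input_tokens: int, output_tokens: int,
                   usd_per_m_input: float, usd_per_m_output: float) -> float:
    """Estimate the USD cost of analyzing one image.

    Prices are expressed per million tokens, as most providers list them.
    All arguments are caller-supplied assumptions, not real provider rates.
    """
    return (input_tokens * usd_per_m_input
            + output_tokens * usd_per_m_output) / 1_000_000


def batch_cost(images: int, per_image: float) -> float:
    """Scale a per-image estimate to a whole batch."""
    return images * per_image


# Placeholder numbers only: 1,500 input and 300 output tokens per image,
# at a hypothetical 0.10 / 0.40 USD per million tokens.
per_image = cost_per_image(1_500, 300, 0.10, 0.40)
print(f"{per_image:.6f} USD per image")
print(f"{batch_cost(10_000, per_image):.2f} USD for 10,000 images")
```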
Cloud models
Google Gemini
Configure in Plug-in Manager → API Keys → Gemini API key. Models exposed today:
- `gemini-2.5-flash-lite`: cheapest and fastest; good for bulk keywording.
- `gemini-2.5-flash`: balanced default for analyze-and-index runs.
- `gemini-2.5-pro`: highest quality in the 2.5 tier; use it for tricky scenes or when description quality matters more than throughput.
- `gemini-3-flash-preview`, `gemini-3.1-flash-lite-preview`, `gemini-3.1-pro-preview`: latest preview tier. Expect higher quality and better instruction following, but preview pricing and quotas can change.
The backend automatically tunes the thinking budget for `gemini-2.5-*` and `gemini-3-pro-preview`, so you don't need to configure it yourself.
OpenAI / ChatGPT
Configure in Plug-in Manager → API Keys → OpenAI API key. Models exposed:
- `gpt-4.1`: proven vision quality; the safe baseline.
- `gpt-5-nano`, `gpt-5-mini`, `gpt-5`: current GPT-5 tier; pick `nano` or `mini` for batch jobs and `gpt-5` for higher-fidelity descriptions.
- `gpt-5.4-nano`, `gpt-5.4-mini`, `gpt-5.4`, `gpt-5.4-pro`: newest GPT-5.4 tier; `gpt-5.4-pro` is the highest-quality option but also the most expensive.
Note: GPT-5 and GPT-5.4 models ignore the temperature slider and use a fixed reasoning effort, so changing the plugin's temperature setting has no effect on their output.
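To picture what ignoring the temperature slider means, the sketch below shows one hypothetical way a backend could build request options: for `gpt-5*` models the slider value is dropped and a fixed reasoning effort is sent instead. The function name, the option keys, and the `"medium"` effort value are assumptions for illustration and are not taken from the code in `server/src/providers/`.

```python
# Hypothetical sketch only; the real provider code may differ.
FIXED_REASONING_EFFORT = "medium"  # assumed value, for illustration


def build_openai_params(model: str, temperature: float) -> dict:
    """Return illustrative request options for an OpenAI model."""
    params = {"model": model}
    if model.startswith("gpt-5"):
        # Matches both gpt-5* and gpt-5.4* names: the slider value is
        # discarded and a fixed reasoning effort is used instead.
        params["reasoning_effort"] = FIXED_REASONING_EFFORT
    else:
        # Older models such as gpt-4.1 still receive the slider value.
        params["temperature"] = temperature
    return params


print(build_openai_params("gpt-5.4-mini", 0.2))
print(build_openai_params("gpt-4.1", 0.2))
```

Because the temperature never reaches GPT-5-family requests in this sketch, two runs that differ only in the slider produce identical request options.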
Vertex AI (embeddings only)
Vertex AI is used for the `multimodalembedding@001` model that powers the `image_embeddings_vertex` semantic-search collection. It is not an alternative LLM for keywords or descriptions; pair it with a Gemini, ChatGPT, or local provider for metadata generation. See Google Vertex AI Login.
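Semantic search over an embeddings collection generally works by embedding the query into the same vector space as the stored images and ranking images by similarity. The sketch below shows that ranking step with cosine similarity on toy three-dimensional vectors. Real `multimodalembedding@001` vectors are much longer, the image IDs are made up, and whether the backend ranks by cosine similarity specifically is an assumption.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_images(query_vec: list[float],
                collection: dict[str, list[float]],
                top_k: int = 3) -> list[tuple[str, float]]:
    """Return the top_k (image_id, score) pairs, best match first."""
    scored = [(image_id, cosine_similarity(query_vec, vec))
              for image_id, vec in collection.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy vectors standing in for stored image embeddings (hypothetical IDs).
collection = {
    "IMG_0001.jpg": [0.9, 0.1, 0.0],
    "IMG_0002.jpg": [0.1, 0.9, 0.2],
    "IMG_0003.jpg": [0.8, 0.3, 0.1],
}
query = [1.0, 0.2, 0.0]  # stand-in for the embedding of a text query
for image_id, score in rank_images(query, collection, top_k=2):
    print(f"{image_id}: {score:.3f}")
```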
Local models
Local providers run on your own machine, so privacy is the strongest argument for using them. The quality of small open-weight vision models has improved significantly, but cloud frontier models still lead on tricky scenes.
Ollama
Install and start Ollama from ollama.com, then pull at least one vision-capable model. Recommended starting points:
```shell
ollama pull qwen3-vl:4b-instruct-q4_K_M   # fast, ~6 GB VRAM
ollama pull qwen3-vl:8b-instruct-q4_K_M   # better quality, ~10 GB VRAM
ollama pull gemma3:4b-it-q4_K_M           # good general default
ollama pull gemma3:12b-it-q4_K_M          # higher quality if you have VRAM
ollama pull llava                         # legacy fallback
```
Browse all vision models: ollama.com/search?c=vision. See Ollama Setup.
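To confirm which of the recommended models are actually installed, you can query Ollama's local REST API. This sketch assumes Ollama is listening on its default port 11434 and uses its `/api/tags` endpoint, which lists pulled models (`ollama list` shows the same from the command line). The helper names are hypothetical.

```python
import json
import urllib.request

OLLAMA_TAGS_URL = "http://localhost:11434/api/tags"  # Ollama's default port


def normalize(tag: str) -> str:
    """Ollama records an untagged pull such as `llava` as `llava:latest`."""
    return tag if ":" in tag else f"{tag}:latest"


def missing_models(tags_payload: dict, wanted: list[str]) -> list[str]:
    """Return the wanted model tags absent from an /api/tags response."""
    installed = {normalize(m["name"]) for m in tags_payload.get("models", [])}
    return [tag for tag in wanted if normalize(tag) not in installed]


def fetch_tags(url: str = OLLAMA_TAGS_URL, timeout: float = 3.0) -> dict:
    """Ask the local Ollama server which models are installed."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

For example, `missing_models(fetch_tags(), ["qwen3-vl:8b-instruct-q4_K_M", "llava"])` returns an empty list once both models have been pulled.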
LM Studio
Download from lmstudio.ai, enable server mode, and download one or more vision models from inside the app. Recommended:
- `qwen/qwen3-vl-4b`: fast baseline.
- `qwen/qwen3-vl-8b`: better description quality.
- `gemma-4-e4b` / `google/gemma3-12b`: strong general-purpose options.
On Apple Silicon, prefer the MLX variants of these models; they run significantly faster than the GGUF builds. See LM Studio Setup.
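A quick way to see what LM Studio is serving, and whether the machine is Apple Silicon (where the MLX builds are preferable), is sketched below. It assumes LM Studio's OpenAI-compatible server is running on its default port 1234; depending on your LM Studio settings, `/v1/models` may list only the currently loaded models. The helper names are hypothetical.

```python
import json
import platform
import urllib.request

LMSTUDIO_MODELS_URL = "http://localhost:1234/v1/models"  # LM Studio default port


def served_model_ids(models_payload: dict) -> list[str]:
    """Extract model IDs from an OpenAI-style /v1/models response."""
    return [entry["id"] for entry in models_payload.get("data", [])]


def fetch_models(url: str = LMSTUDIO_MODELS_URL, timeout: float = 3.0) -> dict:
    """Ask the running LM Studio server which models it can serve."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def is_apple_silicon() -> bool:
    """True on an arm64 Mac (a Rosetta-translated Python reports x86_64)."""
    return platform.system() == "Darwin" and platform.machine() == "arm64"
```

For example, `"qwen/qwen3-vl-8b" in served_model_ids(fetch_models())` checks that the recommended model is available; if `is_apple_silicon()` is true, download its MLX build.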
Quick recommendations
| Workflow | Suggested first try |
|---|---|
| Cheap bulk keywording (cloud) | gemini-2.5-flash-lite or gpt-5-nano |
| Balanced default (cloud) | gemini-2.5-flash or gpt-5-mini |
| Best description quality (cloud) | gemini-2.5-pro, gpt-5.4, or gpt-5.4-pro |
| Privacy-first / no API billing | Ollama qwen3-vl:8b or LM Studio qwen3-vl-8b |
| Apple Silicon, local | LM Studio MLX build of qwen3-vl or gemma3 |
Practical recommendation
The model dropdown in Analyze & Index and AI Edit always reflects what the backend currently advertises, so newer models that ship with future backend updates appear automatically. If a model you expect is missing, check that the corresponding API key or local server is configured and reachable from the backend (the Plug-in Manager → Status section reports availability per provider).
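If the Status section reports a local provider as unavailable, a plain HTTP probe helps separate "server not running" from other problems. Run this sketch on the same machine as the backend, since reachability matters from there. The endpoints assume the Ollama and LM Studio default ports; the function names are hypothetical.

```python
import urllib.request

# Assumed default local endpoints; adjust if you changed ports or hosts.
LOCAL_ENDPOINTS = {
    "ollama": "http://localhost:11434/api/tags",
    "lmstudio": "http://localhost:1234/v1/models",
}


def is_reachable(url: str, timeout: float = 2.0) -> bool:
    """Return True if an HTTP GET to url succeeds with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:
        # URLError, HTTPError, refused connections and timeouts all
        # derive from OSError, so any of them counts as unreachable.
        return False


def report() -> dict[str, bool]:
    """Probe each local provider endpoint and map name -> reachable."""
    return {name: is_reachable(url) for name, url in LOCAL_ENDPOINTS.items()}
```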
When evaluating, run the same batch of 10–20 representative photos through two candidates and compare:
- keyword coverage and accuracy
- description quality and language correctness
- runtime per image and end-to-end batch time
- system load (local) or token cost (cloud)
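To make the comparison repeatable, you can record each candidate's keywords and timing per photo and score them against a hand-checked keyword set. The sketch below computes keyword precision and recall plus runtime figures; the result-dictionary layout and function names are assumptions, and description quality and system load still need manual review.

```python
from statistics import mean


def keyword_scores(predicted: set[str], reference: set[str]) -> tuple[float, float]:
    """Precision and recall of predicted keywords against a reference set."""
    pred = {k.strip().lower() for k in predicted}  # ignore case and spacing
    ref = {k.strip().lower() for k in reference}
    if not pred or not ref:
        return 0.0, 0.0
    hits = len(pred & ref)
    return hits / len(pred), hits / len(ref)


def summarize(results: list[dict]) -> dict[str, float]:
    """Average keyword scores and runtime over one model's test batch.

    Each result dict is assumed to hold 'predicted' and 'reference'
    keyword sets plus 'seconds' (wall-clock time for that image).
    """
    precisions, recalls = zip(*(keyword_scores(r["predicted"], r["reference"])
                                for r in results))
    return {
        "precision": mean(precisions),
        "recall": mean(recalls),
        "sec_per_image": mean(r["seconds"] for r in results),
        # Sum of per-image times; if images run in parallel, measure the
        # end-to-end batch time separately.
        "batch_seconds": sum(r["seconds"] for r in results),
    }
```

Run `summarize` once per candidate on the same 10–20 photos and compare the two dictionaries side by side.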