8 AI model architectures, visually explained

LLM (Large Language Models)

Text goes in, gets tokenized and mapped to embeddings, flows through stacked transformer layers, and new text comes out. ChatGPT, Claude, Gemini, Llama.
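
To make the flow concrete, here is a minimal decoder-style sketch in PyTorch. The TinyLLM class, vocabulary size, and layer counts are toy assumptions for illustration, not any production model:

```python
# Minimal decoder-style language model sketch (toy sizes, illustrative only).
import torch
import torch.nn as nn

class TinyLLM(nn.Module):
    def __init__(self, vocab_size=1000, d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)            # tokens -> embeddings
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)      # stacked transformer layers
        self.lm_head = nn.Linear(d_model, vocab_size)             # embeddings -> next-token logits

    def forward(self, token_ids):
        x = self.embed(token_ids)
        # Causal mask so each position only attends to earlier tokens.
        mask = nn.Transformer.generate_square_subsequent_mask(token_ids.size(1))
        x = self.blocks(x, mask=mask)
        return self.lm_head(x)

tokens = torch.randint(0, 1000, (1, 8))   # stand-in for tokenized input text
logits = TinyLLM()(tokens)                # a next-token distribution per position
print(logits.shape)                       # torch.Size([1, 8, 1000])
```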

LCM (Large Concept Models)

Works at the concept level, not the token level. Input is segmented into sentences, encoded into SONAR embeddings, then refined by a diffusion process before output. Meta’s LCM is the pioneer.
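
A rough sketch of that pipeline, with placeholder functions standing in for the real SONAR encoder and LCM diffusion decoder (segment_sentences, sentence_embed, and diffusion_step are illustrative only, not Meta's API):

```python
# Schematic concept-level pipeline (placeholder helpers, not Meta's SONAR/LCM code).
import torch

def segment_sentences(text):
    # Naive splitter standing in for a real sentence segmenter.
    return [s.strip() for s in text.split(".") if s.strip()]

def sentence_embed(sentences, dim=256):
    # Placeholder for SONAR: one fixed-size concept vector per sentence.
    return torch.randn(len(sentences), dim)

def diffusion_step(x, noise_scale=0.1):
    # Toy refinement update standing in for the diffusion decoder.
    return x - noise_scale * torch.randn_like(x)

text = "Concept models reason over sentences. Tokens are handled downstream."
concepts = sentence_embed(segment_sentences(text))
for _ in range(10):                # iterative refinement in concept space
    concepts = diffusion_step(concepts)
print(concepts.shape)              # one embedding per sentence
```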

LAM (Large Action Models)

Turns intent into action. Input flows through perception, intent recognition, and task breakdown, then action planning with memory before execution. Rabbit R1, Microsoft UFO, Claude Computer Use.
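
A schematic of that perception-to-action loop. Every function here (perceive, recognize_intent, break_down, plan_and_act) is a hypothetical stand-in for the planners and UI automation these products actually use:

```python
# Schematic action-model loop (all names are illustrative stand-ins).
from dataclasses import dataclass, field

@dataclass
class Memory:
    steps: list = field(default_factory=list)

def perceive(user_input):
    return {"utterance": user_input.lower()}            # perception

def recognize_intent(obs):
    return "book_table" if "table" in obs["utterance"] else "unknown"

def break_down(intent):
    # Task breakdown: intent -> ordered sub-tasks.
    return ["open_app", "search_restaurant", "pick_time", "confirm"] if intent == "book_table" else []

def plan_and_act(tasks, memory):
    for task in tasks:
        memory.steps.append(task)        # memory informs later planning
        print(f"executing: {task}")      # stand-in for real UI / API actions

memory = Memory()
obs = perceive("Book a table for two tonight")
plan_and_act(break_down(recognize_intent(obs)), memory)
```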

MoE (Mixture of Experts)

A router decides which specialized “experts” handle your query. Only the selected experts activate, and their outputs are weighted and combined into the final answer. Mixtral, DeepSeek, and reportedly GPT-4.
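
A toy top-k routing layer that shows the idea; the MoELayer class and all sizes are illustrative assumptions, not any shipping model:

```python
# Toy top-k mixture-of-experts layer (illustrative sizes).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, d_model=64, n_experts=8, k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)    # scores each expert per token
        self.experts = nn.ModuleList(nn.Linear(d_model, d_model) for _ in range(n_experts))
        self.k = k

    def forward(self, x):                              # x: (tokens, d_model)
        scores = self.router(x)                        # (tokens, n_experts)
        weights, idx = scores.topk(self.k, dim=-1)     # keep only the top-k experts
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                routed = idx[:, slot] == e             # tokens routed to expert e
                if routed.any():
                    out[routed] += weights[routed, slot:slot+1] * expert(x[routed])
        return out

tokens = torch.randn(5, 64)
print(MoELayer()(tokens).shape)                        # torch.Size([5, 64])
```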

VLM (Vision-Language Models)

Images pass through a vision encoder, text through a text encoder. Both fuse in a multimodal processor, and a language model then generates the output. GPT-4V, Gemini Pro Vision, LLaVA.
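
A minimal fusion sketch, assuming toy encoders in place of the pretrained vision and language backbones real VLMs use:

```python
# Schematic vision-language fusion (toy encoders, illustrative sizes).
import torch
import torch.nn as nn

class ToyVLM(nn.Module):
    def __init__(self, d_model=64, vocab_size=1000):
        super().__init__()
        self.vision_encoder = nn.Linear(3 * 32 * 32, d_model)   # stand-in for a ViT
        self.text_encoder = nn.Embedding(vocab_size, d_model)   # stand-in for tokenizer + embedder
        fuse = nn.TransformerEncoderLayer(d_model, 4, batch_first=True)
        self.fusion = nn.TransformerEncoder(fuse, 2)             # multimodal processor
        self.lm_head = nn.Linear(d_model, vocab_size)            # language-model output

    def forward(self, image, token_ids):
        img_tok = self.vision_encoder(image.flatten(1)).unsqueeze(1)   # image -> one visual token
        txt_tok = self.text_encoder(token_ids)
        fused = self.fusion(torch.cat([img_tok, txt_tok], dim=1))      # joint sequence
        return self.lm_head(fused)

image = torch.randn(1, 3, 32, 32)
prompt = torch.randint(0, 1000, (1, 6))
print(ToyVLM()(image, prompt).shape)       # logits over the fused image+text sequence
```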

SLM (Small Language Models)

LLMs optimized for edge devices. Compact tokenization, efficient transformers, and quantization for local deployment. Phi-3, Gemma, Mistral 7B, Llama 3.2 1B.
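
One common shrinking step, sketched with PyTorch's dynamic int8 quantization; the tiny network is just a stand-in for a compact model like Phi-3-mini or Llama 3.2 1B:

```python
# Sketch: shrinking a small model for edge deployment with dynamic int8 quantization.
import torch
import torch.nn as nn

model = nn.Sequential(                     # placeholder for a compact transformer
    nn.Linear(256, 512), nn.ReLU(), nn.Linear(512, 256)
)

quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8  # int8 weights cut memory roughly 4x vs fp32
)

x = torch.randn(1, 256)
print(quantized(x).shape)                  # same interface, smaller footprint
```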

MLM (Masked Language Models)

Tokens get masked, converted to embeddings, then processed bidirectionally to predict hidden words. BERT, RoBERTa, DeBERTa power search and sentiment analysis.
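	
A quick fill-mask demo using Hugging Face's pipeline API with BERT, assuming the transformers package is installed (the model downloads on first run):

```python
# Masked-token prediction with a pretrained BERT (requires `pip install transformers torch`).
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")
for pred in fill("The movie was absolutely [MASK]."):
    print(pred["token_str"], round(pred["score"], 3))   # bidirectional context scores each candidate
```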

SAM (Segment Anything Models)

Prompts and images go through separate encoders, then feed into a mask decoder that produces pixel-perfect segmentation masks. Meta’s SAM powers photo editing, medical imaging, and autonomous vehicles.
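
A point-prompt sketch using Meta's segment-anything package, assuming it is installed and a checkpoint has been downloaded; the checkpoint path is a placeholder and the all-black image stands in for a real photo:

```python
# Point-prompted segmentation with Meta's segment-anything package
# (assumes `pip install segment-anything` and a downloaded checkpoint; path is a placeholder).
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")   # image-encoder weights
predictor = SamPredictor(sam)

image = np.zeros((512, 512, 3), dtype=np.uint8)        # stand-in for a real RGB photo
predictor.set_image(image)                             # runs the image encoder once

masks, scores, _ = predictor.predict(
    point_coords=np.array([[256, 256]]),               # prompt: one foreground click
    point_labels=np.array([1]),
)
print(masks.shape, scores)                             # candidate masks with quality scores
```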