Meta Releases SAM: Foundation Model That Segments Any Object in Images

Meta AI Research has released the Segment Anything Model (SAM), a foundation model designed to revolutionize image segmentation by enabling users to identify and isolate any object within an image using simple prompts. The model represents a significant advancement in computer vision, offering zero-shot performance across diverse segmentation tasks without requiring task-specific training.

What Makes SAM Different

SAM introduces promptable segmentation, allowing users to generate accurate object masks from intuitive inputs such as point clicks (foreground or background), bounding boxes, rough masks, or text descriptions. Unlike traditional segmentation models that require extensive training for specific tasks, SAM can segment previously unseen objects across different visual domains without custom adaptation. The model processes these prompts through a three-component architecture: a Vision Transformer (ViT) image encoder, a prompt encoder, and a lightweight mask decoder that together produce pixel-precise segmentation masks.
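
As a concrete sketch of the prompting workflow, the snippet below follows the usage pattern documented in the segment-anything repository (SamPredictor with point and box prompts); the image path, checkpoint path, and coordinates are placeholder assumptions.

```python
import cv2
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

# The ViT image encoder, prompt encoder, and mask decoder sit behind this
# registry; the checkpoint path is a placeholder for a downloaded file.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

image = cv2.cvtColor(cv2.imread("example.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)  # runs the heavy image encoder once

# Prompt with a single foreground click (label 1 = foreground, 0 = background).
masks, scores, _ = predictor.predict(
    point_coords=np.array([[500, 375]]),
    point_labels=np.array([1]),
    multimask_output=True,
)

# The same predictor also accepts a bounding-box prompt in XYXY pixel coordinates.
box_masks, box_scores, _ = predictor.predict(
    box=np.array([425, 300, 700, 525]),
    multimask_output=False,
)
```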

Training and Dataset Scale

The model was trained on SA-1B, the largest segmentation dataset released to date, containing more than 1.1 billion masks across 11 million licensed, privacy-respecting images. Meta built this dataset using a model-in-the-loop data engine in which human annotators worked alongside SAM through three stages: assisted manual annotation, semi-automatic refinement, and fully automatic mask generation. This iterative process let the model and the dataset improve together, giving SAM strong generalization across domains such as medical imaging, industrial inspection, and natural scenes.
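
SA-1B distributes its masks as per-image JSON files containing COCO run-length encodings. The sketch below, which assumes the pycocotools package and the field names given in the dataset documentation, shows how one such file could be decoded; the filename is a placeholder.

```python
import json
from pycocotools import mask as mask_utils  # pip install pycocotools

# Each SA-1B image ships with a JSON sidecar holding its automatically
# generated masks as COCO run-length encodings (filename is a placeholder).
with open("sa_000001.json") as f:
    record = json.load(f)

for ann in record["annotations"]:
    rle = ann["segmentation"]             # {"size": [H, W], "counts": "..."}
    binary_mask = mask_utils.decode(rle)  # H x W uint8 array, 1 inside the object
    print(ann["id"], ann["area"], binary_mask.shape)
```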

Real-Time Performance

After computing an image embedding once, SAM can return a mask for any prompt in roughly 50 milliseconds, fast enough for real-time interaction even in a web browser. The model ships with three backbone sizes (ViT-H, the default, plus ViT-L and ViT-B), letting developers trade accuracy against computational cost. When a prompt is ambiguous, SAM can generate multiple valid masks along with confidence scores, giving users flexibility in selecting the most appropriate segmentation.
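
A minimal sketch of that interactive pattern, assuming the segment-anything package, a downloaded ViT-B checkpoint, and placeholder image paths and click coordinates:

```python
import cv2
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

# ViT-B is the lightest released backbone; treat the checkpoint name as a
# placeholder for whatever file you downloaded.
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
predictor = SamPredictor(sam)

image = cv2.cvtColor(cv2.imread("example.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)  # the expensive image embedding is computed exactly once

# Every later click only runs the prompt encoder and mask decoder, which is
# what makes per-prompt segmentation feel instantaneous.
for x, y in [(320, 240), (100, 80), (600, 400)]:
    masks, scores, _ = predictor.predict(
        point_coords=np.array([[x, y]]),
        point_labels=np.array([1]),
        multimask_output=True,  # ambiguous clicks return several candidate masks
    )
    best = masks[scores.argmax()]  # keep the highest-confidence candidate
    print(f"click ({x}, {y}): {best.sum()} mask pixels, score {scores.max():.3f}")
```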

Evolution to SAM 2 and Beyond

Meta has since released Segment Anything Model 2 (SAM 2), which extends the original's capabilities to video segmentation and real-time tracking. SAM 2 introduces a memory-augmented streaming architecture that maintains consistent mask propagation across video frames while treating images as single-frame videos. Most recently, Meta announced SAM 3 in November 2025, introducing unified detection, segmentation, and tracking capabilities with visual prompting features.
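
For orientation, here is a hedged sketch of the video workflow based on the usage shown in the facebookresearch/sam2 README; the config name, checkpoint path, video path, and prompt coordinates are assumptions, and exact argument names should be verified against the repository.

```python
import numpy as np
import torch
from sam2.build_sam import build_sam2_video_predictor

# Config and checkpoint names follow the sam2 repository's conventions (placeholders here).
predictor = build_sam2_video_predictor("configs/sam2.1/sam2.1_hiera_l.yaml",
                                       "checkpoints/sam2.1_hiera_large.pt")

# Assumes a CUDA device, as in the repository's examples.
with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
    # init_state reads the video (or a directory of frames) and sets up the memory bank.
    state = predictor.init_state(video_path="./video_frames")

    # A single click on one frame defines the object to track (obj_id is arbitrary).
    predictor.add_new_points_or_box(
        state, frame_idx=0, obj_id=1,
        points=np.array([[300, 200]], dtype=np.float32),
        labels=np.array([1], dtype=np.int32),
    )

    # The streaming architecture then propagates the mask through the rest of the
    # video, using its memory of past frames to keep object identity consistent.
    for frame_idx, object_ids, mask_logits in predictor.propagate_in_video(state):
        masks = (mask_logits > 0.0).cpu().numpy()  # one boolean mask per tracked object
```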

For developers interested in 3D applications of Meta's segmentation technology, the company has also introduced SAM 3D Objects, which turns objects segmented in a 2D image into complete 3D models.

Availability and Implementation

SAM is available as open-source software under the Apache 2.0 license, with model checkpoints, inference code, and example notebooks accessible through Meta's facebookresearch/segment-anything GitHub repository (SAM 2 is published in facebookresearch/sam2). Developers can integrate SAM into applications with just a few lines of Python code, and the model's mask decoder can be exported to ONNX format for deployment in web browsers and other runtime environments. Meta FAIR also hosts an interactive SAM 2 demo where users can track an object across any video and create effects with as little as a single click on one frame. The combination of powerful segmentation capabilities, zero-shot generalization, and accessible implementation has made SAM a foundational tool for computer vision applications across research and industry.
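
To make the "few lines of Python" point concrete, here is the automatic mask generation pattern documented in the repository README (checkpoint and image paths are placeholders); the README also describes a scripts/export_onnx_model.py script for producing the ONNX mask decoder used in browser deployments.

```python
import cv2
from segment_anything import SamAutomaticMaskGenerator, sam_model_registry

# Build a SAM backbone from a downloaded checkpoint (path is a placeholder)
# and generate masks for every object the model can find in the image.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
mask_generator = SamAutomaticMaskGenerator(sam)

image = cv2.cvtColor(cv2.imread("example.jpg"), cv2.COLOR_BGR2RGB)
masks = mask_generator.generate(image)

# Each entry is a dict with a binary "segmentation" mask plus metadata
# such as "area", "bbox", and "predicted_iou".
print(f"{len(masks)} masks generated")
```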