S3OD: Towards Generalizable Salient Object Detection with Synthetic Data

Large-Scale Synthetic Dataset for SOD • Ambiguity-Aware Architecture • State-of-the-Art Model

Orest Kupyn Hirokatsu Kataoka Christian Rupprecht

University of Oxford, VGG • AIST


🎯 TL;DR

We present two key contributions: (1) the S3OD Dataset, 139K+ high-resolution synthetic images generated through a multi-modal diffusion pipeline that extracts labels from FLUX DiT features, concept attention maps, and DINO-v3 representations; and (2) an Ambiguity-Aware Architecture, a streamlined model with a multi-mask decoder that handles the inherent ambiguity of salient object detection by predicting several candidate masks. Our approach unifies the DIS (dichotomous image segmentation) and HR-SOD (high-resolution salient object detection) tasks, achieving state-of-the-art performance with strong cross-dataset generalization.

Dataset & Model Highlights

Addressing the data bottleneck through synthetic data generation

139K+ Synthetic Images: 2× larger than all existing SOD datasets combined
SOTA Performance: State-of-the-art across DIS and HR-SOD benchmarks
1,676 Object Categories: Diverse scenes across multiple domains
Multi-Modal Diffusion Pipeline: High-quality complex data with accurate annotations

The S3OD Dataset


🎨 Multi-Modal Diffusion Pipeline

Our pipeline simultaneously generates images and masks by extracting multi-modal signals during diffusion:

FLUX DiT Features — Rich spatial understanding encoded during generation
Concept Attention Maps — Object-level focus from cross-attention layers
DINO-v3 Visual Features — Robust semantic representations from self-supervised learning

This ensures strong image-label alignment and high-quality annotations without teacher model bottlenecks.

🔄 Iterative Generation Framework

Our feedback-driven approach dynamically identifies model weaknesses and adapts the sampling distribution:

Performance Monitoring — Evaluate model on validation set to identify weak categories
Adaptive Sampling — Prioritize generation of challenging object categories
Continuous Improvement — Dataset quality improves iteratively as it grows

Unlike static methods, this enables targeted data generation where the model needs it most.
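
The feedback loop above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the `1 - score` weighting, the `temperature` parameter, and the example scores are all assumptions introduced here. It turns per-category validation scores into a sampling distribution that favors weak categories, then draws the next batch of categories to generate.

```python
import random

def category_weights(val_scores, temperature=1.0, floor=1e-3):
    """Map per-category validation scores in [0, 1] to sampling probabilities.

    Hypothetical weighting: a lower score gives a larger weight, so categories
    where the model is weak are generated more often. A temperature above 1
    flattens the distribution; the floor keeps every category reachable.
    """
    raw = {
        cat: max(1.0 - score, floor) ** (1.0 / temperature)
        for cat, score in val_scores.items()
    }
    total = sum(raw.values())
    return {cat: w / total for cat, w in raw.items()}

def sample_categories(weights, k, seed=0):
    """Draw k category names (with replacement) according to the weights."""
    rng = random.Random(seed)
    cats = list(weights)
    return rng.choices(cats, weights=[weights[c] for c in cats], k=k)

# Made-up validation scores, for illustration only.
val_scores = {"bicycle": 0.62, "cat": 0.93, "chair": 0.80}
weights = category_weights(val_scores)
batch = sample_categories(weights, k=1000)
```

In a full loop, the model would be retrained on the enlarged dataset, re-evaluated, and the weights recomputed before the next round of generation.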

Sample categories: Animals, Camping, Vehicles, Cuisine, Sports, Furniture, Wildlife, Big Cats


Cross-Dataset Generalization

Synthetic pre-training yields strong real-world generalization

Method	Training Data	DAVIS-S F_m↑	HRSOD-TE F_m↑	DUTS-TE F_m↑	DUT-OMRON F_m↑
InSPyReNet	DIS-5K	.921	.891	.845	.713
BiRefNet	DIS-5K	.919	.887	.860	.744
MVANet	DIS-5K	.907	.902	.852	.711
S3OD (Ours)	DIS-5K	.951	.923	.902	.808
S3OD (Ours)	S3OD Synthetic Only	.970	.954	.937	.860
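
For readers unfamiliar with the metric, the sketch below shows one common way an F-measure is computed for a single saliency map. It assumes that F_m in the table denotes the maximum F-measure over binarization thresholds with β² = 0.3, the conventional choice in salient object detection; the page does not spell this out, and the function names are illustrative rather than taken from the S3OD evaluation code.

```python
import numpy as np

def f_measure(pred, gt, threshold, beta2=0.3, eps=1e-8):
    """F-beta score of a saliency map binarized at the given threshold."""
    binary = pred >= threshold
    gt = gt.astype(bool)
    tp = np.logical_and(binary, gt).sum()
    precision = tp / (binary.sum() + eps)
    recall = tp / (gt.sum() + eps)
    return (1 + beta2) * precision * recall / (beta2 * precision + recall + eps)

def max_f_measure(pred, gt):
    """Maximum F-beta over 255 evenly spaced thresholds in (0, 1]."""
    thresholds = np.arange(1, 256) / 255.0
    return max(f_measure(pred, gt, t) for t in thresholds)

# Toy example: a perfect prediction and an empty prediction.
gt = np.zeros((8, 8), dtype=np.uint8)
gt[2:6, 2:6] = 1
perfect_score = max_f_measure(gt.astype(float), gt)
empty_score = max_f_measure(np.zeros((8, 8)), gt)
```

Benchmark protocols differ in how scores are aggregated across a dataset, so this per-image version is only meant to convey what the number measures.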

🚀 Try It Yourself!

Upload your own images and see S3OD in action with our interactive demo on HuggingFace Spaces.


Get Started in Seconds

Simple Python API for state-of-the-art segmentation

# Install from GitHub (run this line in a shell, not in Python)
pip install git+https://github.com/KupynOrest/s3od.git

# Import and initialize (Python from here on)
from s3od import BackgroundRemoval
from PIL import Image

# Initialize detector (automatically downloads model from HuggingFace)
detector = BackgroundRemoval()

# Load and process image
image = Image.open("your_image.jpg")
result = detector.remove_background(image)

# Save result with transparent background
result.rgba_image.save("output.png")

# Access predictions
best_mask = result.predicted_mask  # Best mask (H, W) numpy array
all_masks = result.all_masks       # All masks (N, H, W) numpy array
all_ious = result.all_ious         # IoU scores (N,) numpy array

# Load the S3OD dataset from HuggingFace
from datasets import load_dataset

dataset = load_dataset("okupyn/s3od_dataset")

# Access samples
for sample in dataset['train']:
    image = sample['image']
    mask = sample['mask']
    # Process your data...
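
Because the multi-mask decoder returns several candidate masks with IoU scores (`all_masks` and `all_ious` above), it can be useful to rank the candidates rather than keep only one. The sketch below uses synthetic arrays with the shapes documented above; `rank_masks` is a hypothetical helper, not part of the `s3od` package, and whether `predicted_mask` always equals the top-ranked candidate is an assumption.

```python
import numpy as np

def rank_masks(all_masks, all_ious):
    """Sort candidate masks by IoU score, highest first.

    all_masks: (N, H, W) array, all_ious: (N,) array.
    """
    order = np.argsort(all_ious)[::-1]
    return all_masks[order], all_ious[order]

# Synthetic stand-ins for result.all_masks and result.all_ious.
all_masks = np.stack([np.zeros((4, 4)), np.ones((4, 4)), np.eye(4)])
all_ious = np.array([0.2, 0.9, 0.5])

ranked_masks, ranked_ious = rank_masks(all_masks, all_ious)
best_mask = ranked_masks[0]  # candidate with the highest IoU score
```

Inspecting the lower-ranked candidates is one way to see the alternative interpretations of an ambiguous scene.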


Citation

If you find S3OD useful, please cite our work:

@article{s3od2025,
  title={S3OD: Towards Generalizable Salient Object Detection with Synthetic Data},
  author={Kupyn, Orest and Kataoka, Hirokatsu and Rupprecht, Christian},
  journal={arXiv preprint arXiv:2510.21605},
  year={2025}
}
