Foundation Models: image input comes to developers' on-device AI models

With iOS 27 and macOS Golden Gate, the Foundation Models framework lets developers pass images as direct input to on-device models, opening new scenarios for visual analysis apps without cloud dependency.

Multimodality in developers' hands

Until WWDC 2026, the Foundation Models framework Apple provides to developers primarily handled text input. The Platforms State of the Union and subsequent framework sessions confirmed that developers can now use images as direct input to on-device models. Engadget's live blog noted: "Developers can use images as input in the foundation models framework now."

Custom skills and server models

Alongside image input, Apple announced that developers can define "custom skills" (custom capabilities that extend model behavior) and access server models in addition to local ones. This significantly widens the range of possible apps: no longer just text assistants, but apps that analyze photographs, scanned documents, or screenshots directly on-device.

Why it matters for privacy

The key point remains on-device execution: an app that analyzes images through the on-device Foundation Models does not send photographic data to external servers. For sensitive categories such as medical images, identity documents, and personal photographs, this is a concrete distinction from cloud-based solutions. The technical sessions in the days following the keynote should clarify size and performance limits on the older devices compatible with iOS 27.
