<model> element
Copyright © 2026 the Contributors to The <model> element Specification, published by the Immersive Web Community Group under the W3C Community Contributor License Agreement (CLA). A human-readable summary is available.
The <model> element allows embedding 3D graphical content inline within an [HTML] document to be managed declaratively, and rendered directly by the user agent. The HTMLModelElement interface then provides a simple API for controlling the presentation, animation, interaction, fetching, and UI affordances of a <model>.
This specification was published by the Immersive Web Community Group. It is not a W3C Standard nor is it on the W3C Standards Track. Please note that under the W3C Community Contributor License Agreement (CLA) there is a limited opt-out and other conditions apply. Learn more about W3C Community and Business Groups.
This is a work in progress.
GitHub Issues are preferred for discussion of this specification.
The HTML <model> element allows a website to embed interactive 3D models as conveniently as any other visual media. Models are served as a standalone resource. The <model> element also has support for interaction and animation playback while presented within the page, and supports more spatial experiences, such as the stereoscopic display of 3D content where available.
As <model> is rendered directly by the user agent, it has the ability to utilize privileged information, such as the user's head and eye position or the lighting of the user's environment, without exposing that information to JavaScript. Additionally, the use of accessibility information and controls can be engaged in a privacy-preserving manner.
As <model> is embedded content, the user agent can present 3D content in a privacy-preserving, interactive, and accessible manner.
There are a number of specified non-goals for this initial incubation; some are beyond the long-term scope of this proposal, and others are deferred for the benefit of reaching consensus on an initial specification, which can hopefully serve the needs of users and authors sooner.
While the use cases for inspecting and manipulating the contents of a given model asset are clear, the initial scope of this proposal is intentionally limited to a reduced complexity, hopefully making the initial specification one that has broad consensus amongst the web community.
While many popular model formats can encode, play, and mix multiple animations in an animation mixing system, this initial specification considers only a single animation timeline, both for simplicity and to provide a minimal viable specification on which the web community can build.
Many popular model formats support stateful interactions, such as tapping on a 3D button inside their contents to show or hide additional content like a title, play an animation, or undertake some other stateful action. Because there is no current intention to provide a JavaScript-facing mechanism to be aware of such actions, stateful interactions should be considered out of scope.
While many popular model formats support the use of external asset requests for better re-use and consistency, this initial specification will only allow assets that are self-contained.
If the element's stagemode attribute is set to anything other than none: interactive content.
If the element has a src attribute: transparent, a picture or img, or a media element descendant.
If the element does not have a src attribute: Zero or more source elements, then transparent, optionally intermixed with script-supporting elements.
autoplay — Hint that the resource can be started automatically when the page is loaded
stagemode — Allows the user to interact with the model in a specified mode
crossorigin — How the element handles crossorigin requests
height — Vertical dimension
loading — Used when determining loading deferral
loop — Whether to loop the media resource
poster — Poster frame to show while the resource is loading
src — Address of the resource
width — Horizontal dimension
The HTMLModelElement interface provides a means to interface with the embedded resource.
The model element is used for embedding 3D models into a document.
Content may be provided inside the model element. User agents should not show this content to the user; it is intended for web browsers which do not support model, to be shown as fallback content.
HTML defines an algorithm to determine the poster frame, but it is specific to <video>. Should we adapt it to support <model>, or specify our own?
We also need to consider what happens if the animation resets and the model is paused: does the poster show again? (i.e., do we follow video's behavior?)
Hint that the model can start its animation automatically, if present, when the asset is loaded.
The resources of some <model> elements can be significant in size. As such, it might be good to support the loading attribute to allow these resources to be lazy-loaded.
The poster attribute gives the URL of an image file that the user agent can show while 3D content is unavailable. The attribute, if present, must contain a valid non-empty URL potentially surrounded by spaces.
[Exposed=Window]
interface HTMLModelElement : HTMLElement {
  readonly attribute Promise<HTMLModelElement> ready;
  readonly attribute DOMPointReadOnly boundingBoxCenter;
  readonly attribute DOMPointReadOnly boundingBoxExtents;
  attribute DOMMatrixReadOnly entityTransform;
  attribute USVString environmentMap;
  readonly attribute Promise<undefined> environmentMapReady;
  [Reflect=stagemode] attribute DOMString stageMode;
};
source element's parent is a model element
The <source> element behaves differently depending on what its parent is. For instance, when the parent is <picture>, the srcset attribute comes into play. We need to look at the attributes of <source> and figure out what they mean when used in the context of <model>.
In addition to emitting standard input events, model may interpret input events according to specific stage modes, including none and orbit.
The default value for stageMode. In this mode, input events do not have any direct action on the behavior of the element.
In this mode, input events in a horizontal direction result in a rotation of the model about the Y axis, and events in a vertical direction result in a rotation about the horizontal axis. This is reflected in the model's entityTransform value. In this mode, the entityTransform value is read-only.
Setting the mode to orbit results in a change of the position and scale to the orbit fit mode.
glTF is not a run-time format. It does not define what an application should do with a model once it is loaded and rendered. It does provide some capabilities that a run-time engine may use to enhance the user experience. glTF currently does not store any interactivity information. Currently that is solely a run-time determination. The run-time determines what parts (if any) of the model may be active and the behavior based on any trigger.
Like interactivity, animation is not built into glTF. glTF files may contain animation parameters that specify the type of animation (e.g., morph, skin & bones, etc.) and the associated parameters needed to perform the animation. There is nothing in the glTF specification that defines how one animation interacts with another. For example, a human model may include walk, jump, and drop animations; but it is unlikely that they should all be played at the same time.
Any HTML element that wishes to handle animation as stored in a glTF file needs to understand how the content creator intended the animation to play.
As with other media elements (again #13 ), having "controls" for media specific things can be extremely helpful for accessibility (and just generally helpful for developers not needing to deal with things like the fullscreen API).
It would be nice to consider adding support for controls and then leaving it mostly to the UA as to what those controls are... we could figure out a standard set of things, like <video> provides.
The model SHOULD be rendered according to a realtime, physically-based rendering (PBR) shading model, and lit by an image-based light.
If provided, an environment map MUST be interpreted as an equirectangular environment map. If an environmentMap is not specified, the User Agent MUST provide an appropriate map.
I agree that it is very good to make it easy for people to display 3D content in a web page. I completely disagree with the methods and processes described in this proposal to make it an HTML element. HTML elements need to be fully defined so that they can be similarly implemented across browsers and reflect what people would see in applications outside of browsers. The process of rendering a high-quality model requires proper handling and rendering of the model's geometry, appearance, animation, and interaction.
My knowledge is in glTF (and glTF binary) so these comments may or may not reflect on the capabilities of USDZ. I will address the topics as separate issues: Appearance and Animation / interactivity; with respect to 3D models in glTF format. Static geometry is pretty straight-forward and not subject to much interpretation.
The really difficult part is appearance. The document states that "it is impractical to define a pixel accurate rendering..." for models. However, this is really important. Khronos has done extensive work in the 3D Commerce Working Group towards pixel accurate rendering across multiple 3D viewers ( https://www.khronos.org/3dcommerce/certification/ ). The accuracy was demanded by retailers so their products would appear visually identical across different web sites. There were so many factors that mattered in producing acceptable renderings that include lighting, rendering calculations (including equation approximations), conversion from GPU to display, and tone mapping.
The component that caused the most issues and difficulties is lighting. A model built for physically-based rendering looks best in a complex lighting environment. This is usually done with image-based lighting, but punctual plus area lights will also work. The statement that "A future version ... will describe the lighting model and environment .... Both items will require community collaboration and some consensus." makes the process sound much easier than Khronos found it to be.
Some issues that came from the Certification work. Note that the Certification program did not solve all of these in the initial release.
It may be possible to construct an initial release without resolving all of these items.
The Oculus browser is displayed on a curved surface. How do we envision the display of multiple models? Would we allow them to bump into each other, or would there be clipping?
A displayed model MUST present its contents with the position, rotation, and scale given by its entityTransform property, a DOMMatrixReadOnly that can be composed using that object's existing API.
Updates to the entityTransform SHOULD be reflected on the next rendered frame.
On the initial load for a model, the entityTransform MUST be set so that the object is fully in view within the model element's width and height on the page.
The bounding box calculation algorithm consists of the following steps.
With the model scene loaded, set the animation to the first frame if present.
Let max be a new DOMPoint with values of -Infinity.
Let min be a new DOMPoint with values of Infinity.
Let queue be an empty list of elements.
Add the model's root object to queue.
While queue is not empty:
Set element to the last element in queue and remove it from queue.
If element contains any child elements, add them to queue.
If element contains renderable geometry, find the minimum and maximum value for the X, Y and Z locations of that geometry.
Apply the world matrix of element to the bounding box of its geometry.
Set each value of min to the minimum of its current value and the minimum for this element's bounding box.
Set each value of max to the maximum of its current value and the maximum for this element's bounding box.
Set the values of boundingBoxCenter to be the mean of the corresponding values of min and max.
Set the values of boundingBoxExtents to be each value of min subtracted from the corresponding value of max.
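The steps above can be sketched in TypeScript as follows. This is only an illustration under stated assumptions: the `SceneNode` type, its column-major `worldMatrix`, and the `computeBoundingBox` and `transformPoint` helpers are hypothetical and are not defined by this specification. Plain objects stand in for DOMPoint so that the sketch runs outside a browser.

```typescript
// Hypothetical scene-graph types; the specification does not define these.
type Vec3 = { x: number; y: number; z: number };

interface SceneNode {
  worldMatrix: number[]; // assumed column-major 4x4 matrix (16 entries)
  positions?: Vec3[];    // local-space vertices, present when renderable
  children: SceneNode[];
}

// Apply a column-major 4x4 affine matrix to a point.
function transformPoint(m: number[], p: Vec3): Vec3 {
  return {
    x: m[0] * p.x + m[4] * p.y + m[8] * p.z + m[12],
    y: m[1] * p.x + m[5] * p.y + m[9] * p.z + m[13],
    z: m[2] * p.x + m[6] * p.y + m[10] * p.z + m[14],
  };
}

function computeBoundingBox(root: SceneNode): { center: Vec3; extents: Vec3 } {
  const min: Vec3 = { x: Infinity, y: Infinity, z: Infinity };
  const max: Vec3 = { x: -Infinity, y: -Infinity, z: -Infinity };
  const queue: SceneNode[] = [root];

  while (queue.length > 0) {
    const element = queue.pop()!; // last element, removed from the queue
    queue.push(...element.children);

    const pts = element.positions ?? [];
    if (pts.length === 0) continue; // no renderable geometry

    // Minimum and maximum X, Y and Z of this element's own geometry.
    const lo: Vec3 = { x: Infinity, y: Infinity, z: Infinity };
    const hi: Vec3 = { x: -Infinity, y: -Infinity, z: -Infinity };
    for (const p of pts) {
      lo.x = Math.min(lo.x, p.x); hi.x = Math.max(hi.x, p.x);
      lo.y = Math.min(lo.y, p.y); hi.y = Math.max(hi.y, p.y);
      lo.z = Math.min(lo.z, p.z); hi.z = Math.max(hi.z, p.z);
    }

    // Apply the world matrix to the eight corners of that local box,
    // then fold the results into the running min and max.
    for (const x of [lo.x, hi.x]) {
      for (const y of [lo.y, hi.y]) {
        for (const z of [lo.z, hi.z]) {
          const w = transformPoint(element.worldMatrix, { x, y, z });
          min.x = Math.min(min.x, w.x); max.x = Math.max(max.x, w.x);
          min.y = Math.min(min.y, w.y); max.y = Math.max(max.y, w.y);
          min.z = Math.min(min.z, w.z); max.z = Math.max(max.z, w.z);
        }
      }
    }
  }

  return {
    center: { x: (min.x + max.x) / 2, y: (min.y + max.y) / 2, z: (min.z + max.z) / 2 },
    extents: { x: max.x - min.x, y: max.y - min.y, z: max.z - min.z },
  };
}
```

Because the last element is removed on each iteration, the traversal is depth-first; the visiting order does not affect the result, since minimum and maximum are order-independent. The sketch does not handle a scene with no renderable geometry, for which the steps above leave min and max at their infinite initial values.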
The initial model fit algorithm consists of the following steps.
Retrieve the bounds of the smallest axis-aligned box that contains the geometry of the object using the bounding box calculation algorithm.
Let extents be the boundingBoxExtents of the resource.
Let center be the boundingBoxCenter of the resource.
Divide extents.x by the model's width in the viewport. This is the X-scale.
Divide extents.y by the model's height in the viewport. This is the Y-scale.
Scale the entityTransform to be the minimum of the X-scale and Y-scale.
Set the entityTransform to be centered on center.x, center.y and set back from center.z by extents.z / 2, so that the full extents are visible and set directly behind the viewport.
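As a rough illustration, the scalar quantities named in the steps above could be computed as in the following TypeScript sketch. The function name `initialModelFit` and the `FitResult` shape are hypothetical. How these values are composed into the entityTransform matrix, including the sign convention for the Z axis, is not spelled out in the steps and is deliberately left out of the sketch.

```typescript
type Vec3 = { x: number; y: number; z: number };

// Hypothetical result shape: the values the steps name, before they
// are composed into a matrix.
interface FitResult {
  xScale: number;
  yScale: number;
  scale: number;   // minimum of the X-scale and Y-scale
  centerX: number; // the transform is centered on center.x ...
  centerY: number; // ... and center.y
  setBack: number; // distance set back from center.z (extents.z / 2)
}

function initialModelFit(
  center: Vec3,
  extents: Vec3,
  width: number,  // the model's width in the viewport
  height: number, // the model's height in the viewport
): FitResult {
  const xScale = extents.x / width;
  const yScale = extents.y / height;
  return {
    xScale,
    yScale,
    scale: Math.min(xScale, yScale),
    centerX: center.x,
    centerY: center.y,
    setBack: extents.z / 2,
  };
}
```

For example, a model with extents (2, 4, 6) in a 200 by 100 element gives an X-scale of 0.01 and a Y-scale of 0.04, so the chosen scale is 0.01 and the set-back distance is 3.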
The orbit fit algorithm is triggered when the model's stagemode is set to orbit.
The orbit fit algorithm consists of the following steps.
Retrieve the bounds of the smallest axis-aligned box that contains the geometry of the object.
Let extents be the boundingBoxExtents of the resource.
Let center be the boundingBoxCenter of the resource.
Let length be the length of the extents vector.
Divide length by the model's width in the viewport. This is the X-scale.
Divide length by the model's height in the viewport. This is the Y-scale.
Scale the entityTransform to be the minimum of the X-scale and Y-scale.
Set the entityTransform to be centered on center.x, center.y and set back from center.z by extents.z / 2, so that the full extents are visible and set directly behind the viewport, and will remain in view at any orientation.
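The difference from the initial model fit is the use of a single length for both axes. A minimal TypeScript sketch, assuming that "length" means the Euclidean length of the extents vector (the function name `orbitFitScale` is hypothetical):

```typescript
type Vec3 = { x: number; y: number; z: number };

// Assumption: "length" is read as the Euclidean norm of the extents,
// that is, the diagonal of the axis-aligned bounding box.
function orbitFitScale(
  extents: Vec3,
  width: number,  // the model's width in the viewport
  height: number, // the model's height in the viewport
): { length: number; xScale: number; yScale: number; scale: number } {
  const length = Math.hypot(extents.x, extents.y, extents.z);
  const xScale = length / width;
  const yScale = length / height;
  return { length, xScale, yScale, scale: Math.min(xScale, yScale) };
}
```

Under that reading, the design choice follows from geometry: the diagonal of an axis-aligned box is the diameter of the sphere that circumscribes it, so a fit based on the diagonal does not depend on how the model is rotated about its center.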
What's the default CSS style for a model element? Should it have a border around it? What about background color, etc.?
Whether a model element is exposing a user interface is not expected to affect the size of the rendering; controls are expected to be overlaid above the page content without causing any layout changes, and may disappear when the user does not need them.
When a model element represents a poster frame, the poster frame is expected to be rendered at the largest size that maintains the aspect ratio of that poster frame without being taller or wider than the model element itself, and is expected to be centered in the model element.
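The poster sizing rule above amounts to a "contain" fit. The following TypeScript sketch shows one way to compute it; the `fitPoster` function and `Rect` type are hypothetical names used only for illustration.

```typescript
interface Rect { x: number; y: number; width: number; height: number }

// Largest rectangle with the poster's aspect ratio that is neither
// taller nor wider than the element's box, centered inside that box.
function fitPoster(
  posterWidth: number,
  posterHeight: number,
  boxWidth: number,
  boxHeight: number,
): Rect {
  // The smaller ratio is the axis that constrains the poster.
  const scale = Math.min(boxWidth / posterWidth, boxHeight / posterHeight);
  const width = posterWidth * scale;
  const height = posterHeight * scale;
  return {
    x: (boxWidth - width) / 2,
    y: (boxHeight - height) / 2,
    width,
    height,
  };
}
```

For a 400 by 200 poster in a 300 by 300 element, the width is the constraining axis, so the poster is drawn at 300 by 150 and offset 75 pixels from the top.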
The environmentMapReady Promise resolves when an environment map resource has been loaded, or is rejected if the resource is unable to be loaded.
The model element exposes a ready Promise that resolves when the model is processed and ready to display. The Promise is rejected if the model source cannot be loaded.
Need to investigate what formats are suitable for model. We might need some kind of evaluation matrix.
Model can support multiple formats out of the box, but it might be good to evaluate what is best for users and developers, and why.
The <model> element shares a lot of similarities with the <audio> and <video> elements, yet it's distinct in some ways (we need to tease these out). It's similar in being potentially temporal multimedia content (i.e., it has audio, it potentially animates over time). We need to figure out if model is sufficiently different to warrant being its own element class, or if it can reuse much of the "media element" infrastructure.
Additional integrations into HTML:
Need same behavior as audio and video when including into a p element with no end tag.
Need to specify that model is an appropriate child of <figure>.
Model elements in fullscreen mode, as invoked through model.requestFullscreen(), are presented in-line in the full-screen window.
The formats that model supports can fetch a lot of other resources. We probably need a new fetch destination ("model").
We need to investigate what the privacy implications are of each model format we will recommend. The model formats themselves can fetch resources, so we need to put a privacy and security framework around what schemes they can fetch (https only, for instance). We also need to say what all the fetch policies are. Need to investigate if the formats provide any guidance here, or if they leave it up to the implementation. If they do, we need to specify it (i.e., don't send cookies, don't leak the referrer, etc.).
Need to clarify that 3D resources can fetch resources, and as such need to be subject to the document's CSP (probably "media-src"). However, we need to clarify what this means in relation to, say, "img-src", for example... as models can load png/jpg textures.
Given the close relationship to media elements, and given the reliance on <source> elements, we could just say that media-src applies to <model> too.
Need to describe that each format will come with its own security considerations (and link to the appropriate security considerations in their respective specs).
We need to figure out how to make <model> accessible on a number of different fronts:
Usually, this would be provided by the embedded format... however, it appears that both glTF and USDZ are quite limited when it comes to accessibility.
As such, it may be that we need to leverage what we can from HTML + ARIA to overcome the shortcomings of these formats. We have quite a bit of precedent (e.g., from the humble, yet limited, alt attribute, to how <canvas> can be made accessible, to the potential inclusion of <track> elements, and so on).
We need to define what the ARIA semantics are and what is exposed (application probably). We need to coordinate with the accessibility folks + get this added to the HTML Accessibility API Mappings.
| [wai-aria-1.2] | No corresponding role |
|---|---|
| MSAA + IAccessible2 | Not mapped |
| UIA | Not mapped |
| ATK | Not mapped |
| AX | Not mapped |
| Comments | |
We need to check if there are any relevant MIME parameters for model/* content (if any).
As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.
The key words MAY , MUST , and SHOULD in this document are to be interpreted as described in BCP 14 [ RFC2119 ] [ RFC8174 ] when, and only when, they appear in all capitals, as shown here.
This section is non-normative.
The following are some significant changes that were made since the initial proposal:
autoplay, §4.1
boundingBoxCenter, attribute for HTMLModelElement, §5.
boundingBoxExtents, attribute for HTMLModelElement, §5.
controls, §4.3
crossorigin, §4.4
entityTransform, attribute for HTMLModelElement, §5.
environmentMap, attribute for HTMLModelElement, §5.
environmentMapReady, attribute for HTMLModelElement, §5.
height, §4.5
HTMLModelElement, interface, §5.
loading, §4.6
loop, §4.7
model, §4.
poster, §4.8
ready, attribute for HTMLModelElement, §5.
src, §4.9
stagemode, §4.2
stageMode, attribute for HTMLModelElement, §5.
width, §4.10
DOMMatrixReadOnly, interface
DOMPointReadOnly, interface
HTMLElement, interface
img, element
picture, element
[Reflect], extended attribute
DOMString, interface
[Exposed], extended attribute
Promise, interface
undefined, type
USVString, interface