1. Scope
[AV1] defines the syntax and semantics of an AV1 bitstream . The AV1 Image File Format ( AVIF ) defined in this document supports the storage of a subset of the syntax and semantics of an AV1 bitstream in a [HEIF] file. The AV1 Image File Format defines multiple profiles, which restrict the allowed syntax and semantics of the AV1 bitstream with the goal to improve interoperability, especially for hardware implementations. The profiles defined in this specification follow the conventions of the [MIAF] specification. Images encoded with [AV1] and not meeting the restrictions of the defined profiles may still be compliant to this AV1 Image File Format if they adhere to the general AVIF requirements.
The AV1 Image File Format supports High Dynamic Range (HDR) and Wide Color Gamut (WCG) images as well as Standard Dynamic Range (SDR). It supports monochrome images as well as multi-channel images with all the bit depths and color spaces specified in [AV1] , and other bit depths with Sample Transform Derived Image Items . The AV1 Image File Format also supports transparency (alpha) and other types of data such as depth maps through auxiliary AV1 bitstreams .
The AV1 Image File Format also supports multi-layer images as specified in [AV1] to be stored both in image items and image sequences. The AV1 Image File Format supports progressive image decoding through layered images.
An AVIF file is designed to be a conformant [HEIF] file for both image items and image sequences. Specifically, this specification follows the recommendations given in "Annex I: Guidelines On Defining New Image Formats and Brands" of [HEIF] .
This specification reuses syntax and semantics used in [AV1-ISOBMFF] .
2. Image Items and properties
2.1. AV1 Image Item
When an item is of type av01 , it is called an AV1 Image Item , and shall obey the following constraints:
- The AV1 Image Item shall be a conformant MIAF image item .
- The AV1 Image Item shall be associated with an AV1ItemConfigurationProperty.
- The content of an AV1 Image Item is called the AV1 Image Item Data and shall obey the following constraints:
  - The AV1 Image Item Data shall be identical to the content of an AV1 Sample marked as 'sync', as defined in [AV1-ISOBMFF].
  - The AV1 Image Item Data shall have exactly one Sequence Header OBU.
NOTE: File writers may want to set the still_picture and reduced_still_picture_header flags to 1 when possible in the Sequence Header OBU part of the AV1 Image Item Data so that AV1 header overhead is minimized.
2.2. Image Item Properties
2.2.1. AV1 Item Configuration Property
Box Type: av1C
Property type: Descriptive item property
Container: ItemPropertyContainerBox
Mandatory (per item): Yes, for an image item of type 'av01', no otherwise
Quantity (per item): One for an image item of type 'av01', zero otherwise
The syntax and semantics of the AV1ItemConfigurationProperty are identical to those of the AV1CodecConfigurationBox defined in [AV1-ISOBMFF], with the following constraints:
- Sequence Header OBUs should not be present in the AV1ItemConfigurationProperty.
- If a Sequence Header OBU is present in the AV1ItemConfigurationProperty, it shall match the Sequence Header OBU in the AV1 Image Item Data.
- The values of the fields in the AV1ItemConfigurationProperty shall match those of the Sequence Header OBU in the AV1 Image Item Data.
- The values of the bit depth and the number of channels derived from the AV1ItemConfigurationProperty shall match the PixelInformationProperty ('pixi') if present.
- Metadata OBUs, if present, shall match the values given in other item properties, such as the MasteringDisplayColourVolumeBox ('mdcv') or ContentLightLevelBox ('clli').
2.2.2. Image Spatial Extents Property
The semantics of the 'ispe' property as defined in [HEIF] apply. More specifically, for [AV1] images, the values of image_width and image_height shall respectively equal the values of UpscaledWidth and FrameHeight as defined in [AV1] but for a specific frame in the item payload. The exact frame depends on the presence and content of the 'lsel' and OperatingPointSelectorProperty properties as follows:
- In the absence of a 'lsel' property associated with the item, or if it is present and its layer_id value is set to 0xFFFF:
  - If no OperatingPointSelectorProperty is associated with the item, the 'ispe' property shall document the dimensions of the last frame decoded when processing the operating point whose index is 0.
  - If an OperatingPointSelectorProperty is associated with the item, the 'ispe' property shall document the dimensions of the last frame decoded when processing the corresponding operating point.
  NOTE: The dimensions of possible intermediate output images might not match the ones given in the 'ispe' property. If renderers display these intermediate images, they are expected to scale the output image to match the 'ispe' property.
- If a 'lsel' property is associated with an item and its layer_id is different from 0xFFFF, the 'ispe' property documents the dimensions of the output frame produced by decoding the corresponding layer.
NOTE: The dimensions indicated in the 'ispe' property might not match the values max_frame_width_minus1 + 1 and max_frame_height_minus1 + 1 indicated in the AV1 bitstream.
NOTE: The values of render_width_minus1 and render_height_minus1 possibly present in the AV1 bitstream are not exposed at the AVIF container level.
2.2.3. Clean Aperture Property
The semantics of the clean aperture property ('clap') as defined in [HEIF] apply. In addition to the restrictions on transformative item property ordering specified in [MIAF], the following restriction also applies:
The 'clap' item property shall be anchored to 0,0 (top-left) of the input image unless the full, un-cropped image item is included as a secondary non-hidden image item.
2.2.4. Other Item Properties
In addition to the Image Properties defined in this document, AV1 image items may also be associated with item properties defined in other specifications such as [HEIF] and [MIAF] . Commonly used item properties can be found in § 9.1.1 Minimum set of boxes and § 9.1.2 Requirements on additional image item related boxes .
In general, it is recommended to use item properties instead of Metadata OBUs in the AV1ItemConfigurationProperty.
2.3. AV1 Layered Image Items
2.3.1. Overview
[AV1] supports encoding a frame using multiple spatial layers. A spatial layer may improve the resolution or quality of the image decoded based on one or more of the previous layers. A layer may also provide an image that does not depend on the previous layers. Additionally, not all layers are expected to produce an image meant to be rendered. Some decoded images may be used only as intermediate decodes. Finally, layers are grouped into one or more Operating Points . The Sequence Header OBU defines the list of Operating Points , provides required decoding capabilities, and indicates which layers form each Operating Point .
[AV1] delegates the selection of which Operating Point to process to the application, by means of a function called choose_operating_point(). AVIF defines the OperatingPointSelectorProperty to control this selection. In the absence of an OperatingPointSelectorProperty associated with an AV1 Image Item, the AVIF renderer is free to process any Operating Point present in the AV1 Image Item Data. In particular, when the AV1 Image Item is composed of a unique Operating Point, the OperatingPointSelectorProperty should not be present. If an OperatingPointSelectorProperty is associated with an AV1 Image Item, the op_index field indicates which Operating Point is expected to be processed for this item.
NOTE: When an author wants to offer the ability to render multiple Operating Points from the same AV1 image (e.g. in the case of multi-view images), multiple AV1 Image Items can be created that share the same AV1 Image Item Data but have different OperatingPointSelectorProperties.
[AV1] expects the renderer to display only one frame within the selected Operating Point, which should be the highest spatial layer that is both within the Operating Point and present within the temporal unit, but [AV1] leaves the option for other applications to set their own policy about which frames are output, as defined in the general output process. AVIF sets a different policy, and defines how the 'lsel' property (mandated by [HEIF] for layered images) is used to control which layer is rendered. According to [HEIF], the interpretation of the layer_id field in the 'lsel' property is codec specific. In this specification, the value 0xFFFF is reserved for a special meaning. If a 'lsel' property is associated with an AV1 Image Item but its layer_id value is set to 0xFFFF, the renderer is free to render either only the output image of the highest spatial layer, or to render all output images of all the intermediate layers and the highest spatial layer, resulting in a form of progressive decoding. If a 'lsel' property is associated with an AV1 Image Item and the value of layer_id is not 0xFFFF, the renderer is expected to render only the output image for that layer.
NOTE: When such a progressive decoding of the layers within an Operating Point is not desired or when an author wants to expose each layer as a specific item, multiple AV1 Image Items sharing the same AV1 Image Item Data can be created and associated with different 'lsel' properties, each with a different value of layer_id.
2.3.2. Properties
2.3.2.1. Operating Point Selector Property
2.3.2.1.1. Definition
Box Type: a1op
Property type: Descriptive item property
Container: ItemPropertyContainerBox
Mandatory (per item): No
Quantity (per item): Zero or one
2.3.2.1.2. Description
An OperatingPointSelectorProperty may be associated with an AV1 Image Item to provide the index of the operating point to be processed for this item. If associated, it shall be marked as essential.
2.3.2.1.3. Syntax
class OperatingPointSelectorProperty extends ItemProperty('a1op') {
    unsigned int(8) op_index;
}
2.3.2.1.4. Semantics
op_index indicates the index of the operating point to be processed for this item. Its value shall be between 0 and operating_points_cnt_minus_1 inclusive.
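The following is a minimal, non-normative sketch of how a reader might extract and range-check op_index. It assumes the box header has already been stripped so that only the one-byte payload remains, and that operating_points_cnt_minus_1 has already been read from the Sequence Header OBU. The function name parse_a1op is hypothetical.

```python
def parse_a1op(payload: bytes, operating_points_cnt_minus_1: int) -> int:
    """Return op_index from an 'a1op' payload (box header already removed)."""
    # The property carries a single unsigned 8-bit field.
    if len(payload) != 1:
        raise ValueError("'a1op' payload is expected to be exactly one byte")
    op_index = payload[0]
    # op_index shall be between 0 and operating_points_cnt_minus_1 inclusive.
    if not 0 <= op_index <= operating_points_cnt_minus_1:
        raise ValueError("op_index is outside the range of operating points")
    return op_index
```

For instance, a payload of b"\x02" with operating_points_cnt_minus_1 equal to 3 yields an op_index of 2, whereas the same payload with operating_points_cnt_minus_1 equal to 1 is rejected.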
2.3.2.2. Layer Selector Property
The 'lsel' property defined in [HEIF] may be associated with an AV1 Image Item. The layer_id indicates the value of the spatial_id to render. The value shall be between 0 and 3, or the special value 0xFFFF. When a value between 0 and 3 is used, the corresponding spatial layer shall be present in the bitstream and shall produce an output frame. Other layers may be needed to decode the indicated layer. When the special value 0xFFFF is used, progressive decoding is allowed as described in § 2.3.1 Overview.
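As a non-normative illustration, the interpretation of layer_id can be sketched as a small selection function. The function name layers_to_render, the progressive flag (standing for a renderer's own choice when layer_id is 0xFFFF) and the output_spatial_ids argument (the spatial_id values that produce an output frame in the selected operating point) are assumptions made for this example only.

```python
from typing import List, Sequence

PROGRESSIVE_LAYER_ID = 0xFFFF  # special layer_id value reserved by this specification


def layers_to_render(layer_id: int,
                     output_spatial_ids: Sequence[int],
                     progressive: bool = False) -> List[int]:
    """Return the spatial_id values whose output images may be rendered."""
    ids = sorted(set(output_spatial_ids))
    if not ids:
        raise ValueError("no layer produces an output frame")
    if layer_id == PROGRESSIVE_LAYER_ID:
        # Renderer's choice: all output images, or only the highest layer.
        return ids if progressive else [ids[-1]]
    if not 0 <= layer_id <= 3:
        raise ValueError("layer_id shall be between 0 and 3, or 0xFFFF")
    if layer_id not in ids:
        raise ValueError("the selected layer does not produce an output frame")
    return [layer_id]
```

With output frames available for spatial_id 0, 1 and 2, a layer_id of 0xFFFF gives [2] by default or [0, 1, 2] when the renderer opts for progressive display, and a layer_id of 1 gives [1].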
2.3.2.3. Layered Image Indexing Property
2.3.2.3.1. Definition
Box Type: a1lx
Property type: Descriptive item property
Container: ItemPropertyContainerBox
Mandatory (per item): No
Quantity (per item): Zero or one
2.3.2.3.2. Description
The AV1LayeredImageIndexingProperty property may be associated with an AV1 Image Item . It should not be associated with AV1 Image Items consisting of only one layer.
The AV1LayeredImageIndexingProperty documents the size in bytes of each layer (except the last one) in the AV1 Image Item Data, and enables determining the byte ranges required to process one or more layers of an Operating Point. If associated, it shall not be marked as essential.
2.3.2.3.3. Syntax
class AV1LayeredImageIndexingProperty extends ItemProperty('a1lx') {
    unsigned int(7) reserved = 0;
    unsigned int(1) large_size;
    FieldLength = (large_size + 1) * 16;
    unsigned int(FieldLength) layer_size[3];
}
2.3.2.3.4. Semantics
layer_size indicates the number of bytes corresponding to each layer in the item payload, except for the last layer. Values are provided in increasing order of spatial_id. A value of zero means that all the layers except the last one have been documented and following values shall be 0. The number of non-zero values shall match the number of layers in the image minus one.
NOTE: The size of the last layer can be determined by subtracting the sum of the sizes of all layers indicated in this property from the entire item size.
For example, a property indicating [X,0,0] is used for an image made of 2 layers. The spatial_id for the first layer does not necessarily match the index in the array that provides the size. In other words, in this case the index giving value X is 0, but the corresponding spatial_id could be 0, 1 or 2. Similarly, a property indicating [X,Y,0] is used for an image made of 3 layers.
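The layer_size semantics can be illustrated with a short, non-normative sketch that parses the property payload and derives the byte range of each layer within the item payload. It assumes the box header has already been stripped and that fields are stored big-endian, as is usual in [ISOBMFF]. The function names parse_a1lx and layer_byte_ranges are hypothetical.

```python
import struct
from typing import List, Tuple


def parse_a1lx(payload: bytes) -> List[int]:
    """Return the three layer_size values of an 'a1lx' payload."""
    if not payload:
        raise ValueError("empty 'a1lx' payload")
    if payload[0] & 0xFE:
        raise ValueError("reserved bits are expected to be 0")
    large_size = payload[0] & 0x01
    # FieldLength is 16 bits when large_size is 0, 32 bits otherwise.
    fmt = ">3I" if large_size else ">3H"
    if len(payload) != 1 + struct.calcsize(fmt):
        raise ValueError("unexpected 'a1lx' payload length")
    return list(struct.unpack_from(fmt, payload, 1))


def layer_byte_ranges(layer_size: List[int], item_size: int) -> List[Tuple[int, int]]:
    """Return (start, end) byte offsets, end exclusive, for every layer."""
    sizes: List[int] = []
    seen_zero = False
    for size in layer_size:
        if size == 0:
            seen_zero = True
        elif seen_zero:
            raise ValueError("values following a zero shall be 0")
        else:
            sizes.append(size)
    # The last layer takes whatever remains of the item payload.
    last = item_size - sum(sizes)
    if last <= 0:
        raise ValueError("documented layer sizes exceed the item size")
    sizes.append(last)
    ranges, offset = [], 0
    for size in sizes:
        ranges.append((offset, offset + size))
        offset += size
    return ranges
```

For a 250-byte item with layer_size equal to [100, 0, 0], the two layers occupy bytes 0 to 99 and 100 to 249. A reader interested only in the first layer could fetch just the first range.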
3. Image Sequences
An AV1 Image Sequence is defined as a set of AV1 Temporal Units stored in an AV1 track as defined in [AV1-ISOBMFF] with the following constraints:
- The track shall be a valid MIAF image sequence .
- The track handler for an AV1 Image Sequence shall be 'pict'.
- The track shall have only one AV1 Sample description entry.
- If multiple Sequence Header OBUs are present in the track payload, they shall be identical.
4. Other Image Items and Sequences
4.1. Auxiliary Image Items and Sequences
An AV1 Auxiliary Image Item (respectively an AV1 Auxiliary Image Sequence ) is an AV1 Image Item (respectively AV1 Image Sequence ) with the following additional constraints:
- It shall be a compliant MIAF Auxiliary Image Item (respectively MIAF Auxiliary Image Sequence ).
- The mono_chrome field in the Sequence Header OBU shall be set to 1.
- The color_range field in the Sequence Header OBU shall be set to 1.
An AV1 Alpha Image Item (respectively an AV1 Alpha Image Sequence) is an AV1 Auxiliary Image Item (respectively an AV1 Auxiliary Image Sequence), and as defined in [MIAF], with the aux_type field of the AuxiliaryTypeProperty (respectively AuxiliaryTypeInfoBox) set to urn:mpeg:mpegB:cicp:systems:auxiliary:alpha. An AV1 Alpha Image Item (respectively an AV1 Alpha Image Sequence) shall be encoded with the same bit depth as the associated master AV1 Image Item (respectively AV1 Image Sequence).
For AV1 Alpha Image Items and AV1 Alpha Image Sequences, the ColourInformationBox ('colr') should be omitted. If present, readers shall ignore it.
An AV1 Depth Image Item (respectively an AV1 Depth Image Sequence) is an AV1 Auxiliary Image Item (respectively an AV1 Auxiliary Image Sequence), and as defined in [MIAF], with the aux_type field of the AuxiliaryTypeProperty (respectively AuxiliaryTypeInfoBox) set to urn:mpeg:mpegB:cicp:systems:auxiliary:depth.
NOTE: [AV1] supports encoding either 3-component images (whose semantics are given by the matrix_coefficients element), or 1-component images (monochrome). When an image requires a different number of components, multiple auxiliary images may be used, each providing additional component(s), according to the semantics of their aux_type field. In such a case, the maximum number of components is restricted by the number of possible items in a file, coded on 16 or 32 bits.
4.2. Derived Image Items
4.2.1. Grid Derived Image Item
A grid derived image item ('grid') as defined in [HEIF] may be used in an AVIF file.
4.2.2. Tone Map Derived Image Item
A tone map derived image item ('tmap') as defined in [HEIF] may be used in an AVIF file. When present, the base image item and the 'tmap' image item should be grouped together by an 'altr' entity group (see § 5.1 'altr' group) as recommended in [HEIF]. When present, the gain map image item should be a hidden image item.
4.2.3. Sample Transform Derived Image Item
With a Sample Transform Derived Image Item , pixels at the same position in multiple input image items can be combined into a single output pixel using basic mathematical operations. This can for example be used to work around codec limitations or for storing alterations to an image as non-destructive residuals. With a Sample Transform Derived Image Item it is possible for AVIF to support 16 or more bits of precision per sample, while still offering backward compatibility through a regular 8 to 12-bit AV1 Image Item containing the most significant bits of each sample.
In these sections, a "sample" refers to the value of a pixel for a given channel.
4.2.3.1. Definition
When a derived image item is of type 'sato', it is called a Sample Transform Derived Image Item, and its reconstructed image is formed from a set of input image items, constants and operators.
The input images are specified in the SingleItemTypeReferenceBox or SingleItemTypeReferenceBoxLarge entries of type 'dimg' for this Sample Transform Derived Image Item within the ItemReferenceBox. The input images are in the same order as specified in these entries. In the SingleItemTypeReferenceBox or SingleItemTypeReferenceBoxLarge of type 'dimg', the value of the from_item_ID field identifies the Sample Transform Derived Image Item, and the values of the to_item_ID field identify the input images. There are reference_count input image items as specified by the ItemReferenceBox.
The input image items and the Sample Transform Derived Image Item shall:
- each be associated with a PixelInformationProperty and an 'ispe' property;
- have the same number of channels as defined by the PixelInformationProperty or AV1ItemConfigurationProperty;
- have the same chroma subsampling; this may be explicitly defined by one of the above properties for some input image items or the Sample Transform Derived Image Item, and may be implicit for the other items (meaning no property defines chroma subsampling for these items);
- have the same dimensions as defined by the 'ispe' property;
- have the same color information as defined by the ColourInformationBox properties (or lack thereof).
Each output sample of the Sample Transform Derived Image Item is obtained by evaluating an expression consisting of a series of integer operators and operands . An operand is a constant or a sample from an input image item located at the same channel index and at the same spatial coordinates as the output sample.
No color space conversion, matrix coefficients, or transfer characteristics function shall be applied to the input samples. They are already in the same color space as the output samples.
The output reconstructed image is made up of the output samples, whose values shall each be clamped to fit in the number of bits per sample as defined by the PixelInformationProperty of the reconstructed image item. The full_range_flag field of the ColourInformationBox property of colour_type 'nclx' also defines a range of values to clamp to, as defined in [CICP].
NOTE: Appendix A: (informative) Sample Transform Derived Image Item Examples contains examples of Sample Transform Derived Image Item usage.
4.2.3.2. Syntax
An expression is a series of tokens . A token is an operand or an operator . An operand can be a literal constant value or a sample value. A stack is used to keep track of the results of the subexpressions . An operator takes either one or two input operands . Each unary operator pops one value from the stack. Each binary operator pops two values from the stack, the first being the right operand and the second being the left operand . Each token results in a value pushed to the stack. The single remaining value in the stack after evaluating the whole expression is the resulting output sample.
aligned(8) class SampleTransform {
    unsigned int(2) version = 0;
    unsigned int(4) reserved;
    unsigned int(2) bit_depth; // Enum signaling signed 8, 16, 32 or 64-bit.
    // Create an empty stack of signed integer elements of that depth.
    unsigned int(8) token_count;
    for (i = 0; i < token_count; i++) {
        unsigned int(8) token;
        if (token == 0) {
            // Push the 'constant' value to the stack.
            signed int(1 << (bit_depth + 3)) constant;
        } else if (token <= 32) {
            // Push the sample value from the 'token'th input image item
            // to the stack.
        } else {
            if (token >= 64 && token <= 67) {
                // Unary operator. Pop the operand from the stack.
            } else if (token >= 128 && token <= 137) {
                // Binary operator. Pop the right operand
                // and then the left operand from the stack.
            }
            // Apply operator 'token' and push the result to the stack.
        }
    }
    // Output the single remaining stack element.
}
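The syntax above can be read with a few lines of code. The sketch below is non-normative and assumes big-endian field order, as is usual in [ISOBMFF]. The function name parse_sample_transform and the ("constant" | "sample" | "operator", value) tuple representation are hypothetical choices for this example. A ValueError is raised for an unrecognized version or a reserved token so that the caller can ignore the item.

```python
from typing import List, Tuple


def parse_sample_transform(data: bytes) -> Tuple[int, List[Tuple[str, int]]]:
    """Return (num_bits, tokens) read from a SampleTransform structure."""
    if len(data) < 2:
        raise ValueError("truncated SampleTransform")
    version = data[0] >> 6          # upper 2 bits
    bit_depth = data[0] & 0x03      # lower 2 bits; the 4 reserved bits are ignored
    if version != 0:
        raise ValueError("unrecognized version")
    num_bits = 1 << (bit_depth + 3)  # 8, 16, 32 or 64
    token_count = data[1]
    if token_count == 0:
        raise ValueError("token_count shall be greater than 0")
    tokens: List[Tuple[str, int]] = []
    pos = 2
    for _ in range(token_count):
        if pos >= len(data):
            raise ValueError("truncated SampleTransform")
        token = data[pos]
        pos += 1
        if token == 0:
            size = num_bits // 8
            if pos + size > len(data):
                raise ValueError("truncated constant")
            value = int.from_bytes(data[pos:pos + size], "big", signed=True)
            tokens.append(("constant", value))
            pos += size
        elif token <= 32:
            tokens.append(("sample", token))    # 1-based input item index
        elif 64 <= token <= 67 or 128 <= token <= 137:
            tokens.append(("operator", token))
        else:
            raise ValueError("reserved token value")
    return num_bits, tokens
```

As a usage example, the 11 bytes 02 05 01 00 00 00 01 00 82 02 80 (hexadecimal) describe a 32-bit intermediate depth and the five tokens sample 1, constant 256, product, sample 2, sum.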
4.2.3.3. Semantics
version shall be equal to 0. Readers shall ignore a Sample Transform Derived Image Item with an unrecognized version number.
reserved shall be equal to 0. The value of reserved shall be ignored by readers.
bit_depth determines the precision (from 8 to 64 bits, see Table 1) of the signed integer temporary variable supporting the intermediate results of the operations. It also determines the precision of the stack elements and the field size of the constant fields. This intermediate precision shall be high enough so that all input sample values fit into that signed bit depth.
| Value of bit_depth | Intermediate bit depth (sign bit inclusive) num_bits |
|---|---|
| 0 | 8 |
| 1 | 16 |
| 2 | 32 |
| 3 | 64 |
The result of any computation underflowing or overflowing the intermediate bit depth is replaced by -2^(num_bits-1) and 2^(num_bits-1)-1, respectively. Encoder implementations should not create files leading to potential computation underflow or overflow. Decoder implementations shall check for computation underflow or overflow and clamp the results accordingly. Computations with operands of negative values use the two's-complement representation.
token_count is the expected number of tokens to read. The value of token_count shall be greater than 0.
token determines the type of the operand (constant or input image item sample) or the operator (how to transform one or two operands into the result). See Table 2. Readers shall ignore a Sample Transform Derived Image Item with a reserved token value.
| Value of token | Token name | Token type | Meaning before pushing to the stack | Value pushed to the stack (L and R refer to the left and right operands popped from the stack for operators) |
|---|---|---|---|---|
| 0 | constant | operand | Bits from the stream read as a signed integer. | constant value |
| 1..32 | sample | operand | Sample value from the token-th input image item (token is the 1-based index of the input image item whose sample is pushed to the stack). | input image item sample value |
| 33..63 | Reserved | | | |
| 64 | negation | unary operator | Negation of the left operand. | -L |
| 65 | absolute value | unary operator | Absolute value of the left operand. | abs(L) |
| 66 | not | unary operator | Bitwise complement of the operand. | NOT L |
| 67 | bsr | unary operator | 0-based index of the most significant set bit of the left operand if the left operand is strictly positive, zero otherwise. | floor(log2(L)) if L > 0, else 0 |
| 68..127 | Reserved | | | |
| 128 | sum | binary operator | Left operand added to the right operand. | L + R |
| 129 | difference | binary operator | Right operand subtracted from the left operand. | L - R |
| 130 | product | binary operator | Left operand multiplied by the right operand. | L * R |
| 131 | quotient | binary operator | Left operand divided by the right operand if the right operand is not zero, left operand otherwise. The result is truncated toward zero (integer division). | trunc(L / R) if R is not 0, else L |
| 132 | and | binary operator | Bitwise conjunction of the operands. | L AND R |
| 133 | or | binary operator | Bitwise inclusive disjunction of the operands. | L OR R |
| 134 | xor | binary operator | Bitwise exclusive disjunction of the operands. | L XOR R |
| 135 | pow | binary operator | Left operand raised to the power of the right operand if the left operand is not zero, zero otherwise. | pow(L, R) if L is not 0, else 0 |
| 136 | min | binary operator | Minimum value among the operands. | min(L, R) |
| 137 | max | binary operator | Maximum value among the operands. | max(L, R) |
| 138..255 | Reserved | | | |
constant is a literal signed value extracted from the stream with a precision of intermediate bit depth , pushed to the stack.
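The token semantics can be sketched as a small postfix evaluator with saturating arithmetic. This is a non-normative illustration: the function name evaluate and the (kind, value) token tuples are hypothetical, and the handling of a negative exponent for pow (result truncated toward zero) is an assumption, since the table does not address that case. The final clamping to the output bit depth given by the PixelInformationProperty is a separate step that is not shown.

```python
from typing import List, Sequence, Tuple


def _trunc_div(left: int, right: int) -> int:
    # Integer division truncated toward zero (Python's // floors instead).
    q = abs(left) // abs(right)
    return q if (left < 0) == (right < 0) else -q


def _pow(left: int, right: int) -> int:
    if left == 0:
        return 0
    if left in (1, -1):
        return 1 if (left == 1 or right % 2 == 0) else -1
    if right < 0:
        return 0  # assumption: fractional result truncated toward zero
    if right >= 64:
        # |left| >= 2, so the result exceeds every supported depth;
        # return an out-of-range value for the caller to saturate.
        return -(1 << 64) if (left < 0 and right % 2) else (1 << 64)
    return left ** right


def evaluate(tokens: Sequence[Tuple[str, int]], samples: Sequence[int], num_bits: int) -> int:
    """Evaluate one output sample; samples[i] belongs to input item i + 1."""
    lo, hi = -(1 << (num_bits - 1)), (1 << (num_bits - 1)) - 1
    stack: List[int] = []
    for kind, value in tokens:
        if kind == "constant":
            result = value
        elif kind == "sample":
            result = samples[value - 1]
        elif 64 <= value <= 67:
            operand = stack.pop()
            if value == 64:
                result = -operand
            elif value == 65:
                result = abs(operand)
            elif value == 66:
                result = ~operand
            else:  # bsr
                result = operand.bit_length() - 1 if operand > 0 else 0
        else:
            right, left = stack.pop(), stack.pop()
            result = {
                128: lambda: left + right,
                129: lambda: left - right,
                130: lambda: left * right,
                131: lambda: left if right == 0 else _trunc_div(left, right),
                132: lambda: left & right,
                133: lambda: left | right,
                134: lambda: left ^ right,
                135: lambda: _pow(left, right),
                136: lambda: min(left, right),
                137: lambda: max(left, right),
            }[value]()
        # Underflow and overflow are replaced by the extreme values.
        stack.append(max(lo, min(hi, result)))
    if len(stack) != 1:
        raise ValueError("expression shall leave exactly one stack element")
    return stack[0]
```

As a usage example, the expression sample 1, constant 256, product, sample 2, sum evaluated with a 32-bit intermediate depth on the samples 0x12 and 0x34 gives 0x1234, which is one way of recombining the most and least significant bytes of a 16-bit sample.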
4.2.3.4. Constraints
Sample Transform Derived Image Items use the postfix notation to evaluate the result of the whole expression for each reconstructed image item sample.
- The tokens shall be evaluated in the order they are defined in the metadata (the SampleTransform structure defined in § 4.2.3.2 Syntax) of the Sample Transform Derived Image Item.
- token shall be at most reference_count when evaluating a sample operand (when 1 ≤ token ≤ 32).
- There shall be at least one token.
- The stack is empty before evaluating the first token.
- There shall be at least 1 element in the stack before evaluating a unary operator.
- There shall be at least 2 elements in the stack before evaluating a binary operator.
- There shall be exactly one remaining element in the stack after evaluating the last token. This element is the value of the reconstructed image item sample.
Non-compliant expressions shall be rejected by parsers as invalid files.
Note: Because each operator pops one or two elements and then pushes one element to the stack, there is at most one more operand than operators in the expression. There are at least floor(token_count / 2) operators and at most ceil(token_count / 2) operands. token_count is at most 255, meaning the maximum stack size for a valid expression is 128.
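The constraints above can be checked without decoding any pixel, by tracking only the stack depth. The following non-normative sketch takes the list of token values (constants' payloads are not needed) and returns the maximum stack depth reached. The function name validate_expression is hypothetical.

```python
from typing import Sequence


def validate_expression(token_values: Sequence[int], reference_count: int) -> int:
    """Return the maximum stack depth, or raise ValueError if non-compliant."""
    if not 1 <= len(token_values) <= 255:
        raise ValueError("token_count shall be between 1 and 255")
    depth = 0       # the stack is empty before the first token
    max_depth = 0
    for token in token_values:
        if token == 0:
            depth += 1                      # constant operand
        elif 1 <= token <= 32:
            if token > reference_count:
                raise ValueError("sample token exceeds reference_count")
            depth += 1                      # sample operand
        elif 64 <= token <= 67:
            if depth < 1:
                raise ValueError("unary operator needs 1 stack element")
            # pops one element, pushes one: depth unchanged
        elif 128 <= token <= 137:
            if depth < 2:
                raise ValueError("binary operator needs 2 stack elements")
            depth -= 1                      # pops two elements, pushes one
        else:
            raise ValueError("reserved token value")
        max_depth = max(max_depth, depth)
    if depth != 1:
        raise ValueError("exactly one element shall remain in the stack")
    return max_depth
```

The worst case mentioned in the note is reached by 128 operands followed by 127 binary operators, for which the function returns 128.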
5. Entity groups
The GroupsListBox ('grpl') defined in [ISOBMFF] may be used to group multiple image items or tracks in a file together. The type of the group describes how the image items or tracks are related. Decoders should ignore groups of unknown type.
5.1. 'altr' group
The 'altr' entity group as defined in [ISOBMFF] may be used to mark multiple items or tracks as alternatives to each other. Only one item or track in the 'altr' group should be played or processed. This grouping is useful for defining a fallback for parsers when new types of items or essential item properties are introduced.
5.2. 'ster' group
The 'ster' entity group as defined in [HEIF] may be used to indicate that two image items form a stereo pair suitable for stereoscopic viewing.
6. Brands, Internet media types and file extensions
6.1. Brands overview
As defined by [ISOBMFF], the presence of a brand in the FileTypeBox can be interpreted as the permission for those AV1 Image File Format readers/parsers and AV1 Image File Format renderers that only implement the features required by the brand, to process the corresponding file and only the parts (e.g. items or sequences) that comply with the brand.
An AV1 Image File Format file may conform to multiple brands. Similarly, an AV1 Image File Format reader/parser or AV1 Image File Format renderer may be capable of processing the features associated with one or more brands.
If any of the brands defined in this document is specified in the major_brand field of the FileTypeBox, the file extension and Internet Media Type should respectively be ".avif" and "image/avif" as defined in § 10 AVIF Media Type Registration.
6.2. AVIF image and image collection brand
The brand to identify AV1 image items is avif .
Files that indicate this brand in the FileTypeBox shall comply with the following:
- The primary image item shall be an AV1 Image Item or be a derived image that references directly or indirectly one or more items that all are AV1 Image Items .
- AV1 auxiliary image items may be present in the file.
Files that conform with these constraints should include the brand avif in the FileTypeBox.
Additionally, the brand avio is defined. If the file indicates the brand avio in the FileTypeBox, then the primary image item or all the items referenced by the primary image item shall be AV1 image items made only of Intra Frames.
6.3. AVIF image sequence brands
The brand to identify AV1 image sequences is avis .
Files that indicate this brand in the FileTypeBox shall comply with the following:
- they shall contain one or more AV1 image sequences .
- they may contain AV1 auxiliary image sequences.
Files that conform with these constraints should include the brand avis in the FileTypeBox.
Additionally, if a file contains AV1 image sequences and the brand avio is used in the FileTypeBox, the item constraints for this brand shall be met and at least one of the AV1 image sequences shall be made only of AV1 Samples marked as 'sync'. Conversely, if such a track exists and the constraints of the brand avio on AV1 image items are met, the brand should be used.
NOTE: As defined in [MIAF] , a file that is primarily an image sequence still has at least an image item. Hence, it can also declare brands for signaling the image item.
7. General constraints
The following constraints are common to files compliant with this specification:
- The file shall be compliant with the [MIAF] specification and list 'miaf' in the FileTypeBox.
- The file shall list 'avif' or 'avis' in the FileTypeBox.
- Transformative properties shall not be associated with items in a derivation chain (as defined in [MIAF]) that serves as an input to a grid derived image item. For example, if a file contains a grid item and its referenced coded image items, cropping, mirroring or rotation transformations are only permitted on the grid item itself.
NOTE: This constraint further restricts files compared to [MIAF] .
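The two brand-related constraints above lend themselves to a quick check. The non-normative sketch below parses a FileTypeBox laid out as in [ISOBMFF] (32-bit size, 'ftyp' type, major_brand, minor_version, then a list of compatible brands) and tests for the required brands. It assumes a plain 32-bit box size (the 64-bit and "to end of file" size forms are not handled) and treats a brand as listed if it appears either as the major brand or among the compatible brands. The function names are hypothetical.

```python
import struct
from typing import List, Tuple


def parse_ftyp(box: bytes) -> Tuple[bytes, List[bytes]]:
    """Return (major_brand, compatible_brands) of a complete 'ftyp' box."""
    if len(box) < 16:
        raise ValueError("truncated FileTypeBox")
    size, box_type = struct.unpack_from(">I4s", box, 0)
    if box_type != b"ftyp":
        raise ValueError("not a FileTypeBox")
    if size != len(box) or (size - 16) % 4 != 0:
        raise ValueError("unexpected FileTypeBox size")
    major_brand = box[8:12]
    # Bytes 12..15 hold minor_version; brands follow as 4-byte codes.
    compatible = [box[i:i + 4] for i in range(16, size, 4)]
    return major_brand, compatible


def lists_required_brands(major_brand: bytes, compatible: List[bytes]) -> bool:
    """Check that 'miaf' and at least one of 'avif' or 'avis' are listed."""
    brands = set(compatible) | {major_brand}
    return b"miaf" in brands and (b"avif" in brands or b"avis" in brands)
```

A FileTypeBox with major brand avif and compatible brands avif, mif1, miaf and MA1B passes this check, while one listing only avif and mif1 does not.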
8. Profiles
8.1. Overview
The profiles defined in this section are for enabling interoperability between AV1 Image File Format files and AV1 Image File Format readers/parsers. A profile imposes a set of specific restrictions and is signaled by brands defined in this specification.
The FileTypeBox should declare at least one profile that enables decoding of the primary image item. It is not an error for the encoder to include an auxiliary image that is not allowed by the specified profile(s).
If 'avis' is declared in the FileTypeBox and a profile is declared in the FileTypeBox, the profile shall also enable decoding of at least one image sequence track. The profile should allow decoding of any associated auxiliary image sequence tracks, unless it is acceptable to decode the image sequence track without its auxiliary image sequence tracks.
It is possible for a file compliant to this AV1 Image File Format to not be able to declare an AVIF profile, if the corresponding AV1 encoding characteristics do not match any of the defined profiles.
NOTE: [AV1] supports 3 bit depths: 8, 10 and 12 bits, and the maximum dimensions of a coded image are 65536x65536, when seq_level_idx is set to 31 (maximum parameters level).
8.2. AVIF Baseline Profile
This section defines the MIAF AV1 Baseline profile of [HEIF] , specifically for [AV1] bitstreams, based on the constraints specified in [MIAF] and identified by the brand MA1B .
If the brand 'MA1B' is in the FileTypeBox, the common constraints in the section § 6 Brands, Internet media types and file extensions shall apply.
The following shared conditions and requirements from [MIAF] shall apply:
- baseline-profile self-containment (subclause 8.2)
The following shared conditions and requirements from [MIAF] should apply:
- baseline-profile grid-limit (subclause 8.4)
- baseline-profile single-track (subclause 8.5)
- baseline-profile edit-lists (subclause 8.6)
- baseline-profile matched-duration (subclause 8.7)
The following additional constraints apply to all AV1 Image Items and all AV1 Image Sequences :
-
The
AV1
profile
shall
be
the
Main
Profile
and
the
level
shall
be
5.1
or
lower.
NOTE: AV1 tiers are not constrained because timing is optional in image sequences and is not relevant in image items or collections.
NOTE: Level 5.1 is chosen for the Baseline profile to ensure that no single coded image exceeds 4k resolution, as some decoders may not be able to handle larger images. More precisely, following [AV1] level definitions, coded image items compliant to the AVIF Baseline profile may not have a number of pixels greater than 8912896, a width greater than 8192 or a height greater than 4352. It is still possible to use the Baseline profile to create larger images using a grid derived image item .
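A file containing items compliant with this profile is expected to list the following brands, in any order, in the FileTypeBox: avif, mif1, miaf, MA1B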
A
file
containing
a
'
pict
'
track
compliant
with
this
profile
is
expected
to
list
the
following
brands,
in
any
order,
in
the
FileTypeBox
:
avis,
msf1,
miaf,
MA1B
A file containing a 'pict' track compliant with this profile and made only of AV1 Samples marked 'sync' is expected to list the following brands, in any order, in the FileTypeBox: avis, avio, msf1, miaf, MA1B
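The brand lists above can be summarized as a small lookup. This is an illustrative sketch only: the function and parameter names are hypothetical, and it assumes that the first list (avif, mif1, miaf, MA1B) is the one that applies to files containing image items.

```python
def expected_brands(profile_brand, has_items=False, has_pict_track=False,
                    only_sync_samples=False):
    """Return the set of brands a file is expected to list, per the lists above."""
    brands = set()
    if has_items:
        # Assumption: the first list applies to files containing image items.
        brands |= {"avif", "mif1", "miaf", profile_brand}
    if has_pict_track:
        brands |= {"avis", "msf1", "miaf", profile_brand}
        if only_sync_samples:
            # 'avio' is listed only when every sample of the track is marked 'sync'.
            brands.add("avio")
    return brands
```

Because the brands may appear in any order, a set comparison against the brands found in the FileTypeBox is sufficient.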
8.3. AVIF Advanced Profile
This section defines the MIAF AV1 Advanced profile of [HEIF] , specifically for [AV1] bitstreams, based on the constraints specified in [MIAF] and identified by the brand MA1A .
If the brand 'MA1A' is in the FileTypeBox, the common constraints in the section § 6 Brands, Internet media types and file extensions shall apply.
The following shared conditions and requirements from [MIAF] shall apply:
- advanced-profile self-containment (subclause 8.2)
The following shared conditions and requirements from [MIAF] should apply:
- advanced-profile grid-limit (subclause 8.4)
- advanced-profile single-track (subclause 8.5)
- advanced-profile edit-lists (subclause 8.6)
- advanced-profile matched-duration (subclause 8.7)
The following additional constraints apply to all AV1 Image Items :
- The AV1 profile shall be the High Profile and the level shall be 6.0 or lower.
NOTE: Following [AV1] level definitions, coded image items compliant to the AVIF Advanced profile may not have a number of pixels greater than 35651584, a width greater than 16384 or a height greater than 8704. It is still possible to use the Advanced profile to create larger images using a grid derived image item .
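To give a feel for how a grid derived image item relates to these limits, the sketch below estimates the smallest number of equally sized grid cells needed so that each cell fits the limits quoted in the note above. It is a rough estimate under stated assumptions: the names are hypothetical, and the additional rules that [MIAF] and [HEIF] place on grid cells (for example on cell dimensions) are not modeled.

```python
# Limits quoted in the note above for AV1 level 6.0 (Advanced profile, image items).
MAX_PIXELS = 35651584
MAX_WIDTH = 16384
MAX_HEIGHT = 8704


def ceil_div(a, b):
    """Integer division rounded up."""
    return -(-a // b)


def fits_advanced(width, height):
    """Return True if a single coded image of this size is within the quoted limits."""
    return (0 < width <= MAX_WIDTH and 0 < height <= MAX_HEIGHT
            and width * height <= MAX_PIXELS)


def min_grid(width, height):
    """Return (rows, columns) with the fewest equally sized cells that each fit."""
    best = None
    cols = ceil_div(width, MAX_WIDTH)
    # A layout with `cols` columns has at least `cols` cells, so the search can
    # stop once the column count exceeds the best cell count found so far.
    while best is None or cols <= best[0] * best[1]:
        cell_w = ceil_div(width, cols)
        cell_h_cap = min(MAX_HEIGHT, MAX_PIXELS // cell_w)
        rows = ceil_div(height, cell_h_cap)
        if best is None or rows * cols < best[0] * best[1]:
            best = (rows, cols)
        cols += 1
    return best
```

For example, a 16384x8704 canvas respects the width and height limits but has four times the allowed pixel count, so the sketch reports a four-cell layout.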
The following additional constraints apply only to AV1 Image Sequences :
- The AV1 profile shall be either Main Profile or High Profile.
- The AV1 level for Main Profile shall be 5.1 or lower.
- The AV1 level for High Profile shall be 5.1 or lower.
FileTypeBox: avif, mif1, miaf, MA1A
A file containing a 'pict' track compliant with this profile is expected to list the following brands, in any order, in the FileTypeBox: avis, msf1, miaf, MA1A
9. Box requirements
9.1. Image item boxes
This section discusses the box requirements for an AVIF file containing image items.
9.1.1. Minimum set of boxes
As indicated in § 7 General constraints, an AVIF file is a compliant [MIAF] file. As a consequence, some [ISOBMFF] or [HEIF] boxes are required, as indicated in the following table.
The order of the boxes is indicative in the table. The specifications listed in the "Specification" column may require a specific order for a box or for its children and the order shall be respected. For example, per [ISOBMFF], the FileTypeBox is required to appear first in an AVIF file.
The "Version(s)" column in the following table lists the version(s) of the boxes allowed by this brand. With the exception of item properties marked as non-essential, other versions of the boxes shall not be used. "-" means that the box does not have a version.
| Top-Level | Level 1 | Level 2 | Level 3 | Version(s) | Specification | Note |
|---|---|---|---|---|---|---|
| ftyp | | | | - | [ISOBMFF] | |
| meta | | | | 0 | [ISOBMFF] | |
| | hdlr | | | 0 | [ISOBMFF] | |
| | pitm | | | 0, 1 | [ISOBMFF] | |
| | iloc | | | 0, 1, 2 | [ISOBMFF] | |
| | iinf | | | 0, 1 | [ISOBMFF] | |
| | | infe | | 2, 3 | [ISOBMFF] | |
| | iprp | | | - | [ISOBMFF] | |
| | | ipco | | - | [ISOBMFF] | |
| | | | av1C | - | AVIF | |
| | | | ispe | 0 | [HEIF] | |
| | | | pixi | 0 | [HEIF] | |
| | | ipma | | 0, 1 | [ISOBMFF] | |
| mdat | | | | - | [ISOBMFF] | The coded payload may be placed in 'idat' rather than 'mdat', in which case 'mdat' is not required. |
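The table above lends itself to a simple presence and version check. The following is a minimal sketch rather than a validator: all names are hypothetical, it only descends into the containers needed to reach the listed boxes, it does not verify that each box sits at the level shown in the table, and it does not model the exemption for item properties marked as non-essential.

```python
import struct

# Boxes from the table that carry a FullBox version byte, the containers that
# are walked, and the allowed versions per required box (None stands for "-").
FULL_BOXES = {"meta", "hdlr", "pitm", "iloc", "iinf", "infe", "ispe", "pixi", "ipma"}
CONTAINERS = {"meta", "iinf", "iprp", "ipco"}
MINIMUM = {
    "ftyp": {None}, "meta": {0}, "hdlr": {0}, "pitm": {0, 1},
    "iloc": {0, 1, 2}, "iinf": {0, 1}, "infe": {2, 3}, "iprp": {None},
    "ipco": {None}, "av1C": {None}, "ispe": {0}, "pixi": {0}, "ipma": {0, 1},
}


def iter_boxes(data, start=0, end=None):
    """Yield (type, payload_start, payload_end) for each box in data[start:end]."""
    end = len(data) if end is None else end
    pos = start
    while pos + 8 <= end:
        (size,) = struct.unpack_from(">I", data, pos)
        box_type = data[pos + 4:pos + 8].decode("latin-1")
        header = 8
        if size == 1:  # a 64-bit largesize follows the type
            (size,) = struct.unpack_from(">Q", data, pos + 8)
            header = 16
        elif size == 0:  # the box extends to the end of its container
            size = end - pos
        if size < header or pos + size > end:
            raise ValueError(f"malformed box {box_type!r} at offset {pos}")
        yield box_type, pos + header, pos + size
        pos += size


def scan(data, start=0, end=None, found=None):
    """Collect {box type: set of versions seen} for the walked hierarchy."""
    found = {} if found is None else found
    for box_type, s, e in iter_boxes(data, start, end):
        version = None
        if box_type in FULL_BOXES:
            if e - s < 4:
                raise ValueError(f"truncated FullBox header in {box_type!r}")
            version = data[s]
            s += 4  # skip version (1 byte) and flags (3 bytes)
        found.setdefault(box_type, set()).add(version)
        if box_type == "iinf":
            s += 2 if version == 0 else 4  # skip entry_count
        if box_type in CONTAINERS:
            scan(data, s, e, found)
    return found


def check_minimum(data):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    first = next(iter_boxes(data), None)
    if first is None or first[0] != "ftyp":
        problems.append("ftyp is not the first box")
    found = scan(data)
    for box_type, allowed in MINIMUM.items():
        if box_type not in found:
            problems.append(f"missing {box_type}")
        elif not found[box_type] <= allowed:
            problems.append(f"unexpected version for {box_type}")
    if "mdat" not in found and "idat" not in found:
        problems.append("no mdat or idat payload box")
    return problems
```

A reader would typically run such a check before attempting to decode, and treat any reported problem as a reason to reject or at least flag the file.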
9.1.2. Requirements on additional image item related boxes
The boxes indicated in the following table may be present in an AVIF file to provide additional signaling for image items. If present, the boxes shall use the version indicated in the table unless the box is an item property marked as non-essential. AVIF readers are expected to understand the boxes and versions listed in this table. The order of the boxes in the table may not be the order of the boxes in the file. Specifications may require a specific order for a box or for its children and the order shall be respected.
Additionally, the 'free' and 'skip' boxes may be present at any level in the hierarchy and AVIF readers are expected to ignore them. Additional boxes in the 'meta' hierarchy not listed in the following table may also be present and may be ignored by AVIF readers.
| Top-Level | Level 1 | Level 2 | Level 3 | Version(s) | Specification | Description |
|---|---|---|---|---|---|---|
| meta | | | | | | See § 9.1.1 Minimum set of boxes |
| | dinf | | | - | [ISOBMFF] | Used to indicate the location of the media information |
| | | dref | | 0 | [ISOBMFF] | |
| | iref | | | 0, 1 | [ISOBMFF] | Used to indicate directional relationships between images or metadata |
| | | auxl | | - | [HEIF] | Used when an image is auxiliary to another image |
| | | thmb | | - | [HEIF] | Used when an image is a thumbnail of another image |
| | | dimg | | - | [HEIF] | Used when an image is derived from another image |
| | | prem | | - | [HEIF] | Used when the color values in an image have been premultiplied with alpha values |
| | | cdsc | | - | [HEIF] | Used to link metadata with an image |
| | idat | | | - | [ISOBMFF] | Typically used to store derived image definitions or small pieces of metadata |
| | grpl | | | - | [ISOBMFF] | Used to indicate that multiple images are semantically grouped |
| | | altr | | 0 | [ISOBMFF] | Used when images in a group are alternatives to each other |
| | | ster | | 0 | [HEIF] | Used when images in a group form a stereo pair |
| | iprp | | | | | See § 9.1.1 Minimum set of boxes |
| | | ipco | | | | See § 9.1.1 Minimum set of boxes |
| | | | pasp | - | [ISOBMFF] | Used to signal pixel aspect ratio. If present, shall indicate a pixel aspect ratio of 1:1 |
| | | | colr | - | [ISOBMFF] | Used to signal color information such as color primaries |
| | | | auxC | 0 | [HEIF] | Used to signal the type of an auxiliary image (e.g. alpha, depth) |
| | | | clap | - | [ISOBMFF] | Used to signal cropping applied to an image |
| | | | irot | - | [HEIF] | Used to signal a rotation applied to an image |
| | | | imir | - | [HEIF] | Used to signal a mirroring applied to an image |
| | | | clli | - | [ISOBMFF] | Used to signal HDR content light level information for an image |
| | | | cclv | - | [ISOBMFF] | Used to signal HDR content color volume for an image |
| | | | mdcv | - | [ISOBMFF] | Used to signal HDR mastering display color volume for an image |
| | | | amve | - | [ISOBMFF] | Used to signal the nominal ambient viewing environment for the display of the content |
| | | | reve | 0 | [HEIF] | Used to signal the viewing environment in which the image was mastered |
| | | | ndwt | 0 | [HEIF] | Used to signal the nominal diffuse white luminance of the content |
| | | | a1op | - | AVIF | Used to configure which operating point to select when there are multiple choices |
| | | | lsel | - | [HEIF] | Used to configure rendering of a multilayered image |
| | | | a1lx | - | AVIF | Used to assist reader in parsing a multilayered image |
| | | | cmin | 0 | [HEIF] | Used to signal the camera intrinsic matrix |
| | | | cmex | 0 | [HEIF] | Used to signal the camera extrinsic matrix |
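A reader's handling of these optional boxes can be sketched as a lookup keyed by box type. The names below are hypothetical, the version sets are copied from the table above (None stands for "-"), and the exemption for item properties marked as non-essential is not modeled.

```python
# Allowed versions for the optional boxes listed in the table above.
OPTIONAL_BOXES = {
    "dinf": {None}, "dref": {0}, "iref": {0, 1},
    "auxl": {None}, "thmb": {None}, "dimg": {None}, "prem": {None}, "cdsc": {None},
    "idat": {None}, "grpl": {None}, "altr": {0}, "ster": {0},
    "pasp": {None}, "colr": {None}, "auxC": {0}, "clap": {None},
    "irot": {None}, "imir": {None}, "clli": {None}, "cclv": {None},
    "mdcv": {None}, "amve": {None}, "reve": {0}, "ndwt": {0},
    "a1op": {None}, "lsel": {None}, "a1lx": {None}, "cmin": {0}, "cmex": {0},
}


def classify(box_type, version=None):
    """Classify a box found in the 'meta' hierarchy of an AVIF file."""
    if box_type in ("free", "skip"):
        return "ignore"  # may appear at any level; readers are expected to skip them
    allowed = OPTIONAL_BOXES.get(box_type)
    if allowed is None:
        return "unlisted"  # not in the table; may be ignored by readers
    return "ok" if version in allowed else "unexpected-version"
```

Keeping "unlisted" distinct from "unexpected-version" reflects the text above: unknown boxes may simply be ignored, whereas a listed box with a version outside the table is a conformance concern.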
10. AVIF Media Type Registration
The media type "image/avif" is officially registered with IANA and available at: https://www.iana.org/assignments/media-types/image/avif.
11. Changes since v1.1.0 release
- EDITORIAL: Stop using `dfn value` for definitions.
- EDITORIAL: Add assert-ids in the spec for conformance file testing and ComplianceWarden
- EDITORIAL: Add "per item" to item property definitions
- EDITORIAL: Fix broken link for latest-draft.html
- Relax constraint on transformative properties in derivation chains to only apply to grid items
- Clarify relationship between av1C, metadata OBUs and item properties
- EDITORIAL: Update list of other item properties
- Further clarify relationship between av1C, metadata OBUs and item properties
- Replace recommendations regarding still picture flags in image items by a note
- Add Appendix A "Sample Transform Derived Image Item Examples"
- EDITORIAL: Clean up usage of dfn and linking
- EDITORIAL: Add sato, alpha, depth, progressive in Scope
- EDITORIAL: Remove Sample Transform sections from TOC
- EDITORIAL: Indent notes as the list items they refer to
- EDITORIAL: Remove inconsistent dots in 9.1.2
- EDITORIAL: Make assert IDs between profiles unique
- Update the HEIF, ISOBMFF, and MIAF references to the latest versions
- EDITORIAL: Merge asserts for avio brand
- EDITORIAL: Clarify sato item requirements
Appendix A: (informative) Sample Transform Derived Image Item Examples
This informative appendix contains example recipes for extending base AVIF features with Sample Transform Derived Image Items .
Bit depth extension
Sample Transform Derived Image Items allow for more than 12 bits per channel per sample by combining several AV1 image items in multiple ways.
Suffix bit depth extension
The following example describes how to leverage a Sample Transform Derived Image Item on top of a regular 8-bit MIAF image item to extend the decoded bit depth to 16 bits.
Consider the following:
- A MIAF image item being a losslessly coded image item, and its PixelInformationProperty with bits_per_channel=8,
- Another image item being a lossily or losslessly coded image item with the same spatial dimensions, the same number of channels, and the same chroma subsampling (or lack thereof) as the first input image item, and its PixelInformationProperty with bits_per_channel=8,
- A Sample Transform Derived Image Item with the two items above as input in this order, and its PixelInformationProperty with bits_per_channel=16, and the following SampleTransform fields:
This is equivalent to the following postfix notation (parentheses added for clarity):
This is equivalent to the following infix notation:
Each output sample is equal to the sum of a sample of the first input image item shifted to the left by 8 bits and of a sample of the second input image item. This can be viewed as a bit depth extension of the first input image item by the second input image item. The first input image item contains the 8 most significant bits and the second input image item contains the 8 least significant bits of the 16-bit output reconstructed image item. It is impossible to achieve a bit depth of 16 with a single AV1 image item .
NOTE: If the first input image item is the primary image item and is enclosed in an 'altr' group (see § 5.1 'altr' group) with the Sample Transform Derived Image Item, the first input image item is also a backward-compatible 8-bit regular coded image item that can be used by readers that do not support Sample Transform Derived Image Items or do not need extra precision.
NOTE: The second input image item can be marked as hidden to prevent readers from surfacing it to users.
NOTE: The second input image item loses its meaning of the least significant part if any of the most significant bits changes, so the first input image item has to be losslessly encoded. The second input image item supports reasonable loss during encoding.
NOTE: This pattern can be used for reconstructed bit depths beyond 16 by combining more than two input image items or with various input bit depth configurations and operations.
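The suffix scheme described in this subsection can be illustrated with a small stack-based evaluation. This is a sketch under stated assumptions: the token sequence (first sample, constant 256, product, second sample, sum) is reconstructed from the prose description of a left shift by 8 bits followed by a sum, not copied from the SampleTransform fields, and all function names are hypothetical.

```python
def evaluate_postfix(tokens, samples):
    """Evaluate a postfix expression made of ints, sample names and + - * operators."""
    stack = []
    for tok in tokens:
        if tok in ("+", "-", "*"):
            right = stack.pop()
            left = stack.pop()
            if tok == "+":
                stack.append(left + right)
            elif tok == "-":
                stack.append(left - right)
            else:
                stack.append(left * right)
        elif isinstance(tok, str):
            stack.append(samples[tok])  # a sample taken from an input image item
        else:
            stack.append(tok)  # a constant
    (result,) = stack  # a well-formed expression leaves exactly one value
    return result


def clamp(value, bits):
    """Clamp to the range representable with the given number of bits."""
    return max(0, min(value, (1 << bits) - 1))


def split_suffix(sample16):
    """Encoder side: split a 16-bit sample into its 8 most and 8 least significant bits."""
    return sample16 >> 8, sample16 & 0xFF


def combine_suffix(msb8, lsb8):
    """Decoder side: (first sample x 256) + second sample, clamped to 16 bits."""
    tokens = ["s1", 256, "*", "s2", "+"]  # assumed token order, see lead-in
    return clamp(evaluate_postfix(tokens, {"s1": msb8, "s2": lsb8}), 16)
```

Multiplying by 256 stands in for the left shift by 8 bits. A change of one unit in the first input moves the output by 256, whereas the same change in the second input moves it by one, which is why only the second input tolerates lossy coding.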
Residual bit depth extension
The following example describes how to leverage a Sample Transform Derived Image Item on top of a regular 12-bit MIAF image item to extend the decoded bit depth to 16 bits. It differs from the Suffix bit depth extension by its slightly longer series of operations allowing its first input image item to be lossily encoded.
Consider the following:
- A MIAF image item being a lossily coded image item, and its PixelInformationProperty with bits_per_channel=12,
- Another image item being a lossily or losslessly coded image item with the same spatial dimensions, the same number of channels, and the same chroma subsampling (or lack thereof) as the first input image item, and its PixelInformationProperty with bits_per_channel=8, with the following constraints:
  - For each sample position in each plane,
    - being the value of the 16-bit original sample at that position in that plane,
    - being the value of the 12-bit sample of the first input image at that position in that plane,
    - being the value of the sample of the second input image at that position in that plane,
    - representing similarity within compression loss range,
  NOTE: Files that do not respect this constraint will still decode successfully because Clause § 4.2.3.1 Definition mandates the resulting values to be each clamped to fit in the number of bits per sample as defined by the PixelInformationProperty of the reconstructed image item.
- A Sample Transform Derived Image Item with the two items above as input in this order, and its PixelInformationProperty with bits_per_channel=16, and the following SampleTransform fields:
  - version=0
  - bit_depth=2 (signed 32-bit constants, stack values and intermediate results)
  - token_count=7
This is equivalent to the following postfix notation (parentheses added for clarity):
This is equivalent to the following infix notation:
Each output sample is equal to the sum of a sample of the first input image item shifted to the left by 4 bits and of a sample of the second input image item offset by -128. This can be viewed as a bit depth extension of the first input image item by the second input image item, which contains the residuals to correct the precision loss of the first input image item.
NOTE: If the first input image item is the primary image item and is enclosed in an 'altr' group (see § 5.1 'altr' group) with the derived image item, the first input image item is also a backward-compatible 12-bit regular coded image item that can be used by decoding contexts that do not support Sample Transform Derived Image Items or do not need extra precision.
NOTE: The second input image item can be marked as hidden to prevent readers from surfacing it to users.
NOTE: The first input image item supports reasonable loss during encoding because the second input image item "overlaps" by 4 bits to correct the loss. The second input image item supports reasonable loss during encoding.
NOTE: This pattern can be used for reconstructed bit depths beyond 16 by combining more than two input image items or with various input bit depth configurations and operations.
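The residual scheme described in this subsection can be sketched as follows, assuming the transform computes (first sample shifted left by 4 bits) + second sample - 128, with the result clamped to 16 bits as the clamping note above describes. The function names are hypothetical, and the encoder-side helper is only one plausible way to produce the second input.

```python
def clamp(value, low, high):
    """Clamp value to the inclusive range [low, high]."""
    return max(low, min(value, high))


def make_residual(original16, decoded12):
    """Encoder side: 8-bit residual correcting the decoded 12-bit first input.

    decoded12 is the sample of the first input image item after lossy coding,
    so it may differ slightly from original16 >> 4.
    """
    return clamp(original16 - (decoded12 << 4) + 128, 0, 255)


def reconstruct(decoded12, residual8):
    """Decoder side: (first sample << 4) + second sample - 128, clamped to 16 bits."""
    return clamp((decoded12 << 4) + residual8 - 128, 0, 65535)
```

With an 8-bit residual centered on 128, the correction range is -128 to +127 in 16-bit units, so a coding error of up to about 7 units in the 12-bit first input can be corrected exactly. Larger errors saturate the residual and the reconstruction becomes approximate.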