1. Introduction
The Web Neural Network API defines a web-friendly hardware-agnostic abstraction layer that makes use of Machine Learning capabilities of operating systems and underlying hardware platforms without being tied to platform-specific capabilities. The abstraction layer addresses the requirements of key Machine Learning JavaScript frameworks and also allows web developers familiar with the ML domain to write custom code without the help of libraries. A complementary Model Loader API defines a higher-level abstraction targeting primarily web developers.
For an illustrated introduction, please see the explainer.
2. Use cases
2.1. Application Use Cases
This section illustrates application-level use cases for neural network inference hardware acceleration. All applications in those use cases can be built on top of pre-trained deep neural network (DNN) [models].
Note: Please be aware that some of the use cases described here are, by their very nature, privacy-invasive. Developers who are planning to use the API for such use cases should ensure that the API is being used to benefit users, for purposes that users understand and approve. They should apply the Ethical Principles for Web Machine Learning [webmachinelearning-ethics] and implement appropriate privacy risk mitigations such as transparency, data minimisation, and user controls.
2.1.1. Person Detection
A user opens a web-based video conferencing application, but she temporarily leaves her room. The application monitors whether she is in front of her PC by using object detection (for example, approaches such as [SSD] or [YOLO] that use a single DNN) to detect regions in a camera input frame that include persons.
When she comes back, the application automatically detects her and notifies other online users that she is active now.
2.1.2. Semantic Segmentation
A user joins a teleconference via a web-based video conferencing application at her desk since no meeting room in her office is available. During the teleconference, she does not want her room and the people in the background to be visible. To protect the privacy of the other people and the surroundings, the application runs a machine learning model such as [DeepLabv3+] or [MaskR-CNN] to semantically split an image into segments and replaces the segments that represent other people and the background with another picture.
2.1.3. Skeleton Detection
A web-based video conferencing application tracks the pose of a user’s skeleton by running a machine learning model that allows for real-time human pose estimation, such as [PoseNet], to recognize her gestures and body language. When she raises her hand, her microphone is automatically unmuted and she can start speaking on the teleconference.
2.1.4. Face Recognition
There are multiple people in the conference room and they join an online meeting using a web-based video conferencing application. The application detects the faces of participants by using object detection (for example, approaches such as [SSD]) and checks whether each face was present at the previous meeting by running a machine learning model such as [FaceNet], which verifies whether two faces are identical.
2.1.5. Facial Landmark Detection
A user wants to find new glasses that fit her well on an online glasses store. The online store offers a web-based try-on simulator that runs a machine learning model such as Face Alignment Network [FAN] to detect facial landmarks like eyes, nose, mouth, etc. When she chooses a pair of glasses, the simulator renders the selected glasses at the detected position of the eyes on her facial image.
2.1.6. Style Transfer
A user is looking for cosmetics on an online store and wondering which color may suit her face. The online store shows sample facial makeup images of cosmetics, and offers a makeup simulator that runs a machine learning model like [ContextualLoss] or [PairedCycleGAN] to transfer the makeup style of the sample makeup image to her facial image. She can use the simulator to check what the selected makeup looks like on her face.
2.1.7. Super Resolution
A web-based video conferencing application is receiving a video stream from its peer, but the resolution of the video becomes lower due to network congestion. To prevent degradation of the perceived video quality, the application runs a machine learning model for super-resolution such as [SRGAN] to generate higher-resolution video frames.
2.1.8. Image Captioning
For better accessibility, a web-based presentation application provides automatic image captioning by running a machine learning model such as [im2txt], which predicts explanatory words for the presentation slides.
2.1.9. Machine Translation
Multiple people from various countries are talking via a web-based real-time text chat application. The application translates their conversation by using a machine learning model such as [GNMT] or [OpenNMT], which translates each text into a different language.
2.1.10. Emotion Analysis
A user is talking to her friend via a web-based real-time text chat application, and she is wondering how the friend feels because she cannot see the friend’s face. The application analyses the friend’s emotion by using a machine learning model such as [DeepMoji], which infers emotion from input texts, and displays an emoji that represents the estimated emotion.
2.1.11. Video Summarization
A web-based video conferencing application records received video streams, and it needs to reduce the amount of recorded video data to be stored. The application generates a short version of the recorded video by using a machine learning model for video summarization such as [Video-Summarization-with-LSTM].
2.1.12. Noise Suppression
A web-based video conferencing application records received audio streams, but background noise is usually present. The application leverages real-time noise suppression using a recurrent neural network such as [RNNoise] to suppress dynamic background noise, like a baby crying or a dog barking, and improve the audio experience in video conferences.
2.1.13. Detecting fake video
A user is exposed to realistic fake videos generated by ‘deepfake’ techniques on the web. A fake video can swap the speaker’s face with the president’s face to incite a user politically or to manipulate the user’s opinion. Deepfake detection applications such as [FaceForensics++] analyze the videos and protect a user against fake videos or images. When she watches a fake video on the web, the detection application alerts her to the fraudulent video in real time.
2.2. Framework Use Cases
This section collects framework-level use cases for a dedicated low-level API for neural network inference hardware acceleration. It is expected that Machine Learning frameworks will be key consumers of the Web Neural Network API (WebNN API) and the low-level details exposed through the WebNN API are abstracted out from typical web developers. However, it is also expected that web developers with specific interest and competence in Machine Learning will want to interface with the WebNN API directly instead of a higher-level ML framework.
2.2.1. Custom Layer
A web application developer wants to run a DNN model on the WebNN API. However, she has found that some activation functions like [LeakyReLU], [ELU], etc. are not included in the WebNN API. To address this issue, she constructs custom layers for the additional activation functions on top of the WebNN API. Note that the scope of custom layers may include convolution, normalization, etc. as well as activation.
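The idea of a custom layer can be sketched in plain JavaScript, independent of the WebNN API itself. The snippet below assumes a hypothetical set of element-wise primitives (`max`, `min`, `add`, `sub`, `mul`, `exp`) operating on `Float32Array` values and composes LeakyReLU and ELU from them. The names `prim`, `leakyRelu`, and `elu` are illustrative only and are not WebNN API calls.

```javascript
// Hypothetical element-wise primitives over Float32Array (not WebNN API calls).
const map2 = (a, b, f) => a.map((v, i) => f(v, b[i]));
const fill = (like, value) => new Float32Array(like.length).fill(value);
const prim = {
  max: (a, b) => map2(a, b, Math.max),
  min: (a, b) => map2(a, b, Math.min),
  add: (a, b) => map2(a, b, (x, y) => x + y),
  sub: (a, b) => map2(a, b, (x, y) => x - y),
  mul: (a, b) => map2(a, b, (x, y) => x * y),
  exp: (a) => a.map((v) => Math.exp(v)),
};

// LeakyReLU(x) = max(0, x) + alpha * min(0, x)
function leakyRelu(x, alpha = 0.01) {
  const zero = fill(x, 0);
  const negativePart = prim.mul(fill(x, alpha), prim.min(zero, x));
  return prim.add(prim.max(zero, x), negativePart);
}

// ELU(x) = max(0, x) + alpha * (exp(min(0, x)) - 1)
function elu(x, alpha = 1.0) {
  const zero = fill(x, 0);
  const expMinusOne = prim.sub(prim.exp(prim.min(zero, x)), fill(x, 1));
  return prim.add(prim.max(zero, x), prim.mul(fill(x, alpha), expMinusOne));
}
```

The same composition pattern would apply when the primitives are graph-building operations rather than eager array functions: the custom layer is expressed only in terms of operations that already exist.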
2.2.2. Network Concatenation
A web application uses a DNN model whose upper convolutional layers and lower fully-connected layers are stored in separate files, since the model data of the fully-connected layers is periodically updated due to fine tuning on the server side.
Therefore, the application first downloads both partial model files and concatenates them into a single model. When the model is updated, the application downloads the fine-tuned part of the model and replaces only the fully-connected layers with it.
2.2.3. Performance Adaptation
A web application developer is concerned about the performance of her DNN model on mobile devices. She has confirmed that it may run too slowly on mobile devices which do not have GPU acceleration. To address this issue, her web application uses the WebNN API to check whether acceleration is available, so that the application can display a warning on devices without acceleration.
After several weeks, she has developed a tiny DNN model that can run even on a CPU. To accommodate CPU execution, she modifies the application so that it loads the tiny model on CPU-only devices.
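A minimal sketch of this adaptation follows. It assumes that requesting a `"gpu"` device type through `createContext()` rejects when that device type cannot be satisfied, as the privacy section of this document describes. The function name `chooseModel` and the model names `'full-model'` and `'tiny-model'` are hypothetical. The `ml` object is passed in as a parameter (normally `navigator.ml`) so the logic can be exercised without a WebNN implementation.

```javascript
// Pick a model variant depending on whether a GPU-backed context can be created.
// `ml` is expected to be navigator.ml; the model names are hypothetical.
async function chooseModel(ml) {
  if (!ml) {
    return { model: null, context: null, warning: 'WebNN is not available.' };
  }
  try {
    // Assumption: a device type that cannot be satisfied rejects the promise.
    const context = await ml.createContext({ deviceType: 'gpu' });
    return { model: 'full-model', context, warning: null };
  } catch (error) {
    // Fall back to the CPU and the smaller model.
    const context = await ml.createContext({ deviceType: 'cpu' });
    return {
      model: 'tiny-model',
      context,
      warning: 'GPU acceleration is unavailable; using a smaller model.',
    };
  }
}
```

An application could call `chooseModel(navigator.ml)` at startup and show the returned warning, if any, before loading the selected model.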
2.2.4. Operation Level Execution
A JavaScript ML framework is responsible for loading, interpreting and executing an ML model. During the model execution phase, the framework iterates through the operations of the model and executes each operation on a hardware device, such as a CPU, GPU or ML accelerator. To avoid unnecessary data copying across devices, the framework selects the same device to execute all the operations. For a compute-intensive operation, such as 2D convolution or matrix multiplication, the framework uses the WebNN API to execute it with the ML-specific acceleration available on the selected device.
2.2.5. Integration with real-time video processing
The user experience of WebRTC-based video conferencing is enhanced using real-time video processing. For example, background blur implemented using a § 2.1.2 Semantic Segmentation model blurs the background in the user’s live camera feed. To satisfy the performance requirements of this use case, the WebNN API integrates with primitives from other Web APIs that make up the media pipeline to allow WebNN API-based transformation of real-time video streams.
3. Security Considerations
This specification defines a low-level API for neural network inference hardware acceleration. This API is considered a powerful feature [POWERFUL-FEATURES] because it grants low-level access to a user’s computer. To meet the authentication and confidentiality expectations of a powerful feature and to prevent man-in-the-middle attacks, all interfaces defined by this specification are only available in a secure context.
This API is disabled by default in all cross-origin frames using the § 7.2.1 Permissions Policy Integration. This prevents third-party content from using this API unless the embedding page explicitly sets a policy that grants permission.
This API allows creation of an MLContext from a GPUDevice defined by the WebGPU specification. See WebGPU Security Considerations for more information regarding the security characteristics of this context.
Once the graph is fully constructed and compiled, the input shapes into each of the operations in the graph are inferred and finalized. The bounds checking occurs when the compute method is invoked that executes the graph against the actual data. No actual data is bound to the compiled graph before this stage. It is the implementation’s responsibility to make sure proper bounds checking occurs against the shapes of the data already inferred by that time.
Document operations susceptible to out-of-bounds access as a guidance to implementers.
As a future-proofing measure, the API design allows certain operations that can be generically emulated to be deprecated for security, performance, or other reasons without breaking compatibility. This is made possible by high-level functions that are defined in terms of smaller primitive operations defined in this specification. This enables a native implementation of a high-level function to be replaced with a polyfill implementation.
Investigate side channel attack feasibility considering the current state where CPU is shared between processes running renderers.
In order to not allow an attacker to target a specific implementation that may contain a flaw, the § 6.2 Context and Device Association mechanism is a hint only, and the concrete device selection is left to the implementation; a user agent could, for instance, choose never to run a model on a device with known vulnerabilities. As a further mitigation, no device enumeration mechanism is defined.
Hinting partially mitigates the concern. Investigate additional mitigations.
The API design minimizes the attack surface for the compiled computational graph. The MLGraphBuilder interface that hosts the various operations is a data definition API and as such doesn’t execute anything, only constructs data. What follows is that the potential for an attack is limited to when binding the data to the graph before executing it by invoking the MLContext.compute() method. This enables implementers to focus on hardening the MLContext.compute() method, for example by making sure it honors the boundary of data and fails appropriately when the bounds are not respected.
Purpose-built Web APIs for measuring high-resolution time mitigate against timing attacks using techniques such as resolution reduction, adding jitter, detection of abuse and API call throttling [hr-time-3]. The practical deployment of WebNN implementations is likely to bring enough jitter to make timing attacks impractical (e.g. because they would use IPC), but implementers are advised to consider and test their implementations against timing attacks.
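As an illustration of one of the techniques named above, resolution reduction, the following sketch rounds timestamps down to a coarser granularity before they are used for measurement. It is a simplified model for discussion and not how any particular user agent implements its timers. The function names and the optional `now` clock parameter are assumptions made for this example.

```javascript
// Round a timestamp (in milliseconds) down to a multiple of `resolutionMs`.
function coarsenTimestamp(timestampMs, resolutionMs = 1) {
  return Math.floor(timestampMs / resolutionMs) * resolutionMs;
}

// Measure a callback using only coarsened timestamps. The `now` parameter is a
// hypothetical clock source, injectable so the behavior can be tested.
function measureCoarse(fn, now = () => performance.now(), resolutionMs = 1) {
  const start = coarsenTimestamp(now(), resolutionMs);
  fn();
  return coarsenTimestamp(now(), resolutionMs) - start;
}
```

With a coarse resolution, two events that are close together in time become indistinguishable, which reduces the information an observer can extract from a single measurement.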
3.1. Guidelines for new operations
To ensure operations defined in this specification are shaped in a way they can be implemented securely, this section includes guidelines on how operations are expected to be defined to reduce potential for implementation problems. These guidelines are expected to evolve over time to align with industry best practices:
- Prefer simplicity of arguments
- Don’t use parsers for complex data formats
- If an operation can be decomposed to low level primitives:
  - Add an informative emulation path
  - Prefer primitives over new high level operations but consider performance consequences
- Operations should follow a consistent style for inputs and attributes
- Operation families such as pooling and reduction should share API shape and options
- Formalize failure cases into test cases whenever possible
- When in doubt, leave it out: API surface should be as small as possible required to satisfy the use cases, but no smaller
- Try to keep the API free of implementation details that might inhibit future evolution, do not overspecify
- Fail fast: the sooner the web developer is informed of an issue, the better
In general, always consider the security and privacy implications as documented in [security-privacy-questionnaire] by the Technical Architecture Group and the Privacy Interest Group when adding new features.
4. Privacy Considerations
This API enhances privacy compared to cloud-based inference, since input data such as locally sourced images or video streams stay within the browser’s sandbox.
This API exposes the minimum amount of information necessary to address the identified § 2 Use cases for the best performance and reliability of results.
No information from the underlying platform is exposed directly. An execution time analysis may reveal indirectly the performance of the underlying platform’s neural network hardware acceleration capabilities relative to another underlying platform.
Note: The group is soliciting further input on the proposed execution time analysis fingerprinting vector and will augment this section with more information and mitigations to inform the implementers of this API.
Unlike WebGPU, this API does not intrinsically support custom shader authoring and, as a result, is not prone to timing attacks that rely on shader caches or other persistent data. The API builds upon pre-existing shaders and lower level primitives of the browser or the underlying OS. Web developers who interface with GPUDevice are expected to be aware of WebGPU compilation cache considerations.
The WebGPU API identifies machine-specific artifacts as a privacy consideration. Given the WebNN API defines means to record an ML workload onto a WebGPU-compatible GPUCommandBuffer, compute unit scheduling may under certain circumstances introduce a fingerprint. However, similarly to WebGPU, such fingerprints are identical across most or all of the devices of each vendor, mitigating the concern. Furthermore, software implementations can be used to further eliminate such artifacts.
The WebNN API defines two developer-settable preferences to help inform § 6.2 Context and Device Association and allow the implementation to better select the most appropriate underlying execution device for the workload. Device type normatively indicates the kind of device and is either "cpu" or "gpu". If this type cannot be satisfied, an "OperationError" DOMException is thrown, thus this type can in some cases add two bits of entropy to the fingerprint. Power preference indicates preference as related to the power consumption and is considered a hint only and as such does not increase entropy of the fingerprint.
If a future version of this specification introduces support for a new device type that can only support a subset of MLOperandTypes, that may introduce a new fingerprint.
In general, implementers of this API are expected to apply WebGPU Privacy Considerations to their implementations where applicable.
5. Ethical Considerations
The Working Group has started documenting ethical issues associated with using Machine Learning on the Web, to help identify what mitigations its normative specifications should take into account. The Working Group publishes and maintains an Ethical Principles for Web Machine Learning document [webmachinelearning-ethics] open to contributions from the wider community via a dedicated GitHub repository.
6. Programming Model
6.1. Overview
At the heart of neural networks is a computational graph of mathematical operations. These operations are the building blocks of modern machine learning technologies in computer vision, natural language processing, and robotics. The WebNN API is a specification for constructing, compiling, and executing computational graphs of neural networks.
The MLGraph interface represents a compiled computational graph that is immutable (that is, a model).
The MLGraphBuilder interface serves as a builder (factory) to create an MLGraph. An MLOperand is a representation of data that flows within the computational graph, which includes input values for inference, constants (including trained weights) used for inference, intermediate values (often referred to as activations) computed during inference, as well as the output values of inference. At inference time, every MLOperand will be bound to a tensor (the actual data).
The MLGraphBuilder interface enables the creation of MLOperands. A key part of the MLGraphBuilder interface is the set of operations (such as MLGraphBuilder.gemm() and MLGraphBuilder.softmax()). The operations have functional semantics, with no side effects. Each operation invocation conceptually returns a distinct new value, without changing the value of any other MLOperand.
The runtime values (of MLOperands) are tensors, which are essentially multidimensional arrays. The representation of the tensors is implementation dependent, but it typically includes the array data stored in some buffer (memory) and some metadata describing the array data (such as its shape).
As mentioned above, the operations have functional semantics. This allows the implementation to potentially share the array data between multiple tensors. For example, the implementation of operations such as reshape, slice, or squeeze may return a view of its input tensor that shares the same buffer as the input tensor. (In the case of reshape or squeeze, the entire data is shared, while in the case of slice, a part of the input data is shared.) The implementation may use views, as above, for intermediate values.
Before execution, the computational graph that is used to compute one or more specified outputs needs to be compiled and optimized. The key purpose of the compilation step is to enable optimizations that span two or more operations, such as operation or loop fusion.
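The notion of a view that shares its input’s buffer can be illustrated with JavaScript typed arrays. This is only an analogy for what an implementation may do internally. The tensor representation (`{ data, shape }`) and the helper names `reshapeView` and `sliceRowsView` are assumptions made for this sketch.

```javascript
// A tensor modelled as array data plus shape metadata (illustrative only).
const input = { data: new Float32Array([1, 2, 3, 4, 5, 6]), shape: [2, 3] };

const elementCount = (shape) => shape.reduce((a, b) => a * b, 1);

// reshape: new metadata over the entire, shared data.
function reshapeView(tensor, newShape) {
  if (elementCount(newShape) !== elementCount(tensor.shape)) {
    throw new RangeError('reshape must preserve the number of elements');
  }
  return { data: tensor.data, shape: newShape };
}

// slice of rows [start, end) along the first dimension: shares part of the data.
function sliceRowsView(tensor, start, end) {
  const rowSize = elementCount(tensor.shape.slice(1));
  return {
    // subarray() creates a new typed array over the same underlying ArrayBuffer.
    data: tensor.data.subarray(start * rowSize, end * rowSize),
    shape: [end - start, ...tensor.shape.slice(1)],
  };
}

const reshaped = reshapeView(input, [3, 2]);
const secondRow = sliceRowsView(input, 1, 2);
```

Because the operations never modify existing values, sharing the buffer in this way is not observable by the caller.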
There are multiple ways by which the graph may be compiled. The MLGraphBuilder.build() method compiles the graph in the background without blocking the calling thread, and returns a Promise that resolves to an MLGraph. The MLGraphBuilder.buildSync() method compiles the graph immediately on the calling thread, which must be a worker thread running on a CPU or GPU device, and returns an MLGraph. Both compilation methods produce an MLGraph that represents a compiled graph for optimal execution.
Once the MLGraph is constructed, there are multiple ways by which the graph may be executed. The MLContext.computeSync() method executes the graph immediately on the calling thread, which must also be a worker thread, either on a CPU or GPU device. The execution produces the results of the computation from all the inputs bound to the graph.
The MLContext.compute() method executes the graph asynchronously, either on a parallel timeline in a separate CPU worker thread for CPU execution, or on a GPU timeline in a GPU command queue. This method returns immediately without blocking the calling thread while the actual execution is offloaded to a different timeline. This type of execution is appropriate when the responsiveness of the calling thread is critical to good user experience. The computation results are placed at the bound outputs when the operation completes successfully on the offloaded timeline, at which time the calling thread is signaled. This type of execution supports both the CPU and GPU device.
In both the MLContext.compute() and MLContext.computeSync() execution methods, the caller supplies the input values using MLNamedArrayBufferViews, binding the input MLOperands to their values. The caller then supplies pre-allocated buffers for output MLOperands using MLNamedArrayBufferViews.
The MLCommandEncoder interface created by the MLContext.createCommandEncoder() method supports a graph execution method that provides the maximum flexibility to callers that also utilize WebGPU in their application. It does this by placing the workload required to initialize and compute the results of the operations in the graph onto a GPUCommandBuffer. The callers are responsible for the eventual submission of this workload on the GPUQueue through the WebGPU queue submission mechanism. Once the submitted workload is completely executed, the result is available in the bound output buffers.
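The asynchronous path described above, build() followed by compute(), might look like the following sketch. It is written against the interfaces as this section describes them and mirrors the synchronous example given later in this document. The function name `addAsync` is hypothetical, and the function returns `null` when the WebNN API is not exposed so that it can be loaded safely in any environment.

```javascript
// Sketch of the asynchronous path: build() followed by compute().
// Returns null when the WebNN API is not exposed in the current environment.
async function addAsync(ml = globalThis.navigator?.ml) {
  if (!ml || typeof MLGraphBuilder === 'undefined') {
    return null;
  }
  const context = await ml.createContext();

  // Define a small graph: c = a + b, where b is a constant filled with ones.
  const builder = new MLGraphBuilder(context);
  const desc = { type: 'float32', dimensions: [2, 2] };
  const a = builder.input('a', desc);
  const b = builder.constant(desc, new Float32Array(4).fill(1));
  const c = builder.add(a, b);

  // Compile in the background without blocking the calling thread.
  const graph = await builder.build({ c });

  // Bind input values and a pre-allocated output buffer, then execute.
  const inputs = { a: new Float32Array([1, 2, 3, 4]) };
  const outputs = { c: new Float32Array(4) };
  await context.compute(graph, inputs, outputs);
  return outputs.c;
}
```

Unlike buildSync() and computeSync(), this path is not restricted to worker threads, which makes it the natural choice when the calling thread must stay responsive.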
6.2. Context and Device Association
An MLContext interface represents a global state of neural network execution. An important function of this state is to manage the resources used in the compilation and execution of the neural network graph. An implementation in a user agent may implement this state in terms of CPU resources and execution when an MLContext is created without an explicit association with a hardware device (a "default context"). However, when an MLContext is created from a specific WebGPU GPUDevice that is already in use by the application (a "GPU context"), the implementation uses the specified GPU device as a resource domain for the subsequent compilation and execution of the graph. Any GPU resource, such as a GPUBuffer or GPUTexture, used to store a graph constant, input, or output operand must also be created from the same device. In a multi-adapter configuration, the device used for the MLContext must be created from the same adapter as the device used to allocate the resources referenced in the graph.
When a GPU context executes a graph with a constant, input, or output allocated in system memory as an ArrayBufferView, the content is automatically uploaded from system memory to GPU memory, and downloaded back to the system memory of the ArrayBufferView output buffer at the end of the graph execution. These automatic data upload and download cycles occur only when the executing GPU context determines that the data must be copied out of or back into system memory as part of the execution. They do not occur when the device is a CPU device. Additionally, the eventual result of the graph execution must also be in a known layout format. While the internal execution technique may be optimized for a native memory access pattern in an intermediate result within the graph, the output of the last operation of the graph must convert the content back to a known layout format at the end of the graph in order to maintain interoperability as expected by the caller.
When an MLContext is created with MLContextOptions, the user agent selects and creates the underlying execution device by taking into account the application’s power preference and device type specified in the MLPowerPreference and MLDeviceType options.
The following table summarizes the types of resource supported by the context created through different methods of creation:
| Creation method | ArrayBufferView | GPUBuffer | GPUTexture |
|---|---|---|---|
| MLContextOptions (default context) | Yes | No | No |
| GPUDevice (GPU context) | Yes | Yes | Yes |
7. API
7.1. The navigator.ml interface
An ML object is available in the Window and DedicatedWorkerGlobalScope contexts through the Navigator and WorkerNavigator interfaces respectively and is exposed via navigator.ml.
interface mixin NavigatorML {
  [SecureContext, SameObject] readonly attribute ML ml;
};
Navigator includes NavigatorML;
WorkerNavigator includes NavigatorML;
7.2. The ML interface
enum MLDeviceType {
  "cpu",
  "gpu"
};
enum MLPowerPreference {
  "default",
  "high-performance",
  "low-power"
};
dictionary MLContextOptions {
  MLDeviceType deviceType = "cpu";
  MLPowerPreference powerPreference = "default";
};
[SecureContext, Exposed=(Window, DedicatedWorker)]
interface ML {
  // Default context
  Promise<MLContext> createContext(optional MLContextOptions options = {});
  [Exposed=(DedicatedWorker)]
  MLContext createContextSync(optional MLContextOptions options = {});
  // GPU context
  Promise<MLContext> createContext(GPUDevice gpuDevice);
  [Exposed=(DedicatedWorker)]
  MLContext createContextSync(GPUDevice gpuDevice);
};
7.2.1. Permissions Policy Integration
This specification defines a policy-controlled feature identified by the string "webnn". Its default allowlist is 'self'.
7.2.2. The createContext() method
The createContext() method steps are:
- If this's relevant global object's associated Document is not allowed to use the webnn feature, return a new promise rejected with a "SecurityError" DOMException and abort these steps.
- Let promise be a new promise.
- Return promise and run the following steps in parallel.
- Let options be the first argument.
- Run the create context steps given options:
  - Let context be a new MLContext object.
  - If options is a GPUDevice object,
    - Set context.[[contextType]] to "webgpu".
    - Set context.[[deviceType]] to "gpu".
    - Set context.[[powerPreference]] to "default".
  - Otherwise,
    - Set context.[[contextType]] to "default".
    - If options["deviceType"] exists, then set context.[[deviceType]] to options["deviceType"]. Otherwise, set context.[[deviceType]] to "cpu".
    - If options["powerPreference"] exists, then set context.[[powerPreference]] to options["powerPreference"]. Otherwise, set context.[[powerPreference]] to "default".
- If the validate MLContext steps given context return false, reject promise with a "NotSupportedError" DOMException and abort these steps.
- Resolve promise with context.
7.2.3. The createContextSync() method
The createContextSync() method steps are:
- If this's relevant global object's associated Document is not allowed to use the webnn feature, throw a "SecurityError" DOMException and abort these steps.
- Let options be the first argument.
- Let context be the result of running the create context steps given options.
- If the validate MLContext steps given context return false, throw a "NotSupportedError" DOMException and abort these steps.
- Return context.
7.3. The MLContext interface
The MLContext interface represents a global state of neural network compute workload and execution processes. Each MLContext object has an associated context type, device type, and power preference.
The context type is the type of the execution context that manages the resources and facilitates the compilation and execution of the neural network graph:
- "default" - Context created per user preference options.
- "webgpu" - Context created from a WebGPU device.
The device type indicates the kind of device used for the context. It is one of the following:
- "cpu" - Provides the broadest compatibility and usability across all client devices with varying degrees of performance.
- "gpu" - Provides the broadest range of achievable performance across graphics hardware platforms from consumer devices to professional workstations.
The power preference indicates preference as related to power consumption. It is one of the following:
- "default" - Let the user agent select the most suitable behavior.
- "high-performance" - Prioritizes execution speed over power consumption.
- "low-power" - Prioritizes power consumption over other considerations such as execution speed.
typedef record<DOMString, ArrayBufferView> MLNamedArrayBufferViews;
[SecureContext, Exposed=(Window, DedicatedWorker)]
interface MLContext {};
MLContext has the following internal slots:
- [[contextType]] of type context type: The MLContext's context type.
- [[deviceType]] of type device type: The MLContext's device type.
- [[powerPreference]] of type power preference: The MLContext's power preference.
7.3.1. The MLContext validation algorithm
To validate MLContext, given context, run these steps:
- If context.[[contextType]] is not "webgpu" or "default", return false.
- If context.[[deviceType]] is not "cpu" or "gpu", return false.
- If context.[[powerPreference]] is not "default", "high-performance" or "low-power", return false.
- If the user agent cannot support context.[[contextType]], context.[[deviceType]] and context.[[powerPreference]], return false.
- Return true.
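The create context and validate MLContext steps above can be modelled in plain JavaScript as follows. This is a sketch for reasoning about the algorithm and not an implementation of the API. The internal slots are represented as ordinary object properties, and the `isGpuDevice` and `isSupported` callbacks are hypothetical stand-ins for the GPUDevice type check and the user agent capability check.

```javascript
const CONTEXT_TYPES = ['default', 'webgpu'];
const DEVICE_TYPES = ['cpu', 'gpu'];
const POWER_PREFERENCES = ['default', 'high-performance', 'low-power'];

// Mirrors the "create context" steps. `isGpuDevice` is a hypothetical predicate
// standing in for "options is a GPUDevice object".
function createContextState(options = {}, isGpuDevice = () => false) {
  if (isGpuDevice(options)) {
    return { contextType: 'webgpu', deviceType: 'gpu', powerPreference: 'default' };
  }
  return {
    contextType: 'default',
    deviceType: options.deviceType ?? 'cpu',
    powerPreference: options.powerPreference ?? 'default',
  };
}

// Mirrors the "validate MLContext" steps. `isSupported` stands in for the
// user agent deciding whether it can support the requested combination.
function validateContextState(state, isSupported = () => true) {
  return CONTEXT_TYPES.includes(state.contextType) &&
    DEVICE_TYPES.includes(state.deviceType) &&
    POWER_PREFERENCES.includes(state.powerPreference) &&
    Boolean(isSupported(state));
}
```

For example, `validateContextState(createContextState({ deviceType: 'gpu' }))` evaluates the checks in the same order as the steps above, with the user agent capability check last.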
7.3.2. Synchronous Execution
Synchronously carries out the computational workload of a compiled graph MLGraph on the calling thread, which must be a worker thread, to produce results as defined by the operations in the graph.
partial interface MLContext {
  [Exposed=(DedicatedWorker)]
  undefined computeSync(MLGraph graph, MLNamedArrayBufferViews inputs, MLNamedArrayBufferViews outputs);
};
Arguments:
- graph: an MLGraph. The compiled graph to be executed.
- inputs: an MLNamedArrayBufferViews. The resources of inputs.
- outputs: an MLNamedArrayBufferViews. The pre-allocated resources of required outputs.
Returns: undefined.
- If any of the following requirements are unmet, then throw a "DataError" DOMException and stop.
  - For each key -> value of inputs:
    - graph.[[inputDescriptors]][key] must exist.
    - Let inputDesc be graph.[[inputDescriptors]][key].
    - The type of ArrayBufferView value must match inputDesc.type according to this table.
    - value.[[ByteLength]] must equal the byte length of inputDesc.
  - For each key -> value of outputs:
    - graph.[[outputDescriptors]][key] must exist.
    - Let outputDesc be graph.[[outputDescriptors]][key].
    - The type of ArrayBufferView value must match outputDesc.type according to this table.
    - value.[[ByteLength]] must equal the byte length of outputDesc.
- For each key -> value of inputs:
  - Let inputDesc be graph.[[inputDescriptors]][key].
  - Let inputTensor be a new tensor for graph.[[implementation]].
  - Set the data type of inputTensor to the one that matches the element type of ArrayBufferView value.
  - Set the dimensions of inputTensor to inputDesc.dimensions.
  - Set the values of inputTensor to the values of value.
  - Set the input of graph.[[implementation]] that is associated with key to inputTensor.
- For each key -> value of outputs:
  - Issue a compute request for output of graph.[[implementation]] that is associated with key.
  - Wait for the compute request to be completed.
  - If there is an error returned by graph.[[implementation]], then:
    - Throw an "OperationError" DOMException and stop.
  - Else:
    - Let outputTensor be the output tensor returned by graph.[[implementation]].
    - If the data type of outputTensor doesn’t match the element type of ArrayBufferView value, then throw a "DataError" DOMException and stop.
    - If the byte length of outputTensor is not equal to value.[[ByteLength]], then:
      - Throw a "DataError" DOMException and stop.
    - Else:
      - Set the values of value to the values of outputTensor.
- Return undefined.
7.3.2.1. Examples
const context = navigator.ml.createContextSync();

// Build a graph with two outputs.
const builder = new MLGraphBuilder(context);
const descA = {type: 'float32', dimensions: [3, 4]};
const a = builder.input('a', descA);
const descB = {type: 'float32', dimensions: [4, 3]};
const bufferB = new Float32Array(sizeOfShape(descB.dimensions)).fill(0.5);
const b = builder.constant(descB, bufferB);
const descC = {type: 'float32', dimensions: [3, 3]};
const bufferC = new Float32Array(sizeOfShape(descC.dimensions)).fill(1);
const c = builder.constant(descC, bufferC);
const d = builder.matmul(a, b);
const e = builder.add(d, c);
const graph = builder.buildSync({'d': d, 'e': e});

const bufferA = new Float32Array(sizeOfShape(descA.dimensions)).fill(0.5);
const inputs = {'a': bufferA};

// Compute d.
const bufferD = new Float32Array(sizeOfShape([3, 3]));
context.computeSync(graph, inputs, {'d': bufferD});
console.log(`values: ${bufferD}`);

// Compute e.
const bufferE = new Float32Array(sizeOfShape([3, 3]));
context.computeSync(graph, inputs, {'e': bufferE});
console.log(`values: ${bufferE}`);
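The example above calls a `sizeOfShape()` helper that is not defined in this section. A minimal sketch, assuming it simply returns the number of elements in a tensor of the given shape (the product of its dimensions), could look like this:

```javascript
// Assumed helper: number of elements in a tensor with the given dimensions.
// An empty shape (a scalar) yields 1.
function sizeOfShape(shape) {
  return shape.reduce((size, dimension) => size * dimension, 1);
}
```

With this definition, `new Float32Array(sizeOfShape([3, 4]))` allocates room for the 12 elements of a 3 x 4 tensor.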
7.3.3. Asynchronous Execution
Asynchronously carries out the computational workload of a compiled graph MLGraph on a separate timeline, such as a CPU worker thread.
partial interface MLContext {
  Promise<undefined> compute(MLGraph graph, MLNamedArrayBufferViews inputs, MLNamedArrayBufferViews outputs);
};
Arguments:
- graph: an MLGraph. The compiled graph to be executed.
- inputs: an MLNamedArrayBufferViews. The resources of inputs.
- outputs: an MLNamedArrayBufferViews. The pre-allocated resources of required outputs.
Returns: Promise<undefined>.
- If any of the following requirements are unmet, then throw a "DataError" DOMException and stop.
  - For each key -> value of inputs:
    - graph.[[inputDescriptors]][key] must exist.
    - Let inputDesc be graph.[[inputDescriptors]][key].
    - The type of ArrayBufferView value must match inputDesc.type according to this table.
    - value.[[ByteLength]] must equal the byte length of inputDesc.
  - For each key -> value of outputs:
    - graph.[[outputDescriptors]][key] must exist.
    - Let outputDesc be graph.[[outputDescriptors]][key].
    - The type of ArrayBufferView value must match outputDesc.type according to this table.
    - value.[[ByteLength]] must equal the byte length of outputDesc.
- Let promise be a new promise.
- For each key -> value of inputs:
  - Let inputDesc be graph.[[inputDescriptors]][key].
  - Let inputTensor be a new tensor for graph.[[implementation]].
  - Set the data type of inputTensor to the one that matches the element type of ArrayBufferView value.
  - Set the dimensions of inputTensor to inputDesc.dimensions.
  - Set the values of inputTensor to the values of value.
  - Set the input of graph.[[implementation]] that is associated with key to inputTensor.
- For each key -> value of outputs:
  - Issue a compute request for output of graph.[[implementation]] that is associated with key.
  - Wait for the compute request to be completed.
  - If there is an error returned by graph.[[implementation]], then:
    - Reject promise with an "OperationError" DOMException and stop.
  - Else:
    - Let outputTensor be the output tensor returned by graph.[[implementation]].
    - Let outputDesc be graph.[[outputDescriptors]][key].
    - If the data type of outputTensor doesn't match the element type of ArrayBufferView value, then throw a "DataError" DOMException and stop.
    - If the byte length of outputTensor is not equal to the byte length of outputDesc, then:
      - Reject promise with an "OperationError" DOMException and stop.
    - Else:
      - Set the values of value to the values of outputTensor.
    - If all compute requests are completed, resolve promise and stop.
- Return promise.
7.3.4. WebGPU Interoperability
Create an MLCommandEncoder interface used to record the ML workload onto a WebGPU-compatible GPUCommandBuffer, which allows mixing ML workload with other GPU workload in an application that leverages WebGPU. This method only succeeds on an MLContext created with a GPUDevice. Otherwise, it throws an "OperationError" DOMException.
partial interface MLContext {
  MLCommandEncoder createCommandEncoder();
};
Returns: an MLCommandEncoder. The command encoder used to record ML workload on the GPU.
7.4. The MLOperandDescriptor dictionary
enum MLInputOperandLayout {
  "nchw",
  "nhwc"
};

enum MLOperandType {
  "float32",
  "float16",
  "int32",
  "uint32",
  "int8",
  "uint8"
};

dictionary MLOperandDescriptor {
  // The operand type.
  required MLOperandType type;

  // The dimensions field is only required for tensor operands.
  sequence<unsigned long> dimensions;
};
The byte length of an MLOperandDescriptor desc is the value returned by the following steps:
- Let elementLength be 1.
- For each dimension of desc.dimensions:
  - Set elementLength to elementLength × dimension.
- Let elementSize be the element size of one of the ArrayBufferView types that matches desc.type according to this table.
- Return elementLength × elementSize.
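The byte length steps can be sketched as below. The element sizes are an assumption standing in for the type table referenced above: each MLOperandType is mapped to the byte size of the typed array commonly used for it, with "float16" assumed to occupy 2 bytes. The name `operandByteLength` is hypothetical.

```javascript
// Assumed element sizes in bytes per operand type (float16 taken as 2 bytes).
const ELEMENT_SIZES = {
  float32: 4, float16: 2, int32: 4, uint32: 4, int8: 1, uint8: 1,
};

// Sketch of the "byte length" steps for an MLOperandDescriptor-like object.
function operandByteLength(desc) {
  let elementLength = 1;
  // A scalar operand has no dimensions, so its element length stays at 1.
  for (const dimension of desc.dimensions ?? []) {
    elementLength *= dimension;
  }
  const elementSize = ELEMENT_SIZES[desc.type];
  if (elementSize === undefined) {
    throw new TypeError(`Unknown operand type: ${desc.type}`);
  }
  return elementLength * elementSize;
}
```

For a descriptor `{type: 'float32', dimensions: [3, 4]}` this gives 12 elements of 4 bytes each, which is the `byteLength` a matching `Float32Array` input must have to pass the validation steps of the compute methods.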
7.5. The MLOperand interface
An MLOperand represents an intermediary graph being constructed as a result of compositing parts of an operation into a fully composed operation.
For instance, an MLOperand may represent a constant feeding to an operation or the result from combining multiple constants together into an operation. See also § 6 Programming Model.
[SecureContext, Exposed=(Window, DedicatedWorker)]
interface MLOperand {};
See also § 3.1 Guidelines for new operations
7.6. The MLActivation interface
Objects implementing the MLActivation interface represent activation function types.
[SecureContext, Exposed=(Window, DedicatedWorker)]
interface MLActivation {};
The MLActivation interface can simply be a struct that holds a string type of the activation function along with other properties needed. The actual creation of the activation function, e.g. a § 7.7.27 The sigmoid() method or § 7.7.24 The relu() method, can then be deferred until the rest of the graph is ready to connect with it, such as during the construction of § 7.7.5 The conv2d() method.
7.7. The MLGraphBuilder interface
The MLGraphBuilder interface defines a set of operations as identified by the § 2 Use cases that can be composed into a computational graph. It also represents the intermediate state of a graph building session.
typedef record<DOMString, MLOperand> MLNamedOperands;

dictionary MLBufferResourceView {
  required GPUBuffer resource;
  unsigned long long offset = 0;
  unsigned long long size;
};

typedef (ArrayBufferView or MLBufferResourceView) MLBufferView;

[SecureContext, Exposed=(Window, DedicatedWorker)]
interface MLGraphBuilder {
  // Construct the graph builder from the context.
  constructor(MLContext context);

  // Create an operand for a graph input.
  MLOperand input(DOMString name, MLOperandDescriptor desc);

  // Create an operand for a graph constant.
  MLOperand constant(MLOperandDescriptor desc, MLBufferView bufferView);

  // Create a single-value operand from the specified number of the specified type.
  MLOperand constant(double value, optional MLOperandType type = "float32");

  // Compile the graph up to the specified output operands asynchronously.
  Promise<MLGraph> build(MLNamedOperands outputs);

  // Compile the graph up to the specified output operands synchronously.
  [Exposed=(DedicatedWorker)]
  MLGraph buildSync(MLNamedOperands outputs);
};
The MLGraphBuilder.build() and MLGraphBuilder.buildSync() methods compile the graph builder state up to the specified output operands into a compiled graph according to the type of MLContext that creates it. Since this operation can be costly in some machine configurations, the calling thread of the MLGraphBuilder.buildSync() method must only be a worker thread to avoid potential disruption of the user experience. When the [[contextType]] of the MLContext is set to default, the compiled graph is initialized right before the MLGraph is returned. This graph initialization stage is important for optimal performance of the subsequent graph executions. See § 7.9.1 Graph Initialization for more detail.
MLGraphBuilder has the following internal slots:
- [[context]] of type MLContext
  - The context of type MLContext associated with this MLGraphBuilder.
7.7.1. The MLGraphBuilder constructor
The new MLGraphBuilder constructor steps are:
- If this's relevant global object's associated Document is not allowed to use the webnn feature, throw a "SecurityError" DOMException and abort these steps.
- Let context be the first argument.
- If context is not a valid MLContext, throw a "TypeError" and abort these steps.
- Set [[context]] to context.
Add an algorithm to validate MLContext. [Issue #webmachinelearning/webnn#308]
7.7.2. The batchNormalization() method
Normalize the tensor values of input features across the batch dimension using [Batch-Normalization]. For each input feature, the mean and variance values of that feature supplied in this calculation as parameters are previously computed across the batch dimension of the input during the model training phase of this operation.
dictionary MLBatchNormalizationOptions {
  MLOperand scale;
  MLOperand bias;
  long axis = 1;
  float epsilon = 1e-5;
  MLActivation activation;
};

partial interface MLGraphBuilder {
  MLOperand batchNormalization(MLOperand input, MLOperand mean, MLOperand variance,
                               optional MLBatchNormalizationOptions options = {});
};
Arguments:
- input: an MLOperand. The input N-D tensor.
- mean: an MLOperand. The 1-D tensor of the mean values of the input features across the batch whose length is equal to the size of the input dimension denoted by options.axis.
- variance: an MLOperand. The 1-D tensor of the variance values of the input features across the batch whose length is equal to the size of the input dimension denoted by options.axis.
- options: an optional MLBatchNormalizationOptions. The optional parameters of the operation.
  - scale: an MLOperand. The 1-D tensor of the scaling values whose length is equal to the size of the input dimension denoted by options.axis.
  - bias: an MLOperand. The 1-D tensor of the bias values whose length is equal to the size of the input dimension denoted by options.axis.
  - axis: a long scalar. The index of the feature count dimension of the input shape to which the mean and variance values apply. When it's not specified, the default value is 1.
  - epsilon: a float scalar. A small value to prevent computational error due to divide-by-zero. The default value is 0.00001 when not specified.
  - activation: an MLActivation. The optional activation function that immediately follows the normalization operation.
Returns: an MLOperand. The batch-normalized N-D tensor of the same shape as the input tensor.
When input is a 4-D tensor of the "nchw" or "nhwc" layout, options.axis should be set to 1 or 3 respectively. The axis value designates the feature or channel count dimension of the input tensor.
const shape = [1, -1, 1, 1];
return builder.relu(
  builder.add(
    builder.mul(
      builder.reshape(options.scale, shape),
      builder.div(
        builder.sub(input, builder.reshape(mean, shape)),
        builder.pow(
          builder.add(builder.reshape(variance, shape), builder.constant(options.epsilon)),
          builder.constant(0.5))
      )),
    builder.reshape(options.bias, shape)));
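To make the arithmetic of the emulation above concrete, the following sketch evaluates the same expression for a single value in plain JavaScript, without the trailing activation. The function name `batchNormalizeValue` and its option defaults (a scale of 1 and a bias of 0 when omitted) are assumptions made for illustration only.

```javascript
// Per-element arithmetic of the emulation:
// scale * (x - mean) / sqrt(variance + epsilon) + bias
function batchNormalizeValue(x, mean, variance,
                             { scale = 1, bias = 0, epsilon = 1e-5 } = {}) {
  return scale * (x - mean) / Math.sqrt(variance + epsilon) + bias;
}
```

For instance, a value of 3 for a feature with mean 1 and variance 4 normalizes to roughly 1, since (3 - 1) / sqrt(4) = 1 before the small epsilon correction.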
7.7.3. The clamp() method
Clamp the input tensor element-wise within a range specified by the minimum and maximum values.
dictionary MLClampOptions {
  float minValue;
  float maxValue;
};

partial interface MLGraphBuilder {
  MLOperand clamp(MLOperand x, optional MLClampOptions options = {});
  MLActivation clamp(optional MLClampOptions options = {});
};
Arguments:
- x: an MLOperand. The input tensor.
- options: an optional MLClampOptions. The optional parameters of the operation.
  - minValue: a float scalar. Specifies the minimum value of the range. When it is not specified, the clamping is not performed on the lower limit of the range.
  - maxValue: a float scalar. Specifies the maximum value of the range. When it is not specified, the clamping is not performed on the upper limit of the range.
Returns:
- an MLOperand. The output tensor of the same shape as x.
- an MLActivation. The activation function representing the clamp operation.
if (options.minValue === undefined) {
  if (options.maxValue === undefined) {
    return x;
  } else {
    return builder.min(x, builder.constant(options.maxValue));
  }
} else {
  if (options.maxValue === undefined) {
    return builder.max(x, builder.constant(options.minValue));
  } else {
    return builder.min(
      builder.max(x, builder.constant(options.minValue)),
      builder.constant(options.maxValue));
  }
}
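The same four cases can be illustrated on an ordinary array of numbers. This is a sketch with a hypothetical `clampValues` helper, not part of the API; it only mirrors how an omitted bound leaves that side of the range unclamped.

```javascript
// Clamp each value to [minValue, maxValue]; an undefined bound is skipped.
function clampValues(values, { minValue, maxValue } = {}) {
  return values.map((value) => {
    if (minValue !== undefined) value = Math.max(value, minValue);
    if (maxValue !== undefined) value = Math.min(value, maxValue);
    return value;
  });
}
```

For example, `clampValues([-2, 0.5, 3], {minValue: 0, maxValue: 1})` limits both ends, while passing only `maxValue` leaves negative values untouched.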
7.7.4. The concat() method
Concatenates the input tensors along a given axis.
partial interface MLGraphBuilder {
  MLOperand concat(sequence<MLOperand> inputs, long axis);
};
Arguments:
- inputs: a sequence of MLOperand. All input tensors must have the same shape, except for the size of the dimension to concatenate on.
- axis: a long scalar. The axis that the inputs concatenate along, with the value in the interval [0, N) where N is the rank of all the inputs.
Returns: an MLOperand. The concatenated tensor of all the inputs along the axis. The output tensor has the same shape as the inputs except on the dimension along which the inputs are concatenated. The size of that dimension is computed as the sum of all the input sizes of the same dimension.
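The output shape rule for concat() can be sketched as a small shape calculation. The helper name `concatShape` is hypothetical, and the error types thrown here are an illustrative choice rather than behavior defined by this specification.

```javascript
// Compute the shape of concatenating tensors with the given shapes along axis.
function concatShape(shapes, axis) {
  const rank = shapes[0].length;
  if (!Number.isInteger(axis) || axis < 0 || axis >= rank) {
    throw new RangeError(`axis must be in the interval [0, ${rank})`);
  }
  const output = [...shapes[0]];
  for (const shape of shapes.slice(1)) {
    if (shape.length !== rank) throw new TypeError('All inputs must have the same rank');
    for (let i = 0; i < rank; ++i) {
      if (i === axis) {
        output[i] += shape[i];        // sizes add up along the concat axis
      } else if (shape[i] !== output[i]) {
        throw new TypeError('Input shapes must match except on the concat axis');
      }
    }
  }
  return output;
}
```

Concatenating tensors of shape [2, 3] and [2, 5] along axis 1 therefore yields a tensor of shape [2, 8].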
7.7.5. The conv2d() method
Compute a 2-D convolution given 4-D input and filter tensors.
enum MLConv2dFilterOperandLayout {
  "oihw",
  "hwio",
  "ohwi",
  "ihwo"
};

enum MLAutoPad {
  "explicit",
  "same-upper",
  "same-lower"
};

dictionary MLConv2dOptions {
  sequence<unsigned long> padding;
  sequence<unsigned long> strides;
  sequence<unsigned long> dilations;
  MLAutoPad autoPad = "explicit";
  unsigned long groups = 1;
  MLInputOperandLayout inputLayout = "nchw";
  MLConv2dFilterOperandLayout filterLayout = "oihw";
  MLOperand bias;
  MLActivation activation;
};

partial interface MLGraphBuilder {
  MLOperand conv2d(MLOperand input, MLOperand filter, optional MLConv2dOptions options = {});
};
Arguments:
- input: an MLOperand. The input 4-D tensor. The logical shape is interpreted according to the value of options.inputLayout.
- filter: an MLOperand. The filter 4-D tensor. The logical shape is interpreted according to the value of options.filterLayout and options.groups.
- options: an optional MLConv2dOptions. The optional parameters of the operation.
  - padding: a sequence of unsigned long of length 4. The additional rows and columns added to the beginning and ending of each spatial dimension of input, [beginning_height, ending_height, beginning_width, ending_width]. If not present, the values are assumed to be [0,0,0,0].
  - strides: a sequence of unsigned long of length 2. The stride of the sliding window for each spatial dimension of input, [stride_height, stride_width]. If not present, the values are assumed to be [1,1].
  - dilations: a sequence of unsigned long of length 2. The dilation factor for each spatial dimension of input, [dilation_height, dilation_width]. If not present, the values are assumed to be [1,1].
  - autoPad: an MLAutoPad. The automatic input padding options. By default, this argument is set to "explicit", which means that the values in the options.padding array should be used for input padding. When the option is set to a value other than "explicit", the values in the options.padding array are ignored. With the "same-upper" option, the padding values are automatically computed such that the additional ending padding of the spatial input dimensions would allow all of the input values in the corresponding dimension to be filtered. The "same-lower" option is similar but padding is applied to the beginning padding of the spatial input dimensions instead of the ending one.
  - groups: an unsigned long scalar. The number of groups that input channels and output channels are divided into, default to 1.
  - inputLayout: an MLInputOperandLayout. The default value is "nchw". This option specifies the layout format of the input and output tensor as follows:
    "nchw":
    - input tensor: [batches, input_channels, height, width]
    - output tensor: [batches, output_channels, height, width]
    "nhwc":
    - input tensor: [batches, height, width, input_channels]
    - output tensor: [batches, height, width, output_channels]
  - filterLayout: an MLConv2dFilterOperandLayout. The default value is "oihw". This option specifies the layout format of the filter tensor as follows:
    "oihw":
    - [output_channels, input_channels/groups, height, width]
    "hwio":
    - [height, width, input_channels/groups, output_channels]
    "ohwi":
    - [output_channels, height, width, input_channels/groups]
    "ihwo":
    - [input_channels/groups, height, width, output_channels]
  - bias: an MLOperand. The additional 1-D tensor with the shape of [output_channels] whose values are to be added to the convolution result.
  - activation: an MLActivation. The optional activation function that immediately follows the convolution operation.
Returns: an MLOperand. The output 4-D tensor that contains the convolution result. The output shape is interpreted according to the options.inputLayout value. More specifically, the spatial dimensions or the sizes of the last two dimensions of the output tensor for the nchw input layout can be calculated as follows:
output size = 1 + (input size - filter size - (filter size - 1) * (dilation - 1) + beginning padding + ending padding) / stride
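The output size formula above can be evaluated for one spatial dimension as sketched below. The helper name `conv2dOutputSize` and its option names are hypothetical, and rounding the division down with `Math.floor` is an assumption, since the formula does not state how a non-integer quotient is handled.

```javascript
// Output size of one spatial dimension of conv2d, following the formula above.
function conv2dOutputSize(inputSize, filterSize, {
  stride = 1, dilation = 1, beginningPadding = 0, endingPadding = 0,
} = {}) {
  // filter size + (filter size - 1) * (dilation - 1): the dilated filter extent.
  const effectiveFilterSize = filterSize + (filterSize - 1) * (dilation - 1);
  // Assumption: the division is rounded down.
  return 1 + Math.floor(
    (inputSize - effectiveFilterSize + beginningPadding + endingPadding) / stride);
}
```

A 5-wide input with a 3-wide filter, unit stride and no padding gives an output width of 3; padding each side by 1 restores the width to 5.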
7.7.6. The convTranspose2d() method
Compute a 2-D transposed convolution given 4-D input and filter tensors.
enum MLConvTranspose2dFilterOperandLayout {
  "iohw",
  "hwoi",
  "ohwi"
};

dictionary MLConvTranspose2dOptions {
  sequence<unsigned long> padding;
  sequence<unsigned long> strides;
  sequence<unsigned long> dilations;
  sequence<unsigned long> outputPadding;
  sequence<unsigned long> outputSizes;
  MLAutoPad autoPad = "explicit";
  unsigned long groups = 1;
  MLInputOperandLayout inputLayout = "nchw";
  MLConvTranspose2dFilterOperandLayout filterLayout = "iohw";
  MLOperand bias;
  MLActivation activation;
};

partial interface MLGraphBuilder {
  MLOperand convTranspose2d(MLOperand input, MLOperand filter,
                            optional MLConvTranspose2dOptions options = {});
};
Arguments:
- input: an MLOperand. The input 4-D tensor. The logical shape is interpreted according to the value of options.inputLayout.
- filter: an MLOperand. The filter 4-D tensor. The logical shape is interpreted according to the value of options.filterLayout and options.groups.
- options: an optional MLConvTranspose2dOptions. The optional parameters of the operation.
  - padding: a sequence of unsigned long of length 4. The additional rows and columns added to the beginning and ending of each spatial dimension of input, [beginning_height, ending_height, beginning_width, ending_width]. If not present, the values are assumed to be [0,0,0,0].
  - strides: a sequence of unsigned long of length 2. The stride of the sliding window for each spatial dimension of input, [stride_height, stride_width]. If not present, the values are assumed to be [1,1].
  - dilations: a sequence of unsigned long of length 2. The dilation factor for each spatial dimension of input, [dilation_height, dilation_width]. If not present, the values are assumed to be [1,1].
  - outputPadding: a sequence of unsigned long of length 2. The padding values applied to each spatial dimension of the output tensor. These explicit padding values are needed to disambiguate the output tensor shape for transposed convolution when the value of options.strides is greater than 1. Note that these values are only used to disambiguate the output shape when needed; they do not necessarily cause any padding value to be written to the output tensor. If not specified, the values are assumed to be [0,0].
  - outputSizes: a sequence of unsigned long of length 2. The sizes of the last two dimensions of the output tensor. When the output sizes are explicitly specified, the output padding values in options.outputPadding are ignored. If not specified, the output sizes are automatically computed.
  - autoPad: an MLAutoPad. The automatic input padding options. By default, this argument is set to "explicit", which means that the values in the options.padding array should be used for input padding. When the option is set to a value other than "explicit", the values in the options.padding array are ignored. With the "same-upper" option, the padding values are automatically computed such that the additional ending padding of the spatial input dimensions would allow all of the input values in the corresponding dimension to be filtered. The "same-lower" option is similar but padding is applied to the beginning padding of the spatial input dimensions instead of the ending one.
  - groups: an unsigned long scalar. The number of groups that input channels and output channels are divided into, default to 1.
  - inputLayout: an MLInputOperandLayout. The default value is "nchw". This option specifies the layout format of the input and output tensor as follows:
    "nchw":
    - input tensor: [batches, input_channels, height, width]
    - output tensor: [batches, output_channels, height, width]
    "nhwc":
    - input tensor: [batches, height, width, input_channels]
    - output tensor: [batches, height, width, output_channels]
  - filterLayout: an MLConvTranspose2dFilterOperandLayout. The default value is "iohw". This option specifies the layout format of the filter tensor as follows:
    "iohw":
    - [input_channels, output_channels/groups, height, width]
    "hwoi":
    - [height, width, output_channels/groups, input_channels]
    "ohwi":
    - [output_channels/groups, height, width, input_channels]
  - bias: an MLOperand. The additional 1-D tensor with the shape of [output_channels] whose values are to be added to the transposed convolution result.
  - activation: an MLActivation. The optional activation function that immediately follows the transposed convolution operation.
Returns: an MLOperand. The output 4-D tensor that contains the transposed convolution result. The output shape is interpreted according to the options.inputLayout value. More specifically, unless the options.outputSizes values are explicitly specified, the options.outputPadding may be needed to compute the spatial dimension values of the output tensor as follows:
output size = (input size - 1) * stride + filter size + (filter size - 1) * (dilation - 1) - beginning padding - ending padding + output padding
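As a worked illustration of the formula above, the following sketch computes the output size of one spatial dimension. The helper name `convTranspose2dOutputSize` and its option names are hypothetical.

```javascript
// Output size of one spatial dimension of convTranspose2d, per the formula above.
function convTranspose2dOutputSize(inputSize, filterSize, {
  stride = 1, dilation = 1, beginningPadding = 0, endingPadding = 0, outputPadding = 0,
} = {}) {
  // filter size + (filter size - 1) * (dilation - 1): the dilated filter extent.
  const effectiveFilterSize = filterSize + (filterSize - 1) * (dilation - 1);
  return (inputSize - 1) * stride + effectiveFilterSize
      - beginningPadding - endingPadding + outputPadding;
}
```

With a stride of 2, a 3-wide input and a 3-wide filter produce a 7-wide output; an output padding of 1 selects the 8-wide alternative, which is the kind of ambiguity options.outputPadding is meant to resolve.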
7.7.7. Element-wise binary operations
Compute the element-wise binary addition, subtraction, multiplication, division, maximum, minimum and power of the two input tensors.
partial interface MLGraphBuilder {
  MLOperand add(MLOperand a, MLOperand b);
  MLOperand sub(MLOperand a, MLOperand b);
  MLOperand mul(MLOperand a, MLOperand b);
  MLOperand div(MLOperand a, MLOperand b);
  MLOperand max(MLOperand a, MLOperand b);
  MLOperand min(MLOperand a, MLOperand b);
  MLOperand pow(MLOperand a, MLOperand b);
};
Returns: an MLOperand. The output tensor that contains the result of element-wise binary operation of the two input tensors.
The element-wise binary operation will be broadcasted according to [numpy-broadcasting-rule]. The rank of the output tensor is the maximum rank of the input tensors. For each dimension of the output tensor, its size is the maximum size along that dimension of the input tensors.
Operation types:
- add: Add the values of the two input tensors, element-wise.
- sub: Subtract the values of the second input tensor from the values of the first input tensor, element-wise.
- mul: Multiply the values of the two input tensors, element-wise.
- div: Divide the values of the first input tensor by the values of the second tensor, element-wise.
- max: Select the greater values of the two input tensors, element-wise.
- min: Select the lesser values of the two input tensors, element-wise.
- pow: Compute the values of the first input tensor to the power of the values of the second input tensor, element-wise.
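The output shape implied by the broadcasting rule can be sketched as follows. This is an illustrative helper with a hypothetical name, `broadcastShape`, written under the usual NumPy-style reading of the rule: shapes are aligned from the trailing dimension, and two sizes are compatible when they are equal or one of them is 1.

```javascript
// Compute the broadcast output shape of two input shapes.
function broadcastShape(shapeA, shapeB) {
  const rank = Math.max(shapeA.length, shapeB.length);
  const output = new Array(rank);
  for (let i = 0; i < rank; ++i) {
    // Walk from the last dimension; a missing leading dimension counts as 1.
    const dimA = shapeA[shapeA.length - 1 - i] ?? 1;
    const dimB = shapeB[shapeB.length - 1 - i] ?? 1;
    if (dimA !== dimB && dimA !== 1 && dimB !== 1) {
      throw new TypeError(`Shapes [${shapeA}] and [${shapeB}] are not broadcastable`);
    }
    output[rank - 1 - i] = Math.max(dimA, dimB);
  }
  return output;
}
```

For example, adding a tensor of shape [3, 1, 5] to one of shape [4, 5] produces an output of shape [3, 4, 5], whose rank is the maximum rank of the two inputs.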
7.7.8. Element-wise unary operations
Compute the element-wise unary operation for the input tensor.
partial interface MLGraphBuilder {
  MLOperand abs(MLOperand x);
  MLOperand ceil(MLOperand x);
  MLOperand cos(MLOperand x);
  MLOperand exp(MLOperand x);
  MLOperand floor(MLOperand x);
  MLOperand log(MLOperand x);
  MLOperand neg(MLOperand x);
  MLOperand sin(MLOperand x);
  MLOperand tan(MLOperand x);
};
Arguments:
- x: an MLOperand. The input tensor.
Returns: an MLOperand. The output tensor that contains the result of element-wise unary operation of the input tensor. The shape of the output tensor is the same as the shape of the input tensor.
Operation types:
- abs: Compute the absolute value of the input tensor, element-wise.
- ceil: Compute the ceiling of the input tensor, element-wise.
- cos: Compute the cosine of the input tensor, element-wise.
- exp: Compute the exponential of the input tensor, element-wise.
- floor: Compute the floor of the input tensor, element-wise.
- log: Compute the natural logarithm of the input tensor, element-wise.
- neg: Compute the numerical negative value of the input tensor, element-wise.
- sin: Compute the sine of the input tensor, element-wise.
- tan: Compute the tangent of the input tensor, element-wise.
7.7.9. The elu() method
Calculate the exponential linear unit function on the input tensor element-wise. The calculation follows the expression max(0, x) + alpha * (exp(min(0, x)) - 1).
dictionary MLEluOptions {
  float alpha = 1;
};

partial interface MLGraphBuilder {
  MLOperand elu(MLOperand x, optional MLEluOptions options = {});
  MLActivation elu(optional MLEluOptions options = {});
};
Arguments:
- x: an MLOperand. The input tensor.
- options: an optional MLEluOptions. The optional parameters of the operation.
  - alpha: a float scalar multiplier, default to 1.
Returns:
- an MLOperand. The output tensor of the same shape as x.
- an MLActivation. The activation function representing the elu operation.
return builder.add(
  builder.max(builder.constant(0), x),
  builder.mul(
    builder.constant(options.alpha),
    builder.sub(
      builder.exp(builder.min(builder.constant(0), x)),
      builder.constant(1))));
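For a single number, the expression max(0, x) + alpha * (exp(min(0, x)) - 1) can be evaluated directly. The sketch below uses a hypothetical `eluValue` helper to show how the two terms split the positive and negative ranges.

```javascript
// Scalar form of the ELU expression given above.
function eluValue(x, alpha = 1) {
  return Math.max(0, x) + alpha * (Math.exp(Math.min(0, x)) - 1);
}
```

For x >= 0 the second term is alpha * (exp(0) - 1) = 0, so the function returns x unchanged; for x < 0 the first term is 0 and the result approaches -alpha as x decreases.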
7.7.10. The gemm() method
Calculate the general matrix multiplication of the Basic Linear Algebra Subprograms. The calculation follows the expression alpha * A * B + beta * C, where A is a 2-D tensor with shape [M, K] or [K, M], B is a 2-D tensor with shape [K, N] or [N, K], and C is broadcastable to the shape [M, N]. A and B may optionally be transposed prior to the calculation.
dictionary MLGemmOptions {
  MLOperand c;
  float alpha = 1.0;
  float beta = 1.0;
  boolean aTranspose = false;
  boolean bTranspose = false;
};

partial interface MLGraphBuilder {
  MLOperand gemm(MLOperand a, MLOperand b, optional MLGemmOptions options = {});
};
Arguments:
- a: an MLOperand. The first input 2-D tensor with shape [M, K] if aTranspose is false, or [K, M] if aTranspose is true.
- b: an MLOperand. The second input 2-D tensor with shape [K, N] if bTranspose is false, or [N, K] if bTranspose is true.
- options: an optional MLGemmOptions. The optional parameters of the operation.
  - c: an MLOperand. The third input tensor. It is either a scalar, or of the shape that is unidirectionally broadcastable to the shape [M, N] according to [numpy-broadcasting-rule]. When it is not specified, the computation is done as if c is a scalar 0.0.
  - alpha: a float scalar multiplier for the first input, default to 1.0.
  - beta: a float scalar multiplier for the third input, default to 1.0.
  - aTranspose: a boolean indicating if the first input should be transposed prior to calculating the output, default to false.
  - bTranspose: a boolean indicating if the second input should be transposed prior to calculating the output, default to false.
Returns: an MLOperand. The output 2-D tensor of shape [M, N] that contains the calculated product of all the inputs.
if (options.aTranspose)
  a = builder.transpose(a);

if (options.bTranspose)
  b = builder.transpose(b);

let ab = builder.matmul(builder.mul(builder.constant(options.alpha), a), b);
return (options.c ? builder.add(ab, builder.mul(builder.constant(options.beta), options.c)) : ab);
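The expression alpha * A * B + beta * C can also be worked through on nested arrays. The sketch below is only a numeric illustration: `gemmArrays` and `transpose2d` are hypothetical helpers, matrices are arrays of rows, and for brevity c is restricted to a scalar (the simplest broadcastable case).

```javascript
// Transpose a matrix stored as an array of rows.
function transpose2d(matrix) {
  return matrix[0].map((_, col) => matrix.map((row) => row[col]));
}

// alpha * A * B + beta * c on nested arrays; c is a scalar in this sketch.
function gemmArrays(a, b, {
  c, alpha = 1, beta = 1, aTranspose = false, bTranspose = false,
} = {}) {
  if (aTranspose) a = transpose2d(a);
  if (bTranspose) b = transpose2d(b);
  const M = a.length, K = b.length, N = b[0].length;
  const output = [];
  for (let i = 0; i < M; ++i) {
    const row = [];
    for (let j = 0; j < N; ++j) {
      let sum = 0;
      for (let k = 0; k < K; ++k) sum += a[i][k] * b[k][j];
      // An omitted c behaves like a scalar 0.
      row.push(alpha * sum + (c === undefined ? 0 : beta * c));
    }
    output.push(row);
  }
  return output;
}
```

Calling `gemmArrays([[1, 2], [3, 4]], [[5, 6], [7, 8]])` yields the plain matrix product, and supplying `alpha`, `beta` and `c` scales and offsets every element of that product.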
7.7.11. The gru() method
Gated Recurrent Unit [GRU] recurrent network uses an update, reset, and new gate to compute the output state that rolls into the output across the temporal sequence of the network.enum {
MLGruWeightLayout , // update-reset-new gate ordering
"zrn" // reset-update-new gate ordering };
"rzn" enum {
MLRecurrentNetworkDirection ,
"forward" ,
"backward" };
"both" dictionary {
MLGruOptions MLOperand ;
bias MLOperand ;
recurrentBias MLOperand ;
initialHiddenState boolean =
resetAfter true ;boolean =
returnSequence false ;MLRecurrentNetworkDirection = "forward";
direction MLGruWeightLayout = "zrn";
layout sequence <MLActivation >; };
activations partial interface MLGraphBuilder {sequence <MLOperand >(
gru MLOperand ,
input MLOperand ,
weight MLOperand ,
recurrentWeight unsigned long ,
steps unsigned long ,
hiddenSize optional MLGruOptions = {}); };
options
-
input : an
MLOperand
. The input 3-D tensor of shape [steps, batch_size, input_size]. -
weight : an
MLOperand
. The 3-D input weight tensor of shape [num_directions, 3 * hidden_size, input_size]. The ordering of the weight vectors in the second dimension of the tensor shape is specified according to the options.layout argument. -
recurrentWeight : an
MLOperand
. The 3-D recurrent weight tensor of shape [num_directions, 3 * hidden_size, hidden_size]. The ordering of the weight vectors in the second dimension of the tensor shape is specified according to the options.layout argument. -
steps : an
unsigned long
scalar. The number of time steps in the recurrent network. The value must be greater than 0. -
hiddenSize : an
unsigned long
scalar. The value of the third dimension of the cell output tensor shape. It indicates the number of features in the hidden state. -
options : an optional
MLGruOptions
. The optional parameters of the operation.-
bias : an
MLOperand
. The 2-D input bias tensor of shape [num_directions, 3 * hidden_size]. The ordering of the bias vectors in the second dimension of the tensor shape is specified according to the options.layout argument. -
recurrentBias : an
MLOperand
. The 2-D recurrent bias tensor of shape [num_directions, 3 * hidden_size]. The ordering of the bias vectors in the second dimension of the tensor shape is specified according to the options.layout argument. -
initialHiddenState : an
MLOperand
. The 3-D initial hidden state tensor of shape [num_directions, batch_size, hidden_size]. When not specified, it’s assumed to be a tensor filled with zero. -
resetAfter : a
boolean
indicating whether to apply the reset gate after or before matrix multiplication. Default to true. -
returnSequence : a
boolean
indicating whether to also return the entire sequence with every output from each time step in it in addition to the output of the last time step. Default to false. -
direction : an
MLRecurrentNetworkDirection
. The processing direction of the input sequence. When set to "both" , the size of the first dimension of the weight and the bias tensor shapes must be 2, and the input is processed in both directions. -
layout : an
MLGruWeightLayout
. The ordering of the weight and bias vectors for the internal gates of GRU, specifically the update (z) , reset (r) , and new (n) gate, as indicated in the second dimension of the weight and bias tensor shape. When not specified, the default layout is "zrn" . -
activations : a sequence of
MLActivation
. A pair of activation functions with the first function used for the update and reset gate, and the second used for the new gate. When not specified, it’s assumed to be the sigmoid ( "sigmoid" ) and the hyperbolic tangent ( "tanh" ) function respectively.
-
Returns: a sequence of MLOperand. The first element of the sequence is a 3-D tensor of shape [num_directions, batch_size, hidden_size], the cell output from the last time step of the network. Additionally, if options.returnSequence is set to true, the second element is the 4-D output tensor of shape [steps, num_directions, batch_size, hidden_size] containing the cell output from every time step in the temporal sequence.
const numDirections = (options.direction == "both" ? 2 : 1);
let hiddenState = options.initialHiddenState;

if (!hiddenState) {
  const desc = { type: 'float32', dimensions: [numDirections, 1, hiddenSize] };
  const totalSize = numDirections * hiddenSize;
  hiddenState = builder.constant(desc, new Float32Array(totalSize).fill(0));
}

let sequence = null;
let currentWeight = [];
let currentRecurrentWeight = [];
let currentBias = [];
let currentRecurrentBias = [];

for (let dir = 0; dir < numDirections; ++dir) {
  currentWeight.push(builder.squeeze(builder.slice(weight, [dir, 0, 0], [1, -1, -1]), { axes: [0] }));
  currentRecurrentWeight.push(builder.squeeze(builder.slice(recurrentWeight, [dir, 0, 0], [1, -1, -1]), { axes: [0] }));
  currentBias.push(options.bias ? (builder.squeeze(builder.slice(options.bias, [dir, 0], [1, -1]), { axes: [0] })) : null);
  currentRecurrentBias.push(options.recurrentBias ? (builder.squeeze(builder.slice(options.recurrentBias, [dir, 0], [1, -1]), { axes: [0] })) : null);
}

for (let step = 0; step < steps; ++step) {
  let currentHidden = [];
  let currentOutput = null;

  for (let dir = 0; dir < numDirections; ++dir) {
    currentHidden.push(builder.squeeze(builder.slice(hiddenState, [dir, 0, 0], [1, -1, -1]), { axes: [0] }));
  }

  for (let dir = 0; dir < numDirections; ++dir) {
    let slice = (dir == 1 || options.direction == "backward" ? steps - step - 1 : step);
    let currentInput = builder.squeeze(builder.slice(input, [slice, 0, 0], [1, -1, -1]), { axes: [0] });

    let result = builder.reshape(
      builder.gruCell(
        currentInput, currentWeight[dir], currentRecurrentWeight[dir],
        currentHidden[dir], hiddenSize, {
          bias: currentBias[dir], recurrentBias: currentRecurrentBias[dir],
          resetAfter: options.resetAfter, layout: options.layout,
          activations: options.activations }),
      [1, -1, hiddenSize]);

    currentOutput = (currentOutput ? builder.concat([currentOutput, result], 0) : result);
  }

  hiddenState = currentOutput;

  if (options.returnSequence) {
    currentOutput = builder.reshape(currentOutput, [1, numDirections, -1, hiddenSize]);
    sequence = (sequence ? builder.concat([sequence, currentOutput], 0) : currentOutput);
  }
}

return (sequence ? [hiddenState, sequence] : [hiddenState]);
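As a rough illustration of the output shapes described for gru(), the following plain-JavaScript sketch derives them from the call arguments. The helper name gruOutputShapes and its parameter object are hypothetical and not part of the API; batchSize is assumed to be known from the second dimension of the input tensor.

```javascript
// Hypothetical helper (not part of the WebNN API): computes the shapes of the
// operands returned by gru(), following the "Returns" description.
function gruOutputShapes({ steps, batchSize, hiddenSize, direction = "forward", returnSequence = false }) {
  const numDirections = direction === "both" ? 2 : 1;
  // First element: the cell output from the last time step.
  const shapes = [[numDirections, batchSize, hiddenSize]];
  if (returnSequence) {
    // Optional second element: the cell outputs from every time step.
    shapes.push([steps, numDirections, batchSize, hiddenSize]);
  }
  return shapes;
}
```

For instance, a bidirectional network with returnSequence enabled yields two shapes, while the default options yield only the last-step shape.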
7.7.12. The gruCell() method
A single time step of the Gated Recurrent Unit [GRU] recurrent network using an update gate and a reset gate to compute the hidden state that rolls into the output across the temporal sequence of a recurrent network.

dictionary MLGruCellOptions {
  MLOperand bias;
  MLOperand recurrentBias;
  boolean resetAfter = true;
  MLGruWeightLayout layout = "zrn";
  sequence<MLActivation> activations;
};

partial interface MLGraphBuilder {
  MLOperand gruCell(MLOperand input, MLOperand weight, MLOperand recurrentWeight,
                    MLOperand hiddenState, unsigned long hiddenSize,
                    optional MLGruCellOptions options = {});
};
-
input : an
MLOperand
. The input 2-D tensor of shape [batch_size, input_size]. -
weight : an
MLOperand
. The 2-D input weight tensor of shape [3 * hidden_size, input_size]. The ordering of the weight vectors in the first dimension of the tensor shape is specified according to the options.layout argument. -
recurrentWeight : an
MLOperand
. The 2-D recurrent weight tensor of shape [3 * hidden_size, hidden_size]. The ordering of the weight vectors in the first dimension of the tensor shape is specified according to the options.layout argument. -
hiddenState : an
MLOperand
. The 2-D input hidden state tensor of shape [batch_size, hidden_size]. -
hiddenSize : an
unsigned long
scalar. The value of the second dimension of the output tensor shape. It indicates the number of features in the hidden state. -
options : an optional
MLGruCellOptions
. The optional parameters of the operation.-
bias : an
MLOperand
. The 1-D input bias tensor of shape [3 * hidden_size]. The ordering of the bias vectors in the first dimension of the tensor shape is specified according to the options.layout argument. -
recurrentBias : an
MLOperand
. The 1-D recurrent bias tensor of shape [3 * hidden_size]. The ordering of the bias vectors in the first dimension of the tensor shape is specified according to the options.layout argument. -
resetAfter : a
boolean
indicating whether to apply the reset gate after or before matrix multiplication. Default to true. -
layout : an
MLGruWeightLayout
. The ordering of the weight and bias vectors for the internal gates of GRU, specifically the update (z) , reset (r) , and new (n) gate, as indicated in the first dimension of the weight and bias tensor shapes. When not specified, the default layout is "zrn" . -
activations : a sequence of
MLActivation
. A pair of activation functions with the first function used for the update (z) and reset (r) gate, and the second used for the new (n) gate. When not specified, they default to the sigmoid ( "sigmoid" ) and the hyperbolic tangent ( "tanh" ) function respectively.
-
Returns: an MLOperand. The 2-D tensor of shape [batch_size, hidden_size], the cell output hidden state of a single time step of the recurrent network.
const one = builder.constant(1);
const zero = builder.constant(0);

// update gate (z)
let z = builder.sigmoid(
  builder.add(
    builder.add(
      (options.bias ? builder.slice(options.bias, [0], [hiddenSize]) : zero),
      (options.recurrentBias ? builder.slice(options.recurrentBias, [0], [hiddenSize]) : zero)
    ),
    builder.add(
      builder.matmul(
        input,
        builder.transpose(builder.slice(weight, [0, 0], [hiddenSize, -1]))
      ),
      builder.matmul(
        hiddenState,
        builder.transpose(builder.slice(recurrentWeight, [0, 0], [hiddenSize, -1]))
      )
    )
  )
);

// reset gate (r)
let r = builder.sigmoid(
  builder.add(
    builder.add(
      (options.bias ? builder.slice(options.bias, [hiddenSize], [hiddenSize]) : zero),
      (options.recurrentBias ? builder.slice(options.recurrentBias, [hiddenSize], [hiddenSize]) : zero)
    ),
    builder.add(
      builder.matmul(
        input,
        builder.transpose(builder.slice(weight, [hiddenSize, 0], [hiddenSize, -1]))
      ),
      builder.matmul(
        hiddenState,
        builder.transpose(builder.slice(recurrentWeight, [hiddenSize, 0], [hiddenSize, -1]))
      )
    )
  )
);

// new gate (n)
let n;
if (options.resetAfter) {
  n = builder.tanh(
    builder.add(
      (options.bias ? builder.slice(options.bias, [2 * hiddenSize], [hiddenSize]) : zero),
      builder.add(
        builder.matmul(
          input,
          builder.transpose(builder.slice(weight, [2 * hiddenSize, 0], [hiddenSize, -1]))
        ),
        builder.mul(
          r,
          builder.add(
            (options.recurrentBias ? builder.slice(options.recurrentBias, [2 * hiddenSize], [hiddenSize]) : zero),
            builder.matmul(
              hiddenState,
              builder.transpose(builder.slice(recurrentWeight, [2 * hiddenSize, 0], [hiddenSize, -1]))
            )
          )
        )
      )
    )
  );
} else {
  n = builder.tanh(
    builder.add(
      builder.add(
        (options.bias ? builder.slice(options.bias, [2 * hiddenSize], [hiddenSize]) : zero),
        (options.recurrentBias ? builder.slice(options.recurrentBias, [2 * hiddenSize], [hiddenSize]) : zero)
      ),
      builder.add(
        builder.matmul(
          input,
          builder.transpose(builder.slice(weight, [2 * hiddenSize, 0], [hiddenSize, -1]))
        ),
        builder.matmul(
          builder.mul(r, hiddenState),
          builder.transpose(builder.slice(recurrentWeight, [2 * hiddenSize, 0], [hiddenSize, -1]))
        )
      )
    )
  );
}

// compute the new hidden state
return builder.add(builder.mul(z, hiddenState), builder.mul(n, builder.sub(one, z)));
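To make the gate arithmetic easier to check by hand, here is a minimal plain-JavaScript sketch of one GRU step on ordinary arrays. It assumes a batch size of 1, the default "zrn" layout, and the default sigmoid and tanh activations. The names gruCellReference and matVec are hypothetical helpers and not part of the API.

```javascript
// Hypothetical plain-JavaScript reference for a single GRU step
// (batch size 1, "zrn" layout, sigmoid/tanh activations).
const sigmoid = (v) => 1 / (1 + Math.exp(-v));

// Multiplies rows [start, start + count) of matrix m by vector v.
function matVec(m, v, start, count) {
  return m.slice(start, start + count)
    .map((row) => row.reduce((acc, w, k) => acc + w * v[k], 0));
}

function gruCellReference(x, h, weight, recurrentWeight, hiddenSize, options = {}) {
  const { bias, recurrentBias, resetAfter = true } = options;
  const at = (t, j) => (t ? t[j] : 0); // a missing bias contributes 0
  const H = hiddenSize;

  // update gate (z): rows [0, H) of the packed tensors
  const zx = matVec(weight, x, 0, H), zh = matVec(recurrentWeight, h, 0, H);
  const z = zx.map((v, j) => sigmoid(v + zh[j] + at(bias, j) + at(recurrentBias, j)));

  // reset gate (r): rows [H, 2H)
  const rx = matVec(weight, x, H, H), rh = matVec(recurrentWeight, h, H, H);
  const r = rx.map((v, j) => sigmoid(v + rh[j] + at(bias, H + j) + at(recurrentBias, H + j)));

  // new gate (n): rows [2H, 3H)
  const nx = matVec(weight, x, 2 * H, H);
  let n;
  if (resetAfter) {
    // reset gate applied after the recurrent matrix multiplication
    const nh = matVec(recurrentWeight, h, 2 * H, H);
    n = nx.map((v, j) => Math.tanh(
      v + at(bias, 2 * H + j) + r[j] * (nh[j] + at(recurrentBias, 2 * H + j))));
  } else {
    // reset gate applied to the hidden state before the multiplication
    const nh = matVec(recurrentWeight, h.map((v, j) => r[j] * v), 2 * H, H);
    n = nx.map((v, j) => Math.tanh(
      v + nh[j] + at(bias, 2 * H + j) + at(recurrentBias, 2 * H + j)));
  }

  // new hidden state: z * h + (1 - z) * n
  return h.map((v, j) => z[j] * v + n[j] * (1 - z[j]));
}
```

With all-zero weights and no biases, both gates evaluate to 0.5 and the new gate to 0, so the sketch returns half of the previous hidden state.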
7.7.13. The hardSigmoid() method
Calculate the non-smooth function used in place of a sigmoid function on the input tensor.

dictionary MLHardSigmoidOptions {
  float alpha = 0.2;
  float beta = 0.5;
};

partial interface MLGraphBuilder {
  MLOperand hardSigmoid(MLOperand x, optional MLHardSigmoidOptions options = {});
  MLActivation hardSigmoid(optional MLHardSigmoidOptions options = {});
};
-
x : an
MLOperand
. The input tensor. -
options : an optional
MLHardSigmoidOptions
. The optional parameters of the operation.
Returns:
-
an
MLOperand
. The output tensor of the same shape as x . -
an
MLActivation
. The activation function representing the hard sigmoid operation.
return builder.max(
  builder.min(
    builder.add(
      builder.mul(builder.constant(options.alpha), x),
      builder.constant(options.beta)),
    builder.constant(1)),
  builder.constant(0));
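The emulation above amounts to clamping alpha * x + beta to the range [0, 1]. A minimal scalar sketch in plain JavaScript follows; hardSigmoidReference is a hypothetical name and not part of the API.

```javascript
// Hypothetical scalar reference: max(0, min(1, alpha * x + beta)).
function hardSigmoidReference(x, { alpha = 0.2, beta = 0.5 } = {}) {
  return Math.max(0, Math.min(1, alpha * x + beta));
}
```

With the default options the function is linear between x = -2.5 and x = 2.5 and saturates at 0 and 1 outside that interval.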
7.7.14. The hardSwish() method
Computes the nonlinear function y = x * max(0, min(6, (x + 3))) / 6 that is introduced by [MobileNetV3] on the input tensor element-wise.

partial interface MLGraphBuilder {
  MLOperand hardSwish(MLOperand x);
  MLActivation hardSwish();
};
-
x : an
MLOperand
. The input tensor.
Returns:
-
an
MLOperand
. The output tensor of the same shape as x . -
an
MLActivation
. The activation function representing the hard-swish operation.
return builder.div(
  builder.mul(
    x,
    builder.max(
      builder.constant(0),
      builder.min(
        builder.constant(6),
        builder.add(x, builder.constant(3))))),
  builder.constant(6));
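For a quick numeric check of the hard-swish expression, a scalar sketch in plain JavaScript can be used. The name hardSwishReference is hypothetical and not part of the API.

```javascript
// Hypothetical scalar reference: x * max(0, min(6, x + 3)) / 6.
function hardSwishReference(x) {
  return x * Math.max(0, Math.min(6, x + 3)) / 6;
}
```

The function returns 0 for inputs at or below -3 and returns x unchanged for inputs at or above 3.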
7.7.15. The instanceNormalization() method
Normalize the input features using [Instance-Normalization] . Unlike § 7.7.2 The batchNormalization() method where the mean and variance values used in the calculation are previously computed across the batch dimension during the model training phase, the mean and variance values used in the calculation of an instance normalization are computed internally on the fly per input feature.

dictionary MLInstanceNormalizationOptions {
  MLOperand scale;
  MLOperand bias;
  float epsilon = 1e-5;
  MLInputOperandLayout layout = "nchw";
};

partial interface MLGraphBuilder {
  MLOperand instanceNormalization(MLOperand input,
                                  optional MLInstanceNormalizationOptions options = {});
};
-
input : an
MLOperand
. The input 4-D tensor. -
options : an optional
MLInstanceNormalizationOptions
. The optional parameters of the operation.-
scale : an
MLOperand
. The 1-D tensor of the scaling values whose length is equal to the size of the feature dimension of the input e.g. for the input tensor with nchw layout, the feature dimension is 1. -
bias : an
MLOperand
. The 1-D tensor of the bias values whose length is equal to the size of the feature dimension of the input e.g. for the input tensor with nchw layout, the feature dimension is 1. -
epsilon : a
float
scalar. A small value to prevent computational error due to divide-by-zero. The default value is 0.00001 when not specified. -
layout : an
MLInputOperandLayout
. This option specifies the layout format of the input. The default value is "nchw" .
-
Returns: an MLOperand. The instance-normalized 4-D tensor of the same shape as the input tensor.
// The mean reductions happen over the spatial dimensions of the input
// e.g. axis 2 and 3 of the input tensor.
const reduceOptions = { axes: [2, 3], keepDimensions: true };
const mean = builder.reduceMean(input, reduceOptions);
const variance = builder.reduceMean(
  builder.pow(
    builder.sub(input, mean),
    builder.constant(2)),
  reduceOptions
);

// The scale and bias values are applied per input feature
// e.g. axis 1 of the input tensor.
const shape = [1, -1, 1, 1];
return builder.add(
  builder.mul(
    builder.reshape(options.scale, shape),
    builder.div(
      builder.sub(input, mean),
      builder.pow(
        builder.add(variance, builder.constant(options.epsilon)),
        builder.constant(0.5))
    )
  ),
  builder.reshape(options.bias, shape)
);
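The per-feature statistics can be illustrated on ordinary nested arrays. The sketch below assumes the "nchw" layout with the input indexed as [n][c][h][w]. As an assumption of this sketch only, a missing scale is treated as 1 and a missing bias as 0. The name instanceNormalizationReference is hypothetical and not part of the API.

```javascript
// Hypothetical reference on nested arrays indexed as [n][c][h][w] ("nchw").
function instanceNormalizationReference(input, { scale, bias, epsilon = 1e-5 } = {}) {
  return input.map((sample) =>
    sample.map((feature, c) => {
      // Mean and variance are computed over the spatial dimensions (h, w)
      // of each feature of each sample.
      const values = feature.flat();
      const mean = values.reduce((acc, v) => acc + v, 0) / values.length;
      const variance =
        values.reduce((acc, v) => acc + (v - mean) ** 2, 0) / values.length;
      // Assumption of this sketch: missing scale/bias behave as 1 and 0.
      const s = scale ? scale[c] : 1;
      const b = bias ? bias[c] : 0;
      return feature.map((row) =>
        row.map((v) => s * (v - mean) / Math.sqrt(variance + epsilon) + b));
    }));
}
```

For a single feature holding the values 1 and 3, the mean is 2 and the variance is 1, so with epsilon set to 0 the normalized values are -1 and 1.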
7.7.16. The leakyRelu() method
Calculate the leaky version of rectified linear function on the input tensor element-wise. The calculation follows the expression max(0, x) + alpha ∗ min(0, x).

dictionary MLLeakyReluOptions {
  float alpha = 0.01;
};

partial interface MLGraphBuilder {
  MLOperand leakyRelu(MLOperand x, optional MLLeakyReluOptions options = {});
  MLActivation leakyRelu(optional MLLeakyReluOptions options = {});
};
-
x : an
MLOperand
. The input tensor. -
options : an optional
MLLeakyReluOptions
. The optional parameters of the operation.-
alpha : a
float
scalar multiplier, default to 0.01.
-
Returns:
-
an
MLOperand
. The output tensor of the same shape as x . -
an
MLActivation
. The activation function representing the leaky relu operation.
return builder.add(
  builder.max(builder.constant(0), x),
  builder.mul(
    builder.constant(options.alpha),
    builder.min(builder.constant(0), x)));
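The same expression can be evaluated on a single number for a quick sanity check. The name leakyReluReference is hypothetical and not part of the API.

```javascript
// Hypothetical scalar reference: max(0, x) + alpha * min(0, x).
function leakyReluReference(x, { alpha = 0.01 } = {}) {
  return Math.max(0, x) + alpha * Math.min(0, x);
}
```

Positive inputs pass through unchanged, while negative inputs are scaled by alpha.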
7.7.17. The linear() method
Calculate a linear function y = alpha * x + beta on the input tensor.

dictionary MLLinearOptions {
  float alpha = 1;
  float beta = 0;
};

partial interface MLGraphBuilder {
  MLOperand linear(MLOperand x, optional MLLinearOptions options = {});
  MLActivation linear(optional MLLinearOptions options = {});
};
-
x : an
MLOperand
. The input tensor. -
options : an optional
MLLinearOptions
. The optional parameters of the operation.
Returns:
-
an
MLOperand
. The output tensor of the same shape as x . -
an
MLActivation
. The activation function representing the linear operation.
return builder.add(
  builder.mul(x, builder.constant(options.alpha)),
  builder.constant(options.beta));
7.7.18. The lstm() method
Long Short-Term Memory [LSTM] recurrent network uses an input, output, forget, and cell gate to compute the output state that rolls into the output across the temporal sequence of the network.

enum MLLstmWeightLayout {
  "iofg", // input-output-forget-cell gate ordering
  "ifgo"  // input-forget-cell-output gate ordering
};

dictionary MLLstmOptions {
  MLOperand bias;
  MLOperand recurrentBias;
  MLOperand peepholeWeight;
  MLOperand initialHiddenState;
  MLOperand initialCellState;
  boolean returnSequence = false;
  MLRecurrentNetworkDirection direction = "forward";
  MLLstmWeightLayout layout = "iofg";
  sequence<MLActivation> activations;
};

partial interface MLGraphBuilder {
  sequence<MLOperand> lstm(MLOperand input, MLOperand weight, MLOperand recurrentWeight,
                           unsigned long steps, unsigned long hiddenSize,
                           optional MLLstmOptions options = {});
};
-
input : an
MLOperand
. The input 3-D tensor of shape [steps, batch_size, input_size]. -
weight : an
MLOperand
. The 3-D input weight tensor of shape [num_directions, 4 * hidden_size, input_size]. The ordering of the weight vectors in the second dimension of the tensor shape is specified according to the options.layout argument. -
recurrentWeight : an
MLOperand
. The 3-D recurrent weight tensor of shape [num_directions, 4 * hidden_size, hidden_size]. The ordering of the weight vectors in the second dimension of the tensor shape is specified according to the options.layout argument. -
steps : an
unsigned long
scalar. The number of time steps in the recurrent network. The value must be greater than 0. -
hiddenSize : an
unsigned long
scalar. The value of the third dimension of the cell output tensor shape. It indicates the number of features in the hidden state. -
options : an optional
MLLstmOptions
. The optional parameters of the operation.-
bias : an
MLOperand
. The 2-D input bias tensor of shape [num_directions, 4 * hidden_size]. The ordering of the bias vectors in the second dimension of the tensor shape is specified according to the options.layout argument. -
recurrentBias : an
MLOperand
. The 2-D recurrent bias tensor of shape [num_directions, 4 * hidden_size]. The ordering of the bias vectors in the second dimension of the tensor shape is specified according to the options.layout argument. -
peepholeWeight : an
MLOperand
. The 2-D weight tensor for peepholes of shape [num_directions, 3 * hidden_size]. The pack ordering of the weight vectors is for the input (i) , output (o) , and forget (f) gate respectively. -
initialHiddenState : an
MLOperand
. The 3-D initial hidden state tensor of shape [num_directions, batch_size, hidden_size]. When not specified, it’s assumed to be a tensor filled with zero. -
initialCellState : an
MLOperand
. The 3-D initial cell state tensor of shape [num_directions, batch_size, hidden_size]. When not specified, it’s assumed to be a tensor filled with zero. -
returnSequence : a
boolean
indicating whether to also return the entire sequence with every output from each time step in it in addition to the output of the last time step. Default to false. -
direction : an
MLRecurrentNetworkDirection
. The processing direction of the input sequence. When set to "both" , the size of the first dimension of the weight and the bias tensor shapes must be 2, and the input is processed in both directions. -
layout : an
MLLstmWeightLayout
. The ordering of the weight and bias vectors for the internal gates of LSTM, specifically the input (i) , output (o) , forget (f) , and cell (g) gate, as indicated in the second dimension of the weight and bias tensor shapes. When not specified, the default layout is "iofg" . -
activations : a sequence of
MLActivation
. A sequence of three activation functions, the first one is used for the input (i) , forget (f) , and output (o) gate, the second one is used for the cell (g) gate, and the last used for filtering the output cell state before combining it with the result of the output gate to form the output hidden state. When not specified, they are assumed to be of the sigmoid function ( "sigmoid" ) followed by two hyperbolic tangent functions ( "tanh" ) respectively.
-
Returns: a sequence of MLOperand. The first element of the sequence is a 3-D tensor of shape [num_directions, batch_size, hidden_size], the output hidden state from the last time step of the network. The second element is a 3-D tensor of shape [num_directions, batch_size, hidden_size], the output cell state from the last time step of the network. Additionally, if options.returnSequence is set to true, the third element is the 4-D output tensor of shape [steps, num_directions, batch_size, hidden_size] containing every output from each time step in the temporal sequence.
const numDirections = (options.direction == "both" ? 2 : 1);
let hiddenState = options.initialHiddenState;
let cellState = options.initialCellState;

if (!hiddenState) {
  const desc = { type: 'float32', dimensions: [numDirections, 1, hiddenSize] };
  const totalSize = numDirections * hiddenSize;
  hiddenState = builder.constant(desc, new Float32Array(totalSize).fill(0));
}

if (!cellState) {
  const desc = { type: 'float32', dimensions: [numDirections, 1, hiddenSize] };
  const totalSize = numDirections * hiddenSize;
  cellState = builder.constant(desc, new Float32Array(totalSize).fill(0));
}

let sequence = null;
let currentWeight = [];
let currentRecurrentWeight = [];
let currentBias = [];
let currentRecurrentBias = [];
let currentPeepholeWeight = [];

for (let dir = 0; dir < numDirections; ++dir) {
  currentWeight.push(builder.squeeze(builder.slice(weight, [dir, 0, 0], [1, -1, -1]), { axes: [0] }));
  currentRecurrentWeight.push(builder.squeeze(builder.slice(recurrentWeight, [dir, 0, 0], [1, -1, -1]), { axes: [0] }));
  currentBias.push(options.bias ? (builder.squeeze(builder.slice(options.bias, [dir, 0], [1, -1]), { axes: [0] })) : null);
  currentRecurrentBias.push(options.recurrentBias ? (builder.squeeze(builder.slice(options.recurrentBias, [dir, 0], [1, -1]), { axes: [0] })) : null);
  currentPeepholeWeight.push(options.peepholeWeight ? (builder.squeeze(builder.slice(options.peepholeWeight, [dir, 0], [1, -1]), { axes: [0] })) : null);
}

for (let step = 0; step < steps; ++step) {
  let currentHidden = [];
  let currentCell = [];
  let nextHidden = null;
  let nextCell = null;

  for (let dir = 0; dir < numDirections; ++dir) {
    currentHidden.push(builder.squeeze(builder.slice(hiddenState, [dir, 0, 0], [1, -1, -1]), { axes: [0] }));
    currentCell.push(builder.squeeze(builder.slice(cellState, [dir, 0, 0], [1, -1, -1]), { axes: [0] }));
  }

  for (let dir = 0; dir < numDirections; ++dir) {
    let slice = (dir == 1 || options.direction == "backward" ? steps - step - 1 : step);
    let currentInput = builder.squeeze(builder.slice(input, [slice, 0, 0], [1, -1, -1]), { axes: [0] });

    let results = builder.lstmCell(
      currentInput, currentWeight[dir], currentRecurrentWeight[dir],
      currentHidden[dir], currentCell[dir], hiddenSize, {
        bias: currentBias[dir], recurrentBias: currentRecurrentBias[dir],
        peepholeWeight: currentPeepholeWeight[dir],
        layout: options.layout, activations: options.activations });

    let output = builder.reshape(results[0], [1, -1, hiddenSize]);
    let cell = builder.reshape(results[1], [1, -1, hiddenSize]);

    nextHidden = (nextHidden ? builder.concat([nextHidden, output], 0) : output);
    nextCell = (nextCell ? builder.concat([nextCell, cell], 0) : cell);
  }

  hiddenState = nextHidden;
  cellState = nextCell;

  if (options.returnSequence) {
    nextHidden = builder.reshape(nextHidden, [1, numDirections, -1, hiddenSize]);
    sequence = (sequence ? builder.concat([sequence, nextHidden], 0) : nextHidden);
  }
}

return (sequence ? [hiddenState, cellState, sequence] : [hiddenState, cellState]);
7.7.19. The lstmCell() method
A single time step of the Long Short-Term Memory [LSTM] recurrent network using a cell state, an input, output, and forget gate to compute the cell state and the hidden state of the next time step that rolls into the output across the temporal sequence of the network.

dictionary MLLstmCellOptions {
  MLOperand bias;
  MLOperand recurrentBias;
  MLOperand peepholeWeight;
  MLLstmWeightLayout layout = "iofg";
  sequence<MLActivation> activations;
};

partial interface MLGraphBuilder {
  sequence<MLOperand> lstmCell(MLOperand input, MLOperand weight, MLOperand recurrentWeight,
                               MLOperand hiddenState, MLOperand cellState,
                               unsigned long hiddenSize,
                               optional MLLstmCellOptions options = {});
};
-
input : an
MLOperand
. The input 2-D tensor of shape [batch_size, input_size]. -
weight : an
MLOperand
. The 2-D input weight tensor of shape [4 * hidden_size, input_size]. The ordering of the weight vectors in the first dimension of the tensor shape is specified according to the options.layout argument. -
recurrentWeight : an
MLOperand
. The 2-D recurrent weight tensor of shape [4 * hidden_size, hidden_size]. The ordering of the weight vectors in the first dimension of the tensor shape is specified according to the options.layout argument. -
hiddenState : an
MLOperand
. The 2-D input hidden state tensor of shape [batch_size, hidden_size]. -
cellState : an
MLOperand
. The 2-D input cell state tensor of shape [batch_size, hidden_size]. -
hiddenSize : an
unsigned long
scalar. The value of the second dimension of the output tensor shape. It indicates the number of features in the hidden state. -
options : an optional
MLLstmCellOptions
. The optional parameters of the operation.-
bias : an
MLOperand
. The 1-D input bias tensor of shape [4 * hidden_size]. The ordering of the bias vectors in the first dimension of the tensor shape is specified according to the options.layout argument. -
recurrentBias : an
MLOperand
. The 1-D recurrent bias tensor of shape [4 * hidden_size]. The ordering of the bias vectors in the first dimension of the tensor shape is specified according to the options.layout argument. -
peepholeWeight : an
MLOperand
. The 1-D weight tensor for peepholes of shape [3 * hidden_size]. The pack ordering of the weight vectors is for the input (i) , output (o) , and forget (f) gate respectively. -
layout : an
MLLstmWeightLayout
. The ordering of the weight and bias vectors for the internal gates of LSTM, specifically the input (i) , output (o) , forget (f) , and cell (g) gate, as indicated in the first dimension of the weight and bias tensor shapes. When not specified, the default layout is "iofg" . -
activations : a sequence of
MLActivation
. A sequence of three activation functions, the first one is used for the input (i) , forget (f) , and output (o) gate, the second one is used for the cell (g) gate, and the last used for filtering the output cell state before combining it with the result of the output gate to form the output hidden state. When not specified, they are assumed to be of the sigmoid function ( "sigmoid" ) followed by two hyperbolic tangent functions ( "tanh" ) respectively.
-
Returns: a sequence of MLOperand. The first element of the sequence is the output hidden state of the current time step of the recurrent network. The following element is the output cell state. Both elements are 2-D tensors of shape [batch_size, hidden_size].
const zero = builder.constant(0);

// input gate (i)
let i = builder.sigmoid(
  builder.add(
    builder.mul(
      cellState,
      (options.peepholeWeight ? builder.slice(options.peepholeWeight, [0], [hiddenSize]) : zero)
    ),
    builder.add(
      builder.add(
        (options.bias ? builder.slice(options.bias, [0], [hiddenSize]) : zero),
        (options.recurrentBias ? builder.slice(options.recurrentBias, [0], [hiddenSize]) : zero)
      ),
      builder.add(
        builder.matmul(
          input,
          builder.transpose(builder.slice(weight, [0, 0], [hiddenSize, -1]))
        ),
        builder.matmul(
          hiddenState,
          builder.transpose(builder.slice(recurrentWeight, [0, 0], [hiddenSize, -1]))
        )
      )
    )
  )
);

// forget gate (f)
let f = builder.sigmoid(
  builder.add(
    builder.mul(
      cellState,
      (options.peepholeWeight ? builder.slice(options.peepholeWeight, [2 * hiddenSize], [hiddenSize]) : zero)
    ),
    builder.add(
      builder.add(
        (options.bias ? builder.slice(options.bias, [2 * hiddenSize], [hiddenSize]) : zero),
        (options.recurrentBias ? builder.slice(options.recurrentBias, [2 * hiddenSize], [hiddenSize]) : zero)
      ),
      builder.add(
        builder.matmul(
          input,
          builder.transpose(builder.slice(weight, [2 * hiddenSize, 0], [hiddenSize, -1]))
        ),
        builder.matmul(
          hiddenState,
          builder.transpose(builder.slice(recurrentWeight, [2 * hiddenSize, 0], [hiddenSize, -1]))
        )
      )
    )
  )
);

// cell gate (g)
let g = builder.tanh(
  builder.add(
    builder.add(
      (options.bias ? builder.slice(options.bias, [3 * hiddenSize], [hiddenSize]) : zero),
      (options.recurrentBias ? builder.slice(options.recurrentBias, [3 * hiddenSize], [hiddenSize]) : zero)
    ),
    builder.add(
      builder.matmul(
        input,
        builder.transpose(builder.slice(weight, [3 * hiddenSize, 0], [hiddenSize, -1]))
      ),
      builder.matmul(
        hiddenState,
        builder.transpose(builder.slice(recurrentWeight, [3 * hiddenSize, 0], [hiddenSize, -1]))
      )
    )
  )
);

// output gate (o)
let o = builder.sigmoid(
  builder.add(
    builder.mul(
      cellState,
      (options.peepholeWeight ? builder.slice(options.peepholeWeight, [hiddenSize], [hiddenSize]) : zero)
    ),
    builder.add(
      builder.add(
        (options.bias ? builder.slice(options.bias, [hiddenSize], [hiddenSize]) : zero),
        (options.recurrentBias ? builder.slice(options.recurrentBias, [hiddenSize], [hiddenSize]) : zero)
      ),
      builder.add(
        builder.matmul(
          input,
          builder.transpose(builder.slice(weight, [hiddenSize, 0], [hiddenSize, -1]))
        ),
        builder.matmul(
          hiddenState,
          builder.transpose(builder.slice(recurrentWeight, [hiddenSize, 0], [hiddenSize, -1]))
        )
      )
    )
  )
);

// output cell state (ct)
let ct = builder.add(builder.mul(f, cellState), builder.mul(i, g));

// output hidden state (ht)
let ht = builder.mul(o, builder.tanh(ct));

return [ht, ct];
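The gate arithmetic of a single LSTM step can also be traced on ordinary arrays. The sketch below assumes a batch size of 1, the default "iofg" layout, and the default sigmoid and tanh activations. As in the emulation above, the peephole terms use the input cell state. The name lstmCellReference is hypothetical and not part of the API.

```javascript
// Hypothetical plain-JavaScript reference for a single LSTM step
// (batch size 1, "iofg" layout, sigmoid/tanh/tanh activations).
const sigmoid = (v) => 1 / (1 + Math.exp(-v));

function lstmCellReference(x, h, c, weight, recurrentWeight, hiddenSize, options = {}) {
  const { bias, recurrentBias, peepholeWeight } = options;
  const H = hiddenSize;
  const at = (t, j) => (t ? t[j] : 0); // a missing optional tensor contributes 0
  const dot = (row, v) => row.reduce((acc, w, k) => acc + w * v[k], 0);

  // Pre-activation of unit j of the gate whose rows start at `offset`
  // in the packed weight and bias tensors.
  const pre = (offset, j) =>
    dot(weight[offset + j], x) + dot(recurrentWeight[offset + j], h) +
    at(bias, offset + j) + at(recurrentBias, offset + j);

  const ht = [];
  const ct = [];
  for (let j = 0; j < H; ++j) {
    // Peephole weights are packed as input (i), output (o), forget (f).
    const i = sigmoid(pre(0, j) + c[j] * at(peepholeWeight, j));
    const o = sigmoid(pre(H, j) + c[j] * at(peepholeWeight, H + j));
    const f = sigmoid(pre(2 * H, j) + c[j] * at(peepholeWeight, 2 * H + j));
    const g = Math.tanh(pre(3 * H, j));
    ct.push(f * c[j] + i * g);          // output cell state
    ht.push(o * Math.tanh(ct[j]));      // output hidden state
  }
  return [ht, ct];
}
```

With all-zero weights, the three sigmoid gates evaluate to 0.5 and the cell gate to 0, so the new cell state is half of the previous one.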
7.7.20. The matmul() method
Compute the matrix product of two input tensors.

partial interface MLGraphBuilder {
  MLOperand matmul(MLOperand a, MLOperand b);
};
Returns: an MLOperand. The output N-D tensor that contains the matrix product of two input tensors.
The operation behaves as follows:
-
If both a and b are 2-D, they are multiplied like conventional matrices and produce a 2-D tensor as the output.
-
If either a or b is N-D, N > 2, it is treated as a stack of matrices with dimensions corresponding to the last two indices. The matrix multiplication will be broadcast accordingly by following [numpy-broadcasting-rule] . The output is an N-D tensor whose rank is the maximum rank of the input tensors. For each dimension, except the last two, of the output tensor, its size is the maximum size along that dimension of the input tensors.
-
If a is 1-D, it is converted to a 2-D tensor by prepending a 1 to its dimensions.
-
If b is 1-D, it is converted to a 2-D tensor by appending a 1 to its dimensions.
-
If both a and b are 1-D, the operation is a vector dot-product, which produces a scalar output.
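The shape rules listed above can be sketched as a small plain-JavaScript function that maps two input shapes to the output shape. The name matmulOutputShape is hypothetical and not part of the API. When only one input is 1-D, the text does not say whether the added dimension is removed again, so this sketch keeps it.

```javascript
// Hypothetical helper: derives the output shape of matmul(a, b) from the
// shapes of a and b, following the rules listed above.
function matmulOutputShape(shapeA, shapeB) {
  const aIs1D = shapeA.length === 1;
  const bIs1D = shapeB.length === 1;
  if (aIs1D && bIs1D) {
    if (shapeA[0] !== shapeB[0]) throw new Error("vector lengths must match");
    return []; // vector dot-product: scalar output
  }
  // A 1-D a gets a 1 prepended; a 1-D b gets a 1 appended.
  // Assumption of this sketch: the added dimension is kept in the output.
  const a = aIs1D ? [1, ...shapeA] : [...shapeA];
  const b = bIs1D ? [...shapeB, 1] : [...shapeB];
  if (a[a.length - 1] !== b[b.length - 2]) {
    throw new Error("inner dimensions must match");
  }
  // Broadcast every dimension except the last two.
  const batchA = a.slice(0, -2);
  const batchB = b.slice(0, -2);
  const rank = Math.max(batchA.length, batchB.length);
  const batch = [];
  for (let i = 0; i < rank; ++i) {
    const da = batchA[batchA.length - rank + i] ?? 1;
    const db = batchB[batchB.length - rank + i] ?? 1;
    if (da !== db && da !== 1 && db !== 1) {
      throw new Error("batch dimensions are not broadcastable");
    }
    batch.push(Math.max(da, db));
  }
  return [...batch, a[a.length - 2], b[b.length - 1]];
}
```

For example, multiplying a tensor of shape [5, 1, 2, 3] by one of shape [4, 3, 6] broadcasts the leading dimensions to [5, 4] and yields [5, 4, 2, 6].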
7.7.21. The pad() method
Inflate the tensor with constant or mirrored values on the edges.

enum MLPaddingMode {
  "constant",
  "edge",
  "reflection",
  "symmetric"
};

dictionary MLPadOptions {
  MLPaddingMode mode = "constant";
  float value = 0;
};

partial interface MLGraphBuilder {
  MLOperand pad(MLOperand input, MLOperand padding, optional MLPadOptions options = {});
};
-
input : an
MLOperand
. The input tensor. -
padding : an
MLOperand
. The 2-D Tensor of integer values indicating the number of padding values to add at the beginning and end of each input dimensions. The tensor has shape [ n , 2] where n is the rank of the input tensor. For each dimension D of input , padding[D, 0] indicates how many values to add before the content in that dimension, and padding[D, 1] indicates how many values to add after the content in that dimension. -
options : an optional
MLPadOptions
. The optional parameters of the operation.-
mode : an
MLPaddingMode
. The different ways to pad the tensor. When not set, it’s assumed to be "constant". -
value : a
float
. The pad value when the options.mode is set to "constant" . When not set, it’s assumed to be 0.
-
Returns: an MLOperand. The padded output tensor.
// input: [[1,2,3], [4,5,6]]
const input = builder.constant(
  { type: 'float32', dimensions: [2, 3] }, new Float32Array([1, 2, 3, 4, 5, 6]));

// padding: [[1,1], [2,2]]
const padding = builder.constant(
  { type: 'float32', dimensions: [2, 2] }, new Float32Array([1, 1, 2, 2]));

// "constant" padded:
//   [[0,0,0,0,0,0,0],
//    [0,0,1,2,3,0,0],
//    [0,0,4,5,6,0,0],
//    [0,0,0,0,0,0,0]]
builder.pad(input, padding);

// "edge" padded:
//   [[1,1,1,2,3,3,3],
//    [1,1,1,2,3,3,3],
//    [4,4,4,5,6,6,6],
//    [4,4,4,5,6,6,6]]
builder.pad(input, padding, { mode: "edge" });

// "reflection" padded:
//   [[6,5,4,5,6,5,4],
//    [3,2,1,2,3,2,1],
//    [6,5,4,5,6,5,4],
//    [3,2,1,2,3,2,1]]
builder.pad(input, padding, { mode: "reflection" });

// "symmetric" padded:
//   [[2,1,1,2,3,3,2],
//    [2,1,1,2,3,3,2],
//    [5,4,4,5,6,6,5],
//    [5,4,4,5,6,6,5]]
builder.pad(input, padding, { mode: "symmetric" });
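The four padding modes can be reproduced on plain nested arrays to verify the expected outputs shown above. The sketch below handles only 2-D inputs and assumes each padding amount is smaller than the size of the dimension it pads. The names sourceIndex and pad2dReference are hypothetical and not part of the API.

```javascript
// Hypothetical helper: maps an index i of the padded tensor back to an index
// of the unpadded dimension of the given size, or -1 for "constant" padding.
function sourceIndex(i, size, mode) {
  if (i >= 0 && i < size) return i;
  switch (mode) {
    case "edge":       return i < 0 ? 0 : size - 1;
    case "reflection": return i < 0 ? -i : 2 * (size - 1) - i;   // edge value not repeated
    case "symmetric":  return i < 0 ? -i - 1 : 2 * size - 1 - i; // edge value repeated
    default:           return -1;
  }
}

// Hypothetical reference for a 2-D input given as nested arrays.
// padding is [[beforeRows, afterRows], [beforeCols, afterCols]].
function pad2dReference(input, padding, { mode = "constant", value = 0 } = {}) {
  const rows = input.length;
  const cols = input[0].length;
  const [[top, bottom], [left, right]] = padding;
  const out = [];
  for (let r = -top; r < rows + bottom; ++r) {
    const sr = sourceIndex(r, rows, mode);
    const row = [];
    for (let c = -left; c < cols + right; ++c) {
      const sc = sourceIndex(c, cols, mode);
      row.push(sr < 0 || sc < 0 ? value : input[sr][sc]);
    }
    out.push(row);
  }
  return out;
}
```

Calling pad2dReference([[1,2,3],[4,5,6]], [[1,1],[2,2]], { mode: "reflection" }) reproduces the "reflection" result listed in the example above.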
7.7.22. Pooling operations
Compute a mean, L2 norm, or max reduction operation across all the elements within the moving window over the input tensor. See the description of each type of reduction in § 7.7.23 Reduction operations.

enum MLRoundingType {
  "floor",
  "ceil"
};

dictionary MLPool2dOptions {
  sequence<unsigned long> windowDimensions;
  sequence<unsigned long> padding;
  sequence<unsigned long> strides;
  sequence<unsigned long> dilations;
  MLAutoPad autoPad = "explicit";
  MLInputOperandLayout layout = "nchw";
  MLRoundingType roundingType = "floor";
  sequence<unsigned long> outputSizes;
};

partial interface MLGraphBuilder {
  MLOperand averagePool2d(MLOperand input, optional MLPool2dOptions options = {});
  MLOperand l2Pool2d(MLOperand input, optional MLPool2dOptions options = {});
  MLOperand maxPool2d(MLOperand input, optional MLPool2dOptions options = {});
};
- input: an MLOperand. The input 4-D tensor. The logical shape is interpreted according to the value of options.layout.
- options: an optional MLPool2dOptions. The optional parameters of the operation.
  - windowDimensions: a sequence of unsigned long of length 2. The dimensions of the sliding window, [window_height, window_width]. If not present, the window dimensions are assumed to be the height and width dimensions of the input shape.
  - padding: a sequence of unsigned long of length 4. The additional rows and columns added to the beginning and ending of each spatial dimension of input, [beginning_height, ending_height, beginning_width, ending_width]. If not present, the values are assumed to be [0,0,0,0].
  - strides: a sequence of unsigned long of length 2. The stride of the sliding window for each spatial dimension of input, [stride_height, stride_width]. If not present, the values are assumed to be [1,1].
  - dilations: a sequence of unsigned long of length 2. The dilation factor for each spatial dimension of input, [dilation_height, dilation_width]. If not present, the values are assumed to be [1,1].
  - autoPad: an MLAutoPad. The automatic input padding options. By default, this argument is set to "explicit", which means that the values in the options.padding array should be used for input padding. When the option is set to a value other than "explicit", the values in the options.padding array are ignored. With the "same-upper" option, the padding values are automatically computed such that the additional ending padding of the spatial input dimensions would allow all of the input values in the corresponding dimension to be filtered. The "same-lower" option is similar, but the padding is applied to the beginning of the spatial input dimensions instead of the ending.
  - layout: an MLInputOperandLayout. The default value is "nchw". This option specifies the layout format of the input and output tensor as follows:
    "nchw":
    - input tensor: [batches, channels, height, width]
    - output tensor: [batches, channels, height, width]
    "nhwc":
    - input tensor: [batches, height, width, channels]
    - output tensor: [batches, height, width, channels]
  - roundingType: an MLRoundingType. The option specifies the rounding function used to compute the output shape.
  - outputSizes: a sequence of unsigned long of length 2. The sizes of the two spatial dimensions of the output tensor. When the output sizes are explicitly specified, options.roundingType is ignored. If not specified, the output sizes are automatically computed.
Returns: an MLOperand. The output 4-D tensor that contains the result of the reduction. The logical shape is interpreted according to the value of layout. More specifically, if options.roundingType is "floor", the spatial dimensions of the output tensor can be calculated as follows:
output size = floor(1 + (input size - filter size + beginning padding + ending padding) / stride)
or if options.roundingType is "ceil":
output size = ceil(1 + (input size - filter size + beginning padding + ending padding) / stride)
// 'global' max pooling
builder.maxPool2d(input);
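The two output size formulas above can be evaluated directly for one spatial dimension. The following is a minimal sketch; the function name pool2dOutputSize and its parameter names are hypothetical and are not part of this API.

```javascript
// Compute the output size of one spatial dimension of a pooling operation,
// following the floor and ceil formulas given above.
function pool2dOutputSize(inputSize, filterSize, beginningPadding,
                          endingPadding, stride, roundingType = 'floor') {
  const size =
      1 + (inputSize - filterSize + beginningPadding + endingPadding) / stride;
  return roundingType === 'ceil' ? Math.ceil(size) : Math.floor(size);
}
```

For an input size of 8, a filter size of 3, no padding and a stride of 2, the intermediate value is 3.5, so "floor" yields 3 and "ceil" yields 4. When the filter covers the whole unpadded input, as in global pooling, the result is 1.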
7.7.23. Reduction operations
Reduce the input along the dimensions given in axes.

dictionary MLReduceOptions {
  sequence<long> axes = null;
  boolean keepDimensions = false;
};

partial interface MLGraphBuilder {
  MLOperand reduceL1(MLOperand input, optional MLReduceOptions options = {});
  MLOperand reduceL2(MLOperand input, optional MLReduceOptions options = {});
  MLOperand reduceLogSum(MLOperand input, optional MLReduceOptions options = {});
  MLOperand reduceLogSumExp(MLOperand input, optional MLReduceOptions options = {});
  MLOperand reduceMax(MLOperand input, optional MLReduceOptions options = {});
  MLOperand reduceMean(MLOperand input, optional MLReduceOptions options = {});
  MLOperand reduceMin(MLOperand input, optional MLReduceOptions options = {});
  MLOperand reduceProduct(MLOperand input, optional MLReduceOptions options = {});
  MLOperand reduceSum(MLOperand input, optional MLReduceOptions options = {});
  MLOperand reduceSumSquare(MLOperand input, optional MLReduceOptions options = {});
};
- input: an MLOperand. The input tensor.
- options: an optional MLReduceOptions. The optional parameters of the operation.
Returns: an MLOperand. The reduced output tensor.
Reduction types:
- L1: Compute the L1 norm of all the input values along the axes.
- L2: Compute the L2 norm of all the input values along the axes.
- LogSum: Compute the log value of the sum of all the input values along the axes.
- LogSumExp: Compute the log value of the sum of the exponent of all the input values along the axes.
- Max: Compute the maximum value of all the input values along the axes.
- Mean: Compute the average value of all the input values along the axes.
- Min: Compute the minimum value of all the input values along the axes.
- Product: Compute the product of all the input values along the axes.
- Sum: Compute the sum of all the input values along the axes.
- SumSquare: Compute the sum of the square of all the input values along the axes.
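As an illustration of the reduction types listed above, the following sketch applies each of them to a plain one-dimensional array of numbers, which corresponds to reducing a 1-D tensor along its only axis. The reductions object is hypothetical and is not part of this API; it only restates the definitions in scalar JavaScript.

```javascript
// Scalar reference definitions of each reduction type over a 1-D array.
const sum = (values) => values.reduce((acc, v) => acc + v, 0);

const reductions = {
  L1: (values) => sum(values.map(Math.abs)),
  L2: (values) => Math.sqrt(sum(values.map((v) => v * v))),
  LogSum: (values) => Math.log(sum(values)),
  LogSumExp: (values) => Math.log(sum(values.map(Math.exp))),
  Max: (values) => Math.max(...values),
  Mean: (values) => sum(values) / values.length,
  Min: (values) => Math.min(...values),
  Product: (values) => values.reduce((acc, v) => acc * v, 1),
  Sum: sum,
  SumSquare: (values) => sum(values.map((v) => v * v)),
};
```

For the input [3, -4], the L1 norm is 7, the L2 norm is 5, the sum is -1 and the sum of squares is 25.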
7.7.24. The relu() method
Compute the rectified linear function of the input tensor.

partial interface MLGraphBuilder {
  MLOperand relu(MLOperand x);
  MLActivation relu();
};

- x: an MLOperand. The input tensor.
Returns:
- an MLOperand. The output tensor of the same shape as x.
- an MLActivation. The activation function representing the relu operation.
return builder.max(builder.constant(0), x);
7.7.25. The resample2d() method
Resample the tensor values from the source to the destination spatial dimensions according to the scaling factors.

enum MLInterpolationMode {
  "nearest-neighbor",
  "linear"
};

dictionary MLResample2dOptions {
  MLInterpolationMode mode = "nearest-neighbor";
  sequence<float> scales;
  sequence<unsigned long> sizes;
  sequence<long> axes;
};

partial interface MLGraphBuilder {
  MLOperand resample2d(MLOperand input, optional MLResample2dOptions options = {});
};

- input: an MLOperand. The input 4-D tensor.
- options: an optional MLResample2dOptions. The optional parameters of the operation.
  - mode: an MLInterpolationMode. The interpolation algorithm used to fill the output tensor values. If not set, it is assumed to be the nearest-neighbor interpolation.
  - scales: a sequence of float of length 2. Each value represents the scaling factor used to scale in each spatial dimension of input, [scale_height, scale_width]. If not set, the values are assumed to be [1.0, 1.0].
  - sizes: a sequence of unsigned long of length 2. The target sizes for each spatial dimension of input, [size_height, size_width]. When the target sizes are specified, the options.scales argument is ignored, as the scaling factor values are derived from the target sizes of each spatial dimension of input.
  - axes: a sequence of long of length 2. The two consecutive dimensions of the input tensor to which the interpolation algorithm applies. The valid values in the sequence are [0, 1], [1, 2] or [2, 3]. When not specified, the sequence is assumed to be [2, 3].
Returns: an MLOperand. The output 4-D tensor.
7.7.26. The reshape() method
Alter the shape of a tensor to a new shape. Reshape does not copy or change the content of the tensor. It just changes the tensor’s logical dimensions for the subsequent operations.

partial interface MLGraphBuilder {
  MLOperand reshape(MLOperand input, sequence<unsigned long?> newShape);
};

- input: an MLOperand. The input tensor.
- newShape: a sequence of nullable unsigned long. The shape of the output tensor. The number of elements implied by newShape must be the same as the number of elements in the input tensor. Only one component of newShape can be the special value of null. The size of the dimension with the value null is computed so that the total size remains constant.
Returns: an MLOperand. The output tensor. The values of the output tensor are the same as values of the input tensor. The shape of the output tensor is specified by the newShape argument.
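The rule for the null component of newShape can be sketched as a shape computation on plain arrays. The helper name resolveNewShape is hypothetical and is not part of this API; the error types it throws are an assumption made for illustration.

```javascript
// Resolve a newShape that may contain one null component into a concrete
// shape holding the same number of elements as inputShape.
function resolveNewShape(inputShape, newShape) {
  const total = inputShape.reduce((acc, d) => acc * d, 1);
  const nullCount = newShape.filter((d) => d === null).length;
  if (nullCount > 1) {
    throw new TypeError('Only one component of newShape can be null.');
  }
  // Product of all explicitly specified dimensions.
  const known = newShape.reduce((acc, d) => (d === null ? acc : acc * d), 1);
  if (nullCount === 1) {
    if (known === 0 || total % known !== 0) {
      throw new RangeError('The null dimension cannot be computed.');
    }
    return newShape.map((d) => (d === null ? total / known : d));
  }
  if (known !== total) {
    throw new RangeError('newShape must keep the number of elements.');
  }
  return [...newShape];
}
```

For example, reshaping an input of shape [2, 3, 4] with newShape [4, null] resolves the null component to 6, because 24 elements divided by 4 leaves 6.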
7.7.27. The sigmoid() method
Compute the sigmoid function of the input tensor. The calculation follows the expression 1 / (exp(-x) + 1).

partial interface MLGraphBuilder {
  MLOperand sigmoid(MLOperand x);
  MLActivation sigmoid();
};

- x: an MLOperand. The input tensor.
Returns:
- an MLOperand. The output tensor of the same shape as x.
- an MLActivation. The activation function representing the sigmoid operation.
return builder.div(
  builder.constant(1),
  builder.add(
    builder.exp(builder.neg(x)),
    builder.constant(1)));
7.7.28. The slice() method
Produce a slice of the input tensor.

dictionary MLSliceOptions {
  sequence<long> axes;
};

partial interface MLGraphBuilder {
  MLOperand slice(MLOperand input, sequence<long> starts, sequence<long> sizes,
                  optional MLSliceOptions options = {});
};

- input: an MLOperand. The input tensor.
- starts: a sequence of long. The starting indices to slice of the corresponding axes of the input shape. A negative index value is interpreted as counting back from the end. For example, the value -1
- sizes: a sequence of long. The lengths to slice of the corresponding axes of the input shape. The length value of -1 selects all the remaining elements from the starting index of the given axis.
- options: an optional MLSliceOptions. The optional parameters of the operation.
  - axes: a sequence of long. The dimensions of the input shape to which starts and sizes apply. The values in the sequence are either within the [0, r-1] range where r is the input tensor rank, or the [-r, -1] range where negative values mean counting back from the end of the input shape. When not specified, the sequence is assumed to be [0, 1, ..., r-1].
Returns: an MLOperand. The output tensor of the same rank as the input tensor with tensor values stripped to the specified starting and ending indices in each dimension.
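The index rules for starts, sizes and axes can be sketched as a computation on shapes alone. The helper name resolveSliceRanges is hypothetical and is not part of this API; it returns, for every dimension of the input, the concrete start index and length that the arguments select, and it performs no bounds validation.

```javascript
// Resolve starts/sizes/axes into a concrete {start, size} range per input
// dimension. Dimensions not listed in axes keep their full extent.
function resolveSliceRanges(inputShape, starts, sizes, options = {}) {
  const rank = inputShape.length;
  const axes = options.axes ?? inputShape.map((_, i) => i);
  const ranges = inputShape.map((dim) => ({ start: 0, size: dim }));
  axes.forEach((axis, i) => {
    // A negative axis counts back from the end of the input shape.
    const a = axis < 0 ? axis + rank : axis;
    const dim = inputShape[a];
    // A negative start counts back from the end of the dimension.
    const start = starts[i] < 0 ? starts[i] + dim : starts[i];
    // A size of -1 selects all remaining elements from the start index.
    const size = sizes[i] === -1 ? dim - start : sizes[i];
    ranges[a] = { start, size };
  });
  return ranges;
}
```

For an input of shape [4, 6], starts [1, -2] and sizes [2, -1] select rows 1 to 2 and the last two columns, so the output shape is [2, 2].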
7.7.29. The softmax() method
Compute the softmax values of the 2-D input tensor along axis 1.

partial interface MLGraphBuilder {
  MLOperand softmax(MLOperand x);
  MLActivation softmax();
};

- x: an MLOperand. The input 2-D tensor.
Returns:
- an MLOperand. The output 2-D tensor that contains the softmax results, of the same shape as the input tensor.
- an MLActivation. The activation function representing the softmax operation.
// This sample deploys a well-known implementation trick [1] to compute the
// exponentials of the distances to the max value, instead of the exponentials
// of the input values itself, in order to increase the numerical stability of
// the result.
// [1]: https://cs231n.github.io/linear-classify/#softmax
const max_x = builder.reduceMax(x, { axes: [1], keepDimensions: true });
const exp_x = builder.exp(builder.sub(x, max_x));
return builder.div(exp_x, builder.reduceSum(exp_x, { axes: [1], keepDimensions: true }));
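The same max-subtraction trick can be checked on plain nested arrays, outside of the graph builder. The following is a minimal sketch; the function name softmax2d is hypothetical and is not part of this API.

```javascript
// Row-wise softmax of a 2-D nested array (softmax along axis 1).
// Subtracting the row maximum keeps Math.exp() from overflowing.
function softmax2d(x) {
  return x.map((row) => {
    const max = Math.max(...row);
    const exps = row.map((v) => Math.exp(v - max));
    const total = exps.reduce((acc, e) => acc + e, 0);
    return exps.map((e) => e / total);
  });
}
```

Without the subtraction, an input row such as [1000, 1000] would evaluate Math.exp(1000), which overflows to Infinity and produces NaN after the division. With it, both exponentials are exp(0) and the row evaluates to [0.5, 0.5].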
7.7.30. The softplus() method
Compute the softplus function of the input tensor. The calculation follows the expression ln(1 + exp(steepness * x)) / steepness.

dictionary MLSoftplusOptions {
  float steepness = 1;
};

partial interface MLGraphBuilder {
  MLOperand softplus(MLOperand x, optional MLSoftplusOptions options = {});
  MLActivation softplus(optional MLSoftplusOptions options = {});
};

- x: an MLOperand. The input tensor.
Returns:
- an MLOperand. The output tensor of the same shape as x.
- an MLActivation. The activation function representing the softplus operation.
return builder.div(
  builder.log(
    builder.add(
      builder.exp(builder.mul(x, builder.constant(options.steepness))),
      builder.constant(1))),
  builder.constant(options.steepness));
7.7.31. The softsign() method
Compute the softsign function of the input tensor. The calculation follows the expression x / (1 + |x|).

partial interface MLGraphBuilder {
  MLOperand softsign(MLOperand x);
  MLActivation softsign();
};

- x: an MLOperand. The input tensor.
Returns:
- an MLOperand. The output tensor of the same shape as x.
- an MLActivation. The activation function representing the softsign operation.
return builder.div(x, builder.add(builder.constant(1), builder.abs(x)));
7.7.32. The split() method
Split the input tensor into a number of sub tensors along the given axis.

dictionary MLSplitOptions {
  long axis = 0;
};

partial interface MLGraphBuilder {
  sequence<MLOperand> split(MLOperand input,
                            (unsigned long or sequence<unsigned long>) splits,
                            optional MLSplitOptions options = {});
};

- input: an MLOperand. The input tensor.
- splits: an unsigned long or a sequence of unsigned long. If an unsigned long, it specifies the number of output tensors along the axis. The number must evenly divide the dimension size of input along options.axis. If a sequence of unsigned long, it specifies the sizes of each output tensor along options.axis. The sum of the sizes must equal the dimension size of input along options.axis.
- options: an optional MLSplitOptions. The optional parameters of the operation.
  - axis: a long. The dimension along which to split. Defaults to 0. A negative value is interpreted as counting back from the end.
Returns: a sequence of MLOperand. The split output tensors. If splits is an unsigned long, the length of the output sequence equals splits. The shape of each output tensor is the same as input, except that the dimension size of axis equals the quotient of dividing the dimension size of input along axis by splits. If splits is a sequence of unsigned long, the length of the output sequence equals the length of splits. The shape of the i-th output tensor is the same as input except along axis, where the dimension size is splits[i].
// This sample shows the case that the splits parameter is an array.
const outputs = [];
let start = 0;
for (const size of splits) {
  outputs.push(builder.slice(input, [start], [size], { axes: [options.axis] }));
  start += size;
}
return outputs;
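The output shapes described above, for both forms of the splits argument, can be sketched as a computation on shapes alone. The helper name splitOutputShapes is hypothetical and is not part of this API; the error type it throws is an assumption made for illustration.

```javascript
// Compute the shape of every output tensor produced by a split.
function splitOutputShapes(inputShape, splits, options = {}) {
  const rank = inputShape.length;
  const axis = options.axis ?? 0;
  // A negative axis counts back from the end of the input shape.
  const a = axis < 0 ? axis + rank : axis;
  const dim = inputShape[a];
  let sizes;
  if (typeof splits === 'number') {
    // A count: it must evenly divide the dimension size along the axis.
    if (dim % splits !== 0) {
      throw new RangeError('splits must evenly divide the dimension size.');
    }
    sizes = new Array(splits).fill(dim / splits);
  } else {
    // A list of sizes: they must add up to the dimension size along the axis.
    const total = splits.reduce((acc, s) => acc + s, 0);
    if (total !== dim) {
      throw new RangeError('The sum of splits must equal the dimension size.');
    }
    sizes = splits;
  }
  return sizes.map((size) => inputShape.map((d, i) => (i === a ? size : d)));
}
```

For an input of shape [6, 4], a splits value of 3 yields three tensors of shape [2, 4], while splits [1, 3] with axis -1 yields shapes [6, 1] and [6, 3].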
7.7.33. The squeeze() method
Reduce the rank of a tensor by eliminating dimensions with size 1 of the tensor shape. Squeeze only affects the tensor’s logical dimensions. It does not copy or change the content in the tensor.

dictionary MLSqueezeOptions {
  sequence<long> axes;
};

partial interface MLGraphBuilder {
  MLOperand squeeze(MLOperand input, optional MLSqueezeOptions options = {});
};

- input: an MLOperand. The input tensor.
- options: an optional MLSqueezeOptions. The optional parameters of the operation.
  - axes: a sequence of long. Indices to the shape dimensions of size 1 to eliminate. When not specified, every shape dimension of size 1 in the tensor is eliminated.
Returns: an MLOperand. The output tensor of the same or reduced rank with the shape dimensions of size 1 eliminated.
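The effect of squeeze on the shape can be sketched as follows. The helper name squeezeShape is hypothetical and is not part of this API. The sketch assumes non-negative axis indices, and rejecting an axis whose size is not 1 is an assumption made for illustration.

```javascript
// Compute the shape that results from eliminating dimensions of size 1.
function squeezeShape(shape, options = {}) {
  // When axes is not specified, eliminate every dimension of size 1.
  const axes = options.axes ??
      shape.map((d, i) => (d === 1 ? i : -1)).filter((i) => i >= 0);
  for (const axis of axes) {
    if (shape[axis] !== 1) {
      throw new RangeError(`Dimension ${axis} does not have size 1.`);
    }
  }
  return shape.filter((_, i) => !axes.includes(i));
}
```

For a shape of [1, 3, 1, 2], the default yields [3, 2], while axes [0] yields [3, 1, 2].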
7.7.34. The tanh() method
Compute the hyperbolic tangent function of the input tensor. The calculation follows the expression (exp(2 * x) - 1) / (exp(2 * x) + 1).

partial interface MLGraphBuilder {
  MLOperand tanh(MLOperand x);
  MLActivation tanh();
};

- x: an MLOperand. The input tensor.
Returns:
- an MLOperand. The output tensor of the same shape as x.
- an MLActivation. The activation function representing the tanh operation.
return builder.div(
  builder.sub(builder.exp(builder.mul(builder.constant(2), x)), builder.constant(1)),
  builder.add(builder.exp(builder.mul(builder.constant(2), x)), builder.constant(1)));
7.7.35. The transpose() method
Permute the dimensions of the input tensor according to the permutation argument.

dictionary MLTransposeOptions {
  sequence<long> permutation;
};

partial interface MLGraphBuilder {
  MLOperand transpose(MLOperand input, optional MLTransposeOptions options = {});
};

- input: an MLOperand. The input N-D tensor.
- options: an optional MLTransposeOptions. The optional parameters of the operation.
  - permutation: a sequence of long values. The values used to permute the output shape. When it is not specified, it is set to [N-1, ..., 0], where N is the rank of the input tensor. These default values cause the output to become a transposed tensor of the input. When specified, the number of values in the sequence must be the same as the rank of the input tensor, and the values in the sequence must be within the range from 0 to N-1, with no value appearing more than once.
Returns: an MLOperand. The permuted or transposed N-D tensor.
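The constraints on permutation and its default can be sketched as a computation on shapes. The helper name transposeShape is hypothetical and is not part of this API. The sketch assumes the common convention that dimension i of the output takes the size of input dimension permutation[i]; the text above does not spell this out.

```javascript
// Compute the output shape of a transpose.
// Assumption: outputShape[i] = inputShape[permutation[i]].
function transposeShape(inputShape, options = {}) {
  const rank = inputShape.length;
  // Default permutation is [N-1, ..., 0], which reverses the dimensions.
  const permutation =
      options.permutation ?? inputShape.map((_, i) => rank - 1 - i);
  if (permutation.length !== rank) {
    throw new RangeError('permutation must have one value per dimension.');
  }
  const seen = new Set();
  for (const p of permutation) {
    if (!Number.isInteger(p) || p < 0 || p >= rank || seen.has(p)) {
      throw new RangeError('permutation values must be unique and in [0, N-1].');
    }
    seen.add(p);
  }
  return permutation.map((p) => inputShape[p]);
}
```

For an input of shape [2, 3, 4], the default yields [4, 3, 2], and the permutation [0, 2, 1] yields [2, 4, 3].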
7.8. The MLGraph interface
The MLGraph interface represents a compiled computational graph. A compiled graph once constructed is immutable and cannot be subsequently changed.

[SecureContext, Exposed=(Window, DedicatedWorker)]
interface MLGraph {};

MLGraph has the following internal slots:
- [[context]] of type MLContext
  The context of type MLContext associated with this MLGraph.
- [[inputDescriptors]] of type record<DOMString, MLOperandDescriptor>
  Maps the name of an input MLOperand to its MLOperandDescriptor for all input MLOperands of this MLGraph.
- [[outputDescriptors]] of type record<DOMString, MLOperandDescriptor>
  Maps the name of an output MLOperand to its MLOperandDescriptor for all output MLOperands of this MLGraph.
- [[implementation]]
  The underlying implementation provided by the User Agent.
7.9. The MLCommandEncoder interface
The MLCommandEncoder interface represents a method of execution that synchronously records the computational workload of a compiled MLGraph to a GPUCommandBuffer on the calling thread. Since the workload is only recorded and not immediately executed, this method gives the caller more flexibility to determine how and when the recorded commands will be submitted for execution on the GPU relative to other GPU workloads on the same or a different queue.

typedef (GPUBuffer or GPUTexture) MLGPUResource;
typedef record<DOMString, MLGPUResource> MLNamedGPUResources;

[SecureContext, Exposed=(Window, DedicatedWorker)]
interface MLCommandEncoder {};

MLCommandEncoder has the following internal slots:
- [[context]] of type MLContext
  The context of type MLContext associated with this MLCommandEncoder.
- [[implementation]]
  The underlying implementation provided by the User Agent.
7.9.1. Graph Initialization
Record the initialization of the MLGraph. This is a necessary step for optimal performance during graph execution as it gives the platform an opportunity to prepare and optimize constant input data for the subsequent execution of the graph. This method should only be called once per graph.

partial interface MLCommandEncoder {
  undefined initializeGraph(MLGraph graph);
};

- graph: an MLGraph. The compiled graph to be initialized with graph constant inputs.
Returns: undefined.
MLGraphBuilder.constant(desc, bufferView) method as constant operands during graph construction time.
7.9.2. Dispatch Execution Commands
Record the MLGraph execution with the inputs MLNamedGPUResources and outputs MLNamedGPUResources.

partial interface MLCommandEncoder {
  undefined dispatch(MLGraph graph, MLNamedGPUResources inputs, MLNamedGPUResources outputs);
};

- graph: an MLGraph. The compiled graph to be executed.
- inputs: an MLNamedGPUResources. The resources of inputs.
- outputs: an MLNamedGPUResources. The pre-allocated resources of required outputs.
Returns: undefined.
- If any of the following requirements are unmet, then throw a "DataError" DOMException and stop.
  - For each key -> value of inputs:
    - graph.[[inputDescriptors]][key] must exist.
    - Let inputDesc be graph.[[inputDescriptors]][key].
    - If value is a GPUBuffer, then:
      - value.size must equal the byte length of inputDesc.
  - For each key -> value of outputs:
    - graph.[[outputDescriptors]][key] must exist.
    - Let outputDesc be graph.[[outputDescriptors]][key].
    - If value is a GPUBuffer, then:
      - value.size must equal the byte length of outputDesc.
- For each key -> value of inputs:
  - Set the input of graph.[[implementation]] that is associated with key to value.
- For each key -> value of outputs:
  - Set the output of graph.[[implementation]] that is associated with key to value.
- Issue a compute request of graph.[[implementation]].
- If there is an error returned by graph.[[implementation]], then:
  - Throw an "OperationError" DOMException and stop.
- Return undefined.
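The validation portion of the steps above can be sketched with plain objects standing in for the graph's descriptor records and for GPU buffers. Everything named here is hypothetical and is not part of this API: the helpers byteLength and validateResources, the BYTES_PER_ELEMENT table, and the use of an Error named "DataError" in place of a DOMException. The sketch also assumes that the byte length of a descriptor is the product of its dimensions multiplied by the element size of its type, and that any value with a numeric size property is a buffer.

```javascript
// Element sizes for the operand types listed in the compatibility appendix.
const BYTES_PER_ELEMENT = { float32: 4, int32: 4, uint32: 4, int8: 1, uint8: 1 };

// Assumed definition of a descriptor's byte length.
function byteLength(desc) {
  const elements = (desc.dimensions ?? []).reduce((acc, d) => acc * d, 1);
  return elements * BYTES_PER_ELEMENT[desc.type];
}

function dataError(message) {
  const error = new Error(message);
  error.name = 'DataError';
  return error;
}

// Check named resources against the matching descriptor record.
function validateResources(descriptors, resources) {
  for (const [key, value] of Object.entries(resources)) {
    const desc = descriptors[key];
    if (desc === undefined) {
      throw dataError(`No descriptor is named "${key}".`);
    }
    // Only buffer-like values carry a size to compare against.
    if (typeof value.size === 'number' && value.size !== byteLength(desc)) {
      throw dataError(`Resource "${key}" must be ${byteLength(desc)} bytes.`);
    }
  }
}
```

A float32 descriptor with dimensions [1, 2, 2, 2] has a byte length of 32 under this assumption, so a stand-in buffer { size: 32 } passes and { size: 16 } is rejected.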
7.9.3. Generate GPU Command Buffer
Complete the recording of ML workload and return a WebGPU-compatible GPUCommandBuffer containing the recorded workload.

partial interface MLCommandEncoder {
  GPUCommandBuffer finish(optional GPUCommandBufferDescriptor descriptor = {});
};

- descriptor: an optional GPUCommandBufferDescriptor. Descriptor of the command buffer.
Returns: GPUCommandBuffer.
8. Examples
const context = await navigator.ml.createContext({ powerPreference: 'low-power' });

constant1 ---+
             +--- Add ---> intermediateOutput1 ---+
input1    ---+                                    |
                                                  +--- Mul---> output
constant2 ---+                                    |
             +--- Add ---> intermediateOutput2 ---+
input2    ---+

// Use tensors in 4 dimensions.
const TENSOR_DIMS = [1, 2, 2, 2];
const TENSOR_SIZE = 8;

const builder = new MLGraphBuilder(context);

// Create MLOperandDescriptor object.
const desc = { type: 'float32', dimensions: TENSOR_DIMS };

// constant1 is a constant MLOperand with the value 0.5.
const constantBuffer1 = new Float32Array(TENSOR_SIZE).fill(0.5);
const constant1 = builder.constant(desc, constantBuffer1);

// input1 is one of the input MLOperands. Its value will be set before execution.
const input1 = builder.input('input1', desc);

// constant2 is another constant MLOperand with the value 0.5.
const constantBuffer2 = new Float32Array(TENSOR_SIZE).fill(0.5);
const constant2 = builder.constant(desc, constantBuffer2);

// input2 is another input MLOperand. Its value will be set before execution.
const input2 = builder.input('input2', desc);

// intermediateOutput1 is the output of the first Add operation.
const intermediateOutput1 = builder.add(constant1, input1);

// intermediateOutput2 is the output of the second Add operation.
const intermediateOutput2 = builder.add(constant2, input2);

// output is the output MLOperand of the Mul operation.
const output = builder.mul(intermediateOutput1, intermediateOutput2);

// Compile the constructed graph.
const graph = await builder.build({ 'output': output });

// Setup the input buffers with value 1.
const inputBuffer1 = new Float32Array(TENSOR_SIZE).fill(1);
const inputBuffer2 = new Float32Array(TENSOR_SIZE).fill(1);
const outputBuffer = new Float32Array(TENSOR_SIZE);

// Execute the compiled graph with the specified inputs.
const inputs = {
  'input1': inputBuffer1,
  'input2': inputBuffer2,
};
const outputs = { 'output': outputBuffer };
await context.compute(graph, inputs, outputs);

console.log('Output value: ' + outputBuffer);
// Output value: 2.25,2.25,2.25,2.25,2.25,2.25,2.25,2.25
9. Appendices
9.1. MLOperandType and ArrayBufferView compatibility
MLOperandType | ArrayBufferView |
---|---|
float32 | Float32Array |
int32 | Int32Array |
uint32 | Uint32Array |
int8 | Int8Array |
uint8 | Uint8Array |
clarify the usage of ArrayBufferView for float16. [Issue #webmachinelearning/webnn#127]
10. Acknowledgements
This specification follows the concepts of the Android Neural Networks API C API.
Thanks to Tomoyuki Shimizu, Ningxin Hu, Zhiqiang Yu and Belem Zhang for the use cases.
Thanks to Nikhil Thorat, Daniel Smilkov, Ganesan Ramalingam, Rafael Cintron and Benjamin Poulain for their contributions to the API specification.
Thanks to Sangwhan Moon and the W3C Technical Architecture Group for review of this specification for web architecture fit, design consistency and developer ergonomics.
Thanks to W3C Privacy Interest Group for privacy and security review and feedback.
Thanks to Alex Gough and the Chrome Security team for security review and questions.
Thanks to Michal Karzynski for sharing practical guidelines and learnings from ONNX.
Thanks to Kaustubha Govind and Chrome privacy reviewers for feedback and privacy considerations.