1. Introduction
We’re working on this section. Meanwhile, please take a look at the explainer .
2. Use cases
2.1. Application Use Cases
This section illustrates application-level use cases for neural network inference hardware acceleration. All applications in those use cases can be built on top of pre-trained deep neural network (DNN) [models] .
2.1.1. Person Detection
A user opens a web-based video conferencing application, but she temporarily leaves her room. The application monitors whether she is in front of her PC by using object detection (for example, approaches such as [SSD] or [YOLO] that use a single DNN) to detect regions in a camera input frame that include persons.
When she comes back, the application automatically detects her and notifies other online users that she is active now.
2.1.2. Semantic Segmentation
A user joins a teleconference via a web-based video conferencing application at her desk since no meeting room in her office is available. During the teleconference, she does not want her room and the people in the background to be visible. To protect the privacy of the other people and the surroundings, the application runs a machine learning model such as [DeepLabv3+] or [MaskR-CNN] to semantically split an image into segments and replaces segments that represent other people and background with another picture.
2.1.3. Skeleton Detection
A web-based video conferencing application tracks the pose of the user’s skeleton by running a machine learning model that allows for real-time human pose estimation, such as [PoseNet], to recognize her gestures and body language. When she raises her hand, her microphone is automatically unmuted and she can start speaking on the teleconference.
2.1.4. Face Recognition
There are multiple people in the conference room and they join an online meeting using a web-based video conferencing application. The application detects the faces of participants by using object detection (for example, approaches such as [SSD]) and checks whether each face was present at the previous meeting by running a machine learning model such as [FaceNet], which verifies whether two faces are identical.
2.1.5. Facial Landmark Detection
A user wants to find new glasses that fit her well at an online glasses store. The online store offers a web-based try-on simulator that runs a machine learning model such as Face Alignment Network [FAN] to detect facial landmarks like eyes, nose, mouth, etc. When she chooses a pair of glasses, the simulator renders the selected glasses at the detected position of the eyes on her facial image.
2.1.6. Style Transfer
A user is looking for cosmetics at an online store and wondering which color may suit her face. The online store shows sample facial makeup images of cosmetics, and offers a makeup simulator that runs a machine learning model like [ContextualLoss] or [PairedCycleGAN] to transfer the makeup style of the sample makeup image to her facial image. She can use the simulator to check how the selected makeup looks on her face.
2.1.7. Super Resolution
A web-based video conferencing application is receiving a video stream from its peer, but the resolution of the video becomes lower due to network congestion. To prevent degradation of the perceived video quality, the application runs a machine learning model for super-resolution such as [SRGAN] to generate higher-resolution video frames.
2.1.8. Image Captioning
For better accessibility, a web-based presentation application provides automatic image captioning by running a machine learning model such as [im2txt] which predicts explanatory words of the presentation slides.
2.1.9. Machine Translation
Multiple people from various countries are talking via a web-based real-time text chat application. The application translates their conversation by using a machine learning model such as [GNMT] or [OpenNMT], which translates each message into a different language.
2.1.10. Emotion Analysis
A user is talking to her friend via a web-based real-time text chat application, and she is wondering how the friend feels because she cannot see the friend’s face. The application analyses the friend’s emotion by using a machine learning model such as [DeepMoji] , which infers emotion from input texts, and displays an emoji that represents the estimated emotion.
2.1.11. Video Summarization
A web-based video conferencing application records received video streams, and it needs to reduce the amount of recorded video data to be stored. The application generates a short version of the recorded video by using a machine learning model for video summarization such as [Video-Summarization-with-LSTM].
2.1.12. Noise Suppression
A web-based video conferencing application records received audio streams, but background noise is usually present. The application applies real-time noise suppression using a recurrent neural network such as [RNNoise] to suppress dynamic background noise, such as a baby crying or a dog barking, and improve the audio experience in video conferences.
2.1.13. Detecting Fake Video
A user is exposed to realistic fake videos generated by ‘deepfake’ techniques on the web. A fake video can swap the speaker’s face with the president’s face to incite the user politically or to manipulate the user’s opinion. Deepfake detection applications such as [FaceForensics++] analyze videos and protect the user against fake videos or images. When she watches a fake video on the web, the detection application alerts her to the fraudulent video in real time.
2.2. Framework Use Cases
This section collects framework-level use cases for a dedicated low-level API for neural network inference hardware acceleration. It is expected that Machine Learning frameworks will be key consumers of the Web Neural Network API (WebNN API) and the low-level details exposed through the WebNN API are abstracted out from typical web developers. However, it is also expected that web developers with specific interest and competence in Machine Learning will want to interface with the WebNN API directly instead of a higher-level ML framework.
2.2.1. Custom Layer
A web application developer wants to run a DNN model on the WebNN API. However, she has found that some activation functions like [LeakyReLU], [ELU], etc. are not included in the WebNN API. To address this issue, she constructs custom layers of the additional activation functions on top of the WebNN API. Note that the scope of custom layers may include convolution, normalization, etc. as well as activation.
2.2.2. Network Concatenation
A web application uses a DNN model, and its model data of upper convolutional layers and lower fully-connected layers are stored in separate files, since model data of the fully-connected layers are periodically updated due to fine tuning at the server side.
Therefore, the application first downloads both partial model files and concatenates them into a single model. When the model is updated, the application downloads the fine-tuned part of the model and replaces only the fully-connected layers with it.
2.2.3. Performance Adaptation
A web application developer has a concern about performance of her DNN model. The model needs to run on both a mobile device with a low power CPU as well as on a laptop with a powerful CPU, GPU and a dedicated AI accelerator.
She has confirmed that the model may run too slow on the mobile device which does not have GPU acceleration. To address this issue, her web application refers to the WebNN API to confirm whether acceleration is available or not, so that the application can display a warning for devices without acceleration.
After several weeks, she has developed a tiny DNN model that can even run on a CPU. In order to accommodate CPU execution, she modifies the application so that the application loads the tiny model in the case of CPU-only devices.
When executing the DNN model on a laptop with a more powerful CPU, GPU and a dedicated AI accelerator, she wants to use the execution device that minimizes the inference time. To address this issue, she runs the model on each execution device and measures the inference time for each test run. This information helps her release a web application that provides the best possible user experience given available hardware.
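The measurement step described above can be sketched in plain JavaScript. This is only an illustration: `averageInferenceTime` and `pickFastestDevice` are hypothetical helper names, the per-device callbacks are assumed to be synchronous stand-ins for real inference calls, and the clock is injectable so the logic can be exercised without real hardware.

```javascript
// Hypothetical helper: average time of one inference call, in clock units.
// `runInference` is an assumed synchronous stand-in for running the model once.
function averageInferenceTime(runInference, iterations = 5, now = () => Date.now()) {
  runInference(); // warm-up run, excluded from the measurement
  const start = now();
  for (let i = 0; i < iterations; i++) {
    runInference();
  }
  return (now() - start) / iterations;
}

// Hypothetical helper: measure every candidate and keep the fastest one.
// `candidates` maps a device label to its inference callback.
function pickFastestDevice(candidates, iterations = 5, now = () => Date.now()) {
  let best = null;
  for (const [name, runInference] of Object.entries(candidates)) {
    const time = averageInferenceTime(runInference, iterations, now);
    if (best === null || time < best.time) {
      best = { name, time };
    }
  }
  return best;
}
```

A real application would likely repeat the measurement more often and account for variance between runs; the sketch only shows the selection logic.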
2.2.4. Operation Level Execution
A JavaScript ML framework is responsible for loading, interpreting and executing an ML model. During the model execution phase, the framework iterates through the operations of the model and executes each operation on a hardware device, such as a CPU, GPU or ML accelerator. To avoid unnecessary data copying across devices, the framework selects the same device to execute the operations. For a compute-intensive operation, such as 2-D convolution or matrix multiplication, the framework uses the WebNN API to execute it with the ML-specific acceleration available on the selected device.
3. Security Considerations
This API is disabled by default in all cross-origin frames using the § 6.2.1 Permissions Policy Integration . This prevents third-party content from using this API unless the embedding page explicitly sets a policy that grants permission.
This API allows creation of an MLContext from a GPUDevice or WebGLRenderingContext defined by WebGPU and WebGL specifications respectively. See WebGPU Security Considerations and WebGL Security Consideration for more information regarding security characteristics of these contexts.
4. Privacy Considerations
This API enhances privacy compared to cloud-based inference, since input data such as locally sourced images or video streams stay within the browser’s sandbox.
This API exposes the minimum amount of information necessary to address the identified § 2 Use cases for the best performance and reliability of results.
No information from the underlying platform is exposed directly. An execution time analysis may reveal indirectly the performance of the underlying platform’s neural network hardware acceleration capabilities relative to another underlying platform.
Note: The group is soliciting further input on the proposed execution time analysis fingerprinting vector and will augment this section with more information and mitigations to inform the implementers of this API.
Implementers of this API are expected to be familiar with the WebGPU Privacy Considerations .
5. Programming Model
5.1. Overview
At the heart of neural networks is a computational graph of mathematical operations. These operations are the building blocks of modern machine learning technologies in computer vision, natural language processing, and robotics. The WebNN API is a specification for constructing, compiling, and executing computational graphs of neural networks.
The MLGraph interface represents a compiled computational graph (that is, a model) and exposes a compute method to perform inference.
The MLGraphBuilder interface serves as a builder (factory) to create a MLGraph. An MLOperand is a representation of data that flows within the computational graph, which include input-values for inference, constants (including trained weights) used for inference, intermediate values (often referred to as activations) computed during inference, as well as the output values of inference.
At inference time, every MLOperand will be bound to a tensor (the actual data).
The MLGraphBuilder interface enables the creation of MLOperands. A key part of the MLGraphBuilder interface are the operations (such as gemm() and softmax()). The operations have a functional semantics, with no side effects. Each operation invocation conceptually returns a distinct new value, without changing the value of any other MLOperand.
The build() method of the MLGraphBuilder interface is used to compile and optimize the computation graph used to compute one or more specified outputs. The key purpose of the compilation step is to enable optimizations that span two or more operations, such as operation or loop fusion.
The compute() method of the MLGraph interface is used to execute the compiled computation graph (to perform inference). The caller supplies the input values using MLNamedInputs, binding the input MLOperands to their values. The caller supplies pre-allocated buffers for output MLOperands using MLNamedOutputs.
The runtime values (of MLOperands) are tensors, which are essentially multidimensional arrays. The representation of the tensors is implementation dependent, but it typically includes the array data stored in some buffer (memory) and some metadata describing the array data (such as its shape).
As mentioned above, the operations have a functional semantics. This allows the implementation to potentially share the array data between multiple tensors. For example, the implementation of operations such as reshape, or slice, or squeeze may return a view of its input tensor that shares the same buffer as the input tensor. (In the case of reshape or squeeze, the entire data is shared, while in the case of slice, a part of the input data is shared.) The implementation may use views, as above, for intermediate values.
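The idea of a view that shares its buffer with the input tensor can be illustrated with JavaScript typed arrays. This is a minimal sketch, not how any WebNN implementation is required to work: the `{ data, shape }` tensor representation and the helper names `makeTensor`, `reshapeView` and `sliceRowsView` are assumptions made for the example.

```javascript
// Hypothetical tensor representation: array data plus shape metadata.
function makeTensor(data, shape) {
  return { data, shape };
}

// reshape: only the shape metadata changes; the entire data is shared.
function reshapeView(tensor, newShape) {
  const count = (shape) => shape.reduce((a, b) => a * b, 1);
  if (count(newShape) !== count(tensor.shape)) {
    throw new RangeError("reshape must preserve the number of elements");
  }
  return makeTensor(tensor.data, newShape);
}

// slice along the first dimension: subarray() returns a view over a part
// of the same underlying buffer, so no element is copied.
function sliceRowsView(tensor, start, end) {
  const rest = tensor.shape.slice(1);
  const rowSize = rest.reduce((a, b) => a * b, 1);
  const view = tensor.data.subarray(start * rowSize, end * rowSize);
  return makeTensor(view, [end - start, ...rest]);
}

const original = makeTensor(new Float32Array([1, 2, 3, 4, 5, 6]), [2, 3]);
const reshaped = reshapeView(original, [3, 2]);
const secondRow = sliceRowsView(original, 1, 2);
```

Because the operations have functional semantics, sharing the buffer in this way is not observable through the graph itself, which is what makes the optimization safe.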
5.2. Device Selection
An MLContext interface represents a global state of neural network execution. One of the important context states is the underlying execution device that manages the resources and facilitates the compilation and the eventual execution of the neural network graph. An MLContext could be created from a specific GPU device such as GPUDevice or WebGLRenderingContext that is already in use by the application, in which case the corresponding GPUBuffer or WebGLBuffer resources used as graph constants, as well as the GPUTexture and WebGLTexture as graph inputs must also be created from the same device. In a multi-adapter configuration, the device used for MLContext must be created from the same adapter as the device used to allocate the resources referenced in the graph.
In a situation when a GPU context executes a graph with a constant or an input in the system memory as an ArrayBufferView, the input content is automatically uploaded from the system memory to the GPU memory, and downloaded back to the system memory of an ArrayBufferView output buffer at the end of the graph execution. This data upload and download cycles will only occur whenever the execution device requires the data to be copied out of and back into the system memory, such as in the case of the GPU. It doesn’t occur when the device is a CPU device. Additionally, the result of the graph execution is in a known layout format. While the execution may be optimized for a native memory access pattern in an intermediate result within the graph, the output of the last operation of the graph must convert the content back to a known layout format at the end of the graph in order to maintain the expected behavior from the caller’s perspective.
When an MLContext is created with MLContextOptions, the user agent selects and creates the underlying execution device by taking into account the application’s preference specified in the MLPowerPreference and the MLDevicePreference options:
- The "gpu" device provides the broadest range of achievable performance across graphics hardware platforms from consumer devices to professional workstations.
- The "cpu" device provides the broadest reach of software compute availability, but with limited scalability of execution performance on the more complex neural networks.
- When the device preference is not specified ( "default" ), the user agent selects the most suitable device to use.
The following table summarizes the types of resource supported by the device selected.
| Device Type | ArrayBufferView | GPUBuffer | GPUTexture | WebGLBuffer | WebGLTexture |
|---|---|---|---|---|---|
| GPUDevice | Yes | Yes | Yes | No | No |
| WebGLRenderingContext | Yes | No | No | Yes | Yes |
| default | Yes | No | No | No | No |
| gpu | Yes | No | No | No | No |
| cpu | Yes | No | No | No | No |
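The table above can be read as a simple lookup. The sketch below encodes it as data; the constant `SUPPORTED_RESOURCES` and the function `isResourceSupported` are hypothetical names used only for illustration and are not part of the API.

```javascript
// Resource types supported per device type, transcribed from the table above.
const SUPPORTED_RESOURCES = {
  GPUDevice: ["ArrayBufferView", "GPUBuffer", "GPUTexture"],
  WebGLRenderingContext: ["ArrayBufferView", "WebGLBuffer", "WebGLTexture"],
  default: ["ArrayBufferView"],
  gpu: ["ArrayBufferView"],
  cpu: ["ArrayBufferView"],
};

// Hypothetical helper: true if the table lists the resource type as supported.
function isResourceSupported(deviceType, resourceType) {
  if (!Object.prototype.hasOwnProperty.call(SUPPORTED_RESOURCES, deviceType)) {
    throw new TypeError(`unknown device type: ${deviceType}`);
  }
  return SUPPORTED_RESOURCES[deviceType].includes(resourceType);
}
```

For example, `isResourceSupported("GPUDevice", "WebGLBuffer")` evaluates to `false`, because a context created from a GPUDevice only accepts WebGPU resources and ArrayBufferView.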
6. API
6.1. navigator.ml
A ML object is available in the Window and DedicatedWorkerGlobalScope contexts through the Navigator and WorkerNavigator interfaces respectively and is exposed via navigator.ml:
interface mixin NavigatorML {
  [SecureContext, SameObject] readonly attribute ML ml;
};
Navigator includes NavigatorML;
WorkerNavigator includes NavigatorML;
6.2. ML
enum MLDevicePreference {
  "default",
  "gpu",
  "cpu"
};
enum MLPowerPreference {
  // Let the user agent select the most suitable behavior.
  "default",
  // Prioritizes execution speed over power consumption.
  "high-performance",
  // Prioritizes power consumption over other considerations such as execution speed.
  "low-power"
};
dictionary MLContextOptions {
  // Preferred kind of device used
  MLDevicePreference devicePreference = "default";
  // Preference as related to power consumption
  MLPowerPreference powerPreference = "default";
};
[SecureContext, Exposed=(Window, DedicatedWorker)]
interface ML {
  // Create a context with options
  MLContext createContext(optional MLContextOptions options = {});
  // Create a context from WebGL rendering context
  MLContext createContext(WebGLRenderingContext glContext);
  // Create a context from WebGPU device
  MLContext createContext(GPUDevice gpuDevice);
};
The createContext() method steps are:
- If the responsible document is not allowed to use the webnn feature, then throw a "SecurityError" DOMException and abort these steps.
- Let context be a new MLContext object.
- Switch on the method’s first argument:
  - MLContextOptions: Set context’s context type to default.
  - WebGLRenderingContext: Set context’s context type to webgl.
  - GPUDevice: Set context’s context type to webgpu.
  - Otherwise: Set context’s context type to default.
- Return context.
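The method steps above can be paraphrased in JavaScript as follows. This is a sketch under explicit assumptions: `StubWebGLRenderingContext` and `StubGPUDevice` are stand-in classes so the code runs outside a browser, `createContextSketch` is a hypothetical name, the permission check is reduced to a boolean parameter, and a plain `Error` named "SecurityError" stands in for the DOMException.

```javascript
// Stand-ins for the real WebGLRenderingContext and GPUDevice interfaces.
class StubWebGLRenderingContext {}
class StubGPUDevice {}

// Hypothetical paraphrase of the createContext() steps.
function createContextSketch(firstArgument = {}, isFeatureAllowed = true) {
  // Step 1: reject when the document may not use the "webnn" feature.
  if (!isFeatureAllowed) {
    const error = new Error("the webnn feature is not allowed in this document");
    error.name = "SecurityError"; // stands in for a "SecurityError" DOMException
    throw error;
  }
  // Step 2: create the context object (reduced here to its context type).
  const context = { contextType: "default" };
  // Step 3: switch on the first argument.
  if (firstArgument instanceof StubWebGLRenderingContext) {
    context.contextType = "webgl";
  } else if (firstArgument instanceof StubGPUDevice) {
    context.contextType = "webgpu";
  }
  // Step 4: return the context.
  return context;
}
```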
6.2.1. Permissions Policy Integration
This specification defines a policy-controlled feature identified by the string "webnn". Its default allowlist is 'self'.
6.3. MLContext
The MLContext interface represents a global state of neural network compute workload and execution processes.
[SecureContext, Exposed=(Window, DedicatedWorker)]
interface MLContext {};
The context type for an MLContext is either "default", "webgl" or "webgpu".
6.4. MLOperandDescriptor
enum MLInputOperandLayout {
  "nchw",
  "nhwc"
};
enum MLOperandType {
  "float32",
  "float16",
  "int32",
  "uint32",
  "int8",
  "uint8"
};
dictionary MLOperandDescriptor {
  // The operand type.
  required MLOperandType type;
  // The dimensions field is only required for tensor operands.
  // The negative value means an unknown dimension.
  sequence<long> dimensions;
};
6.5. MLOperand
[SecureContext, Exposed=(Window, DedicatedWorker)]
interface MLOperand {};
6.6. MLOperator
The MLOperator interface defines a type of operation such as the various types of activation function used to create other operations such as § 6.7.4 conv2d or § 6.7.1 batchNormalization.
[SecureContext, Exposed=(Window, DedicatedWorker)]
interface MLOperator {};
6.7. MLGraphBuilder
The MLGraphBuilder interface defines a set of operations as identified by the § 2 Use cases that can be composed into a computational graph. It also represents the intermediate state of a graph building session.
typedef record<DOMString, MLOperand> MLNamedOperands;
dictionary MLBufferResourceView {
  required (WebGLBuffer or GPUBuffer) resource;
  unsigned long long offset = 0;
  unsigned long long size;
};
typedef (ArrayBufferView or MLBufferResourceView) MLBufferView;
[SecureContext, Exposed=(Window, DedicatedWorker)]
interface MLGraphBuilder {
  // Construct the graph builder from the context.
  constructor(MLContext context);
  // Create an operand for a graph input.
  MLOperand input(DOMString name, MLOperandDescriptor desc);
  // Create an operand for a graph constant.
  MLOperand constant(MLOperandDescriptor desc, MLBufferView bufferView);
  // Create a single-value operand from the specified number of the specified type.
  MLOperand constant(double value, optional MLOperandType type = "float32");
  // Compile the graph up to the specified output operands
  MLGraph build(MLNamedOperands outputs);
};
6.7.1. batchNormalization
Normalize the tensor values of input features across the batch dimension using [Batch-Normalization]. For each input feature, the mean and variance values of that feature supplied in this calculation as parameters are previously computed across the batch dimension of the input during the model training phase of this operation.
dictionary MLBatchNormalizationOptions {
  MLOperand scale;
  MLOperand bias;
  long axis = 1;
  float epsilon = 1e-5;
  MLOperator activation;
};
partial interface MLGraphBuilder {
  MLOperand batchNormalization(MLOperand input, MLOperand mean, MLOperand variance,
                               optional MLBatchNormalizationOptions options = {});
};
- input: an MLOperand. The input N-D tensor.
- mean: an MLOperand. The 1-D tensor of the mean values of the input features across the batch whose length is equal to the size of the input dimension denoted by options.axis.
- variance: an MLOperand. The 1-D tensor of the variance values of the input features across the batch whose length is equal to the size of the input dimension denoted by options.axis.
- options: an optional MLBatchNormalizationOptions. The optional parameters of the operation.
  - scale: an MLOperand. The 1-D tensor of the scaling values whose length is equal to the size of the input dimension denoted by options.axis.
  - bias: an MLOperand. The 1-D tensor of the bias values whose length is equal to the size of the input dimension denoted by options.axis.
  - axis: a long scalar. The index to the feature count dimension of the input shape for which the mean and variance values are. When it’s not specified, the default value is 1.
  - epsilon: a float scalar. A small value to prevent computational error due to divide-by-zero. The default value is 0.00001 when not specified.
  - activation: an MLOperator. The optional activation function that immediately follows the normalization operation.
Returns: an MLOperand. The batch-normalized N-D tensor of the same shape as the input tensor.
When input is a 4-D tensor of the "nchw" or "nhwc" layout, options.axis should be set to 1 or 3 respectively. The axis value designates the feature or channel count dimension of the input tensor.
The following code emulates this operation with other operations, for a 4-D input of the "nchw" layout followed by a relu activation:
const shape = [1, -1, 1, 1];
return builder.relu(
  builder.add(
    builder.mul(
      builder.reshape(options.scale, shape),
      builder.div(
        builder.sub(input, builder.reshape(mean, shape)),
        builder.pow(
          builder.add(builder.reshape(variance, shape), builder.constant(options.epsilon)),
          builder.constant(0.5))
        )),
    builder.reshape(options.bias, shape)));
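To make the arithmetic concrete, the normalization for the values of a single feature (channel) can be written with plain numbers. This is only a sketch: `batchNormalizeChannel` is a hypothetical helper, it operates on a JavaScript array rather than on MLOperands, and it omits the optional activation.

```javascript
// Hypothetical scalar reference for one feature (channel):
// scale * (x - mean) / sqrt(variance + epsilon) + bias
function batchNormalizeChannel(values, mean, variance, options = {}) {
  const { scale = 1, bias = 0, epsilon = 1e-5 } = options;
  const denominator = Math.sqrt(variance + epsilon);
  return values.map((x) => (scale * (x - mean)) / denominator + bias);
}
```

With `mean = 2`, `variance = 1` and `epsilon = 0`, the values `[1, 3]` map to `[-1, 1]`; adding `scale = 2` and `bias = 1` gives `[-1, 3]`.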
6.7.2. clamp
Clamp the input tensor element-wise within a range specified by the minimum and maximum values.
dictionary MLClampOptions {
  MLOperand minValue;
  MLOperand maxValue;
};
partial interface MLGraphBuilder {
  MLOperand clamp(MLOperand x, optional MLClampOptions options = {});
  MLOperator clamp(optional MLClampOptions options = {});
};
- x: an MLOperand. The input tensor.
- options: an optional MLClampOptions. The optional parameters of the operation.
  - minValue: an MLOperand. Specifies the minimum values of the range. It is either a scalar, or of the shape that is unidirectionally broadcastable to the shape of x according to [numpy-broadcasting-rule]. When it is not specified, the clamping is not performed on the lower limit of the range.
  - maxValue: an MLOperand. Specifies the maximum values of the range. It is either a scalar, or of the shape that is unidirectionally broadcastable to the shape of x according to [numpy-broadcasting-rule]. When it is not specified, the clamping is not performed on the upper limit of the range.
Returns:
- an MLOperand. The output tensor of the same shape as x.
- an MLOperator. The operator representing the clamp operation.
Clamp the input tensor element-wise within a range specified by minValue and maxValue . The calculation follows the expression min(max(x, minValue), maxValue). When minValue is not specified, the clamping is not performed on the lower limit. When maxValue is not specified, the clamping is not performed on the upper limit.
if (options.minValue === undefined) {
  if (options.maxValue === undefined) {
    return x;
  } else {
    return builder.min(x, options.maxValue);
  }
} else {
  if (options.maxValue === undefined) {
    return builder.max(x, options.minValue);
  } else {
    return builder.min(builder.max(x, options.minValue), options.maxValue);
  }
}
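The same expression, min(max(x, minValue), maxValue), can be checked on plain numbers. The sketch below is an illustration only: `clampValues` is a hypothetical helper that works on a JavaScript array and supports scalar bounds, not broadcastable tensors.

```javascript
// Hypothetical scalar-bound reference for clamp on a plain array.
// A bound left undefined is not applied, as described above.
function clampValues(values, options = {}) {
  const { minValue, maxValue } = options;
  return values.map((x) => {
    let y = x;
    if (minValue !== undefined) y = Math.max(y, minValue);
    if (maxValue !== undefined) y = Math.min(y, maxValue);
    return y;
  });
}
```

For example, `clampValues([-2, 0.5, 3], { minValue: 0, maxValue: 1 })` yields `[0, 0.5, 1]`, and calling it without options returns the values unchanged.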
6.7.3. concat
Concatenates the input tensors along a given axis.
partial interface MLGraphBuilder {
  MLOperand concat(sequence<MLOperand> inputs, long axis);
};
- inputs: a sequence of MLOperand. All input tensors must have the same shape, except for the size of the dimension to concatenate on.
- axis: a long scalar. The axis that the inputs concatenate along, with the value in the interval [0, N) where N is the rank of all the inputs.
Returns: an MLOperand. The concatenated tensor of all the inputs along the axis. The output tensor has the same shape except on the dimension that all the inputs concatenated along. The size of that dimension is computed as the sum of all the input sizes of the same dimension.
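The output shape rule for concat can be sketched as a small shape calculation. `concatOutputShape` is a hypothetical helper that works on arrays of dimension sizes; the error types it throws are a choice made for this example, not behavior defined by the API.

```javascript
// Hypothetical helper: output shape of concatenating tensors with the given
// shapes along `axis`, following the rule described above.
function concatOutputShape(inputShapes, axis) {
  if (inputShapes.length === 0) {
    throw new TypeError("at least one input shape is required");
  }
  const rank = inputShapes[0].length;
  if (!Number.isInteger(axis) || axis < 0 || axis >= rank) {
    throw new RangeError("axis must be in the interval [0, N)");
  }
  const output = [...inputShapes[0]];
  for (const shape of inputShapes.slice(1)) {
    if (shape.length !== rank) {
      throw new TypeError("all inputs must have the same rank");
    }
    for (let d = 0; d < rank; d++) {
      if (d === axis) {
        output[d] += shape[d]; // sizes along the axis are summed
      } else if (shape[d] !== output[d]) {
        throw new TypeError("inputs may differ only along the concatenation axis");
      }
    }
  }
  return output;
}
```

For example, concatenating shapes `[1, 2, 3]` and `[1, 5, 3]` along axis 1 gives `[1, 7, 3]`.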
6.7.4. conv2d
Compute a 2-D convolution given 4-D input and filter tensors.
enum MLFilterOperandLayout {
  "oihw",
  "hwio",
  "ohwi",
  "ihwo"
};
enum MLAutoPad {
  "explicit",
  "same-upper",
  "same-lower"
};
dictionary MLConv2dOptions {
  sequence<long> padding;
  sequence<long> strides;
  sequence<long> dilations;
  sequence<long> outputPadding;
  sequence<long> outputSizes;
  MLAutoPad autoPad = "explicit";
  boolean transpose = false;
  long groups = 1;
  MLInputOperandLayout inputLayout = "nchw";
  MLFilterOperandLayout filterLayout = "oihw";
  MLOperand bias;
  MLOperator activation;
};
partial interface MLGraphBuilder {
  MLOperand conv2d(MLOperand input, MLOperand filter, optional MLConv2dOptions options = {});
};
- input: an MLOperand. The input 4-D tensor. The logical shape is interpreted according to the value of options.inputLayout.
- filter: an MLOperand. The filter 4-D tensor. The logical shape is interpreted according to the value of options.filterLayout and options.groups.
- options: an optional MLConv2dOptions. The optional parameters of the operation.
  - padding: a sequence of long of length 4. The additional rows and columns added to the beginning and ending of each spatial dimension of input, [beginning_height, ending_height, beginning_width, ending_width]. If not present, the values are assumed to be [0,0,0,0].
  - strides: a sequence of long of length 2. The stride of the sliding window for each spatial dimension of input, [stride_height, stride_width]. If not present, the values are assumed to be [1,1].
  - dilations: a sequence of long of length 2. The dilation factor for each spatial dimension of input, [dilation_height, dilation_width]. If not present, the values are assumed to be [1,1].
  - outputPadding: a sequence of long of length 2. The padding values applied to each spatial dimension of the output tensor when options.transpose is set to true. These explicit padding values are needed to disambiguate the output tensor shape for transposed convolution when the value of options.strides is greater than 1. Note that these values are only used to disambiguate the output shape when needed; they do not necessarily cause any padding value to be written to the output tensor. If not specified, the values are assumed to be [0,0].
  - outputSizes: a sequence of long of length 2. The sizes of the last two dimensions of the output tensor when options.transpose is set to true. When the output sizes are explicitly specified, the output padding values in options.outputPadding are ignored. If not specified, the output sizes are automatically computed.
  - autoPad: an MLAutoPad. The automatic input padding options. By default, this argument is set to "explicit", which means that the values in the options.padding array should be used for input padding. When the option is set to a value other than "explicit", the values in the options.padding array are ignored. With the "same-upper" option, the padding values are automatically computed such that the additional ending padding of the spatial input dimensions would allow all of the input values in the corresponding dimension to be filtered. The "same-lower" option is similar but padding is applied to the beginning of the spatial input dimensions instead of the ending.
  - transpose: a boolean indicating that a transposed convolution operation is performed. Transposed convolution is used in upsampling networks to increase the resolution of a feature as opposed to the typical convolution process that reduces the feature’s resolution. When transposed convolution is performed, options.outputPadding may be needed to disambiguate the output tensor shape. If not present, this option is assumed to be false.
  - groups: a long scalar. The number of groups that input channels and output channels are divided into. Defaults to 1.
  - inputLayout: an MLInputOperandLayout. The default value is "nchw". This option specifies the layout format of the input and output tensor as follows:
    - "nchw":
      - input tensor: [batches, input_channels, height, width]
      - output tensor: [batches, output_channels, height, width]
    - "nhwc":
      - input tensor: [batches, height, width, input_channels]
      - output tensor: [batches, height, width, output_channels]
  - filterLayout: an MLFilterOperandLayout. The default value is "oihw". This option specifies the layout format of the filter tensor as follows:
    - "oihw": [output_channels, input_channels/groups, height, width]
    - "hwio": [height, width, input_channels/groups, output_channels]
    - "ohwi": [output_channels, height, width, input_channels/groups]
    - "ihwo": [input_channels/groups, height, width, output_channels]
  - bias: an MLOperand. The additional 1-D tensor with the shape of [output_channels] whose values are to be added to the convolution result.
  - activation: an MLOperator. The optional activation function that immediately follows the convolution operation.
Returns: an MLOperand. The output 4-D tensor that contains the convolution result. The output shape is interpreted according to the options.inputLayout value. More specifically, the spatial dimensions or the sizes of the last two dimensions of the output tensor for the nchw input layout can be calculated as follows:
output size = 1 + (input size - filter size + beginning padding + ending padding) / stride
Whereas for the transposed convolution case with options.transpose set to true , unless the options.outputSizes values are explicitly specified, the options.outputPadding may be needed to compute the spatial dimension values of the output tensor as follow:
output size = (input size - 1) * stride + filter size - beginning padding - ending padding + output padding
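The two output size formulas above can be evaluated per spatial dimension with a few lines of JavaScript. This is a sketch with hypothetical helper names (`conv2dOutputSize`, `convTranspose2dOutputSize`); it assumes the division in the first formula rounds down, and like the formulas above it does not account for dilation.

```javascript
// output size = 1 + (input size - filter size + beginning padding + ending padding) / stride
// Assumption: the division is an integer division that rounds down.
function conv2dOutputSize(inputSize, filterSize, beginningPadding = 0, endingPadding = 0, stride = 1) {
  return 1 + Math.floor((inputSize - filterSize + beginningPadding + endingPadding) / stride);
}

// output size = (input size - 1) * stride + filter size
//               - beginning padding - ending padding + output padding
function convTranspose2dOutputSize(inputSize, filterSize, beginningPadding = 0,
                                   endingPadding = 0, stride = 1, outputPadding = 0) {
  return (inputSize - 1) * stride + filterSize - beginningPadding - endingPadding + outputPadding;
}
```

With a filter size of 3, padding of 1 on both sides and a stride of 2, input sizes 5 and 6 both produce an output size of 3. Going the other way, the transposed convolution of size 3 yields 5 with an output padding of 0 and 6 with an output padding of 1, which is the ambiguity that options.outputPadding resolves.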
6.7.5. element-wise binary operations
Compute the element-wise binary addition, subtraction, multiplication, division, maximum and minimum of the two input tensors.
partial interface MLGraphBuilder {
  MLOperand add(MLOperand a, MLOperand b);
  MLOperand sub(MLOperand a, MLOperand b);
  MLOperand mul(MLOperand a, MLOperand b);
  MLOperand div(MLOperand a, MLOperand b);
  MLOperand max(MLOperand a, MLOperand b);
  MLOperand min(MLOperand a, MLOperand b);
  MLOperand pow(MLOperand a, MLOperand b);
};
Returns: an MLOperand. The output tensor that contains the result of element-wise binary operation of the two input tensors.
The element-wise binary operation will be broadcasted according to [numpy-broadcasting-rule] . The rank of the output tensor is the maximum rank of the input tensors. For each dimension of the output tensor, its size is the maximum size along that dimension of the input tensors.
Operation types:
- add: Add the values of the two input tensors, element-wise.
- sub: Subtract the values of the second input tensor from the values of the first input tensor, element-wise.
- mul: Multiply the values of the two input tensors, element-wise.
- div: Divide the values of the first input tensor by the values of the second input tensor, element-wise.
- max: Select the greater values of the two input tensors, element-wise.
- min: Select the lesser values of the two input tensors, element-wise.
- pow: Compute the values of the first input tensor raised to the power of the values of the second input tensor, element-wise.
6.7.6. element-wise unary operations
Compute the element-wise unary operation for the input tensor.
partial interface MLGraphBuilder {
  MLOperand abs(MLOperand x);
  MLOperand ceil(MLOperand x);
  MLOperand cos(MLOperand x);
  MLOperand exp(MLOperand x);
  MLOperand floor(MLOperand x);
  MLOperand log(MLOperand x);
  MLOperand neg(MLOperand x);
  MLOperand sin(MLOperand x);
  MLOperand tan(MLOperand x);
};
- x: an MLOperand. The input tensor.
Returns: an MLOperand. The output tensor that contains the result of element-wise unary operation of the input tensor. The shape of the output tensor is the same as the shape of the input tensor.
Operation types:
- abs: Compute the absolute value of the input tensor, element-wise.
- ceil: Compute the ceiling of the input tensor, element-wise.
- cos: Compute the cosine of the input tensor, element-wise.
- exp: Compute the exponential of the input tensor, element-wise.
- floor: Compute the floor of the input tensor, element-wise.
- log: Compute the natural logarithm of the input tensor, element-wise.
- neg: Compute the numerical negative value of the input tensor, element-wise.
- sin: Compute the sine of the input tensor, element-wise.
- tan: Compute the tangent of the input tensor, element-wise.
6.7.7. elu
Calculate the exponential linear unit function on the input tensor element-wise. The calculation follows the expression max(0, x) + alpha * (exp(min(0, x)) - 1).
dictionary MLEluOptions {
  float alpha = 1;
};

partial interface MLGraphBuilder {
  MLOperand elu(MLOperand x, optional MLEluOptions options = {});
  MLOperator elu(optional MLEluOptions options = {});
};
- x: an MLOperand. The input tensor.
- options: an optional MLEluOptions. The optional parameters of the operation.
  - alpha: a float scalar multiplier, default to 1.
Returns:
- an MLOperand. The output tensor of the same shape as x.
- an MLOperator. The operator representing the elu operation.
return builder.add(
  builder.max(builder.constant(0), x),
  builder.mul(
    builder.constant(options.alpha),
    builder.sub(
      builder.exp(builder.min(builder.constant(0), x)),
      builder.constant(1))));
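For checking expected values outside of a graph, the elu expression can also be evaluated directly on numbers. This is a minimal sketch in plain JavaScript; eluScalar is a hypothetical name and is not part of the API.

```javascript
// Hypothetical reference: max(0, x) + alpha * (exp(min(0, x)) - 1) on one value.
function eluScalar(x, alpha = 1) {
  return Math.max(0, x) + alpha * (Math.exp(Math.min(0, x)) - 1);
}

// Apply it element-wise to a typed array, as the operation does to a tensor.
const eluInput = new Float32Array([-2, -0.5, 0, 1.5]);
const eluOutput = eluInput.map((value) => eluScalar(value));
console.log(Array.from(eluOutput));
```

Non-negative inputs pass through unchanged, and negative inputs approach -alpha as they decrease.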
6.7.8. gemm
Calculate the general matrix multiplication of the Basic Linear Algebra Subprograms. The calculation follows the expression alpha * A * B + beta * C, where A is a 2-D tensor with shape [M, K] or [K, M], B is a 2-D tensor with shape [K, N] or [N, K], and C is broadcastable to the shape [M, N]. A and B may optionally be transposed prior to the calculation.
dictionary MLGemmOptions {
  MLOperand c;
  float alpha = 1.0;
  float beta = 1.0;
  boolean aTranspose = false;
  boolean bTranspose = false;
};

partial interface MLGraphBuilder {
  MLOperand gemm(MLOperand a, MLOperand b, optional MLGemmOptions options = {});
};
- a: an MLOperand. The first input 2-D tensor with shape [M, K] if aTranspose is false, or [K, M] if aTranspose is true.
- b: an MLOperand. The second input 2-D tensor with shape [K, N] if bTranspose is false, or [N, K] if bTranspose is true.
- options: an optional MLGemmOptions. The optional parameters of the operation.
  - c: an MLOperand. The third input tensor. It is either a scalar, or of the shape that is unidirectionally broadcastable to the shape [M, N] according to [numpy-broadcasting-rule]. When it is not specified, the computation is done as if c is a scalar 0.0.
  - alpha: a float scalar multiplier for the first input, default to 1.0.
  - beta: a float scalar multiplier for the third input, default to 1.0.
  - aTranspose: a boolean indicating if the first input should be transposed prior to calculating the output, default to false.
  - bTranspose: a boolean indicating if the second input should be transposed prior to calculating the output, default to false.
Returns: an MLOperand. The output 2-D tensor of shape [M, N] that contains the calculated product of all the inputs.
if (options.aTranspose)
  a = builder.transpose(a);

if (options.bTranspose)
  b = builder.transpose(b);

let ab = builder.matmul(builder.mul(builder.constant(options.alpha), a), b);
return (options.c ? builder.add(ab, builder.mul(builder.constant(options.beta), options.c)) : ab);
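The expression alpha * A * B + beta * C can also be worked through on nested arrays. The following is a sketch in plain JavaScript under stated assumptions: gemmReference and transpose2d are hypothetical names, matrices are arrays of rows, and c may be a number, a 1-D array of length N or 1, or a 2-D array of shape [M, N], [1, N] or [M, 1].

```javascript
// Hypothetical helper: transpose a matrix stored as an array of rows.
function transpose2d(matrix) {
  return matrix[0].map((_, col) => matrix.map((row) => row[col]));
}

// Hypothetical reference for alpha * A * B + beta * C on nested arrays.
function gemmReference(a, b,
    { c = 0, alpha = 1, beta = 1, aTranspose = false, bTranspose = false } = {}) {
  const A = aTranspose ? transpose2d(a) : a; // [M, K]
  const B = bTranspose ? transpose2d(b) : b; // [K, N]
  const M = A.length, K = B.length, N = B[0].length;
  // Read C[i][j] as if c had been broadcast to [M, N].
  const cAt = (i, j) => {
    if (typeof c === 'number') return c;                            // scalar
    if (typeof c[0] === 'number') return c[c.length === 1 ? 0 : j]; // 1-D
    const row = c[c.length === 1 ? 0 : i];                          // 2-D
    return row[row.length === 1 ? 0 : j];
  };
  const out = [];
  for (let i = 0; i < M; ++i) {
    out.push([]);
    for (let j = 0; j < N; ++j) {
      let sum = 0;
      for (let k = 0; k < K; ++k) sum += A[i][k] * B[k][j];
      out[i].push(alpha * sum + beta * cAt(i, j));
    }
  }
  return out;
}

console.log(gemmReference([[1, 2], [3, 4]], [[5, 6], [7, 8]]));
// [ [ 19, 22 ], [ 43, 50 ] ]
```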
6.7.9. gru
Gated Recurrent Unit [GRU] recurrent network using an update gate and a reset gate to compute the hidden state that rolls into the output across the temporal sequence of the network.
enum MLRecurrentNetworkWeightLayout {
  "zrn",  // update-reset-new gate ordering
  "rzn"   // reset-update-new gate ordering
};

enum MLRecurrentNetworkDirection {
  "forward",
  "backward",
  "both"
};

dictionary MLGruOptions {
  MLOperand bias;
  MLOperand recurrentBias;
  MLOperand initialHiddenState;
  boolean resetAfter = true;
  boolean returnSequence = false;
  MLRecurrentNetworkDirection direction = "forward";
  MLRecurrentNetworkWeightLayout layout = "zrn";
  sequence<MLOperator> activations;
};

partial interface MLGraphBuilder {
  sequence<MLOperand> gru(MLOperand input, MLOperand weight, MLOperand recurrentWeight,
                          long steps, long hiddenSize, optional MLGruOptions options = {});
};
- input: an MLOperand. The input 3-D tensor of shape [steps, batch_size, input_size].
- weight: an MLOperand. The 3-D input weight tensor of shape [num_directions, 3 * hidden_size, input_size]. The ordering of the weight vectors in the second dimension of the tensor shape is specified according to the options.layout argument.
- recurrentWeight: an MLOperand. The 3-D recurrent weight tensor of shape [num_directions, 3 * hidden_size, hidden_size]. The ordering of the weight vectors in the second dimension of the tensor shape is specified according to the options.layout argument.
- steps: a long scalar. The number of time steps in the recurrent network. The value must be greater than 0.
- hiddenSize: a long scalar. The value of the third dimension of the cell output tensor shape. It indicates the number of features in the hidden state.
- options: an optional MLGruOptions. The optional parameters of the operation.
  - bias: an MLOperand. The 2-D input bias tensor of shape [num_directions, 3 * hidden_size]. The ordering of the bias vectors in the second dimension of the tensor shape is specified according to the options.layout argument.
  - recurrentBias: an MLOperand. The 2-D recurrent bias tensor of shape [num_directions, 3 * hidden_size]. The ordering of the bias vectors in the second dimension of the tensor shape is specified according to the options.layout argument.
  - initialHiddenState: an MLOperand. The 3-D initial hidden state tensor of shape [num_directions, batch_size, hidden_size]. When not specified, it is assumed to be a tensor filled with zeros.
  - resetAfter: a boolean indicating whether to apply the reset gate after or before matrix multiplication. Default to true.
  - returnSequence: a boolean indicating whether to also return the entire sequence with every cell output from each time step in it in addition to the cell output of the last time step. Default to false.
  - direction: an MLRecurrentNetworkDirection. The processing direction of the input sequence. When set to "both", the size of the first dimension of the weight and the bias tensor shapes must be 2, and the input is processed in both directions.
  - layout: an MLRecurrentNetworkWeightLayout. The ordering of the weight and bias vectors for the internal gates of GRU, specifically the update (z), reset (r), and new (n) gate, as indicated in the second dimension of the weight and bias tensor shape. When not specified, the default layout is "zrn".
  - activations: a sequence of MLOperator. A pair of activation functions with the first function used for the update and reset gate, and the second used for the new gate. When not specified, they are assumed to be the sigmoid ("sigmoid") and the hyperbolic tangent ("tanh") function respectively.
Returns: a sequence of MLOperand. The first element of the sequence is a 3-D tensor of shape [num_directions, batch_size, hidden_size], the cell output from the last time step of the network. Additionally, if returnSequence is set to true, the second element is the 4-D output tensor of shape [steps, num_directions, batch_size, hidden_size] containing every cell output from each time step in the temporal sequence.
const numDirections = (options.direction == "both" ? 2 : 1);
let hiddenState = options.initialHiddenState;

if (!hiddenState) {
  const desc = { type: 'float32', dimensions: [numDirections, 1, hiddenSize] };
  const totalSize = numDirections * hiddenSize;
  hiddenState = builder.constant(desc, new Float32Array(totalSize).fill(0));
}

let sequence = null;
let cellWeight = [];
let cellRecurrentWeight = [];
let cellBias = [];
let cellRecurrentBias = [];

// Split the per-direction weights and biases once, before stepping.
for (let slot = 0; slot < numDirections; ++slot) {
  cellWeight.push(
    builder.squeeze(builder.slice(weight, [slot, 0, 0], [1, -1, -1]), { axes: [0] }));
  cellRecurrentWeight.push(
    builder.squeeze(builder.slice(recurrentWeight, [slot, 0, 0], [1, -1, -1]), { axes: [0] }));
  cellBias.push(options.bias ?
    (builder.squeeze(builder.slice(options.bias, [slot, 0], [1, -1]), { axes: [0] })) : null);
  cellRecurrentBias.push(options.recurrentBias ?
    (builder.squeeze(builder.slice(options.recurrentBias, [slot, 0], [1, -1]), { axes: [0] })) : null);
}

for (let step = 0; step < steps; ++step) {
  let cellHidden = [];
  let cellOutput = null;

  for (let slot = 0; slot < numDirections; ++slot) {
    cellHidden.push(
      builder.squeeze(builder.slice(hiddenState, [slot, 0, 0], [1, -1, -1]), { axes: [0] }));
  }

  for (let slot = 0; slot < numDirections; ++slot) {
    // The backward direction reads the input sequence from the last step.
    let slice = (slot == 1 || options.direction == "backward" ? steps - step - 1 : step);
    let cellInput = builder.squeeze(
      builder.slice(input, [slice, 0, 0], [1, -1, -1]), { axes: [0] });

    let result = builder.reshape(
      builder.gruCell(
        cellInput, cellWeight[slot], cellRecurrentWeight[slot],
        cellHidden[slot], hiddenSize, {
          bias: cellBias[slot], recurrentBias: cellRecurrentBias[slot],
          resetAfter: options.resetAfter, layout: options.layout,
          activations: options.activations }),
      [1, -1, hiddenSize]);

    cellOutput = (cellOutput ? builder.concat([cellOutput, result], 0) : result);
  }

  hiddenState = cellOutput;

  if (options.returnSequence) {
    cellOutput = builder.reshape(cellOutput, [1, numDirections, -1, hiddenSize]);
    sequence = (sequence ? builder.concat([sequence, cellOutput], 0) : cellOutput);
  }
}

return (sequence ? [hiddenState, sequence] : [hiddenState]);
6.7.10. gruCell
A single time step of the Gated Recurrent Unit [GRU] recurrent network using an update gate and a reset gate to compute the hidden state that rolls into the output across the temporal sequence of a recurrent network.
dictionary MLGruCellOptions {
  MLOperand bias;
  MLOperand recurrentBias;
  boolean resetAfter = true;
  MLRecurrentNetworkWeightLayout layout = "zrn";
  sequence<MLOperator> activations;
};

partial interface MLGraphBuilder {
  MLOperand gruCell(MLOperand input, MLOperand weight, MLOperand recurrentWeight,
                    MLOperand hiddenState, long hiddenSize, optional MLGruCellOptions options = {});
};
- input: an MLOperand. The input 2-D tensor of shape [batch_size, input_size].
- weight: an MLOperand. The 2-D input weight tensor of shape [3 * hidden_size, input_size]. The ordering of the weight vectors in the first dimension of the tensor shape is specified according to the options.layout argument.
- recurrentWeight: an MLOperand. The 2-D recurrent weight tensor of shape [3 * hidden_size, hidden_size]. The ordering of the weight vectors in the first dimension of the tensor shape is specified according to the options.layout argument.
- hiddenState: an MLOperand. The 2-D input hidden state tensor of shape [batch_size, hidden_size].
- hiddenSize: a long scalar. The value of the second dimension of the output tensor shape. It indicates the number of features in the hidden state.
- options: an optional MLGruCellOptions. The optional parameters of the operation.
  - bias: an MLOperand. The 1-D input bias tensor of shape [3 * hidden_size]. The ordering of the bias vectors in the first dimension of the tensor shape is specified according to the options.layout argument.
  - recurrentBias: an MLOperand. The 1-D recurrent bias tensor of shape [3 * hidden_size]. The ordering of the bias vectors in the first dimension of the tensor shape is specified according to the options.layout argument.
  - resetAfter: a boolean indicating whether to apply the reset gate after or before matrix multiplication. Default to true.
  - layout: an MLRecurrentNetworkWeightLayout. The ordering of the weight and bias vectors for the internal gates of GRU, specifically the update (z), reset (r), and new (n) gate, as indicated in the first dimension of the weight and bias tensor shapes. When not specified, the default layout is "zrn".
  - activations: a sequence of MLOperator. A pair of activation functions with the first function used for the update and reset gate, and the second used for the new gate. When not specified, they default to the sigmoid ("sigmoid") and the hyperbolic tangent ("tanh") function respectively.
Returns: an MLOperand. The 2-D tensor of shape [batch_size, hidden_size], the cell output hidden state of a single time step of the recurrent network.
const one = builder.constant(1);
const zero = builder.constant(0);

// update gate
let z = builder.sigmoid(
  builder.add(
    builder.add(
      (options.bias ? builder.slice(options.bias, [0], [hiddenSize]) : zero),
      (options.recurrentBias ? builder.slice(options.recurrentBias, [0], [hiddenSize]) : zero)
    ),
    builder.add(
      builder.matmul(
        input,
        builder.transpose(builder.slice(weight, [0, 0], [hiddenSize, -1]))
      ),
      builder.matmul(
        hiddenState,
        builder.transpose(builder.slice(recurrentWeight, [0, 0], [hiddenSize, -1]))
      )
    )
  )
);

// reset gate
let r = builder.sigmoid(
  builder.add(
    builder.add(
      (options.bias ? builder.slice(options.bias, [hiddenSize], [hiddenSize]) : zero),
      (options.recurrentBias ? builder.slice(options.recurrentBias, [hiddenSize], [hiddenSize]) : zero)
    ),
    builder.add(
      builder.matmul(
        input,
        builder.transpose(builder.slice(weight, [hiddenSize, 0], [hiddenSize, -1]))
      ),
      builder.matmul(
        hiddenState,
        builder.transpose(builder.slice(recurrentWeight, [hiddenSize, 0], [hiddenSize, -1]))
      )
    )
  )
);

// new gate
let n;
if (options.resetAfter) {
  n = builder.tanh(
    builder.add(
      (options.bias ? builder.slice(options.bias, [2 * hiddenSize], [hiddenSize]) : zero),
      builder.add(
        builder.matmul(
          input,
          builder.transpose(builder.slice(weight, [2 * hiddenSize, 0], [hiddenSize, -1]))
        ),
        builder.mul(
          r,
          builder.add(
            (options.recurrentBias ? builder.slice(options.recurrentBias, [2 * hiddenSize], [hiddenSize]) : zero),
            builder.matmul(
              hiddenState,
              builder.transpose(builder.slice(recurrentWeight, [2 * hiddenSize, 0], [hiddenSize, -1]))
            )
          )
        )
      )
    )
  );
}
else {
  n = builder.tanh(
    builder.add(
      builder.add(
        (options.bias ? builder.slice(options.bias, [2 * hiddenSize], [hiddenSize]) : zero),
        (options.recurrentBias ? builder.slice(options.recurrentBias, [2 * hiddenSize], [hiddenSize]) : zero)
      ),
      builder.add(
        builder.matmul(
          input,
          builder.transpose(builder.slice(weight, [2 * hiddenSize, 0], [hiddenSize, -1]))
        ),
        builder.matmul(
          builder.mul(r, hiddenState),
          builder.transpose(builder.slice(recurrentWeight, [2 * hiddenSize, 0], [hiddenSize, -1]))
        )
      )
    )
  );
}

// compute the new hidden state
return builder.add(builder.mul(z, hiddenState), builder.mul(n, builder.sub(one, z)));
6.7.11. hardSigmoid
Calculate the non-smooth function used in place of a sigmoid function on the input tensor.
dictionary MLHardSigmoidOptions {
  float alpha = 0.2;
  float beta = 0.5;
};

partial interface MLGraphBuilder {
  MLOperand hardSigmoid(MLOperand x, optional MLHardSigmoidOptions options = {});
  MLOperator hardSigmoid(optional MLHardSigmoidOptions options = {});
};
- x: an MLOperand. The input tensor.
- options: an optional MLHardSigmoidOptions. The optional parameters of the operation.
Returns:
- an MLOperand. The output tensor of the same shape as x.
- an MLOperator. The operator representing the hard sigmoid operation.
return builder.max(
  builder.min(
    builder.add(
      builder.mul(builder.constant(options.alpha), x),
      builder.constant(options.beta)),
    builder.constant(1)),
  builder.constant(0));
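Read as a formula, the emulation above computes max(0, min(1, alpha * x + beta)) for each element. The sketch below evaluates that on plain numbers next to the smooth sigmoid it stands in for. Both function names are hypothetical and not part of the API.

```javascript
// Hypothetical reference: clamp the line alpha * x + beta to the range [0, 1].
function hardSigmoidScalar(x, { alpha = 0.2, beta = 0.5 } = {}) {
  return Math.max(Math.min(alpha * x + beta, 1), 0);
}

// The smooth function that hardSigmoid approximates.
function sigmoidScalar(x) {
  return 1 / (Math.exp(-x) + 1);
}

for (const x of [-4, -1, 0, 1, 4]) {
  console.log(x, hardSigmoidScalar(x), sigmoidScalar(x));
}
```

With the default alpha and beta, the output saturates at 0 for x at or below -2.5 and at 1 for x at or above 2.5.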
6.7.12. hardSwish
Compute the nonlinear function y = x * max(0, min(6, (x + 3))) / 6, introduced by [MobileNetV3], on the input tensor element-wise.
partial interface MLGraphBuilder {
  MLOperand hardSwish(MLOperand x);
  MLOperator hardSwish();
};
- x: an MLOperand. The input tensor.
Returns:
- an MLOperand. The output tensor of the same shape as x.
- an MLOperator. The operator representing the hard-swish operation.
return builder.div(
  builder.mul(
    x,
    builder.max(
      builder.constant(0),
      builder.min(
        builder.constant(6),
        builder.add(x, builder.constant(3))))),
  builder.constant(6));
6.7.13. instanceNormalization
Normalize the input features using [Instance-Normalization]. Unlike § 6.7.1 batchNormalization where the mean and variance values used in the calculation are previously computed across the batch dimension during the model training phase, the mean and variance values used in the calculation of an instance normalization are computed internally on the fly per input feature.
dictionary MLInstanceNormalizationOptions {
  MLOperand scale;
  MLOperand bias;
  float epsilon = 1e-5;
  MLInputOperandLayout layout = "nchw";
};

partial interface MLGraphBuilder {
  MLOperand instanceNormalization(MLOperand input,
                                  optional MLInstanceNormalizationOptions options = {});
};
- input: an MLOperand. The input 4-D tensor.
- options: an optional MLInstanceNormalizationOptions. The optional parameters of the operation.
  - scale: an MLOperand. The 1-D tensor of the scaling values whose length is equal to the size of the feature dimension of the input, e.g. for the input tensor with nchw layout, the feature dimension is 1.
  - bias: an MLOperand. The 1-D tensor of the bias values whose length is equal to the size of the feature dimension of the input, e.g. for the input tensor with nchw layout, the feature dimension is 1.
  - epsilon: a float scalar. A small value to prevent computational error due to divide-by-zero. The default value is 0.00001 when not specified.
  - layout: an MLInputOperandLayout. This option specifies the layout format of the input. The default value is "nchw".
Returns: an MLOperand. The instance-normalized 4-D tensor of the same shape as the input tensor.
// The mean reductions happen over the spatial dimensions of the input
// e.g. axis 2 and 3 of the input tensor.
const reduceOptions = { axes: [2, 3], keepDimensions: true };
const mean = builder.reduceMean(input, reduceOptions);
const variance = builder.reduceMean(
  builder.pow(
    builder.sub(input, mean),
    builder.constant(2)),
  reduceOptions
);

// The scale and bias values are applied per input feature
// e.g. axis 1 of the input tensor.
const shape = [1, -1, 1, 1];
return builder.add(
  builder.mul(
    builder.reshape(options.scale, shape),
    builder.div(
      builder.sub(input, mean),
      builder.pow(
        builder.add(variance, builder.constant(options.epsilon)),
        builder.constant(0.5))
    )
  ),
  builder.reshape(options.bias, shape)
);
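The per-feature arithmetic can be seen on a single spatial plane, that is, the height by width values of one feature of one batch item, flattened into an array. This is a sketch in plain JavaScript: instanceNormalizePlane is a hypothetical name, and scale and bias are the single values that apply to that feature.

```javascript
// Hypothetical reference: normalize the values of one feature plane.
function instanceNormalizePlane(values, { scale = 1, bias = 0, epsilon = 1e-5 } = {}) {
  const mean = values.reduce((sum, v) => sum + v, 0) / values.length;
  const variance =
      values.reduce((sum, v) => sum + (v - mean) ** 2, 0) / values.length;
  // Same formula as the emulation: scale * (x - mean) / sqrt(variance + epsilon) + bias.
  const denominator = Math.sqrt(variance + epsilon);
  return values.map((v) => scale * (v - mean) / denominator + bias);
}

console.log(instanceNormalizePlane([1, 2, 3, 4]));
console.log(instanceNormalizePlane([1, 2, 3, 4], { scale: 2, bias: 1 }));
```

A full implementation would repeat this for every (batch, feature) pair of the 4-D input.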
6.7.14. leakyRelu
Calculate the leaky version of rectified linear function on the input tensor element-wise. The calculation follows the expression max(0, x) + alpha * min(0, x).
dictionary MLLeakyReluOptions {
  float alpha = 0.01;
};

partial interface MLGraphBuilder {
  MLOperand leakyRelu(MLOperand x, optional MLLeakyReluOptions options = {});
  MLOperator leakyRelu(optional MLLeakyReluOptions options = {});
};
- x: an MLOperand. The input tensor.
- options: an optional MLLeakyReluOptions. The optional parameters of the operation.
  - alpha: a float scalar multiplier, default to 0.01.
Returns:
- an MLOperand. The output tensor of the same shape as x.
- an MLOperator. The operator representing the leaky relu operation.
return builder.add(
  builder.max(builder.constant(0), x),
  builder.mul(builder.constant(options.alpha), builder.min(builder.constant(0), x)));
6.7.15. matmul
Compute the matrix product of two input tensors.
partial interface MLGraphBuilder {
  MLOperand matmul(MLOperand a, MLOperand b);
};
Returns: an MLOperand. The output N-D tensor that contains the matrix product of two input tensors.
The operation behaves as follows:
- If both a and b are 2-D, they are multiplied like conventional matrices and produce a 2-D tensor as the output.
- If either a or b is N-D, N > 2, it is treated as a stack of matrices with dimensions corresponding to the last two indices. The matrix multiplication is broadcast accordingly by following [numpy-broadcasting-rule]. The output is an N-D tensor whose rank is the maximum rank of the input tensors. For each dimension, except the last two, of the output tensor, its size is the maximum size along that dimension of the input tensors.
- If a is 1-D, it is converted to a 2-D tensor by prepending a 1 to its dimensions.
- If b is 1-D, it is converted to a 2-D tensor by appending a 1 to its dimensions.
- If both a and b are 1-D, the operation is a vector dot-product, which produces a scalar output.
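The shape rules in the list above can be sketched as a shape-only calculation. The helper below is hypothetical and not part of the API. It represents a scalar output as an empty shape, and it keeps the dimension of size 1 that was added to a single 1-D operand, which is an assumption because the text does not say whether that dimension is removed afterwards.

```javascript
// Hypothetical helper: derive the matmul output shape from the input shapes.
function matmulOutputShape(shapeA, shapeB) {
  if (shapeA.length === 1 && shapeB.length === 1) {
    if (shapeA[0] !== shapeB[0]) throw new Error('Vector lengths differ');
    return []; // vector dot-product: scalar output
  }
  // Promote a 1-D operand: a gets a leading 1, b gets a trailing 1.
  const a = shapeA.length === 1 ? [1, shapeA[0]] : shapeA;
  const b = shapeB.length === 1 ? [shapeB[0], 1] : shapeB;
  const [m, innerA] = a.slice(-2);
  const [innerB, n] = b.slice(-2);
  if (innerA !== innerB) {
    throw new Error(`Inner dimensions ${innerA} and ${innerB} differ`);
  }
  // Broadcast the leading "stack of matrices" dimensions.
  const stackA = a.slice(0, -2);
  const stackB = b.slice(0, -2);
  const rank = Math.max(stackA.length, stackB.length);
  const stack = [];
  for (let i = 0; i < rank; ++i) {
    const dimA = stackA[stackA.length - rank + i] ?? 1;
    const dimB = stackB[stackB.length - rank + i] ?? 1;
    if (dimA !== dimB && dimA !== 1 && dimB !== 1) {
      throw new Error('Stack dimensions are not broadcastable');
    }
    stack.push(Math.max(dimA, dimB));
  }
  return [...stack, m, n];
}

console.log(matmulOutputShape([2, 3], [3, 4]));          // [ 2, 4 ]
console.log(matmulOutputShape([5, 1, 2, 3], [4, 3, 6])); // [ 5, 4, 2, 6 ]
```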
6.7.16. linear
Calculate a linear function y = alpha * x + beta on the input tensor.
dictionary MLLinearOptions {
  float alpha = 1;
  float beta = 0;
};

partial interface MLGraphBuilder {
  MLOperand linear(MLOperand x, optional MLLinearOptions options = {});
  MLOperator linear(optional MLLinearOptions options = {});
};
- x: an MLOperand. The input tensor.
- options: an optional MLLinearOptions. The optional parameters of the operation.
Returns:
- an MLOperand. The output tensor of the same shape as x.
- an MLOperator. The operator representing the linear operation.
return builder.add(
  builder.mul(x, builder.constant(options.alpha)),
  builder.constant(options.beta));
6.7.17. pad
Inflate the tensor with constant or mirrored values on the edges.
enum MLPaddingMode {
  "constant",
  "edge",
  "reflection",
  "symmetric"
};

dictionary MLPadOptions {
  MLPaddingMode mode = "constant";
  float value = 0;
};

partial interface MLGraphBuilder {
  MLOperand pad(MLOperand input, MLOperand padding, optional MLPadOptions options = {});
};
- input: an MLOperand. The input tensor.
- padding: an MLOperand. The 2-D tensor of integer values indicating the number of padding values to add at the beginning and end of each input dimension. The tensor has shape [n, 2] where n is the rank of the input tensor. For each dimension D of input, padding[D, 0] indicates how many values to add before the content in that dimension, and padding[D, 1] indicates how many values to add after the content in that dimension.
- options: an optional MLPadOptions. The optional parameters of the operation.
  - mode: an MLPaddingMode. The different ways to pad the tensor. When not set, it is assumed to be "constant".
  - value: a float. The pad value when options.mode is set to "constant". When not set, it is assumed to be 0.
Returns: an MLOperand. The padded output tensor.
// input: [[1,2,3], [4,5,6]]
const input = builder.constant(
  { type: 'float32', dimensions: [2, 3] },
  new Float32Array([1, 2, 3, 4, 5, 6]));

// padding: [[1,1], [2,2]]
// The padding tensor holds integer values, so it is created as int32.
const padding = builder.constant(
  { type: 'int32', dimensions: [2, 2] },
  new Int32Array([1, 1, 2, 2]));

// "constant" padded:
//    [[0,0,0,0,0,0,0],
//     [0,0,1,2,3,0,0],
//     [0,0,4,5,6,0,0],
//     [0,0,0,0,0,0,0]]
builder.pad(input, padding);

// "edge" padded:
//    [[1,1,1,2,3,3,3],
//     [1,1,1,2,3,3,3],
//     [4,4,4,5,6,6,6],
//     [4,4,4,5,6,6,6]]
builder.pad(input, padding, { mode: "edge" });

// "reflection" padded:
//    [[6,5,4,5,6,5,4],
//     [3,2,1,2,3,2,1],
//     [6,5,4,5,6,5,4],
//     [3,2,1,2,3,2,1]]
builder.pad(input, padding, { mode: "reflection" });

// "symmetric" padded:
//    [[2,1,1,2,3,3,2],
//     [2,1,1,2,3,3,2],
//     [5,4,4,5,6,6,5],
//     [5,4,4,5,6,6,5]]
builder.pad(input, padding, { mode: "symmetric" });
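The four padding modes differ only in how an out-of-range index is mapped back to a source element. The sketch below reproduces the padded results listed above for a 2-D input held as nested arrays. pad2d and sourceIndex are hypothetical names, and the mirror modes handle a single reflection only, so "symmetric" assumes the padding does not exceed the dimension size and "reflection" assumes it is smaller than the dimension size.

```javascript
// Hypothetical helper: map an index of the padded axis to a source index,
// or -1 when the position has no source element ("constant" mode).
function sourceIndex(index, size, mode) {
  if (index >= 0 && index < size) return index;
  switch (mode) {
    case 'edge': // repeat the border value
      return index < 0 ? 0 : size - 1;
    case 'reflection': // mirror without repeating the border value
      return index < 0 ? -index : 2 * (size - 1) - index;
    case 'symmetric': // mirror including the border value
      return index < 0 ? -index - 1 : 2 * size - 1 - index;
    default:
      return -1;
  }
}

// Hypothetical reference for a 2-D input; padding is [[top, bottom], [left, right]].
function pad2d(input, padding, { mode = 'constant', value = 0 } = {}) {
  const rows = input.length;
  const cols = input[0].length;
  const [[top, bottom], [left, right]] = padding;
  const output = [];
  for (let r = -top; r < rows + bottom; ++r) {
    const sourceRow = sourceIndex(r, rows, mode);
    const row = [];
    for (let c = -left; c < cols + right; ++c) {
      const sourceCol = sourceIndex(c, cols, mode);
      row.push(sourceRow < 0 || sourceCol < 0 ? value : input[sourceRow][sourceCol]);
    }
    output.push(row);
  }
  return output;
}

const padInput = [[1, 2, 3], [4, 5, 6]];
const padAmounts = [[1, 1], [2, 2]];
for (const mode of ['constant', 'edge', 'reflection', 'symmetric']) {
  console.log(mode, pad2d(padInput, padAmounts, { mode }));
}
```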
6.7.18. pooling operations
Compute a mean, L2 norm, or max reduction operation across all the elements within the moving window over the input tensor. See the description of each type of reduction in § 6.7.19 reduction operations.
dictionary MLPool2dOptions {
  sequence<long> windowDimensions;
  sequence<long> padding;
  sequence<long> strides;
  sequence<long> dilations;
  MLAutoPad autoPad = "explicit";
  MLInputOperandLayout layout = "nchw";
};

partial interface MLGraphBuilder {
  MLOperand averagePool2d(MLOperand input, optional MLPool2dOptions options = {});
  MLOperand l2Pool2d(MLOperand input, optional MLPool2dOptions options = {});
  MLOperand maxPool2d(MLOperand input, optional MLPool2dOptions options = {});
};
- input: an MLOperand. The input 4-D tensor. The logical shape is interpreted according to the value of options.layout.
- options: an optional MLPool2dOptions. The optional parameters of the operation.
  - windowDimensions: a sequence of long of length 2. The dimensions of the sliding window, [window_height, window_width]. If not present, the window dimensions are assumed to be the height and width dimensions of the input shape.
  - padding: a sequence of long of length 4. The additional rows and columns added to the beginning and ending of each spatial dimension of input, [beginning_height, ending_height, beginning_width, ending_width]. If not present, the values are assumed to be [0,0,0,0].
  - strides: a sequence of long of length 2. The stride of the sliding window for each spatial dimension of input, [stride_height, stride_width]. If not present, the values are assumed to be [1,1].
  - dilations: a sequence of long of length 2. The dilation factor for each spatial dimension of input, [dilation_height, dilation_width]. If not present, the values are assumed to be [1,1].
  - autoPad: an MLAutoPad. The automatic input padding options. By default, this argument is set to "explicit", which means that the values in the options.padding array should be used for input padding. When the option is set to a value other than "explicit", the values in the options.padding array are ignored. With the "same-upper" option, the padding values are automatically computed such that the additional ending padding of the spatial input dimensions would allow all of the input values in the corresponding dimension to be filtered. The "same-lower" option is similar, but the padding is applied to the beginning of the spatial input dimensions instead of the ending.
  - layout: an MLInputOperandLayout. The default value is "nchw". This option specifies the layout format of the input and output tensor as follows:
    "nchw":
    - input tensor: [batches, channels, height, width]
    - output tensor: [batches, channels, height, width]
    "nhwc":
    - input tensor: [batches, height, width, channels]
    - output tensor: [batches, height, width, channels]
Returns: an MLOperand. The output 4-D tensor that contains the result of the reduction. The logical shape is interpreted according to the value of layout.
// 'global' max pooling
builder.maxPool2d(input);
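The moving-window behavior, including the "global" case where windowDimensions is omitted, can be sketched on a single height by width plane. This is an illustrative plain JavaScript reference rather than an API of this specification: maxPool2dPlane is a hypothetical name, and padding, dilations and autoPad are left out to keep the sketch short.

```javascript
// Hypothetical reference: max pooling over one [height][width] plane,
// without padding or dilation.
function maxPool2dPlane(plane, { windowDimensions, strides = [1, 1] } = {}) {
  const height = plane.length;
  const width = plane[0].length;
  // When no window is given, the window covers the whole plane ("global" pooling).
  const [windowHeight, windowWidth] = windowDimensions ?? [height, width];
  const [strideHeight, strideWidth] = strides;
  const output = [];
  for (let top = 0; top + windowHeight <= height; top += strideHeight) {
    const row = [];
    for (let left = 0; left + windowWidth <= width; left += strideWidth) {
      let best = -Infinity;
      for (let y = top; y < top + windowHeight; ++y) {
        for (let x = left; x < left + windowWidth; ++x) {
          best = Math.max(best, plane[y][x]);
        }
      }
      row.push(best);
    }
    output.push(row);
  }
  return output;
}

const poolPlane = [
  [1, 2, 3, 4],
  [5, 6, 7, 8],
  [9, 10, 11, 12],
  [13, 14, 15, 16],
];
console.log(maxPool2dPlane(poolPlane)); // [ [ 16 ] ]
console.log(maxPool2dPlane(poolPlane, { windowDimensions: [2, 2], strides: [2, 2] }));
// [ [ 6, 8 ], [ 14, 16 ] ]
```

Replacing the inner Math.max accumulation with a mean or an L2 norm gives the corresponding averagePool2d and l2Pool2d behavior.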
6.7.19. reduction operations
Reduce the input along the dimensions given in axes.
dictionary MLReduceOptions {
  sequence<long> axes = null;
  boolean keepDimensions = false;
};

partial interface MLGraphBuilder {
  MLOperand reduceL1(MLOperand input, optional MLReduceOptions options = {});
  MLOperand reduceL2(MLOperand input, optional MLReduceOptions options = {});
  MLOperand reduceLogSum(MLOperand input, optional MLReduceOptions options = {});
  MLOperand reduceLogSumExp(MLOperand input, optional MLReduceOptions options = {});
  MLOperand reduceMax(MLOperand input, optional MLReduceOptions options = {});
  MLOperand reduceMean(MLOperand input, optional MLReduceOptions options = {});
  MLOperand reduceMin(MLOperand input, optional MLReduceOptions options = {});
  MLOperand reduceProduct(MLOperand input, optional MLReduceOptions options = {});
  MLOperand reduceSum(MLOperand input, optional MLReduceOptions options = {});
  MLOperand reduceSumSquare(MLOperand input, optional MLReduceOptions options = {});
};
- input: an MLOperand. The input tensor.
- options: an optional MLReduceOptions. The optional parameters of the operation.
Returns: an MLOperand. The reduced output tensor.
Reduction types:
- L1: Compute the L1 norm of all the input values along the axes.
- L2: Compute the L2 norm of all the input values along the axes.
- LogSum: Compute the log value of the sum of all the input values along the axes.
- LogSumExp: Compute the log value of the sum of the exponentials of all the input values along the axes.
- Max: Compute the maximum value of all the input values along the axes.
- Mean: Compute the average value of all the input values along the axes.
- Min: Compute the minimum value of all the input values along the axes.
- Product: Compute the product of all the input values along the axes.
- Sum: Compute the sum of all the input values along the axes.
- SumSquare: Compute the sum of the square of all the input values along the axes.
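Each reduction type can be written as a one-line function over the values being reduced. The table below is a sketch in plain JavaScript that reduces a flat array, which corresponds to reducing a 1-D tensor along its only axis. The names sumOf and reductionReference are hypothetical and not part of the API.

```javascript
// Hypothetical helper: sum of an array of numbers.
const sumOf = (values) => values.reduce((total, v) => total + v, 0);

// Hypothetical reference implementations, keyed by reduction type.
const reductionReference = {
  L1: (values) => sumOf(values.map((v) => Math.abs(v))),
  L2: (values) => Math.sqrt(sumOf(values.map((v) => v * v))),
  LogSum: (values) => Math.log(sumOf(values)),
  LogSumExp: (values) => Math.log(sumOf(values.map((v) => Math.exp(v)))),
  Max: (values) => Math.max(...values),
  Mean: (values) => sumOf(values) / values.length,
  Min: (values) => Math.min(...values),
  Product: (values) => values.reduce((total, v) => total * v, 1),
  Sum: (values) => sumOf(values),
  SumSquare: (values) => sumOf(values.map((v) => v * v)),
};

for (const [type, reduce] of Object.entries(reductionReference)) {
  console.log(type, reduce([3, 4]));
}
```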
6.7.20. relu
Compute the rectified linear function of the input tensor.
partial interface MLGraphBuilder {
  MLOperand relu(MLOperand x);
  MLOperator relu();
};
- x: an MLOperand. The input tensor.
Returns:
- an MLOperand. The output tensor of the same shape as x.
- an MLOperator. The operator representing the relu operation.
return builder.max(builder.constant(0), x);
6.7.21. resample
Resample the tensor values from the source to the destination dimensions according to the scaling factors.
enum MLInterpolationMode {
  "nearest-neighbor",
  "linear"
};

dictionary MLResampleOptions {
  MLInterpolationMode mode = "nearest-neighbor";
  sequence<float> scales;
  sequence<long> sizes;
};

partial interface MLGraphBuilder {
  MLOperand resample(MLOperand input, optional MLResampleOptions options = {});
};
- input: an MLOperand. The input 4-D tensor.
- options: an optional MLResampleOptions. The optional parameters of the operation.
  - mode: an MLInterpolationMode. The interpolation algorithm used to fill the output tensor values. If not set, it is assumed to be the Nearest Neighbor interpolation.
  - scales: a sequence of float of length 4. Each value represents the scaling factor used to scale the corresponding input dimension.
  - sizes: a sequence of long of length 4. The target sizes for each input dimension. When the target sizes are specified, the options.scales argument is ignored as the scaling factor values are derived from the target sizes of each input dimension.
Returns: an MLOperand. The output 4-D tensor.
6.7.22. reshape
Alter the shape of a tensor to a new shape. Reshape does not copy or change the content of the tensor. It just changes the tensor’s logical dimensions for the subsequent operations.
partial interface MLGraphBuilder {
  MLOperand reshape(MLOperand input, sequence<long> newShape);
};
- input: an MLOperand. The input tensor.
- newShape: a sequence of long. The shape of the output tensor. The number of elements implied by newShape must be the same as the number of elements in the input tensor. Only one component of newShape can be the special value of -1. The size of the dimension with the value -1 is computed so that the total size remains constant.
Returns: an MLOperand. The output tensor. The values of the output tensor are the same as values of the input tensor. The shape of the output tensor is specified by the newShape argument.
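The handling of the special value -1 in newShape comes down to a division of element counts. The helper below is a minimal sketch in plain JavaScript: resolveReshape is a hypothetical name, and the error conditions are the ones implied by the constraints on newShape described above.

```javascript
// Hypothetical helper: replace a -1 component of newShape with its computed size.
function resolveReshape(inputShape, newShape) {
  const total = inputShape.reduce((count, d) => count * d, 1);
  const wildcard = newShape.indexOf(-1);
  if (wildcard !== newShape.lastIndexOf(-1)) {
    throw new Error('Only one component of newShape can be -1');
  }
  // Product of the explicitly given dimensions.
  const known = newShape.reduce((count, d) => (d === -1 ? count : count * d), 1);
  const resolved = [...newShape];
  if (wildcard >= 0) {
    if (known === 0 || total % known !== 0) {
      throw new Error('newShape does not divide the number of input elements');
    }
    resolved[wildcard] = total / known;
  } else if (known !== total) {
    throw new Error('newShape implies a different number of elements');
  }
  return resolved;
}

console.log(resolveReshape([2, 3, 4], [4, -1]));    // [ 4, 6 ]
console.log(resolveReshape([2, 3, 4], [1, -1, 4])); // [ 1, 6, 4 ]
```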
6.7.23. sigmoid
Compute the sigmoid function of the input tensor. The calculation follows the expression 1 / (exp(-x) + 1).
partial interface MLGraphBuilder {
  MLOperand sigmoid(MLOperand x);
  MLOperator sigmoid();
};
- x: an MLOperand. The input tensor.
Returns:
- an MLOperand. The output tensor of the same shape as x.
- an MLOperator. The operator representing the sigmoid operation.
return builder.div(
    builder.constant(1),
    builder.add(builder.exp(builder.neg(x)), builder.constant(1)));
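For checking results outside a graph, the same expression can be evaluated element-wise in plain JavaScript. This is a minimal reference sketch, not the API itself; `sigmoidScalar` is a hypothetical name.

```javascript
// Hypothetical scalar reference for the expression 1 / (exp(-x) + 1).
function sigmoidScalar(x) {
  return 1 / (Math.exp(-x) + 1);
}

// Element-wise application over a typed array.
const sigmoidValues = Float32Array.from([-2, 0, 2], sigmoidScalar);
console.log(sigmoidValues[1]); // 0.5
```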
6.7.24. slice
Produce a slice of the input tensor.
dictionary MLSliceOptions {
  sequence<long> axes;
};
partial interface MLGraphBuilder {
  MLOperand slice(MLOperand input, sequence<long> starts, sequence<long> sizes,
                  optional MLSliceOptions options = {});
};
- input: an MLOperand. The input tensor.
- starts: a sequence of long. The starting index of the slice along each corresponding axis of the input shape. A negative index value is interpreted as counting back from the end. For example, the value -1 refers to the last element along the axis.
- sizes: a sequence of long. The length of the slice along each corresponding axis of the input shape. The length value -1 selects all the remaining elements from the starting index of the given axis.
- options: an optional MLSliceOptions. The optional parameters of the operation.
  - axes: a sequence of long. The dimensions of the input shape to which starts and sizes apply. The values in the sequence are either within the [0, r-1] range, where r is the input tensor rank, or within the [-r, -1] range, where negative values mean counting back from the end of the input shape. When not specified, the sequence is assumed to be [0, 1, ..., r-1].
Returns: an MLOperand. The output tensor of the same rank as the input tensor with tensor values stripped to the specified starting and ending indices in each dimension.
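How negative starts, the -1 length and negative axes combine can be sketched as a shape computation. The helper `resolveSlice` below is hypothetical and not part of the API; the range checks it performs are an assumption about what a validating implementation might reject.

```javascript
// Hypothetical helper: resolves per-axis start offsets and the output shape
// for slice(input, starts, sizes, { axes }).
function resolveSlice(inputShape, starts, sizes, options = {}) {
  const rank = inputShape.length;
  // Default axes are [0, 1, ..., r-1].
  const axes = options.axes ?? inputShape.map((_, i) => i);
  if (starts.length !== axes.length || sizes.length !== axes.length) {
    throw new Error('starts and sizes must match the number of axes');
  }
  const begin = new Array(rank).fill(0);
  const outputShape = [...inputShape];
  axes.forEach((axis, i) => {
    const a = axis < 0 ? axis + rank : axis; // count back from the end
    if (a < 0 || a >= rank) throw new Error(`axis ${axis} out of range`);
    const dim = inputShape[a];
    const start = starts[i] < 0 ? starts[i] + dim : starts[i];
    const size = sizes[i] === -1 ? dim - start : sizes[i]; // -1: the rest
    if (start < 0 || size < 0 || start + size > dim) {
      throw new Error(`slice out of bounds on axis ${a}`);
    }
    begin[a] = start;
    outputShape[a] = size;
  });
  return { begin, outputShape };
}

console.log(resolveSlice([4, 6], [1, -2], [2, -1]));
```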
6.7.25. softmax
Compute the softmax values of the 2-D input tensor along axis 1.
partial interface MLGraphBuilder {
  MLOperand softmax(MLOperand x);
};
- x: an MLOperand. The input 2-D tensor.
Returns: an MLOperand. The output 2-D tensor that contains the softmax results, of the same shape as the input tensor.
// This sample deploys a well-known implementation trick [1] to compute the
// exponentials of the distances to the max value, instead of the exponentials
// of the input values themselves, in order to increase the numerical stability
// of the result.
// [1]: https://cs231n.github.io/linear-classify/#softmax
const max_x = builder.reduceMax(x, { axes: [1], keepDimensions: true });
const exp_x = builder.exp(builder.sub(x, max_x));
return builder.div(exp_x, builder.reduceSum(exp_x, { axes: [1], keepDimensions: true }));
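The effect of subtracting the row maximum can be seen with a plain JavaScript reference that works on nested arrays instead of operands. `softmax2d` is a hypothetical helper used only for illustration.

```javascript
// Hypothetical reference: softmax along axis 1 of a 2-D array, using the
// max-subtraction trick so that large inputs do not overflow exp().
function softmax2d(rows) {
  return rows.map((row) => {
    const max = Math.max(...row);
    const exps = row.map((v) => Math.exp(v - max));
    const sum = exps.reduce((acc, v) => acc + v, 0);
    return exps.map((v) => v / sum);
  });
}

// Without the subtraction, exp(1000) would be Infinity and the result NaN.
console.log(softmax2d([[1000, 1000]])); // [ [ 0.5, 0.5 ] ]
```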
6.7.26. softplus
Compute the softplus function of the input tensor. The calculation follows the expression ln(1 + exp(steepness * x)) / steepness.
dictionary MLSoftplusOptions {
  float steepness = 1;
};
partial interface MLGraphBuilder {
  MLOperand softplus(MLOperand x, optional MLSoftplusOptions options = {});
  MLOperator softplus(optional MLSoftplusOptions options = {});
};
- x: an MLOperand. The input tensor.
Returns:
- an MLOperand. The output tensor of the same shape as x.
- an MLOperator. The operator representing the softplus operation.
return builder.div(
    builder.log(
        builder.add(
            builder.exp(builder.mul(x, builder.constant(options.steepness))),
            builder.constant(1))),
    builder.constant(options.steepness));
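The role of steepness is easier to see on scalars. The sketch below evaluates the same expression in plain JavaScript; `softplusScalar` is a hypothetical name, and the default of 1 mirrors the dictionary default.

```javascript
// Hypothetical scalar reference for ln(1 + exp(steepness * x)) / steepness.
function softplusScalar(x, options = {}) {
  const steepness = options.steepness ?? 1;
  return Math.log(1 + Math.exp(steepness * x)) / steepness;
}

// A larger steepness brings the curve closer to max(0, x).
console.log(softplusScalar(2), softplusScalar(2, { steepness: 10 }));
```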
6.7.27. softsign
Compute the softsign function of the input tensor. The calculation follows the expression x / (1 + |x|).
partial interface MLGraphBuilder {
  MLOperand softsign(MLOperand x);
  MLOperator softsign();
};
- x: an MLOperand. The input tensor.
Returns:
- an MLOperand. The output tensor of the same shape as x.
- an MLOperator. The operator representing the softsign operation.
return builder.div(x, builder.add(builder.constant(1), builder.abs(x)));
6.7.28. split
Split the input tensor into a number of sub tensors along the given axis.
dictionary MLSplitOptions {
  long axis = 0;
};
partial interface MLGraphBuilder {
  sequence<MLOperand> split(MLOperand input,
                            (unsigned long or sequence<unsigned long>) splits,
                            optional MLSplitOptions options = {});
};
- input: an MLOperand. The input tensor.
- splits: an unsigned long or a sequence of unsigned long. If an unsigned long, it specifies the number of output tensors along the axis. The number must evenly divide the dimension size of input along options.axis. If a sequence of unsigned long, it specifies the size of each output tensor along options.axis. The sum of the sizes must equal the dimension size of input along options.axis.
- options: an optional MLSplitOptions. The optional parameters of the operation.
  - axis: a long. The dimension along which to split. Defaults to 0. A negative value is interpreted as counting back from the end.
Returns: a sequence of MLOperand. The split output tensors. If splits is an unsigned long, the length of the output sequence equals splits. The shape of each output tensor is the same as input except that the dimension size along axis equals the quotient of dividing the dimension size of input along axis by splits. If splits is a sequence of unsigned long, the length of the output sequence equals the length of splits. The shape of the i-th output tensor is the same as input except along axis, where the dimension size is splits[i].
// This sample shows the case that the splits parameter is an array.
const outputs = [];
let start = 0;
for (const size of splits) {
  outputs.push(builder.slice(input, [start], [size], { axes: [options.axis] }));
  start += size;
}
return outputs;
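Both forms of splits lead to a list of output shapes that can be computed ahead of time. The helper `splitShapes` below is a hypothetical sketch in plain JavaScript and not part of the API.

```javascript
// Hypothetical helper: returns the shape of every output tensor of split().
function splitShapes(inputShape, splits, options = {}) {
  const rank = inputShape.length;
  const axis = options.axis ?? 0;
  const a = axis < 0 ? axis + rank : axis; // count back from the end
  if (a < 0 || a >= rank) throw new Error(`axis ${axis} out of range`);
  const dim = inputShape[a];
  let sizes;
  if (typeof splits === 'number') {
    // A count: it must evenly divide the dimension size.
    if (splits <= 0 || dim % splits !== 0) {
      throw new Error('splits must evenly divide the dimension size');
    }
    sizes = new Array(splits).fill(dim / splits);
  } else {
    // Explicit sizes: they must add up to the dimension size.
    const total = splits.reduce((acc, s) => acc + s, 0);
    if (total !== dim) {
      throw new Error('sum of splits must equal the dimension size');
    }
    sizes = [...splits];
  }
  return sizes.map((size) => inputShape.map((d, i) => (i === a ? size : d)));
}

console.log(splitShapes([6, 4], 3)); // three outputs of shape [2, 4]
```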
6.7.29. squeeze
Reduce the rank of a tensor by eliminating dimensions with size 1 of the tensor shape. Squeeze only affects the tensor’s logical dimensions. It does not copy or change the content in the tensor.
dictionary MLSqueezeOptions {
  sequence<long> axes;
};
partial interface MLGraphBuilder {
  MLOperand squeeze(MLOperand input, optional MLSqueezeOptions options = {});
};
- input: an MLOperand. The input tensor.
- options: an optional MLSqueezeOptions. The optional parameters of the operation.
  - axes: a sequence of long. Indices of the shape dimensions of size 1 to eliminate. When not specified, all shape dimensions of size 1 in the tensor are eliminated.
Returns: an MLOperand. The output tensor of the same or reduced rank with the shape dimensions of size 1 eliminated.
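The resulting shape can be sketched in plain JavaScript. `squeezeShape` is a hypothetical helper and not part of the API; rejecting an axis whose size is not 1 is an assumption about validation.

```javascript
// Hypothetical helper: computes the output shape of squeeze().
function squeezeShape(inputShape, options = {}) {
  // Default: every dimension of size 1.
  const axes = options.axes ??
      inputShape.map((d, i) => (d === 1 ? i : -1)).filter((i) => i >= 0);
  for (const axis of axes) {
    if (inputShape[axis] !== 1) {
      throw new Error(`dimension ${axis} does not have size 1`);
    }
  }
  return inputShape.filter((_, i) => !axes.includes(i));
}

console.log(squeezeShape([1, 3, 1, 2]));                // [ 3, 2 ]
console.log(squeezeShape([1, 3, 1, 2], { axes: [0] })); // [ 3, 1, 2 ]
```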
6.7.30. tanh
Compute the hyperbolic tangent function of the input tensor. The calculation follows the expression (exp(2 * x) - 1) / (exp(2 * x) + 1).
partial interface MLGraphBuilder {
  MLOperand tanh(MLOperand x);
  MLOperator tanh();
};
- x: an MLOperand. The input tensor.
Returns:
- an MLOperand. The output tensor of the same shape as x.
- an MLOperator. The operator representing the tanh operation.
return builder.div(
    builder.sub(builder.exp(builder.mul(builder.constant(2), x)), builder.constant(1)),
    builder.add(builder.exp(builder.mul(builder.constant(2), x)), builder.constant(1)));
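One practical caveat of evaluating the expression literally is overflow for large inputs. The sketch below shows this in ordinary double-precision JavaScript; it illustrates the formula only and makes no claim about how any user agent implements tanh. `tanhByFormula` is a hypothetical name.

```javascript
// Hypothetical scalar reference that follows the expression literally.
function tanhByFormula(x) {
  const e = Math.exp(2 * x);
  return (e - 1) / (e + 1);
}

// For moderate inputs the formula agrees with the built-in.
console.log(tanhByFormula(0.5), Math.tanh(0.5));
// For large inputs exp(2 * x) overflows to Infinity, so the quotient is NaN,
// while the built-in saturates at 1.
console.log(tanhByFormula(1000), Math.tanh(1000)); // NaN 1
```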
6.7.31. transpose
Permute the dimensions of the input tensor according to the permutation argument.
dictionary MLTransposeOptions {
  sequence<long> permutation;
};
partial interface MLGraphBuilder {
  MLOperand transpose(MLOperand input, optional MLTransposeOptions options = {});
};
- input: an MLOperand. The input N-D tensor.
- options: an optional MLTransposeOptions. The optional parameters of the operation.
  - permutation: a sequence of long values. The values used to permute the output shape. When it is not specified, it is set to [N-1...0], where N is the rank of the input tensor. These default values cause the output to become a transposed tensor of the input. When specified, the number of values in the sequence must be the same as the rank of the input tensor, and the values in the sequence must be within the range from 0 to N-1 with no duplicate values.
Returns: an MLOperand. The permuted or transposed N-D tensor.
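The default permutation and the validity rules can be sketched as a shape computation. `transposeShape` is a hypothetical helper and not part of the API. It assumes the common convention that output dimension i takes the size of input dimension permutation[i], which the text above does not spell out.

```javascript
// Hypothetical helper: computes the output shape of transpose().
function transposeShape(inputShape, options = {}) {
  const rank = inputShape.length;
  // Default permutation is [N-1, ..., 0], which reverses the dimensions.
  const permutation =
      options.permutation ?? inputShape.map((_, i) => rank - 1 - i);
  if (permutation.length !== rank) {
    throw new Error('permutation length must equal the input rank');
  }
  const inRange = permutation.every(
      (p) => Number.isInteger(p) && p >= 0 && p < rank);
  if (!inRange || new Set(permutation).size !== rank) {
    throw new Error('permutation must contain each of 0..N-1 exactly once');
  }
  // Assumed convention: output dimension i comes from input dimension p[i].
  return permutation.map((p) => inputShape[p]);
}

console.log(transposeShape([2, 3, 4]));                             // [ 4, 3, 2 ]
console.log(transposeShape([2, 3, 4], { permutation: [0, 2, 1] })); // [ 2, 4, 3 ]
```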
6.8. MLGraph
The MLGraph interface represents a compiled computational graph. A compiled graph once constructed is immutable and cannot be subsequently changed.
typedef (MLBufferView or WebGLTexture or GPUTexture) MLResource;
dictionary MLInput {
  required MLResource resource;
  required sequence<long> dimensions;
};
typedef record<DOMString, (MLResource or MLInput)> MLNamedInputs;
typedef record<DOMString, MLResource> MLNamedOutputs;
[SecureContext, Exposed=(Window, DedicatedWorker)]
interface MLGraph {
  undefined compute(MLNamedInputs inputs, MLNamedOutputs outputs);
};
MLGraph has the following internal slots:
- [[context]] of type MLContext
  The context of type MLContext associated with this MLGraph.
- [[inputDescriptors]] of type record<DOMString, MLOperandDescriptor>
  Maps the name of an input MLOperand to its MLOperandDescriptor for all input MLOperands of this MLGraph.
- [[outputNames]] of type sequence<DOMString>
  Contains the names of all output MLOperands of this MLGraph.
- [[implementation]]
  The underlying implementation provided by the User Agent.
compute(inputs, outputs)
Compute the MLGraph given MLNamedInputs and MLNamedOutputs. Return once the compute has completed and the results in MLNamedOutputs are ready to be consumed.
Called on: MLGraph this.
Arguments for the MLGraph.compute(inputs, outputs) method:
| Parameter | Type | Nullable | Optional | Description |
|---|---|---|---|---|
| inputs | MLNamedInputs | ✘ | ✘ | an MLNamedInputs. The resources and optional dimensions of inputs for the compute. |
| outputs | MLNamedOutputs | ✘ | ✘ | an MLNamedOutputs. The pre-allocated resources of required outputs for the compute. |
Returns: undefined.
- If any of the following requirements are unmet, then throw a DataError DOMException and stop.
  - For each key -> value of inputs:
    - this.[[inputDescriptors]][key] must exist.
    - Let inputDesc be this.[[inputDescriptors]][key].
    - Let inputSize be 1.
    - If value is an MLInput, then:
      - The length of value.dimensions must be the same as the length of inputDesc.dimensions.
      - Let i be 0.
      - While true:
        - Let dimension be value.dimensions[i].
        - dimension must be greater than 0.
        - If inputDesc.dimensions[i] is greater than 0, then dimension must be equal to inputDesc.dimensions[i].
        - Set inputSize to the product of inputSize and dimension.
        - Increment i by 1.
        - If i is equal to the length of value.dimensions, then break.
    - Else:
      - For each dimension of inputDesc.dimensions:
        - The value of dimension must be greater than 0.
        - Set inputSize to the product of inputSize and dimension.
    - If value is an MLInput, then let resource be value.resource.
    - If value is an MLResource, then let resource be value.
    - If resource is an ArrayBufferView, then:
      - The kind of resource must be compatible with inputDesc.type according to the table in 8.1.
      - The length of resource must be the same as inputSize.
  - For each key -> value of outputs:
    - this.[[outputNames]][key] must exist.
- For each key -> value of inputs:
  - Let inputDesc be this.[[inputDescriptors]][key].
  - Let inputTensor be a new tensor for this.[[implementation]] of data type that is compatible with inputDesc.type.
  - If value is an MLInput, then:
    - Set the dimensions of inputTensor to value.dimensions.
  - Else:
    - Set the dimensions of inputTensor to inputDesc.dimensions.
  - If value is an MLInput, then:
    - Set the values of inputTensor to the values of value.resource.
  - If value is an MLResource, then:
    - Set the values of inputTensor to the values of value.
  - Set the input of this.[[implementation]] that is associated with key to inputTensor.
- For each key -> value of outputs:
  - Issue a compute request for output of this.[[implementation]] that is associated with key.
  - Wait for the compute request to be completed.
  - If there is an error returned by this.[[implementation]], then:
    - Throw an OperationError DOMException and stop.
  - Else:
    - Let outputTensor be the output tensor returned by this.[[implementation]].
    - If the kind of value is not compatible with the value type of outputTensor, then throw a DataError DOMException and stop.
    - Let outputSize be 1.
    - For each dimension of dimensions of outputTensor:
      - Set outputSize to the product of outputSize and dimension.
    - If outputSize is greater than the length of value, then:
      - Throw a DataError DOMException and stop.
    - Else:
      - Set the values of value to the values of outputTensor.
- Return undefined.
Issue: Describe the algorithm steps for this.[[context]] created from WebGLRenderingContext and GPUDevice.
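The per-input validation requirements of compute() can be sketched in plain JavaScript for the ArrayBufferView case. Everything named here is hypothetical: `validateInput`, `dataError` and `VIEW_TYPES` are illustrative helpers, a plain Error with its name set stands in for a DataError DOMException, and the type mapping copies the compatibility table in 8.1.

```javascript
// MLOperandType -> ArrayBufferView constructor, copied from the table in 8.1.
const VIEW_TYPES = {
  float32: Float32Array,
  int32: Int32Array,
  uint32: Uint32Array,
  int8: Int8Array,
  uint8: Uint8Array,
};

// Stand-in for throwing a DataError DOMException.
function dataError(message) {
  const error = new Error(message);
  error.name = 'DataError';
  return error;
}

// Validates one entry of MLNamedInputs against its descriptor and returns
// the computed inputSize.
function validateInput(inputDesc, value) {
  const isMLInput = !ArrayBuffer.isView(value) &&
      typeof value === 'object' && value !== null && 'resource' in value;
  let inputSize = 1;
  if (isMLInput) {
    if (value.dimensions.length !== inputDesc.dimensions.length) {
      throw dataError('rank mismatch');
    }
    value.dimensions.forEach((dimension, i) => {
      if (dimension <= 0) throw dataError('dimension must be positive');
      const expected = inputDesc.dimensions[i];
      // A descriptor dimension that is not positive is dynamic.
      if (expected > 0 && dimension !== expected) {
        throw dataError(`dimension ${i} must be ${expected}`);
      }
      inputSize *= dimension;
    });
  } else {
    for (const dimension of inputDesc.dimensions) {
      if (dimension <= 0) throw dataError('dynamic input needs dimensions');
      inputSize *= dimension;
    }
  }
  const resource = isMLInput ? value.resource : value;
  if (ArrayBuffer.isView(resource)) {
    const expectedView = VIEW_TYPES[inputDesc.type];
    if (expectedView === undefined || !(resource instanceof expectedView)) {
      throw dataError(`resource is not compatible with ${inputDesc.type}`);
    }
    if (resource.length !== inputSize) {
      throw dataError(`resource length must be ${inputSize}`);
    }
  }
  return inputSize;
}
```

A descriptor with dimensions [-1, 4] accepts `{ resource: new Float32Array(12), dimensions: [3, 4] }`, whereas passing the bare Float32Array for the same descriptor is rejected because the dynamic dimension cannot be resolved.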
6.8.1. Examples
function sizeOfShape(array) {
  return array.reduce((accumulator, currentValue) => accumulator * currentValue);
}

const context = navigator.ml.createContext();

// Create a graph with dynamic shaped inputs.
const builder = new MLGraphBuilder(context);
const descA = {type: 'float32', dimensions: [-1, 4]};
const a = builder.input('a', descA);
const descB = {type: 'float32', dimensions: [4, -1]};
const b = builder.input('b', descB);
const c = builder.matmul(a, b);
const graph = builder.build({'c': c});

function allocateAndCompute(shapeA, shapeB, shapeC) {
  const bufferA = new Float32Array(sizeOfShape(shapeA)).fill(0.5);
  const bufferB = new Float32Array(sizeOfShape(shapeB)).fill(0.5);
  const bufferC = new Float32Array(sizeOfShape(shapeC));

  // Specify the shape of inputs when computing.
  const inputs = {
    'a': {resource: bufferA, dimensions: shapeA},
    'b': {resource: bufferB, dimensions: shapeB},
  };
  const outputs = {'c': bufferC};
  graph.compute(inputs, outputs);
  console.log(`values: ${bufferC}`);
}

allocateAndCompute([3, 4], [4, 3], [3, 3]);
allocateAndCompute([4, 4], [4, 4], [4, 4]);
allocateAndCompute([5, 4], [4, 5], [5, 5]);
const context = navigator.ml.createContext();

// Build a graph with two outputs.
const builder = new MLGraphBuilder(context);
const descA = {type: 'float32', dimensions: [3, 4]};
const a = builder.input('a', descA);
const descB = {type: 'float32', dimensions: [4, 3]};
const bufferB = new Float32Array(sizeOfShape(descB.dimensions)).fill(0.5);
const b = builder.constant(descB, bufferB);
const descC = {type: 'float32', dimensions: [3, 3]};
const bufferC = new Float32Array(sizeOfShape(descC.dimensions)).fill(1);
const c = builder.constant(descC, bufferC);
const d = builder.matmul(a, b);
const e = builder.add(d, c);
const graph = builder.build({'d': d, 'e': e});

const bufferA = new Float32Array(sizeOfShape(descA.dimensions)).fill(0.5);
const inputs = {'a': bufferA};

// Compute d.
const bufferD = new Float32Array(sizeOfShape([3, 3]));
graph.compute(inputs, {'d': bufferD});
console.log(`values: ${bufferD}`);

// Compute e.
const bufferE = new Float32Array(sizeOfShape([3, 3]));
graph.compute(inputs, {'e': bufferE});
console.log(`values: ${bufferE}`);
7. Examples
const context = navigator.ml.createContext({powerPreference: 'low-power'});
constant1 ---+
+--- Add ---> intermediateOutput1 ---+
input1 ---+ |
+--- Mul---> output
constant2 ---+ |
+--- Add ---> intermediateOutput2 ---+
input2 ---+
// Use tensors in 4 dimensions.
const TENSOR_DIMS = [1, 2, 2, 2];
const TENSOR_SIZE = 8;

const builder = new MLGraphBuilder(context);

// Create MLOperandDescriptor object.
const desc = {type: 'float32', dimensions: TENSOR_DIMS};

// constant1 is a constant MLOperand with the value 0.5.
const constantBuffer1 = new Float32Array(TENSOR_SIZE).fill(0.5);
const constant1 = builder.constant(desc, constantBuffer1);

// input1 is one of the input MLOperands. Its value will be set before execution.
const input1 = builder.input('input1', desc);

// constant2 is another constant MLOperand with the value 0.5.
const constantBuffer2 = new Float32Array(TENSOR_SIZE).fill(0.5);
const constant2 = builder.constant(desc, constantBuffer2);

// input2 is another input MLOperand. Its value will be set before execution.
const input2 = builder.input('input2', desc);

// intermediateOutput1 is the output of the first Add operation.
const intermediateOutput1 = builder.add(constant1, input1);

// intermediateOutput2 is the output of the second Add operation.
const intermediateOutput2 = builder.add(constant2, input2);

// output is the output MLOperand of the Mul operation.
const output = builder.mul(intermediateOutput1, intermediateOutput2);
// Compile the constructed graph.
const graph = builder.build({'output': output});
// Set up the input buffers with value 1.
const inputBuffer1 = new Float32Array(TENSOR_SIZE).fill(1);
const inputBuffer2 = new Float32Array(TENSOR_SIZE).fill(1);
const outputBuffer = new Float32Array(TENSOR_SIZE);

// Execute the compiled graph with the specified inputs.
const inputs = {
  'input1': inputBuffer1,
  'input2': inputBuffer2,
};
const outputs = {'output': outputBuffer};
graph.compute(inputs, outputs);

console.log('Output value: ' + outputBuffer);
// Output value: 2.25,2.25,2.25,2.25,2.25,2.25,2.25,2.25
8. Appendices
8.1. MLOperandType and ArrayBufferView compatibility
| MLOperandType | ArrayBufferView |
|---|---|
| float32 | Float32Array |
| int32 | Int32Array |
| uint32 | Uint32Array |
| int8 | Int8Array |
| uint8 | Uint8Array |
Issue: Clarify the usage of ArrayBufferView for float16. <https://github.com/webmachinelearning/webnn/issues/127> [Issue #webmachinelearning/webnn#127]
9. Acknowledgements
This specification follows the concepts of the Android Neural Networks API C API.
Thanks to Tomoyuki Shimizu, Ningxin Hu, Zhiqiang Yu and Belem Zhang for the use cases.
Thanks to Nikhil Thorat, Daniel Smilkov, Ganesan Ramalingam, Rafael Cintron and Benjamin Poulain for their contributions to the API specification.