1. Introduction
We’re working on this section. Meanwhile, please take a look at the explainer.
2. Use cases
2.1. Application Use Cases
This section illustrates application-level use cases for neural network inference hardware acceleration. All applications in those use cases can be built on top of pre-trained deep neural network (DNN) [models].
2.1.1. Person Detection
A user opens a web-based video conferencing application, but she temporarily leaves her room. The application watches whether she is in front of her PC by using object detection (for example, using object detection approaches such as [SSD] or [YOLO] that use a single DNN) to detect regions in a camera input frame that include persons.
When she comes back, the application automatically detects her and notifies other online users that she is active now.
2.1.2. Semantic Segmentation
A user joins a teleconference via a web-based video conferencing application at her desk since no meeting room in her office is available. During the teleconference, she does not want her room and the people in the background to be visible. To protect the privacy of the other people and the surroundings, the application runs a machine learning model such as [DeepLabv3+] or [MaskR-CNN] to semantically split an image into segments and replaces segments that represent other people and the background with another picture.
2.1.3. Skeleton Detection
A web-based video conferencing application tracks the pose of the user’s skeleton by running a machine learning model that allows for real-time human pose estimation, such as [PoseNet], to recognize her gestures and body language. When she raises her hand, her microphone is automatically unmuted and she can start speaking in the teleconference.
2.1.4. Face Recognition
There are multiple people in the conference room and they join an online meeting using a web-based video conferencing application. The application detects the faces of participants by using object detection (for example, using object detection approaches such as [SSD]) and checks whether each face was present at the previous meeting or not by running a machine learning model such as [FaceNet], which verifies whether two faces are identical or not.
2.1.5. Facial Landmark Detection
A user wants to find new glasses that fit her well on an online glasses store. The online store offers a web-based try-on simulator that runs a machine learning model such as Face Alignment Network [FAN] to detect facial landmarks like eyes, nose, mouth, etc. When she chooses a pair of glasses, the simulator renders the selected glasses at the detected position of the eyes on her facial image.
2.1.6. Style Transfer
A user is looking for cosmetics on an online store and wondering which color may fit her face. The online store shows sample facial makeup images of cosmetics, and offers a makeup simulator that runs a machine learning model like [ContextualLoss] or [PairedCycleGAN] to transfer the makeup style of the sample makeup image to her facial image. With the simulator, she can check how the selected makeup looks on her face.
2.1.7. Super Resolution
A web-based video conferencing application is receiving a video stream from its peer, but the resolution of the video becomes lower due to network congestion. To prevent degradation of the perceived video quality, the application runs a machine learning model for super-resolution such as [SRGAN] to generate higher-resolution video frames.
2.1.8. Image Captioning
For better accessibility, a web-based presentation application provides automatic image captioning by running a machine learning model such as [im2txt] which predicts explanatory words of the presentation slides.
2.1.9. Machine Translation
Multiple people from various countries are talking via a web-based real-time text chat application. The application translates their conversation by using a machine learning model such as [GNMT] or [OpenNMT], which translates each message into a different language.
2.1.10. Emotion Analysis
A user is talking to her friend via a web-based real-time text chat application, and she is wondering how the friend feels because she cannot see the friend’s face. The application analyses the friend’s emotion by using a machine learning model such as [DeepMoji] , which infers emotion from input texts, and displays an emoji that represents the estimated emotion.
2.1.11. Video Summarization
A web-based video conferencing application records received video streams, and it needs to reduce the amount of recorded video data to be stored. The application generates a short version of the recorded video by using a machine learning model for video summarization such as [Video-Summarization-with-LSTM].
2.1.12. Noise Suppression
A web-based video conferencing application records received audio streams, but background noise is often present. The application leverages real-time noise suppression using a recurrent neural network such as [RNNoise] to suppress dynamic background noise, like a crying baby or a barking dog, to improve the audio experience in video conferences.
2.2. Framework Use Cases
This section collects framework-level use cases for a dedicated low-level API for neural network inference hardware acceleration. It is expected that Machine Learning frameworks will be key consumers of the Web Neural Network API (WebNN API) and the low-level details exposed through the WebNN API are abstracted out from typical web developers. However, it is also expected that web developers with specific interest and competence in Machine Learning will want to interface with the WebNN API directly instead of a higher-level ML framework.
2.2.1. Custom Layer
A web application developer wants to run a DNN model on the WebNN API. However, she has found that some activation functions, such as [LeakyReLU] or [ELU], are not included in the WebNN API. To address this issue, she constructs custom layers of the additional activation functions on top of the WebNN API. Note that the scope of custom layers may include convolution, normalization, etc. as well as activation.
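The following is an informative sketch, not part of this specification, of how such a custom activation might be composed from the element-wise operations defined in § 6.6 MLGraphBuilder; the function name and the alpha parameter are illustrative only.

// Informal sketch: emulate ELU(x) = x if x > 0, else alpha * (exp(x) - 1)
// using only element-wise operations defined by the WebNN API.
// 'builder' is an MLGraphBuilder, 'x' is an MLOperand, 'alpha' is a number.
function elu(builder, x, alpha = 1.0) {
  const zero = builder.constant(0);
  const negativePart = builder.min(
      zero,
      builder.mul(builder.constant(alpha),
                  builder.sub(builder.exp(x), builder.constant(1))));
  return builder.add(builder.max(zero, x), negativePart);
}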
2.2.2. Network Concatenation
A web application uses a DNN model whose data for the upper convolutional layers and the lower fully-connected layers are stored in separate files, since the data of the fully-connected layers are periodically updated due to fine tuning at the server side.
Therefore, the application first downloads both partial model files and concatenates them into a single model. When the model is updated, the application downloads the fine-tuned part of the model and replaces only the fully-connected layers with it.
2.2.3. Performance Adaptation
A web application developer is concerned about the performance of her DNN model on mobile devices. She has confirmed that it may run too slowly on mobile devices that do not have GPU acceleration. To address this issue, her web application queries the WebNN API to confirm whether acceleration is available or not, so that the application can display a warning for devices without acceleration.
After several weeks, she has developed a tiny DNN model that can run even on a CPU. In order to accommodate CPU execution, she modifies the application so that it loads the tiny model on CPU-only devices.
2.2.4. Operation Level Execution
A JavaScript ML framework is responsible for loading, interpreting and executing an ML model. During the model execution phase, the framework iterates through the operations of the model and executes each operation on a hardware device, such as the CPU, GPU or an ML accelerator. To avoid unnecessary data copying across devices, the framework selects the same device to execute the operations. For a compute-intensive operation, such as 2-D convolution or matrix multiplication, the framework uses the WebNN API to execute it with the ML-specific acceleration available on that selected device.
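As an informative illustration only, a framework might offload a single compute-intensive operation to the WebNN API roughly as follows; the operand names and shapes are arbitrary, and the eventual execution of the compiled graph through MLGraph.compute() is outlined in § 5.1 Timelines.

// Informal sketch: a framework offloads one matmul to the WebNN API.
const context = navigator.ml.createContext({ devicePreference: 'gpu' });
const builder = new MLGraphBuilder(context);

const a = builder.input('a', { type: 'float32', dimensions: [128, 256] });
const b = builder.input('b', { type: 'float32', dimensions: [256, 64] });
const c = builder.matmul(a, b);

// Compile the sub-graph up to the named output; execution is then requested
// through MLGraph.compute() as outlined in § 5.1 Timelines.
builder.build({ c }).then((graph) => {
  // The framework keeps 'graph' and dispatches compute requests to it.
});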
3. Security Considerations
This API is disabled by default in all cross-origin frames using the § 6.2.1 Permissions Policy Integration . This prevents third-party content from using this API unless the embedding page explicitly sets a policy that grants permission.
This API allows creation of an MLContext from a GPUDevice or WebGLRenderingContext defined by the WebGPU and WebGL specifications respectively. See WebGPU Security Considerations and WebGL Security Considerations for more information regarding the security characteristics of these contexts.
4. Privacy Considerations
This API enhances privacy compared to cloud-based inference, since input data such as locally sourced images or video streams stay within the browser’s sandbox.
This API exposes the minimum amount of information necessary to address the identified § 2 Use cases for the best performance and reliability of results.
No information from the underlying platform is exposed directly. An execution time analysis may indirectly reveal the performance of the underlying platform’s neural network hardware acceleration capabilities relative to another underlying platform.
Note: The group is soliciting further input on the proposed execution time analysis fingerprinting vector and will augment this section with more information and mitigations to inform the implementers of this API.
Implementers of this API are expected to be familiar with the WebGPU Privacy Considerations .
5. Programming Model
5.1. Timelines
This section is non-normative.
A computer system with a user agent at the front-end and ML device at the back-end has components working on different timelines in parallel:
- Content timeline: Associated with the execution of the Web script. It includes calling all methods described by this specification. Steps executed on the content timeline look like this.
- Device timeline: Associated with the ML device operations that are issued by the user agent. It includes creation of ML devices and resources and state objects, which are typically synchronous operations from the point of view of the user agent part that controls the ML device, but can live in a separate OS process. Steps executed on the device timeline look like this.
- Queue timeline: Associated with the execution of operations on the compute units of the ML device. It includes actual copy and compute jobs that run on the ML device. Steps executed on the queue timeline look like this.
In this specification, asynchronous operations are used when the result value depends on work that happens on any timeline other than the Content timeline . They are represented by callbacks and promises in JavaScript.
For example, MLGraph.compute():
- The user issues a compute request by calling MLGraph.compute() on the Content timeline and gets a promise in return.
- The user agent processes the compute request on the Device timeline by calling the OS ML API.
- After the ML device operating on the Queue timeline is done, the user agent makes the results ready to be consumed by the user and resolves the promise.
5.2. Device Selection
An MLContext interface represents a global state of neural network execution. One of the important context states is the underlying execution device that manages the resources and facilitates the compilation and the eventual execution of the neural network graph.

An MLContext could be created from a specific GPU device such as GPUDevice or WebGLRenderingContext that is already in use by the application, in which case the corresponding GPUBuffer or WebGLBuffer resources used as graph constants, as well as the GPUTexture and WebGLTexture used as graph inputs, must also be created from the same device. In a multi-adapter configuration, the device used for MLContext must be created from the same adapter as the device used to allocate the resources referenced in the graph.
When a GPU context executes a graph with a constant or an input in system memory as an ArrayBufferView, the input content is automatically uploaded from system memory to GPU memory, and downloaded back to the system memory of an ArrayBufferView output buffer at the end of the graph execution. These data upload and download cycles only occur when the execution device requires the data to be copied out of and back into system memory, such as in the case of the GPU. They do not occur when the device is a CPU device.
Additionally, the result of the graph execution is in a known layout format. While the execution may be optimized for a native memory access pattern in an intermediate result within the graph, the output of the last operation of the graph must convert the content back to a known layout format at the end of the graph in order to maintain the expected behavior from the caller’s perspective.
When an MLContext is created with MLContextOptions, the user agent selects and creates the underlying execution device by taking into account the application’s preference specified in the MLPowerPreference and MLDevicePreference options:
- The "gpu" device provides the broadest range of achievable performance across graphics hardware platforms from consumer devices to professional workstations.
- The "cpu" device provides the broadest reach of software compute availability, but with limited scalability of execution performance on the more complex neural networks.
- When the device preference is not specified ("default"), the user agent selects the most suitable device to use.
The following table summarizes the resource types supported by the selected device.
| Device Type | ArrayBufferView | GPUBuffer | GPUTexture | WebGLBuffer | WebGLTexture |
|---|---|---|---|---|---|
| GPUDevice | Yes | Yes | Yes | No | No |
| WebGLRenderingContext | Yes | No | No | Yes | Yes |
| default | Yes | No | No | No | No |
| gpu | Yes | No | No | No | No |
| cpu | Yes | No | No | No | No |
6. API
6.1. navigator.ml
An ML object is available in the Window and DedicatedWorkerGlobalScope contexts through the Navigator and WorkerNavigator interfaces respectively and is exposed via navigator.ml:
interface mixin NavigatorML {
  [SecureContext, SameObject] readonly attribute ML ml;
};
Navigator includes NavigatorML;
WorkerNavigator includes NavigatorML;
6.2. ML
enum MLDevicePreference {
  "default",
  "gpu",
  "cpu"
};

enum MLPowerPreference {
  // Let the user agent select the most suitable behavior.
  "default",
  // Prioritizes execution speed over power consumption.
  "high-performance",
  // Prioritizes power consumption over other considerations such as execution speed.
  "low-power"
};

dictionary MLContextOptions {
  // Preferred kind of device used
  MLDevicePreference devicePreference = "default";
  // Preference as related to power consumption
  MLPowerPreference powerPreference = "default";
};

[SecureContext, Exposed=(Window, DedicatedWorker)]
interface ML {
  // Create a context with options
  MLContext createContext(optional MLContextOptions options = {});
  // Create a context from WebGL rendering context
  MLContext createContext(WebGLRenderingContext glContext);
  // Create a context from WebGPU device
  MLContext createContext(GPUDevice gpuDevice);
};
The createContext() method steps are:
- If the responsible document is not allowed to use the webnn feature, then throw a "SecurityError" DOMException and abort these steps.
- Let context be a new MLContext object.
- Switch on the method’s first argument:
  - MLContextOptions: Set context’s context type to default.
  - WebGLRenderingContext: Set context’s context type to webgl.
  - GPUDevice: Set context’s context type to webgpu.
  - Otherwise: Set context’s context type to default.
- Return context.
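For example (informative), the overloads map to the context types as follows; the glContext and gpuDevice parameters are assumed to be a WebGLRenderingContext and a GPUDevice already obtained by the application through the WebGL and WebGPU APIs.

// Informative sketch of the createContext() overloads and resulting context types.
function createContexts(glContext, gpuDevice) {
  const byDefault = navigator.ml.createContext();                  // context type "default"
  const byOptions = navigator.ml.createContext(
      { devicePreference: 'gpu', powerPreference: 'low-power' });  // context type "default"
  const byWebGL = navigator.ml.createContext(glContext);           // context type "webgl"
  const byWebGPU = navigator.ml.createContext(gpuDevice);          // context type "webgpu"
  return [byDefault, byOptions, byWebGL, byWebGPU];
}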
6.2.1. Permissions Policy Integration
This specification defines a policy-controlled feature identified by the string "webnn". Its default allowlist is 'self'.
6.3. MLContext
The MLContext interface represents a global state of neural network compute workload and execution processes.

[SecureContext, Exposed=(Window, DedicatedWorker)]
interface MLContext {};
The context type for an MLContext is either "default", "webgl" or "webgpu".
6.4. MLOperandDescriptor

enum MLInputOperandLayout {
  "nchw",
  "nhwc"
};

enum MLOperandType {
  "float32",
  "float16",
  "int32",
  "uint32",
  "int8",
  "uint8"
};

dictionary MLOperandDescriptor {
  // The operand type.
  required MLOperandType type;
  // The dimensions field is only required for tensor operands.
  // The negative value means an unknown dimension.
  sequence<long> dimensions;
};
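For example (informative), an operand descriptor for a 4-D tensor and one for a scalar might look like the following; the dimension values are arbitrary.

// Informative examples of MLOperandDescriptor values.
// A 4-D float32 tensor; the batch dimension is unknown (-1).
const imageDesc = { type: 'float32', dimensions: [-1, 3, 224, 224] };

// A scalar operand: the dimensions field is omitted for non-tensor operands.
const scalarDesc = { type: 'float32' };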
6.5. MLOperand

[SecureContext, Exposed=(Window, DedicatedWorker)]
interface MLOperand {};
6.6. MLGraphBuilder

The MLGraphBuilder interface defines a set of operations as identified by the § 2 Use cases that can be composed into a computational graph. It also represents the intermediate state of a graph building session.

typedef record<DOMString, MLOperand> MLNamedOperands;

dictionary MLBufferResourceView {
  required (WebGLBuffer or GPUBuffer) resource;
  unsigned long long offset = 0;
  unsigned long long size;
};

typedef (ArrayBufferView or MLBufferResourceView) MLBufferView;

[SecureContext, Exposed=(Window, DedicatedWorker)]
interface MLGraphBuilder {
  // Construct the graph builder from the context.
  constructor(MLContext context);

  // Create an operand for a graph input.
  MLOperand input(DOMString name, MLOperandDescriptor desc);

  // Create an operand for a graph constant.
  MLOperand constant(MLOperandDescriptor desc, MLBufferView bufferView);

  // Create a single-value operand from the specified number of the specified type.
  MLOperand constant(double value, optional MLOperandType type = "float32");

  // Compile the graph up to the specified output operands
  Promise<MLGraph> build(MLNamedOperands outputs);
};
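The following informative sketch builds a trivial graph from these primitives; the operand names and values are arbitrary and are not part of this specification.

// Informative sketch: build a graph computing y = x + w for a small tensor.
const context = navigator.ml.createContext();
const builder = new MLGraphBuilder(context);

const desc = { type: 'float32', dimensions: [2, 2] };
const x = builder.input('x', desc);
const w = builder.constant(desc, new Float32Array([1, 2, 3, 4]));
const y = builder.add(x, w);

// Compile the graph up to the named output operand.
builder.build({ y }).then((graph) => {
  // The resulting MLGraph can then be executed as described in § 5 Programming Model.
});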
6.6.1. batchNormalization

Normalize the tensor values of input features across the batch dimension using [Batch-Normalization]. For each input feature, the mean and variance values of that feature supplied as parameters to this calculation have been previously computed across the batch dimension of the input during the model training phase of this operation.
dictionary MLBatchNormalizationOptions {
  MLOperand scale;
  MLOperand bias;
  long axis = 1;
  float epsilon = 1e-5;
};

partial interface MLGraphBuilder {
  MLOperand batchNormalization(MLOperand input, MLOperand mean, MLOperand variance,
                               optional MLBatchNormalizationOptions options = {});
};
- input: an MLOperand. The input N-D tensor.
- mean: an MLOperand. The 1-D tensor of the mean values of the input features across the batch whose length is equal to the size of the input dimension denoted by options.axis.
- variance: an MLOperand. The 1-D tensor of the variance values of the input features across the batch whose length is equal to the size of the input dimension denoted by options.axis.
- options: an optional MLBatchNormalizationOptions. The optional parameters of the operation.
  - scale: an MLOperand. The 1-D tensor of the scaling values whose length is equal to the size of the input dimension denoted by options.axis.
  - bias: an MLOperand. The 1-D tensor of the bias values whose length is equal to the size of the input dimension denoted by options.axis.
  - axis: a long scalar. The index to the feature count dimension of the input shape for which the mean and variance values are. When it’s not specified, the default value is 1.
  - epsilon: a float scalar. A small value to prevent computational error due to divide-by-zero. The default value is 0.00001 when not specified.
- Returns: an MLOperand. The batch-normalized N-D tensor of the same shape as the input tensor.
When input is a 4-D tensor of the "nchw" or "nhwc" layout, options.axis should be set to 1 or 3 respectively. The axis value designates the feature or channel count dimension of the input tensor.
const shape = [1, -1, 1, 1];
return builder.add(
    builder.mul(
        builder.reshape(options.scale, shape),
        builder.div(
            builder.sub(input, builder.reshape(mean, shape)),
            builder.pow(
                builder.add(builder.reshape(variance, shape),
                            builder.constant(options.epsilon)),
                builder.constant(0.5)))),
    builder.reshape(options.bias, shape));
6.6.2. clamp

Clamp the input tensor element-wise within a range specified by the minimum and maximum values.
dictionary MLClampOptions {
  MLOperand minValue;
  MLOperand maxValue;
};

partial interface MLGraphBuilder {
  MLOperand clamp(MLOperand x, optional MLClampOptions options = {});
};
- x: an MLOperand. The input tensor.
- options: an optional MLClampOptions. The optional parameters of the operation.
  - minValue: an MLOperand. Specifies the minimum values of the range. It is either a scalar, or of a shape that is unidirectionally broadcastable to the shape of x according to [numpy-broadcasting-rule]. When it is not specified, the clamping is not performed on the lower limit of the range.
  - maxValue: an MLOperand. Specifies the maximum values of the range. It is either a scalar, or of a shape that is unidirectionally broadcastable to the shape of x according to [numpy-broadcasting-rule]. When it is not specified, the clamping is not performed on the upper limit of the range.
- Returns: an MLOperand. The output tensor of the same shape as x.
Clamp the input tensor element-wise within a range specified by minValue and maxValue . The calculation follows the expression min(max(x, minValue), maxValue). When minValue is not specified, the clamping is not performed on the lower limit. When maxValue is not specified, the clamping is not performed on the upper limit.
if (options.minValue === undefined) {
  if (options.maxValue === undefined) {
    return x;
  } else {
    return builder.min(x, options.maxValue);
  }
} else {
  if (options.maxValue === undefined) {
    return builder.max(x, options.minValue);
  } else {
    return builder.min(builder.max(x, options.minValue), options.maxValue);
  }
}
6.6.3. concat

Concatenates the input tensors along a given axis.
partial interface MLGraphBuilder {
  MLOperand concat(sequence<MLOperand> inputs, long axis);
};
- inputs: a sequence of MLOperand. All input tensors must have the same shape, except for the size of the dimension to concatenate on.
- axis: a long scalar. The axis that the inputs concatenate along, with the value in the interval [0, N) where N is the rank of all the inputs.

Returns: an MLOperand. The concatenated tensor of all the inputs along the axis. The output tensor has the same shape except on the dimension that all the inputs concatenated along. The size of that dimension is computed as the sum of all the input sizes of the same dimension.
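For example (informative), concatenating tensors of shape [2, 3] and [2, 4] along axis 1 yields a tensor of shape [2, 7]; the operand names are arbitrary.

// Informative sketch: concatenate along axis 1.
const builder = new MLGraphBuilder(navigator.ml.createContext());
const a = builder.input('a', { type: 'float32', dimensions: [2, 3] });
const b = builder.input('b', { type: 'float32', dimensions: [2, 4] });
const c = builder.concat([a, b], 1);   // resulting shape: [2, 7]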
6.6.4. conv2d

Compute a 2-D convolution given 4-D input and filter tensors.
enum MLFilterOperandLayout {
  "oihw",
  "hwio",
  "ohwi",
  "ihwo"
};

enum MLAutoPad {
  "explicit",
  "same-upper",
  "same-lower"
};

dictionary MLConv2dOptions {
  sequence<long> padding;
  sequence<long> strides;
  sequence<long> dilations;
  sequence<long> outputPadding;
  sequence<long> outputSizes;
  MLAutoPad autoPad = "explicit";
  boolean transpose = false;
  long groups = 1;
  MLInputOperandLayout inputLayout = "nchw";
  MLFilterOperandLayout filterLayout = "oihw";
};

partial interface MLGraphBuilder {
  MLOperand conv2d(MLOperand input, MLOperand filter,
                   optional MLConv2dOptions options = {});
};
- input: an MLOperand. The input 4-D tensor. The logical shape is interpreted according to the value of options.inputLayout.
- filter: an MLOperand. The filter 4-D tensor. The logical shape is interpreted according to the value of options.filterLayout and options.groups.
- options: an optional MLConv2dOptions. The optional parameters of the operation.
  - padding: a sequence of long of length 4. The additional rows and columns added to the beginning and ending of each spatial dimension of input, [beginning_height, ending_height, beginning_width, ending_width]. If not present, the values are assumed to be [0,0,0,0].
  - strides: a sequence of long of length 2. The stride of the sliding window for each spatial dimension of input, [stride_height, stride_width]. If not present, the values are assumed to be [1,1].
  - dilations: a sequence of long of length 2. The dilation factor for each spatial dimension of input, [dilation_height, dilation_width]. If not present, the values are assumed to be [1,1].
  - outputPadding: a sequence of long of length 2. The padding values applied to each spatial dimension of the output tensor when options.transpose is set to true. These explicit padding values are needed to disambiguate the output tensor shape for transposed convolution when the value of options.strides is greater than 1. Note that these values are only used to disambiguate the output shape when needed; they do not necessarily cause any padding value to be written to the output tensor. If not specified, the values are assumed to be [0,0].
  - outputSizes: a sequence of long of length 2. The sizes of the last two dimensions of the output tensor when options.transpose is set to true. When the output sizes are explicitly specified, the output padding values in options.outputPadding are ignored. If not specified, the output sizes are automatically computed.
  - autoPad: an MLAutoPad. The automatic input padding options. By default, this argument is set to "explicit", which means that the values in the options.padding array should be used for input padding. When the option is set to other than "explicit", the values in the options.padding array are ignored. With the "same-upper" option, the padding values are automatically computed such that the additional ending padding of the spatial input dimensions would allow all of the input values in the corresponding dimension to be filtered. The "same-lower" option is similar but padding is applied to the beginning padding of the spatial input dimensions instead of the ending one.
  - transpose: a boolean indicating that a transposed convolution operation is performed. Transposed convolution is used in upsampling networks to increase the resolution of a feature as opposed to the typical convolution process that reduces the feature’s resolution. When transposed convolution is performed, options.outputPadding may be needed to disambiguate the output tensor shape. If not present, this option is assumed to be false.
  - groups: a long scalar. The number of groups that input channels and output channels are divided into, default to 1.
  - inputLayout: an MLInputOperandLayout. The default value is "nchw". This option specifies the layout format of the input and output tensor as follows:
    - "nchw":
      - input tensor: [batches, input_channels, height, width]
      - output tensor: [batches, output_channels, height, width]
    - "nhwc":
      - input tensor: [batches, height, width, input_channels]
      - output tensor: [batches, height, width, output_channels]
  - filterLayout: an MLFilterOperandLayout. The default value is "oihw". This option specifies the layout format of the filter tensor as follows:
    - "oihw": [output_channels, input_channels/groups, height, width]
    - "hwio": [height, width, input_channels/groups, output_channels]
    - "ohwi": [output_channels, height, width, input_channels/groups]
    - "ihwo": [input_channels/groups, height, width, output_channels]
- Returns: an MLOperand. The output 4-D tensor that contains the convolution result. The output shape is interpreted according to the options.inputLayout value. More specifically, the sizes of the last two dimensions of the output tensor, the spatial dimensions, for the convolution operation can be calculated as follows:

output size = 1 + (input size - filter size + beginning padding + ending padding) / stride

Whereas for the transposed convolution case with options.transpose set to true, unless the options.outputSizes values are explicitly specified, the options.outputPadding may be needed to compute the spatial dimension values of the output tensor as follows:

output size = (input size - 1) * stride + filter size - beginning padding - ending padding + output padding
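For example (informative), a 3×3 filter applied to a 224×224 "nchw" input with padding [1,1,1,1] and strides [2,2] produces 112×112 spatial output dimensions, since 1 + (224 - 3 + 1 + 1) / 2 = 112 with integer division; the operand names and filter values below are arbitrary.

// Informative sketch: a strided 3x3 convolution over an "nchw" input.
const builder = new MLGraphBuilder(navigator.ml.createContext());
const input = builder.input('input', { type: 'float32', dimensions: [1, 3, 224, 224] });
const filter = builder.constant(
    { type: 'float32', dimensions: [32, 3, 3, 3] },   // "oihw" layout
    new Float32Array(32 * 3 * 3 * 3).fill(0.1));
const output = builder.conv2d(input, filter,
    { padding: [1, 1, 1, 1], strides: [2, 2] });      // output shape: [1, 32, 112, 112]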
6.6.5. element-wise binary operations

Compute the element-wise binary addition, subtraction, multiplication, division, maximum and minimum of the two input tensors.
partial interface MLGraphBuilder {
  MLOperand add(MLOperand a, MLOperand b);
  MLOperand sub(MLOperand a, MLOperand b);
  MLOperand mul(MLOperand a, MLOperand b);
  MLOperand div(MLOperand a, MLOperand b);
  MLOperand max(MLOperand a, MLOperand b);
  MLOperand min(MLOperand a, MLOperand b);
  MLOperand pow(MLOperand a, MLOperand b);
};
Returns: an MLOperand. The output tensor that contains the result of the element-wise binary operation of the two input tensors.
The element-wise binary operation will be broadcasted according to [numpy-broadcasting-rule] . The rank of the output tensor is the maximum rank of the input tensors. For each dimension of the output tensor, its size is the maximum size along that dimension of the input tensors.
Operation types:
- add: Add the values of the two input tensors, element-wise.
- sub: Subtract the values of the second input tensor from the values of the first input tensor, element-wise.
- mul: Multiply the values of the two input tensors, element-wise.
- div: Divide the values of the first input tensor with the values of the second tensor, element-wise.
- max: Select the greater values of the two input tensors, element-wise.
- min: Select the lesser values of the two input tensors, element-wise.
- pow: Compute the values of the first input tensor to the power of the values of the second input tensor, element-wise.
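The broadcasting behavior described above can be illustrated with the following informative sketch, in which a per-channel bias of shape [3, 1, 1] is added to a [2, 3, 4, 4] tensor; the names and values are arbitrary.

// Informative sketch of broadcasting: the result has shape [2, 3, 4, 4].
const builder = new MLGraphBuilder(navigator.ml.createContext());
const x = builder.input('x', { type: 'float32', dimensions: [2, 3, 4, 4] });
const bias = builder.constant(
    { type: 'float32', dimensions: [3, 1, 1] }, new Float32Array([0.1, 0.2, 0.3]));
const y = builder.add(x, bias);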
6.6.6. element-wise unary operations

Compute the element-wise unary operation for the input tensor.
partial interface MLGraphBuilder {
  MLOperand abs(MLOperand x);
  MLOperand ceil(MLOperand x);
  MLOperand cos(MLOperand x);
  MLOperand exp(MLOperand x);
  MLOperand floor(MLOperand x);
  MLOperand log(MLOperand x);
  MLOperand neg(MLOperand x);
  MLOperand relu(MLOperand x);
  MLOperand sigmoid(MLOperand x);
  MLOperand sin(MLOperand x);
  MLOperand tan(MLOperand x);
  MLOperand tanh(MLOperand x);
};
- x: an MLOperand. The input tensor.

Returns: an MLOperand. The output tensor that contains the result of the element-wise unary operation of the input tensor. The shape of the output tensor is the same as the shape of the input tensor.
Operation types:
- abs: Compute the absolute value of the input tensor, element-wise.
- ceil: Compute the ceiling of the input tensor, element-wise.
- cos: Compute the cosine of the input tensor, element-wise.
- exp: Compute the exponential of the input tensor, element-wise.
- floor: Compute the floor of the input tensor, element-wise.
- log: Compute the natural logarithm of the input tensor, element-wise.
- neg: Compute the numerical negative value of the input tensor, element-wise.
- relu: Compute the rectified linear function of the input tensor, element-wise. The behavior of this operation can be generically emulated from the usage of other operations as follows. However, user agents typically have a more efficient implementation for it, therefore its usage is encouraged from the performance standpoint.

  return builder.max(builder.constant(0), x);

- sigmoid: Compute the sigmoid function of the input tensor, element-wise.
- sin: Compute the sine of the input tensor, element-wise.
- tan: Compute the tangent of the input tensor, element-wise.
- tanh: Compute the hyperbolic tangent of the input tensor, element-wise.
6.6.7. gemm

Calculate the general matrix multiplication of the Basic Linear Algebra Subprograms. The calculation follows the expression alpha * A * B + beta * C, where A, B, and C are matrices, and A and B may optionally be transposed prior to the calculation.
dictionary MLGemmOptions {
  MLOperand c;
  float alpha = 1.0;
  float beta = 1.0;
  boolean aTranspose = false;
  boolean bTranspose = false;
};

partial interface MLGraphBuilder {
  MLOperand gemm(MLOperand a, MLOperand b, optional MLGemmOptions options = {});
};
- a: an MLOperand. The first input 2-D tensor.
- b: an MLOperand. The second input 2-D tensor.
- options: an optional MLGemmOptions. The optional parameters of the operation.
  - c: an MLOperand. The third input 2-D tensor.
  - alpha: a float scalar multiplier for the first input, default to 1.0.
  - beta: a float scalar multiplier for the third input, default to 1.0.
  - aTranspose: a boolean indicating if the first input should be transposed prior to calculating the output, default to false.
  - bTranspose: a boolean indicating if the second input should be transposed prior to calculating the output, default to false.
- Returns: an MLOperand. The output 2-D tensor that contains the calculated product of all the inputs.
if (options.aTranspose)
  a = builder.transpose(a);

if (options.bTranspose)
  b = builder.transpose(b);

let ab = builder.matmul(builder.mul(builder.constant(options.alpha), a), b);
return (options.c
    ? builder.add(ab, builder.mul(builder.constant(options.beta), options.c))
    : ab);
6.6.8. gru

Gated Recurrent Unit [GRU] recurrent network using an update gate and a reset gate to compute the hidden state that rolls into the output across the temporal sequence of the network.
enum MLRecurrentNetworkWeightLayout {
  // update-reset-new gate ordering
  "zrn",
  // reset-update-new gate ordering
  "rzn"
};

enum MLRecurrentNetworkActivation {
  "relu",
  "sigmoid",
  "tanh"
};

enum MLRecurrentNetworkDirection {
  "forward",
  "backward",
  "both"
};

dictionary MLGruOptions {
  MLOperand bias;
  MLOperand recurrentBias;
  MLOperand initialHiddenState;
  boolean resetAfter = true;
  boolean returnSequence = false;
  MLRecurrentNetworkDirection direction = "forward";
  MLRecurrentNetworkWeightLayout layout = "zrn";
  sequence<MLRecurrentNetworkActivation> activations;
};

partial interface MLGraphBuilder {
  sequence<MLOperand> gru(MLOperand input, MLOperand weight, MLOperand recurrentWeight,
                          long steps, long hiddenSize, optional MLGruOptions options = {});
};
- input: an MLOperand. The input 3-D tensor of shape [steps, batch_size, input_size].
- weight: an MLOperand. The 3-D input weight tensor of shape [num_directions, 3 * hidden_size, input_size]. The ordering of the weight vectors in the second dimension of the tensor shape is specified according to the options.layout argument.
- recurrentWeight: an MLOperand. The 3-D recurrent weight tensor of shape [num_directions, 3 * hidden_size, hidden_size]. The ordering of the weight vectors in the second dimension of the tensor shape is specified according to the options.layout argument.
- steps: a long scalar. The number of time steps in the recurrent network. The value must be greater than 0.
- hiddenSize: a long scalar. The value of the third dimension of the cell output tensor shape. It indicates the number of features in the hidden state.
- options: an optional MLGruOptions. The optional parameters of the operation.
  - bias: an MLOperand. The 2-D input bias tensor of shape [num_directions, 3 * hidden_size]. The ordering of the bias vectors in the second dimension of the tensor shape is specified according to the options.layout argument.
  - recurrentBias: an MLOperand. The 2-D recurrent bias tensor of shape [num_directions, 3 * hidden_size]. The ordering of the bias vectors in the second dimension of the tensor shape is specified according to the options.layout argument.
  - initialHiddenState: an MLOperand. The 3-D initial hidden state tensor of shape [num_directions, batch_size, hidden_size]. When not specified, it’s assumed to be a tensor filled with zero.
  - resetAfter: a boolean indicating whether to apply the reset gate after or before matrix multiplication. Default to true.
  - returnSequence: a boolean indicating whether to also return the entire sequence with every cell output from each time step in it in addition to the cell output of the last time step. Default to false.
  - direction: an MLRecurrentNetworkDirection. The processing direction of the input sequence. When set to "both", the size of the first dimension of the weight and the bias tensor shapes must be 2, and the input is processed in both directions.
  - layout: an MLRecurrentNetworkWeightLayout. The ordering of the weight and bias vectors for the internal gates of GRU, specifically the update (z), reset (r), and new (n) gate, as indicated in the second dimension of the weight and bias tensor shape. When not specified, the default layout is "zrn".
  - activations: a sequence of MLRecurrentNetworkActivation. A pair of activation functions with the first function used for the update and reset gate, and the second used for the new gate. When not specified, it’s assumed to be the sigmoid ("sigmoid") and the hyperbolic tangent ("tanh") functions respectively.
- Returns: a sequence of MLOperand. The first element of the sequence is a 3-D tensor of shape [num_directions, batch_size, hidden_size], the cell output from the last time step of the network. Additionally, if returnSequence is set to true, the second element is the 4-D output tensor of shape [steps, num_directions, batch_size, hidden_size] containing every cell output from each time step in the temporal sequence.
const numDirections = (options.direction == "both" ? 2 : 1);
let hiddenState = options.initialHiddenState;

if (!hiddenState) {
  const desc = { type: 'float32', dimensions: [numDirections, 1, hiddenSize] };
  const totalSize = numDirections * hiddenSize;
  hiddenState = builder.constant(desc, new Float32Array(totalSize).fill(0));
}

let sequence = null;
let cellWeight = [];
let cellRecurrentWeight = [];
let cellBias = [];
let cellRecurrentBias = [];

for (let slot = 0; slot < numDirections; ++slot) {
  cellWeight.push(builder.squeeze(
      builder.slice(weight, [slot, 0, 0], [1, -1, -1]), { axes: [0] }));
  cellRecurrentWeight.push(builder.squeeze(
      builder.slice(recurrentWeight, [slot, 0, 0], [1, -1, -1]), { axes: [0] }));
  cellBias.push(options.bias
      ? builder.squeeze(builder.slice(options.bias, [slot, 0], [1, -1]), { axes: [0] })
      : null);
  cellRecurrentBias.push(options.recurrentBias
      ? builder.squeeze(builder.slice(options.recurrentBias, [slot, 0], [1, -1]), { axes: [0] })
      : null);
}

for (let step = 0; step < steps; ++step) {
  let cellHidden = [];
  let cellOutput = null;

  for (let slot = 0; slot < numDirections; ++slot) {
    cellHidden.push(builder.squeeze(
        builder.slice(hiddenState, [slot, 0, 0], [1, -1, -1]), { axes: [0] }));
  }

  for (let slot = 0; slot < numDirections; ++slot) {
    let slice = (slot == 1 || options.direction == "backward" ? steps - step - 1 : step);
    let cellInput = builder.squeeze(
        builder.slice(input, [slice, 0, 0], [1, -1, -1]), { axes: [0] });

    let result = builder.reshape(
        builder.gruCell(
            cellInput, cellWeight[slot], cellRecurrentWeight[slot], cellHidden[slot],
            hiddenSize, { bias: cellBias[slot], recurrentBias: cellRecurrentBias[slot],
                          resetAfter: options.resetAfter, layout: options.layout,
                          activations: options.activations }),
        [1, -1, hiddenSize]);

    cellOutput = (cellOutput ? builder.concat([cellOutput, result], 0) : result);
  }

  hiddenState = cellOutput;

  if (options.returnSequence) {
    cellOutput = builder.reshape(cellOutput, [1, numDirections, -1, hiddenSize]);
    sequence = (sequence ? builder.concat([sequence, cellOutput], 0) : cellOutput);
  }
}

return (sequence ? [hiddenState, sequence] : [hiddenState]);
6.6.9. gruCell

A single time step of the Gated Recurrent Unit [GRU] recurrent network using an update gate and a reset gate to compute the hidden state that rolls into the output across the temporal sequence of a recurrent network.
dictionary MLGruCellOptions {
  MLOperand bias;
  MLOperand recurrentBias;
  boolean resetAfter = true;
  MLRecurrentNetworkWeightLayout layout = "zrn";
  sequence<MLRecurrentNetworkActivation> activations;
};

partial interface MLGraphBuilder {
  MLOperand gruCell(MLOperand input, MLOperand weight, MLOperand recurrentWeight,
                    MLOperand hiddenState, long hiddenSize,
                    optional MLGruCellOptions options = {});
};
- input: an MLOperand. The input 2-D tensor of shape [batch_size, input_size].
- weight: an MLOperand. The 2-D input weight tensor of shape [3 * hidden_size, input_size]. The ordering of the weight vectors in the first dimension of the tensor shape is specified according to the options.layout argument.
- recurrentWeight: an MLOperand. The 2-D recurrent weight tensor of shape [3 * hidden_size, hidden_size]. The ordering of the weight vectors in the first dimension of the tensor shape is specified according to the options.layout argument.
- hiddenState: an MLOperand. The 2-D input hidden state tensor of shape [batch_size, hidden_size].
- hiddenSize: a long scalar. The value of the second dimension of the output tensor shape. It indicates the number of features in the hidden state.
- options: an optional MLGruCellOptions. The optional parameters of the operation.
  - bias: an MLOperand. The 1-D input bias tensor of shape [3 * hidden_size]. The ordering of the bias vectors in the first dimension of the tensor shape is specified according to the options.layout argument.
  - recurrentBias: an MLOperand. The 1-D recurrent bias tensor of shape [3 * hidden_size]. The ordering of the bias vectors in the first dimension of the tensor shape is specified according to the options.layout argument.
  - resetAfter: a boolean indicating whether to apply the reset gate after or before matrix multiplication. Default to true.
  - layout: an MLRecurrentNetworkWeightLayout. The ordering of the weight and bias vectors for the internal gates of GRU, specifically the update (z), reset (r), and new (n) gate, as indicated in the first dimension of the weight and bias tensor shapes. When not specified, the default layout is "zrn".
  - activations: a sequence of MLRecurrentNetworkActivation. A pair of activation functions with the first function used for the update and reset gate, and the second used for the new gate. When not specified, it defaults to the sigmoid ("sigmoid") and the hyperbolic tangent ("tanh") functions respectively.
- Returns: an MLOperand. The 2-D tensor of shape [batch_size, hidden_size], the cell output hidden state of a single time step of the recurrent network.
const one = builder.constant(1);
const zero = builder.constant(0);

// update gate (z)
let z = builder.sigmoid(
    builder.add(
        builder.add(
            (options.bias ? builder.slice(options.bias, [0], [hiddenSize]) : zero),
            (options.recurrentBias
                ? builder.slice(options.recurrentBias, [0], [hiddenSize]) : zero)),
        builder.add(
            builder.matmul(
                input,
                builder.transpose(builder.slice(weight, [0, 0], [hiddenSize, -1]))),
            builder.matmul(
                hiddenState,
                builder.transpose(
                    builder.slice(recurrentWeight, [0, 0], [hiddenSize, -1]))))));

// reset gate (r)
let r = builder.sigmoid(
    builder.add(
        builder.add(
            (options.bias ? builder.slice(options.bias, [hiddenSize], [hiddenSize]) : zero),
            (options.recurrentBias
                ? builder.slice(options.recurrentBias, [hiddenSize], [hiddenSize]) : zero)),
        builder.add(
            builder.matmul(
                input,
                builder.transpose(builder.slice(weight, [hiddenSize, 0], [hiddenSize, -1]))),
            builder.matmul(
                hiddenState,
                builder.transpose(
                    builder.slice(recurrentWeight, [hiddenSize, 0], [hiddenSize, -1]))))));

// new gate (n)
let n;
if (options.resetAfter) {
  n = builder.tanh(
      builder.add(
          (options.bias ? builder.slice(options.bias, [2 * hiddenSize], [hiddenSize]) : zero),
          builder.add(
              builder.matmul(
                  input,
                  builder.transpose(
                      builder.slice(weight, [2 * hiddenSize, 0], [hiddenSize, -1]))),
              builder.mul(
                  r,
                  builder.add(
                      (options.recurrentBias
                          ? builder.slice(options.recurrentBias, [2 * hiddenSize], [hiddenSize])
                          : zero),
                      builder.matmul(
                          hiddenState,
                          builder.transpose(
                              builder.slice(
                                  recurrentWeight, [2 * hiddenSize, 0], [hiddenSize, -1]))))))));
} else {
  n = builder.tanh(
      builder.add(
          builder.add(
              (options.bias
                  ? builder.slice(options.bias, [2 * hiddenSize], [hiddenSize]) : zero),
              (options.recurrentBias
                  ? builder.slice(options.recurrentBias, [2 * hiddenSize], [hiddenSize])
                  : zero)),
          builder.add(
              builder.matmul(
                  input,
                  builder.transpose(
                      builder.slice(weight, [2 * hiddenSize, 0], [hiddenSize, -1]))),
              builder.matmul(
                  builder.mul(r, hiddenState),
                  builder.transpose(
                      builder.slice(
                          recurrentWeight, [2 * hiddenSize, 0], [hiddenSize, -1]))))));
}

// compute the new hidden state
return builder.add(builder.mul(z, hiddenState), builder.mul(n, builder.sub(one, z)));
6.6.10. instanceNormalization

Normalize the input features using [Instance-Normalization]. Unlike batchNormalization(), the mean and variance values used in the calculation are not supplied as parameters but are computed within this operation across the spatial dimensions of each input feature, as shown in the sample implementation below.
dictionary MLInstanceNormalizationOptions {
  MLOperand scale;
  MLOperand bias;
  float epsilon = 1e-5;
  MLInputOperandLayout layout = "nchw";
};

partial interface MLGraphBuilder {
  MLOperand instanceNormalization(MLOperand input,
                                  optional MLInstanceNormalizationOptions options = {});
};
- input: an MLOperand. The input 4-D tensor.
- options: an optional MLInstanceNormalizationOptions. The optional parameters of the operation.
  - scale: an MLOperand. The 1-D tensor of the scaling values whose length is equal to the size of the feature dimension of the input, e.g. for an input tensor with "nchw" layout, the feature dimension is 1.
  - bias: an MLOperand. The 1-D tensor of the bias values whose length is equal to the size of the feature dimension of the input, e.g. for an input tensor with "nchw" layout, the feature dimension is 1.
  - epsilon: a float scalar. A small value to prevent computational error due to divide-by-zero. The default value is 0.00001 when not specified.
  - layout: an MLInputOperandLayout. This option specifies the layout format of the input. The default value is "nchw".
- Returns: an MLOperand. The instance-normalized 4-D tensor of the same shape as the input tensor.
// The mean reductions happen over the spatial dimensions of the input
// e.g. axis 2 and 3 of the input tensor.
const reduceOptions = { axes: [2, 3], keepDimensions: true };
const mean = builder.reduceMean(input, reduceOptions);
const variance = builder.reduceMean(
    builder.pow(builder.sub(input, mean), builder.constant(2)),
    reduceOptions);

// The scale and bias values are applied per input feature
// e.g. axis 1 of the input tensor.
const shape = [1, -1, 1, 1];
return builder.add(
    builder.mul(
        builder.reshape(options.scale, shape),
        builder.div(
            builder.sub(input, mean),
            builder.pow(
                builder.add(variance, builder.constant(options.epsilon)),
                builder.constant(0.5)))),
    builder.reshape(options.bias, shape));
6.6.11. leakyRelu
dictionary MLLeakyReluOptions {
  float alpha = 0.01;
};

partial interface MLGraphBuilder {
  MLOperand leakyRelu(MLOperand x, optional MLLeakyReluOptions options = {});
};
- x: an MLOperand. The input tensor.
- options: an optional MLLeakyReluOptions. The optional parameters of the operation.
  - alpha: a float scalar multiplier, default to 0.01.
- Returns: an MLOperand. The output tensor of the same shape as x.

Calculate the leaky version of the rectified linear function on the input tensor element-wise. The calculation follows the expression max(0, x) + alpha ∗ min(0, x).
return builder.add(
    builder.max(builder.constant(0), x),
    builder.mul(builder.constant(options.alpha), builder.min(builder.constant(0), x)));
6.6.12. matmul

Compute the matrix product of two input tensors.
partial interface MLGraphBuilder {
  MLOperand matmul(MLOperand a, MLOperand b);
};
Returns: an MLOperand. The output N-D tensor that contains the matrix product of two input tensors.

Compute the matrix product of two input tensors. It behaves as follows:
- If both a and b are 2-D, they are multiplied like conventional matrices and produce a 2-D tensor as the output.
- If either a or b is N-D, N > 2, it is treated as a stack of matrices with dimensions corresponding to the last two indices. The matrix multiplication will be broadcasted accordingly by following [numpy-broadcasting-rule]. The output is an N-D tensor whose rank is the maximum rank of the input tensors. For each dimension, except the last two, of the output tensor, its size is the maximum size along that dimension of the input tensors.
- If a is 1-D, it is converted to a 2-D tensor by prepending a 1 to its dimensions.
- If b is 1-D, it is converted to a 2-D tensor by appending a 1 to its dimensions.
- If both a and b are 1-D, the operation is a vector dot-product, which produces a scalar output.
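For example (informative), under these rules a [3, 4] tensor multiplied by a stacked [5, 4, 2] tensor broadcasts to a [5, 3, 2] result; the names and shapes are arbitrary.

// Informative sketch: matmul with broadcasting over a stacked operand.
const builder = new MLGraphBuilder(navigator.ml.createContext());
const a = builder.input('a', { type: 'float32', dimensions: [3, 4] });
const b = builder.input('b', { type: 'float32', dimensions: [5, 4, 2] });
const c = builder.matmul(a, b);   // resulting shape: [5, 3, 2]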
6.6.13. pad

Inflate the tensor with constant or mirrored values on the edges.
enum MLPaddingMode {
  "constant",
  "edge",
  "reflection",
  "symmetric"
};

dictionary MLPadOptions {
  MLPaddingMode mode = "constant";
  float value = 0;
};

partial interface MLGraphBuilder {
  MLOperand pad(MLOperand input, MLOperand padding, optional MLPadOptions options = {});
};
- input: an MLOperand. The input tensor.
- padding: an MLOperand. The 2-D tensor of integer values indicating the number of padding values to add at the beginning and end of each input dimension. The tensor has shape [n, 2] where n is the rank of the input tensor. For each dimension D of input, padding[D, 0] indicates how many values to add before the content in that dimension, and padding[D, 1] indicates how many values to add after the content in that dimension.
- options: an optional MLPadOptions. The optional parameters of the operation.
  - mode: an MLPaddingMode. The different ways to pad the tensor. When not set, it’s assumed to be "constant".
  - value: a float. The pad value when options.mode is set to "constant". When not set, it’s assumed to be 0.
- Returns: an MLOperand. The padded output tensor.
// input: [[1,2,3], [4,5,6]]
const input = builder.constant(
    { type: 'float32', dimensions: [2, 3] }, new Float32Array([1, 2, 3, 4, 5, 6]));

// padding: [[1,1], [2,2]]
const padding = builder.constant(
    { type: 'float32', dimensions: [2, 2] }, new Float32Array([1, 1, 2, 2]));

// "constant" padded:
//  [[0,0,0,0,0,0,0],
//   [0,0,1,2,3,0,0],
//   [0,0,4,5,6,0,0],
//   [0,0,0,0,0,0,0]]
builder.pad(input, padding);

// "edge" padded:
//  [[1,1,1,2,3,3,3],
//   [1,1,1,2,3,3,3],
//   [4,4,4,5,6,6,6],
//   [4,4,4,5,6,6,6]]
builder.pad(input, padding, { mode: "edge" });

// "reflection" padded:
//  [[6,5,4,5,6,5,4],
//   [3,2,1,2,3,2,1],
//   [6,5,4,5,6,5,4],
//   [3,2,1,2,3,2,1]]
builder.pad(input, padding, { mode: "reflection" });

// "symmetric" padded:
//  [[2,1,1,2,3,3,2],
//   [2,1,1,2,3,3,2],
//   [5,4,4,5,6,6,5],
//   [5,4,4,5,6,6,5]]
builder.pad(input, padding, { mode: "symmetric" });
6.6.14. pooling operations

Compute a mean, L2 norm, or max reduction operation across all the elements within the moving window over the input tensor. See the description of each type of reduction in § 6.6.15 reduction operations.
dictionary MLPool2dOptions {
  sequence<long> windowDimensions;
  sequence<long> padding;
  sequence<long> strides;
  sequence<long> dilations;
  MLAutoPad autoPad = "explicit";
  MLInputOperandLayout layout = "nchw";
};

partial interface MLGraphBuilder {
  MLOperand averagePool2d(MLOperand input, optional MLPool2dOptions options = {});
  MLOperand l2Pool2d(MLOperand input, optional MLPool2dOptions options = {});
  MLOperand maxPool2d(MLOperand input, optional MLPool2dOptions options = {});
};
- input: an MLOperand. The input 4-D tensor. The logical shape is interpreted according to the value of options.layout.
- options: an optional MLPool2dOptions. The optional parameters of the operation.
  - windowDimensions: a sequence of long of length 2. The dimensions of the sliding window, [window_height, window_width]. If not present, the window dimensions are assumed to be the height and width dimensions of the input shape.
  - padding: a sequence of long of length 4. The additional rows and columns added to the beginning and ending of each spatial dimension of input, [beginning_height, ending_height, beginning_width, ending_width]. If not present, the values are assumed to be [0,0,0,0].
  - strides: a sequence of long of length 2. The stride of the sliding window for each spatial dimension of input, [stride_height, stride_width]. If not present, the values are assumed to be [1,1].
  - dilations: a sequence of long of length 2. The dilation factor for each spatial dimension of input, [dilation_height, dilation_width]. If not present, the values are assumed to be [1,1].
  - autoPad: an MLAutoPad. The automatic input padding options. By default, this argument is set to "explicit", which means that the values in the options.padding array should be used for input padding. When the option is set to other than "explicit", the values in the options.padding array are ignored. With the "same-upper" option, the padding values are automatically computed such that the additional ending padding of the spatial input dimensions would allow all of the input values in the corresponding dimension to be filtered. The "same-lower" option is similar but padding is applied to the beginning padding of the spatial input dimensions instead of the ending one.
  - layout: an MLInputOperandLayout. The default value is "nchw". This option specifies the layout format of the input and output tensor as follows:
    - "nchw":
      - input tensor: [batches, channels, height, width]
      - output tensor: [batches, channels, height, width]
    - "nhwc":
      - input tensor: [batches, height, width, channels]
      - output tensor: [batches, height, width, channels]
- Returns: an MLOperand. The output 4-D tensor that contains the result of the reduction. The logical shape is interpreted according to the value of options.layout.
// 'global' max pooling
builder.maxPool2d(input);
6.6.15. reduction operations

Reduce the input along the dimensions given in axes.
dictionary MLReduceOptions {
  sequence<long> axes = null;
  boolean keepDimensions = false;
};

partial interface MLGraphBuilder {
  MLOperand reduceL1(MLOperand input, optional MLReduceOptions options = {});
  MLOperand reduceL2(MLOperand input, optional MLReduceOptions options = {});
  MLOperand reduceLogSum(MLOperand input, optional MLReduceOptions options = {});
  MLOperand reduceLogSumExp(MLOperand input, optional MLReduceOptions options = {});
  MLOperand reduceMax(MLOperand input, optional MLReduceOptions options = {});
  MLOperand reduceMean(MLOperand input, optional MLReduceOptions options = {});
  MLOperand reduceMin(MLOperand input, optional MLReduceOptions options = {});
  MLOperand reduceProduct(MLOperand input, optional MLReduceOptions options = {});
  MLOperand reduceSum(MLOperand input, optional MLReduceOptions options = {});
  MLOperand reduceSumSquare(MLOperand input, optional MLReduceOptions options = {});
};
- input: an MLOperand. The input tensor.
- options: an optional MLReduceOptions. The optional parameters of the operation.

Returns: an MLOperand. The reduced output tensor.

Reduction types:
- L1: Compute the L1 norm of all the input values along the axes.
- L2: Compute the L2 norm of all the input values along the axes.
- LogSum: Compute the log value of the sum of all the input values along the axes.
- LogSumExp: Compute the log value of the sum of the exponent of all the input values along the axes.
- Max: Compute the maximum value of all the input values along the axes.
- Mean: Compute the average value of all the input values along the axes.
- Min: Compute the minimum value of all the input values along the axes.
- Product: Compute the product of all the input values along the axes.
- Sum: Compute the sum of all the input values along the axes.
- SumSquare: Compute the sum of the square of all the input values along the axes.
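For example (informative), summing a [2, 3, 4] tensor over axes [0, 2] yields a [3] tensor, or a [1, 3, 1] tensor when keepDimensions is true; the operand names are arbitrary.

// Informative sketch: reduce a [2, 3, 4] tensor over axes 0 and 2.
const builder = new MLGraphBuilder(navigator.ml.createContext());
const x = builder.input('x', { type: 'float32', dimensions: [2, 3, 4] });
const s = builder.reduceSum(x, { axes: [0, 2] });                         // shape: [3]
const k = builder.reduceSum(x, { axes: [0, 2], keepDimensions: true });   // shape: [1, 3, 1]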
6.6.16. resample

Resample the tensor values from the source to the destination dimensions according to the scaling factors.
enum MLInterpolationMode {
  "nearest-neighbor",
  "linear"
};

dictionary MLResampleOptions {
  MLInterpolationMode mode = "nearest-neighbor";
  sequence<float> scales;
  sequence<long> sizes;
};

partial interface MLGraphBuilder {
  MLOperand resample(MLOperand input, optional MLResampleOptions options = {});
};
- input: an MLOperand. The input 4-D tensor.
- options: an optional MLResampleOptions. The optional parameters of the operation.
  - mode: an MLInterpolationMode. The interpolation algorithm used to fill the output tensor values. If not set, it is assumed to be the nearest-neighbor interpolation.
  - scales: a sequence of float of length 4. Each value represents the scaling factor used to scale in each input dimension.
  - sizes: a sequence of long of length 4. The target sizes for each input dimension. When the target sizes are specified, the options.scales argument is ignored, as the scaling factor values are derived from the target sizes of each input dimension.
- Returns: an MLOperand. The output 4-D tensor.
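For example (informative), the spatial dimensions of an "nchw" tensor can be upscaled by a factor of two using linear interpolation; the operand names and shapes are arbitrary.

// Informative sketch: 2x spatial upsampling with linear interpolation.
const builder = new MLGraphBuilder(navigator.ml.createContext());
const x = builder.input('x', { type: 'float32', dimensions: [1, 1, 2, 2] });
const y = builder.resample(x, { mode: 'linear', scales: [1.0, 1.0, 2.0, 2.0] });
// resulting shape: [1, 1, 4, 4]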
6.6.17. reshape

Alter the shape of a tensor to a new shape. Reshape does not copy or change the content of the tensor. It just changes the tensor’s logical dimensions for the subsequent operations.
partial interface MLGraphBuilder {
  MLOperand reshape(MLOperand input, sequence<long> newShape);
};
-
input : an
MLOperand. The input tensor. -
newShape : a sequence of
long. The shape of the output tensor. The number of elements implied by newShape must be the same as the number of elements in the input tensor. Only one component of newShape can be the special value of -1. The size of the dimension with the value -1 is computed so that the total size remains constant.
Returns: an MLOperand. The output tensor. The values of the output tensor are the same as the values of the input tensor. The shape of the output tensor is specified by the newShape argument.
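The following non-normative sketch shows how a -1 component of newShape is inferred from the element count; it assumes a builder constructed as in the examples later in this specification.

// Non-normative sketch: reshape a [2, 6] tensor into [3, -1].
const input = builder.input('input', {type: 'float32', dimensions: [2, 6]});
// The -1 component is inferred as 4 so that the total element count (12) is preserved.
const reshaped = builder.reshape(input, [3, -1]);
// reshaped has the logical shape [3, 4]; the tensor contents are unchanged.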
6.6.18. slice
Produce a slice of the input tensor.
dictionary MLSliceOptions {
  sequence<long> axes;
};

partial interface MLGraphBuilder {
  MLOperand slice(MLOperand input, sequence<long> starts, sequence<long> sizes,
                  optional MLSliceOptions options = {});
};
- input: an MLOperand. The input tensor.
- starts: a sequence of long. The starting indices to slice of the corresponding axes of the input shape. A negative index value is interpreted as counting back from the end. For example, the value -1 denotes the last index of the corresponding dimension.
- sizes: a sequence of long. The lengths to slice of the corresponding axes of the input shape. The length value of -1 selects all the remaining elements from the starting index of the given axis.
- options: an optional MLSliceOptions. The optional parameters of the operation.
  - axes: a sequence of long. The dimensions of the input shape to which starts and sizes apply. The values in the sequence are either within the [0, r-1] range, where r is the input tensor rank, or within the [-r, -1] range, where negative values mean counting back from the end of the input shape. When not specified, the sequence is assumed to be [0, 1, ..., r-1].
Returns: an MLOperand. The output tensor of the same rank as the input tensor, with tensor values stripped to the specified starting and ending indices in each dimension.
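The following non-normative sketch shows a slice using the -1 size to select the remaining elements; it assumes a builder constructed as in the examples later in this specification.

// Non-normative sketch: slice rows 1..2 and all columns of a [3, 4] tensor.
const input = builder.input('input', {type: 'float32', dimensions: [3, 4]});
// starts = [1, 0], sizes = [2, -1]: start at row 1, take 2 rows and every remaining column.
const sliced = builder.slice(input, [1, 0], [2, -1]);
// sliced has the shape [2, 4].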
6.6.19. softmax
Compute the softmax values of the 2-D input tensor along axis 1.
partial interface MLGraphBuilder {
  MLOperand softmax(MLOperand x);
};
- x: an MLOperand. The input 2-D tensor.
Returns: an MLOperand. The output 2-D tensor that contains the softmax results, of the same shape as the input tensor.
// This sample deploys a well-known implementation trick [1] to compute the
// exponentials of the distances to the max value, instead of the exponentials
// of the input values themselves, in order to increase the numerical stability
// of the result.
// [1]: https://cs231n.github.io/linear-classify/#softmax
const max_x = builder.reduceMax(x, {axes: [1], keepDimensions: true});
const exp_x = builder.exp(builder.sub(x, max_x));
return builder.div(exp_x, builder.reduceSum(exp_x, {axes: [1], keepDimensions: true}));
6.6.20. split
Split the input tensor into a number of sub tensors along the given axis.
dictionary MLSplitOptions {
  long axis = 0;
};

partial interface MLGraphBuilder {
  sequence<MLOperand> split(MLOperand input,
                            (unsigned long or sequence<unsigned long>) splits,
                            optional MLSplitOptions options = {});
};
- input: an MLOperand. The input tensor.
- splits: an unsigned long or a sequence of unsigned long. If an unsigned long, it specifies the number of output tensors along the axis. The number must evenly divide the dimension size of input along options.axis. If a sequence of unsigned long, it specifies the sizes of each output tensor along options.axis. The sum of sizes must equal the dimension size of input along options.axis.
- options: an optional MLSplitOptions. The optional parameters of the operation.
  - axis: a long. The dimension along which to split. Defaults to 0. A negative value is interpreted as counting back from the end.
Returns: a sequence of MLOperand. The split output tensors. If splits is an unsigned long, the length of the output sequence equals splits; the shape of each output tensor is the same as input, except that the dimension size of axis equals the quotient of dividing the dimension size of input along axis by splits. If splits is a sequence of unsigned long, the length of the output sequence equals the length of splits; the shape of the i-th output tensor is the same as input, except along axis, where the dimension size is splits[i].
// This sample shows the case where the splits parameter is an array.
const outputs = [];
let start = 0;
for (const size of splits) {
  outputs.push(builder.slice(input, [start], [size], {axes: [options.axis]}));
  start += size;
}
return outputs;
6.6.21. squeeze
Reduce the rank of a tensor by eliminating dimensions with size 1 from the tensor shape. Squeeze only affects the tensor’s logical dimensions. It does not copy or change the content in the tensor.
dictionary MLSqueezeOptions {
  sequence<long> axes;
};

partial interface MLGraphBuilder {
  MLOperand squeeze(MLOperand input, optional MLSqueezeOptions options = {});
};
- input: an MLOperand. The input tensor.
- options: an optional MLSqueezeOptions. The optional parameters of the operation.
  - axes: a sequence of long. Indices to the shape dimensions of size 1 to eliminate. When not specified, every shape dimension of size 1 in the tensor is eliminated.
Returns: an MLOperand. The output tensor of the same or reduced rank, with the shape dimensions of size 1 eliminated.
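The following non-normative sketch contrasts squeezing every size-1 dimension with squeezing only selected axes; it assumes a builder constructed as in the examples later in this specification.

// Non-normative sketch: remove the size-1 dimensions of a [1, 3, 1, 4] tensor.
const input = builder.input('input', {type: 'float32', dimensions: [1, 3, 1, 4]});
const allSqueezed = builder.squeeze(input);                   // shape [3, 4]
const axis0Squeezed = builder.squeeze(input, {axes: [0]});    // shape [3, 1, 4]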
6.6.22. transpose
Permute the dimensions of the input tensor according to the permutation argument.
dictionary MLTransposeOptions {
  sequence<long> permutation;
};

partial interface MLGraphBuilder {
  MLOperand transpose(MLOperand input, optional MLTransposeOptions options = {});
};
- input: an MLOperand. The input N-D tensor.
- options: an optional MLTransposeOptions. The optional parameters of the operation.
  - permutation: a sequence of long values. The values used to permute the output shape. When it is not specified, it is set to [N-1, ..., 0], where N is the rank of the input tensor. These default values cause the output to become a transposed tensor of the input. When specified, the number of values in the sequence must be the same as the rank of the input tensor, and the values in the sequence must be within the range from 0 to N-1 with no duplicate values.
Returns: an MLOperand. The permuted or transposed N-D tensor.
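The following non-normative sketch shows the default reversal of dimensions and an explicit permutation; it assumes a builder constructed as in the examples later in this specification.

// Non-normative sketch: transpose a [2, 3, 4] tensor.
const input = builder.input('input', {type: 'float32', dimensions: [2, 3, 4]});
const reversed = builder.transpose(input);                             // shape [4, 3, 2]
const permuted = builder.transpose(input, {permutation: [0, 2, 1]});   // shape [2, 4, 3]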
6.7. MLGraph
The MLGraph interface represents a compiled computational graph. A compiled graph, once constructed, is immutable and cannot be subsequently changed.
dictionary MLInput {
  required (MLBufferView or WebGLTexture or GPUTexture) data;
  sequence<long> dimensions;
};

dictionary MLOutput {
  (MLBufferView or WebGLTexture or GPUTexture) data;
  sequence<long> dimensions;
};

typedef record<DOMString, MLInput> MLNamedInputs;
typedef record<DOMString, MLOutput> MLNamedOutputs;

[SecureContext, Exposed=(Window, DedicatedWorker)]
interface MLGraph {
  Promise<MLNamedOutputs> compute(MLNamedInputs inputs, optional MLNamedOutputs outputs = {});
};
MLGraph has the following internal slots:
- [[context]] of type MLContext
  The context of type MLContext associated with this MLGraph.
- [[inputOperands]] of type record<DOMString, MLOperandDescriptor>
  Maps the name of an input MLOperand to its MLOperandDescriptor for all input MLOperands of this MLGraph.
- [[outputOperands]] of type sequence<DOMString>
  Contains the names of all output MLOperands of this MLGraph.
- [[implementation]]
  The underlying implementation provided by the User Agent.
- compute(inputs, outputs)
  Issue a compute request of the MLGraph given MLNamedInputs and optional MLNamedOutputs. The returned Promise resolves when the results in MLNamedOutputs are ready to be consumed.
  Called on: MLGraph this.
  Arguments for the MLGraph.compute(inputs, outputs) method:
  - inputs: an MLNamedInputs (not nullable, not optional). The data and optional dimensions of inputs for the compute request.
  - outputs: an optional MLNamedOutputs (not nullable). The names and pre-allocated resources of required outputs for the compute request. Defaults to an empty record, which means that the compute request is for all outputs.
  Returns: Promise<MLNamedOutputs>. The dimensions and data of outputs returned by the compute request.
1. Let promise be a new promise.
2. If any of the following requirements are unmet, then reject promise with a TypeError and stop.
   1. For each key → value of inputs:
      1. this.[[inputOperands]][key] must exist.
      2. Let inputOperand be this.[[inputOperands]][key].
      3. If value.data is an ArrayBufferView, then:
         1. The kind of value.data must be compatible with inputOperand.type according to this table.
      4. If value.dimensions was given, then:
         1. The length of value.dimensions must be the same as the length of inputOperand.dimensions.
         2. Let i be 0.
         3. While true:
            1. Let dimension be value.dimensions[i].
            2. dimension must be greater than 0.
            3. If inputOperand.dimensions[i] is greater than 0, then dimension must be equal to inputOperand.dimensions[i].
            4. Set i to i + 1.
            5. If i is equal to the length of value.dimensions, then break.
      5. Else:
         1. For each dimension of inputOperand.dimensions:
            1. The value of dimension must be greater than 0.
   2. If outputs was not an empty record, then:
      1. For each key → value of outputs:
         1. this.[[outputOperands]][key] must exist.
         2. If value.data was given, then the kind of value.data must be compatible with this.[[outputOperands]][key] according to this table.
3. Let requiredOutputNames be a new ordered set<DOMString>.
4. If outputs was not an empty record, then:
   1. For each key → value of outputs:
      1. Append key to requiredOutputNames.
5. Else:
   1. For each key → value of this.[[outputOperands]]:
      1. Append key to requiredOutputNames.
6. Let copiedInputs be a new MLNamedInputs.
7. For each key → value of inputs:
   1. Let copiedInput be a new MLInput.
   2. Let copiedInput.data be a new ArrayBufferView that has the same kind and length as value.data's.
   3. Set the content of copiedInput.data to the content of value.data.
   4. Let copiedInput.dimensions be a new sequence<long> that has the same length as value.dimensions's.
   5. Set the content of copiedInput.dimensions to the content of value.dimensions.
   6. Set copiedInputs[key] to copiedInput.
8. Let results be a new MLNamedOutputs.
9. Let remainingOutputNames be a new ordered set<DOMString>.
10. Set the content of remainingOutputNames to the content of requiredOutputNames.
11. Issue the following steps on the Device timeline of this.[[implementation]]:
    1. For each outputName of requiredOutputNames:
       1. Issue a compute request of this.[[implementation]] for the output whose name is outputName with the given copiedInputs.
       2. When the compute request is completed, issue the following steps on the appropriate Queue timeline:
          1. If there is an error returned by this.[[implementation]], then:
             1. Reject promise with an OperationError and stop.
          2. Else:
             1. Let outputRank be an unsigned long.
             2. Set outputRank to the rank of the output tensor returned by this.[[implementation]].
             3. Let outputDimensions be a new sequence<long> of size outputRank.
             4. Let i be 0.
             5. Let outputSize be 1.
             6. While true:
                1. Set outputDimensions[i] to the dimension at the i-th axis of the output tensor returned by this.[[implementation]].
                2. Set outputSize to outputSize * outputDimensions[i].
                3. Set i to i + 1.
                4. If i is equal to outputRank, then break.
             7. Set results[outputName].dimensions to outputDimensions.
             8. If this.[[context]] is created from MLContextOptions, then:
                1. If outputs[outputName].data was given, then:
                   1. If outputs[outputName].data is not an ArrayBufferView, then reject promise with a TypeError and stop.
                   2. If the kind of outputs[outputName].data is not compatible with the output tensor according to this table, then reject promise with a TypeError and stop.
                   3. If the length of outputs[outputName].data is less than outputSize, then reject promise with a TypeError and stop.
                   4. Set the content of outputs[outputName].data to the content of the output tensor returned by this.[[implementation]].
                2. Else:
                   1. Let results[outputName].data be a new ArrayBufferView of size outputSize and of a kind that is compatible with the output tensor according to this table.
                   2. Set the content of results[outputName].data to the content of the output tensor returned by this.[[implementation]].
          3. Remove outputName from remainingOutputNames.
          4. If remainingOutputNames is empty, then resolve promise with results and stop.
12. Return promise.

Issue: Describe the algorithm steps for this.[[context]] created from WebGLRenderingContext and GPUDevice.
6.7.1. Examples
const context = navigator.ml.createContext();

// Create a graph with dynamic shaped inputs.
const builder = new MLGraphBuilder(context);
const descA = {type: 'float32', dimensions: [-1, 4]};
const a = builder.input('a', descA);
const descB = {type: 'float32', dimensions: [4, -1]};
const b = builder.input('b', descB);
const c = builder.matmul(a, b);
const graph = await builder.build({c});

async function compute(shapeA, shapeB) {
  const bufferA = new Float32Array(sizeOfShape(shapeA)).fill(0.5);
  const bufferB = new Float32Array(sizeOfShape(shapeB)).fill(0.5);

  // Specify the shape of inputs when computing.
  const inputs = {
    'a': {data: bufferA, dimensions: shapeA},
    'b': {data: bufferB, dimensions: shapeB},
  };
  const outputs = await graph.compute(inputs);
  console.log(`shape: [${outputs.c.dimensions}], values: ${outputs.c.data}`);
}

await compute([3, 4], [4, 3]);
await compute([4, 4], [4, 4]);
await compute([5, 4], [4, 5]);
const context = navigator.ml.createContext();

// The following code multiplies matrix a of shape [3, 4] with matrix b of shape [4, 3]
// into matrix c of shape [3, 3].
const builder = new MLGraphBuilder(context);
const descA = {type: 'float32', dimensions: [3, 4]};
const a = builder.input('a', descA);
const descB = {type: 'float32', dimensions: [4, 3]};
const bufferB = new Float32Array(sizeOfShape(descB.dimensions)).fill(0.5);
const b = builder.constant(descB, bufferB);
const c = builder.matmul(a, b);
const graph = await builder.build({c});

const bufferA = new Float32Array(sizeOfShape(descA.dimensions)).fill(0.5);
const inputs = {'a': {data: bufferA}};

// Pre-allocate output buffer for c.
const outputs = {'c': {data: new Float32Array(sizeOfShape([3, 3]))}};
await graph.compute(inputs, outputs);
console.log(`values: ${outputs.c.data}`);
const context = navigator.ml.createContext();

// Build a graph with two outputs.
const builder = new MLGraphBuilder(context);
const descA = {type: 'float32', dimensions: [3, 4]};
const a = builder.input('a', descA);
const descB = {type: 'float32', dimensions: [4, 3]};
const bufferB = new Float32Array(sizeOfShape(descB.dimensions)).fill(0.5);
const b = builder.constant(descB, bufferB);
const descC = {type: 'float32', dimensions: [3, 3]};
const bufferC = new Float32Array(sizeOfShape(descC.dimensions)).fill(1);
const c = builder.constant(descC, bufferC);
const d = builder.matmul(a, b);
const e = builder.add(d, c);
const graph = await builder.build({d, e});

const bufferA = new Float32Array(sizeOfShape(descA.dimensions)).fill(0.5);
const inputs = {'a': {data: bufferA}};

// Compute both d and e.
let outputs = await graph.compute(inputs);
console.log(`outputs include ${Object.keys(outputs)}`);

// Compute d.
outputs = await graph.compute(inputs, {d});
console.log(`outputs include ${Object.keys(outputs)}`);
console.log(`shape: [${outputs.d.dimensions}], values: ${outputs.d.data}`);

// Compute e.
outputs = await graph.compute(inputs, {e});
console.log(`outputs include ${Object.keys(outputs)}`);
console.log(`shape: [${outputs.e.dimensions}], values: ${outputs.e.data}`);
7. Examples
const context = navigator.ml.createContext({powerPreference: 'low-power'});
constant1 ---+
             +--- Add ---> intermediateOutput1 ---+
input1    ---+                                    |
                                                  +--- Mul ---> output
constant2 ---+                                    |
             +--- Add ---> intermediateOutput2 ---+
input2    ---+
// Use tensors in 4 dimensions.
const TENSOR_DIMS = [1, 2, 2, 2];
const TENSOR_SIZE = 8;

const builder = new MLGraphBuilder(context);

// Create MLOperandDescriptor object.
const desc = {type: 'float32', dimensions: TENSOR_DIMS};

// constant1 is a constant MLOperand with the value 0.5.
const constantBuffer1 = new Float32Array(TENSOR_SIZE).fill(0.5);
const constant1 = builder.constant(desc, constantBuffer1);

// input1 is one of the input MLOperands. Its value will be set before execution.
const input1 = builder.input('input1', desc);

// constant2 is another constant MLOperand with the value 0.5.
const constantBuffer2 = new Float32Array(TENSOR_SIZE).fill(0.5);
const constant2 = builder.constant(desc, constantBuffer2);

// input2 is another input MLOperand. Its value will be set before execution.
const input2 = builder.input('input2', desc);

// intermediateOutput1 is the output of the first Add operation.
const intermediateOutput1 = builder.add(constant1, input1);

// intermediateOutput2 is the output of the second Add operation.
const intermediateOutput2 = builder.add(constant2, input2);

// output is the output MLOperand of the Mul operation.
const output = builder.mul(intermediateOutput1, intermediateOutput2);
// Compile the constructed graph.
const graph = await builder.build({'output': output});
// Setup the input buffers with value 1.
const inputBuffer1 = new Float32Array(TENSOR_SIZE).fill(1);
const inputBuffer2 = new Float32Array(TENSOR_SIZE).fill(1);

// Asynchronously execute the compiled graph with the specified inputs.
const inputs = {
  'input1': {data: inputBuffer1},
  'input2': {data: inputBuffer2},
};
const outputs = await graph.compute(inputs);

// Log the shape and computed result of the output operand.
console.log('Output shape: ' + outputs.output.dimensions);
// Output shape: 1,2,2,2
console.log('Output value: ' + outputs.output.data);
// Output value: 2.25,2.25,2.25,2.25,2.25,2.25,2.25,2.25
8. Appendices
8.1. MLOperandType and ArrayBufferView compatibility
MLOperandType | ArrayBufferView
---|---
float32 | Float32Array
int32 | Int32Array
uint32 | Uint32Array
int8 | Int8Array
uint8 | Uint8Array
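The following non-normative sketch mirrors the table above by mapping an MLOperandType to the matching ArrayBufferView constructor; the helper name is hypothetical and not part of the API.

// Non-normative sketch: pick the ArrayBufferView constructor compatible with an
// MLOperandType, mirroring the table above. The helper name is hypothetical.
function arrayBufferViewConstructorFor(operandType) {
  switch (operandType) {
    case 'float32': return Float32Array;
    case 'int32':   return Int32Array;
    case 'uint32':  return Uint32Array;
    case 'int8':    return Int8Array;
    case 'uint8':   return Uint8Array;
    default: throw new TypeError(`no compatible ArrayBufferView for ${operandType}`);
  }
}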
Issue: Clarify the usage of ArrayBufferView for float16. <https://github.com/webmachinelearning/webnn/issues/127>
9. Acknowledgements
This specification follows the concepts of the Android Neural Networks API C API.
Thanks to Tomoyuki Shimizu, Ningxin Hu, Zhiqiang Yu and Belem Zhang for the use cases.
Thanks to Nikhil Thorat, Daniel Smilkov, Ganesan Ramalingam, Rafael Cintron and Benjamin Poulain for their contributions to the API specification.