1. Introduction
For now, see the explainer.
2. The translator API
    partial interface AI {
      readonly attribute AITranslatorFactory translator;
    };

    [Exposed=(Window,Worker), SecureContext]
    interface AITranslatorFactory {
      Promise<AITranslator> create(AITranslatorCreateOptions options);
      Promise<AIAvailability> availability(AITranslatorCreateCoreOptions options);
    };

    [Exposed=(Window,Worker), SecureContext]
    interface AITranslator {
      Promise<DOMString> translate(DOMString input, optional AITranslatorTranslateOptions options = {});
      ReadableStream translateStreaming(DOMString input, optional AITranslatorTranslateOptions options = {});

      readonly attribute DOMString sourceLanguage;
      readonly attribute DOMString targetLanguage;
    };
    AITranslator includes AIDestroyable;

    dictionary AITranslatorCreateCoreOptions {
      required DOMString sourceLanguage;
      required DOMString targetLanguage;
    };

    dictionary AITranslatorCreateOptions : AITranslatorCreateCoreOptions {
      AbortSignal signal;
      AICreateMonitorCallback monitor;
    };

    dictionary AITranslatorTranslateOptions {
      AbortSignal signal;
    };
Every AI has a translator factory, an AITranslatorFactory object. Upon creation of the AI object, its translator factory must be set to a new AITranslatorFactory object created in the AI object’s relevant realm.
The translator getter steps are to return this’s translator factory.
2.1. Creation
The create(options) method steps are:
- If this’s relevant global object is a Window whose associated Document is not fully active, then return a promise rejected with an "InvalidStateError" DOMException.
- If options["signal"] exists and is aborted, then return a promise rejected with options["signal"]'s abort reason.
- Validate and canonicalize translator options given options.
  This can mutate options.
- Return the result of creating an AI model object given this’s relevant realm, options, compute translator options availability, download the translation model, initialize the translation model, and create a translator object.
To validate and canonicalize translator options given an AITranslatorCreateCoreOptions options, perform the following steps. They mutate options in place to canonicalize language tags, and throw a TypeError if any are invalid.
- Validate and canonicalize language tags given options and "sourceLanguage".
- Validate and canonicalize language tags given options and "targetLanguage".
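As a rough illustration of these two steps, the sketch below canonicalizes both members in place. It assumes that validate and canonicalize language tags (defined elsewhere) behaves like ECMAScript's Intl.getCanonicalLocales(), with the RangeError that function throws converted into the TypeError named above. Both function names are hypothetical.

```javascript
// Hypothetical helper: canonicalizes one language-tag member of `options` in place.
function validateAndCanonicalizeTag(options, key) {
  if (typeof options[key] !== "string") {
    throw new TypeError(`Missing language tag for ${key}`);
  }
  try {
    // Intl.getCanonicalLocales throws a RangeError for structurally invalid tags.
    const [canonical] = Intl.getCanonicalLocales(options[key]);
    options[key] = canonical;
  } catch (e) {
    // Assumption: the RangeError is surfaced as the TypeError the steps call for.
    throw new TypeError(`Invalid language tag for ${key}: ${options[key]}`);
  }
}

// Hypothetical counterpart of "validate and canonicalize translator options".
function validateAndCanonicalizeTranslatorOptions(options) {
  validateAndCanonicalizeTag(options, "sourceLanguage");
  validateAndCanonicalizeTag(options, "targetLanguage");
  return options;
}
```

Because the input object is mutated, a caller that needs the original spelling of the tags would have to copy the dictionary first.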
To download the translation model, given an AITranslatorCreateCoreOptions options:
- Assert: these steps are running in parallel.
- Initiate the download process for everything the user agent needs to translate text from options["sourceLanguage"] to options["targetLanguage"].
  This could include both a base translation model and specific language arc material, or perhaps material for multiple language arcs if an intermediate language is used.
- If the download process cannot be started for any reason, then return false.
- Return true.
To initialize the translation model, given an AITranslatorCreateCoreOptions options:
- Assert: these steps are running in parallel.
- Perform any necessary initialization operations for the AI model backing the user agent’s capabilities for translating from options["sourceLanguage"] to options["targetLanguage"].
  This could include loading the model into memory, or loading any fine-tunings necessary to support the specific options in question.
- If initialization failed for any reason, then return false.
- Return true.
To create a translator object, given a realm realm and an AITranslatorCreateCoreOptions options:
- Assert: these steps are running on realm’s surrounding agent’s event loop.
- Return a new AITranslator object, created in realm, with
  - source language: options["sourceLanguage"]
  - target language: options["targetLanguage"]
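From a web developer's point of view, the creation steps above might be exercised as in the following sketch. The helper name createTranslatorOrNull is hypothetical, and the `ai` object is passed in as a parameter only so that the sketch can be run against a stub. It treats a rejection caused by the caller's own AbortSignal as a cancellation rather than a failure, which is a design choice of this example and not something the steps require.

```javascript
// Hypothetical helper around ai.translator.create().
// Returns the translator, or null if the caller's signal was aborted.
async function createTranslatorOrNull(ai, sourceLanguage, targetLanguage, signal) {
  try {
    return await ai.translator.create({ sourceLanguage, targetLanguage, signal });
  } catch (e) {
    // An aborted signal makes create() reject with the signal's abort reason.
    if (signal && signal.aborted) return null;
    // Anything else (for example a TypeError for an invalid tag) is rethrown.
    throw e;
  }
}
```

The returned object's sourceLanguage and targetLanguage can differ from the strings passed in, since the options are canonicalized and may be replaced by best-fit matches during creation.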
2.2. Availability
The availability(options) method steps are:
- If this’s relevant global object is a Window whose associated Document is not fully active, then return a promise rejected with an "InvalidStateError" DOMException.
- Validate and canonicalize translator options given options.
- Let promise be a new promise created in this’s relevant realm.
- In parallel:
  - Let availability be the result of computing translator options availability given options.
  - Queue a global task on the AI task source given this’s relevant global object to perform the following steps:
    - If availability is null, then reject promise with an "UnknownError" DOMException.
    - Otherwise, resolve promise with availability.
- Return promise.
To compute translator options availability given an AITranslatorCreateCoreOptions options, perform the following steps. They return either an AIAvailability value or null, and they mutate options in place to update language tags to their best-fit matches.
- Assert: this algorithm is running in parallel.
- Let availabilities be the user agent’s translator language arc availabilities.
- If availabilities is null, then return null.
- For each languageArc → availability in availabilities:
  - Let sourceLanguageBestFit be LookupMatchingLocaleByBestFit(« languageArc’s source language », « options["sourceLanguage"] »).
  - Let targetLanguageBestFit be LookupMatchingLocaleByBestFit(« languageArc’s target language », « options["targetLanguage"] »).
  - If sourceLanguageBestFit and targetLanguageBestFit are both not undefined, then:
    - Set options["sourceLanguage"] to sourceLanguageBestFit.[[locale]].
    - Set options["targetLanguage"] to targetLanguageBestFit.[[locale]].
    - Return availability.
- If (options["sourceLanguage"], options["targetLanguage"]) can be fulfilled by the identity translation, then return "available".
  Such cases could also return "downloadable", "downloading", or "available" because of the above steps, if the user agent has specific entries in its translator language arc availabilities for the given language arc. However, the identity translation is always available, so this step ensures that we never return "unavailable" for such cases.
  One language arc that can be fulfilled by the identity translation is ("en-US", "en-GB"). It is conceivable that an implementation might support a specialized model for this translation, which would show up in the translator language arc availabilities.
  On the other hand, it’s pretty unlikely that an implementation has any specialized model for the language arc ("en-x-asdf", "en-x-xyzw"). In such a case, this step takes over, and later calls to the translate algorithm will use the identity translation.
  Note that when this step takes over, options["sourceLanguage"] and options["targetLanguage"] are not modified, so if this algorithm is being called from create(), that means the resulting AITranslator object’s sourceLanguage and targetLanguage properties will return the original inputs, and not some canonicalized form.
- Return "unavailable".
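The control flow of these steps can be sketched as follows. This is only an illustration under stated assumptions: bestFit is a stand-in for LookupMatchingLocaleByBestFit that simply drops trailing subtags until it finds an exact match, the availabilities map is modeled as a Map keyed by [source, target] arrays, and the identity check is approximated with the same stand-in in both directions. A real best-fit matcher is implementation-defined and will likely accept more pairs than this one (for instance, this stand-in does not treat "en-US" and "en-GB" as matching).

```javascript
// Stand-in for LookupMatchingLocaleByBestFit (an assumption, not the real matcher):
// drop trailing subtags from `requested` until it equals an available locale.
function bestFit(availableLocales, requested) {
  for (let candidate = requested; candidate !== ""; ) {
    if (availableLocales.includes(candidate)) return candidate;
    const cut = candidate.lastIndexOf("-");
    candidate = cut === -1 ? "" : candidate.slice(0, cut);
  }
  return undefined;
}

// `availabilities` is null or a Map whose keys are [source, target] arrays.
function computeTranslatorAvailability(options, availabilities) {
  if (availabilities === null) return null;
  for (const [[source, target], availability] of availabilities) {
    const sourceBestFit = bestFit([source], options.sourceLanguage);
    const targetBestFit = bestFit([target], options.targetLanguage);
    if (sourceBestFit !== undefined && targetBestFit !== undefined) {
      // Mutate options in place to the best-fit matches.
      options.sourceLanguage = sourceBestFit;
      options.targetLanguage = targetBestFit;
      return availability;
    }
  }
  // Identity translation fallback: options are left unmodified here.
  const { sourceLanguage: s, targetLanguage: t } = options;
  if (bestFit([s], t) !== undefined || bestFit([t], s) !== undefined) {
    return "available";
  }
  return "unavailable";
}
```

Note how the identity fallback returns without touching options, which is why the resulting sourceLanguage and targetLanguage can keep the caller's original spelling in that case.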
A language arc is a tuple of two strings, a source language and a target language. Each item is a Unicode canonicalized locale identifier.
The translator language arc availabilities are given by the following steps. They return a map from language arcs to AIAvailability values, or null.
- Assert: this algorithm is running in parallel.
- If there is some error attempting to determine what language arcs the user agent supports translating text between, which the user agent believes to be transient (such that re-querying the translator language arc availabilities could stop producing such an error), then return null.
- Return a map from language arcs to AIAvailability values, where each key is a language arc that the user agent supports translating text between, filled according to the following constraints:
  - If the user agent supports translating text from the source language to the target language of the language arc without performing any downloading operations, then the map must contain an entry whose key is that language arc and whose value is "available".
  - If the user agent supports translating text from the source language to the target language of the language arc, but only after finishing a currently-ongoing download, then the map must contain an entry whose key is that language arc and whose value is "downloading".
  - If the user agent supports translating text from the source language to the target language of the language arc, but only after performing a not-currently-ongoing download, then the map must contain an entry whose key is that language arc and whose value is "downloadable".
  - The keys must not include any language arcs that overlap with the other keys.
For example, suppose the user agent’s translator language arc availabilities are as follows:
- ("en", "zh-Hans") → "available"
- ("en", "zh-Hant") → "downloadable"
The use of LookupMatchingLocaleByBestFit means that availability() will probably give the following answers:

    function a(sourceLanguage, targetLanguage) {
      return ai.translator.availability({ sourceLanguage, targetLanguage });
    }

    await a("en", "zh-Hans") === "available";
    await a("en", "zh-Hant") === "downloadable";

    await a("en", "zh") === "available";         // zh will best-fit to zh-Hans
    await a("en", "zh-TW") === "downloadable";   // zh-TW will best-fit to zh-Hant
    await a("en", "zh-HK") === "available";      // zh-HK will best-fit to zh-Hans
    await a("en", "zh-CN") === "available";      // zh-CN will best-fit to zh-Hans

    await a("en-US", "zh-Hant") === "downloadable";  // en-US will best-fit to en
    await a("en-GB", "zh-Hant") === "downloadable";  // en-GB will best-fit to en

    // Even very unexpected subtags will best-fit to en or zh-Hans
    await a("en-Braille-x-lolcat", "zh-Hant") === "downloadable";
    await a("en", "zh-BR-Kana") === "available";
A language arc arc overlaps with a set of language arcs otherArcs if the following steps return true:
- Let sourceLanguages be the set composed of the source languages of each item in otherArcs.
- If LookupMatchingLocaleByBestFit(sourceLanguages, « arc’s source language ») is not undefined, then return true.
- Let targetLanguages be the set composed of the target languages of each item in otherArcs.
- If LookupMatchingLocaleByBestFit(targetLanguages, « arc’s target language ») is not undefined, then return true.
- Return false.
The language arc ("en", "fr") overlaps with « ("en", "fr-CA") », so the user agent’s translator language arc availabilities cannot contain both of these language arcs at the same time.
Instead, a typical user agent will either support only one English-to-French language arc (presumably ("en", "fr")), or it could support multiple non-overlapping English-to-French language arcs, such as ("en", "fr-FR"), ("en", "fr-CA"), and ("en", "fr-CH").
In the latter case, if the web developer requested to create a translator from "en" to "fr" using ai.translator.create(), the LookupMatchingLocaleByBestFit algorithm would choose one of the three possible language arcs to use (presumably ("en", "fr-FR")).
A language arc arc can be fulfilled by the identity translation if the following steps return true:
- If LookupMatchingLocaleByBestFit(« arc’s source language », « arc’s target language ») is not undefined, then return true.
- If LookupMatchingLocaleByBestFit(« arc’s target language », « arc’s source language ») is not undefined, then return true.
- Return false.
2.3. The AITranslator class
Every AITranslator has a source language, a string, set during creation.
Every AITranslator has a target language, a string, set during creation.
The sourceLanguage getter steps are to return this’s source language.
The targetLanguage getter steps are to return this’s target language.
The translate(input, options) method steps are:
- Let operation be an algorithm step which takes arguments chunkProduced, done, error, and stopProducing, and translates input given this’s source language, this’s target language, chunkProduced, done, error, and stopProducing.
- Return the result of getting an aggregated AI model result given this, options, and operation.
The translateStreaming(input, options) method steps are:
- Let operation be an algorithm step which takes arguments chunkProduced, done, error, and stopProducing, and translates input given this’s source language, this’s target language, chunkProduced, done, error, and stopProducing.
- Return the result of getting a streaming AI model result given this, options, and operation.
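The two methods share the same underlying operation and differ only in how the result is delivered. A minimal usage sketch follows. The helper names are hypothetical, and it assumes that each chunk read from the stream is a string fragment to be concatenated, which is not spelled out in this section since getting a streaming AI model result is defined elsewhere.

```javascript
// Reads a ReadableStream of string chunks to completion and joins them.
async function collectStream(stream) {
  const reader = stream.getReader();
  let text = "";
  for (;;) {
    const { value, done } = await reader.read();
    if (done) return text;
    text += value; // assumption: chunks are fragments, not cumulative text
  }
}

// Hypothetical helper: translates `input` with both methods of `translator`.
async function translateBothWays(translator, input) {
  const whole = await translator.translate(input);
  const streamed = await collectStream(translator.translateStreaming(input));
  return { whole, streamed };
}
```

In a page, translateStreaming() would more typically be used to update the UI as each chunk arrives instead of collecting everything first.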
2.4. Translation
2.4.1. The algorithm
To translate, given
- a string input,
- a Unicode canonicalized locale identifier sourceLanguage,
- a Unicode canonicalized locale identifier targetLanguage,
- an algorithm chunkProduced that takes a string and returns nothing,
- an algorithm done that takes no arguments and returns nothing,
- an algorithm error that takes error information and returns nothing, and
- an algorithm stopProducing that takes no arguments and returns a boolean,
perform the following steps:
- Assert: this algorithm is running in parallel.
- In an implementation-defined manner, subject to the following guidelines, begin the process of translating input from sourceLanguage into targetLanguage.
  If input is the empty string, or otherwise consists of no translatable content (e.g., only contains whitespace, or control characters), then the resulting translation should be input. In such cases, sourceLanguage and targetLanguage should be ignored.
  If (sourceLanguage, targetLanguage) can be fulfilled by the identity translation, then the resulting translation should be input.
- While true:
  - Wait for the next chunk of translated text to be produced, for the translation process to finish, or for the result of calling stopProducing to become true.
  - If such a chunk is successfully produced:
    - Let it be represented as a string chunk.
    - Perform chunkProduced given chunk.
  - Otherwise, if the translation process has finished:
    - Perform done.
    - Break.
  - Otherwise, if stopProducing returns true, then break.
  - Otherwise, if an error occurred during translation:
    - Let the error be represented as error information errorInfo according to the guidance in § 2.4.2 Errors.
    - Perform error given errorInfo.
    - Break.
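The loop above can be modeled with a small synchronous simulation. This is a simplification for illustration only: the real algorithm waits in parallel for whichever event comes first, whereas this sketch replays a pre-recorded list of events and polls stopProducing before handling each one. The event shapes and the function name are hypothetical.

```javascript
// `events` is an iterable of hypothetical event records:
//   { type: "chunk", text }, { type: "done" }, or { type: "error", errorInfo }.
function runTranslationLoop(events, { chunkProduced, done, error, stopProducing }) {
  for (const event of events) {
    // Models "stopProducing became true while waiting for the next event".
    if (stopProducing()) break;
    if (event.type === "chunk") {
      chunkProduced(event.text);
    } else if (event.type === "done") {
      done();
      break;
    } else if (event.type === "error") {
      error(event.errorInfo);
      break;
    }
  }
}
```

Exactly one of three things ends the loop: done is performed, error is performed, or production is stopped silently, in which case neither callback runs and the caller is expected to know why it asked to stop.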
2.4.2. Errors
When translation fails, the following possible reasons may be surfaced to the web developer. This table lists the possible DOMException names and the cases in which an implementation should use them:
| DOMException name | Scenarios |
|---|---|
| "NotAllowedError" | Translation is disabled by user choice or user agent policy. |
| "NotReadableError" | The translation output was filtered by the user agent, e.g., because it was detected to be harmful, inaccurate, or nonsensical. |
| "QuotaExceededError" | The input to be translated was too large for the user agent to handle. |
| "UnknownError" | All other scenarios, or if the user agent would prefer not to disclose the failure reason. |
This table does not give the complete list of exceptions that can be surfaced by translator.translate() and translator.translateStreaming(). It only contains those which can come from the implementation-defined translate algorithm.
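A caller might branch on the exception name along the lines of the following sketch. The function name and the returned category strings are hypothetical and only illustrate how the table could be consumed. Names outside the table (such as an abort reason) are deliberately left to the caller.

```javascript
// Hypothetical helper: maps a rejection from translate() or translateStreaming()
// to a coarse category based on the DOMException names in the table above.
function classifyTranslationError(e) {
  switch (e?.name) {
    case "NotAllowedError":
      return "disabled"; // user choice or user agent policy
    case "NotReadableError":
      return "filtered"; // output was filtered by the user agent
    case "QuotaExceededError":
      return "input-too-large"; // retrying with shorter input may help
    case "UnknownError":
      return "unknown"; // undisclosed or uncategorized failure
    default:
      return "other"; // not produced by the translate algorithm itself
  }
}
```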
3. The language detector API
    partial interface AI {
      readonly attribute AILanguageDetectorFactory languageDetector;
    };

    [Exposed=(Window,Worker), SecureContext]
    interface AILanguageDetectorFactory {
      Promise<AILanguageDetector> create(optional AILanguageDetectorCreateOptions options = {});
      Promise<AIAvailability> availability(optional AILanguageDetectorCreateCoreOptions options = {});
    };

    [Exposed=(Window,Worker), SecureContext]
    interface AILanguageDetector {
      Promise<sequence<LanguageDetectionResult>> detect(DOMString input, optional AILanguageDetectorDetectOptions options = {});

      readonly attribute FrozenArray<DOMString>? expectedInputLanguages;

      undefined destroy();
    };

    dictionary AILanguageDetectorCreateCoreOptions {
      sequence<DOMString> expectedInputLanguages;
    };

    dictionary AILanguageDetectorCreateOptions : AILanguageDetectorCreateCoreOptions {
      AbortSignal signal;
      AICreateMonitorCallback monitor;
    };

    dictionary AILanguageDetectorDetectOptions {
      AbortSignal signal;
    };

    dictionary LanguageDetectionResult {
      DOMString detectedLanguage;
      double confidence;
    };
Every AI has a language detector factory, an AILanguageDetectorFactory object. Upon creation of the AI object, its language detector factory must be set to a new AILanguageDetectorFactory object created in the AI object’s relevant realm.
The languageDetector getter steps are to return this’s language detector factory.
3.1. Creation
The create(options) method steps are:
- If this’s relevant global object is a Window whose associated Document is not fully active, then return a promise rejected with an "InvalidStateError" DOMException.
- If options["signal"] exists and is aborted, then return a promise rejected with options["signal"]'s abort reason.
- Validate and canonicalize language detector options given options.
  This can mutate options.
- Return the result of creating an AI model object given this’s relevant realm, options, compute language detector options availability, download the language detector model, initialize the language detector model, and create the language detector object.
To validate and canonicalize language detector options given an AILanguageDetectorCreateCoreOptions options, perform the following steps. They mutate options in place to canonicalize language tags, and throw a TypeError if any are invalid.
- Validate and canonicalize language tags given options and "expectedInputLanguages".
To download the language detector model, given an AILanguageDetectorCreateCoreOptions options:
- Assert: these steps are running in parallel.
- Initiate the download process for everything the user agent needs to detect the languages of input text, including all the languages in options["expectedInputLanguages"].
  This could include both a base language detection model, and specific fine-tunings or other material to help with the languages identified in options["expectedInputLanguages"].
- If the download process cannot be started for any reason, then return false.
- Return true.
To initialize the language detector model, given an AILanguageDetectorCreateCoreOptions options:
- Assert: these steps are running in parallel.
- Perform any necessary initialization operations for the AI model backing the user agent’s capabilities for detecting the languages of input text.
  This could include loading the model into memory, or loading any fine-tunings necessary to support the languages identified in options["expectedInputLanguages"].
- If initialization failed for any reason, then return false.
- Return true.
To create the language detector object, given a realm realm and an AILanguageDetectorCreateCoreOptions options:
- Assert: these steps are running on realm’s surrounding agent’s event loop.
- Return a new AILanguageDetector object, created in realm, with
  - expected input languages: the result of creating a frozen array given options["expectedInputLanguages"] if it is not empty; otherwise null
3.2. Availability
The availability(options) method steps are:
- If this’s relevant global object is a Window whose associated Document is not fully active, then return a promise rejected with an "InvalidStateError" DOMException.
- Validate and canonicalize language detector options given options.
- Let promise be a new promise created in this’s relevant realm.
- In parallel:
  - Let availability be the result of computing language detector options availability given options.
  - Queue a global task on the AI task source given this’s relevant global object to perform the following steps:
    - If availability is null, then reject promise with an "UnknownError" DOMException.
    - Otherwise, resolve promise with availability.
- Return promise.
To compute language detector options availability given an AILanguageDetectorCreateCoreOptions options, perform the following steps. They return either an AIAvailability value or null, and they mutate options in place to update language tags to their best-fit matches.
- Assert: this algorithm is running in parallel.
- If there is some error attempting to determine what languages the user agent supports detecting, which the user agent believes to be transient (such that re-querying could stop producing such an error), then return null.
- Let availabilities be the result of getting language availabilities given the purpose of detecting text written in that language.
- Let availability be "available".
- For each language in options["expectedInputLanguages"]:
  - For each availabilityToCheck in « "available", "downloading", "downloadable" »:
    - Let languagesWithThisAvailability be availabilities[availabilityToCheck].
    - Let bestMatch be LookupMatchingLocaleByBestFit(languagesWithThisAvailability, « language »).
    - If bestMatch is not undefined, then:
      - Replace language with bestMatch.[[locale]] in options["expectedInputLanguages"].
      - Set availability to the minimum availability given availability and availabilityToCheck.
      - Break.
  - If no best match was found for language in any of the above iterations, then return "unavailable".
- Return availability.
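The following sketch walks through the same loop. It rests on several assumptions: bestFit is a simple subtag-truncation stand-in for LookupMatchingLocaleByBestFit, the ordering used by minimumAvailability ("unavailable" below "downloadable" below "downloading" below "available") is assumed since minimum availability is defined elsewhere, and "unavailable" is returned only when no match at all is found for a language, which appears to be the intent of the steps. The transient-error step is omitted.

```javascript
// Stand-in for LookupMatchingLocaleByBestFit (an assumption, not the real matcher).
function bestFit(availableLocales, requested) {
  for (let candidate = requested; candidate !== ""; ) {
    if (availableLocales.includes(candidate)) return candidate;
    const cut = candidate.lastIndexOf("-");
    candidate = cut === -1 ? "" : candidate.slice(0, cut);
  }
  return undefined;
}

// Assumed ordering, from least to most available.
const AVAILABILITY_ORDER = ["unavailable", "downloadable", "downloading", "available"];
function minimumAvailability(a, b) {
  return AVAILABILITY_ORDER.indexOf(a) <= AVAILABILITY_ORDER.indexOf(b) ? a : b;
}

// `availabilities` maps each availability value to a list of language tags.
function computeDetectorAvailability(options, availabilities) {
  let availability = "available";
  const languages = options.expectedInputLanguages ?? [];
  for (let i = 0; i < languages.length; i++) {
    let matched = false;
    for (const toCheck of ["available", "downloading", "downloadable"]) {
      const bestMatch = bestFit(availabilities[toCheck], languages[i]);
      if (bestMatch !== undefined) {
        languages[i] = bestMatch; // replace the tag with its best-fit match
        availability = minimumAvailability(availability, toCheck);
        matched = true;
        break;
      }
    }
    if (!matched) return "unavailable";
  }
  return availability;
}
```

The overall answer is therefore the weakest availability among the requested languages, and an empty or missing expectedInputLanguages list yields "available".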
3.3. The AILanguageDetector class
Every AILanguageDetector has an expected input languages, a FrozenArray<DOMString> or null, set during creation.
The expectedInputLanguages getter steps are to return this’s expected input languages.
The detect(input, options) method steps are:
- If this’s relevant global object is a Window whose associated Document is not fully active, then return a promise rejected with an "InvalidStateError" DOMException.
- Let signals be « this’s destruction abort controller’s signal ».
- If options["signal"] exists, then append it to signals.
- Let compositeSignal be the result of creating a dependent abort signal given signals using AbortSignal and this’s relevant realm.
- If compositeSignal is aborted, then return a promise rejected with compositeSignal’s abort reason.
- Let abortedDuringOperation be false.
  This variable will be written to from the event loop, but read from in parallel.
- Add the following abort steps to compositeSignal:
  - Set abortedDuringOperation to true.
- Let promise be a new promise created in this’s relevant realm.
- In parallel:
  - Let stopProducing be the following steps:
    - Return abortedDuringOperation.
  - Let result be the result of detecting languages given input and stopProducing.
  - Queue a global task on the AI task source given this’s relevant global object to perform the following steps:
    - If abortedDuringOperation is true, then reject promise with compositeSignal’s abort reason.
    - Otherwise, if result is an error information, then reject promise with the result of creating a DOMException with name given by result’s error name, using result’s error information to populate the message appropriately.
    - Otherwise:
      - Assert: result is a list of LanguageDetectionResult dictionaries. (It is not null, since in that case abortedDuringOperation would have been true.)
      - Resolve promise with result.
- Return promise.
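From the caller's side, detect() resolves with a list of LanguageDetectionResult dictionaries. The sketch below picks the most likely language from that list. The helper name is hypothetical, and it assumes the list is ordered by descending confidence with the "und" entry last, which is what the detect languages algorithm in this document produces.

```javascript
// Hypothetical helper: returns the top LanguageDetectionResult,
// or null when the only entry is the undetermined language "und".
async function detectMostLikelyLanguage(detector, input, signal) {
  const results = await detector.detect(input, { signal });
  const top = results[0];
  return top && top.detectedLanguage !== "und" ? top : null;
}
```

If the optional signal is aborted while detection is running, the promise returned by detect() rejects with the abort reason, and that rejection propagates out of this helper unchanged.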
3.3.1. The algorithm
To detect languages given a string input and an algorithm stopProducing that takes no arguments and returns a boolean, perform the following steps. They will return either null, an error information, or a list of LanguageDetectionResult dictionaries.
- Assert: this algorithm is running in parallel.
- Let availabilities be the result of getting language availabilities given the purpose of detecting text written in that language.
- Let currentlyAvailableLanguages be availabilities["available"].
- In an implementation-defined manner, subject to the following guidelines, let rawResult and unknown be the result of detecting the languages of input.
  rawResult must be a map which has a key for each language in currentlyAvailableLanguages. The value for each such key must be a number between 0 and 1. This value must represent the implementation’s confidence that input is written in that language.
  unknown must be a number between 0 and 1 that represents the implementation’s confidence that input is not written in any of the languages in currentlyAvailableLanguages.
  The values of rawResult, plus unknown, must sum to 1. Each such value, or unknown, may be 0.
  If the implementation believes input to be written in multiple languages, then it should attempt to apportion the values of rawResult and unknown such that they are proportionate to the amount of input written in each detected language. The exact scheme for apportioning input is implementation-defined.
  If input is "tacosを食べる", the implementation might split this into "tacos" and "を食べる", and then detect the languages of each separately. The first part might be detected as English with confidence 0.5 and Spanish with confidence 0.5, and the second part as Japanese with confidence 1. The resulting rawResult then might be «[ "en" → 0.25, "es" → 0.25, "ja" → 0.5 ]» (with unknown set to 0).
  The decision to split this into two parts, instead of e.g. the three parts "tacos", "を", and "食べる", was an implementation-defined choice. Similarly, the decision to treat each part as contributing to "half" of the result, instead of e.g. weighting by number of code points, was implementation-defined.
  (Realistically, we expect that implementations will split on larger chunks than this, as generally more than 4-5 code points are necessary for most language detection models.)
- If stopProducing returns true at any point during this process, then return null.
- If an error occurred during language detection, then return an error information according to the guidance in § 3.3.2 Errors.
- Sort in descending order rawResult with a less than algorithm which, given entries a and b, returns true if a’s value is less than b’s value.
- Let results be an empty list.
- Let cumulativeConfidence be 0.
- For each key → value of rawResult:
  - If value is 0, then break.
  - If value is less than unknown, then break.
  - Append «[ "detectedLanguage" → key, "confidence" → value ]» to results.
  - Set cumulativeConfidence to cumulativeConfidence + value.
  - If cumulativeConfidence is greater than or equal to 0.99, then break.
- Assert: 1 − cumulativeConfidence is greater than or equal to unknown.
- Append «[ "detectedLanguage" → "und", "confidence" → 1 − cumulativeConfidence ]» to results.
- Return results.
The post-processing of rawResult and unknown essentially consolidates all languages below a certain threshold into the "und" language. Languages which are less than 1% likely, or contribute to less than 1% of the text, are considered more likely to be noise than to be worth detecting. Similarly, if the implementation is less sure about a language than it is about the text not being in any of the languages it knows, that language is probably not worth returning to the web developer.
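The post-processing described here is small enough to sketch directly. In the following illustration, rawResult is modeled as a JavaScript Map from language tag to confidence, and the function name is hypothetical. It covers only the sorting and consolidation steps, not the implementation-defined detection that produces rawResult and unknown.

```javascript
// Hypothetical helper: turns rawResult (Map<string, number>) and unknown
// into the list of { detectedLanguage, confidence } entries.
function postProcessDetection(rawResult, unknown) {
  // Sort entries by confidence, highest first.
  const sorted = [...rawResult.entries()].sort((a, b) => b[1] - a[1]);
  const results = [];
  let cumulativeConfidence = 0;
  for (const [key, value] of sorted) {
    if (value === 0) break; // nothing useful remains
    if (value < unknown) break; // less likely than "none of the known languages"
    results.push({ detectedLanguage: key, confidence: value });
    cumulativeConfidence += value;
    if (cumulativeConfidence >= 0.99) break; // the rest is treated as noise
  }
  // Everything that was cut off is folded into the undetermined language.
  results.push({ detectedLanguage: "und", confidence: 1 - cumulativeConfidence });
  return results;
}
```

For the "tacosを食べる" example, where rawResult is «[ "en" → 0.25, "es" → 0.25, "ja" → 0.5 ]» and unknown is 0, this yields "ja" first, then "en" and "es", followed by an "und" entry with confidence 0. The "und" entry is always present, even when nothing was consolidated into it.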
3.3.2. Errors
When language detection fails, the following possible reasons may be surfaced to the web developer. This table lists the possible DOMException names and the cases in which an implementation should use them:
| DOMException name | Scenarios |
|---|---|
| "NotAllowedError" | Language detection is disabled by user choice or user agent policy. |
| "QuotaExceededError" | The input to be detected was too large for the user agent to handle. |
| "UnknownError" | All other scenarios, or if the user agent would prefer not to disclose the failure reason. |
This table does not give the complete list of exceptions that can be surfaced by detector.detect(). It only contains those which can come from the implementation-defined detect languages algorithm.