File API

Editor’s Draft,

This version:
https://w3c.github.io/FileAPI/
Latest published version:
https://www.w3.org/TR/FileAPI/
Previous Versions:
Issue Tracking:
GitHub
Inline In Spec
Editor:
( Google )
Former Editor:
Arun Ranganathan ( Mozilla Corporation )
Tests:
web-platform-tests FileAPI/ ( ongoing work )
Not Ready For Implementation

This spec is not yet ready for implementation. It exists in this repository to record the ideas and promote discussion.

Before attempting to implement this spec, please contact the editors.


Abstract

This specification provides an API for representing file objects in web applications, as well as programmatically selecting them and accessing their data. This includes:

Additionally, this specification defines objects to be used within threaded web applications for the synchronous reading of files.

§ 10 Requirements and Use Cases covers the motivation behind this specification.

This API is designed to be used in conjunction with other APIs and elements on the web platform, notably: XMLHttpRequest (e.g. with an overloaded send() method for File or Blob arguments), postMessage() , DataTransfer (part of the drag and drop API defined in [HTML] ) and Web Workers. Additionally, it should be possible to programmatically obtain a list of files from the input element when it is in the File Upload state [HTML] . These kinds of behaviors are defined in the appropriate affiliated specifications.

Status of this document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at https://www.w3.org/TR/.

This document was published by the Web Applications Working Group as an Editor’s Draft. This document is intended to become a W3C Recommendation.

Previous discussion of this specification has taken place on two other mailing lists: public-webapps@w3.org ( archive ) and public-webapi@w3.org ( archive ). Ongoing discussion will be on the public-webapps@w3.org mailing list.

This draft consists of changes made to the previous Last Call Working Draft. Please send comments to the public-webapi@w3.org as described above. You can see Last Call Feedback on the W3C Wiki: https://www.w3.org/wiki/Webapps/LCWD-FileAPI-20130912

An implementation report is automatically generated from the test suite.

This document was published by the Web Applications Working Group as a Working Draft. Feedback and comments on this specification are welcome. Please use GitHub issues. Historical discussions can be found in the public-webapps@w3.org archives.

Publication as an Editor’s Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by a group operating under the W3C Patent Policy . W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy .

This document is governed by the 1 March 2019 W3C Process Document .

1. Introduction

This section is informative.

Web applications should have the ability to manipulate as wide as possible a range of user input, including files that a user may wish to upload to a remote server or manipulate inside a rich web application. This specification defines the basic representations for files, lists of files, errors raised by access to files, and programmatic ways to read files. Additionally, this specification also defines an interface that represents "raw data" which can be asynchronously processed on the main thread of conforming user agents. The interfaces and API defined in this specification can be used with other interfaces and APIs exposed to the web platform.

The File interface represents file data typically obtained from the underlying file system, and the Blob interface ("Binary Large Object" - a name originally introduced to web APIs in Google Gears ) represents immutable raw data. File or Blob reads should happen asynchronously on the main thread, with an optional synchronous API used within threaded web applications. An asynchronous API for reading files prevents blocking and UI "freezing" on a user agent’s main thread. This specification defines an asynchronous API based on an event model to read and access a File or Blob ’s data. A FileReader object provides asynchronous read methods to access that file’s data through event handler content attributes and the firing of events. The use of events and event handlers allows separate code blocks the ability to monitor the progress of the read (which is particularly useful for remote drives or mounted drives, where file access performance may vary from local drives) and error conditions that may arise during reading of a file. An example will be illustrative.

In the example below, different code blocks handle progress, error, and success conditions.
function startRead() {
  // obtain input element through DOM
  var file = document.getElementById('file').files[0];
  if(file){
    getAsText(file);
  }
}
function getAsText(readFile) {
  var reader = new FileReader();
  // Handle progress, success, and errors
  // (handlers are attached before the read starts)
  reader.onprogress = updateProgress;
  reader.onload = loaded;
  reader.onerror = errorHandler;
  // Read file into memory as UTF-16
  reader.readAsText(readFile, "UTF-16");
}
function updateProgress(evt) {
  if (evt.lengthComputable) {
    // evt.loaded and evt.total are ProgressEvent properties
    var loaded = (evt.loaded / evt.total);
    if (loaded < 1) {
      // Increase the prog bar length
      // style.width = (loaded * 200) + "px";
    }
  }
}
function loaded(evt) {
  // Obtain the read file data
  var fileString = evt.target.result;
  // Handle UTF-16 file dump
  // (utils.regexp.isChinese is an application-defined helper, not part of this API)
  if(utils.regexp.isChinese(fileString)) {
    //Chinese Characters + Name validation
  }
  else {
    // run other charset test
  }
  // xhr.send(fileString)
}
function errorHandler(evt) {
  if(evt.target.error.name == "NotReadableError") {
    // The file could not be read
  }
}

2. Terminology and Algorithms

When this specification says to terminate an algorithm the user agent must terminate the algorithm after finishing the step it is on. Asynchronous read methods defined in this specification may return before the algorithm in question is terminated, and can be terminated by an abort() call.

The algorithms and steps in this specification use the following mathematical operations:

The term Unix Epoch is used in this specification to refer to the time 00:00:00 UTC on January 1 1970 (or 1970-01-01T00:00:00Z ISO 8601); this is the same time that is conceptually " 0 " in ECMA-262 [ECMA-262] .

3. The Blob Interface and Binary Data

A Blob object refers to a byte sequence, and has a size attribute which is the total number of bytes in the byte sequence.

[Exposed=(Window,Worker), Serializable]
interface Blob {
  constructor(optional sequence<BlobPart> blobParts,
              optional BlobPropertyBag options = {});

  readonly attribute unsigned long long size;
  readonly attribute DOMString type;

  // slice Blob into byte-ranged chunks
  Blob slice(optional [Clamp] long long start,
             optional [Clamp] long long end,
             optional DOMString contentType);

  // read from the Blob.
  [NewObject] ReadableStream stream();
  [NewObject] Promise<USVString> text();
  [NewObject] Promise<ArrayBuffer> arrayBuffer();
};

enum EndingType { "transparent", "native" };

dictionary BlobPropertyBag {
  DOMString type = "";
  EndingType endings = "transparent";
};

typedef (BufferSource or Blob or USVString) BlobPart;

A Blob has an associated [[type]] internal slot, an ASCII-encoded string in lower case representing the media type of the byte sequence.

A Blob has an associated [[data]] internal slot, a blob data description.

Blob objects are serializable objects. Their serialization steps (the blob serialization steps), given value, serialized and forStorage, are:
  1. If forStorage is true:

    1. Let bytes be the result of reading all bytes from value .

    2. Set serialized .[[BlobData]] to the result of creating blob data from bytes given bytes .

    In at least Chrome’s IndexedDB implementation, this copying of the data of blobs is only done when a transaction is committed (and failure to read the blob will cause the commit to fail).

  2. Otherwise:

    1. Set serialized.[[BlobData]] to value.[[data]].

  3. Set serialized.[[Type]] to value.[[type]].

Their deserialization steps (the blob deserialization steps), given serialized and value, are:
  1. Set value.[[data]] to serialized.[[BlobData]].

  2. Set value.[[type]] to serialized.[[Type]].

The actual storage API serialized was persisted in will need a way of modifying the read algorithms for deserialized blobs. I.e. a Blob that was deserialized from IndexedDB should start throwing in its read steps after clear-site-data clears all IndexedDB data. Somehow let StructuredDeserialize pass along a hook from the storage API to here? <https://github.com/w3c/webappsec-clear-site-data/issues/49>

A Blob blob has an associated get stream algorithm, which runs these steps:
  1. Let stream be the result of constructing a ReadableStream object.

  2. Run the following steps in parallel :

    1. Let blob data be blob.[[data]].

    2. Let read state be the result of calling blob data’s read initialization algorithm given blob data’s snapshot state and 0. If that threw an exception, error stream with that exception and abort these steps.

    3. While true:

      1. Let read result be the result of calling blob data’s read algorithm given read state.

      2. If that threw an exception (a file read error), error stream with that failure reason and abort these steps.

      3. If read result is end of blob, break.

      4. Enqueue a Uint8Array object wrapping an ArrayBuffer containing read result into stream. If that threw an exception, error stream with that exception and abort these steps.

      We need to specify more concretely what reading from a Blob actually does, what possible errors can happen, perhaps something about chunk sizes, etc. <https://github.com/w3c/FileAPI/issues/144>

  3. Return stream .

3.1. Concepts

The data represented by a Blob object is described by a blob data description, consisting of some representation of the data, combined with a set of algorithms to return the actual data as a series of byte sequences.

A blob data description has an associated size , a number specifying the total number of bytes in the byte sequence represented by the blob.

A blob data description has an associated snapshot state . This is a map that represents the data stored in the blob.

A blob data description has an associated read initialization algorithm. This algorithm takes two arguments: the snapshot state and a byte offset. It returns a struct which will be used as input for the read algorithm.

A blob data description has an associated read algorithm . This algorithm takes one argument (the struct returned by the read initialization algorithm ). It returns either a byte sequence or the special end of blob value.

To read all bytes from a Blob blob, run these steps:
  1. Let bytes be an empty byte sequence.

  2. Let blob data be blob.[[data]].

  3. Let read state be the result of calling blob data’s read initialization algorithm given blob data’s snapshot state and 0.

  4. While true:

    1. Let read result be the result of calling blob data’s read algorithm given read state.

    2. If read result is end of blob, break.

    3. Append read result to bytes.

  5. Return bytes.
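
The read all bytes steps can be sketched in JavaScript as follows. This is a minimal model for illustration, not user agent code: the `makeChunkedDescription` helper, the `END_OF_BLOB` symbol, and the property names `snapshotState`, `readInit`, and `read` are hypothetical stand-ins for the concepts defined above.

```javascript
// Hypothetical sentinel standing in for the "end of blob" value.
const END_OF_BLOB = Symbol("end of blob");

// Toy blob data description that hands out its data in fixed-size chunks.
function makeChunkedDescription(data, chunkSize) {
  return {
    size: data.length,
    snapshotState: { data },
    // read initialization algorithm: (snapshot state, offset) -> read state
    readInit(snapshotState, offset) {
      return { data: snapshotState.data, position: offset };
    },
    // read algorithm: (read state) -> byte sequence or END_OF_BLOB
    read(readState) {
      if (readState.position >= readState.data.length) return END_OF_BLOB;
      const chunk = readState.data.slice(readState.position, readState.position + chunkSize);
      readState.position += chunk.length;
      return chunk;
    },
  };
}

// "read all bytes": keep calling the read algorithm until end of blob.
function readAllBytes(blobData) {
  const bytes = [];
  const readState = blobData.readInit(blobData.snapshotState, 0);
  while (true) {
    const readResult = blobData.read(readState);
    if (readResult === END_OF_BLOB) break;
    bytes.push(...readResult);
  }
  return Uint8Array.from(bytes);
}

const description = makeChunkedDescription(Uint8Array.of(1, 2, 3, 4, 5), 2);
const allBytes = readAllBytes(description);
const allBytesAgain = readAllBytes(description);
```

Because each call starts from a fresh read state, repeated invocations over the same description yield the same byte sequence, which mirrors the snapshot requirement.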

Conceptually a Blob object represents a snapshot of some amount of data, and is frozen in time at the time a Blob instance is created. As such, for any specific instance of a Blob, every invocation of read all bytes should either return the exact same byte sequence, or throw an exception. Additionally the returned byte sequence's length should be equal to the value of blob.[[data]]'s size.

Note: This is a non-trivial requirement to implement for user agents, especially when a blob is backed by a file on disk (i.e. was created by the create a file backed File object algorithm). User agents can use modification time stamps and other mechanisms to maintain this requirement, but this is left as an implementation detail.

3.1.1. Byte Sequence backed blobs

The snapshot state for a byte sequence backed blob contains a "data" member, a byte sequence.

To create blob data from bytes given bytes (a byte sequence), run the following steps:
  1. Let blob data be a new blob data description.

  2. Set blob data’s snapshot state["data"] to bytes.

  3. Set blob data’s read initialization algorithm to the bytes blob read initialization steps.

  4. Set blob data’s read algorithm to the bytes blob read steps.

  5. Return blob data.

A bytes blob read state is a struct consisting of bytes (a byte sequence).

The bytes blob read initialization steps , given a snapshot state and offset are:
  1. Let read state be a new bytes blob read state .

  2. If offset is larger than snapshot state["data"]'s length, set read state’s bytes to an empty byte sequence.

  3. Otherwise, set read state’s bytes to a copy of snapshot state["data"] with the first offset bytes removed.

  4. Return read state .

The bytes blob read steps , given read state (a bytes blob read state ) are:
  1. Let result be read state . bytes .

  2. Set read state’s bytes to an empty byte sequence.

  3. If result is empty, return end of blob .

  4. Return result .
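
A byte sequence backed blob can be modelled in a few lines of JavaScript. The sketch below is illustrative only: the function and property names are hypothetical, and setting `size` to the length of the bytes is an assumption, since the steps above do not set it explicitly.

```javascript
// Hypothetical sentinel standing in for the "end of blob" value.
const END_OF_BLOB = Symbol("end of blob");

// bytes blob read initialization steps
function bytesBlobReadInit(snapshotState, offset) {
  const data = snapshotState.data;
  // An offset past the end yields an empty read state; otherwise copy the tail.
  const bytes = offset > data.length ? new Uint8Array(0) : data.slice(offset);
  return { bytes };
}

// bytes blob read steps: return everything once, then report end of blob
function bytesBlobRead(readState) {
  const result = readState.bytes;
  readState.bytes = new Uint8Array(0);
  return result.length === 0 ? END_OF_BLOB : result;
}

// create blob data from bytes
function createBlobDataFromBytes(bytes) {
  return {
    size: bytes.length, // assumption: not stated in the steps above
    snapshotState: { data: bytes },
    readInit: bytesBlobReadInit,
    read: bytesBlobRead,
  };
}

const blobData = createBlobDataFromBytes(Uint8Array.of(10, 20, 30, 40));
const state = blobData.readInit(blobData.snapshotState, 1);
const first = blobData.read(state);  // the bytes remaining after the offset
const second = blobData.read(state); // END_OF_BLOB
```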

3.1.2. Multipart blobs

Blobs created by the Blob and File constructors are made up of multiple parts, where each part could be a blob itself.

The snapshot state for a multipart blob contains a "parts" member, a list of blob data descriptions.

To process blob parts given a sequence of BlobPart's blobParts and BlobPropertyBag options, run the following steps:
  1. Let size be 0.

  2. Let parts be an empty list .

  3. Let bytes be an empty byte sequence.

  4. For each element in blobParts:

    1. If element is a USVString , run the following substeps:

      1. Let s be element .

      2. If the endings member of options is "native" , set s to the result of converting line endings to native of element .

      3. Append the result of UTF-8 encoding s to bytes .

        Note: The algorithm from WebIDL [WebIDL] replaces unmatched surrogates in an invalid utf-16 string with U+FFFD replacement characters. Scenarios exist when the Blob constructor may result in some data loss due to lost or scrambled character sequences.

    2. If element is a BufferSource , get a copy of the bytes held by the buffer source , and append those bytes to bytes .

    3. If element is a Blob:

      1. If bytes is not empty:

        1. Let part be the result of creating blob data from bytes given bytes.

        2. Append part to parts .

        3. Set size to size + part . size .

        4. Set bytes to an empty byte sequence .

      2. Let part be element . [[data]] .

      3. Append part to parts .

      4. Set size to size + part . size .

      Note: The type of the Blob array element is ignored and will not affect type of returned Blob object.

  5. If bytes is not empty:

    1. Let part be the result of creating blob data from bytes given bytes .

    2. Append part to parts .

    3. Set size to size + part . size .

  6. Let result be a blob data description .

  7. Set result . size to size .

  8. Set result . snapshot state [ "parts" ] to parts .

  9. Set result . read initialization algorithm to the multipart blob read initialization steps .

  10. Set result . read algorithm to the multipart blob read steps .

  11. Return result .
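
The observable effect of these steps can be checked with the Blob constructor itself. The example below is a small illustration, assuming an environment that exposes a global `Blob` (browsers and recent Node.js releases); the variable names are arbitrary.

```javascript
// A nested Blob, a string, a typed array and a non-ASCII string as parts.
const inner = new Blob(["cd"], { type: "text/plain" });
const combined = new Blob(["ab", Uint8Array.of(0x21), inner, "\u00e9"]);

// 2 bytes ("ab") + 1 byte + 2 bytes (inner) + 2 bytes (U+00E9 encoded as UTF-8)
const combinedSize = combined.size;

// The type of a Blob element is ignored; no type was passed in options.
const combinedType = combined.type;
```

Strings contribute their UTF-8 encoding, so the final size can exceed the number of characters supplied.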

A multipart blob read state is a struct consisting of:

parts

A queue of blob data descriptions , representing the not yet read parts of the blob.

offset

A number, representing the byte offset in the remaining blob parts from which to start returning data.

nested blob data

undefined or a blob data description . This is undefined unless otherwise specified.

nested read state

undefined or a struct , representing the read state for a nested read operation. This is undefined unless otherwise specified.

The multipart blob read initialization steps , given a snapshot state and offset are:
  1. Let read state be a new multipart blob read state .

  2. Set read state . parts to a clone of snapshot state [ "parts" ].

  3. Set read state . offset to offset .

  4. Return read state .

The multipart blob read steps , given read state (a multipart blob read state ) are:
  1. Let result be end of blob .

  2. While result is end of blob :

    1. If read state . nested read state is not undefined :

      1. Assert : read state . offset is 0.

      2. Set result to the result of calling read state . nested blob data 's read algorithm given read state . nested read state .

      3. If result is end of blob :

        1. Set read state . nested read state to undefined .

        2. Set read state . nested blob data to undefined .

    2. Otherwise:

      1. If read state . parts is empty:

        1. Return end of blob .

      2. Let current part be the result of dequeueing from read state . parts .

      3. If read state . offset >= current part . size :

        1. Set read state . offset to read state . offset - current part . size .

        2. Continue .

      4. Set read state . nested blob data to current part .

      5. Set read state . nested read state to the result of calling current part ’s read initialization algorithm given current part ’s snapshot state and read state . offset .

      6. Set read state . offset to 0.

  3. Return result .
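
The multipart read steps can be traced with a small JavaScript model. This is a sketch under stated assumptions: `bytesPart` is a hypothetical helper that builds a minimal byte sequence backed description, and the property names are illustrative rather than defined by this specification.

```javascript
// Hypothetical sentinel standing in for the "end of blob" value.
const END_OF_BLOB = Symbol("end of blob");

// Minimal bytes-backed description used for the leaf parts (hypothetical helper).
function bytesPart(data) {
  return {
    size: data.length,
    snapshotState: { data },
    readInit: (snapshot, offset) => ({ bytes: snapshot.data.slice(offset) }),
    read(state) {
      const result = state.bytes;
      state.bytes = new Uint8Array(0);
      return result.length === 0 ? END_OF_BLOB : result;
    },
  };
}

// multipart blob read initialization steps
function multipartReadInit(snapshotState, offset) {
  return {
    parts: [...snapshotState.parts], // clone, consumed as a queue
    offset,
    nestedBlobData: undefined,
    nestedReadState: undefined,
  };
}

// multipart blob read steps
function multipartRead(state) {
  let result = END_OF_BLOB;
  while (result === END_OF_BLOB) {
    if (state.nestedReadState !== undefined) {
      result = state.nestedBlobData.read(state.nestedReadState);
      if (result === END_OF_BLOB) {
        state.nestedReadState = undefined;
        state.nestedBlobData = undefined;
      }
    } else {
      if (state.parts.length === 0) return END_OF_BLOB;
      const currentPart = state.parts.shift();
      if (state.offset >= currentPart.size) {
        state.offset -= currentPart.size; // skip parts that lie before the offset
        continue;
      }
      state.nestedBlobData = currentPart;
      state.nestedReadState = currentPart.readInit(currentPart.snapshotState, state.offset);
      state.offset = 0;
    }
  }
  return result;
}

// Two parts, read starting at byte offset 3 (inside the second part).
const snapshot = { parts: [bytesPart(Uint8Array.of(1, 2)), bytesPart(Uint8Array.of(3, 4, 5))] };
const readState = multipartReadInit(snapshot, 3);
const chunks = [];
for (let chunk = multipartRead(readState); chunk !== END_OF_BLOB; chunk = multipartRead(readState)) {
  chunks.push(...chunk);
}
```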

3.1.3. Sliced blobs

Blobs created by the slice() method are also known as sliced blobs.

The snapshot state for a sliced blob contains an "offset" member (a number), a "span" member (a number), and a "source" member (a blob data description).

A sliced blob read state is a struct consisting of:

source

A blob data description , representing the blob that was sliced.

bytes remaining

A number, representing the remaining number of bytes to be returned.

nested read state

A struct , representing the read state of the nested read operation.

The sliced blob read initialization steps , given a snapshot state and offset are:
  1. Let read state be a new sliced blob read state .

  2. Set read state . source to snapshot state [ "source" ].

  3. Set read state . bytes remaining to snapshot state [ "span" ] - offset .

  4. Let read offset be snapshot state [ "offset" ] + offset .

  5. Set read state . nested read state to the result of calling read state . source 's read initialization algorithm given read state . source 's snapshot state and read offset .

  6. Return read state .

The sliced blob read steps , given read state (a sliced blob read state ) are:
  1. If read state . bytes remaining <= 0:

    1. Return end of blob .

  2. Let result be the result of calling read state . source 's read algorithm given read state . nested read state .

  3. If result is not end of blob :

    1. If result ’s length is larger than read state . bytes remaining :

      1. Truncate result to be read state . bytes remaining bytes long.

    2. Set read state . bytes remaining to read state . bytes remaining - result ’s length .

  4. Return result .
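
The sliced blob read steps, including the truncation of the final chunk, can be modelled as below. The sketch is illustrative only: `chunkedSource` is a hypothetical source description that returns two bytes per read, and the property names are not defined by this specification.

```javascript
// Hypothetical sentinel standing in for the "end of blob" value.
const END_OF_BLOB = Symbol("end of blob");

// Hypothetical source description that returns its data two bytes at a time.
function chunkedSource(data) {
  return {
    size: data.length,
    snapshotState: { data },
    readInit: (snapshot, offset) => ({ data: snapshot.data, position: offset }),
    read(state) {
      if (state.position >= state.data.length) return END_OF_BLOB;
      const chunk = state.data.slice(state.position, state.position + 2);
      state.position += chunk.length;
      return chunk;
    },
  };
}

// sliced blob read initialization steps
function slicedReadInit(snapshotState, offset) {
  const source = snapshotState.source;
  return {
    source,
    bytesRemaining: snapshotState.span - offset,
    nestedReadState: source.readInit(source.snapshotState, snapshotState.offset + offset),
  };
}

// sliced blob read steps
function slicedRead(state) {
  if (state.bytesRemaining <= 0) return END_OF_BLOB;
  let result = state.source.read(state.nestedReadState);
  if (result !== END_OF_BLOB) {
    if (result.length > state.bytesRemaining) {
      result = result.slice(0, state.bytesRemaining); // truncate to the span
    }
    state.bytesRemaining -= result.length;
  }
  return result;
}

// Slice bytes [1, 4) out of a six-byte source.
const source = chunkedSource(Uint8Array.of(0, 1, 2, 3, 4, 5));
const sliceSnapshot = { offset: 1, span: 3, source };
const sliceState = slicedReadInit(sliceSnapshot, 0);
const sliced = [];
for (let chunk = slicedRead(sliceState); chunk !== END_OF_BLOB; chunk = slicedRead(sliceState)) {
  sliced.push(...chunk);
}
```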

3.2. Constructors

The new Blob( blobParts , options ) constructor steps are:
  1. Let blob data be the result of processing blob parts given blobParts and options .

  2. Set this . [[data]] to blob data .

  3. Let type be an empty string.

  4. If the type member of the options argument is not the empty string, run the following sub-steps:

    1. Set type to the type dictionary member. If type contains any characters outside the range U+0020 to U+007E, then set type to the empty string and return from these substeps.

    2. Convert every character in type to ASCII lowercase .

  5. Set this . [[type]] to type .
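
The type handling in steps 3 to 5 amounts to a small normalization routine. The following JavaScript sketch shows one way to express it; `normalizeBlobType` is a hypothetical name used only for this illustration.

```javascript
// Returns the value that would be stored in [[type]] for a given options.type.
function normalizeBlobType(type) {
  if (type === "") return "";
  // Any character outside U+0020..U+007E makes the type the empty string.
  if (/[^\u0020-\u007E]/.test(type)) return "";
  // ASCII lowercase only: A-Z are mapped, everything else is kept as-is.
  return type.replace(/[A-Z]/g, (c) => c.toLowerCase());
}

const lowered = normalizeBlobType("Text/PLAIN;charset=UTF-8");
const rejected = normalizeBlobType("text/plain\u00e9");
```

Note that the check only restricts the character range; it does not verify that the string is a parsable MIME type.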

To convert line endings to native in a string s , run the following steps:
  1. Let native line ending be the code point U+000A LF.

  2. If the underlying platform’s conventions are to represent newlines as a carriage return and line feed sequence, set native line ending to the code point U+000D CR followed by the code point U+000A LF.

  3. Let result be the empty string.

  4. Let position be a position variable for s , initially pointing at the start of s .

  5. Let token be the result of collecting a sequence of code points that are not equal to U+000A LF or U+000D CR from s given position .

  6. Append token to result .

  7. While position is not past the end of s :

    1. If the code point at position within s equals U+000D CR:

      1. Append native line ending to result .

      2. Advance position by 1.

      3. If position is not past the end of s and the code point at position within s equals U+000A LF advance position by 1.

    2. Otherwise if the code point at position within s equals U+000A LF, advance position by 1 and append native line ending to result .

    3. Let token be the result of collecting a sequence of code points that are not equal to U+000A LF or U+000D CR from s given position .

    4. Append token to result .

  8. Return result .
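
The conversion can be sketched in JavaScript as follows. Because script cannot portably detect the platform convention, this illustration takes the native line ending as a parameter; that parameter and the function name are assumptions of the sketch, not part of the algorithm above.

```javascript
function convertLineEndingsToNative(s, nativeLineEnding = "\n") {
  let result = "";
  let position = 0;
  // Collect a run of code units that are neither LF nor CR.
  const collect = () => {
    const start = position;
    while (position < s.length && s[position] !== "\n" && s[position] !== "\r") position++;
    return s.slice(start, position);
  };
  result += collect();
  while (position < s.length) {
    if (s[position] === "\r") {
      result += nativeLineEnding;
      position++;
      // A CR immediately followed by LF counts as a single line ending.
      if (position < s.length && s[position] === "\n") position++;
    } else if (s[position] === "\n") {
      position++;
      result += nativeLineEnding;
    }
    result += collect();
  }
  return result;
}

const unixStyle = convertLineEndingsToNative("a\r\nb\rc\nd");
const windowsStyle = convertLineEndingsToNative("a\r\nb\rc\nd", "\r\n");
```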

Examples of constructor usage follow.
// Create a new Blob object
var a = new Blob();
// Create a 1024-byte ArrayBuffer
// buffer could also come from reading a File
var buffer = new ArrayBuffer(1024);
// Create ArrayBufferView objects based on buffer
var shorts = new Uint16Array(buffer, 512, 128);
var bytes = new Uint8Array(buffer, shorts.byteOffset + shorts.byteLength);
var b = new Blob(["foobarbazetcetc" + "birdiebirdieboo"], {type: "text/plain;charset=utf-8"});
var c = new Blob([b, shorts]);
var e = new Blob([b, c, bytes]);
var d = new Blob([buffer, b, c, bytes]);

3.3. Attributes

blob.size, of type unsigned long long, readonly

Returns the size of the byte sequence represented by blob in number of bytes.

The size getter steps are to return this.[[data]].size.

blob.type, of type DOMString, readonly

The ASCII-encoded string in lower case representing the media type of the Blob. On getting, user agents must return the type of a Blob as an ASCII-encoded string in lower case, such that when it is converted to a byte sequence, it is a parsable MIME type, or the empty string – 0 bytes – if the type cannot be determined.

The type attribute can be set by the web application itself through constructor invocation and through the slice() call; in these cases, further normative conditions for this attribute are in § 3.2 Constructors, § 4.1 Constructor, and § 3.4.1 The slice() method respectively. User agents can also determine the type of a Blob, especially if the byte sequence is from an on-disk file; in this case, further normative conditions are in the file type guidelines.

Note: The type t of a Blob is considered a parsable MIME type , if performing the parse a MIME type algorithm to a byte sequence converted from the ASCII-encoded string representing the Blob object’s type does not return failure.

Note: Use of the type attribute informs the package data algorithm and determines the Content-Type header when fetching blob URLs .

The type getter steps are to return this . [[type]] .

3.4. Methods and Parameters

3.4.1. The slice() method

slice = blob.slice(start, end, contentType)

slice is a new Blob object, sharing storage with blob, with bytes ranging from the optional start parameter up to but not including the optional end parameter, and with a type attribute that is the value of the optional contentType parameter.

Negative values for start and end are interpreted as relative to the end of the blob.

The slice(start, end, contentType) method steps are:
  1. Let relativeStart be 0.

  2. If start is not undefined:

    1. If start < 0, set relativeStart to max(this.[[data]].size + start, 0).

    2. Otherwise, set relativeStart to min(start, this.[[data]].size).

  3. Let relativeEnd be this.[[data]].size.

  4. If end is not undefined:

    1. If end < 0, set relativeEnd to max(this.[[data]].size + end, 0).

    2. Otherwise, set relativeEnd to min(end, this.[[data]].size).

  5. Let span be max((relativeEnd - relativeStart), 0).

  6. Let relativeContentType be an empty string.

  7. If contentType is not undefined:

    1. If contentType does not contain any characters outside the range of U+0020 to U+007E:

      1. Set relativeContentType to ASCII lowercase of contentType.

  8. Let snapshot state be a new map.

  9. Set snapshot state["offset"] to relativeStart.

  10. Set snapshot state["span"] to span.

  11. Set snapshot state["source"] to this.[[data]].

  12. Let result be a new Blob object.

  13. Set result.[[type]] to relativeContentType.

  14. Set result.[[data]].size to span.

  15. Set result.[[data]].snapshot state to snapshot state.

  16. Set result . [[data]] . read initialization algorithm to the sliced blob read initialization steps .

  17. Set result . [[data]] . read algorithm to the sliced blob read steps .

  18. Return result .
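
The normalization of start and end in steps 1 to 5 can be illustrated with a short JavaScript function. `normalizeSliceRange` is a hypothetical helper written for this sketch; it returns the offset and span that would be stored in the sliced blob's snapshot state.

```javascript
// size is the size of the blob being sliced; start and end may be undefined.
function normalizeSliceRange(size, start, end) {
  let relativeStart = 0;
  if (start !== undefined) {
    relativeStart = start < 0 ? Math.max(size + start, 0) : Math.min(start, size);
  }
  let relativeEnd = size;
  if (end !== undefined) {
    relativeEnd = end < 0 ? Math.max(size + end, 0) : Math.min(end, size);
  }
  const span = Math.max(relativeEnd - relativeStart, 0);
  return { offset: relativeStart, span };
}

const tail = normalizeSliceRange(10, -4);        // last four bytes
const trimmed = normalizeSliceRange(10, 0, -3);  // everything but the last three bytes
const inverted = normalizeSliceRange(10, 8, 2);  // end before start gives an empty slice
const clamped = normalizeSliceRange(10, 5, 99);  // end is clamped to the blob size
```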

The examples below illustrate the different types of slice() calls possible. Since the File interface inherits from the Blob interface, examples are based on the use of the File interface.
// obtain input element through DOM
var file = document.getElementById('file').files[0];
if(file)
{
  // create an identical copy of file
  // the two calls below are equivalent
  var fileClone = file.slice();
  var fileClone2 = file.slice(0, file.size);
  // slice file into 1/2 chunk starting at middle of file
  // Note the use of negative number
  var fileChunkFromEnd = file.slice(-(Math.round(file.size/2)));
  // slice file into 1/2 chunk starting at beginning of file
  var fileChunkFromStart = file.slice(0, Math.round(file.size/2));
  // slice file from beginning till 150 bytes before end
  var fileNoMetadata = file.slice(0, -150, "application/experimental");
}

3.4.2. The stream() method

The stream() method, when invoked, must return the result of calling get stream on the context object .

3.4.3. The text() method

The text() method, when invoked, must run these steps:

  1. Let promise be a new Promise.

  2. Run the following steps in parallel:

    1. Let bytes be the result of reading all bytes from this. If that threw an exception, reject promise with that exception and abort.

    2. Resolve promise with the result of running UTF-8 decode on bytes.

  3. Return promise.

Note: This is different from the behavior of readAsText() to align better with the behavior of Fetch’s text() . Specifically this method will always use UTF-8 as encoding, while FileReader can use a different encoding depending on the blob’s type and passed in encoding name.

3.4.4. The arrayBuffer() method

The arrayBuffer() method, when invoked, must run these steps:

  1. Let promise be a new Promise.

  2. Run the following steps in parallel:

    1. Let bytes be the result of reading all bytes from this. If that threw an exception, reject promise with that exception and abort.

    2. Resolve promise with a new ArrayBuffer whose contents are bytes.

  3. Return promise .
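For illustration, a minimal sketch of calling arrayBuffer() and inspecting the resulting bytes. The blob contents are arbitrary example data.

```javascript
// A blob built from a typed array followed by a string part ("A" is 0x41).
const source = new Blob([new Uint8Array([1, 2, 3]), "A"]);

// arrayBuffer() resolves with an ArrayBuffer; wrap it to look at the bytes.
const bytesPromise = source
  .arrayBuffer()
  .then((buffer) => Array.from(new Uint8Array(buffer)));

bytesPromise.then((bytes) => console.log(bytes));
```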

4. The File Interface

A File object is a Blob object with a name attribute, which is a string; it can be created within the web application via a constructor, or is a reference to a byte sequence from a file from the underlying (OS) file system.

[Exposed=(Window,Worker), Serializable]
interface File : Blob {
  constructor(sequence<BlobPart> fileBits,
              USVString fileName,
              optional FilePropertyBag options = {});
  readonly attribute DOMString name;
  readonly attribute long long lastModified;
};

dictionary FilePropertyBag : BlobPropertyBag {
  long long lastModified;
};

A File object has an associated [[name]] internal slot, a string.

A File object has an associated [[lastModified]] internal slot, a number of milliseconds since the Unix Epoch.

File objects are serializable objects. Their serialization steps, given value, serialized and forStorage, are:

  1. Invoke the blob serialization steps given value, serialized and forStorage.

  2. Set serialized .[[Name]] to value . [[name]] .

  3. Set serialized .[[LastModified]] to value . [[lastModified]] .

Their deserialization steps , given value and serialized , are:

  1. Invoke the blob deserialization steps given value and serialized .

  2. Set value . [[name]] to serialized .[[Name]].

  3. Set value . [[lastModified]] to serialized .[[LastModified]].

4.1. Concepts

The snapshot state for a file backed blob contains a "file" member (a reference to a native file on disk), and a "last modified" member (a number).

To create a file backed File object for a given native file, run these steps:
  1. Let snapshot state be an empty map.

  2. Set snapshot state["file"] to native file.

  3. Let last modified be the last time native file was modified, as the number of milliseconds since the Unix Epoch. If this can’t be determined, set last modified to the current date and time represented as the number of milliseconds since the Unix Epoch.

  4. Set snapshot state["last modified"] to last modified.

  5. Let name be the file name of native file, converted to a string in a user agent defined manner.

  6. Let content type be the MIME type of native file (as a lowercase ASCII string), derived from name in a user agent defined manner, or an empty string if no type could be determined, taking into account the following file type guidelines:

    • User agents must return the type as an ASCII-encoded string in lower case, such that when it is converted to a corresponding byte sequence, it is a parsable MIME type, or the empty string (0 bytes) if the type cannot be determined.

    • When the file is of type text/plain user agents must NOT append a charset parameter to the dictionary of parameters portion of the media type [MIMESNIFF].

    • User agents must not attempt heuristic determination of encoding, including statistical methods.

  7. Let result be a new File object.

  8. Set result.[[type]] to content type.

  9. Set result.[[data]].size to the size of native file.

  10. Set result.[[data]].snapshot state to snapshot state.

  11. Set result.[[data]].read initialization algorithm to the file read initialization steps.

  12. Set result.[[data]].read algorithm to the file read steps.

  13. Set result.[[name]] to name.

  14. Set result.[[lastModified]] to last modified.

  15. Return result.

A file blob read state is a struct consisting of a file handle (a not further defined handle to a file that is open for reading), and an offset (a number).

The file read initialization steps, given a snapshot state and offset, are:
  1. Let file be snapshot state["file"].

  2. If the file referred to by file no longer exists, throw a NotFoundError.

  3. Let last modified be the last time file was modified, as the number of milliseconds since the Unix Epoch.

  4. If last modified is different from snapshot state["last modified"], throw a NotReadableError.

  5. User agents may attempt to detect in other ways that the file on disk has been changed. If this is the case, throw a NotReadableError.

  6. If the user agent for some other reason decides that the file should not be read by a website, throw a SecurityError.

  7. Let read state be a new file blob read state.

  8. Set read state.file handle to the result of opening file for reading. If this fails, for example due to permission problems, throw a NotReadableError.

  9. Set read state.offset to offset.

  10. Return read state.

The file read steps, given a read state, are:
  1. Let bytes be the result of reading bytes from read state.file handle at read state.offset.

  2. If reading bytes failed other than by reaching the end of the file, throw a NotReadableError.

  3. If reading bytes failed because the end of the file was reached, return end of blob.

  4. Set read state.offset to read state.offset + bytes’s length.

  5. Return bytes.
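The snapshot check and the chunked read described above can be modeled outside a browser. The following is only a sketch using Node.js's fs module as a stand-in for the user agent's native file access. The helper names (namedError, takeSnapshot, initRead, readChunk), the 4-byte default chunk size, and the use of plain Error objects with a name property in place of DOMException are all assumptions made for illustration.

```javascript
const fs = require("fs");

// Hypothetical helper: build an error carrying a DOMException-style name.
function namedError(name, message) {
  const err = new Error(message);
  err.name = name;
  return err;
}

// Models a snapshot state: a "file" reference plus its "last modified" time.
function takeSnapshot(path) {
  return { file: path, lastModified: fs.statSync(path).mtimeMs };
}

// Models the file read initialization steps.
function initRead(snapshot, offset) {
  if (!fs.existsSync(snapshot.file)) {
    throw namedError("NotFoundError", "file no longer exists");
  }
  if (fs.statSync(snapshot.file).mtimeMs !== snapshot.lastModified) {
    throw namedError("NotReadableError", "file changed since the snapshot");
  }
  let handle;
  try {
    handle = fs.openSync(snapshot.file, "r");
  } catch (e) {
    throw namedError("NotReadableError", "could not open file for reading");
  }
  return { fileHandle: handle, offset };
}

// Models the file read steps: returns the next chunk, or null for "end of blob".
function readChunk(state, chunkSize = 4) {
  const buffer = Buffer.alloc(chunkSize);
  const n = fs.readSync(state.fileHandle, buffer, 0, chunkSize, state.offset);
  if (n === 0) return null;
  state.offset += n;
  return buffer.subarray(0, n);
}
```

The point of the model is that a read started after the file has been modified or deleted fails up front, rather than returning bytes that no longer match the snapshot.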

4.2. Constructor

The new File(fileBits, fileName, options) constructor steps are:
  1. Run the Blob(fileBits, options) constructor steps.

  2. Let n be a new string of the same size as fileName.

  3. Copy every character from fileName to n, replacing any "/" character (U+002F SOLIDUS) with a ":" (U+003A COLON).

    Note: Underlying OS filesystems use differing conventions for file names; with constructed files, mandating UTF-16 lessens ambiguity when file names are converted to byte sequences.

  4. If the options.lastModified member is provided:

    1. Let d be options.lastModified.

  5. Otherwise:

    1. Let d be the current date and time represented as the number of milliseconds since the Unix Epoch (which is the equivalent of Date.now() [ECMA-262]).

      Note: Since ECMA-262 Date objects convert to long long values representing the number of milliseconds since the Unix Epoch, the lastModified member could be a Date object [ECMA-262].

  6. Set this.[[name]] to n.

  7. Set this.[[lastModified]] to d.
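The name handling and the lastModified default in the constructor can be sketched as two small functions. This is a model of the text above, not of any particular engine; the function names are hypothetical, and Number() only approximates the Web IDL conversion to long long.

```javascript
// Name handling: copy fileName, replacing U+002F SOLIDUS with U+003A COLON.
function normalizeFileName(fileName) {
  return fileName.replace(/\//g, ":");
}

// lastModified: use the provided member, otherwise fall back to the current time.
function resolveLastModified(options = {}) {
  return options.lastModified !== undefined
    ? Number(options.lastModified) // a Date converts to milliseconds since the Unix Epoch
    : Date.now();
}

console.log(normalizeFileName("drafts/2013/Draft1.txt"));
console.log(resolveLastModified({ lastModified: new Date(0) }));
```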

The File() constructor can be invoked with the parameters below:

A fileBits sequence
which takes any number of the following elements, in any order:
  • BufferSource elements.
  • Blob elements, which includes File elements.
  • USVString elements.

A fileName parameter
A USVString parameter representing the name of the file; normative conditions for this constructor parameter can be found in § 4.2 Constructor.

An optional FilePropertyBag dictionary
which, in addition to the members of BlobPropertyBag, takes one member:
  • An optional lastModified member, which must be a long long; normative conditions for this member are provided in § 4.2 Constructor.

4.3. Attributes

file . name
Returns the name of the file as a string. There are numerous file name variations and conventions used by different underlying OS file systems; this is merely the name of the file, without path information.

The name attribute getter steps are to return this.[[name]].

file . lastModified
Returns the time the file was last modified as the number of milliseconds since the Unix Epoch. If the last modification date and time are not known, returns the current date and time as the number of milliseconds since the Unix Epoch (equivalent to Date.now() [ECMA-262]).

The lastModified attribute getter steps are to return this.[[lastModified]].

The File interface is available on objects that expose an attribute of type FileList; these objects are defined in HTML [HTML]. The File interface, which inherits from Blob, is immutable, and thus represents file data that can be read into memory at the time a read operation is initiated. User agents must process reads on files that no longer exist at the time of read as errors, throwing a NotFoundError exception if using a FileReaderSync on a Web Worker [Workers] or firing an error event with the error attribute returning a NotFoundError.

In the examples below, metadata from a file object is displayed meaningfully, and a file object is created with a name and a last modified date.
var file = document.getElementById("filePicker").files[0];
var date = new Date(file.lastModified);
console.log("You selected the file " + file.name + " which was modified on " + date.toDateString() + ".");
...
// Generate a file with a specific last modified date
// Note: Date months are zero-based, so a month value of 12 rolls over to January 2014
var d = new Date(2013, 12, 5, 16, 23, 45, 600);
var generatedFile = new File(["Rough Draft ...."], "Draft1.txt", {type: "text/plain", lastModified: d});
...

5. The FileList Interface

Note: The FileList interface should be considered "at risk" since the general trend on the Web Platform is to replace such interfaces with the Array platform object in ECMAScript [ECMA-262] . In particular, this means syntax of the sort filelist . item ( 0 ) is at risk; most other programmatic use of FileList is unlikely to be affected by the eventual migration to an Array type.

This interface is a list of File objects.

[Exposed=(Window,Worker), Serializable]
interface FileList {
  getter File? item(unsigned long index);
  readonly attribute unsigned long length;
};

FileList objects are serializable objects . Their serialization steps , given value and serialized , are:

  1. Set serialized .[[Files]] to an empty list .

  2. For each file in value , append the sub-serialization of file to serialized .[[Files]].

Their deserialization steps, given serialized and value, are:

  1. For each file of serialized .[[Files]], add the sub-deserialization of file to value .

Sample usage typically involves DOM access to the <input type="file"> element within a form, and then accessing selected files.
// uploadData is a form element
// fileChooser is input element of type 'file'
var file = document.forms['uploadData']['fileChooser'].files[0];
// alternative syntax can be
// var file = document.forms['uploadData']['fileChooser'].files.item(0);
if(file)
{
  // Perform file ops
}

5.1. Attributes

length , of type unsigned long , readonly
must return the number of files in the FileList object. If there are no files, this attribute must return 0.

5.2. Methods and Parameters

item(index)
must return the index th File object in the FileList . If there is no index th File object in the FileList , then this method must return null .

index must be treated by user agents as the value for the position of a File object in the FileList, with 0 representing the first file. Supported property indices are the numbers in the range zero to one less than the number of File objects represented by the FileList object. If there are no such File objects, then there are no supported property indices.

Note: The HTMLInputElement interface has a readonly attribute of type FileList , which is what is being accessed in the above example. Other interfaces with a readonly attribute of type FileList include the DataTransfer interface.
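As a sketch of how length and item() fit together, the hypothetical helper below walks a FileList-like object and collects the file names. It relies only on the two members defined in this section, so it works on any object exposing them.

```javascript
// Hypothetical helper: collect file names using only length and item().
function listFileNames(fileList) {
  const names = [];
  for (let i = 0; i < fileList.length; i++) {
    const file = fileList.item(i); // null when there is no file at that index
    if (file !== null) names.push(file.name);
  }
  return names;
}

// In a page this might be called as:
// listFileNames(document.forms['uploadData']['fileChooser'].files);
```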

6. Reading Data

6.1. The File Reading Task Source

This specification defines a new generic task source called the file reading task source , which is used for all tasks that are queued in this specification to read byte sequences associated with Blob and File objects. It is to be used for features that trigger in response to asynchronously reading binary data.

6.2. The FileReader API

[Exposed=(Window,Worker)]
interface FileReader: EventTarget {
  constructor();
  // async read methods
  void readAsArrayBuffer(Blob blob);
  void readAsBinaryString(Blob blob);
  void readAsText(Blob blob, optional DOMString encoding);
  void readAsDataURL(Blob blob);
  void abort();
  // states
  const unsigned short EMPTY = 0;
  const unsigned short LOADING = 1;
  const unsigned short DONE = 2;
  readonly attribute unsigned short readyState;
  // File or Blob data
  readonly attribute (DOMString or ArrayBuffer)? result;
  readonly attribute DOMException? error;
  // event handler content attributes
  attribute EventHandler onloadstart;
  attribute EventHandler onprogress;
  attribute EventHandler onload;
  attribute EventHandler onabort;
  attribute EventHandler onerror;
  attribute EventHandler onloadend;
};

A FileReader has an associated state , that is "empty" , "loading" , or "done" . It is initially "empty" .

A FileReader has an associated result ( null , a DOMString or an ArrayBuffer ). It is initially null .

A FileReader has an associated error ( null or a DOMException ). It is initially null .

The FileReader() constructor, when invoked, must return a new FileReader object.

The readyState attribute’s getter, when invoked, switches on the context object 's state and runs the associated step:

"empty"

Return EMPTY

"loading"

Return LOADING

"done"

Return DONE

The result attribute’s getter, when invoked, must return the context object 's result .

The error attribute’s getter, when invoked, must return the context object 's error .

A FileReader fr has an associated read operation algorithm, which given blob , a type and an optional encodingName , runs the following steps:
  1. If fr ’s state is "loading" , throw an InvalidStateError DOMException .

  2. Set fr ’s state to "loading" .

  3. Set fr ’s result to null .

  4. Set fr ’s error to null .

  5. Let stream be the result of calling get stream on blob .

  6. Let reader be the result of getting a reader from stream .

  7. Let bytes be an empty byte sequence.

  8. Let chunkPromise be the result of reading a chunk from stream with reader .

  9. Let isFirstChunk be true.

  10. In parallel , while true:

    1. Wait for chunkPromise to be fulfilled or rejected.

    2. If chunkPromise is fulfilled, and isFirstChunk is true, queue a task to fire a progress event called loadstart at fr .

      We might change loadstart to be dispatched synchronously, to align with XMLHttpRequest behavior. <https://github.com/w3c/FileAPI/issues/119>

    3. Set isFirstChunk to false.

    4. If chunkPromise is fulfilled with an object whose done property is false and whose value property is a Uint8Array object, run these steps:

      1. Let bs be the byte sequence represented by the Uint8Array object.

      2. Append bs to bytes .

      3. If roughly 50ms have passed since these steps were last invoked, queue a task to fire a progress event called progress at fr .

      4. Set chunkPromise to the result of reading a chunk from stream with reader .

    5. Otherwise, if chunkPromise is fulfilled with an object whose done property is true, queue a task to run the following steps and abort this algorithm:

      1. Set fr ’s state to "done" .

      2. Let result be the result of package data given bytes , type , blob ’s type , and encodingName .

      3. If package data threw an exception error :

        1. Set fr ’s error to error .

        2. Fire a progress event called error at fr .

      4. Else:

        1. Set fr ’s result to result .

        2. Fire a progress event called load at fr.

      5. If fr’s state is not "loading", fire a progress event called loadend at fr.

        Note: An event handler for the load or error events could have started another load; if that happens, the loadend event for this load is not fired.

    6. Otherwise, if chunkPromise is rejected with an error error , queue a task to run the following steps and abort this algorithm:

      1. Set fr ’s state to "done" .

      2. Set fr ’s error to error .

      3. Fire a progress event called error at fr .

      4. If fr ’s state is not "loading" , fire a progress event called loadend at fr .

        Note: An event handler for the error event could have started another load; if that happens, the loadend event for this load is not fired.

Use the file reading task source for all these tasks.
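From a caller's point of view, the read operation ends in exactly one of load, error, or abort, which makes it straightforward to wrap in a promise. The sketch below is one possible wrapper, not part of this API; the function name readAsTextPromise and the ReaderCtor parameter (included only so a substitute reader can be supplied outside a browser) are assumptions.

```javascript
// Hypothetical wrapper: expose a FileReader text read as a promise.
function readAsTextPromise(blob, encoding, ReaderCtor = globalThis.FileReader) {
  return new Promise((resolve, reject) => {
    const reader = new ReaderCtor();
    reader.onload = () => resolve(reader.result);   // read completed
    reader.onerror = () => reject(reader.error);    // read failed
    reader.onabort = () => reject(new Error("read aborted"));
    reader.readAsText(blob, encoding);              // throws if already loading
  });
}

// In a page this might be used as:
// readAsTextPromise(file).then((text) => console.log(text.length));
```

Each call creates its own FileReader, so concurrent reads never hit the InvalidStateError that a shared reader in the "loading" state would throw.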

6.2.1. Event Handler Content Attributes

The following are the event handler content attributes (and their corresponding event handler event types ) that user agents must support on FileReader as DOM attributes:

event handler content attribute    event handler event type
onloadstart                        loadstart
onprogress                         progress
onabort                            abort
onerror                            error
onload                             load
onloadend                          loadend

6.2.2. FileReader States

The FileReader object can be in one of 3 states. The readyState attribute tells you in which state the object is:
EMPTY (numeric value 0)

The FileReader object has been constructed, and there are no pending reads. None of the read methods has been called. This is the default state of a newly minted FileReader object, until one of the read methods has been called on it.

LOADING (numeric value 1)

A File or Blob is being read. One of the read methods is being processed, and no error has occurred during the read.

DONE (numeric value 2)

The entire File or Blob has been read into memory, OR a file read error occurred, OR the read was aborted using abort(). The FileReader is no longer reading a File or Blob. If readyState is set to DONE it means at least one of the read methods has been called on this FileReader.

6.2.3. Reading a File or Blob

The FileReader interface makes available several asynchronous read methods (readAsArrayBuffer(), readAsBinaryString(), readAsText() and readAsDataURL()), which read files into memory.

Note: If multiple concurrent read methods are called on the same FileReader object, user agents throw an InvalidStateError on any of the read methods that occur when readyState = LOADING .

( FileReaderSync makes available several synchronous read methods . Collectively, the sync and async read methods of FileReader and FileReaderSync are referred to as just read methods .)

6.2.3.1. The readAsDataURL() method

The readAsDataURL( blob ) method, when invoked, must initiate a read operation for blob with DataURL .

6.2.3.2. The readAsText() method

The readAsText( blob , encoding ) method, when invoked, must initiate a read operation for blob with Text and encoding .

6.2.3.3. The readAsArrayBuffer() method

The readAsArrayBuffer( blob ) method, when invoked, must initiate a read operation for blob with ArrayBuffer .

6.2.3.4. The readAsBinaryString() method

The readAsBinaryString( blob ) method, when invoked, must initiate a read operation for blob with BinaryString .

Note: The use of readAsArrayBuffer() is preferred over readAsBinaryString() , which is provided for backwards compatibility.

6.2.3.5. The abort() method

When the abort() method is called, the user agent must run the steps below:

  1. If context object 's state is "empty" or if context object 's state is "done" set context object 's result to null and terminate this algorithm .

  2. If context object 's state is "loading" set context object 's state to "done" and set context object 's result to null .

  3. If there are any tasks from the context object on the file reading task source in an affiliated task queue , then remove those tasks from that task queue.

  4. Terminate the algorithm for the read method being processed.

  5. Fire a progress event called abort at the context object .

  6. If context object 's state is not "loading" , fire a progress event called loadend at the context object .

6.3. Packaging data

A Blob has an associated package data algorithm, given bytes, a type, an optional mimeType, and an optional encodingName, which switches on type and runs the associated steps:
DataURL

Return bytes as a DataURL [RFC2397] subject to the considerations below:

  • Use mimeType as part of the Data URL if it is available in keeping with the Data URL specification [RFC2397] .

  • If mimeType is not available return a Data URL without a media-type [RFC2397].

Better specify how the DataURL is generated. <https://github.com/w3c/FileAPI/issues/104>

Text
  1. Let encoding be failure.

  2. If the encodingName is present, set encoding to the result of getting an encoding from encodingName .

  3. If encoding is failure, and mimeType is present:

    1. Let type be the result of parse a MIME type given mimeType .

    2. If type is not failure, set encoding to the result of getting an encoding from type ’s parameters [ "charset" ].

      If blob has a type attribute of text/plain;charset=utf-8 then getting an encoding is run using "utf-8" as the label. Note that user agents must parse and extract the portion of the Charset Parameter that constitutes a label of an encoding.
  4. If encoding is failure, then set encoding to UTF-8 .

  5. Decode bytes using fallback encoding encoding , and return the result.

ArrayBuffer

Return a new ArrayBuffer whose contents are bytes .

BinaryString

Return bytes as a binary string, in which every byte is represented by a code unit of equal value [0..255].
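The four packaging branches can be approximated in script. The sketch below is a simplified model under stated assumptions: the Data URL is emitted in base64 form (the exact form is still an open issue above), the charset is pulled out of mimeType with a regular expression rather than a full MIME type parser, and TextDecoder stands in for "getting an encoding" and decoding. The function names are hypothetical.

```javascript
// Returns a TextDecoder for label, or null when the label is not recognized.
function decoderFor(label) {
  try {
    return new TextDecoder(label);
  } catch (e) {
    return null;
  }
}

// Hypothetical model of "package data"; bytes is a Uint8Array.
function packageData(bytes, type, mimeType, encodingName) {
  switch (type) {
    case "DataURL": {
      // Assumption: base64 payload; the draft leaves the exact form open.
      const binary = Array.from(bytes, (b) => String.fromCharCode(b)).join("");
      return "data:" + (mimeType || "") + ";base64," + btoa(binary);
    }
    case "Text": {
      let decoder = encodingName ? decoderFor(encodingName) : null;
      if (decoder === null && mimeType) {
        // Simplified charset extraction, not a full MIME type parser.
        const match = /;\s*charset="?([^";]+)"?/i.exec(mimeType);
        if (match) decoder = decoderFor(match[1].trim());
      }
      if (decoder === null) decoder = new TextDecoder("utf-8");
      return decoder.decode(bytes);
    }
    case "ArrayBuffer":
      return bytes.slice().buffer; // copy so the result owns its storage
    case "BinaryString":
      // One code unit per byte, each in the range 0..255.
      return Array.from(bytes, (b) => String.fromCharCode(b)).join("");
    default:
      throw new TypeError("unknown package type: " + type);
  }
}
```

Note the fallback order in the Text branch: an explicit encodingName wins, then the charset parameter of mimeType, then UTF-8.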

6.4. Events

The FileReader object must be the event target for all events in this specification.

When this specification says to fire a progress event called e (for some ProgressEvent e at a given FileReader reader as the context object ), the following are normative:

6.4.1. Event Summary

The following are the events that are fired at FileReader objects.

Event name   Interface       Fired when…
loadstart    ProgressEvent   When the read starts.
progress     ProgressEvent   While reading (and decoding) blob.
abort        ProgressEvent   When the read has been aborted. For instance, by invoking the abort() method.
error        ProgressEvent   When the read has failed (see file read errors).
load         ProgressEvent   When the read has successfully completed.
loadend      ProgressEvent   When the request has completed (either in success or failure).

6.4.2. Summary of Event Invariants

This section is informative.

The following are invariants applicable to event firing for a given asynchronous read method in this specification:

  1. Once a loadstart has been fired, a corresponding loadend fires at completion of the read, UNLESS any of the following are true:

    Note: The events loadstart and loadend are not coupled in a one-to-one manner.

    This example showcases "read-chaining": initiating another read from within an event handler while the "first" read continues processing.
    // In code of the sort...
    reader.readAsText(file);
    reader.onload = function(){reader.readAsText(alternateFile);}
    .....
    //... the loadend event must not fire for the first read
    reader.readAsText(file);
    reader.abort();
    reader.onabort = function(){reader.readAsText(updatedFile);}
    //... the loadend event must not fire for the first read
    
  2. One progress event will fire when blob has been completely read into memory.

  3. No progress event fires before loadstart .

  4. No progress event fires after any one of abort , load , and error have fired. At most one of abort , load , and error fire for a given read.

  5. No abort , load , or error event fires after loadend .

6.5. Reading on Threads

Web Workers allow for the use of synchronous File or Blob read APIs, since such reads on threads do not block the main thread. This section defines a synchronous API, which can be used within Workers [Workers]. Workers can make use of both the asynchronous API (the FileReader object) and the synchronous API (the FileReaderSync object).

6.5.1. The FileReaderSync API

This interface provides methods to synchronously read File or Blob objects into memory.

[Exposed=(DedicatedWorker,SharedWorker)]
interface FileReaderSync {
  constructor();
  // Synchronously return strings
  ArrayBuffer readAsArrayBuffer(Blob blob);
  DOMString readAsBinaryString(Blob blob);
  DOMString readAsText(Blob blob, optional DOMString encoding);
  DOMString readAsDataURL(Blob blob);
};

6.5.1.1. Constructors

When the FileReaderSync() constructor is invoked, the user agent must return a new FileReaderSync object.

6.5.1.2. The readAsText() method

The readAsText( blob , encoding ) method, when invoked, must run these steps:

  1. Let stream be the result of calling get stream on blob .

  2. Let reader be the result of getting a reader from stream .

  3. Let promise be the result of reading all bytes from stream with reader .

  4. Wait for promise to be fulfilled or rejected.

  5. If promise is fulfilled with a byte sequence bytes:

    1. Return the result of package data given bytes , Text , blob ’s type , and encoding .

  6. Throw promise ’s rejection reason.

6.5.1.3. The readAsDataURL() method

The readAsDataURL( blob ) method, when invoked, must run these steps:

  1. Let stream be the result of calling get stream on blob .

  2. Let reader be the result of getting a reader from stream .

  3. Let promise be the result of reading all bytes from stream with reader .

  4. Wait for promise to be fulfilled or rejected.

  5. If promise is fulfilled with a byte sequence bytes:

    1. Return the result of package data given bytes , DataURL , and blob ’s type .

  6. Throw promise ’s rejection reason.

6.5.1.4. The readAsArrayBuffer() method

The readAsArrayBuffer( blob ) method, when invoked, must run these steps:

  1. Let stream be the result of calling get stream on blob .

  2. Let reader be the result of getting a reader from stream .

  3. Let promise be the result of reading all bytes from stream with reader .

  4. Wait for promise to be fulfilled or rejected.

  5. If promise is fulfilled with a byte sequence bytes:

    1. Return the result of package data given bytes , ArrayBuffer , and blob ’s type .

  6. Throw promise ’s rejection reason.

6.5.1.5. The readAsBinaryString() method

The readAsBinaryString( blob ) method, when invoked, must run these steps:

  1. Let stream be the result of calling get stream on blob .

  2. Let reader be the result of getting a reader from stream .

  3. Let promise be the result of reading all bytes from stream with reader .

  4. Wait for promise to be fulfilled or rejected.

  5. If promise is fulfilled with a byte sequence bytes:

    1. Return the result of package data given bytes , BinaryString , and blob ’s type .

  6. Throw promise ’s rejection reason.

Note: The use of readAsArrayBuffer() is preferred over readAsBinaryString() , which is provided for backwards compatibility.

7. Errors and Exceptions

File read errors can occur when reading files from the underlying filesystem. The list below of potential error conditions is informative .

7.1. Throwing an Exception or Returning an Error

This section is normative.

Error conditions can arise when reading a File or a Blob .

The read operation can terminate due to error conditions when reading a File or a Blob ; the particular error condition that causes the get stream algorithm to fail is called a failure reason . A failure reason is one of NotFound , UnsafeFile , TooManyReads , SnapshotState , or FileLock .

Synchronous read methods throw exceptions of the type in the table below if there has been an error owing to a particular failure reason .

Asynchronous read methods use the error attribute of the FileReader object, which must return a DOMException object of the most appropriate type from the table below if there has been an error owing to a particular failure reason , or otherwise return null.

Type Description and Failure Reason
NotFoundError If the File or Blob resource could not be found at the time the read was processed, this is the NotFound failure reason .

For asynchronous read methods the error attribute must return a NotFoundError exception and synchronous read methods must throw a NotFoundError exception.

SecurityError If:
  • it is determined that certain files are unsafe for access within a Web application, this is the UnsafeFile failure reason .

  • it is determined that too many read calls are being made on File or Blob resources, this is the TooManyReads failure reason .

For asynchronous read methods the error attribute may return a SecurityError exception and synchronous read methods may throw a SecurityError exception.

This is a security error to be used in situations not covered by any other failure reason .

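NotReadableError If:
  • the snapshot state of a File or a Blob does not match the state of the underlying storage, this is the SnapshotState failure reason.

  • the File or Blob cannot be read, typically due to permission problems that occur after a snapshot state has been established (for example, a concurrent lock on the underlying storage held by another application), this is the FileLock failure reason.

For asynchronous read methods the error attribute must return a NotReadableError exception and synchronous read methods must throw a NotReadableError exception.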

8. A URL for Blob and MediaSource reference

This section defines a scheme for a URL used to refer to Blob and MediaSource objects.

8.1. Introduction

This section is informative.

Blob (or object) URLs are URLs like blob:http://example.com/550e8400-e29b-41d4-a716-446655440000 . This enables integration of Blob s and MediaSource s with other APIs that are only designed to be used with URLs, such as the img element. Blob URLs can also be used to navigate to as well as to trigger downloads of locally generated data.

For this purpose two static methods are exposed on the URL interface, createObjectURL(obj) and revokeObjectURL(url). The first method creates a mapping from a URL to a Blob, and the second method revokes said mapping. As long as the mapping exists the Blob can’t be garbage collected, so some care must be taken to revoke the URL as soon as the reference is no longer needed. All URLs are revoked when the global that created the URL itself goes away.
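A minimal sketch of the create-then-revoke pattern follows. The variable names are illustrative, and the exact URL produced depends on the environment; only the "blob:" prefix is relied on here.

```javascript
// Create a blob and a URL that maps to it.
const blobForUrl = new Blob(["hello"], { type: "text/plain" });
const objectUrl = URL.createObjectURL(blobForUrl);
console.log(objectUrl.startsWith("blob:"));

// ... hand objectUrl to an API that expects a URL, such as an img element ...

// Revoke the mapping once the URL is no longer needed,
// so the blob can be garbage collected.
URL.revokeObjectURL(objectUrl);
```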

8.2. Model

Each user agent must maintain a blob URL store. A blob URL store is a map where keys are valid URL strings and values are blob URL entries.

A blob URL entry consists of an object (of type Blob or MediaSource ), and an environment (an environment settings object ).

Keys in the blob URL store (also known as blob URLs ) are valid URL strings that when parsed result in a URL with a scheme equal to " blob ", an empty host , and a path consisting of one element itself also a valid URL string .

To generate a new blob URL , run the following steps:
  1. Let result be the empty string.

  2. Append the string " blob: " to result .

  3. Let settings be the current settings object .

  4. Let origin be settings ’s origin .

  5. Let serialized be the ASCII serialization of origin .

  6. If serialized is " null ", set it to an implementation-defined value.

  7. Append serialized to result .

  8. Append U+002F SOLIDUS ( / ) to result .

  9. Generate a UUID [RFC4122] as a string and append it to result .

  10. Return result .

An example of a blob URL that can be generated by this algorithm is blob:https://example.org/40a5fb5a-d56d-4a33-b4e2-0acf6a8e5f64.
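The generation steps above can be sketched in JavaScript as follows. This is an illustrative sketch, not a normative implementation: the function name generateBlobURL, the serializedOrigin parameter (standing in for steps 3 to 5), the opaqueOriginFallback parameter and the injectable makeUUID are all assumptions introduced here. The default UUID source is crypto.randomUUID(), which is assumed to be available in the host environment.

```javascript
// Hypothetical helper mirroring the "generate a new blob URL" steps.
// serializedOrigin stands in for the ASCII serialization of the origin.
function generateBlobURL(
  serializedOrigin,
  makeUUID = () => globalThis.crypto.randomUUID(), // assumed UUID source
  opaqueOriginFallback = "null" // step 6 leaves this value implementation-defined
) {
  let result = "";                     // step 1
  result += "blob:";                   // step 2
  let serialized = serializedOrigin;   // steps 3-5 (performed by the caller)
  if (serialized === "null") {
    serialized = opaqueOriginFallback; // step 6
  }
  result += serialized;                // step 7
  result += "/";                       // step 8: append the solidus
  result += makeUUID();                // step 9
  return result;                       // step 10
}
```

Passing the UUID generator as a parameter keeps the sketch deterministic when needed; a caller would normally rely on the default.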
To add an entry to the blob URL store for a given object , run the following steps:
  1. Let store be the user agent’s blob URL store .

  2. Let url be the result of generating a new blob URL .

  3. Let entry be a new blob URL entry consisting of object and the current settings object .

  4. Set store [ url ] to entry .

  5. Return url .

To remove an entry from the blob URL store for a given url , run the following steps:
  1. Let store be the user agent’s blob URL store .

  2. Let url string be the result of serializing url .

  3. Remove store [ url string ].
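The store and the two algorithms above can be modelled with a plain Map. This is a minimal sketch under stated assumptions: blobURLStore, addEntry and removeEntry are names chosen for this example, the environment is represented by an arbitrary value, and generateURL is a hypothetical stand-in for the "generate a new blob URL" algorithm.

```javascript
// Illustrative model of the blob URL store: keys are blob URL strings,
// values are entries of the form { object, environment }.
const blobURLStore = new Map();

// Sketch of "add an entry to the blob URL store".
// generateURL is a hypothetical stand-in for "generate a new blob URL".
function addEntry(object, environment, generateURL) {
  const url = generateURL();
  blobURLStore.set(url, { object, environment });
  return url;
}

// Sketch of "remove an entry from the blob URL store".
// url is a parsed URL object; its serialization is used as the key.
function removeEntry(url) {
  const urlString = url.href;
  blobURLStore.delete(urlString); // deleting a missing key is a no-op
}
```

A caller might add an entry, hand the returned string to another API, and later call removeEntry(new URL(url)) to drop the mapping.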

8.3. Dereferencing Model for blob URLs

To resolve a blob URL given a url (a URL ), run the following steps:
  1. Assert : url ’s scheme is " blob ".

  2. Let store be the user agent’s blob URL store .

  3. Let url string be the result of serializing url with the exclude fragment flag set.

  4. If store [ url string ] exists , return store [ url string ]; otherwise return failure.
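The resolution steps above can be sketched as a lookup that ignores the fragment. The names store and resolveBlobURL are assumptions for this example, and failure is represented by null. The sketch strips the fragment by cutting the serialized URL at its first "#", which is assumed to be equivalent to serializing with the exclude fragment flag set.

```javascript
// Illustrative store: blob URL string -> { object, environment }.
const store = new Map();

// Sketch of "resolve a blob URL". url is a parsed URL object.
// Returns the matching entry, or null to signal failure.
function resolveBlobURL(url) {
  if (url.protocol !== "blob:") {
    throw new Error("Assertion failed: scheme is not blob"); // step 1
  }
  const serialized = url.href;
  const hashIndex = serialized.indexOf("#");
  // step 3: serialize without the fragment
  const urlString = hashIndex === -1 ? serialized : serialized.slice(0, hashIndex);
  // step 4: return the entry if it exists, otherwise failure
  return store.has(urlString) ? store.get(urlString) : null;
}
```

Because the fragment is dropped before the lookup, the same entry is returned whether or not the URL being resolved carries a fragment.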

Further requirements for the parsing and fetching model for blob URLs are defined in the [URL] and [Fetch] specifications.

8.3.1. Origin of blob URLs

This section is informative.

The origin of a blob URL is always the same as that of the environment that created the URL, as long as the URL hasn’t been revoked yet. This is achieved by the [URL] spec looking up the URL in the blob URL store when parsing a URL, and using that entry to return the correct origin.

If the URL was revoked the serialization of the origin will still remain the same as the serialization of the origin of the environment that created the blob URL, but for opaque origins the origin itself might be distinct. This difference isn’t observable though, since a revoked blob URL can’t be resolved/fetched anymore anyway.

8.3.2. Lifetime of blob URLs

This specification extends the unloading document cleanup steps with the following steps:

  1. Let environment be the Document's relevant settings object .

  2. Let store be the user agent’s blob URL store .

  3. Remove from store any entries for which the value's environment is equal to environment .
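The cleanup steps above amount to filtering the store by environment. The following is a minimal sketch, assuming the store is modelled as a Map from blob URL strings to { object, environment } entries; the function name removeEntriesForEnvironment is hypothetical.

```javascript
// Removes every entry whose environment is the given one.
// store is assumed to be a Map: blob URL string -> { object, environment }.
function removeEntriesForEnvironment(store, environment) {
  for (const [url, entry] of store) {
    if (entry.environment === environment) {
      store.delete(url); // deleting the current key while iterating a Map is safe
    }
  }
}
```

Entries created by other environments are left untouched, so their blob URLs remain resolvable.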

This needs a similar hook when a worker is unloaded.

8.4. Creating and Revoking a blob URL

Blob URLs are created and revoked using static methods exposed on the URL object. Revocation of a blob URL decouples the blob URL from the resource it refers to, and if it is dereferenced after it is revoked, user agents must act as if a network error has occurred. This section describes a supplemental interface to the URL specification [URL] and presents methods for blob URL creation and revocation.

[Exposed=(Window,DedicatedWorker,SharedWorker)]
partial interface URL {
  static DOMString createObjectURL((Blob or MediaSource) obj);
  static void revokeObjectURL(DOMString url);
};
The createObjectURL( obj ) static method must return the result of adding an entry to the blob URL store for obj .
The revokeObjectURL( url ) static method must run these steps:
  1. Let url record be the result of parsing url .

  2. If url record ’s scheme is not " blob ", return.

  3. Let origin be the origin of url record .

  4. Let settings be the current settings object .

  5. If origin is not same origin with settings ’s origin , return.

  6. Remove an entry from the blob URL store for url record .

Note: This means that rather than throwing some kind of error, attempting to revoke a URL that isn’t registered will silently fail. User agents might display a message on the error console if this happens.

Note: Attempts to dereference url after it has been revoked will result in a network error . Requests that were started before the url was revoked should still succeed.
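The revocation steps above can be sketched as follows. Several parts are assumptions made for illustration: the names blobURLStore, originOfBlobURL and revokeObjectURLSketch; the currentOrigin parameter, which stands in for the serialized origin of the current settings object; deriving the origin from the URL embedded in the path; comparing serialized origins in place of a full same origin check; and treating a parse failure as a silent no-op.

```javascript
// Illustrative store: blob URL string -> { object, environment }.
const blobURLStore = new Map();

// Approximates "the origin of url record" by parsing the URL embedded in the
// path of the blob URL. Returns "null" when no origin can be derived.
function originOfBlobURL(record) {
  try {
    return new URL(record.pathname).origin;
  } catch {
    return "null";
  }
}

// Sketch of the revokeObjectURL(url) steps.
function revokeObjectURLSketch(url, currentOrigin) {
  let record;
  try {
    record = new URL(url);                 // step 1
  } catch {
    return;                                // assumption: parse failure is a silent no-op
  }
  if (record.protocol !== "blob:") return; // step 2
  const origin = originOfBlobURL(record);  // step 3
  // steps 4-5, approximated by comparing serialized origins
  if (origin === "null" || origin !== currentOrigin) return;
  blobURLStore.delete(record.href);        // step 6
}
```

Every early return is silent, which matches the note that revoking an unregistered URL does not throw.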

In the example below, window1 and window2 are separate, but in the same origin ; window2 could be an iframe inside window1 .
myurl = window1.URL.createObjectURL(myblob);
window2.URL.revokeObjectURL(myurl);

Since a user agent has one global blob URL store, it is possible to revoke an object URL from a different window than the one in which it was created. The URL.revokeObjectURL() call ensures that subsequent dereferencing of myurl results in the user agent acting as if a network error has occurred.

8.4.1. Examples of blob URL Creation and Revocation

Blob URLs are strings that are used to fetch Blob objects, and can persist for as long as the document from which they were minted using URL.createObjectURL() (see § 8.3.2 Lifetime of blob URLs).

This section gives sample usage of creation and revocation of blob URLs with explanations.

In the example below, two img elements [HTML] refer to the same blob URL :
url = URL.createObjectURL(blob);
img1.src = url;
img2.src = url;
In the example below, URL.revokeObjectURL() is explicitly called.
// `file` is assumed to be a File or Blob obtained elsewhere, e.g. from a file input
var blobURLref = URL.createObjectURL(file);
var img1 = new Image();
var img2 = new Image();
// Both assignments below work as expected
img1.src = blobURLref;
img2.src = blobURLref;
// ... Following body load
// Check if both images have loaded
if (img1.complete && img2.complete) {
  // Revoke the URL; later attempts to dereference it result in a network error
  URL.revokeObjectURL(blobURLref);
} else {
  // msg() is a placeholder for application-specific user messaging
  msg("Images cannot be previewed!");
  // revoke the string-based reference
  URL.revokeObjectURL(blobURLref);
}

The example above allows multiple references to a single blob URL, and the web developer then revokes the blob URL string after both image objects have been loaded. While not restricting the number of uses of the blob URL offers more flexibility, it increases the likelihood of leaks; developers should pair each call to URL.createObjectURL() with a corresponding call to URL.revokeObjectURL().
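One way to make the pairing hard to forget is to wrap creation and revocation in a single helper. The sketch below is an assumption-laden illustration rather than part of this API: the helper name withObjectURL is hypothetical, and the urlAPI parameter (defaulting to the global URL interface) exists only so the helper can be exercised without a real blob URL store.

```javascript
// Hypothetical helper: creates a blob URL, passes it to `use`, and always
// revokes it afterwards, even if `use` throws or its promise rejects.
async function withObjectURL(blob, use, urlAPI = globalThis.URL) {
  const url = urlAPI.createObjectURL(blob);
  try {
    // `use` may return a promise, e.g. one that settles once an image has loaded
    return await use(url);
  } finally {
    urlAPI.revokeObjectURL(url);
  }
}
```

The callback should return a promise that settles only once the URL is no longer needed, because the URL is revoked as soon as that promise settles.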

9. Security and Privacy Considerations

This section is informative.

This specification allows web content to read files from the underlying file system, as well as provides a means for files to be accessed by unique identifiers, and as such is subject to some security considerations. This specification also assumes that the primary user interaction is with the <input type="file"/> element of HTML forms [HTML] , and that all files that are being read by FileReader objects have first been selected by the user. Important security considerations include preventing malicious file selection attacks (selection looping), preventing access to system-sensitive files, and guarding against modifications of files on disk after a selection has taken place.

Preventing selection looping

During file selection, a user may be bombarded with the file picker associated with <input type="file"/> (in a "must choose" loop that forces selection before the file picker is dismissed). A user agent may prevent file access to any selections by making the returned FileList object be of size 0.

System-sensitive files

System-sensitive files (e.g. files in /usr/bin, password files, and other native operating system executables) typically should not be exposed to web content, and should not be accessed via blob URLs. User agents may throw a SecurityError exception for synchronous read methods, or return a SecurityError exception for asynchronous reads.

This section is provisional; more security data may supplement this in subsequent drafts.

10. Requirements and Use Cases

This section covers the requirements for this API and illustrates some use cases. This version of the API does not satisfy all use cases; subsequent versions may elect to address these.

Acknowledgements

This specification was originally developed by the SVG Working Group. Many thanks to Mark Baker and Anne van Kesteren for their feedback.

Thanks to Robin Berjon, Jonas Sicking and Vsevolod Shmyroff for editing the original specification.

Special thanks to Olli Pettay, Nikunj Mehta, Garrett Smith, Aaron Boodman, Michael Nordman, Jian Li, Dmitry Titov, Ian Hickson, Darin Fisher, Sam Weinig, Adrian Bateman and Julian Reschke.

Thanks to the W3C WebApps WG, and to participants on the public-webapps@w3.org listserv.

Conformance

Document conventions

Conformance requirements are expressed with a combination of descriptive assertions and RFC 2119 terminology. The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in the normative parts of this document are to be interpreted as described in RFC 2119. However, for readability, these words do not appear in all uppercase letters in this specification.

All of the text of this specification is normative except sections explicitly marked as non-normative, examples, and notes. [RFC2119]

Examples in this specification are introduced with the words “for example” or are set apart from the normative text with class="example" , like this:

This is an example of an informative example.

Informative notes begin with the word “Note” and are set apart from the normative text with class="note" , like this:

Note, this is an informative note.

Conformant Algorithms

Requirements phrased in the imperative as part of algorithms (such as "strip any leading space characters" or "return false and abort these steps") are to be interpreted with the meaning of the key word ("must", "should", "may", etc) used in introducing the algorithm.

Conformance requirements phrased as algorithms or specific steps can be implemented in any manner, so long as the end result is equivalent. In particular, the algorithms defined in this specification are intended to be easy to understand and are not intended to be performant. Implementers are encouraged to optimize.

Index

Terms defined by this specification

Terms defined by reference

References

Normative References

[DOM]
Anne van Kesteren. DOM Standard . Living Standard. URL: https://dom.spec.whatwg.org/
[ECMA-262]
ECMAScript Language Specification . URL: https://tc39.es/ecma262/
[ENCODING]
Anne van Kesteren. Encoding Standard . Living Standard. URL: https://encoding.spec.whatwg.org/
[Fetch]
Anne van Kesteren. Fetch Standard . Living Standard. URL: https://fetch.spec.whatwg.org/
[HTML]
Anne van Kesteren; et al. HTML Standard . Living Standard. URL: https://html.spec.whatwg.org/multipage/
[INFRA]
Anne van Kesteren; Domenic Denicola. Infra Standard . Living Standard. URL: https://infra.spec.whatwg.org/
[MEDIA-SOURCE]
Matthew Wolenetz; et al. Media Source Extensions™ . 17 November 2016. REC. URL: https://www.w3.org/TR/media-source/
[MIMESNIFF]
Gordon P. Hemsley. MIME Sniffing Standard . Living Standard. URL: https://mimesniff.spec.whatwg.org/
[RFC2119]
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels . March 1997. Best Current Practice. URL: https://tools.ietf.org/html/rfc2119
[RFC2397]
L. Masinter. The "data" URL scheme . August 1998. Proposed Standard. URL: https://tools.ietf.org/html/rfc2397
[RFC4122]
P. Leach; M. Mealling; R. Salz. A Universally Unique IDentifier (UUID) URN Namespace . July 2005. Proposed Standard. URL: https://tools.ietf.org/html/rfc4122
[STREAMS]
Adam Rice; Domenic Denicola; 吉野剛史 (Takeshi Yoshino). Streams Standard . Living Standard. URL: https://streams.spec.whatwg.org/
[URL]
Anne van Kesteren. URL Standard . Living Standard. URL: https://url.spec.whatwg.org/
[WebIDL]
Boris Zbarsky. Web IDL . 15 December 2016. ED. URL: https://heycam.github.io/webidl/
[XHR]
Anne van Kesteren. XMLHttpRequest Standard . Living Standard. URL: https://xhr.spec.whatwg.org/

Informative References

[SVG2]
Amelia Bellamy-Royds; et al. Scalable Vector Graphics (SVG) 2 . 4 October 2018. CR. URL: https://www.w3.org/TR/SVG2/
[Workers]
Ian Hickson. Web Workers . 24 September 2015. WD. URL: https://www.w3.org/TR/workers/

IDL Index

[Exposed=(Window,Worker), Serializable]
interface Blob {
  constructor(optional sequence<BlobPart> blobParts,
              optional BlobPropertyBag options = {});
  readonly attribute unsigned long long size;
  readonly attribute DOMString type;
  // slice Blob into byte-ranged chunks
  Blob slice(optional [Clamp] long long start,
            optional [Clamp] long long end,
            optional DOMString contentType);
  // read from the Blob.
  [NewObject] ReadableStream stream();
  [NewObject] Promise<USVString> text();
  [NewObject] Promise<ArrayBuffer> arrayBuffer();
};
enum EndingType { "transparent", "native" };
dictionary BlobPropertyBag {
  DOMString type = "";
  EndingType endings = "transparent";
};
typedef (BufferSource or Blob or USVString) BlobPart;
[Exposed=(Window,Worker), Serializable]
interface File : Blob {
  constructor(sequence<BlobPart> fileBits,
              USVString fileName,
              optional FilePropertyBag options = {});
  readonly attribute DOMString name;
  readonly attribute long long lastModified;
};
dictionary FilePropertyBag : BlobPropertyBag {
  long long lastModified;
};
[Exposed=(Window,Worker), Serializable]
interface FileList {
  getter File? item(unsigned long index);
  readonly attribute unsigned long length;
};
[Exposed=(Window,Worker)]
interface FileReader: EventTarget {
  constructor();
  // async read methods
  void readAsArrayBuffer(Blob blob);
  void readAsBinaryString(Blob blob);
  void readAsText(Blob blob, optional DOMString encoding);
  void readAsDataURL(Blob blob);
  void abort();
  // states
  const unsigned short EMPTY = 0;
  const unsigned short LOADING = 1;
  const unsigned short DONE = 2;
  readonly attribute unsigned short readyState;
  // File or Blob data
  readonly attribute (DOMString or ArrayBuffer)? result;
  readonly attribute DOMException? error;
  // event handler content attributes
  attribute EventHandler onloadstart;
  attribute EventHandler onprogress;
  attribute EventHandler onload;
  attribute EventHandler onabort;
  attribute EventHandler onerror;
  attribute EventHandler onloadend;
};
[Exposed=(DedicatedWorker,SharedWorker)]
interface FileReaderSync {
  constructor();
  // Synchronously return strings
  ArrayBuffer readAsArrayBuffer(Blob blob);
  DOMString readAsBinaryString(Blob blob);
  DOMString readAsText(Blob blob, optional DOMString encoding);
  DOMString readAsDataURL(Blob blob);
};
[Exposed=(Window,DedicatedWorker,SharedWorker)]
partial interface URL {
  static DOMString createObjectURL((Blob or MediaSource) obj);
  static void revokeObjectURL(DOMString url);
};

Issues Index

In at least Chrome’s IndexedDB implementation, this copying of the data of blobs is only done when a transaction is committed (and failure to read the blob will cause the commit to fail).
The actual storage API serialized was persisted in will need a way of modifying the read algorithms for deserialized blobs. I.e. a Blob that was deserialized from IndexedDB should start throwing in its read steps after clear-site-data clears all IndexedDB data. Somehow let StructuredDeserialize pass along a hook from the storage API to here? <https://github.com/w3c/webappsec-clear-site-data/issues/49>
We need to specify more concretely what reading from a Blob actually does, what possible errors can happen, perhaps something about chunk sizes, etc. <https://github.com/w3c/FileAPI/issues/144>
We might change loadstart to be dispatched synchronously, to align with XMLHttpRequest behavior. <https://github.com/w3c/FileAPI/issues/119>
Better specify how the DataURL is generated. <https://github.com/w3c/FileAPI/issues/104>
This needs a similar hook when a worker is unloaded.
This section is provisional; more security data may supplement this in subsequent drafts.