What’s new in HTML5: The Track Element

One of the more exciting developments in HTML5 video is the inclusion of the track element in the newest versions of the desktop browsers. In addition to bringing captioning and subtitle support to HTML5 video, the invisible track element allows publishers to attach a rich array of textual metadata to their videos. In this blog post, we’ll look at the different types of tracks that can be used in conjunction with the <video> tag.

Browser Support

First, the bad news. The track element is extremely new, and browser support is growing, but limited. The current version of Chrome supports it, but the functionality must be enabled via a configuration option (go to chrome://flags, Enable <track> element). Internet Explorer 10, which is available in the Windows 8 developer preview, also has a working implementation. Mozilla is working on it for Firefox, but no timetable has been given for when it will be complete. In short, the track element is future tech, but luckily we can begin working with it today.
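Because support varies so much, it can help to test for the track element before relying on it. The sketch below is one possible check, not an official API. The helper name `supportsTrackElement` is made up for this post, and the check assumes that a browser without support creates a generic element that lacks the `kind` property.

```javascript
// Hypothetical helper: returns true when the given document appears to
// implement the track element. Assumption: browsers without support create
// a generic element that has no `kind` property.
function supportsTrackElement(doc) {
  if (!doc || typeof doc.createElement !== 'function') {
    return false;
  }
  const track = doc.createElement('track');
  return track !== null && typeof track === 'object' && 'kind' in track;
}

// In a browser, pass the global document; outside a browser this is false.
const hasTrackSupport =
  typeof document !== 'undefined' && supportsTrackElement(document);
console.log('track element supported:', hasTrackSupport);
```

A player could use a check like this to decide between native text tracks and its own fallback rendering.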

WebVTT: A New Format for Text Tracks

The web standards community has developed a new standard, called WebVTT (Web Video Text Tracks), which will be supported by all browsers implementing the track element. WebVTT provides a simple, extensible, and human-readable format on which to build text tracks. Although it is based on SRT (a popular subtitling format), a few tweaks have been made to the format. For content creators who already have subtitles in SRT, a no-frills converter is available.

Here’s a very simple example of a WebVTT file:

    WEBVTT

    00:00.000 --> 00:10.000
    This text is related to the first ten seconds of the video

    00:10.000 --> 00:20.000
    This text is related to the next ten seconds of the video

In this example, the file contains two timed segments, called cues. A cue’s payload can range from plain text to lightly marked-up text and, in metadata tracks, arbitrary data such as JSON or HTML.
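To make the structure of a cue concrete, here is a rough sketch of how a script might read a simple WebVTT file into cue objects. It is not a complete parser: the function names `parseTimestamp` and `parseVtt` are made up for this post, and the sketch drops cue settings and skips anything that is not a cue.

```javascript
// Converts "mm:ss.ttt" or "hh:mm:ss.ttt" into a number of seconds.
function parseTimestamp(stamp) {
  const parts = stamp.trim().split(':').map(Number);
  // Works for both the two-part and the three-part form.
  return parts.reduce((total, part) => total * 60 + part, 0);
}

// Minimal sketch: drops cue settings and skips blocks without a timing line.
function parseVtt(text) {
  const blocks = text.replace(/\r\n?/g, '\n').trim().split(/\n{2,}/);
  if (!/^WEBVTT/.test(blocks[0])) {
    throw new Error('Missing WEBVTT header');
  }
  const cues = [];
  for (const block of blocks.slice(1)) {
    const lines = block.split('\n');
    // The timing line contains the "-->" arrow; an optional cue
    // identifier may come before it.
    const timingIndex = lines.findIndex((line) => line.includes('-->'));
    if (timingIndex === -1) continue; // not a cue
    const [start, rest] = lines[timingIndex].split('-->');
    const end = rest.trim().split(/\s+/)[0]; // ignore any cue settings
    cues.push({
      start: parseTimestamp(start),
      end: parseTimestamp(end),
      text: lines.slice(timingIndex + 1).join('\n'),
    });
  }
  return cues;
}

const sample = [
  'WEBVTT',
  '',
  '00:00.000 --> 00:10.000',
  'This text is related to the first ten seconds of the video',
  '',
  '00:10.000 --> 00:20.000',
  'This text is related to the next ten seconds of the video',
].join('\n');

console.log(parseVtt(sample));
```

In a browser that supports the track element, none of this is necessary, since the browser parses the file itself.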

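Here’s how to embed a video with a text track (the HTML specification requires the srclang attribute when kind is subtitles):

    <video controls>
      <source src="video.mp4" type="video/mp4" />
      <source src="video.webm" type="video/webm" />
      <track kind="subtitles" src="subtitles.vtt" srclang="en" label="English" />
    </video>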

One Element, Many Uses

One of the reasons the track element is so captivating is its versatility. It can be used to make video accessible, to organize content that occurs within a video, to enable more robust interactions, and much more. The type of a track is specified by setting the kind attribute on the track element. The kind attribute currently accepts five values: subtitles, captions, descriptions, chapters and metadata.

Accessibility: Captions, Subtitles and Descriptions

Let’s take a quick look at the first three text track types: subtitles, captions and descriptions. On the surface they may seem similar, but they serve different purposes.

  • Subtitles are what you might expect to see while watching a foreign-language film — they’re a transcription or translation of the video’s dialogue.
  • Captions, on the other hand, are designed for viewers who can’t hear the audio of the video, and include descriptions of non-dialogue sound. For example, if a character in a video slams a door off-camera, the captions would include something like [door slams]. Both subtitles and captions are displayed by the browser as text overlays on top of the playing video.
  • Descriptions are not displayed visually but are instead spoken aloud by a screen reader, benefiting viewers who can’t see the video. As the name suggests, they describe what’s happening visually in the scene.

All three of these kinds of tracks combine to make a video accessible to more viewers, and, as we’ll discuss later, to search engines as well.

Chapters: Navigating the Video

One of the more difficult problems to solve in web video has been how to index and recall discrete segments of content within a longer video. This is especially true when the different sub-segments pertain to dramatically different subjects. Publishers must either break the video into more manageable chunks and tag them appropriately, or use complicated tools or scripts to synchronize the video player with an external index.

Using chapter text tracks, publishers can organize their long-form content in a WebVTT file which is embedded alongside the video. Although current browser implementations do not yet do anything with chapter tracks, one can reasonably expect that they will in the future. In the meantime, developers can access the information contained in the chapters track via JavaScript and use it to build their own chapter interfaces.
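As a rough illustration of that last point, the sketch below turns the cues of a chapters track into a clickable list. It assumes the text track API as currently specified (`track.mode`, `track.cues`, and the `load` event on the track element), which may differ in today’s experimental builds. The function names and the `#chapters` list element are made up for this example.

```javascript
// Turns chapter cues into plain menu entries. Accepts any array-like of
// objects with startTime and text, such as a TextTrackCueList.
function chaptersFromCues(cues) {
  return Array.from(cues, (cue) => ({
    title: cue.text,
    start: cue.startTime,
  }));
}

// Fills listElement with one clickable item per chapter.
function attachChapterMenu(video, trackElement, listElement) {
  const render = () => {
    listElement.textContent = '';
    for (const chapter of chaptersFromCues(trackElement.track.cues || [])) {
      const item = video.ownerDocument.createElement('li');
      item.textContent = chapter.title;
      // Clicking a chapter seeks the video to the start of that chapter.
      item.addEventListener('click', () => {
        video.currentTime = chapter.start;
      });
      listElement.appendChild(item);
    }
  };
  trackElement.track.mode = 'hidden'; // load the cues without displaying them
  if (trackElement.readyState === 2) {
    render(); // 2 means the track has already loaded
  } else {
    trackElement.addEventListener('load', render, { once: true });
  }
}

// Browser-only wiring; the "#chapters" element is a hypothetical <ul>.
if (typeof document !== 'undefined') {
  const video = document.querySelector('video');
  const trackElement = document.querySelector('track[kind="chapters"]');
  const listElement = document.querySelector('#chapters');
  if (video && trackElement && listElement) {
    attachChapterMenu(video, trackElement, listElement);
  }
}
```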

Metadata

The track element supports one additional type of text track, metadata, which is at the same time vague and extremely powerful. Metadata tracks allow developers to synchronize any information they wish with time points within a video. When playback reaches the time range described by a cue, a JavaScript event fires and a script can read the text contained in the cue. A simple example could be latitude and longitude coordinates which correspond to certain time points within a video. A script could listen for these cues, and update a map with the current coordinates as they change in the video.
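The coordinates example could be sketched roughly as follows. This assumes the `cuechange` event and the `activeCues` list from the current specification, and it assumes each cue holds a payload of the form "latitude,longitude". That payload format and the function names are inventions for this example, since a metadata cue can contain any text.

```javascript
// Parses a cue payload such as "40.7128,-74.0060" into numbers.
// The "lat,lng" payload format is an assumption made for this example.
function parseCoordinates(cueText) {
  const parts = cueText.split(',').map((part) => part.trim());
  if (parts.length !== 2 || parts.some((part) => part === '')) {
    return null;
  }
  const [lat, lng] = parts.map(Number);
  if (!Number.isFinite(lat) || !Number.isFinite(lng)) {
    return null;
  }
  return { lat, lng };
}

// Calls onPosition with the coordinates of each cue as it becomes active.
function followCoordinates(track, onPosition) {
  // Enable the track so its cues load and events fire; a hidden track
  // is not displayed.
  track.mode = 'hidden';
  track.addEventListener('cuechange', () => {
    for (const cue of Array.from(track.activeCues || [])) {
      const position = parseCoordinates(cue.text);
      if (position) onPosition(position);
    }
  });
}

console.log(parseCoordinates('40.7128, -74.0060'));
```

In a page, `track` would be the TextTrack object of a metadata track, and `onPosition` would be whatever function moves the marker on the map.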

The possible use cases for metadata tracks are virtually limitless, and we’ll explore some of these in more detail in a future post.

Making Video Content More Searchable

We’ve discussed how the text tracks are interpreted by the browser and displayed to a viewer, but this only scratches the surface of what’s possible once videos are annotated by text tracks. Search engines can use the contextual information contained in the tracks to correlate search queries to specific points within a video. Because the tracks are separated logically, a search engine can prioritize results based on the length of a related segment, the frequency with which the search term appears in the video, and even whether the subject of the search term appears visually in the scene, regardless of whether or not the word itself is spoken.
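To illustrate the basic idea of matching a query to points within a video, here is a toy sketch that looks for a term in a list of cues and returns the start time of each match. It says nothing about how any real search engine works. The `searchCues` name, the cue object shape, and the sample cues are all invented for this example.

```javascript
// Returns the cues whose text contains the search term (case-insensitive),
// so that a player or index could jump to the matching time points.
function searchCues(cues, term) {
  const needle = term.trim().toLowerCase();
  if (needle === '') {
    return [];
  }
  return cues
    .filter((cue) => cue.text.toLowerCase().includes(needle))
    .map((cue) => ({ start: cue.start, text: cue.text }));
}

// Invented sample data: start and end are in seconds.
const sampleCues = [
  { start: 0, end: 10, text: 'Welcome to the harbor tour' },
  { start: 10, end: 20, text: '[door slams]' },
  { start: 20, end: 30, text: 'The harbor was built in 1890' },
];

console.log(searchCues(sampleCues, 'harbor'));
```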

Furthermore, a search engine can make use of translation engines to open up search results to users who speak different languages from the language used in the source video. The subtitle tracks themselves could theoretically also be translated automatically by the browser. Although a human translation is obviously preferable, this approach allows many more viewers to engage with the content at very little additional cost.

Captions in the JW Player

No discussion (at least on this blog) would be complete without a note on support in the JW Player for the topic at hand. Although the current player can display SRT captions through the Captions plugin, this support will become much more tightly integrated into the upcoming version 6.0 of the JW Player, which will support WebVTT as well. Here’s a sneak peek at what captions selection will look like in the new player:

[Screenshot: JW Player 6 Captions Support]

Where Do We Go From Here?

As we’ve seen, text tracks aren’t just for subtitles – there’s a virtually limitless range of applications for them. Over the next few weeks, we’ll be posting demos and examples showing the track element in action. I’ll also be presenting on the track element at this year’s DevCon5 conference later this month. [Update: slides and demos here.] So stay tuned!

10 Comments

  1. Derek August 21, 2012 - 03:15 EDT

    This is very good news! I’m especially interested in learning how to display a caption or audio description when a video reaches a certain time parameter.

    Please try to make this development available with playlists as well as youtube videos.

    Thanks!

  2. Rikudou_Sennin February 28, 2013 - 07:19 EDT

    I made this simple converter from .ass subtitles to .vtt subtitles:
    http://rikudou.naruto-sekai.com/subtitles/ass-to-vtt.php
    You just upload your .ass file and then you download .vtt file.
    I hope it will help :)

  3. Matt L September 26, 2012 - 11:57 EDT

    Apparently Safari 6 now supports the track element: https://developer.apple.com/technologies/safari/html5.html

  4. PabloS July 22, 2012 - 05:00 EDT

    @Adrian -

    Yes, we’ll be adding WebVTT support to the player in Flash as well as HTML5. In HTML5 mode, until we have full browser adoption of the track element, we’ll be providing WebVTT support via an HTML overlay, which will work everywhere except for mobile devices in fullscreen mode (since the overlay is not present in that case). For that, you’ll still need to use embedded captions.

    As for the more advanced features (positioning, CSS, etc), we’ll likely not be implementing those right away, if at all. Chrome hasn’t yet implemented these features either.

  5. Media producer July 19, 2012 - 01:19 EDT

    I’m quite happy to see that the HTML5 standard is being fleshed out with regard to media and media integration.
    While I doubt getting a handle on all these things will get ‘simpler’ at least it might all be in relatively one place/language/format.

    Thanks for the post, nice update!

  6. Adrian July 19, 2012 - 11:28 EDT

    I’m curious whether you’re planning on including webvtt support with the flash version of the v6 player, or just the html5 version?
    Will you also be supporting the more advanced features of webvtt like positioning, tagging, formatting etc?

  7. Timoto July 12, 2012 - 05:21 EDT

    At last it’s starting to roll out.

    Is there any news on what the search engines are actually doing about the new format ?

  8. Timoto July 12, 2012 - 07:03 EDT

    Also, any rumours on how iOS might take up on this ?

  9. Timoto July 17, 2012 - 03:34 EDT

    @PabloS

    As you may know that is my example you are pointing to.

    Whilst it’s nice to hear about the new element, it’s too early to start implementing its use for such limited browser support.

    Before I get involved with this new element I’d like to solve a more current issue with delivering embedded/external captions:

    http://www.longtailvideo.com/support/forums/jw-player/bug-reports/25796/mode-should-fallback-when-accepted-file-type-is-not-found-ff#comment-153584

  10. PabloS July 13, 2012 - 03:21 EDT

    @Timoto -

    No news yet on how the search engines will interpret text track content, but it’s a safe bet to assume that they’ll incorporate them somehow.

    iOS will presumably also support external text tracks at some point. In fact the current version of iOS supports embedded captions within MP4 files (we have an example of this with the Captions plugin).
