Date: September 6, 2012 Author: Jeroen Wijering

FOMS 2012 Recap: Here Come The Text Tracks!

FOMS (Foundations of Open Media Software) is an annual unconference for media engineers, known for its attitude of getting things done. This year’s edition – held in Paris, France – again had a great mix of attendees representing codec manufacturers, media frameworks, web browsers and video players.


Text Tracks

On the web browser side, the biggest topic was the implementation of the HTML5 <track> element and WebVTT. Together, these can be used to add interactivity and accessibility to video elements. See our previous blog post on text tracks for more info.
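Under the hood, a WebVTT file is plain text: a WEBVTT header followed by blank-line separated cues, each with a start timestamp, an end timestamp and a payload. The TypeScript sketch below parses a simplified subset of the format into cue objects, roughly the information a browser exposes through a track's cue list. It is an illustration under stated assumptions, not a spec-compliant parser: cue settings are discarded, blocks without a timing line are skipped, and the `Cue`, `parseTimestamp` and `parseWebVTT` names are our own.

```typescript
// A parsed cue: optional identifier, start/end in seconds, raw payload text.
interface Cue {
  id?: string;
  start: number;
  end: number;
  text: string;
}

// Convert "hh:mm:ss.ttt" or "mm:ss.ttt" into seconds.
function parseTimestamp(ts: string): number {
  const parts = ts.trim().split(":");
  const seconds = parseFloat(parts.pop()!);
  const minutes = parseInt(parts.pop() ?? "0", 10);
  const hours = parseInt(parts.pop() ?? "0", 10);
  return hours * 3600 + minutes * 60 + seconds;
}

// Simplified parser (an assumption, not the full WebVTT algorithm):
// header block first, then cue blocks separated by blank lines.
function parseWebVTT(source: string): Cue[] {
  const blocks = source.replace(/\r\n?/g, "\n").split(/\n{2,}/);
  if (!blocks[0].startsWith("WEBVTT")) {
    throw new Error("Missing WEBVTT header");
  }
  const cues: Cue[] = [];
  for (const block of blocks.slice(1)) {
    const lines = block.split("\n").filter((l) => l.length > 0);
    const timingIndex = lines.findIndex((l) => l.includes("-->"));
    if (timingIndex < 0) continue; // not a cue (e.g. a comment block)
    const [startRaw, rest] = lines[timingIndex].split("-->");
    const endRaw = rest.trim().split(/\s+/)[0]; // drop any cue settings
    cues.push({
      id: timingIndex > 0 ? lines[0] : undefined,
      start: parseTimestamp(startRaw),
      end: parseTimestamp(endRaw),
      text: lines.slice(timingIndex + 1).join("\n"),
    });
  }
  return cues;
}
```

For example, a file with a numbered cue "1" at 00:00:01.000 --> 00:00:04.000 yields one `Cue` with `id` "1", `start` 1 and `end` 4.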

At FOMS, both Opera and Chrome demoed working text track implementations. For Opera, this functionality will probably ship with version 12.5, while Chrome users will have to wait until version 23. Safari 6 and Internet Explorer 10 will have <track> support too, but Firefox is not actively working on it.

Despite all the progress and working implementations, the WebVTT specification is not yet finished. Outstanding issues include support for roll-up captions (used in live broadcasting) and the ability to store CSS inside a WebVTT file, for players like VLC or Flash that cannot access the webpage's stylesheets. Both items were discussed at length during the workshop, and implementation proposals were filed with the W3C.
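Roll-up captions behave differently from regular cues: instead of replacing the previous caption, each new line is added at the bottom and the oldest line scrolls off once the display is full (CEA-608 roll-up mode uses two, three or four rows). The hypothetical `RollUpWindow` class below sketches only that display behavior, assuming a fixed row count; it says nothing about how such captions should be encoded in WebVTT, which was exactly the open question.

```typescript
// Hypothetical helper: tracks the visible rows of a roll-up caption display.
class RollUpWindow {
  private rows: string[] = [];

  // maxRows is an assumption; roll-up displays typically use 2 to 4 rows.
  constructor(private readonly maxRows = 3) {}

  // Append a new line at the bottom; when the window is full,
  // the top line scrolls off. Returns a copy of the visible rows.
  push(line: string): string[] {
    this.rows.push(line);
    if (this.rows.length > this.maxRows) {
      this.rows.shift();
    }
    return [...this.rows];
  }
}
```

With a two-row window, pushing "a", "b" and then "c" leaves "b" and "c" on screen.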

[Image: default HTML5 captions style]

Beyond Captions

Though captions in themselves are great, HTML5 text tracks can do a lot more. At FOMS, several demos showed applications of WebVTT beyond captions. The demos we presented are listed below.

Note: you need a browser with text track support to see the demos:

  • Preview Thumbs: these thumbnails, known from Hulu and YouTube, pop up when hovering over the seek bar. The thumbs are implemented using a JPG sprite and a WebVTT file whose cues link to the individual thumbs with an xywh media fragment.
  • Chapter Markers: this demo prints chapter markers on an alternative seek bar for the video. When clicking a marker, the browser seeks to the start of that chapter.
  • Slide Syncing: in this demo, related artworks are displayed during certain ranges of the video. This kind of video-page interaction, now easy to implement, has many applications (PowerPoint presentations, sports statistics, etc.).
  • Timeline Search: this demo allows you to search the text tracks of a video to retrieve in-video search results. Widespread use of WebVTT will likely lead to search engines applying this trick on a much larger scale.
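The preview-thumbnail and timeline-search demos both come down to simple lookups over a list of parsed cues. The sketch below assumes thumbnail cues whose payload is an image URL plus a plain pixel xywh media fragment; the helper names (`parseThumb`, `cueAt`, `thumbAt`, `searchCues`) are hypothetical and not taken from the demos' source. A chapter marker works the same way: clicking it would seek the video to the matching cue's start time.

```typescript
// Minimal cue shape: start/end in seconds plus the payload text.
interface Cue {
  start: number;
  end: number;
  text: string;
}

// A region inside the sprite image, taken from "#xywh=x,y,w,h".
interface Thumb {
  url: string;
  x: number;
  y: number;
  w: number;
  h: number;
}

// Parse a payload like "<image url>#xywh=x,y,w,h".
// Assumption: only the plain pixel form is handled ("pixel:"/"percent:" prefixes are not).
function parseThumb(payload: string): Thumb | null {
  const match = /^(.*)#xywh=(\d+),(\d+),(\d+),(\d+)$/.exec(payload.trim());
  if (!match) return null;
  const [, url, x, y, w, h] = match;
  return { url, x: Number(x), y: Number(y), w: Number(w), h: Number(h) };
}

// Cue active at `time`: start inclusive, end exclusive.
function cueAt(cues: Cue[], time: number): Cue | undefined {
  return cues.find((c) => time >= c.start && time < c.end);
}

// Thumbnail to show while hovering the seek bar at `time` seconds.
function thumbAt(cues: Cue[], time: number): Thumb | null {
  const cue = cueAt(cues, time);
  return cue ? parseThumb(cue.text) : null;
}

// Timeline search: start times of cues whose text contains the query (case-insensitive).
function searchCues(cues: Cue[], query: string): number[] {
  const q = query.toLowerCase();
  return cues.filter((c) => c.text.toLowerCase().includes(q)).map((c) => c.start);
}
```

The player would then crop the sprite to the returned x, y, w and h (for instance as a CSS background position), and jump to one of the times returned by `searchCues` when the user picks a search result.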

[Image: thumbstrip example]

Other Developments

Another interesting subject at FOMS was the implementation of adaptive streaming in JavaScript. A team of Chrome engineers presented a new demo of the Media Source API, which allows appending arbitrary chunks of video to an already playing stream. As we noted in our HTML5 progress update last year, web developers can use this API to build adaptive streaming applications using fragmented MP4 (fMP4) or WebM video fragments (MPEG-TS support may be coming too, for anyone interested in building HLS support in HTML5). The API can be enabled in Chrome 23 via chrome://flags.
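The core decision in a JavaScript adaptive streaming client is which quality level to request next. A minimal sketch, assuming a hypothetical list of renditions with known bitrates and a throughput estimate taken from the last downloaded fragment: pick the highest bitrate that fits under a safety margin. The fragment it selects would then be fetched and appended through the Media Source API; that step is left out here because the API was still experimental. The `Rendition` type, the function names and the 0.8 safety factor are all assumptions for illustration.

```typescript
// Hypothetical manifest entry: one encoded version of the video.
interface Rendition {
  bitrate: number; // bits per second
  url: string;
}

// Throughput estimate from the last fragment download, in bits per second.
function measureBandwidth(bytes: number, seconds: number): number {
  return (bytes * 8) / seconds;
}

// Pick the highest-bitrate rendition that fits within `safety` times the
// measured bandwidth; fall back to the lowest rendition if none fits.
function chooseRendition(renditions: Rendition[], bandwidth: number, safety = 0.8): Rendition {
  const sorted = [...renditions].sort((a, b) => a.bitrate - b.bitrate);
  let choice = sorted[0];
  for (const r of sorted) {
    if (r.bitrate <= bandwidth * safety) {
      choice = r;
    }
  }
  return choice;
}
```

For example, a 250,000-byte fragment downloaded in one second measures 2,000,000 bps; with the 0.8 margin, a 1,500,000 bps rendition is chosen over a 3,000,000 bps one. The margin trades some quality for fewer stalls when bandwidth fluctuates.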

Many other interesting developments, like the new Opus audio codec and the Encrypted Media Extensions (EME) content protection proposal, were also covered. The FOMS website contains detailed notes for all of our sessions. In short, FOMS 2012 made clear that a lot is going on in the open media scene, and many great tools are yet to come.


    • AlanKelly VerbatimIT

      October 18, 2012 at 2:22 pm

      Thank you all at LongTail, Jerome and others. I appreciate your work.

      Addressing the objective of producing true and accurate transcription, it’s not voice-to-text software that is really getting the job done, but skilled, practiced, dedicated humans listening and keying using whatever tools are at their disposal (CAT, terms databases, etc.).

      What would be helpful to “us” is a means of inserting the appropriate timed-text file so that the HTML5 can do its stuff.

      In truth and accuracy,

      Alan Kelly

    • Andy Freed

      October 18, 2012 at 2:38 pm

      We caption a lot of education videos, and timed text currently fails to convey some important information in math and science videos. If WebVTT can be modified with CSS, can it also be modified by other JS? It would be great to be able to include MathML snippets in the <track> and have either the math player in the browser read it with context, or have another JS library like MathJax render the equation properly as part of the caption.

    • JeroenW

      October 19, 2012 at 7:36 am

      @ALANKELLY: Agreed. Voice-to-text software can serve for producing a draft, but humans are needed to turn this into actual high-quality captions.

      @ANDY: This should indeed be possible, since CSS can be used to style captions and JavaScript can be used to manipulate them. If you have a browser with text track support, you could already test this. There are probably some tutorials out there that describe the CSS and JS needed for track cue manipulation.
