Although audio and video have been a popular part of the Web for many years on sites such as YouTube, Dailymotion, and SoundCloud, they've always been second-class citizens, relying on third-party plug-ins (especially Flash) to operate. A reliance on plug-ins isn't a good long-term plan for website owners or browser makers, not least because the user is responsible for keeping them up to date. Surely it's best for audio and video capabilities to be updated along with the browser itself, especially since most browsers have a frequent update cycle nowadays.
Plug-ins can also cause stability issues, as Steve Jobs famously noted when detailing the reasons for not supporting Flash on iOS:
We also know firsthand that Flash is the number one reason Macs crash. We have been working with Adobe to fix these problems, but they have persisted for several years now. We don’t want to reduce the reliability and security of our iPhones, iPods, and iPads by adding Flash.
For these reasons, having audio and video natively in the browser is extremely important, and one of the big pushes in HTML5 has been toward making this happen in the form of a pair of media elements, audio and video, that are widely implemented in browsers today. These elements control playback of audio and video from multiple sources, and in addition to a robust HTML implementation, they also have an extensive API and set of events, giving developers granular control over playback. Being part of the web platform, they interact well with other content, far beyond what was possible with sandboxed plug-ins, making the new media elements true first-class web citizens.
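To give an early sense of that control, here's a minimal sketch; the file name clip.mp4 and the element IDs are placeholders chosen for illustration. It pairs a video element with a few lines of the scripting API and one of its events:

<button id="play">Play</button>
<video id="player" src="clip.mp4" controls></video>

<script>
  var video = document.getElementById('player');

  // Start playback through the scripting API in response to a click.
  document.getElementById('play').addEventListener('click', function () {
    video.play();
  });

  // React to one of the many media events as it fires.
  video.addEventListener('ended', function () {
    console.log('Finished playing ' + video.duration + ' seconds of video');
  });
</script>

The same pattern works for the audio element, which shares the same scripting API and events; we'll cover both in detail shortly.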
After exploring the media elements, I'll take a brief look at what lies ahead for multimedia on the Web, from audio mixing and effects APIs to a possible future of real-time voice, video, and data communication: the WebRTC project.