Audio sprites

April 13th, 2011. Tagged: ffmpeg, mobile, Music, performance

Another "brilliant" idea that I had recently - how about combining audio files into a single file to reduce HTTP requests, just like we do with CSS sprites? Then use the audio APIs to play only selected parts of the audio. Unlike pretty much all brilliant ideas I have, I decided to search for this one before I dive in. Turned out Remy Sharp has already talked about this. So I knew it was possible and wanted to check the server-side or things. (Remy is amazing, by the way, and I was happy to have him as a reviewer of "JavaScript Patterns")

Here's the demo - parts of Voodoo Chile covered by yours truly.

Playing separate files

Markup is a few audio elements:

<audio id="in">
  <source src="in.mp3">
  <source src="in.ogg" type="video/ogg">
</audio>
<audio id="1">
  <source src="1.mp3">
  <source src="1.ogg" type="video/ogg">
</audio>
<audio>
  ...

I have the files:

  1. in.mp3 - intro
  2. 1.mp3 - figure 1
  3. 2.mp3 - figure 2
  4. out.mp3 - the end

1 and 2 repeat a few times.

I play these files in JavaScript using a next() iterator function, which contains (in a private closure) the melody (which file after which) and a pointer to the current file being played. After play()-ing each audio, I subscribe to the "ended" event and play the next() file.

var thing = 'the thing';
var next = (function() {  
    //log('#: file');
    //log('-------');
    var these = ['in', '1', '2', '1', '2', '1', '2', '1', 'out'],
        current = 0;
    return function() {
        thing = document.getElementById(these[current]);
        //log(current + ': ' + these[current] + ' ' + thing.currentSrc);
        thing.play();
        
        if (current < these.length - 1) {
            thing.addEventListener('ended', next, false);
            current++;
        } else {
            current = 0;
        }        
    }
}());

That was easy enough. And worked (except on iPhone, see later). But we should be able to do better with sprites.

Sprite

For the sprite I have this audio:

<audio id="sprite">
  <source src="combo.mp3">
  <source src="combo.ogg" type="video/ogg">
</audio>

There's only one file - combo.mp3 which contains all the other four files played one after the other.

So we need to know the start and the length of each piece of audio. There are two parts to playing the sprite. First is knowing the lenghts and the "song" (meaning the succession of audios) and starting to play:

var sprites = {
      // id: [start, length]
       'in': [0, 3.07],
        '1': [3.07,  2.68],
        '2': [3.07 + 2.68,  2.68],
        out: [3.07 + 2.68 + 2.68, 11.79]
    },
    song = ['in', '1', '2', '1', '2', '1', '2', '1', 'out'],
    current = 0,
    id = song[current],
    start = 0,
    end = sprites[id][1],
    interval;
 
thing = document.getElementById('sprite');
thing.play();

Next is "listening" and stopping when one audio should be stopped, then seeking through the file and playing another part of it. This is done with a setInterval(), I couldn't find a better audio event to listen to.

// change
interval = setInterval(function() {
    if (thing.currentTime > end) {
        thing.pause();
        if (current === song.length - 1) {
            clearInterval(interval);
            return;
        }
        current++;
        
        id = song[current];
        start = sprites[id][0];
        end = start + sprites[id][1]
        thing.currentTime = start;
        thing.play();
        log(id + ': start: ' + sprites[id].join(', length: '));
    }
}, 10);

And this is it. The property currentTime is read/write - you can figure out where we are and also fast-forward or rewind to where you want to go.

Results

  • Sprites play fine in FF, Chrome, O, Safari, iPhone's mobile webkit.
  • I haven't tested IE9.
  • All the browsers played y stuff off by a few milliseconds, I think some early, some late. This is probably due to unreliable setTimeout(). Also I didn't cut the audio pieces very well, so that might have someting to do. Also adding a few milliseconds of silence between the sprites may help. A follow up experiment will be to have a piano of sorts and see how timely the audio is played after a click/button press.
  • iPhone didn't play properly the non-sprited verison - I believe because it won't let you autoplay unless there's a user action. There might be a workaround, but I only cared about the sprites and they are fine!

Server side

I was imagining the whole thing as a combo service like YUI's JS/CSS combo handler. The browser says: i need these 5 files, the server then creates a new audio file and sends it back. In this case it should also somehow send the data about start/length of each audio, so maybe a JSONP thing. I was mostly curious about those file formats and how the stitching would work.

In terms of file formats, it's not that bad, turns out all I need is MP3 and OGG in order to support all these browsers (I was prepared for worse).

(I could also probably support IE3? and above with a <bgsound> and a WAV, but the WAV is too big to be practical. So any IE (before 9) enthusiasm should probably end up in Flash.)

I recorded my audio pieces in Garage Band and exported as MP3.

ffmpeg is teh tool! It's like imagemagick for audio/video.

Cutting out extra 4-5 seconds Garage Band adds to each file you export:

$ ffmpeg -i in.mp3 -ss 0 -t 2.43 in-ok.mp3

(I didn't do that very precisely I think)

Converting MP3 to OGG is like:

$ ffmpeg -i in.mp3 -acodec vorbis -aq 60 -strict experimental in.ogg

Then the stitching.

MP3 files can actually be concatenated together just like JS/CSS, provided they have the same bitrate. I've done it in the past.

You can also combine by reading the files and piping them into ffmpeg. That somehow feels better:

$ cat in.mp3 1.mp3 2.mp3 out.mp3 | ffmpeg -i - combo.mp3

You can also consider putting a bit of silence between the separate audios.

In order to get the length info to return it to the client, you can use ffmpeg -i filename.mp3 (I haven't done that part)

OGGs cannot be concatenated like MP3, so the combo service should `cat` the mp3s as shown above, then convert to OGG (also shown 🙂 )

Voila

You can now roll your own on-demand audio combo handler and use audio sprites to have fewer HTTP requests a more responsive app/game/html5 thing.

Tell your friends about this post on Facebook and Twitter

Sorry, comments disabled and hidden due to excessive spam.

Meanwhile, hit me up on twitter @stoyanstefanov