Trends:


Sound Formats

Many ways of saving the same thing

by Garth Hjelte

How many spoken languages are there on the earth? Google it and you'll find out there are about 6,500. (You'll also find that the most popular is...Mandarin Chinese. Not English? It's #3, a hair behind runner-up Español, which leads by a slim million speakers.)

It doesn't take long to find out that in our little audio world there are many ways to express audio in terms of file formats. We have single-audio formats (WAVE, AIFF, MP3, CAF, and so on) and instrument formats (.nki, .exs, Giga, SoundFont, etc.).

Why so many formats? Don't they all express the same concept?

An audio file just has to contain the sample data plus a couple of common properties (sample rate, channels, bit depth). An instrument file just has to map the samples (LoKey, HiKey, etc.) and provide program parameters (envelopes, LFO, filters). So why have a Kontakt file, SFZ file, MachFive file, HALion file, Structure file, and so on and so on, when they essentially contain the same data? (If you disagree that they do, we'll talk about this later.)
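
To make "a couple of common properties" concrete, here is a minimal sketch using Python's standard wave module. It builds a tiny silent WAVE file in memory and reads the header properties back. The function names and the test values are illustrative assumptions, not part of any format specification.

```python
import io
import wave


def make_silent_wav(frames: int = 441, rate: int = 44100,
                    channels: int = 2, bytes_per_sample: int = 2) -> bytes:
    """Build a small silent WAVE file in memory (illustrative helper)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(bytes_per_sample)
        wf.setframerate(rate)
        # One frame holds one sample for every channel.
        wf.writeframes(b"\x00" * frames * channels * bytes_per_sample)
    return buf.getvalue()


def describe_wav(data: bytes) -> dict:
    """Return the basic properties stored in a WAVE file's header."""
    with wave.open(io.BytesIO(data), "rb") as wf:
        return {
            "sample_rate": wf.getframerate(),
            "channels": wf.getnchannels(),
            "bits_per_sample": wf.getsampwidth() * 8,
            "frames": wf.getnframes(),
        }


info = describe_wav(make_silent_wav())
print(info)
```

Everything else in the file is the sample data itself, which is the point: the descriptive part of an audio file is very small.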

Wouldn't it be nice to have a universal format that all sound editors and instrument players used? Something like "180 JamLoop.sound" or "Super Trumpet.instrument". It certainly is possible. What if some benign third party concocted a standardized way of expressing a sample or an instrument? Think of a solid, eloquent information document that would cover all the common ways audio and instruments express themselves, and that would define an expandable system where new parameters and information could be added for application-specific purposes.

Back to earth.

Exploring the problem

Musicians seem used to the fact that there are many audio and instrument file types, but that doesn't mean they like it. Perhaps we all just put up with it.

Audio formats revolve around two formats: WAVE and AIFF. Both are remarkably similar. In fact, WAVE is actually a flavor of RIFF (Resource Interchange File Format). AIFF means Audio Interchange File Format and was designed by Apple (piece of trivia: Apple discourages anyone from calling the format "AIFF" for some reason). RIFF and AIFF are both based on an earlier standard from Electronic Arts called (guess) IFF. IFF defines generic methods for storing data of any type. RIFF and AIFF are IFF variants used for audio.

(Digidesign's Sound Designer II was the only other significant audio format, but it died a fairly quick death due to its dependency on "resource forks," a Mac-exclusive disk feature that Apple deprecated a few years ago.)

WAVE vs. AIFF is really Microsoft vs. Apple and Intel vs. Motorola in a different form. RIFF/AIFF are very similar; the biggest difference is that WAVE is Intel-format (little-endian, or "backwards") and AIFF is Motorola-format (big-endian, or "forwards"). Otherwise they might as well be the same file.
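
The byte-order difference is easy to see with Python's struct module. This is a minimal sketch assuming plain 16-bit PCM samples; the sample value and the swap_16bit helper are made up for illustration.

```python
import struct

sample = 0x1234  # hypothetical 16-bit sample value

# "<" packs little-endian (Intel order, as in WAVE);
# ">" packs big-endian (Motorola order, as in AIFF).
wave_bytes = struct.pack("<h", sample)
aiff_bytes = struct.pack(">h", sample)


def swap_16bit(data: bytes) -> bytes:
    """Reverse the byte order of every 16-bit sample in a buffer."""
    if len(data) % 2:
        raise ValueError("expected a whole number of 16-bit samples")
    swapped = bytearray(len(data))
    swapped[0::2] = data[1::2]  # low and high bytes trade places
    swapped[1::2] = data[0::2]
    return bytes(swapped)


print(wave_bytes.hex(), aiff_bytes.hex())
```

Swapping the bytes of one form gives the other, which is all an application has to do to move sample data between the two orders.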

It used to be that Windows handled WAVE better than AIFF, and Macs handled AIFF better than WAVE. Applications converted the audio data internally to their native byte order to handle them.

Now, with Macs being Intel, is AIFF on the way out? No. Modern computers are plenty fast enough to handle different byte orders with ease. So the handling is mostly transparent.

Instrument formats are a whole different matter. There are many of them, including Native Instruments Kontakt (.nki plus perhaps six variants), GigaStudio (.gig plus some variants), Tascam Structure (.patch plus four variants), SFZ, SoundFont, MOTU MachFive...gasp gasp gasp...and that's just in the software world.

All these formats essentially do the same thing: define which samples play under what MIDI key (and a variety of other conditions, such as velocity, keyswitch, and others) and supply the settings for a set of real-time parameters (pitch, filter, amplitude). They really aren't that different.
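
Stripped of engine-specific detail, that shared core can be sketched in a few lines of Python. The Zone fields, file names, and key ranges below are hypothetical and do not come from any real sampler format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Zone:
    """One sample plus the conditions under which it plays."""
    sample: str          # audio file the zone points to
    lo_key: int          # lowest MIDI key that triggers it
    hi_key: int          # highest MIDI key that triggers it
    lo_vel: int = 0      # lowest MIDI velocity
    hi_vel: int = 127    # highest MIDI velocity
    root_key: int = 60   # key at which the sample plays untransposed


def zones_for_note(zones: List[Zone], key: int, velocity: int) -> List[Zone]:
    """Return every zone whose key and velocity ranges contain the note."""
    return [z for z in zones
            if z.lo_key <= key <= z.hi_key and z.lo_vel <= velocity <= z.hi_vel]


# Hypothetical two-layer instrument: soft and loud samples over one key range.
piano = [
    Zone("piano_c4_soft.wav", lo_key=55, hi_key=65, lo_vel=0, hi_vel=79),
    Zone("piano_c4_loud.wav", lo_key=55, hi_key=65, lo_vel=80, hi_vel=127),
]

print([z.sample for z in zones_for_note(piano, key=60, velocity=100)])
```

A real format adds envelopes, filters, LFOs, and so on, but those are more fields on the same kind of record.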

When a new software engine is created, typically a new format plus some variants appear that correspond to the design of the engine. So a file format essentially says "this data is the native language of THIS sampler." It is a set of information that corresponds exactly to the data the playback engine needs to keep persistent.

Mostly, the instrument formats are unique but still use WAVE or AIFF files. But sometimes they add a special audio format too. Korg Triton, Alesis Fusion, and Yamaha Motif workstations use a unique audio format. Most "player" format variants like Kontakt, PLAY, Structure, Spectrasonics, HALion, or Garritan ARIA use special large-file formats to encapsulate their encrypted audio data.

Back to the universe

Let's get back to this theoretical "universal instrument format" we were discussing earlier. There have only been three file formats I'm aware of that have been released to the public: SoundFont, DLS, and SFZ.

SoundFont was released for developers to design sounds for the Emu 8000 soundcard chip. DLS had the same general purpose, only it was meant to apply to many things. SFZ was released so people could make sounds for René Ceballos' SFZ Player. SF and SFZ formats were never really released with the intention of having other manufacturers use them for their purposes; DLS was intended to be public, but probably due to red tape and the adoption of SoundFont in the sound card area, it failed to gain any adoption.

This author made some suggestions to the MIDI Manufacturers Association many years ago to expand the SoundFont format to include "registered" parameters, so SoundFont could include a greater range of parameters and thus any engine could use it as its "native" format. None of these ideas got any traction, mainly because what actually gives a concept momentum is a big company with deep pockets adopting a standard and driving it.

One current (and seemingly lone) example of an engine adopting an open format is Garritan's ARIA player, which uses SFZ as its format. But even this is an exception: ARIA still had needs beyond what SFZ defined, so private correspondence was needed to implement them.
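
SFZ files are plain text: headers such as <region> followed by opcode=value pairs. The sketch below is a deliberately simplified reader, not a complete SFZ parser. It assumes sample names without spaces, ignores every header except <region>, and treats only a handful of SFZ 1.0 opcodes as "known." The x_vendor_param opcode is invented here to stand in for an engine-specific extension.

```python
import re
from typing import Dict, List

# A small subset of SFZ 1.0 opcodes, chosen for illustration only.
KNOWN_OPCODES = {"sample", "lokey", "hikey", "lovel", "hivel", "pitch_keycenter"}

EXAMPLE_SFZ = """
// hypothetical two-region instrument
<region> sample=trumpet_c4.wav lokey=55 hikey=65 pitch_keycenter=60
<region> sample=trumpet_c5.wav lokey=66 hikey=77 pitch_keycenter=72 x_vendor_param=1
"""


def parse_regions(text: str) -> List[Dict[str, str]]:
    """Collect the opcode=value pairs of each <region> (simplified)."""
    regions: List[Dict[str, str]] = []
    current = None
    for line in text.splitlines():
        line = line.split("//", 1)[0]  # drop trailing comments
        for token in re.findall(r"<\w+>|\w+=\S+", line):
            if token == "<region>":
                current = {}
                regions.append(current)
            elif token.startswith("<"):
                current = None  # other headers are ignored in this sketch
            elif current is not None:
                name, value = token.split("=", 1)
                current[name] = value
    return regions


def unknown_opcodes(regions: List[Dict[str, str]]) -> List[str]:
    """List opcodes that fall outside the known subset."""
    return sorted({name for region in regions for name in region} - KNOWN_OPCODES)


regions = parse_regions(EXAMPLE_SFZ)
print(len(regions), unknown_opcodes(regions))
```

Anything that lands in the "unknown" list is exactly the kind of extra need a player has to settle outside the published format.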

SFZ, though a great idea, is a mess right now regarding standardization and enforcement. Although it's clear who administers the standard (Cakewalk and René Ceballos), there is no understood method for adding to the format. And although some Cakewalk products have used the so-called "SFZ 2.0," still only SFZ 1.0 has been published. SFZ 2.0 has been documented by a third-party author (Simon Cann, in Cakewalk Synthesizers, Hal Leonard) acting on his own. It is also unclear what the SFZ 2.0 products accept and what they don't.

Stepping back out of the mud and summarizing the situation for the past, present, and future, it seems that every new engine created will have its own file format. There has been no proposal by anyone that approaches a universal standard, plus there is no incentive I know of for manufacturers to use a universal format. SoundFont/DLS is history, and SFZ is in bed with an engine (Dimension) and a large company (Cakewalk/Roland).

Fortunately these engines have become better at reading in other formats, usually the more popular ones, and for the holes and gaps there are third-party utilities that offer conversion. Full disclosure: my company is one of those parties.

New audio standard

Most people are used to WAVE and AIFF, so this really isn't a big problem, but there is a newcomer to the audio format arena: CAF (Core Audio Format). This was created by Apple around 2006. To encourage adoption, Apple converted their Logic 8 content (I believe 30GB of sounds) to CAF format.

CAF attempts to address some of the shortcomings of WAVE and AIFF, which, although minor, should be resolved to encourage modernity in the whole audio system. It's worthwhile to look at the spec to find out the advantages of CAF, which can be viewed at http://developer.apple.com/mac/library/documentation/MusicAudio/Reference/CAFSpec. (Also see the sidebar for CAF's advantages.)

CAF can handle any uncompressed or compressed audio, at any bitrate. Special chunks such as markers, loop info, and application-specific data are defined. While it is true that WAVE and AIFF support these too, CAF feels as if it was designed with them in mind from the beginning.

Here's an additional CAF advantage: CAF files can contain an Overview chunk. Have you ever noticed the little files that result from viewing a sample in SoundForge, Wavelab, Kontakt, or other apps? These files contain information that enables the app to draw the waveform onscreen faster than reading the whole file and calculating it from scratch.

The files are not necessary, but they speed up the app significantly. You can delete these files when they get messy, but why have them in the first place?

CAF files can store Overview information as part of the file, thus allowing better file management on your system for you. This is cool.
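
The idea behind overview data, whether it lives in a sidecar file or inside the audio file, is to store one minimum and one maximum per block of samples so the waveform can be drawn without rescanning everything. The sketch below shows that general technique only. It is an assumption for illustration and does not follow the actual layout of CAF's Overview chunk.

```python
from typing import List, Sequence, Tuple


def build_overview(samples: Sequence[int], block_size: int) -> List[Tuple[int, int]]:
    """Reduce a sample stream to one (min, max) pair per block."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    overview = []
    for start in range(0, len(samples), block_size):
        block = samples[start:start + block_size]
        overview.append((min(block), max(block)))
    return overview


# Seven hypothetical sample values, summarized three at a time.
peaks = build_overview([0, 5, -3, 2, 7, -8, 1], block_size=3)
print(peaks)
```

With a block size of a few hundred samples, the overview is a tiny fraction of the audio data, which is why apps bother to cache it.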

But (and this is a BIG but) CAF has one significant disadvantage: no one is really using it yet, even after a couple of years of existence. AFAIK there is no program as of this writing that can even write CAF files!

Plus CAF files have a certain amount of reliance on Mac APIs, so their usefulness on Windows machines remains to be seen. (APIs, or in this case "codecs," are embedded code routines that compress or decompress sound data streams. Apple's CAF files in their Logic distribution are often compressed to Apple Lossless or Apple Lossy formats, which require Apple APIs to decode them.)

CAF files are a good idea and encourage mature technology. I would hope that software manufacturers would take note and include support for CAF in their programs, but they haven't yet. Perhaps the market is saying "CAF files aren't necessary," and they may not be, since WAVE and AIFF do take care of 99.999% of uses and are thoroughly proven over time.

In my opinion, the biggest drawback of WAVE/AIFF is the 4GB limit, which is a little under seven hours of 16-bit 44.1kHz stereo audio. But I'm sure larger files would have a use.
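
As a rough check on how much audio fits under a 4GB ceiling, the arithmetic is just the size limit divided by the bytes consumed per second. The sketch below assumes uncompressed PCM, ignores header overhead, and takes the limit as 2**32 bytes; the function name is illustrative.

```python
FOUR_GB = 2 ** 32  # assumed ceiling implied by a 32-bit size field


def max_duration_hours(size_limit_bytes: int, sample_rate: int,
                       bit_depth: int, channels: int) -> float:
    """Hours of uncompressed PCM audio that fit in size_limit_bytes."""
    bytes_per_second = sample_rate * (bit_depth // 8) * channels
    return size_limit_bytes / bytes_per_second / 3600


stereo_cd = max_duration_hours(FOUR_GB, 44100, 16, channels=2)
mono_cd = max_duration_hours(FOUR_GB, 44100, 16, channels=1)
print(f"stereo: {stereo_cd:.1f} h, mono: {mono_cd:.1f} h")
```

Raising the sample rate, bit depth, or channel count shrinks the figure proportionally, which is where long multichannel recordings start to run into the limit.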

Closing thoughts

In spite of the drive to move forward, we can count ourselves fortunate that the file formats we use cause us few problems. Although the plethora of instrument formats can be a hassle from time to time, I can say that in most cases there has always been a solution to any problem it's presented.

Garth Hjelte is the president of Chicken Systems, a software company that deals with format translation and also offers other sampler tools.





SIDE BAR - Benefits to CAF (as written in the Apple CAF spec)

Unrestricted file size: Whereas AIFF, AIFF-C, and WAV files are limited in size to 4 gigabytes, which might represent as little as 15 minutes of audio at high sample and bit rates, CAF files use 64-bit file offsets, eliminating practical limits. A standard CAF file can hold audio data with a playback duration of hundreds of years.

Safe and efficient recording: Applications writing AIFF and WAV files must either update the data header's size field at the end of recording, which can result in an unusable file if recording is interrupted before the header is finalized, or they must update the size field after recording each packet of data, which is inefficient. With CAF files, in contrast, an application can append new audio data to the end of the file in a manner that allows it to determine the amount of data even if the size field in the header has not been finalized.

Support for many data formats: CAF files serve as wrappers for a wide variety of audio data formats. The flexibility of the CAF file structure and the many types of metadata that can be recorded enable CAF files to be used with practically any type of audio data. Furthermore, CAF files can store any number of audio channels.

Support for many types of auxiliary data: In addition to audio data, CAF files can store text annotations, markers, channel layouts, and many other types of information that can help in the interpretation, analysis, or editing of the audio.

Support for data dependencies: Certain metadata in CAF files is linked to the audio data by an edit count value. You can use this value to determine when metadata has a dependency on the audio data, and furthermore when the audio data has changed since the metadata was written.


