
Trends: Sound Formats
Many ways of saving the same thing
by Garth Hjelte

How many spoken languages are there on Earth? Google it and you'll find there are about 6,500. (You'll also find that the most widely spoken is... Mandarin Chinese. Not English? It's #3, a close runner-up to Spanish by a slim million speakers.)

It doesn't take long to discover that in our little audio world there are many ways to express audio as files. We have single-sample audio formats (WAVE, AIFF, MP3, CAF, and so on) and instrument formats (.nki, .exs, Giga, SoundFont, etc.).

Why so many formats? Don't they all express the same concept?

An audio file just has to contain the sample data plus a few common properties (sample rate, number of channels, bit depth). An instrument file just has to map the samples (LoKey, HiKey, etc.) and provide program parameters (envelopes, LFOs, filters). So why have a Kontakt file, an SFZ file, a MachFive file, a HALion file, a Structure file, and so on and so on, when they essentially contain the same data? (If you disagree that they do, we'll talk about this later.)

Wouldn't it be nice to have a universal format that all sound editors and instrument players used? Something like "180 JamLoop.sound" or "Super Trumpet.instrument". It's certainly possible. What if some benign third party devised a standardized way of expressing a sample or an instrument? Imagine a solid, well-written specification that covered all the common ways audio and instruments express themselves, and that defined an expandable system to which new parameters and information could be added for application-specific purposes.

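Purely to illustrate that "expandable system" idea, here is a minimal Python sketch built on a made-up data model: a core set of fields any player could read, plus application-specific extensions filed under a namespace the vendor owns, which other readers simply ignore. The class names, field names, and the `com.example.engine` namespace are all hypothetical; no real standard defines them.

```python
from dataclasses import dataclass, field


@dataclass
class Zone:
    """Core mapping data (hypothetical field names) shared by most formats."""
    sample: str    # path to a WAVE or AIFF file
    lo_key: int    # lowest MIDI note (0-127) that plays this sample
    hi_key: int    # highest MIDI note that plays this sample
    root_key: int  # note at which the sample plays at its original pitch


@dataclass
class Instrument:
    name: str
    zones: list[Zone] = field(default_factory=list)
    # Common program parameters that any engine could interpret.
    params: dict[str, float] = field(default_factory=dict)
    # Application-specific data, grouped by a namespace each vendor owns.
    # A reader that doesn't recognize a namespace just skips it.
    extensions: dict[str, dict] = field(default_factory=dict)

    def extension(self, namespace: str) -> dict:
        """Return one vendor's private data, or an empty dict if absent."""
        return self.extensions.get(namespace, {})


trumpet = Instrument(
    name="Super Trumpet",
    zones=[Zone("trumpet_C4.wav", lo_key=58, hi_key=62, root_key=60)],
    params={"amp_attack_s": 0.01, "amp_release_s": 0.3},
    extensions={"com.example.engine": {"breath_noise": 0.2}},
)
print(trumpet.extension("com.example.engine"))  # {'breath_noise': 0.2}
print(trumpet.extension("com.other.player"))    # {}
```

The namespace keeps one vendor's extras from colliding with another's, while the core fields stay readable by every player.
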
Back to earth.

Exploring the problem

Musicians seem used to the fact that there are many audio and instrument file types, but that doesn't mean they like it. Perhaps we all just put up with it.

Audio formats revolve around two formats: WAVE and AIFF. The two are remarkably similar. WAVE is actually a type of RIFF (Resource Interchange File Format) file. AIFF stands for Audio Interchange File Format and was designed by Apple (a piece of trivia: Apple discourages anyone from calling the format "AIFF," for some reason). RIFF and AIFF are both based on an older container format called (guess) IFF, the Interchange File Format that Electronic Arts introduced in 1985. IFF defines a generic way to store "chunks" of data of any type, each tagged with a four-character ID and a length. RIFF and AIFF are IFF variants, and WAVE is RIFF's audio form.

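Because both formats inherit IFF's chunk layout, one small routine can walk either kind of file; only the byte order of the size fields changes. Here's a minimal Python sketch, assuming the whole file is already in memory and only the top-level chunks matter. It relies on the standard layouts: a 12-byte header ("RIFF" or "FORM", a 32-bit size, then the form type "WAVE" or "AIFF"), followed by chunks made of a four-character ID, a 32-bit size, and data padded to an even length. The function names are made up for this example.

```python
import struct


def list_chunks(data: bytes) -> tuple[str, list[tuple[str, int]]]:
    """Walk the top-level chunks of a RIFF/WAVE or FORM/AIFF file.

    Returns the form type ('WAVE' or 'AIFF') and a list of
    (chunk_id, chunk_size) pairs. Chunk contents are skipped, not parsed.
    """
    if data[0:4] == b"RIFF":
        size_fmt = "<I"  # RIFF stores sizes little-endian (Intel order)
    elif data[0:4] == b"FORM":
        size_fmt = ">I"  # IFF/AIFF stores sizes big-endian (Motorola order)
    else:
        raise ValueError("not a RIFF or FORM container")
    form_type = data[8:12].decode("ascii")
    chunks = []
    pos = 12  # first chunk starts right after the 12-byte header
    while pos + 8 <= len(data):
        chunk_id = data[pos:pos + 4].decode("ascii")
        (size,) = struct.unpack(size_fmt, data[pos + 4:pos + 8])
        chunks.append((chunk_id, size))
        # Skip header and data; odd-sized chunks carry one pad byte.
        pos += 8 + size + (size & 1)
    return form_type, chunks


def make_chunk(chunk_id: bytes, payload: bytes, size_fmt: str) -> bytes:
    """Build one chunk (demo helper): ID, size, payload, optional pad byte."""
    pad = b"\x00" if len(payload) % 2 else b""
    return chunk_id + struct.pack(size_fmt, len(payload)) + payload + pad


# Tiny fake files: same structure, opposite byte order for the size fields.
wav_body = make_chunk(b"fmt ", bytes(16), "<I") + make_chunk(b"data", b"\x01\x02\x03", "<I")
wav = b"RIFF" + struct.pack("<I", 4 + len(wav_body)) + b"WAVE" + wav_body
aiff_body = make_chunk(b"COMM", bytes(18), ">I") + make_chunk(b"SSND", bytes(8), ">I")
aiff = b"FORM" + struct.pack(">I", 4 + len(aiff_body)) + b"AIFF" + aiff_body

print(list_chunks(wav))   # ('WAVE', [('fmt ', 16), ('data', 3)])
print(list_chunks(aiff))  # ('AIFF', [('COMM', 18), ('SSND', 8)])
```
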
(Digidesign's Sound Designer II was the only other significant audio format, but it died a fairly quick death because of its dependence on "resource forks," a Mac-specific file-system feature that Apple deprecated a few years ago.)

WAVE vs. AIFF is really Microsoft vs. Apple and Intel vs. Motorola in a different form. RIFF and AIFF are very similar; the biggest difference is byte order. WAVE stores multi-byte numbers in Intel (little-endian, or "backwards") order, and AIFF stores them in Motorola (big-endian, or "forwards") order. Otherwise they might as well be the same file.

It used to be that Windows handled WAVE better than AIFF, and Macs handled AIFF better than WAVE. Applications converted the audio data internally to their native byte order to handle them.

Now that Macs run on Intel processors, is AIFF on the way out? No. Modern computers are plenty fast enough to handle different byte orders with ease, so the conversion is mostly transparent.

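For 16-bit PCM, that internal conversion amounts to swapping the two bytes of every sample. Here's a minimal Python sketch; the function name is hypothetical, and 24- or 32-bit data would instead need the bytes of each 3- or 4-byte sample reversed.

```python
import struct


def swap_16bit_pcm(raw: bytes) -> bytes:
    """Swap each 16-bit sample between little-endian (WAVE) and big-endian (AIFF)."""
    if len(raw) % 2:
        raise ValueError("16-bit PCM data must have an even number of bytes")
    swapped = bytearray(len(raw))
    swapped[0::2] = raw[1::2]  # byte 1 of each sample becomes byte 0
    swapped[1::2] = raw[0::2]  # byte 0 of each sample becomes byte 1
    return bytes(swapped)


samples = [1, -2, 1000]
wave_bytes = struct.pack("<3h", *samples)  # layout inside a WAVE file
aiff_bytes = struct.pack(">3h", *samples)  # layout inside an AIFF file
print(wave_bytes.hex())  # 0100feffe803
print(aiff_bytes.hex())  # 0001fffe03e8
print(swap_16bit_pcm(wave_bytes) == aiff_bytes)  # True
```

The same swap works in both directions, which is one reason the cost is negligible on modern machines.
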
Instrument formats are a whole different matter. There are many of them, including Native Instruments Kontakt (.nki plus perhaps six variants), GigaStudio (.gig plus some variants), Structure (.patch plus four variants), SFZ, SoundFont, MOTU MachFive... gasp, gasp, gasp... and that's just in the software world.

All these formats essentially do the same thing: define which samples play under which MIDI key (and under a variety of other conditions, such as velocity, keyswitch, and so on) and supply the settings for a set of real-time parameters (pitch, filter, amplitude). They really aren't that different.

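The core job all of those formats describe fits in a few lines. This minimal Python sketch assumes a simplified, hypothetical zone model (key range, velocity range, root key) and ignores keyswitches and the other conditions real engines support. It finds the zones a note-on should trigger and computes the equal-tempered playback-rate ratio used to repitch a sample, 2^((note - root) / 12).

```python
from dataclasses import dataclass


@dataclass
class Zone:
    sample: str
    lo_key: int
    hi_key: int
    root_key: int    # note at which the sample plays unshifted
    lo_vel: int = 1
    hi_vel: int = 127


def zones_for(zones: list[Zone], note: int, velocity: int) -> list[Zone]:
    """Return every zone whose key and velocity ranges contain this note-on."""
    return [z for z in zones
            if z.lo_key <= note <= z.hi_key and z.lo_vel <= velocity <= z.hi_vel]


def playback_ratio(zone: Zone, note: int) -> float:
    """Resampling ratio in equal temperament: one octave doubles the rate."""
    return 2.0 ** ((note - zone.root_key) / 12)


piano = [
    Zone("C4_soft.wav", 58, 63, 60, 1, 80),
    Zone("C4_loud.wav", 58, 63, 60, 81, 127),
    Zone("F#4_soft.wav", 64, 69, 66, 1, 80),
]
hits = zones_for(piano, note=62, velocity=100)
print([z.sample for z in hits])               # ['C4_loud.wav']
print(round(playback_ratio(hits[0], 62), 4))  # 1.1225
```
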
When a new software engine is created, typically a new format plus some variants appear that correspond to the design of that engine. So a file format essentially says, "This data is the native language of THIS sampler." It is a set of information that corresponds exactly to the data the playback engine needs to keep persistent.

Mostly, the instrument formats are unique but still use WAVE or AIFF files for the audio. Sometimes, though, they add a special audio format too. Korg Triton, Alesis Fusion, and Yamaha Motif workstations use their own audio formats. Most "player" formats, such as those for Kontakt, PLAY, Structure, Spectrasonics, HALion, and Garritan ARIA, use special large-file formats to encapsulate their encrypted audio data.

Back to the universe

Let's get back to the theoretical "universal instrument format" we were discussing earlier. I'm aware of only three instrument file formats that have been released to the public: SoundFont, DLS, and SFZ.

SoundFont was released so developers could design sounds for the E-mu EMU8000 sound card chip. DLS (Downloadable Sounds) had the same general purpose, only it was meant to apply to many devices. SFZ was released so people could make sounds for René Ceballos' SFZ Player. SoundFont and SFZ were never really released with the intention of having other manufacturers use them for their own purposes; DLS was intended to be public, but probably due to red tape and the adoption of SoundFont in the sound card market, it failed to gain much adoption.

Many years ago, this author made some suggestions to the MIDI Manufacturers Association to expand the SoundFont format to include "registered" parameters, so SoundFont could carry a greater range of parameters and thus any engine could use it as its "native" format. None of these ideas got any traction, mainly because what actually gives a concept momentum is a big company with deep pockets adopting a standard and driving it.

One current (and seemingly lone) example of an engine adopting an open format is Garritan's ARIA player, which uses SFZ as its format. But even this is an exception: ARIA still had needs beyond what SFZ defined, so private correspondence was needed to implement those extras.

SFZ, though a great idea, is a mess right now regarding standardization and enforcement. Although it's clear who administers the standard (Cakewalk and René Ceballos), there is no understood method for adding to the format. And although some Cakewalk products have used the so-called "SFZ 2.0," only SFZ 1.0 has been published. SFZ 2.0 has been documented by a third-party author (Simon Cann, in Cakewalk Synthesizers, published by Hal Leonard) acting on his own. It is also unclear what the SFZ 2.0 products accept and what they don't.

Stepping back out of the mud and summarizing the situation past, present, and future, it seems that every new engine will have its own file format. No one has made a proposal that approaches a universal standard, and there is no incentive I know of for manufacturers to use a universal format. SoundFont/DLS is history, and SFZ is in bed with an engine (Dimension) and a large company (Cakewalk/Roland).

Fortunately, these engines have become better at reading other formats, usually the more popular ones, and for the holes and gaps there are third-party utilities that offer conversion. Full disclosure: my company is one of those third parties.

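Because the formats hold essentially the same mapping data, much of a conversion is translating one set of field names into another. As a rough sketch (not how any particular converter works), here is Python that writes a simple, hypothetical zone list out as SFZ `<region>` headers using the SFZ 1.0 opcodes `lokey`, `hikey`, `pitch_keycenter`, `lovel`, `hivel`, and `sample`. Envelopes, filters, and unit conversions are not covered here.

```python
from dataclasses import dataclass


@dataclass
class Zone:
    sample: str
    lo_key: int
    hi_key: int
    root_key: int
    lo_vel: int = 1
    hi_vel: int = 127


def to_sfz(zones: list[Zone]) -> str:
    """Render key/velocity mapping as SFZ text, one <region> per zone."""
    lines = []
    for z in zones:
        lines.append(
            f"<region> lokey={z.lo_key} hikey={z.hi_key} "
            f"pitch_keycenter={z.root_key} lovel={z.lo_vel} hivel={z.hi_vel} "
            f"sample={z.sample}"
        )
    return "\n".join(lines) + "\n"


print(to_sfz([Zone("C4_loud.wav", 58, 63, 60, 81, 127)]), end="")
# <region> lokey=58 hikey=63 pitch_keycenter=60 lovel=81 hivel=127 sample=C4_loud.wav
```
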
New audio standard

Most people are used to WAVE and AIFF, so this really isn't a big problem, but there is a newcomer to the audio format arena: CAF (Core Audio Format). Apple created it around 2006. To encourage adoption, Apple converted its Logic 8 content (about 30GB of sounds, I believe) to CAF format.

CAF attempts to address some of the shortcomings of WAVE and AIFF, which, although minor, are worth resolving to modernize the whole audio system. The spec, which can be viewed at http://developer.apple.com/mac/library/documentation/MusicAudio/Reference/CAFSpec, is worth reading to learn CAF's advantages. (The sidebar also summarizes them.)

CAF can handle any uncompressed or compressed audio, at any bitrate. Special chunks such as markers, loop info, and application-specific data are defined. While it's true that WAVE and AIFF support these too, CAF feels as if it was designed with them in mind from the beginning.

Here's an additional CAF advantage: CAF files can contain an Overview chunk. Have you ever noticed the little files that appear after you view a sample in Sound Forge, WaveLab, Kontakt, or another app? These files contain information that lets the app draw the waveform onscreen faster than it could by reading the whole file and calculating the display from scratch.

The files aren't necessary, but they speed up the app significantly. You can delete them when they get messy, but why have them in the first place?

CAF files can store Overview information as part of the file itself, which makes for cleaner file management on your system. This is cool.

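The idea behind those overview files (and CAF's Overview chunk) is data reduction: keep only the minimum and maximum sample value for each block of frames, so drawing a screen's worth of waveform means reading a few thousand pairs instead of millions of samples. Here is a minimal Python sketch for mono data; the function name and block size are arbitrary, and it does not reproduce the on-disk layout of any app's peak file or of CAF's chunk.

```python
def compute_overview(samples: list[int], frames_per_point: int) -> list[tuple[int, int]]:
    """Reduce mono samples to one (min, max) pair per block of frames."""
    if frames_per_point <= 0:
        raise ValueError("frames_per_point must be positive")
    overview = []
    for start in range(0, len(samples), frames_per_point):
        block = samples[start:start + frames_per_point]
        overview.append((min(block), max(block)))
    return overview


print(compute_overview([0, 5, -3, 2, 7, -8, 1], 3))  # [(-3, 5), (-8, 7), (1, 1)]
```

For a 10-minute mono file at 44.1 kHz (26,460,000 samples), 512-frame blocks reduce the data to 51,680 pairs, which is why drawing from an overview is so much faster.
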
But, and this is a BIG but, CAF has one significant disadvantage: no one is really using it yet, even after a couple of years of existence. As far as I know, no program as of this writing can even write CAF files!

Plus, CAF files have a certain amount of reliance on Mac APIs, so their usefulness on Windows machines remains to be seen. (The relevant APIs here are codecs: code routines that compress or decompress sound data streams. The CAF files in Apple's Logic distribution are often compressed with Apple Lossless or with lossy formats that require Apple APIs to decode.)

CAF files are a good idea and reflect mature technology. I would hope that software manufacturers would take note and support CAF in their programs, but they haven't yet. Perhaps the market is saying "CAF files aren't necessary," and they may not be, since WAVE and AIFF take care of 99.999% of uses and are thoroughly proven over time.

In my opinion, the biggest drawback of WAVE and AIFF is the 4GB size limit, which works out to only about 6.8 hours of stereo 16-bit, 44.1 kHz audio (about 13.5 hours in mono). I'm sure larger files would have their uses.

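The arithmetic behind that limit is easy to check: WAVE and AIFF describe sizes with 32-bit fields, so a file tops out just under 2^32 bytes (about 4.29 GB). Dividing by the data rate (sample rate times bytes per sample times channels) gives the maximum duration. A quick Python sketch, assuming uncompressed PCM and ignoring header overhead:

```python
LIMIT_BYTES = 2 ** 32 - 1  # largest byte count a 32-bit unsigned size field can hold


def hours_until_limit(sample_rate: int, bit_depth: int, channels: int) -> float:
    """Hours of uncompressed PCM that fit under the 32-bit size limit."""
    bytes_per_second = sample_rate * (bit_depth // 8) * channels
    return LIMIT_BYTES / bytes_per_second / 3600


for rate, bits, chans in [(44100, 16, 1), (44100, 16, 2), (96000, 24, 2), (192000, 24, 8)]:
    print(f"{rate} Hz, {bits}-bit, {chans} ch: {hours_until_limit(rate, bits, chans):.2f} hours")
```

That gives roughly 13.5 hours for mono and 6.8 hours for stereo 16-bit, 44.1 kHz audio, but only about 15.5 minutes for eight channels of 24-bit, 192 kHz audio, which is the kind of case behind the "as little as 15 minutes" figure in the sidebar.
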
Closing thoughts

In spite of the drive to move forward, we can count ourselves fortunate that the file formats we use cause us few problems. Although the plethora of instrument formats can be a hassle from time to time, in most cases there has always been a solution to any problem it has presented.

Garth Hjelte is the president of Chicken Systems, a software company that deals with format translation and also offers other sampler tools.

SIDEBAR - Benefits of CAF (as written in the Apple CAF spec)
Unrestricted file size: Whereas
AIFF, AIFF-C, and WAV files are limited in size to 4 gigabytes, which
might represent as little as 15 minutes of audio at high sample and bit
rates, CAF files use 64-bit file offsets, eliminating practical limits.
A standard CAF file can hold audio data with a playback duration of
hundreds of years.
Safe and efficient recording: Applications
writing AIFF and WAV files must either update the data header's size
field at the end of recording, which can result in an unusable file if
recording is interrupted before the header is finalized, or they must
update the size field after recording each packet of data, which is
inefficient. With CAF files, in contrast, an application can append new
audio data to the end of the file in a manner that allows it to
determine the amount of data even if the size field in the header has
not been finalized.
Support for many data formats: CAF
files serve as wrappers for a wide variety of audio data formats. The
flexibility of the CAF file structure and the many types of metadata
that can be recorded enable CAF files to be used with practically any
type of audio data. Furthermore, CAF files can store any number of
audio channels.
Support for many types of auxiliary data:
In addition to audio data, CAF files can store text annotations,
markers, channel layouts, and many other types of information that can
help in the interpretation, analysis, or editing of the audio.
Support for data dependencies:
Certain metadata in CAF files is linked to the audio data by an edit
count value. You can use this value to determine when metadata has a
dependency on the audio data, and furthermore when the audio data has
changed since the metadata was written.
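To show what the "safe recording" and "data dependencies" points look like at the byte level, here's a minimal Python sketch based on the CAF spec's published layout as I understand it: an 8-byte file header ('caff', version 1, flags 0), then chunks that each start with a four-character type and a signed 64-bit big-endian size. The audio data chunk begins with a 32-bit edit count, and a size of -1 means the data simply runs to the end of the file, which is how a recorder can avoid finalizing the header. The function name is hypothetical, the sketch assumes the whole file is in memory, and it should be checked against Apple's spec before being relied on.

```python
import struct

UNKNOWN_SIZE = -1  # data-chunk size while a recording is still being written


def find_caf_audio(data: bytes) -> tuple[int, int, int]:
    """Locate the audio data in an in-memory CAF file.

    Returns (edit_count, audio_offset, audio_length). All CAF header fields
    are big-endian; chunk sizes are signed 64-bit integers.
    """
    file_type, version, _flags = struct.unpack(">4sHH", data[:8])
    if file_type != b"caff" or version != 1:
        raise ValueError("not a version-1 CAF file")
    pos = 8
    while pos + 12 <= len(data):
        chunk_type, size = struct.unpack(">4sq", data[pos:pos + 12])
        body = pos + 12
        if chunk_type == b"data":
            # The data chunk starts with a 32-bit edit count; the chunk
            # size counts that field as well as the audio.
            (edit_count,) = struct.unpack(">I", data[body:body + 4])
            if size == UNKNOWN_SIZE:
                # Header never finalized: the audio runs to end of file.
                size = len(data) - body
            return edit_count, body + 4, size - 4
        if size < 0:
            raise ValueError("only the data chunk may have an unknown size")
        pos = body + size
    raise ValueError("no data chunk found")


header = b"caff" + struct.pack(">HH", 1, 0)
desc = b"desc" + struct.pack(">q", 32) + bytes(32)  # zeroed placeholder description
payload = struct.pack(">I", 7) + bytes([0, 1, 0, 2, 0, 3])  # edit count 7, then audio

finished = header + desc + b"data" + struct.pack(">q", len(payload)) + payload
unfinished = header + desc + b"data" + struct.pack(">q", UNKNOWN_SIZE) + payload

print(find_caf_audio(finished))    # (7, 68, 6)
print(find_caf_audio(unfinished))  # (7, 68, 6)
```

Both files yield the same answer, which is the point of the -1 convention: an interrupted recording is still readable. A reader could also compare the returned edit count with the one stored in metadata such as an Overview chunk; if they differ, that metadata is stale.
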
