(This section is drawn almost entirely from Richard Griscom’s ‘Digital Audio in the Library’.)
Once audio has been sampled and converted to digital data, it can be processed and stored in a number of different formats. During the early development of digital audio, sound engineers devised formats for sampling and storing audio data that met the particular requirements of whatever operating system they happened to be using, and as a result, multiple formats emerged for the storage of digital audio.
During the 1980s, with the advent of the personal computer, microprocessors increased in speed and capacity; then, during the 1990s, network access became commonplace, and new formats were developed to make the most of these technological advances and to meet emerging needs for compressed and streaming audio. As the internet grew, formats gained toeholds and companies staked out ground for their own applications (WMV, Real, QT, etc.). Instead of the technology settling down to one or two established formats—as has happened with audio and video media in the past—the number of formats has increased rather than decreased.
A regular user of the internet confronts dozens of media formats for audio and video, and at this point it seems doubtful that any one format will prevail. Fortunately, today’s software can play files in most of the standard formats, and files can be easily converted from one format to another.
In this section, we will review the formats used to capture, encode, and store digital audio. Of the dozens of audio formats that have been developed through the years, many are now used only infrequently, so we will look only at those that are likely to have some application in a library setting.
Formats and different versions of formats can become confusing. The same terms are sometimes applied to the codec—the computer algorithm that compresses and decompresses the audio—the audio ‘container’, the file format used to store the compressed audio data, and the player that plays back the resulting audio file. For example, the Windows Media Encoder can be used to create compressed Windows Media Audio data stored in a Windows Media Audio (.wma) file, which can be played back using a number of different players, including the Windows Media Player. In discussing digital audio, I will try to maintain distinctions in the terminology used for file formats, compression/decompression algorithms, software, and players. Here is a summary of the terminology used here:
digital audio data: the binary data that represents the audio
digital audio format: the format of the digital audio data
codec: a computer algorithm used to compress and decompress digital audio data in a particular audio format
digital audio file: a file containing digital audio data
digital audio file format: the format of a digital audio file
The most common area of confusion lies in the term “format.” A distinction should be made between the format of the digital audio file and the format of the digital audio that the file contains. Think of a pitcher containing a beverage: the pitcher is like the audio file, and the beverage is like the audio data it contains. The type of pitcher (round or octagonal; plastic or glass) corresponds to the file format, and the type of beverage (lemonade, iced tea, margaritas) corresponds to the audio format. The file format and the audio format are different concepts, and they exist independently of each other.
An audio file consists of several parts: a header, the audio data, and, optionally, metadata and a wrapper. The header provides information about the data in the file—the sampling rate, number of channels, bit depth, and similar technical specifications. The audio data—the bits representing the samples taken of the audio—make up the bulk of the file. Audio files may also include metadata—text describing the content of the audio file (performer, copyright information, track name, source album, etc.)—and a wrapper, which controls use of the file. Digital rights management and streaming capability, for example, are usually provided by a wrapper.
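The header of a WAV file (an uncompressed format discussed below) makes this structure concrete. The following is only a sketch: the field layout follows the published RIFF/WAVE chunk structure, but `read_wav_header` is a made-up helper name, not part of any library.

```python
import struct

def read_wav_header(path):
    """Parse the RIFF header and 'fmt ' chunk of a WAV file.

    Returns the technical specifications the header carries:
    channel count, sampling rate, and bit depth.
    """
    with open(path, "rb") as f:
        riff, size, wave_tag = struct.unpack("<4sI4s", f.read(12))
        if riff != b"RIFF" or wave_tag != b"WAVE":
            raise ValueError("not a RIFF/WAVE file")
        # Walk the chunks until we find 'fmt ', which holds the specs;
        # other chunks (audio data, metadata) are skipped over.
        while True:
            chunk_id, chunk_size = struct.unpack("<4sI", f.read(8))
            if chunk_id == b"fmt ":
                fmt, channels, rate, _, _, bits = struct.unpack(
                    "<HHIIHH", f.read(16))
                return {"channels": channels,
                        "sample_rate": rate,
                        "bit_depth": bits}
            f.seek(chunk_size, 1)
```

Reading a CD-quality stereo WAV file with this sketch would report 2 channels, a 44,100 Hz sampling rate, and a 16-bit depth.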
Some digital audio formats are open standard, which means that the specifications of the format—in practical terms, how the data is structured and the algorithms used to encode and decode it—are freely available, and use of the format is free of legal restrictions. Open formats are usually maintained by a national or international standards organization. Advocates argue that use of open formats will help guarantee long-term access to data and encourage cooperative development of the formats.
A good example of a container file format is QuickTime, which can hold audio data in a number of different formats—for example, MP3 data or AAC data.
Other formats are proprietary; for these, a private concern—usually a commercial enterprise—maintains control over the format and the release of details on its structure, encoding, and decoding. In many cases, the owner of the format will release information on the structure of the file and how it is encoded but retain rights over the decoding algorithm. Owners of proprietary formats are interested in promoting use of their format and often take actions to discourage the use of competing formats.
Some proprietary formats are actually based on open formats. Apple, for example, sells tracks on its iTunes Music Store in a proprietary format that uses AAC-encoded audio (an open format) with a proprietary digital rights management wrapper that restricts use of the file.
Many popular, well-established formats are proprietary, and many of us choose to base our digital audio services on proprietary formats because they are familiar to patrons and software to play back the files is readily available—sometimes even packaged with the computer’s operating system, a bundling practice for which Microsoft was famously sued.
There are some risks, however, in basing audio services on proprietary formats. Support can be very good until the sponsoring company abandons or alters the format. Companies often promote their own proprietary audio formats to the detriment of others in the hope of securing a greater market share, and they make adoption of their format attractive by offering convenient tools for encoding sound in the formats. Often proprietary formats are developed for specific hardware and software, which places limits on the playback options for listeners. For these reasons, a proprietary format that works well on one operating system may present problems on another.
There are two broad classifications of audio formats: uncompressed and compressed. For uncompressed formats, the audio data consists of the digital audio samples as they were originally captured, at their original bit depth. Uncompressed formats do the best job of capturing and reproducing sound, and for that reason they are used extensively in the recording, mastering, and storage of digital audio. Let’s review the most common formats for uncompressed digital audio.
Pulse Code Modulation (PCM) is the process most often used to transmit and store uncompressed digital audio data. Most uncompressed digital audio file formats—including WAV, AIFF, and CDDA—use PCM as the format for the audio data. PCM is not new technology; it was developed in 1937 by British engineer Alec Reeves while working for International Telephone and Telegraph.
When an analog-to-digital converter translates analog audio samples into binary “words,” it uses PCM to transmit the individual bits of the words as voltages (“1” as a positive voltage; “0” as the absence of voltage), which can then be reconstituted as binary data for storage in a computer file or on a compact disc. PCM is the audio equivalent of ASCII text; because of its simplicity, most audio programs can play it. PCM can accommodate a number of different resolutions (8-, 16-, and 24-bit depths are common), sampling rates (usually between 22 kHz and 96 kHz), and channel configurations (for example, mono, stereo, and 5.1 surround sound).
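As an illustration of the quantization step (a sketch, not production code; `to_pcm16` is a made-up helper): an analog sample value in the range −1.0 to 1.0 becomes one 16-bit binary word, and it is the bits of that word that PCM transmits as voltage pulses.

```python
def to_pcm16(sample: float) -> int:
    """Quantize an analog sample in [-1.0, 1.0] to a signed
    16-bit PCM value (one of 65,536 possible levels)."""
    sample = max(-1.0, min(1.0, sample))   # clip out-of-range input
    return int(round(sample * 32767))      # scale to the 16-bit range

# A full-scale peak, silence, and a half-amplitude sample:
words = [to_pcm16(1.0), to_pcm16(0.0), to_pcm16(0.5)]   # [32767, 0, 16384]

# PCM then transmits each word bit by bit: "1" as a positive
# voltage, "0" as the absence of voltage.
bits = format(words[2] & 0xFFFF, "016b")   # "0100000000000000"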
The .au file format—“au” is short for “audio”—was developed by Sun for use with telephone transmissions processed by Unix computers, and it became one of the earliest formats commonly used for audio files on personal computers. It is now primarily of historical interest. The extension .snd is used for files in this format on Sun, NeXT, and Silicon Graphics computers. Although .au files usually contain PCM audio, the format can also handle several compressed formats.
The Audio Interchange File Format (AIFF) was developed by Apple for use with the Macintosh, but it is recognized by a number of Windows and Linux audio editing programs as well. AIFF accommodates uncompressed PCM audio with a variety of channels, sampling rates, and resolutions.
WAVE (Waveform Audio) is a proprietary file format developed by Microsoft for use in Windows 3.1. It is actually a variant of the RIFF bitstream format and is a “wrapper” format capable of containing audio data of various types, including compressed audio data. The default (and most common) type of data contained in a WAVE file is PCM data, which can be accommodated in a variety of channels at various sampling rates and resolutions.
WAVE is the format most frequently used in Windows operating systems for uncompressed audio. Many compact disc “ripping” applications store the resulting raw data in WAVE format, so it is often used as an intermediate format when preparing compressed audio for streaming.
Because uncompressed audio files are so large—about 10 MB of storage for every minute of CD-quality audio—they are impractical for streaming and downloading over the internet. For network use, audio files are “compressed” to reduce their size, allowing for quicker downloads and real-time streaming.
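The "about 10 MB per minute" figure follows directly from the compact disc's specifications (44.1 kHz sampling, two channels, 16-bit samples). A quick back-of-the-envelope calculation (`uncompressed_bytes` is simply an illustrative helper):

```python
def uncompressed_bytes(seconds, sample_rate=44100, channels=2, bit_depth=16):
    """Storage required for raw PCM audio at the given specifications."""
    return seconds * sample_rate * channels * (bit_depth // 8)

# One minute of CD-quality audio:
per_minute = uncompressed_bytes(60)      # 10,584,000 bytes
megabytes = per_minute / (1024 * 1024)   # just over 10 MB
```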
Computers compress and decompress audio data by using software called a codec (COmpress/DECompress). Sometimes the term “codec” is used interchangeably with “audio format,” but there is an important difference: a codec is software that is used to interpret an audio format. In fact, in some cases several different codecs exist to compress and decompress a single audio format.
With compression, there is a tradeoff between file size and sound quality. Codecs that provide high levels of compression discard parts of the original audio to reduce the amount of data. The more data that is discarded, the smaller the audio file, but the loss of data also results in a degradation in sound quality.
Audio compression formats fall into three groups: formats defined by international standards (such as MPEG), proprietary formats (such as Windows Media and RealAudio), and open-source formats (such as Ogg Vorbis).
It is important to select the compression format that best meets your particular needs, and those needs often concern more than audio quality. The projected longevity of the format, its market share, its technical support, the requirements and limitations it imposes on hardware and software—all of these can be just as important as sound quality.
Some compression formats are able to reduce the size of an audio file without discarding any data. This is lossless compression. When the resulting compressed file is decompressed, it is identical to the original uncompressed audio file. Lossless compression can be used to distribute and archive digital audio, and digital players can decode the most common lossless audio formats for playback. The rate of reduction generally varies between 25 percent and 50 percent, depending on the content of the source file.
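The defining property of lossless compression (bit-identical data after a round trip) can be demonstrated with any general-purpose lossless compressor. The sketch below uses Python's zlib, which is not an audio codec—FLAC and its peers exploit audio-specific redundancy—but the round-trip principle is exactly the same:

```python
import zlib

# A stand-in for uncompressed PCM audio data (256,000 bytes):
original = bytes(range(256)) * 1000

compressed = zlib.compress(original, level=9)
restored = zlib.decompress(compressed)

# Lossless: the decompressed data is identical to the original.
assert restored == original
ratio = 1 - len(compressed) / len(original)  # fraction of space saved
```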
Because they reduce the size of an average compact-disc audio file by no more than 50 percent, however, lossless compression formats are generally impractical for use in streaming—at least over networks slower than 600 kbps. Their primary application is in the archiving of master recordings, where it is essential both to preserve content and to save storage space. With lossless compression, if the original media is lost or damaged, an exact duplicate of the original can be recovered at any time.
Most lossless encoders offer various levels of compression. The tradeoff is between file size and the amount of time required to encode a file; higher compression comes at the cost of speed. Often the encoding software will offer guidance in selecting an appropriate compression level to suit your needs.
The Free Lossless Audio Codec (FLAC) was developed by the Xiph.org Foundation and is a free, open-source format that has no restrictions on use and no licensing fees. There is also a metadata component: a “cue sheet” metadata block can be used to store a compact disc’s track listing and index points. FLAC can be used with any PCM data with bit depths from 4 to 32, sampling rates from 1 Hz to 1 MHz, and one to eight channels. Typical compression rates run between 30 and 50 percent. A technical strength of FLAC is its ability to be decoded quickly, which makes it suitable for streaming over fast networks. FLAC data is often contained in Xiph.org’s Ogg file format.
WavPack (extension .wv)
As its name suggests, WavPack is used to compress WAV files, and it can accommodate files with multiple channels, at sampling rates from 6 to 192 kHz, and at 8-, 16-, 24-, and 32-bit resolution. The compression rate ranges from 30 percent to 70 percent, depending on the source file.
The WavPack encoder offers options for both lossless and lossy compression as well as a “hybrid” mode, which creates a lossy compressed file and a second “correction” file that can be used to restore the compressed file to its original lossless state. The encoder is available in versions for Windows, Linux, and Mac OS X. All versions are run from a command line, but an optional Windows interface is available.
The three major proprietary formats—Windows Media, RealAudio, and QuickTime—now offer lossless codecs for use with archiving as well as streaming. The usual caveats related to proprietary formats apply here as well: the sponsoring companies may offer encoding software, and their popular media players (Windows Media Player, RealPlayer, QuickTime Player) may handle playback, but continued support depends entirely on the sponsoring company.
The most common compressed audio formats use “psychoacoustic models” to discard audio data that cannot be heard or that is typically ignored by the human ear. By eliminating this data, a file can be reduced in size while minimizing the effect on the sound. These formats that selectively discard data are known as lossy formats, and they can produce files that are anywhere from one-fourth to one-thirtieth the size of the original uncompressed audio, with a corresponding degradation in fidelity.
Among the lossy formats, those based on MPEG standards are the most popular. MPEG is a suite of open standards for compressed audio and video developed by the Moving Picture Experts Group, a working group established in 1988 under the direction of the International Organization for Standardization (ISO).
MPEG’s standards have been released in families, each designated by number. MPEG-1 (approved in 1992) supports video encoding as well as mono and stereo audio encoding at three sampling rates; MPEG-2 (1994) increases the number of sampling rates and provides for broadcast-quality video and surround sound; MPEG-4 (1998) supports a broad range of multimedia. Of the many audio formats provided by the MPEG standards, the most common are MP3 and AAC.
MP3, officially known as MPEG-1 Audio Layer III, is an audio subset of the 1992 MPEG-1 standard. (Layer III also received some enhancements in the MPEG-2 standard.) MP3 files were front and center in the digital music revolution of the 1990s and gained notoriety through their open sharing on peer-to-peer networks.
The MP3 format has been particularly popular because it can produce “near CD” quality audio at a compression rate of 11 to 1. In other words, one minute of compact-disc audio, which requires about 10 MB of storage, can be compressed to an MP3 file smaller than 1 MB. Despite the development of compression formats that produce better sound quality at identical bitrates—such as AAC and Ogg Vorbis—MP3 remains the most popular audio format on the internet, and it has become the lingua franca of personal digital audio players.
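The 11-to-1 figure is easy to verify from the bitrates involved: CD audio streams at 44,100 samples per second × 16 bits × 2 channels, while a typical "near-CD" MP3 is encoded at 128 kbps.

```python
cd_bitrate = 44100 * 16 * 2     # 1,411,200 bits per second for CD audio
mp3_bitrate = 128_000           # a common "near-CD" MP3 encoding rate
ratio = cd_bitrate / mp3_bitrate
# ratio is just over 11, matching the 11-to-1 compression figure
```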
Although the specifications of the MPEG standards are open and freely available, the Fraunhofer Institute and Thomson Multimedia—the companies that helped finance the development of the standards—hold patents on many of the algorithms used to code and decode MPEG files. In 1998, when the Fraunhofer Institute issued a letter stating that it would begin charging royalties to developers of MP3 encoders, some distributors removed MP3 codecs from players, and some developers decided to begin work on truly open formats, such as Ogg Vorbis (see below).
Advanced Audio Coding (AAC) was developed under MPEG-2 and enhanced under MPEG-4. In the MPEG family of standards, AAC is the heir apparent to MP3. Until the introduction of AAC, MPEG audio formats were “backward compatible,” which means that files created with earlier standards could be played with decoders for the newer standards. With the introduction of AAC, MPEG abandoned backward compatibility in order to take advantage of newer coding algorithms and took the practical precaution of assigning the format a name that would distinguish it from its “MP” predecessors.
AAC provides better sound quality than MP3 — particularly at lower bit rates — and it supports sampling rates from 8 kHz to 96 kHz, compared to MP3’s 16 kHz to 48 kHz. One claim to fame for the MPEG-4 AAC format was its adoption by Apple as the basis for the audio format used by its iTunes music store. In fact, because of the close association of AAC with the iPod, it is often mistakenly assumed that AAC stands for “Apple Audio Codec.”
Files with the extension .aac are MPEG-2 AAC files; only a few audio players are able to support these files. AAC audio data is more frequently contained in an MPEG-4 file (similar in structure to a QuickTime file), which is supported by most popular audio players. A number of confusing file extensions are applied to MPEG-4 files, and their interpretation can be challenging. Although the official MPEG-4 file extension is .mp4, this extension is not found as frequently as the ones applied by Apple for use with the iPod and iTunes: .m4a (“MPEG-4 audio”) is used for files ripped using iTunes, .m4p (“MPEG-4 protected”) is used for files purchased on the iTunes Music Store (the “protected” refers to embedded digital rights management), .m4b (“MPEG-4 bookmarkable”) is used for audio book files that can be “bookmarked,” and .m4v (“MPEG-4 video”) is used for audio/video files.
Although AAC is a part of the open MPEG standards, the situation with licensing is similar to the one with MP3: the patent rights to the codecs used with AAC are held privately—in this case by AT&T, Dolby, the Fraunhofer Institute, and Sony. Also, the developers who incorporate AAC codecs into their software must pay royalties to the patent holders.
Windows Media Audio (WMA) was introduced by Microsoft in 1999 as a competitor to MP3, and while it was slow to catch on at first, its popularity has increased in recent years. Several online music stores—including Napster—use WMA (with digital rights management) as the basis of their service, and a growing number of portable digital players support the format. WMA files are usually wrapped in an Advanced Systems Format (ASF) file, a fully documented format that provides streaming capability.
Microsoft’s Windows Media offerings include an encoder (Windows Media Encoder), various software development kits, and a player (Windows Media Player). There are both lossy and lossless codecs available for WMA.
The earliest live audio offerings on the web were radio broadcasts streamed using RealAudio, introduced by Progressive Networks (now RealNetworks) in 1995. This new format—and technology—led to the rapid growth of streaming audio and video webcasts during the late 1990s. With the subsequent development of competing formats, RealAudio’s market share has deteriorated, but it is still a popular choice for streaming radio broadcasts, and it is still the format of choice for streaming digital audio reserves in music libraries. One advantage of RealAudio is its support of SMIL files, which allow a series of audio files to be played consecutively without prompting from the user. This feature is particularly useful with longer works, such as operas and multi-movement works, which are typically divided into multiple tracks on compact disc recordings.
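A SMIL playlist is itself just a small markup file. The sketch below (the file names are hypothetical) uses SMIL's `seq` element to play the movements of a symphony in sequence, without prompting from the user:

```xml
<smil>
  <body>
    <seq>
      <!-- Each movement begins as soon as the previous one ends -->
      <audio src="symphony-mvt1.rm"/>
      <audio src="symphony-mvt2.rm"/>
      <audio src="symphony-mvt3.rm"/>
    </seq>
  </body>
</smil>
```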
RealNetworks applications in support of RealAudio include a player (RealPlayer), an encoder (RealProducer), and a streaming server (RealServer). In July 2002, RealNetworks launched Helix, an open-source initiative that builds on programming code released by the company. The Helix Community currently offers a player (Helix Player), encoder (Helix Producer), and server (Helix Server), all of which are developed to support RealAudio.
QuickTime, developed by Apple Computer, is a popular format for streaming video and multimedia presentations encoded in various formats. The first version was released in December 1991, and Apple initially used QuickTime to provide video, graphics, and audio content on CD-ROMs. It remains the most popular format for CD-ROM video. In fact, Apple was fairly late in offering streaming capability for QuickTime, which was not made available until the release of Version 4 in June 1999—four years after the introduction of RealAudio.
QuickTime is a “container” format that is particularly useful for synchronizing the content of numerous multimedia files, which may be stored in different locations. The process of editing a multimedia presentation in QuickTime is much simpler than in other formats, and because of this facility, MPEG adopted the QuickTime .mov format as the basis for MPEG-4 in 1998. In an odd twist, Apple held off on incorporating the resulting MPEG-4 standard into QuickTime following a dispute with the MPEG-4 license holders over licensing fees. The two parties reached a compromise, and QuickTime 6 was released in July 2002. As of this writing, the latest release is Version 7.0.3.
Ogg Vorbis is a free and open audio format developed and maintained by Xiph.org. Chris Montgomery began the Ogg Vorbis project at the Massachusetts Institute of Technology soon after the Fraunhofer Institute announced in September 1998 that it would begin charging licensing fees for use of the MP3 format.
It is a fairly new format; the specifications were established in May 2000. Strictly speaking, Ogg is a file format and Vorbis is an audio format. Ogg can be used as a container for audio in other Xiph.org formats (such as FLAC), and Vorbis can exist as raw data without the Ogg container. Nonetheless, the audio format is commonly referred to as simply Ogg Vorbis.
The quality of Vorbis audio at a given bitrate is comparable to AAC and superior to MP3 and Windows Media Audio. At this point, though, the format is not in wide use, perhaps because the patent owners of the MP3 and AAC codecs have not been aggressive in collecting royalty payments for their use.
AIFC or AIFF-C (Audio Interchange File Format Extension for Compression) is a version of the AIFF format that accommodates compressed data. The codec can achieve compression rates as high as 83 percent.