MPEG-1 Audio Layer 3, more commonly referred to as MP3, is a patented
digital audio encoding format using a form of lossy data compression.
It is a common audio format for consumer audio storage, as well as a
de facto standard of digital audio compression for the transfer and
playback of music on digital audio players.
MP3 is an audio-specific format that was designed by the Moving
Picture Experts Group (MPEG) as part of its MPEG-1 standard. The group
was formed by several teams of engineers at Fraunhofer IIS in Erlangen,
Germany, AT&T-Bell Labs (now a division of Alcatel-Lucent) in Murray
Hill, NJ, USA, Thomson-Brandt, and CCETT, as well as others.
It was approved as an ISO/IEC standard in 1991.
The use in MP3 of a lossy compression algorithm is designed to greatly
reduce the amount of data required to represent the audio recording
while still sounding like a faithful reproduction of the original
uncompressed audio for most listeners. An MP3 file created using a bit
rate setting of 128 kbit/s will be about 1/11th the size of the CD
audio file created from the original audio source. An MP3 file can
also be constructed at higher or lower bit rates, with higher or lower
resulting quality.
The compression works by reducing the accuracy of certain parts of
sound that are deemed beyond the auditory resolution ability of most
people. This method is commonly referred to as perceptual coding. It
internally provides a representation of sound within a short-term
time/frequency analysis window, using psychoacoustic models to discard
or reduce the precision of components less audible to human hearing,
and recording the remaining information in an efficient manner.
This technique is often presented as conceptually similar to the
principles used by JPEG, an image compression format. The specific
algorithms, however, are rather different: JPEG uses a built-in vision
model that is very widely tuned (as is necessary for images), while
MP3 uses a complex, precise masking model that is much more
signal-dependent.
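As a rough illustration of "discarding or reducing the precision of components less audible to human hearing", the sketch below quantizes each frequency coefficient with a uniform step chosen so that the quantization noise stays at or below a per-coefficient masking threshold. The thresholds are assumed inputs here; a real MP3 encoder derives them from its psychoacoustic model and quantizes whole scale factor bands with a nonlinear (power-law) quantizer, so this is only a sketch of the principle, not the MP3 quantization loop.

```python
import math


def quantize_perceptually(coeffs, mask_power):
    """Quantize each coefficient so its quantization noise stays under its masking threshold.

    coeffs     -- frequency-domain coefficients of one analysis window
    mask_power -- assumed per-coefficient masking threshold (noise power that stays inaudible)
    Returns (levels, steps); a level of 0 means the component was effectively discarded.
    """
    levels, steps = [], []
    for c, t in zip(coeffs, mask_power):
        if t <= 0:
            raise ValueError("masking thresholds must be positive")
        # A uniform quantizer with step s adds noise of roughly s**2 / 12 on average,
        # so the coarsest step whose noise stays at the threshold is sqrt(12 * t).
        step = math.sqrt(12.0 * t)
        levels.append(round(c / step))
        steps.append(step)
    return levels, steps


def dequantize(levels, steps):
    """Rebuild approximate coefficients from integer levels and their step sizes."""
    return [q * s for q, s in zip(levels, steps)]


coeffs = [10.0, 0.2, -3.0]
mask = [0.01, 1.0, 0.03]   # made-up thresholds: the 0.2 component sits well below its mask
levels, steps = quantize_perceptually(coeffs, mask)
print(levels)              # the masked component becomes level 0
print(dequantize(levels, steps))
```

A higher masking threshold allows a coarser step, so the components it hides need fewer distinct levels (and therefore fewer bits), down to none at all.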
The MP3 lossy audio data compression algorithm takes advantage of a
perceptual limitation of human hearing called auditory masking. In
1894, Alfred Marshall Mayer reported that a tone could be rendered
inaudible by another tone of lower frequency. In 1959, Richard Ehmer
described a complete set of auditory curves regarding this phenomenon.
Ernst Terhardt et al. created an algorithm describing auditory masking
with high accuracy. This work built on a variety of reports from
authors dating back to Fletcher, and on the work that initially
determined critical ratios and critical bandwidths.
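Critical bands are commonly expressed on the Bark scale. The sketch below uses the analytic approximations published by Zwicker and Terhardt (1980) for the critical-band rate of a frequency and for the critical bandwidth around it. These are curve fits to psychoacoustic data; they are not claimed to be the tables used inside any particular MP3 encoder.

```python
import math


def hz_to_bark(f_hz):
    """Critical-band rate in Bark (Zwicker and Terhardt 1980 approximation)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def critical_bandwidth_hz(f_hz):
    """Approximate width in Hz of the critical band centred on f_hz (same source)."""
    return 25.0 + 75.0 * (1.0 + 1.4 * (f_hz / 1000.0) ** 2) ** 0.69


for f in (100, 1000, 4000, 16000):
    print(f, "Hz:", round(hz_to_bark(f), 2), "Bark, band width ~",
          round(critical_bandwidth_hz(f)), "Hz")
```

Critical bands stay roughly 100 Hz wide at low frequencies but widen to several hundred hertz and more at high frequencies, which is one reason perceptual coders group high-frequency coefficients into wider bands.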
The psychoacoustic masking codec was first proposed in 1979, apparently
independently, by Manfred R. Schroeder et al. from AT&T-Bell Labs in
Murray Hill, NJ, and by M. A. Krasner, both in the United States.
Krasner was the first to publish and to produce hardware for speech
(not usable for music bit compression), but the publication of his
results as a relatively obscure Lincoln Laboratory Technical Report
did not immediately influence the mainstream of psychoacoustic codec
development. Manfred Schroeder was already a well-known and revered
figure in the worldwide community of acoustical and electrical
engineers, and his paper had influence in acoustic and source-coding
(audio data compression) research. Both Krasner and Schroeder built
upon the work performed by Eberhard F. Zwicker in the areas of tuning
and masking of critical bands, which in turn built on the fundamental
research in the area from Bell Labs of Harvey Fletcher and his
collaborators.
A wide variety of (mostly perceptual) audio compression algorithms
were reported in IEEE's refereed Journal on Selected Areas in
Communications. That journal reported in February 1988 on a wide range
of established, working audio bit compression technologies, some of
them using auditory masking as part of their fundamental design, and
several showing real-time hardware implementations.
The immediate predecessors of MP3 were "Optimum Coding in the
Frequency Domain" (OCF) and Perceptual Transform Coding (PXFM).
These two codecs, along with block-switching contributions from
Thomson-Brandt, were merged into a codec called ASPEC, which was
submitted to MPEG and won the quality competition, but was mistakenly
rejected as too complex to implement. The first practical
implementation of an audio perceptual coder (OCF) in hardware
(Krasner's hardware was too cumbersome and slow for practical use) was
an implementation of a psychoacoustic transform coder based on
Motorola 56000 DSP chips.
MP3 is directly descended from OCF and PXFM. MP3 represents the
outcome of the collaboration of Dr. Karlheinz Brandenburg, working as
a postdoc at AT&T-Bell Labs with Mr. James D. Johnston of AT&T-Bell
Labs, collaborating with the Fraunhofer Society for Integrated
Circuits, Erlangen, with relatively minor contributions from the MP2
branch of psychoacoustic sub-band coders.
MPEG-1 Audio Layer 2 encoding began as the Digital Audio Broadcast
(DAB) project managed by Egon Meier-Engelen of the
Deutsche Forschungs- und Versuchsanstalt für Luft- und Raumfahrt
(later called the Deutsches Zentrum für Luft- und Raumfahrt, German
Aerospace Center) in Germany. The project, commonly known as EU-147,
ran from 1987 to 1994.
As a doctoral student at Germany's University of Erlangen-Nuremberg,
Brandenburg began working on digital music compression in the early
1980s, focusing on how people perceive music. He completed his
doctoral work in 1989 and became an assistant professor at
Erlangen-Nuremberg. While there, he continued to work on music
compression with scientists at the Fraunhofer Society (in 1993 he
joined the staff of the Fraunhofer Institute).
In 1991 there were two proposals available: Musicam and ASPEC
(Adaptive Spectral Perceptual Entropy Coding). The Musicam technique,
as proposed by Philips (The Netherlands), CCETT (France) and Institut
für Rundfunktechnik (Germany), was chosen due to its simplicity and
error robustness, as well as the low computational power required to
encode high-quality compressed audio. The Musicam format, based on
sub-band coding, was the basis of the MPEG Audio compression format
(sampling rates, structure of frames, headers, number of samples per
frame).
Much of its technology and ideas were incorporated into the definition
of ISO MPEG Audio Layer I and Layer II, and the filter bank alone into
the Layer III (MP3) format as part of the computationally inefficient
hybrid filter bank. Under the chairmanship of Professor Musmann
(University of Hannover), the standard was edited under the
responsibility of Leon van de Kerkhof (Layer I) and Gerhard Stoll
(Layer II).
A working group consisting of Leon van de Kerkhof (The Netherlands),
Gerhard Stoll (Germany), Leonardo Chiariglione (Italy), Karlheinz
Brandenburg (Germany) and James D. Johnston
(USA) took ideas from ASPEC, integrated the filter bank from Layer 2,
added some of their own ideas and created MP3, which was designed to
achieve the same quality at 128 kbit/s as MP2 at 192 kbit/s.
All algorithms were approved in 1991 and finalized in 1992 as part of
MPEG-1, the first standard suite by MPEG, which resulted in the
international standard ISO/IEC 11172-3, published in 1993. Further
work on MPEG audio was finalized in 1994 as part of the second suite
of MPEG standards, MPEG-2, originally published as an international
standard in 1995. There is also MPEG-2.5 audio, a proprietary
unofficial extension developed by Fraunhofer IIS. It enables MP3 to
work satisfactorily at very low bitrates and adds lower sampling
frequencies.
Compression efficiency of encoders is typically defined by the bit
rate, because the compression ratio depends on the bit depth and
sampling rate of the input signal.
Nevertheless, compression ratios are often published. They may use
the Compact Disc (CD) parameters as references (44.1 kHz, 2 channels
at 16 bits per channel, or 2×16 bit), or sometimes the Digital Audio
Tape (DAT) SP parameters (48 kHz, 2×16 bit). Compression ratios with
this latter reference are higher, which demonstrates the problem with
use of the term compression ratio for lossy encoders.
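To see why the choice of reference matters, the short calculation below computes the uncompressed PCM bit rate for both reference formats and the resulting ratio for the same 128 kbit/s MP3 stream. The encoded file is identical in both cases, yet the DAT-referenced ratio comes out higher (12:1 versus about 11:1).

```python
def pcm_kbps(sample_rate_hz, bits_per_sample, channels):
    """Bit rate of uncompressed PCM audio in kbit/s."""
    return sample_rate_hz * bits_per_sample * channels / 1000


def compression_ratio(reference_kbps, encoded_kbps):
    """How many times smaller the encoded stream is than the chosen reference."""
    return reference_kbps / encoded_kbps


cd = pcm_kbps(44_100, 16, 2)    # 1411.2 kbit/s
dat = pcm_kbps(48_000, 16, 2)   # 1536.0 kbit/s
for name, ref in (("CD", cd), ("DAT", dat)):
    print(name, "reference:", round(compression_ratio(ref, 128), 2), ": 1 at 128 kbit/s")
```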
Karlheinz Brandenburg used a CD recording of Suzanne Vega's song
"Tom's Diner" to assess and refine the MP3 compression algorithm. This
song was chosen because of its nearly monophonic nature and wide
spectral content, making it easier to hear imperfections in the
compression format during playback. Some jokingly refer to Suzanne
Vega as "The mother of MP3". Other, more critical audio excerpts
(glockenspiel, triangle, accordion, etc.) were taken from the EBU
V3/SQAM reference compact disc and have been used by professional
sound engineers to assess the subjective quality of the MPEG Audio
formats.
A reference simulation software implementation, written in the C
language and known as ISO 11172-5, was developed by the members of the
ISO MPEG Audio committee in order to produce bit-compliant MPEG Audio
files (Layer 1, Layer 2, Layer 3). Working in non-real time on a
number of operating systems, it was able to demonstrate the first
real-time hardware decoding (DSP based) of compressed audio.
Some other real-time implementations of MPEG Audio encoders were
available for the purpose of digital broadcasting (radio DAB,
television DVB) towards consumer receivers and set-top boxes.
Later, on July 7, 1994, the Fraunhofer Society released the first
software MP3 encoder, called l3enc. The filename extension .mp3 was
chosen by the Fraunhofer team on July 14, 1995 (previously, the files
had been named .bit). With the first real-time software MP3 player,
Winplay3 (released September 9, 1995), many people were able to encode
and play back MP3 files on their PCs. Because of the relatively small
hard drives of that time (~500 MB), lossy compression was essential to
store non-instrument-based (see tracker) music for playback.
From the first half of 1994 through the late 1990s, MP3 files began
to spread on the Internet. The popularity of MP3s began to rise
rapidly with the advent of Nullsoft's audio player Winamp
(released in 1997), and the Unix audio player
. In 1998, the Rio PMP300, one of the first portable MP3 players, was
released despite legal suppression efforts by the RIAA.
In November 1997, the website mp3.com was offering thousands of MP3s
created by independent artists for free.
The small size of MP3 files enabled widespread peer-to-peer file
sharing of music ripped from CDs, which would have previously been
nearly impossible. The first large peer-to-peer filesharing network,
Napster, was launched in 1999.
The ease of creating and sharing MP3s resulted in widespread copyright
infringement. Major record companies argue that this free sharing of
music reduces sales, and call it "music piracy". They reacted by
pursuing lawsuits against Napster (which was eventually shut down) and
later against individual users who engaged in file sharing.
Despite the popularity of the MP3 format, online music retailers often
use other proprietary formats that are encrypted or obfuscated in
order to make it difficult to use purchased music files in ways not
specifically authorized by the record companies. Attempting to control
the use of files in this way is known as Digital Rights Management
(DRM). Record companies argue that this is necessary to prevent the
files from being made available on peer-to-peer file sharing networks.
This has other side effects, though, such as preventing users from
playing back their purchased music on different types of devices.
However, the audio content of these files can usually be converted
into an unencrypted format. For instance, users are often allowed to
burn files to audio CDs, which requires conversion to an unencrypted
audio format.
Unauthorized MP3 file sharing continues on next-generation
peer-to-peer networks. Some authorized services, such as Beatport, Bleep, Juno Records, eMusic,
Zune Marketplace, Walmart.com, and Amazon.com sell
unrestricted music in the MP3 format.
The MPEG-1 standard does not include a precise specification for an
MP3 encoder, but does provide example psychoacoustic models, rate
loop, and the like in the non-normative part of the original standard.
At present, these suggested implementations are quite dated.
Implementers of the standard were supposed to devise their own
algorithms suitable for removing parts of the information from the
audio input. As a result, there are many different MP3 encoders
available, each producing files of differing quality. Comparisons are
widely available, so it is easy for a prospective user of an encoder
to research the best choice. It must be kept in mind that an encoder
that is proficient at encoding at higher bit rates (such as LAME) is
not necessarily as good at lower bit rates.
During encoding, 576 time-domain samples are taken and are transformed
to 576 frequency-domain samples. If there is a transient, 192 samples
are taken instead of 576. This is done to limit the temporal spread of
quantization noise accompanying the transient. (See psychoacoustics.)
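The block-switching decision described above can be sketched as a simple transient detector: split each 576-sample granule into three 192-sample sub-blocks and switch to short blocks when energy jumps sharply from one sub-block to the next. The detector, the `jump_ratio` of 10 and the `floor` value are hypothetical choices for illustration. Real encoders make this decision from their psychoacoustic model and must also insert transition ("start" and "stop") windows around short blocks, which this sketch leaves out.

```python
import math

GRANULE = 576       # samples per granule in MPEG-1 Layer III (two granules per frame)
SHORT_BLOCK = 192   # three short blocks cover the same 576 samples


def sub_block_energies(granule):
    """Energy of each 192-sample sub-block of a 576-sample granule."""
    return [sum(x * x for x in granule[i:i + SHORT_BLOCK])
            for i in range(0, GRANULE, SHORT_BLOCK)]


def choose_block_type(granule, prev_energy, jump_ratio=10.0, floor=1e-9):
    """Return ('short' or 'long', energy of the last sub-block) for one granule.

    Hypothetical detector: a sudden energy jump between consecutive sub-blocks
    is treated as an attack. The caller passes the returned energy back in as
    prev_energy for the next granule.
    """
    if len(granule) != GRANULE:
        raise ValueError(f"expected {GRANULE} samples, got {len(granule)}")
    energies = sub_block_energies(granule)
    block_type = "long"
    previous = prev_energy
    for e in energies:
        if e > jump_ratio * (previous + floor):
            block_type = "short"
        previous = e
    return block_type, energies[-1]


tone = [math.sin(2 * math.pi * 440 * n / 44_100) for n in range(GRANULE)]
click = [0.0] * GRANULE
click[400] = 1.0   # a sudden attack in the third sub-block after silence

print(choose_block_type(tone, prev_energy=sub_block_energies(tone)[0])[0])   # long
print(choose_block_type(click, prev_energy=0.0)[0])                           # short
```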
Decoding, on the other hand, is carefully defined in the standard.
Most decoders are "bitstream compliant", which means that the
decompressed output they produce from a given MP3 file will be the
same, within a specified degree of rounding tolerance, as the output
specified mathematically in the ISO/IEC standard document (ISO/IEC
11172-3). Therefore, comparison of decoders is usually based on how
computationally efficient they are (i.e., how much memory or CPU time
they use in the decoding process).
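A compliance check of the "same within a rounding tolerance" kind can be sketched as a sample-by-sample comparison of a decoder's output against reference output for the same bitstream, measuring RMS and peak error. The default limits below are the "full accuracy" figures commonly quoted from ISO/IEC 11172-4 (the MPEG-1 compliance-testing part), with samples scaled so that full scale is 1.0; treat them as assumptions and check the standard before relying on them.

```python
import math

# Assumed limits: RMS error below 2**-15 / sqrt(12), peak error at most 2**-14 (full scale = 1.0).
RMS_LIMIT = 2 ** -15 / math.sqrt(12)
PEAK_LIMIT = 2 ** -14


def within_tolerance(decoded, reference, rms_limit=RMS_LIMIT, peak_limit=PEAK_LIMIT):
    """Compare decoder output with reference output, both scaled to [-1.0, 1.0)."""
    if len(decoded) != len(reference) or not decoded:
        raise ValueError("need two non-empty sequences of equal length")
    diffs = [d - r for d, r in zip(decoded, reference)]
    rms = math.sqrt(sum(x * x for x in diffs) / len(diffs))
    peak = max(abs(x) for x in diffs)
    return rms < rms_limit and peak <= peak_limit
```

Because every compliant decoder must stay within the same error bounds, the remaining differences between decoders are mostly about speed and memory use.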
When performing lossy audio encoding, such as creating an MP3 file,
there is a trade-off between the amount of space used and the sound
quality of the result. Typically, the creator is allowed to set a bit
rate, which specifies how many kilobits the file may use per second of
audio. Using a lower bit rate provides relatively lower audio quality
and produces a smaller file. Likewise, using a higher bit rate outputs
higher-quality audio, but also results in a larger file.
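Because the bit rate is a rate, the size of a constant-bit-rate file follows directly from the duration: bits per second times seconds, divided by 8 for bytes. The helper below is a back-of-the-envelope sketch that ignores tags and any other overhead; for example, 4 minutes at 128 kbit/s comes to 3,840,000 bytes.

```python
def mp3_size_bytes(bitrate_kbps, seconds):
    """Approximate audio payload of a CBR stream: (kbit/s x 1000) x seconds / 8 bits per byte."""
    return bitrate_kbps * 1000 * seconds / 8


for kbps in (96, 128, 192, 320):
    print(kbps, "kbit/s ->", round(mp3_size_bytes(kbps, 4 * 60) / 1e6, 2), "MB for 4 minutes")
```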
Files encoded with a lower bit rate will generally play back at a
lower quality. With too low a bit rate, compression artifacts
(i.e. sounds that
were not present in the original recording) may be audible in the
reproduction. Some audio is hard to compress because of its
randomness and sharp attacks. When this type of audio is
compressed, artifacts such as ringing or pre-echo
are usually heard. A sample of applause
compressed with a relatively low bit rate provides a good example
of compression artifacts.
Besides the bit rate of an encoded piece of audio, the quality of
MP3 files also depends on the quality of the encoder itself, and
the difficulty of the signal being encoded. As the MP3 standard
allows quite a bit of freedom with encoding algorithms, different
encoders may feature quite different quality, even with identical
bit rates. As an example, in a public listening test featuring two
different MP3 encoders at about 128 kbit/s, one scored 3.66 on a
1–5 scale, while the other scored only 2.22.
Quality is dependent on the choice of encoder and encoding
parameters. However, in 1998, MP3 at 128 kbit/s was providing
quality only equivalent to AAC
at 64 kbit/s and MP2 at 192 kbit/s.
The simplest type of MP3 file uses one bit rate for the entire file;
this is known as Constant Bit Rate (CBR) encoding. Using a constant
bit rate makes encoding simpler and faster. However, it is also
possible to create files where the bit rate changes throughout the
file. These are known as Variable Bit Rate (VBR) files. The idea
behind this is that, in any piece of audio, some parts will be much
easier to compress, such as silence or music containing only a few
instruments, while others will be more difficult to compress. So, the
overall quality of the file may be increased by using a lower bit rate
for the less complex passages and a higher one for the more complex
parts. With some encoders, it is possible to specify a given quality,
and the encoder will vary the bit rate accordingly. Users who know a
particular "quality setting" that is transparent to their ears can use
this value when encoding all of their music, and not need to worry
about performing personal listening tests on each piece of music to
determine the correct bit rate.
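A minimal sketch of the VBR idea, assuming the encoder has already estimated how many bits each frame needs to reach the chosen quality: each frame gets the lowest standard Layer III bit rate whose frame capacity covers that demand. The `bit_demand` input is hypothetical; real encoders derive it from their psychoacoustic model and also use the bit reservoir, which this sketch ignores.

```python
# Standard MPEG-1 Layer III bit rates in kbit/s (free format excluded)
BITRATES_KBPS = [32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320]
SAMPLES_PER_FRAME = 1152  # MPEG-1 Layer III


def frame_capacity_bits(kbps, sample_rate=44_100):
    """Bits available in one frame at a given bit rate."""
    return kbps * 1000 * SAMPLES_PER_FRAME / sample_rate


def pick_vbr_bitrates(bit_demand, sample_rate=44_100):
    """Choose, per frame, the lowest standard bit rate that fits the estimated demand."""
    chosen = []
    for need in bit_demand:
        for kbps in BITRATES_KBPS:
            if frame_capacity_bits(kbps, sample_rate) >= need:
                chosen.append(kbps)
                break
        else:
            chosen.append(BITRATES_KBPS[-1])  # demand above 320 kbit/s: cap at the maximum
    return chosen


# Hypothetical per-frame demand: near-silence, a sparse passage, a dense passage, a transient
print(pick_vbr_bitrates([0, 900, 3000, 10_000]))   # [32, 40, 128, 320]
```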
Perceived quality can be influenced by listening environment (ambient
noise), listener attention, and listener training, and in most cases
by listener audio equipment (such as sound cards, speakers and
headphones).
A test given to new students by Stanford University Music Professor
Jonathan Berger showed that student preference for MP3-quality music
has risen each year. Berger said the students seem to prefer the
'sizzle' sounds that MP3s bring to music. Others have reached the same
conclusion, and some record producers have begun to mix music
specifically to be heard on iPods and mobile phones.
Several bit rates are specified in the MPEG-1 Layer 3 standard: 32,
40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256 and 320 kbit/s,
and the available sampling frequencies are 32, 44.1 and 48 kHz. A
sample rate of 44.1 kHz is almost always used, because this is also
used for CD audio, the main source used for creating MP3 files. A
greater variety of bit rates are used on the Internet. 128 kbit/s is
the most common, offering adequate audio quality in a relatively small
space. As Internet bandwidth availability and hard drive sizes have
increased, higher bit rates like 160 and 192 kbit/s have increased in
popularity.
Uncompressed audio as stored on an audio-CD has a bit rate of
1,411.2 kbit/s, so the bitrates 128, 160 and 192 kbit/s
represent compression ratios
of approximately 11:1, 9:1 and 7:1 respectively.
Non-standard bit rates up to 640 kbit/s can be achieved with the LAME
encoder and the freeformat option, although few MP3 players can play
those files. According to the ISO standard, decoders are only required
to be able to decode streams up to 320 kbit/s.
MPEG audio may use variable bitrate (VBR). Layer III can use bitrate
switching and a bit reservoir. Variable bitrate is used when the goal
is to achieve a fixed level of quality. The final file size of a VBR
encoding is less predictable than with constant bitrate. Average
bitrate (ABR) is a compromise between the two: the bitrate is allowed
to vary for more consistent quality, but is controlled to remain near
an average value chosen by the user, for predictable file sizes.
Although technically an MP3 decoder must support VBR to be standards
compliant, historically some decoders have had bugs with VBR decoding,
particularly before VBR encoders became widespread.
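Average bitrate (ABR) encoding can be sketched as a bit budget layered on top of a per-frame VBR choice: frames still ask for what they need, but a running balance ensures the average never exceeds the user's target. Bits saved on easy frames are banked and can be spent on harder frames later. This is a hypothetical controller written for illustration (the `bit_demand` input and the banking rule are assumptions), not how any particular encoder implements ABR.

```python
BITRATES_KBPS = [32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320]
SAMPLES_PER_FRAME = 1152  # MPEG-1 Layer III


def frame_capacity_bits(kbps, sample_rate=44_100):
    """Bits available in one frame at a given bit rate."""
    return kbps * 1000 * SAMPLES_PER_FRAME / sample_rate


def pick_abr_bitrates(bit_demand, target_kbps, sample_rate=44_100):
    """Follow per-frame demand without letting the average exceed target_kbps."""
    target_bits = frame_capacity_bits(target_kbps, sample_rate)
    banked = 0.0  # bits saved so far relative to the target
    chosen = []
    for need in bit_demand:
        allowance = target_bits + banked
        affordable = [k for k in BITRATES_KBPS
                      if frame_capacity_bits(k, sample_rate) <= allowance] or BITRATES_KBPS[:1]
        wanted = next((k for k in BITRATES_KBPS
                       if frame_capacity_bits(k, sample_rate) >= need), BITRATES_KBPS[-1])
        kbps = min(wanted, affordable[-1])   # what the frame needs, limited by the budget
        banked += target_bits - frame_capacity_bits(kbps, sample_rate)
        chosen.append(kbps)
    return chosen


# Three quiet frames bank bits that a following loud passage can spend.
rates = pick_abr_bitrates([0, 0, 0, 10_000, 10_000, 10_000], target_kbps=128)
print(rates, sum(rates) / len(rates))
```

Capping each frame at the current allowance keeps the running total at or below the target, so the final size stays predictable while individual frames can still rise to 320 kbit/s.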
An MP3 file is made up of multiple MP3 frames, which consist of a
header and a data block. This sequence of frames is called an
elementary stream. Frames are not independent items ("byte reservoir")
and therefore cannot be extracted on arbitrary frame boundaries. The
MP3 data blocks contain the (compressed) audio information in terms of
frequencies and amplitudes. The diagram shows that the MP3 header
consists of a sync word, which is used to identify the beginning of a
valid frame. This is followed by a bit indicating that this is the
MPEG standard and two bits that indicate that layer 3 is used; hence
MPEG-1 Audio Layer 3, or MP3. After this, the values will differ,
depending on the MP3 file. ISO/IEC 11172-3 defines the range of values
for each section of the header along with the specification of the
header. Most MP3 files today contain ID3 metadata, which precedes or
follows the MP3 frames, as noted in the diagram.
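The header fields described above can be decoded with a few shifts and masks. The sketch handles only MPEG-1 Layer III. It reads the sync as 11 bits followed by a 2-bit version field, the layout commonly used since the unofficial MPEG-2.5 extension; for MPEG-1 streams this is bit-for-bit the same as a 12-bit sync word plus a 1-bit ID. The bit-rate and sampling-rate tables are the MPEG-1 Layer III ones, and free-format streams are rejected rather than handled.

```python
# Tables for MPEG-1 Layer III only; MPEG-2 and MPEG-2.5 use different tables.
BITRATES_KBPS = [None, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, None]
SAMPLE_RATES_HZ = [44_100, 48_000, 32_000, None]
CHANNEL_MODES = ["stereo", "joint stereo", "dual channel", "mono"]


def parse_mpeg1_layer3_header(header: bytes) -> dict:
    """Decode the 4-byte header of an MPEG-1 Layer III frame."""
    if len(header) < 4:
        raise ValueError("need at least 4 bytes")
    h = int.from_bytes(header[:4], "big")
    if ((h >> 21) & 0x7FF) != 0x7FF:
        raise ValueError("no frame sync")
    version_bits = (h >> 19) & 0b11   # 0b11 = MPEG-1
    layer_bits = (h >> 17) & 0b11     # 0b01 = Layer III
    if version_bits != 0b11 or layer_bits != 0b01:
        raise ValueError("not an MPEG-1 Layer III frame")
    bitrate = BITRATES_KBPS[(h >> 12) & 0xF]
    sample_rate = SAMPLE_RATES_HZ[(h >> 10) & 0b11]
    if bitrate is None or sample_rate is None:
        raise ValueError("free-format or invalid bit rate / sampling rate index")
    padding = (h >> 9) & 1
    return {
        "crc_protected": ((h >> 16) & 1) == 0,   # protection bit is 0 when a CRC follows
        "bitrate_kbps": bitrate,
        "sample_rate_hz": sample_rate,
        "padding": bool(padding),
        "channel_mode": CHANNEL_MODES[(h >> 6) & 0b11],
        # 1152 samples per frame / 8 bits per byte = 144
        "frame_length_bytes": 144 * bitrate * 1000 // sample_rate + padding,
    }


print(parse_mpeg1_layer3_header(bytes.fromhex("FFFB9064")))
```

For the common header bytes FF FB 90 64 this reports 128 kbit/s, 44.1 kHz, joint stereo, no CRC and a 417-byte frame. Since 144 × 128000 / 44100 is not a whole number, the padding bit adds one byte to some frames so that the average frame length comes out right.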
There are several limitations inherent to the MP3 format that cannot
be overcome by any MP3 encoder. Newer audio compression formats such
as Vorbis and WMA Pro no longer have these limitations. In technical
terms, MP3 is limited in the following ways:
- Time resolution can be too low for highly transient signals and
may cause smearing of percussive sounds.
- Due to the tree structure of the filter bank, pre-echo problems
are made worse, as the combined impulse response of the two filter
banks does not, and cannot, provide an optimum solution in
time/frequency resolution.
- The combining of the two filter banks' outputs creates aliasing
problems that must be handled partially by the "aliasing
compensation" stage; however, that creates excess energy to be
coded in the frequency domain, thereby decreasing coding
efficiency.
- Frequency resolution is limited by the small long block window
size, which decreases coding efficiency.
- There is no scale factor band for frequencies above 15.5/15.8 kHz.
- Joint stereo is done only on a frame-to-frame basis.
- Internal handling of the bit reservoir increases encoding delay.
- Encoder/decoder overall delay is not defined, which means there is
no official provision for gapless playback.
However, some encoders such as LAME can attach
additional metadata that will allow players that can handle it to
deliver seamless playback.
- The data stream can contain an optional checksum, but the
checksum only protects the header data, not the audio data.
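Players that honour gapless metadata essentially trim the decoded output: some samples at the start (encoder and decoder delay) and some at the end (padding, needed because the audio must fill whole 1152-sample frames) are discarded before the next track is appended. The helper below assumes those two counts have already been read from the metadata. How and where a given encoder stores them is not shown and varies between tools.

```python
def trim_for_gapless(samples, leading, trailing):
    """Drop codec-introduced samples so consecutive tracks join without a gap.

    leading  -- delay samples at the start (assumed already read from metadata)
    trailing -- padding samples at the end that filled out the last frame
    """
    if leading < 0 or trailing < 0 or leading + trailing > len(samples):
        raise ValueError("delay/padding do not fit the decoded length")
    return samples[leading:len(samples) - trailing]


decoded = list(range(12))                                 # stand-in for decoded PCM samples
print(trim_for_gapless(decoded, leading=3, trailing=4))   # [3, 4, 5, 6, 7]
```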
ID3 and other tags
A "tag" in an audio file is a section of the file that contains
metadata such as the title, artist, album, track number or other
information about the file's contents. As of 2006, the most widespread
standard tag formats are ID3v1 and ID3v2, and the more recently
introduced APEv2. APEv2 was originally developed for the MPC file
format. APEv2 can coexist with ID3 tags in the same file or it can be
used by itself. Tag editing functionality is often built into MP3
players and editors, but there also exist tag editors dedicated to the
purpose.
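The two ID3 versions are laid out quite differently. An ID3v1 tag is a fixed 128-byte block at the very end of the file that starts with the letters TAG, with fixed-width text fields. An ID3v2 tag sits at the start of the file and begins with a 10-byte header whose size field is a "synchsafe" integer (only 7 bits used per byte, so the size bytes never form a false frame sync). The sketch below reads the fixed ID3v1 fields, including the ID3v1.1 track number, and the ID3v2 header size; it does not parse ID3v2 frames or APEv2 tags.

```python
def parse_id3v1(data: bytes):
    """Parse an ID3v1/ID3v1.1 tag from the last 128 bytes of a file, or return None."""
    if len(data) < 128 or data[-128:-125] != b"TAG":
        return None
    tag = data[-128:]

    def text(raw):
        # Fields are NUL- or space-padded; ID3v1 text is nominally ISO-8859-1.
        return raw.split(b"\x00", 1)[0].decode("latin-1").rstrip()

    info = {
        "title": text(tag[3:33]),
        "artist": text(tag[33:63]),
        "album": text(tag[63:93]),
        "year": text(tag[93:97]),
        "comment": text(tag[97:127]),
        "genre": tag[127],
    }
    # ID3v1.1: a zero byte at comment offset 28 means the next byte is a track number.
    if tag[125] == 0 and tag[126] != 0:
        info["comment"] = text(tag[97:125])
        info["track"] = tag[126]
    return info


def id3v2_size(data: bytes):
    """Total size in bytes of a leading ID3v2 tag (header + body + footer), or 0 if absent."""
    if len(data) < 10 or data[:3] != b"ID3":
        return 0
    size = 0
    for b in data[6:10]:                     # synchsafe integer: 7 useful bits per byte
        size = (size << 7) | (b & 0x7F)
    footer = 10 if data[5] & 0x10 else 0     # ID3v2.4 footer-present flag
    return 10 + size + footer
```

A reader that wants the first audio frame can skip `id3v2_size(data)` bytes from the start and stop 128 bytes before the end when `parse_id3v1` finds a tag.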
Since volume levels of different audio sources can vary greatly, it
is sometimes desirable to adjust the playback volume of audio files
such that a consistent average volume is perceived. The idea is to
control the average volume across multiple files, not the variations
in volume within a single file. This gain normalization, while
similar in purpose, is distinct from dynamic range compression, which
is a form of normalization used in audio mastering. Gain normalization
may defeat the intent of recording artists and audio engineers who
deliberately set the volume levels of the audio they produce.
A few standards for storing the average volume of an MP3 file in its
metadata tags, enabling a specially designed player to automatically
adjust the overall playback volume for each file, have been proposed.
A popular and widely implemented such proposal is "Replay Gain", which
is not MP3-specific. When used in MP3s, it is stored differently by
different encoders, and as of 2008, Replay Gain-aware players did not
yet support all formats.
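Schemes like Replay Gain store a gain in decibels; a player converts it to a linear factor, 10 ** (dB / 20), and multiplies the decoded samples by it. The sketch assumes the gain and, optionally, the track's peak sample value have already been read from the tag (where and how they are stored differs between tools), and shows one simple way to keep a positive gain from clipping.

```python
def replay_gain_factor(gain_db, peak=None):
    """Linear scale factor for a stored gain in dB, limited so the peak does not clip.

    peak -- largest absolute sample value of the track on a 0..1 scale, if known
    """
    factor = 10 ** (gain_db / 20)
    if peak and peak * factor > 1.0:
        factor = 1.0 / peak   # simple clipping prevention: scale the peak to full scale
    return factor


def apply_gain(samples, gain_db, peak=None):
    """Scale decoded samples (0..1 full scale) by the stored gain."""
    f = replay_gain_factor(gain_db, peak)
    return [s * f for s in samples]


print(apply_gain([0.5, -0.25], -6.0))             # roughly halves the level
print(apply_gain([0.5, -0.5], 12.0, peak=0.5))    # +12 dB limited to avoid clipping
```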
Licensing and patent issues
Many organizations have claimed ownership of patents
related to MP3 decoding or encoding. These
claims have led to a number of legal threats and actions from a
variety of sources, resulting in uncertainty about which patents
must be licensed in order to create MP3 products without committing
patent infringement in countries that allow software patents.
The various MP3-related patents expire on dates ranging from 2007
to 2017 in the U.S. The initial near-complete MPEG-1 standard
(parts 1, 2 and 3) was publicly available on December 6, 1991 as
ISO CD 11172. In the United States, patents cannot claim inventions
that were already publicly disclosed by the inventor more than a
year prior to the filing date, but for patents filed prior to June
8, 1995, submarine patents made it possible to extend the effective
lifetime of a patent through
application extensions. Patents filed for anything disclosed in ISO
CD 11172 more than a year after its publication are questionable;
if only the known MP3 patents filed by December 1992 are considered
MP3 decoding may be patent free in the US by December of
Thomson Consumer Electronics claims to control MP3 licensing of the
Layer 3 patents in many countries, including the United States, Japan,
Canada and EU countries. Thomson has been actively enforcing these
patents.
MP3 license revenues generated about €100 million for the
Fraunhofer Society in 2005.
In September 1998, the Fraunhofer Institute sent a letter to
several developers of MP3 software stating that a license was
required to "distribute and/or sell decoders and/or encoders". The
letter claimed that unlicensed products "infringe the patent rights
of Fraunhofer and Thomson. To make, sell and/or distribute products
using the [MPEG Layer-3] standard and thus our patents, you need to
obtain a license under these patents from us."
However, there exist both free and proprietary alternatives. Among
free formats there is Vorbis, while Microsoft's usage of its own
proprietary Windows Media format allows it to avoid licensing issues
associated with these patents by avoiding usage of the MP3 format
entirely. Until the key patents expire, unlicensed encoders and
players could be infringing in countries where the patents are valid.
In spite of the patent restrictions, the perpetuation of the MP3
format continues. The reasons for this appear to be the network
effects caused by:
- familiarity with the format,
- the large quantity of music now available in the MP3 format,
- the wide variety of existing software and hardware that takes
advantage of the file format,
- the lack of DRM restrictions, which makes MP3 files easy to edit,
copy and play in different portable digital players (Samsung, Apple,
Creative, etc.),
- the majority of home users not knowing or not caring about the
patents' controversy and often not considering such legal issues
when choosing their music format for personal use.
Additionally, patent holders declined to enforce license fees on
decoders, allowing many free MP3 decoders to develop. Thus, while
patent fees have been an issue for companies that attempt to use MP3,
they have not meaningfully impacted users, allowing the format to
grow in popularity.
Sisvel S.p.A. and its U.S. subsidiary Audio MPEG, Inc. previously
sued Thomson for patent infringement on MP3 technology, but those
disputes were resolved in November 2005 with Sisvel granting
Thomson a license to their patents. Motorola also recently signed an
agreement with Audio MPEG to license MP3-related patents.
In September 2006, German officials seized MP3 players from SanDisk's
booth at the IFA show in Berlin after an Italian patents firm won an
injunction on behalf of Sisvel against SanDisk in a dispute over
licensing rights. The injunction was later reversed by a Berlin judge,
but that reversal was in turn blocked the same day by another judge
from the same court, "bringing the Patent Wild West to Germany" in the
words of one commentator.
On February 16, 2007, Texas MP3 Technologies sued Apple, Samsung
Electronics and SanDisk for patent infringement regarding portable MP3
players. The suit was filed in Marshall, Texas; this is a common
location for patent infringement suits due to the speed at which
trials are conducted there. Texas MP3 Technologies claimed
infringement of U.S. patent 7,065,417, awarded in June 2006 to
multimedia chip-maker SigmaTel, covering "an MPEG portable sound
reproducing system and a method for reproducing sound data compressed
using the MPEG method."
Alcatel-Lucent also claims ownership of several patents relating to
MP3 encoding and compression, inherited from AT&T-Bell Labs. In
November 2006 (prior to the companies' merger), Alcatel filed a
lawsuit against Microsoft (see Alcatel-Lucent v. Microsoft) alleging
infringement of seven of its patents. On February 23, 2007, a San
Diego jury awarded Alcatel-Lucent a record-breaking US$1.52 billion in
damages. The judge, however, reversed the jury verdict and ruled for
Microsoft, and this ruling was upheld by the court of appeals. The
appeals court further ruled that Fraunhofer was a co-owner of one
patent claimed to be owned by Alcatel-Lucent, due to work by James D.
Johnston while Dr. Brandenburg worked at AT&T-Bell Labs.
In short, with Thomson, Fraunhofer IIS, Sisvel (and its U.S.
subsidiary Audio MPEG), Texas MP3 Technologies, and Alcatel-Lucent
all claiming legal control of relevant MP3 patents related to
decoders, the legal status of MP3 remains unclear in countries
where those patents are valid.
Microsoft Windows Media Format, Windows XP and Windows Server
contained a coding error that permitted "remote code execution if a
user opened a specially crafted media file". Such a file would
allow the attacker to "then install programs; view, change, or
delete data; or create new accounts with full user rights", if the
account on which the file was played had administrator privileges.
The problem was addressed in a critical update issued on September
8, 2009 (KB968816).
Many other lossy and lossless audio codecs exist. Among these, mp3PRO
and MP2 are members of the same technological family as MP3 and
depend on roughly similar psychoacoustic models. The Fraunhofer
Gesellschaft owns many of the basic patents underlying these codecs
as well, with others held by Dolby Labs and Thomson Consumer
Electronics. In addition, there is also the open source file format
Ogg Vorbis, which has been available free of charge and without
patent restrictions.