The digital audio standard frequently called AES/EBU, officially
known as AES3, is used for carrying digital audio signals between
various devices. It was developed by the Audio
Engineering Society (AES) and the European
Broadcasting Union (EBU) and first published in 1985, later revised in
1992 and 2003.
Both AES and EBU versions of the standard
exist. Several different physical connectors are also defined as
part of the overall group of standards. A related system, S/PDIF,
was developed essentially as a consumer
version of AES/EBU, using connectors more commonly found in the
consumer market.
The AES3 standard parallels part 4 of the international standard
IEC 60958.
Of the physical interconnection types defined by IEC 60958, three
are in common use:
The AES-3id standard defines a 75-ohm BNC
electrical variant of AES3. More recently,
professional equipment (notably by Sony) has
used this physical interconnection type. It uses the same
cabling, patching and infrastructure as analogue or digital video,
and is thus common in the broadcast industry.
F05 connectors, 5 mm connectors for plastic optical fiber,
are more commonly known by the brand name TOSLINK.
The precursor of the IEC 60958 Type II
specification was the Sony/Philips Digital Interface, or S/PDIF.
For details on the format of AES/EBU data,
see the article on S/PDIF. Note that the electrical levels differ
between AES/EBU and S/PDIF.
For information on the synchronization of digital audio structures,
see the AES11
standard. The ability to insert
unique identifiers into an AES3 bit stream is covered by the
Other AES3 transport structures.
AES3 digital audio format can also be carried over an Asynchronous
Transfer Mode (ATM) network. The standard for packing AES3 frames
into ATM cells is published as IEC 62365.
This requires CAT5 or CAT6 network infrastructure.
The low-level protocol for data transmission in AES/EBU and
S/PDIF is largely identical, and the following discussion applies
for S/PDIF as well unless otherwise noted.
[Figure: simple representation of the protocol for both AES/EBU and S/PDIF]
AES/EBU was designed primarily to support PCM-encoded
audio in either DAT format at 48 kHz or CD
format at 44.1 kHz. No attempt was made
to use a carrier able to support both rates; instead, AES/EBU
allows the data to be run at any rate, and the receiver recovers the
clock rate because the data is encoded using biphase mark code (BMC).
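As an illustration of why biphase mark code lets a receiver recover the clock at any rate, here is a minimal Python sketch. The function names and the `level` parameter (the line level just before the first bit) are hypothetical choices for this example, not part of the standard.

```python
def bmc_encode(bits, level=0):
    """Encode bits with biphase mark code.

    Returns two half-cell levels (0 or 1) per input bit. `level` is the
    assumed line level immediately before the first bit.
    """
    out = []
    for bit in bits:
        level ^= 1            # every bit cell begins with a transition
        out.append(level)
        if bit:
            level ^= 1        # a 1 adds a second transition mid-cell
        out.append(level)
    return out


def bmc_decode(halves):
    """Recover bits: a cell whose two halves differ is a 1, otherwise a 0."""
    return [int(a != b) for a, b in zip(halves[0::2], halves[1::2])]


encoded = bmc_encode([0, 0, 1, 1])
print(encoded)
print(bmc_decode(encoded))
```

Because every bit cell starts with a transition, the signal carries its own clock, and because decoding only compares the two halves of a cell, an inverted signal decodes to the same bits.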
The bit stream consists of 64-bit frames
transmitted once per sample time. Each frame is divided into two 32-bit
subframes (or channels): A (left) and B (right).
Each subframe consists of 32 time slots, which
transmit individual data bits or synchronization information. 24
bits are available for audio data, of which 20 bits are normally
used.
192 consecutive frames are grouped into an audio
block. Certain status information is transmitted once per
audio block. At the default 48 kHz sample rate, there are 250 audio
blocks per second.
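The framing arithmetic above can be checked with a few lines of Python. This is only a sketch; `stream_rates` and its dictionary keys are names invented for the example.

```python
FRAME_BITS = 64          # two 32-bit subframes per frame
FRAMES_PER_BLOCK = 192   # frames grouped into one audio block


def stream_rates(sample_rate_hz):
    """Derive stream rates from the sample rate (one frame per sample time)."""
    bit_rate = FRAME_BITS * sample_rate_hz
    return {
        "bit_rate": bit_rate,
        # Biphase mark code sends two half-cells per bit.
        "half_cell_rate": 2 * bit_rate,
        "blocks_per_second": sample_rate_hz / FRAMES_PER_BLOCK,
    }


for rate in (48_000, 44_100):
    print(rate, stream_rates(rate))
```

At 48 kHz this gives 3,072,000 bits per second and 250 audio blocks per second; at 44.1 kHz the block rate is no longer an integer.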
The 32 time slots of each subframe are used as follows:
Time slots 0 to 3
These slots contain a specially coded preamble to
identify the subframe and its position within the audio block. They
do not obey normal BMC encoding rules, although they do still have
zero DC bias.
Three preambles are possible:
- X (or M): 11100010 if previous time slot was "0", 00011101 if
it was "1". (Equivalently, 10010011 NRZI
encoded.) Marks a word for channel A (left) that isn't at the start
of the data block.
- Y (or W): 11100100 if previous time slot was "0", 00011011 if
it was "1". (Equivalently, 10010110 NRZI
encoded.) Marks a word that isn't for channel A (e.g. a word for
channel B (right) in a stereo signal).
- Z (or B): 11101000 if previous time slot was "0", 00010111 if
it was "1". (Equivalently, 10011100 NRZI
encoded.) Marks a word for channel A (left) at the start of the
data block.
They are called X, Y, Z in the AES standard and M, W, B in IEC 958
(an AES extension).
The 8-bit preambles are transmitted in the time allocated to the first
four time slots of each subframe (time slots 0 to 3). Any of the
three marks the beginning of a subframe. X or Z marks the beginning
of a frame, and Z marks the beginning of an audio block.
| 0 | 1 | 2 | 3 |  | 0 | 1 | 2 | 3 | Time slots
 _____       _            _____   _
/     \_____/ \_/  \_____/     \_/ \ Preamble X
 _____     _              ___   ___
/     \___/ \___/  \_____/   \_/   \ Preamble Y
 _____   _                _   _____
/     \_/ \_____/  \_____/ \_/     \ Preamble Z
 ___     ___            ___     ___
/   \___/   \___/  \___/   \___/   \ Normal 0 bits
 _   _   _   _        _   _   _   _
/ \_/ \_/ \_/ \_/  \_/ \_/ \_/ \_/ \ Normal 1 bits
| 0 | 1 | 2 | 3 |  | 0 | 1 | 2 | 3 | Time slots
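The two line patterns listed for each preamble follow from its NRZI form, where a 1 means "toggle the line level" and a 0 means "hold it". The Python sketch below derives them and picks the preamble for a subframe; the function names and arguments are hypothetical, chosen only for this illustration.

```python
# NRZI forms of the three preambles, as listed above.
PREAMBLES_NRZI = {"X": "10010011", "Y": "10010110", "Z": "10011100"}


def nrzi_to_levels(nrzi, level):
    """Expand an NRZI string into line levels, one per half-cell.

    `level` is the line level just before the preamble starts.
    """
    out = []
    for ch in nrzi:
        if ch == "1":
            level ^= 1        # a 1 toggles the line, a 0 holds it
        out.append(str(level))
    return "".join(out)


def preamble_for(channel_a, block_start):
    """Choose the preamble name for a subframe."""
    if not channel_a:
        return "Y"
    return "Z" if block_start else "X"


for name, nrzi in PREAMBLES_NRZI.items():
    print(name, nrzi_to_levels(nrzi, 0), nrzi_to_levels(nrzi, 1))
```

Each resulting pattern contains four high and four low half-cells, which is why the preambles keep zero DC bias even though they break the normal BMC rules.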
It is straightforward to extend this structure to additional
channels (more subframes per frame), as is done in MADI.
Time slots 4 to 7
If the audio word length is more than 20 bits, these slots carry
the least significant bits of the audio sample data.
If the audio word length is 20 bits (the default) or less, these
time slots can carry auxiliary information such as a low-quality
auxiliary audio channel for producer talkback or studio-to-studio
communication.
Time slots 8 to 27
These time slots carry 20 bits of audio information, starting with
the LSB and ending with the MSB. If the source provides fewer than
20 bits, the unused LSBs are set to logical 0 (for example,
for the 16-bit audio read from CDs, bits 8–11 are set to 0).
Time slots 28 to 31
These time slots carry associated bits as follows:
- V (28) Validity bit: set to zero if the audio sample word
data are correct and suitable for D/A conversion. Otherwise, the
receiving equipment may be instructed to mute its output during the
presence of defective samples. It is used by most CD players to
indicate that concealment rather than error correction is taking
place.
- U (29) User bit: any kind of data such as running time, song,
track number, etc. One bit per audio channel per frame forms a
serial data stream. Each audio block has a single 192-bit control
word.
- C (30) Channel status bit: like the user bit, the bits from
each frame of an audio block are grouped to make a 192-bit channel
status word. Its structure depends on whether AES/EBU or S/PDIF is used.
- P (31) Parity bit: for error detection. It permits the detection of
an odd number of errors resulting from malfunctions in the
interface. If used, it is set to provide even parity over bits 4–31.
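To make the slot layout concrete, the following Python sketch packs one audio sample and its V, U and C bits into time slots 4 to 31 and computes the parity bit. It assumes the sample is supplied as an unsigned bit pattern of `word_bits` bits; `pack_subframe` and its parameters are illustrative names, and the preamble in slots 0 to 3 is deliberately left out.

```python
def pack_subframe(sample, word_bits=20, v=0, u=0, c=0):
    """Return time slots 4..31 of a subframe as a list of 28 bits.

    Index 0 of the result is time slot 4. `sample` is an unsigned bit
    pattern of `word_bits` bits (an assumption of this sketch).
    """
    if not 1 <= word_bits <= 24:
        raise ValueError("word_bits must be between 1 and 24")
    if not 0 <= sample < (1 << word_bits):
        raise ValueError("sample does not fit in word_bits")
    # Left-justify so the MSB lands in slot 27; unused low slots stay 0.
    field = sample << (24 - word_bits)
    slots = [(field >> i) & 1 for i in range(24)]   # LSB first
    slots += [v, u, c]                              # slots 28, 29, 30
    slots.append(sum(slots) & 1)                    # slot 31: even parity
    return slots


cd_slots = pack_subframe(0xFFFF, word_bits=16)
print(cd_slots[:8])    # slots 4..11: aux slots and unused LSBs
print(sum(cd_slots) % 2)
```

With a 16-bit word the first eight entries (slots 4 to 11) are zero, matching the CD example above, and the total number of ones across slots 4 to 31 is always even.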
The Channel Status Word in AES/EBU
As stated before, there is one channel status bit in each subframe,
making one 192-bit word for each channel in each block. This 192-bit
word is usually presented as 192/8 = 24 bytes. The contents of
the channel status word are completely different between the AES3
and S/PDIF standards, although they agree that the first channel
status bit (byte 0 bit 0) distinguishes between the two. In the
case of AES3, the standard describes in detail how the bits have to
be used. Here is a summary of the channel status word:
- byte 0: basic control data: sample rate, compression, emphasis
  - bit 0: A value of 1 indicates this is AES/EBU channel status
    data. 0 indicates this is S/PDIF data.
  - bit 1: A value of 0 indicates this is linear audio PCM data. A
    value of 1 indicates other (usually non-audio) data.
  - bits 2–4: Indicates the type of signal preemphasis applied to
    the data. Generally set to 100 (none).
  - bit 5: A value of 0 indicates that the source is locked to some
    (unspecified) external time sync. A value of 1 indicates an
    unlocked source.
  - bits 6–7: Sample rate. These bits are redundant when real-time
    audio is transmitted (the receiver can observe the sample rate
    directly), but are useful if AES/EBU data is recorded or otherwise
    stored. Options are unspecified, 48 kHz (the default), 44.1 kHz,
    and 32 kHz.
- byte 1: indicates if the audio stream is stereo, mono or some
  other combination
  - bits 0–3: Indicates the relationship of the two channels; they
    might be unrelated audio data, a stereo pair, duplicated mono data,
    music and voice commentary, or a stereo sum/difference code.
  - bits 4–7: Used to indicate the format of the user channel.
- byte 2: audio word length
  - bits 0–2: Aux bits usage. This indicates how the aux bits (time
    slots 4–7) are used. Generally set to 000 (unused) or 001 (used for
    24-bit audio data).
  - bits 3–5: Word length. Specifies the sample size, relative to
    the 20- or 24-bit maximum. Can specify 0, 1, 2 or 4 missing bits.
    Unused bits are filled with 0, but audio processing functions such
    as mixing will generally fill them in with valid data without
    changing the effective word length.
  - bits 6–7: Unused
- byte 3: used only for multichannel applications
- byte 4: Additional sample rate information.
  - bits 0–1: indicate the grade of the sample rate reference.
  - bit 2: reserved
  - bits 3–6: Extended sample rate. This indicates other sample
    rates, not representable in byte 0 bits 6–7. Values are assigned
    for 24, 96, and 192 kHz, as well as 22.05, 88.2, and 176.4 kHz.
  - bit 7: This "sampling frequency scaling flag", if set,
    indicates that the sample rate is multiplied by 1/1.001 to match
    NTSC frame rates.
- byte 5: reserved
- bytes 6–9: Four ASCII characters for
  indicating channel origin. Widely used in large studios.
- bytes 10–13: Four ASCII characters indicating channel
  destination, to control automatic switchers. Less often used.
- bytes 14–17: 32-bit sample address, incrementing by 192 every
  audio block (there are 192 frames per block). At 48 kHz, this wraps
  every 24h51m18.485333s.
- bytes 18–21: as above, but offset to indicate samples since
  midnight.
- byte 22: contains information about the reliability of the
  channel status word.
  - bits 0–3: reserved
  - bit 4: if set, bytes 0–5 (signal format) are unreliable.
  - bit 5: if set, bytes 6–13 (channel labels) are unreliable.
  - bit 6: if set, bytes 14–17 (sample address) are unreliable.
  - bit 7: if set, bytes 18–21 (timestamp) are unreliable.
- byte 23: CRC. This byte
  is used to detect corruption of the channel status word, as might
  be caused by switching mid-block. (The CRC is preset to 1.)
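As a rough illustration of how a receiver might read the channel status word, the Python sketch below groups the 192 collected bits into 24 bytes and pulls out the byte 0 fields summarised above. It assumes the bits are supplied in transmission order, with bit 0 of each byte arriving first. Multi-bit fields are returned as raw bit strings rather than decoded, since the summary does not list every code. All function names are hypothetical.

```python
def bits_to_bytes(status_bits):
    """Group 192 channel status bits into 24 lists of 8 bits (bit 0 first)."""
    if len(status_bits) != 192:
        raise ValueError("a channel status word has exactly 192 bits")
    return [status_bits[i:i + 8] for i in range(0, 192, 8)]


def field(byte_bits, first, last):
    """Return bits first..last (inclusive) as a string, in arrival order."""
    return "".join(str(b) for b in byte_bits[first:last + 1])


def parse_byte0(status_bits):
    """Extract the byte 0 fields described in the list above."""
    b0 = bits_to_bytes(status_bits)[0]
    return {
        "aes_ebu": b0[0] == 1,           # 1: AES/EBU, 0: S/PDIF
        "linear_pcm": b0[1] == 0,        # 0: linear PCM audio
        "emphasis": field(b0, 2, 4),     # "100" means none
        "locked": b0[5] == 0,            # 0: locked to external sync
        "sample_rate_code": field(b0, 6, 7),
    }


def sample_address_wrap_seconds(sample_rate_hz):
    """Seconds until the 32-bit sample address (bytes 14-17) wraps."""
    return 2 ** 32 / sample_rate_hz


word = [0] * 192
word[0] = 1      # AES/EBU channel status
word[2] = 1      # emphasis bits 2-4 = "100"
print(parse_byte0(word))
print(sample_address_wrap_seconds(48_000))
```

The wrap time at 48 kHz works out to about 89,478.49 seconds, which is the 24h51m18.485333s figure quoted for bytes 14–17.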