A numeral system (or system of numeration) is a writing system for expressing numbers; that is, a mathematical notation for representing numbers of a given set, using graphemes or symbols in a consistent manner. It can be seen as the context that allows the numeral "11" to be interpreted as the binary symbol for three, the decimal symbol for eleven, or as other numbers in different bases.
Ideally, a numeral system will:
- Represent a useful set of numbers (e.g. all whole numbers, integers, or rational numbers)
- Give every number represented a unique representation (or at least a standard representation)
- Reflect the algebraic and arithmetic structure of the numbers.
For example, the usual decimal representation of whole numbers gives every whole number a unique representation as a finite sequence of digits.
However, when decimal representation is used for the rational or real numbers, the representation may not be unique: many rational numbers have two numerals, a standard one that terminates, such as 2.31, and another that recurs, such as 2.309999999... (with the 9 repeating).
Numerals which terminate have no non-zero digits after a given position. For example, numerals like 2.31 and 2.310 are taken to be the same, except in some scientific contexts where greater precision is implied by the trailing zero.
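As a worked check that 2.309999999... and 2.31 name the same number, the repeating tail can be summed as a geometric series (a standard derivation, shown here only for illustration):

```latex
2.30\overline{9}
  = 2.30 + \sum_{k=3}^{\infty} 9 \cdot 10^{-k}
  = 2.30 + \frac{9 \cdot 10^{-3}}{1 - 10^{-1}}
  = 2.30 + 0.01
  = 2.31
```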
Numeral systems are sometimes called number systems, but that name is misleading, as it could refer to different systems of numbers, such as the system of real numbers, the system of complex numbers, the system of p-adic numbers, etc. Such systems are not the topic of this article.
Types of numeral systems
The most commonly used system of numerals is known as Hindu-Arabic numerals, and two Indian mathematicians are credited with developing them. Aryabhatta of Kusumapura, who lived during the 5th century, developed the place-value notation, and Brahmagupta, a century later, introduced the symbol for zero.
The simplest numeral system is the unary numeral system, in which every natural number is represented by a corresponding number of symbols. If the symbol / is chosen, for example, then the number seven would be represented by ///////. Tally marks represent one such system still in common use. The unary system is only useful for small numbers, although it plays an important role in theoretical computer science. Elias gamma coding, which is commonly used in data compression, expresses arbitrary-sized numbers by using unary to indicate the length of a binary numeral.
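A minimal Python sketch of the two ideas in this paragraph: plain unary, and Elias gamma coding, which writes the length of the binary numeral in unary (as leading zeros) before the numeral itself. The function names are illustrative, and codes are returned as strings of '0' and '1' for readability rather than as packed bits.

```python
def unary(n: int, symbol: str = "/") -> str:
    """Represent the natural number n by n copies of one symbol."""
    if n < 0:
        raise ValueError("unary represents natural numbers only")
    return symbol * n


def elias_gamma_encode(n: int) -> str:
    """Elias gamma code of n >= 1 as a string of '0' and '1' characters.

    The binary numeral of n has L bits (L = floor(log2 n) + 1); it is
    preceded by L - 1 zeros, i.e. its length minus one written in unary.
    """
    if n < 1:
        raise ValueError("Elias gamma coding is defined for n >= 1")
    binary = bin(n)[2:]                      # e.g. 9 -> '1001'
    return "0" * (len(binary) - 1) + binary  # e.g. '000' + '1001'


def elias_gamma_decode(bits: str) -> tuple[int, str]:
    """Decode one gamma-coded number; return (value, remaining bits)."""
    zeros = len(bits) - len(bits.lstrip("0"))  # length of the unary prefix
    end = 2 * zeros + 1                          # prefix plus zeros + 1 binary bits
    if end > len(bits):
        raise ValueError("truncated code")
    return int(bits[zeros:end], 2), bits[end:]


print(unary(7))                          # ///////
print(elias_gamma_encode(9))             # 0001001
print(elias_gamma_decode("0001001011"))  # (9, '011')
```

Because the unary prefix tells the decoder where each number ends, gamma codes can be concatenated without separators; the leftover '011' above is itself the code for 3.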
The unary notation can be abbreviated by introducing different symbols for certain new values. Very commonly, these values are powers of 10; so, for instance, if / stands for one, - for ten and + for 100, then the number 304 can be compactly represented as +++ //// and the number 123 as + - - /// without any need for zero. This is called sign-value notation. The ancient Egyptian numeral system was of this type, and the Roman numeral system was a modification of this idea.
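The sign-value example can be sketched in Python using the same three symbols (/ = 1, - = 10, + = 100). Grouping repeated symbols with a space is an assumption made for readability; since only the symbols' values are added up, their order and spacing do not affect the number. With no symbol above 100, larger numbers would simply repeat + more often.

```python
# Symbols from the example in the text: '/' = 1, '-' = 10, '+' = 100.
SIGN_VALUE_SYMBOLS = [(100, "+"), (10, "-"), (1, "/")]


def sign_value(n: int) -> str:
    """Write n by repeating the symbol for each power of ten.

    A power whose count is zero is simply omitted, so no zero symbol is needed.
    """
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    groups = []
    for value, symbol in SIGN_VALUE_SYMBOLS:
        count, n = divmod(n, value)
        if count:
            groups.append(symbol * count)
    return " ".join(groups)


def sign_value_to_int(s: str) -> int:
    """Add up the values of all symbols; order and spacing do not matter."""
    values = {symbol: value for value, symbol in SIGN_VALUE_SYMBOLS}
    return sum(values[c] for c in s if not c.isspace())


print(sign_value(304))                # +++ ////
print(sign_value(123))                # + -- ///
print(sign_value_to_int("+ - - ///"))  # 123
```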
More useful still are systems which employ special abbreviations for repetitions of symbols; for example, using the first nine letters of our alphabet for these abbreviations, with A standing for "one occurrence", B for "two occurrences", and so on, we could then write C+ D/ for the number 304. The number system of the English language is of this type ("three hundred [and] four"), as are those of other spoken languages, regardless of what written systems they have adopted. However, many languages use mixtures of bases and other features; for instance, 79 in French is soixante-dix-neuf (60 + 10 + 9), and in Welsh it is pedwar ar bymtheg a thrigain (4 + (5 + 10) + (3 × 20)) or (somewhat archaic) pedwar ugain namyn un (4 × 20 − 1).
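The letter-abbreviation scheme (C+ D/ for 304) can be sketched in Python by pairing a count letter with a power symbol, much as English pairs "three" with "hundred". The symbol choices follow the text; limiting the sketch to numbers below 1000 is an assumption, since a count letter only goes up to I (nine).

```python
# Count letters: A = one occurrence, B = two, ..., I = nine.
COUNT_LETTERS = "ABCDEFGHI"
# Power symbols reused from the sign-value example: + = 100, - = 10, / = 1.
POWER_SYMBOLS = [(100, "+"), (10, "-"), (1, "/")]


def abbreviated(n: int) -> str:
    """Write n as <count letter><power symbol> pairs, highest power first."""
    if not 0 <= n < 1000:
        raise ValueError("this sketch only has power symbols up to 100")
    parts = []
    for value, symbol in POWER_SYMBOLS:
        count, n = divmod(n, value)
        if count:  # powers with a zero count are simply left out
            parts.append(COUNT_LETTERS[count - 1] + symbol)
    return " ".join(parts)


print(abbreviated(304))  # C+ D/
print(abbreviated(79))   # G- I/
```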
More elegant is a positional system, also known as place-value notation. Again working in base 10, we use ten different digits 0, ..., 9 and use the position of a digit to signify the power of ten that the digit is to be multiplied by, as in 304 = 3×100 + 0×10 + 4×1. Note that zero, which is not needed in the other systems, is of crucial importance here, in order to be able to "skip" a power. The Hindu-Arabic numeral system, which originated in India and is now used throughout the world, is a positional base-10 system.
Arithmetic is much easier in positional systems than in the earlier additive ones. Furthermore, additive systems need a large number of different symbols for the different powers of 10, whereas a positional system needs only 10 different symbols (assuming that it uses base 10).
The numerals used when writing numbers with digits or symbols can be divided into two types that might be called the arithmetic numerals 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 and the geometric numerals 1, 10, 100, 1000, 10000, and so on. The sign-value systems use only the geometric numerals, and the positional systems use only the arithmetic numerals. The sign-value system does not need arithmetic numerals because they are made by repetition (except for the Ionic system), and the positional system does not need geometric numerals because they are made by position. Spoken language, however, uses both arithmetic and geometric numerals.
In certain areas of computer science, a modified base-k positional system is used, called bijective numeration, with digits 1, 2, ..., k (k ≥ 1), and zero being represented by an empty string. This establishes a bijection between the set of all such digit strings and the set of non-negative integers, avoiding the non-uniqueness caused by leading zeros. Bijective base-k numeration is also called k-adic notation, not to be confused with p-adic numbers. Bijective base-1 is the same as unary.
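A Python sketch of bijective base-k conversion, assuming digits are returned as a list of integers 1..k (most significant first) rather than as characters. The only change from ordinary base conversion is subtracting 1 before each division, which shifts the remainder range 0..k−1 to the digit range 1..k.

```python
def to_bijective(n: int, k: int) -> list[int]:
    """Digits (most significant first) of n in bijective base k.

    Digits run from 1 to k; there is no zero digit, and n = 0 is the
    empty list (the empty string).
    """
    if n < 0 or k < 1:
        raise ValueError("need n >= 0 and k >= 1")
    digits = []
    while n > 0:
        # Subtracting 1 first turns remainders 0..k-1 into digits 1..k.
        n, r = divmod(n - 1, k)
        digits.append(r + 1)
    return digits[::-1]


def from_bijective(digits: list[int], k: int) -> int:
    """Inverse of to_bijective, evaluated by Horner's rule."""
    n = 0
    for d in digits:
        if not 1 <= d <= k:
            raise ValueError(f"digit {d} out of range 1..{k}")
        n = n * k + d
    return n


print(to_bijective(10, 10))                         # [10]
print([to_bijective(n, 2) for n in range(1, 8)])    # [[1], [2], [1, 1], [1, 2], [2, 1], [2, 2], [1, 1, 1]]
print(to_bijective(3, 1))                           # [1, 1, 1]  (base 1 is unary)
```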
Positional systems in detail
In a positional base-b numeral system (with b a natural number greater than 1, known as the radix), b basic symbols (or digits) corresponding to the first b natural numbers, including zero, are used. To generate the rest of the numerals, the position of the symbol in the figure is used. The symbol in the last position has its own value, and as it moves to the left its value is multiplied by b.
For example, in the decimal system (base 10), the numeral 4327 means $(4 \times 10^3) + (3 \times 10^2) + (2 \times 10^1) + (7 \times 10^0)$, noting that $10^0 = 1$.
In general, if $b$ is the base, we write a number in the numeral system of base $b$ by expressing it in the form $a_n b^n + a_{n-1} b^{n-1} + a_{n-2} b^{n-2} + \cdots + a_0 b^0$ and writing the enumerated digits $a_n a_{n-1} a_{n-2} \ldots a_0$ in descending order. The digits are natural numbers between 0 and $b - 1$, inclusive.
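A Python sketch of both directions of this definition, assuming digits are held as a list of integers $a_n \ldots a_0$: repeated division by $b$ produces the digits from $a_0$ upwards, and Horner's rule evaluates $a_n b^n + \cdots + a_0 b^0$ without computing powers explicitly. The function names are illustrative.

```python
def to_digits(n: int, b: int) -> list[int]:
    """Digits a_n ... a_0 of the natural number n in base b (b >= 2)."""
    if b < 2 or n < 0:
        raise ValueError("need b >= 2 and n >= 0")
    if n == 0:
        return [0]
    digits = []
    while n > 0:
        n, a = divmod(n, b)  # a is the next digit, 0 <= a <= b - 1
        digits.append(a)
    return digits[::-1]      # descending order of position


def from_digits(digits: list[int], b: int) -> int:
    """Evaluate a_n b^n + ... + a_0 b^0 by Horner's rule."""
    n = 0
    for a in digits:
        if not 0 <= a < b:
            raise ValueError(f"digit {a} out of range 0..{b - 1}")
        n = n * b + a
    return n


print(to_digits(4327, 10))                              # [4, 3, 2, 7]
print(to_digits(11, 2))                                 # [1, 0, 1, 1]
print(from_digits([1, 1], 2), from_digits([1, 1], 10))  # 3 11
```

The last line echoes the introduction: the same digit string "11" is three in binary and eleven in decimal. For string input, Python's built-in `int(s, base)` performs the same evaluation for bases 2 to 36.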
If a text (such as this one) discusses multiple bases, and if ambiguity exists, the base (itself represented in base 10) is added in subscript to the right of the number, like this: $\mathrm{number}_{\mathrm{base}}$. Unless specified by context, numbers without a subscript are considered to be decimal.
By using a dot to divide the digits into two groups, one can also write fractions in the positional system. For example, the base-2 numeral 10.11 denotes $1 \times 2^1 + 0 \times 2^0 + 1 \times 2^{-1} + 1 \times 2^{-2} = 2.75$.
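A Python sketch that evaluates a numeral with a radix point exactly, using `fractions.Fraction` so that weights such as $2^{-2}$ introduce no rounding. It assumes a non-negative numeral whose digits are those accepted by `int(..., b)` (0-9, then a-z); the function name is illustrative.

```python
from fractions import Fraction


def parse_radix_point(numeral: str, b: int) -> Fraction:
    """Exact value of a non-negative base-b numeral with an optional point.

    The integer part is read as usual; the k-th digit after the point
    (k = 1, 2, ...) has weight b**-k.
    """
    if numeral.startswith("-"):
        raise ValueError("non-negative numerals only")
    integer_part, _, fraction_part = numeral.partition(".")
    value = Fraction(int(integer_part or "0", b))
    for k, ch in enumerate(fraction_part, start=1):
        value += Fraction(int(ch, b), b ** k)  # int() rejects digits >= b
    return value


print(parse_radix_point("10.11", 2))         # 11/4
print(float(parse_radix_point("10.11", 2)))  # 2.75
```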
In general, numbers in the base $b$ system are of the form:

$$(a_n a_{n-1} \cdots a_1 a_0 . c_1 c_2 c_3 \cdots)_b = \sum_{k=0}^{n} a_k b^k + \sum_{k=1}^{\infty} c_k b^{-k}.$$
The numbers $b^k$ and $b^{-k}$ are the weights of the corresponding digits. The position $k$ is the logarithm of the corresponding weight $w$, that is, $k = \log_b w = \log_b b^k$. The highest used position is close to the order of magnitude of the number.
The number of tally marks required in the unary numeral system for describing the weight would have been $w$. In the positional system, the number of digits required to describe it is only $k + 1 = \log_b w + 1$, for $k \ge 0$. For example, describing the weight 1000 requires four digits, since $\log_{10} 1000 + 1 = 3 + 1$. The number of digits required to describe the position is $\log_b k + 1 = \log_b \log_b w + 1$ (in positions 1, 10, 100, ... only, for simplicity in the decimal example).
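The digit count can be checked with a short Python sketch. For a weight $w = b^k$ it equals $\log_b w + 1$; for an arbitrary positive integer $n$ it is $\lfloor \log_b n \rfloor + 1$. The sketch counts by repeated integer division, which avoids floating-point logarithms (these can land just below an exact integer near powers of $b$). The function name is illustrative.

```python
def digit_count(n: int, b: int) -> int:
    """Number of base-b digits of the positive integer n.

    Equals floor(log_b n) + 1, computed by repeated integer division so
    that no floating-point logarithm is involved.
    """
    if n < 1 or b < 2:
        raise ValueError("need n >= 1 and b >= 2")
    count = 0
    while n > 0:
        n //= b
        count += 1
    return count


# The weight 1000 takes 1000 tally marks in unary but only 4 decimal digits.
print(len("/" * 1000), digit_count(1000, 10))  # 1000 4
```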
| Position | 3 | 2 | 1 | 0 | -1 | -2 | ... |
|---|---|---|---|---|---|---|---|
| Weight | $b^3$ | $b^2$ | $b^1$ | $b^0$ | $b^{-1}$ | $b^{-2}$ | ... |
| Digit | $a_3$ | $a_2$ | $a_1$ | $a_0$ | $c_1$ | $c_2$ | ... |
| Decimal example weight | 1000 | 100 | 10 | 1 | 0.1 | 0.01 | ... |
| Decimal example digit | 4 | 3 | 2 | 7 | 0 | 0 | ... |
Note that a number has a terminating or repeating expansion if and only if it is rational; this does not depend on the base. A number that terminates in one base may repeat in another (thus $0.3_{10} = 0.0100110011001\ldots_2$). An irrational number stays aperiodic (with an infinite number of non-repeating digits) in all integral bases. Thus, for example, in base 2, $\pi = 3.1415926\ldots_{10}$ can be written as the aperiodic $11.001001000011111\ldots_2$.
Putting overscores ($\overline{n}$) or dots ($\dot{n}$) above the repeating digits is a convention used to represent repeating rational expansions. Thus:
- $14/11 = 1.272727272727\ldots = 1.\overline{27}$
- $321.3217878787878\ldots = 321.321\dot{7}\dot{8}$
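The claims about terminating and repeating expansions can be explored with a long-division sketch in Python. Each fractional digit is produced from a remainder smaller than the denominator, so some remainder must eventually recur (or become zero), and from that point the digits repeat. The function name and the (integer part, non-repeating part, repeating block) return shape are assumptions for illustration; inputs should be exact `Fraction`s such as `Fraction(3, 10)`, not `Fraction(0.3)`, which would capture the binary rounding error of the float.

```python
from fractions import Fraction

DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def expand(x: Fraction, b: int = 10) -> tuple[str, str, str]:
    """Base-b expansion of a non-negative rational x by long division.

    Returns (integer digits, non-repeating fractional digits, repeating block);
    the repeating block is '' when the expansion terminates.
    """
    if x < 0 or not 2 <= b <= len(DIGITS):
        raise ValueError("need x >= 0 and 2 <= b <= 36")
    p, q = x.numerator, x.denominator
    whole, r = divmod(p, q)
    int_digits = ""
    while True:                      # integer part, least significant first
        whole, d = divmod(whole, b)
        int_digits = DIGITS[d] + int_digits
        if whole == 0:
            break
    frac_digits = []
    seen = {}                        # remainder -> index of the digit it produces
    while r != 0 and r not in seen:
        seen[r] = len(frac_digits)
        d, r = divmod(r * b, q)
        frac_digits.append(DIGITS[d])
    if r == 0:                       # terminating expansion
        return int_digits, "".join(frac_digits), ""
    start = seen[r]                  # the cycle begins where r first appeared
    return int_digits, "".join(frac_digits[:start]), "".join(frac_digits[start:])


print(expand(Fraction(14, 11)))                               # ('1', '', '27')
print(expand(Fraction(3, 10), 2))                             # ('0', '0', '1001')
print(expand(Fraction(321321, 1000) + Fraction(78, 99000)))   # ('321', '321', '78')
```

The second line reproduces $0.3_{10} = 0.0\overline{1001}_2$: a number that terminates in base 10 repeats in base 2.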
If $b = p$ is a prime number, one can define base-$p$ numerals whose expansion to the left never stops; these are called the $p$-adic numbers.
Generalized variable-length integers
More general is a notation (here written little-endian) such as $a_0 a_1 a_2$ for $a_0 + a_1 b_1 + a_2 b_1 b_2$, and so on.
This is used in Punycode, one aspect of which is the representation of a sequence of non-negative integers of arbitrary size in the form of a sequence without delimiters, of "digits" from a collection of 36: a-z and 0-9, representing 0-25 and 26-35 respectively. A digit lower than a threshold value marks that it is the most-significant digit, hence the end of the number. The threshold value depends on the position in the number. For example, if the threshold value for the first digit is b (i.e. 1), then a (i.e. 0) marks the end of the number (it has just one digit), so in numbers of more than one digit the first digit ranges only over b to 9 (1 to 35); therefore the weight $b_1$ is 35 instead of 36. Suppose the threshold values for the second and third digits are c (2); then the third digit has weight 34 × 35 = 1190, and we have the following sequence:

a (0), ba (1), ca (2), ..., 9a (35), bb (36), cb (37), ..., 9b (70), bca (71), ..., 99a (1260), bcb (1261), etc.
Unlike a regular positional numeral system, we have numbers like 9b where 9 and b each contribute 35 (9 is the digit 35 with weight 1, and b is the digit 1 with weight 35); yet the representation is unique because ac and aca are not allowed: the a would terminate the number.
The flexibility in choosing threshold values allows optimization depending on the frequency of occurrence of numbers of various sizes.

The case with all threshold values equal to 1 corresponds to bijective numeration, where the zeros act as separators between numbers whose digits are all non-zero.
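A Python sketch of the variable-length scheme described in this subsection, using the 36-symbol alphabet and the example thresholds (1 for the first digit, 2 afterwards). The helper names, and the rule that the last listed threshold repeats for all later positions, are assumptions for illustration. The per-digit arithmetic mirrors the generalized variable-length integer procedure in RFC 3492, but real Punycode derives each threshold from an adaptively updated bias, which is not modelled here.

```python
import string

# Digit alphabet from the text: a-z stand for 0-25 and 0-9 for 26-35.
ALPHABET = string.ascii_lowercase + string.digits
BASE = len(ALPHABET)  # 36


def threshold(thresholds: list[int], i: int) -> int:
    """Threshold for position i; the last listed value repeats (an assumption)."""
    t = thresholds[min(i, len(thresholds) - 1)]
    if not 1 <= t < BASE:
        raise ValueError("thresholds must lie in 1..35")
    return t


def encode(n: int, thresholds: list[int]) -> str:
    """Little-endian digits of n; the first digit below its threshold ends it."""
    out = []
    i = 0
    while True:
        t = threshold(thresholds, i)
        if n < t:
            out.append(ALPHABET[n])   # final (most significant) digit
            return "".join(out)
        # Continuing digits lie in t..35, so this position carries BASE - t values.
        d = t + (n - t) % (BASE - t)
        out.append(ALPHABET[d])
        n = (n - t) // (BASE - t)
        i += 1


def decode_all(text: str, thresholds: list[int]) -> list[int]:
    """Split an undelimited digit string into the numbers it encodes."""
    numbers = []
    value, weight, i = 0, 1, 0
    for ch in text:
        d = ALPHABET.index(ch)
        t = threshold(thresholds, i)
        value += d * weight
        if d < t:                      # this digit terminates the number
            numbers.append(value)
            value, weight, i = 0, 1, 0
        else:
            weight *= BASE - t         # b_1 = 36 - t_0, then times 36 - t_1, ...
            i += 1
    if i != 0:
        raise ValueError("input ends in the middle of a number")
    return numbers


print([encode(n, [1, 2]) for n in (0, 1, 35, 36, 70, 71, 1260, 1261)])
# ['a', 'ba', '9a', 'bb', '9b', 'bca', '99a', 'bcb']
print(decode_all("abcabb", [1, 2]))  # [0, 71, 36]
```

With `thresholds=[1]`, only the digit a terminates a number and every other digit is in 1..35, so the non-terminal digits form a little-endian bijective base-35 numeral, matching the remark about bijective numeration above.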