Numeric Types
Numeric types represent real numbers in a computer. Because computers have finite storage, most numeric values are approximations of the real value they stand for.
Math in a computer combines a storage format for data with a collection of operators. The interpretation of the bits is up to the operators: when we talk about the type of a number, we are usually referring to the operations we can perform on it rather than how it is stored.
Here are some Numeric Types that programming languages usually support.
- Signed Integer (32 bits) - These usually have fast support in the processor. Sometimes called an int.
- Two's Complement - Represents numbers as a string of bits. The negation of N is inv(N) + 1, so bit-vector addition gives (inv(N) + 1) + N = 0 (modulo 2^B).
- One's Complement - Represents numbers as a string of bits. The negation of N is inv(N), so inv(N) + N = 0, where the all-ones pattern serves as a second zero (negative zero).
- Signed Magnitude - Represents Numbers as one bit for the sign, and the remaining as an unsigned magnitude. N = (-1)^S x M
- Signed Integer (8 bits) - Sometimes called a byte. Sometimes called an octet in the context of networking.
- Signed Integer (16 bits) - Sometimes called a short
- Signed Integer (24 bits) - Uncommon, sometimes called a medium; 24-bit values appear in audio samples and in GPU depth buffers.
- Signed Integer (64 bits) - Sometimes called a long. This is the most common word size on machines today.
- Signed Integer (128 bits) - Less common; available in some languages, such as C#'s Int128.
- Unsigned Integer (32 bits) - Represents a magnitude from 0 to 2^B - 1, where B is bits. Overflow wraps around usually.
- Unsigned Integer (8 bits) - Sometimes also called a byte, or an octet. C calls these unsigned char.
- Unsigned Integer (16 bits) - Sometimes called an unsigned short; Java calls this a char.
- Unsigned Integer (64 bits) - Occasionally called an unsigned long long in C.
- Float (32 bit) - Represents a number in binary scientific notation.
- Double (64 bit) - Also a floating point number, but bigger. Sometimes called a Float64. Double is short for double precision floating point number.
- Fixed point - Represents a decimal value with a non floating decimal point. Simpler to implement but doesn't capture much range. This type is not a native type to most CPUs.
- Arbitrary Precision Integer - Sometimes called big integers. Some languages, such as Python, use them as the native integer type.
- Arbitrary Precision Decimal - Represents a number with high precision.
- Complex (x + jy) - A real part plus an imaginary part, usually stored as a pair of numbers.
- Complex (m∠theta) - An alternative representation in polar form (magnitude and angle). Uncommon, but better for multiplication.
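The complement identities above can be checked directly. A minimal Python sketch, simulating a 32-bit register with a bit mask (Python integers are unbounded, so the mask plays the role of the fixed register width):

```python
MASK = 0xFFFFFFFF  # 32 one-bits: all arithmetic is taken modulo 2**32

def inv(n):
    """Bitwise inversion (the one's complement) within 32 bits."""
    return ~n & MASK

n = 42
# Two's complement: (inv(N) + 1) + N = 0
assert ((inv(n) + 1) + n) & MASK == 0
# One's complement: inv(N) + N gives the all-ones pattern ("negative zero")
assert (inv(n) + n) & MASK == MASK
```

A real CPU discards the carry out of the top bit the same way the mask does here.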
There are many less common types, or types that libraries implement.
- Rational - A ratio of two integers. Go supports this in its standard library (the math/big package's Rat type).
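Fixed point, listed above, can be emulated with plain integers. A Python sketch assuming 16 fractional bits (the scale choice here is arbitrary, often called a Q16 layout):

```python
SCALE = 1 << 16  # 16 fractional bits

def to_fixed(x):
    return round(x * SCALE)

def to_float(f):
    return f / SCALE

def fixed_mul(a, b):
    # The raw product carries 32 fractional bits; shift back down to 16.
    return (a * b) >> 16

a, b = to_fixed(1.5), to_fixed(2.25)
print(to_float(a + b))            # 3.75  (addition needs no adjustment)
print(to_float(fixed_mul(a, b)))  # 3.375
```

Because the point never moves, addition is a plain integer add; only multiplication and division need a correcting shift.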
As a consequence of their representation, most operations on numeric types are not faithful to the mathematical functions they are named after. For example, adding two Int32 values can overflow, producing a wrong answer. Most floating point operations are also not associative: (a + b) + c != a + (b + c). These cases are usually uncommon enough not to be a problem, but care should be taken to program defensively.
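Both failure modes can be demonstrated in a few lines. A Python sketch (Python integers do not overflow, so Int32 wrap-around is simulated with a mask):

```python
def add_i32(a, b):
    """Add two values with Int32 semantics: wrap into [-2**31, 2**31)."""
    s = (a + b) & 0xFFFFFFFF
    return s - 2**32 if s >= 2**31 else s

# Overflow: the largest Int32 plus one wraps to the smallest.
assert add_i32(2**31 - 1, 1) == -2**31

# Non-associativity: floating point addition depends on grouping.
a, b, c = 0.1, 0.2, 0.3
assert (a + b) + c != a + (b + c)
```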
Support in Languages
Java
Main: Java Numeric.
Numeric Observations
When dividing two quantities, the quotient and the remainder have different units. For example, 10 apples divided by 3 apples gives a quotient of 3, a unitless count (the apples cancel), with a remainder of 1 apple. Note that the quotient's unit changed.
The two quantities may carry different units as well. Consider computing the quotient and remainder of 10 meters over 3 seconds.
A = 10 meters
B = 3 seconds
Q = floor(A / B)
R = A % B
This maintains the identity A = Q * B + R. Thus:
Q = 3 meters/second
R = 1 meter
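The identity and the example above can be checked numerically. A Python sketch, with the units tracked only in comments:

```python
A = 10               # meters
B = 3                # seconds
Q, R = divmod(A, B)  # floor division and remainder in one step

assert Q == 3          # meters/second
assert R == 1          # meters
assert A == Q * B + R  # the identity holds; the units balance too
```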
Taylor Series
The Taylor series for the exponential function converges for every x, but for large |x| the partial sums remain far from the true value until many terms (often 100 or more) have been added. The individual terms x^n / n! have enormous numerators and denominators; near n = 100 the numerator is x^100 and the denominator is 100!. Floating point math is poorly suited to these values: both x^n and n! can overflow a double long before their ratio does, so the terms should be built incrementally (multiplying the previous term by x/n) rather than from the powers and factorials directly.
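The terms of the series can be built incrementally, multiplying the previous term by x/n so that x^n and n! are never formed outright. A Python sketch of that evaluation, which also shows how badly the naive sum behaves for a large negative argument (the 200-term cutoff is an arbitrary choice):

```python
import math

def exp_taylor(x, terms=200):
    """Sum the Taylor series for e**x, building each term as
    term *= x / n so x**n and n! are never computed directly."""
    total, term = 0.0, 1.0
    for n in range(1, terms + 1):
        total += term
        term *= x / n
    return total

# Positive x: all terms are positive and the sum is accurate.
print(exp_taylor(30.0), math.exp(30.0))

# Negative x: huge alternating terms cancel, and rounding noise
# swamps the tiny true value (~9.36e-14).
print(exp_taylor(-30.0), math.exp(-30.0))
```

A robust fix for negative x is to compute exp_taylor(-x) and take its reciprocal, so the sum involves no cancellation.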
