%d, %i, and %u print integers in base 10. The conversion from a
binary integer to a sequence of decimal characters is the core operation
that every score counter from 1977 to 1989 had to implement without a
library.
The digit extraction problem
The integer 1234 is stored as a binary value. The characters '1',
'2', '3', '4' are ASCII bytes 49, 50, 51, 52. They are not the
same thing — converting one to the other requires arithmetic.
% 10 gives the remainder after dividing by 10. Because we are working
in base 10, that remainder is always the last decimal digit — the units
place. 1234 % 10 is 4 because 1234 = 123 × 10 + 4.
/ 10 is integer division — the same as regular division, except
everything after the decimal point is discarded:
1234 / 10 = 123.4 → integer division: 123

The fractional part is gone. Applying both operations repeatedly peels off one digit per iteration until nothing remains.
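The two operations can be tried in isolation before they are combined. The sketch below is only an illustration; last_digit, drop_last_digit, and count_digits are hypothetical names, not part of the chapter's library.

```c
#include <assert.h>

/* Hypothetical helper: the units digit of n in base 10. */
unsigned int last_digit(unsigned int n)
{
    return (n % 10);
}

/* Hypothetical helper: n with its units digit removed. */
unsigned int drop_last_digit(unsigned int n)
{
    unsigned int rest;

    rest = n / 10;
    /* Quotient and remainder together lose nothing: n = rest * 10 + digit. */
    assert(rest * 10 + n % 10 == n);
    return (rest);
}

/* Counts decimal digits by peeling one off per iteration. */
int count_digits(unsigned int n)
{
    int count;

    count = 1;                  /* even 0 has one digit */
    while (n >= 10) {
        n = drop_last_digit(n);
        count++;
    }
    return (count);
}
```

With these definitions, last_digit(1234) is 4, drop_last_digit(1234) is 123, and count_digits(1234) is 4.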
To convert a digit (0–9) to its ASCII character, add '0'. The
character '0' is ASCII 48; adding 4 gives 52, which is '4'. The
same offset works for every digit.
This is the same fixed-offset trick from c01/04:
tci_toupper and tci_tolower add or subtract 32 because uppercase and
lowercase letters are exactly 32 apart in ASCII. The digit offset is
the same idea applied to a different range.
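The offset works in both directions, which a short sketch makes concrete. The names digit_to_char and char_to_digit are hypothetical, and the code assumes an ASCII-compatible character set, as on Linux.

```c
#include <assert.h>

/* Hypothetical helper: turn a digit value 0..9 into its character. */
char digit_to_char(int digit)
{
    assert(digit >= 0 && digit <= 9);
    return ((char)('0' + digit));
}

/* The inverse: subtract the same offset to recover the value. */
int char_to_digit(char c)
{
    assert(c >= '0' && c <= '9');
    return (c - '0');
}
```

digit_to_char(4) yields '4' (ASCII 52), and char_to_digit('4') yields 4 again.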
1234 % 10 = 4    →  '0' + 4 = '4'  (ASCII 52)
1234 / 10 = 123
123 % 10 = 3     →  '0' + 3 = '3'  (ASCII 51)
123 / 10 = 12
12 % 10 = 2      →  '0' + 2 = '2'  (ASCII 50)
12 / 10 = 1
1 % 10 = 1       →  '0' + 1 = '1'  (ASCII 49)
1 / 10 = 0       →  stop

The digits arrive in reverse order. A buffer of fixed size holds them; a
reverse pass puts them in the correct order before writing. An int has
at most 10 decimal digits plus a sign, so a buffer of 12 bytes is enough
for base 10. The function below uses 64 — because it handles any base,
and in base 2 an unsigned long on a 64-bit platform needs up to 64
binary digits.
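Restricted to base 10, the extract-then-reverse idea might look like the sketch below. It is a simplified illustration, not the chapter's function: uint_to_decimal is a hypothetical name, it writes into a caller-supplied buffer instead of a file descriptor, and it assumes a 32-bit unsigned int.

```c
#include <assert.h>

/*
 * Hypothetical helper: writes n in decimal into out as a NUL-terminated
 * string and returns the number of digits. Ten digits plus the NUL fit
 * in 12 bytes when unsigned int is 32 bits (assumed here).
 */
int uint_to_decimal(unsigned int n, char out[12])
{
    char tmp;
    int  len;
    int  i;

    len = 0;
    do {                                    /* do-while emits "0" for n == 0 */
        out[len++] = (char)('0' + n % 10);  /* units digit comes out first */
        n /= 10;
    } while (n > 0);
    assert(len <= 10);
    i = 0;
    while (i < len / 2) {                   /* reverse in place */
        tmp = out[i];
        out[i] = out[len - 1 - i];
        out[len - 1 - i] = tmp;
        i++;
    }
    out[len] = '\0';
    return (len);
}
```

Before the reverse pass, the buffer for 1234 holds "4321"; afterwards it holds "1234" and the function returns 4.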
Number bases
You use base 10 every day without thinking about it. "Base 10" just means there are 10 distinct digits — 0 through 9. When you run out of digits in a position, you carry over to the next: 9 + 1 becomes 10, not a new symbol. The position to the left is worth ten times the position to the right.
Any other number of symbols works the same way. The breadboard chapter used base 2: only two digits, 0 and 1. When a bit is 1 and you add 1 more, you carry: 1 + 1 becomes 10 in binary (one group of two, zero units). Every position to the left is worth twice the one to its right. The number 1234 in decimal looks different in other bases, but it represents the same quantity:
1234 in base 10: 1234
1234 in base 16: 4d2          (hex: 4×256 + 13×16 + 2)
1234 in base 2:  10011010010  (binary)

Base 16 (hexadecimal) is common in programming because four binary
digits map exactly to one hex digit — it is a compact way to write
binary values. That is why %x and %p use it.
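The four-bits-per-hex-digit mapping can be seen directly by shifting and masking. In the sketch below, hex_digit_at is a hypothetical helper and a 32-bit unsigned int is assumed.

```c
#include <assert.h>

/*
 * Hypothetical helper: the hex digit at position pos (0 = rightmost)
 * of n, read straight out of the binary representation.
 */
char hex_digit_at(unsigned int n, int pos)
{
    const char   *digits;
    unsigned int  nibble;

    digits = "0123456789abcdef";
    assert(pos >= 0 && pos < 8);        /* 8 groups of 4 bits in 32 bits */
    nibble = (n >> (4 * pos)) & 0xFu;   /* same as dividing by 16^pos, then % 16 */
    return (digits[nibble]);
}
```

For 1234, whose bits group as 0100 1101 0010, positions 2, 1, and 0 give '4', 'd', and '2', matching 4d2.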
The digit extraction algorithm works for any base. Instead of % 10
and / 10, use % base and / base. The only other change is the
digit set: base 10 uses "0123456789", base 16 uses
"0123456789abcdef". The length of the string is the base.
tci_putnbr_base
Write a helper that works for any base, not just base 10. %d/%u pass
"0123456789"; %x/%X pass a hex digit string. The same function
handles all of them:
static int tci_putnbr_base(unsigned long n, const char *base, int fd)
{
    char buf[64]; /* enough for any unsigned long in any base */
    char tmp;
    int  blen;
    int  len;
    int  i;

    blen = (int)tci_strlen(base);
    len = 0;
    if (n == 0)
        buf[len++] = base[0]; /* zero is a valid digit, not empty output */
    while (n > 0) {
        buf[len++] = base[n % blen]; /* lowest digit first */
        n /= blen;
    }
    i = 0;
    while (i < len / 2) { /* reverse in place */
        tmp = buf[i];
        buf[i] = buf[len - 1 - i];
        buf[len - 1 - i] = tmp;
        i++;
    }
    write(fd, buf, len);
    return (len);
}

unsigned long as the parameter type is deliberate: the function is
called with both unsigned int (from %u, %x, %X) and uintptr_t
(from %p). On Linux, unsigned long is the same width as a pointer: 32 bits
under the ILP32 model used on 32-bit systems, 64 bits under the LP64 model
used on x86-64. It is therefore always wide enough for uintptr_t. This
does not hold on Windows, where the LLP64 model keeps unsigned long
at 32 bits even on 64-bit systems. This chapter runs on 64-bit Linux, so
LP64 applies.
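The width assumption can be turned into a compile-time check so that a build on an unsuitable platform fails early instead of truncating pointers. This is a sketch, not something the chapter requires; ptr_to_ulong is a hypothetical helper, and _Static_assert needs a C11 compiler.

```c
#include <assert.h>
#include <stdint.h>

/* Refuses to compile where unsigned long cannot hold a pointer value,
   for example under the LLP64 model of 64-bit Windows. */
_Static_assert(sizeof(unsigned long) >= sizeof(uintptr_t),
               "unsigned long is too narrow to hold a pointer value");

/* Hypothetical helper: the address in p as an unsigned long. */
unsigned long ptr_to_ulong(const void *p)
{
    return ((unsigned long)(uintptr_t)p);
}
```

On LP64 Linux both types are 8 bytes, so the check passes and the conversion keeps every bit of the address.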
%d and %i — signed decimal
Both specifiers behave identically. The argument is a signed int. The
same width reasoning from above applies here: the function promotes to
long before doing anything, for reasons that become clear at the edge
of the type's range.
static int tci_print_signed(int n)
{
    int  count;
    long val;

    count = 0;
    val = n; /* promote to long before negating */
    if (val < 0) {
        count += tci_putchar_fd('-', 1);
        val = -val; /* negate: safe because val is long */
    }
    count += tci_putnbr_base((unsigned long)val, "0123456789", 1);
    return (count);
}

A 32-bit signed int holds values from −2147483648 to 2147483647.
The positive range stops at 2147483647 — one short of the magnitude of
the most negative value. Negating −2147483648 as an int would require
storing 2147483648, which exceeds INT_MAX by exactly 1 and produces
undefined behaviour.
This is the kind of edge case that goes unnoticed for a long time.
Every other negative integer negates cleanly; only this one value
breaks, and it only surfaces when someone passes INT_MIN to the
function. A quick test of %d with −1, −100, or −32768 passes
without issue — −2147483648 is the one that exposes it.
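Widening to long first gives 64 bits of room on LP64 Linux, so the
negation fits and the cast to unsigned long before passing to
tci_putnbr_base is safe. On a target where long is also 32 bits (ILP32,
or LLP64 Windows), the widening buys nothing and the same overflow returns.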
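An alternative that does not depend on the width of long is to compute the magnitude in unsigned arithmetic, where wraparound is well defined. This is a different technique from the one tci_print_signed uses, shown only for comparison; int_magnitude is a hypothetical name.

```c
#include <assert.h>
#include <limits.h>

/*
 * Hypothetical helper: |n| as an unsigned int, without ever negating a
 * signed value. Converting a negative int to unsigned int wraps modulo
 * UINT_MAX + 1, and subtracting that from 0u wraps back to the magnitude.
 * Both steps are well defined in C.
 */
unsigned int int_magnitude(int n)
{
    /* The magnitude of INT_MIN must be representable as unsigned int. */
    assert(UINT_MAX > (unsigned int)INT_MAX);
    if (n < 0)
        return (0u - (unsigned int)n);
    return ((unsigned int)n);
}
```

With a 32-bit int, int_magnitude(INT_MIN) is 2147483648, which can then be handed to a digit-printing routine after the '-' has been written.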
In dispatch:
if (spec == 'd' || spec == 'i')
    return (tci_print_signed(va_arg(*args, int)));

Where the limits come from
These numbers are not arbitrary. A 32-bit integer has 32 binary digits, giving 2³² = 4294967296 distinct bit patterns. An unsigned int uses all of them for non-negative numbers: 0 to 4294967295 (2³² − 1). A signed int in two's complement splits the patterns in half. The 2³¹ patterns with the top bit clear represent 0 to 2³¹ − 1 = 2147483647, and the 2³¹ patterns with the top bit set represent −2³¹ = −2147483648 to −1. Zero takes one slot on the non-negative side, which is why the positive range ends one short of the negative one.
The same rule applies to every unsigned type. The unsigned char from
c01/02 has 8 bits: 2⁸ = 256 values, 0 to 255. The bit width changes; the
formula does not.
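The formula can be checked numerically for any width. A minimal sketch, assuming two's-complement signed types; unsigned_max, signed_max, and signed_min are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Largest value of an unsigned type with the given bit width: 2^bits - 1. */
uint64_t unsigned_max(int bits)
{
    assert(bits >= 1 && bits <= 63);    /* keeps the shift in range */
    return ((UINT64_C(1) << bits) - 1);
}

/* Largest value of a two's-complement signed type: 2^(bits-1) - 1. */
int64_t signed_max(int bits)
{
    assert(bits >= 1 && bits <= 63);
    return ((int64_t)((UINT64_C(1) << (bits - 1)) - 1));
}

/* Smallest value: -2^(bits-1), one further from zero than signed_max. */
int64_t signed_min(int bits)
{
    return (-signed_max(bits) - 1);
}
```

For 8 bits this gives 255, 127, and −128; for 32 bits it gives 4294967295, 2147483647, and −2147483648.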
%u — unsigned decimal
After the INT_MIN detour, %u is a relief. An unsigned integer cannot
be negative by definition: it uses all 2³² bit patterns for non-negative
values, so there is no sign to print and no value to negate. The
value is still widened to unsigned long because tci_putnbr_base expects
that type (the explicit cast only makes the conversion visible), and the
widening cannot lose information: any 32-bit unsigned value fits in a
64-bit unsigned long.
static int tci_print_unsigned(unsigned int n)
{
    return (tci_putnbr_base((unsigned long)n, "0123456789", 1));
}

In dispatch:
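if (spec == 'u')
    return (tci_print_unsigned(va_arg(*args, unsigned int)));

Run man 3 printf: under d, i the manual specifies the int
argument type; under u it specifies unsigned int. The type given to
va_arg must match what the caller passed. The C standard tolerates an
int/unsigned int mix-up only for values that fit in both types; any other
mismatch, such as reading a long where an int was passed, is undefined
behaviour and typically reads garbage from the argument list. The compiler
will not warn about a wrong va_arg type, so the mistake stays silent until
runtime.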
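The pairing of specifier and va_arg type can be exercised in a toy variadic function. The sketch below is not the chapter's dispatcher; sum_args is a hypothetical name, and it assumes a 64-bit long so the total cannot overflow in the examples.

```c
#include <assert.h>
#include <stdarg.h>

/*
 * Hypothetical mini-dispatcher: for each character in spec, pulls one
 * argument with the type that specifier requires and adds it to a total.
 * 'd' and 'i' read an int; 'u' reads an unsigned int.
 */
long sum_args(const char *spec, ...)
{
    va_list args;
    long    total;

    total = 0;
    va_start(args, spec);
    while (*spec) {
        if (*spec == 'd' || *spec == 'i')
            total += va_arg(args, int);
        else if (*spec == 'u')
            total += (long)va_arg(args, unsigned int);
        else
            assert(!"unsupported specifier");
        spec++;
    }
    va_end(args);
    return (total);
}
```

A call such as sum_args("du", -5, 7u) reads the first argument as int and the second as unsigned int, and returns 2.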
make re
bash test.sh

The %d, %i, and %u rows must all pass, including edge cases for
0, INT_MAX, INT_MIN, and UINT_MAX.