Characters in C are integers. char is a numeric type. The value
65 stored in a char is the letter A. The value 48 is '0'. The
value 32 is a space. The character 'A' in source code is shorthand
for the integer 65, and the compiler treats them identically.
This encoding is ASCII[1] — the American Standard Code for Information Interchange, standardised in 1963. The assignments are not arbitrary: the uppercase letters run from 65 to 90, the lowercase from 97 to 122, the digits from 48 to 57. These ranges are what every character classification function exploits.
| Range | First char | First value | Last char | Last value |
|---|---|---|---|---|
| Digits | '0' | 48 | '9' | 57 |
| Uppercase | 'A' | 65 | 'Z' | 90 |
| Lowercase | 'a' | 97 | 'z' | 122 |
Run man ascii — the full 128-entry table, with every value in
decimal, octal, and hexadecimal, is one command away.
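The equivalence between characters and integers can be checked directly. The sketch below is illustrative only: `char_code` and `print_digit_codes` are hypothetical helpers, not part of libtci, and the values assume an ASCII execution character set, as the rest of this page does.

```c
#include <stdio.h>

/* Hypothetical helper: returns the numeric value stored in a char.
   Assumes an ASCII execution character set. */
int char_code(char c)
{
    return (c);  /* a char converts to int with no lookup involved */
}

/* Hypothetical helper: prints each digit character next to its value. */
void print_digit_codes(void)
{
    char c;

    for (c = '0'; c <= '9'; c++)
        printf("'%c' = %d\n", c, c);
}
```

Calling `char_code('A')` yields 65, and `print_digit_codes()` lists the values 48 through 57 from the table above.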
Why the parameter is int
Every character classification function in libc takes int c, not
char c. The reason is historical but consistent: char might be
signed, and passing a byte value above 127 to a function that expects
a signed char would produce a negative number. By taking int,
the functions accept the full range of unsigned char values (0–255)
plus the value EOF (typically −1), which indicates end-of-file and
is a legitimate input to every one of these functions.
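All libtci character functions follow the same convention. Because
they only compare c against ASCII ranges, no cast is needed inside
them: a negative value or one above 127 simply fails every range
check. The libc versions are stricter, so cast a plain char to
unsigned char before passing it to those.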
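The sign problem is easiest to see by widening the same byte two ways. This is a minimal sketch; `widen_plain` and `widen_unsigned` are hypothetical names used only for illustration, and the negative result depends on char being signed on your platform.

```c
/* Hypothetical helper: widens a char the naive way. Where char is
   signed, bytes above 127 come out negative. */
int widen_plain(char c)
{
    return (c);
}

/* Hypothetical helper: widens through unsigned char, which always
   yields a value in 0..255 whatever the signedness of char. */
int widen_unsigned(char c)
{
    return ((unsigned char)c);
}
```

For the byte 0xE9, `widen_unsigned` returns 233 everywhere, while `widen_plain` typically returns -23 on platforms with a signed char.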
tci_isascii
tci_isascii checks whether a value falls within the ASCII range —
0 to 127. ASCII is a 7-bit encoding: every character it defines fits
in 7 bits, giving exactly 128 values (2⁷). A value above 127 is
outside ASCII — it may be part of a multi-byte UTF-8 sequence or an
extended character that the functions in libtci's character group
were not designed to handle.
tci_isalpha, tci_isdigit, and the rest only define meaningful
behaviour for ASCII values and EOF. tci_isascii is the guard that
tells you whether a character is in the range where those functions
are reliable.
Create tci_isascii.c:
#include "libtci.h"

int tci_isascii(int c)
{
    return (c >= 0 && c <= 127);
}

Add tci_isascii.c to SRCS and its declaration to libtci.h:

int tci_isascii(int c);

UTF-8
ASCII covers 128 characters — enough for English text and terminal
punctuation. The rest of the world needs more: £, é, 中, α.
UTF-8[2] extends ASCII to cover the full Unicode range while keeping
the 128 ASCII characters exactly where they are.
The rule is simple: every ASCII character (0–127) is a single byte in
UTF-8, unchanged. A byte with its high bit clear (values 0–127) is
always a single ASCII character — which is exactly what tci_isascii
checks.
Characters outside that range use sequences of two, three, or
four bytes, where every byte in the sequence has its high bit set.
Four-byte sequences cover the code points above U+FFFF, the
supplementary planes where most emoji live: 🎮 is U+1F3AE, encoded
as 0xF0 0x9F 0x8E 0xAE.
The pound sign £ is Unicode code point U+00A3. In UTF-8 it is the
two-byte sequence 0xC2 0xA3. In C, a char is one byte, so a UTF-8
string is just a char array — but a single visible character can
occupy two, three, or four elements of that array. The string "£100"
is not 4 bytes: it is 5 (0xC2, 0xA3, '1', '0', '0') plus the
null terminator.
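The gap between bytes and visible characters can be measured. The sketch below relies on one UTF-8 detail not covered above: every byte after the first in a multi-byte sequence has the bit pattern 10xxxxxx. `utf8_length` and `utf8_extra_bytes` are hypothetical helpers, and both assume the input is valid UTF-8.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts code points in a UTF-8 string by
   skipping continuation bytes (bit pattern 10xxxxxx).
   Assumes the input is valid UTF-8. */
size_t utf8_length(const char *s)
{
    size_t count;

    count = 0;
    while (*s) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            count++;  /* an ASCII byte or a lead byte starts a character */
        s++;
    }
    return (count);
}

/* Hypothetical helper: bytes used beyond one per character. */
size_t utf8_extra_bytes(const char *s)
{
    return (strlen(s) - utf8_length(s));
}
```

For the string "£100", written in C as `"\xC2\xA3" "100"`, strlen reports 5 while `utf8_length` reports 4. The literal is split in two because a hex escape would otherwise swallow the digits that follow it.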
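This is why the character classification functions in libtci are only
specified for values 0–127 and EOF. The libc versions accept any
unsigned char value or EOF, but in the default "C" locale they
classify only ASCII characters, and passing them a negative value
other than EOF is undefined behaviour. A byte taken from a multi-byte
UTF-8 sequence (above 127, possibly negative if char is signed) is
never a letter, digit, or space on its own, so classifying it says
nothing about the character it belongs to. Check tci_isascii first
and treat every other byte as opaque.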
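A guard of this kind might look like the following sketch. `count_ascii_letters` is a hypothetical helper; libc's isalpha stands in for tci_isalpha, and the `c <= 127` test plays the role of tci_isascii.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: counts ASCII letters in a byte string. Bytes
   outside 0..127 are skipped, so only ASCII ever reaches isalpha. */
size_t count_ascii_letters(const char *s)
{
    size_t count;
    int c;

    count = 0;
    while (*s) {
        c = (unsigned char)*s;       /* 0..255, never negative */
        if (c <= 127 && isalpha(c))  /* guard first, classify second */
            count++;
        s++;
    }
    return (count);
}
```

The two bytes of an encoded é are skipped, so "café" in UTF-8 counts three letters, not four.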
tci_isalpha
tci_isalpha answers one question: is this character a letter? It
returns non-zero for anything from 'A' to 'Z' and 'a' to 'z',
zero for everything else. The typical use is filtering input —
accepting only letters, rejecting digits, punctuation, and whitespace.
Create tci_isalpha.c:
#include "libtci.h"

int tci_isalpha(int c)
{
    return ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'));
}

The single-letter constants 'A', 'Z', 'a', 'z' are integer
literals — their values are 65, 90, 97, and 122 respectively. The
condition checks whether c falls within either range. The two
forms are identical to the compiler:
int tci_isalpha(int c)
{
    return ((c >= 65 && c <= 90) || (c >= 97 && c <= 122));
}

Prefer the character literal form: 'A' communicates intent more
clearly than 65, and the compiler produces the same code either way.
Run man 3 isalpha — the page covers the entire is* family;
note that all return non-zero for true, not necessarily 1, so
testing if (isalpha(c) == 1) would be wrong.
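When an exact 0 or 1 is needed, normalise the result explicitly. A minimal sketch, using libc's isalpha; `is_letter` is a hypothetical wrapper, not a libtci function.

```c
#include <ctype.h>

/* Hypothetical helper: collapses isalpha's "non-zero for true" into
   exactly 0 or 1, so the result is safe to compare or store. */
int is_letter(int c)
{
    return (isalpha(c) != 0);
}
```

Some libc implementations return a bit mask rather than 1 for a match, which is why `isalpha(c) == 1` can fail even for a letter while `is_letter(c) == 1` cannot.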
Add tci_isalpha.c to SRCS and its declaration to libtci.h:
int tci_isalpha(int c);

tci_isdigit
tci_isdigit checks whether a character is a decimal digit — '0'
through '9'. The natural use is validating input: before converting
a character to a number with c - '0', you confirm with tci_isdigit
first.
Create tci_isdigit.c:
#include "libtci.h"

int tci_isdigit(int c)
{
    return (c >= '0' && c <= '9');
}

'0' is 48 and '9' is 57. The digit characters are consecutive in
ASCII, which is why this range check works.
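The check-then-convert pattern mentioned above can be sketched in a few lines. `digit_value` is a hypothetical helper, and the -1 return for non-digits is an arbitrary choice made for this example.

```c
/* Hypothetical helper: returns the numeric value of a decimal digit
   character, or -1 if c is not a digit. The range check is the same
   one tci_isdigit performs. */
int digit_value(int c)
{
    if (c >= '0' && c <= '9')
        return (c - '0');  /* '7' - '0' is 55 - 48 = 7 */
    return (-1);
}
```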
tci_isalnum
tci_isalnum combines the two: it returns non-zero if c is a letter
or a digit. Identifiers in most programming languages are built from
alphanumeric characters plus the underscore, so this function covers
most of the question "is this a valid identifier character?" The
implementation is exactly tci_isalpha or tci_isdigit.
Create tci_isalnum.c:
#include "libtci.h"

int tci_isalnum(int c)
{
    return (tci_isalpha(c) || tci_isdigit(c));
}

tci_isspace
tci_isspace checks whether a character is whitespace. Whitespace
detection is the backbone of parsing: splitting input into tokens
means deciding where one word ends and the next begins. In C,
whitespace is space (32), tab (9), newline (10), carriage return (13),
form feed (12), and vertical tab (11):
Create tci_isspace.c:
#include "libtci.h"

int tci_isspace(int c)
{
    return (c == ' ' || c == '\t' || c == '\n' ||
            c == '\r' || c == '\f' || c == '\v');
}

The \t, \n, \r, \f, \v are the same backslash escapes you
know from strings. They represent specific byte values: 9, 10, 13,
12, 11.
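To show how whitespace detection drives tokenising, here is a minimal word counter. `count_words` is a hypothetical helper, and libc's isspace stands in for tci_isspace so the block compiles on its own.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: counts whitespace-separated words. A word
   starts wherever a non-whitespace byte follows whitespace or the
   start of the string. */
size_t count_words(const char *s)
{
    size_t count;
    int in_word;

    count = 0;
    in_word = 0;
    while (*s) {
        if (isspace((unsigned char)*s))
            in_word = 0;          /* a separator ends the current word */
        else if (!in_word) {
            in_word = 1;          /* first byte of a new word */
            count++;
        }
        s++;
    }
    return (count);
}
```

Runs of mixed whitespace count as a single separator, so `"  hello\tworld\n"` contains two words.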
tci_isupper
tci_isupper returns non-zero if c is an uppercase letter, zero for
anything outside the A–Z range.
Create tci_isupper.c:
#include "libtci.h"

int tci_isupper(int c)
{
    return (c >= 'A' && c <= 'Z');
}

Add tci_isupper.c and its declaration:

int tci_isupper(int c);

tci_islower
tci_islower returns non-zero if c is a lowercase letter, zero for
anything outside the a–z range. Together with tci_isupper,
tci_toupper, and tci_tolower, it forms the case-handling group of
the library.
Create tci_islower.c:
#include "libtci.h"

int tci_islower(int c)
{
    return (c >= 'a' && c <= 'z');
}

Add tci_islower.c and its declaration:

int tci_islower(int c);

tci_isprint
tci_isprint checks whether a character is safe to display — something
a terminal can render visibly. Values below 32 are control characters
(tab, newline, escape) that affect terminal behaviour rather than
producing visible output. Value 127 is DEL, also invisible. A
printable character is anything from 32 (space) to 126 (~)
inclusive.
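One common use of a printability check is masking bytes before echoing them to a terminal. The sketch below is only an illustration: `mask_unprintable` is a hypothetical helper, and libc's isprint stands in for tci_isprint (in the default "C" locale it accepts the same 32 to 126 range).

```c
#include <ctype.h>

/* Hypothetical helper: replaces every byte that is not printable
   ASCII with '.', in place, much as hex dump tools do in their
   text column. */
void mask_unprintable(char *s)
{
    while (*s) {
        if (!isprint((unsigned char)*s))
            *s = '.';
        s++;
    }
}
```

A tab or a DEL byte in the buffer becomes a dot, while letters, digits, punctuation, and spaces are left alone.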
Create tci_isprint.c:
#include "libtci.h"

int tci_isprint(int c)
{
    return (c >= 32 && c <= 126);
}

tci_toupper and tci_tolower
tci_toupper converts a lowercase letter to its uppercase counterpart;
tci_tolower does the reverse. Both return c unchanged if it is not
a letter of the appropriate case. The conversion works because of a
deliberate property of ASCII: the uppercase and lowercase letters are
in the same order, exactly 32 apart.
| Uppercase | Value | Lowercase | Value |
|---|---|---|---|
| A | 65 | a | 65 + 32 = 97 |
| M | 77 | m | 77 + 32 = 109 |
| Z | 90 | z | 90 + 32 = 122 |
Adding 32 to an uppercase letter gives its lowercase counterpart; subtracting 32 from a lowercase letter gives its uppercase counterpart.
Create tci_toupper.c:
#include "libtci.h"

int tci_toupper(int c)
{
    if (tci_islower(c))
        return (c - 32);
    return (c);
}

Create tci_tolower.c:

#include "libtci.h"

int tci_tolower(int c)
{
    if (tci_isupper(c))
        return (c + 32);
    return (c);
}

If c is not a letter of the appropriate case, both functions return
c unchanged. The libc specification requires this: toupper('3')
returns '3', not garbage.
Run man 3 toupper — the manual specifies that non-letter inputs
are returned unchanged, which is the condition both functions
check before applying the offset.
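The "unchanged if not a letter" rule is what makes it safe to run a whole string through the conversion. A minimal sketch, with libc's toupper standing in for tci_toupper; `upcase` is a hypothetical helper.

```c
#include <ctype.h>

/* Hypothetical helper: uppercases a string in place. Digits,
   punctuation, and spaces pass through untouched because toupper
   returns non-letters unchanged. */
void upcase(char *s)
{
    while (*s) {
        *s = (char)toupper((unsigned char)*s);
        s++;
    }
}
```

Applied to "abc-123 xyz", this leaves the hyphen, digits, and space exactly where they were and changes only the six letters.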
Add the remaining six to the Makefile and header
tci_isascii, tci_isalpha, tci_isupper, and tci_islower were already
added after their sections. Add the remaining six to SRCS:
tci_isdigit.c tci_isalnum.c tci_isspace.c \
tci_isprint.c tci_toupper.c tci_tolower.c

And their declarations to libtci.h:

int tci_isdigit(int c);
int tci_isalnum(int c);
int tci_isspace(int c);
int tci_isprint(int c);
int tci_toupper(int c);
int tci_tolower(int c);

tci_atoi
tci_atoi converts a string to an integer: it skips leading whitespace,
reads an optional sign, and accumulates decimal digits until the first
non-digit. Trailing non-digit characters are silently ignored. It
mirrors the libc function atoi: same behaviour, under the tci_ prefix.
Its implementation calls tci_isspace and tci_isdigit, both now
declared, which is why it appears here rather than on the strings page.
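The rules above are easiest to absorb by watching the libc original handle a few inputs. `show_atoi` is a hypothetical demo function; it calls libc's atoi, which tci_atoi is meant to match.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical demo: prints how libc's atoi treats a few inputs. */
void show_atoi(void)
{
    const char *inputs[] = { "42", "   -17", "+8", "123abc", "abc", "" };
    size_t i;

    for (i = 0; i < sizeof(inputs) / sizeof(inputs[0]); i++)
        printf("\"%s\" -> %d\n", inputs[i], atoi(inputs[i]));
}
```

The six inputs convert to 42, -17, 8, 123, 0, and 0. The last two show that a string with no leading digits yields 0, which is indistinguishable from a genuine "0".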
Create tci_atoi.c:
#include "libtci.h"

int tci_atoi(const char *str)
{
    int sign;
    long result;

    result = 0;
    sign = 1;
    while (tci_isspace(*str))  /* skip leading whitespace */
        str++;
    if (*str == '-' || *str == '+') {
        if (*str == '-')
            sign = -1;
        str++;  /* consume the sign character */
    }
    while (tci_isdigit(*str)) {
        /* shift left one decimal place, add digit value */
        result = result * 10 + (*str - '0');
        str++;
    }
    return ((int)(sign * result));
}

Run man 3 atoi: the manual notes that atoi does not detect errors
or overflow. tci_atoi accumulates in long, which postpones the
problem on platforms where long is wider than int but does not remove
it. A long enough digit string still overflows long, which is
undefined behaviour, and the final cast to int gives an
implementation-defined result for values that do not fit.
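When overflow has to be detected rather than ignored, libc's strtol is the usual alternative. The sketch below is not part of libtci: `parse_int` is a hypothetical wrapper, and its return convention (1 for success, 0 for failure) is an assumption made for this example.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: parses a decimal int with overflow detection.
   Returns 1 and stores the value in *out on success; returns 0 if no
   digits were found or the value does not fit in an int. */
int parse_int(const char *s, int *out)
{
    char *end;
    long value;

    errno = 0;
    value = strtol(s, &end, 10);
    if (end == s)  /* no digits consumed */
        return (0);
    if (errno == ERANGE || value < INT_MIN || value > INT_MAX)
        return (0);  /* out of range for long or for int */
    *out = (int)value;
    return (1);
}
```

Unlike atoi, this distinguishes "abc" (failure) from "0" (success with value 0), and it rejects digit strings too long to represent.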
Add tci_atoi.c to SRCS and the declaration to libtci.h:
int tci_atoi(const char *str);

Run make. Twenty-six functions in the library, no warnings.
The next page is the most conceptually significant of the chapter.
It introduces dynamic memory — malloc and free — and the
specific bug that comes from forgetting the null terminator's byte
when allocating a string.