thecodingidiot.com

Characters

Characters in C are integers. char is a numeric type. The value 65 stored in a char is the letter A. The value 48 is '0'. The value 32 is a space. The character 'A' in source code is shorthand for the integer 65, and the compiler treats them identically.
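A minimal sketch of that equivalence, assuming an ASCII system. show_char is a hypothetical helper written for this illustration and is not part of libtci:

```c
#include <stdio.h>

/* Illustrative helper, not part of libtci: prints a character beside the
   integer it is stored as, then returns that integer. */
int     show_char(char c)
{
    printf("'%c' is %d\n", c, c);
    return (c);
}
```

Calling show_char('A') prints 'A' is 65 and returns 65. Because letters are consecutive integers, 'A' + 1 == 'B' also holds.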

This encoding is ASCII[1] — the American Standard Code for Information Interchange, standardised in 1963. The assignments are not arbitrary: the uppercase letters run from 65 to 90, the lowercase from 97 to 122, the digits from 48 to 57. These ranges are what every character classification function exploits.

Range      First char  First value  Last char  Last value
Digits     '0'         48           '9'        57
Uppercase  'A'         65           'Z'        90
Lowercase  'a'         97           'z'        122

Run man ascii — the full 128-entry table, with every value in decimal, octal, and hexadecimal, is one command away.

Why the parameter is int

Every character classification function in libc takes int c, not char c. The reason is historical but consistent: char may be signed, and on such platforms a byte above 127 stored in a char reads back as a negative number. By taking int, the functions accept the full range of unsigned char values (0–255) plus the value EOF (typically −1), which indicates end-of-file and is a legitimate input to these functions.

All libtci character functions follow the same convention. The implementations on this page compare c against ranges directly, so they need no internal cast: a negative value, or a value above 127, simply fails every range test.
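A small sketch of the problem the int parameter sidesteps. as_byte is a hypothetical helper, not something libtci defines; it shows the cast a caller can apply before handing a char to any classification function:

```c
/* Hypothetical helper, not part of libtci: widen a char to int so that the
   result is always in 0..255, whether or not char is signed here. */
int     as_byte(char c)
{
    return ((unsigned char)c);
}
```

For the byte 0xA3, as_byte returns 163. A plain (int)c would give -93 on a platform where char is signed.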

tci_isascii

tci_isascii checks whether a value falls within the ASCII range — 0 to 127. ASCII is a 7-bit encoding: every character it defines fits in 7 bits, giving exactly 128 values (2⁷). A value above 127 is outside ASCII — it may be part of a multi-byte UTF-8 sequence or an extended character that the functions in libtci's character group were not designed to handle.

tci_isalpha, tci_isdigit, and the rest only define meaningful behaviour for ASCII values and EOF. tci_isascii is the guard that tells you whether a character is in the range where those functions are reliable.

Create tci_isascii.c:

#include "libtci.h"
 
int     tci_isascii(int c)
{
    return (c >= 0 && c <= 127);
}

Add tci_isascii.c to SRCS and its declaration to libtci.h:

int     tci_isascii(int c);
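One possible use of the guard, sketched under the assumption that you want to reject any string containing non-ASCII bytes. is_all_ascii is a hypothetical helper; tci_isascii is repeated locally only so the example compiles on its own:

```c
/* Local copy of tci_isascii so this sketch is self-contained. */
static int  tci_isascii(int c)
{
    return (c >= 0 && c <= 127);
}

/* Hypothetical helper: returns 1 if every byte of s is ASCII, 0 otherwise. */
int     is_all_ascii(const char *s)
{
    while (*s) {
        if (!tci_isascii((unsigned char)*s))
            return (0);
        s++;
    }
    return (1);
}
```

An empty string counts as all-ASCII here, which is a design choice rather than a rule.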

UTF-8

ASCII covers 128 characters, enough for English text and terminal punctuation. The rest of the world needs more: £, é, α. UTF-8[2] extends ASCII to cover the full Unicode range while keeping the 128 ASCII characters exactly where they are.

The rule is simple: every ASCII character (0–127) is a single byte in UTF-8, unchanged. A byte with its high bit clear (values 0–127) is always a single ASCII character — which is exactly what tci_isascii checks.

Characters outside that range use sequences of two, three, or four bytes, where every byte in the sequence has its high bit set. The four-byte sequences cover code points above U+FFFF, which is where most emoji live: 🎮 is U+1F3AE, encoded as 0xF0 0x9F 0x8E 0xAE.

The pound sign £ is Unicode code point U+00A3. In UTF-8 it is the two-byte sequence 0xC2 0xA3. In C, a char is one byte, so a UTF-8 string is just a char array — but a single visible character can occupy two, three, or four elements of that array. The string "£100" is not 4 bytes: it is 5 (0xC2, 0xA3, '1', '0', '0') plus the null terminator.
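The gap between bytes and visible characters can be sketched as follows. utf8_length is a hypothetical helper, and it relies on a property of UTF-8 not spelled out above: every continuation byte has the bit pattern 10xxxxxx. It assumes the input is valid UTF-8 and does no validation:

```c
#include <stddef.h>

/* Hypothetical helper: counts code points in a UTF-8 string by skipping
   continuation bytes, which all match the bit pattern 10xxxxxx.
   Assumes s is valid UTF-8; no validation is attempted. */
size_t  utf8_length(const char *s)
{
    size_t  count;

    count = 0;
    while (*s) {
        if (((unsigned char)*s & 0xC0) != 0x80)     /* not a continuation byte */
            count++;
        s++;
    }
    return (count);
}
```

With the string written as "\xC2\xA3" "100" (split in two so the hex escape does not swallow the digits), strlen reports 5 bytes and utf8_length reports 4 characters.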

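This is why the character classification functions in libtci only give meaningful answers for values 0–127. The libc versions accept any unsigned char value or EOF, but in the default "C" locale they classify no byte above 127 as a letter or digit either. A UTF-8 lead or continuation byte (above 127, possibly negative if char is signed) passed to tci_isalpha returns zero, which says nothing about whether the character it belongs to is a letter. tci_isascii is the guard that tells you whether a value is in the range where those functions are reliable.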

tci_isalpha

tci_isalpha answers one question: is this character a letter? It returns non-zero for anything from 'A' to 'Z' and 'a' to 'z', zero for everything else. The typical use is filtering input — accepting only letters, rejecting digits, punctuation, and whitespace.

Create tci_isalpha.c:

#include "libtci.h"
 
int     tci_isalpha(int c)
{
    return ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'));
}

The character constants 'A', 'Z', 'a', 'z' have type int in C; their values are 65, 90, 97, and 122 respectively. The condition checks whether c falls within either range. The two forms are identical to the compiler:

int     tci_isalpha(int c)
{
    return ((c >= 65 && c <= 90) || (c >= 97 && c <= 122));
}

Prefer the character literal form — 'A' communicates intent more clearly than 65, and the compiler produces the same code either way.

Run man 3 isalpha — the page covers the entire is* family; note that all return non-zero for true, not necessarily 1, so testing if (isalpha(c) == 1) would be wrong.
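A short sketch of the safe way to test the result, using libc's isalpha. count_letters is a hypothetical helper written for this illustration:

```c
#include <ctype.h>

/* Hypothetical helper: counts the letters in s using libc's isalpha.
   The result is tested for truth, never compared with 1. */
int     count_letters(const char *s)
{
    int     n;

    n = 0;
    while (*s) {
        if (isalpha((unsigned char)*s))     /* any non-zero value means yes */
            n++;
        s++;
    }
    return (n);
}
```

The cast to unsigned char keeps the argument inside the range libc requires, even where char is signed.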

Add tci_isalpha.c to SRCS and its declaration to libtci.h:

int     tci_isalpha(int c);

tci_isdigit

tci_isdigit checks whether a character is a decimal digit — '0' through '9'. The natural use is validating input: before converting a character to a number with c - '0', you confirm with tci_isdigit first.

Create tci_isdigit.c:

#include "libtci.h"
 
int     tci_isdigit(int c)
{
    return (c >= '0' && c <= '9');
}

'0' is 48 and '9' is 57. The digit characters are consecutive in ASCII, which is why this range check works.
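The validate-then-convert pattern might look like this. digit_value is a hypothetical helper, and returning -1 for a non-digit is an assumption made for the sketch; tci_isdigit is repeated locally so the example compiles on its own:

```c
/* Local copy of tci_isdigit so this sketch is self-contained. */
static int  tci_isdigit(int c)
{
    return (c >= '0' && c <= '9');
}

/* Hypothetical helper: returns the numeric value of a digit character,
   or -1 if c is not a digit. */
int     digit_value(int c)
{
    if (!tci_isdigit(c))
        return (-1);
    return (c - '0');       /* '7' - '0' is 55 - 48, which is 7 */
}
```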

tci_isalnum

tci_isalnum combines the two: it returns non-zero if c is a letter or a digit. Identifiers in most programming languages accept alphanumeric characters and little else (usually just the underscore), so this function answers most of the question "is this a valid identifier character?" It is exactly tci_isalpha or tci_isdigit.

Create tci_isalnum.c:

#include "libtci.h"
 
int     tci_isalnum(int c)
{
    return (tci_isalpha(c) || tci_isdigit(c));
}
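As an illustration of the identifier use case, here is a sketch that checks a whole string against C's own identifier rule: a letter or underscore first, then letters, digits, or underscores. is_identifier is a hypothetical helper, the underscore handling is an addition beyond what tci_isalnum covers, and the tci_ functions are repeated locally so the example compiles on its own:

```c
/* Local copies of the libtci functions so this sketch is self-contained. */
static int  tci_isalpha(int c)
{
    return ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'));
}

static int  tci_isdigit(int c)
{
    return (c >= '0' && c <= '9');
}

static int  tci_isalnum(int c)
{
    return (tci_isalpha(c) || tci_isdigit(c));
}

/* Hypothetical helper: returns 1 if s is a valid C-style identifier. */
int     is_identifier(const char *s)
{
    if (!(tci_isalpha((unsigned char)*s) || *s == '_'))
        return (0);                         /* also rejects the empty string */
    s++;
    while (*s) {
        if (!(tci_isalnum((unsigned char)*s) || *s == '_'))
            return (0);
        s++;
    }
    return (1);
}
```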

tci_isspace

tci_isspace checks whether a character is whitespace. Whitespace detection is the backbone of parsing: splitting input into tokens means deciding where one word ends and the next begins. In C, whitespace is space (32), tab (9), newline (10), carriage return (13), form feed (12), and vertical tab (11).

Create tci_isspace.c:

#include "libtci.h"
 
int     tci_isspace(int c)
{
    return (c == ' ' || c == '\t' || c == '\n' ||
            c == '\r' || c == '\f' || c == '\v');
}

The \t, \n, \r, \f, \v are the same backslash escapes you know from strings. They represent specific byte values: 9, 10, 13, 12, 11.
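The token-boundary idea can be sketched with a word counter. count_words is a hypothetical helper that treats any run of non-whitespace as one word; tci_isspace is repeated locally so the example compiles on its own:

```c
/* Local copy of tci_isspace so this sketch is self-contained. */
static int  tci_isspace(int c)
{
    return (c == ' ' || c == '\t' || c == '\n' ||
            c == '\r' || c == '\f' || c == '\v');
}

/* Hypothetical helper: counts runs of non-whitespace characters in s. */
int     count_words(const char *s)
{
    int     count;
    int     in_word;

    count = 0;
    in_word = 0;
    while (*s) {
        if (tci_isspace((unsigned char)*s))
            in_word = 0;                    /* a word, if any, has ended */
        else if (!in_word) {
            in_word = 1;                    /* first character of a new word */
            count++;
        }
        s++;
    }
    return (count);
}
```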

tci_isupper

tci_isupper returns non-zero if c is an uppercase letter, zero for anything outside the A–Z range.

Create tci_isupper.c:

#include "libtci.h"
 
int     tci_isupper(int c)
{
    return (c >= 'A' && c <= 'Z');
}

Add tci_isupper.c to SRCS and its declaration to libtci.h:

int     tci_isupper(int c);

tci_islower

tci_islower returns non-zero if c is a lowercase letter, zero for anything outside the a–z range. Together with tci_isupper, tci_toupper, and tci_tolower, it forms the case-handling group of the library.

Create tci_islower.c:

#include "libtci.h"
 
int     tci_islower(int c)
{
    return (c >= 'a' && c <= 'z');
}

Add tci_islower.c to SRCS and its declaration to libtci.h:

int     tci_islower(int c);

tci_isprint

tci_isprint checks whether a character is safe to display — something a terminal can render visibly. Values below 32 are control characters (tab, newline, escape) that affect terminal behaviour rather than producing visible output. Value 127 is DEL, also invisible. A printable character is anything from 32 (space) to 126 (~) inclusive.

Create tci_isprint.c:

#include "libtci.h"
 
int     tci_isprint(int c)
{
    return (c >= 32 && c <= 126);
}
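A typical application is making arbitrary bytes safe to print, in the way hex dump tools do. The sketch below assumes that replacing each unprintable byte with a dot is acceptable; sanitize is a hypothetical helper, and tci_isprint is repeated locally so the example compiles on its own:

```c
/* Local copy of tci_isprint so this sketch is self-contained. */
static int  tci_isprint(int c)
{
    return (c >= 32 && c <= 126);
}

/* Hypothetical helper: overwrites every non-printable byte of s with '.'
   and returns how many bytes were replaced. */
int     sanitize(char *s)
{
    int     replaced;

    replaced = 0;
    while (*s) {
        if (!tci_isprint((unsigned char)*s)) {
            *s = '.';
            replaced++;
        }
        s++;
    }
    return (replaced);
}
```

Bytes above 127 are replaced too, so UTF-8 text loses its non-ASCII characters under this scheme.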

tci_toupper and tci_tolower

tci_toupper converts a lowercase letter to its uppercase counterpart; tci_tolower does the reverse. Both return c unchanged if it is not a letter of the appropriate case. The conversion works because of a deliberate property of ASCII: the uppercase and lowercase letters are in the same order, exactly 32 apart.

Uppercase  Value  Lowercase  Value
A          65     a          65 + 32 = 97
M          77     m          77 + 32 = 109
Z          90     z          90 + 32 = 122

Adding 32 to an uppercase letter gives its lowercase counterpart; subtracting 32 from a lowercase letter gives its uppercase counterpart.

Create tci_toupper.c:

#include "libtci.h"
 
int     tci_toupper(int c)
{
    if (tci_islower(c))
        return (c - 32);
    return (c);
}

Create tci_tolower.c:

#include "libtci.h"
 
int     tci_tolower(int c)
{
    if (tci_isupper(c))
        return (c + 32);
    return (c);
}

If c is not a letter of the appropriate case, both functions return c unchanged. The libc specification requires this: toupper('3') returns '3', not garbage.

Run man 3 toupper — the manual specifies that non-letter inputs are returned unchanged, which is the condition both functions check before applying the offset.
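Applied across a whole string, the conversion might be used like this. str_toupper is a hypothetical helper, not a libtci function; tci_islower and tci_toupper are repeated locally so the example compiles on its own:

```c
/* Local copies of the libtci functions so this sketch is self-contained. */
static int  tci_islower(int c)
{
    return (c >= 'a' && c <= 'z');
}

static int  tci_toupper(int c)
{
    if (tci_islower(c))
        return (c - 32);
    return (c);
}

/* Hypothetical helper: uppercases s in place and returns s. */
char    *str_toupper(char *s)
{
    char    *p;

    p = s;
    while (*p) {
        *p = (char)tci_toupper((unsigned char)*p);
        p++;
    }
    return (s);
}
```

Digits, punctuation, and bytes above 127 pass through untouched, because tci_toupper only changes values that tci_islower accepts.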

Add the remaining six to the Makefile and header

tci_isascii, tci_isalpha, tci_isupper, and tci_islower were already added after their sections. Add the remaining six to SRCS:

tci_isdigit.c tci_isalnum.c tci_isspace.c \
tci_isprint.c tci_toupper.c tci_tolower.c

And their declarations to libtci.h:

int     tci_isdigit(int c);
int     tci_isalnum(int c);
int     tci_isspace(int c);
int     tci_isprint(int c);
int     tci_toupper(int c);
int     tci_tolower(int c);

tci_atoi

tci_atoi converts a string to an integer: it skips leading whitespace, reads an optional sign, and accumulates decimal digits until the first non-digit. Trailing non-digit characters are silently ignored. It mirrors the libc function atoi: same interface, and the same behaviour for values that fit in an int.

Its implementation calls tci_isspace and tci_isdigit, both now declared, which is why it appears here rather than on the strings page.

Create tci_atoi.c:

#include "libtci.h"
 
int     tci_atoi(const char *str)
{
    int     sign;
    long    result;
 
    result = 0;
    sign = 1;
    while (tci_isspace(*str))                    /* skip leading whitespace */
        str++;
    if (*str == '-' || *str == '+') {
        if (*str == '-')
            sign = -1;
        str++;                                  /* consume the sign character */
    }
    while (tci_isdigit(*str)) {
        result = result * 10 + (*str - '0');    /* shift left one decimal place, add digit value */
        str++;
    }
    return ((int)(sign * result));
}

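Run man 3 atoi: the manual notes that atoi does not detect errors such as overflow. tci_atoi accumulates in long because signed overflow on int is undefined behaviour. Where long is wider than int this postpones the problem rather than removing it: a long enough digit string still overflows long, and the final cast to int does not preserve a value that is out of range.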
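When overflow has to be detected rather than tolerated, one option is to build on libc's strtol, which reports range errors through errno. The sketch below is a different technique from tci_atoi, not a replacement for it; parse_int and its return convention are assumptions made for this illustration:

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper, not part of libtci: like atoi, but reports failure.
   Returns 1 and stores the value in *out on success; returns 0 if no digits
   were found or the value does not fit in an int. */
int     parse_int(const char *s, int *out)
{
    char    *end;
    long    value;

    errno = 0;
    value = strtol(s, &end, 10);
    if (end == s)                               /* no digits consumed */
        return (0);
    if (errno == ERANGE || value < INT_MIN || value > INT_MAX)
        return (0);                             /* does not fit in an int */
    *out = (int)value;
    return (1);
}
```

Like atoi, this version skips leading whitespace and ignores trailing non-digit characters; checking *end after the call would make it strict.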

Add tci_atoi.c to SRCS and the declaration to libtci.h:

int     tci_atoi(const char *str);

Run make. Twenty-six functions in the library, no warnings.

The next page is the most conceptually significant of the chapter. It introduces dynamic memory — malloc and free — and the specific bug that comes from forgetting the null terminator's byte when allocating a string.

Footnotes

  1. ASCII - Wikipedia

  2. UTF-8 - Wikipedia