thecodingidiot.com

Strings

A C string is a sequence of char values with a zero byte at the end. That zero byte — \0, the null terminator — is the entire mechanism. There is no length stored anywhere. There is no struct wrapping the characters. There is just the bytes, and the agreement that the first zero marks the end.

This design has one important consequence: any function that needs to know a string's length must walk the entire string to find it. tci_strlen does not look up a stored number. It counts. Every time. A second implication runs through every string function you will write: if you allocate space for a string, you must allocate length + 1 bytes so there is room for the null terminator. Forgetting the + 1 is one of the most common string bugs in C.
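
The length + 1 rule can be sketched with a small duplicating helper. The name copy_of is hypothetical and not part of libtci, and the standard strlen and memcpy stand in for the library versions; treat it as an illustration under those assumptions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* copy_of is a hypothetical helper, not part of libtci. */
char    *copy_of(const char *s)
{
    size_t  len;
    char    *copy;

    assert(s != NULL);          /* precondition: s points at a valid C string */
    len = strlen(s);            /* characters only; the '\0' is not counted */
    copy = malloc(len + 1);     /* + 1 leaves room for the terminator */
    if (copy == NULL)
        return (NULL);          /* allocation failed */
    memcpy(copy, s, len + 1);   /* copy the characters and the '\0' together */
    return (copy);              /* caller owns the memory and must free it */
}
```

With malloc(len) instead, the final memcpy would write one byte past the end of the allocation.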

The null terminator

When you write a string literal in C:

char  *s;
 
s = "hello";

the compiler places six bytes in memory: h, e, l, l, o, \0. The \0 at the end is not something you typed — the compiler adds it. The pointer s holds the address of the h. Functions that work with s read forward until they hit the \0.

You can also declare a string as an array and the \0 is still required:

char  name[6];
 
name[0] = 'D';
name[1] = 'o';
name[2] = 'o';
name[3] = 'm';
name[4] = '\0'; /* without this it is not a valid C string */

\0 is how you write a literal null byte in C source code. The backslash-zero is the character with value zero — the same zero that terminates strings.
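
One way to see why the terminator matters is to check a buffer for it. The sketch below assumes a hypothetical helper, has_terminator, built on the standard memchr; it is not part of libtci.

```c
#include <assert.h>
#include <string.h>

/* has_terminator is a hypothetical helper, not part of libtci. */
int     has_terminator(const char *buf, size_t cap)
{
    assert(buf != NULL);                        /* precondition: buf is a real buffer */
    return (memchr(buf, '\0', cap) != NULL);    /* 1 if a '\0' lies within the first cap bytes */
}
```

A four-byte array holding only 'D', 'o', 'o', 'm' would return 0 here: the characters are present, but without the \0 the array is not a C string.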

tci_strlen

tci_strlen returns the number of characters in a string, not counting the null terminator — "hello" gives 5, "" gives 0. Because C stores no length alongside the bytes, the only way to find it is to walk forward until the \0:

Create tci_strlen.c:

#include "libtci.h"
 
size_t  tci_strlen(const char *s)
{
    size_t  len;
 
    len = 0;
    while (s[len])  /* s[len] is falsy when it reaches '\0' */
        len++;
    return (len);   /* count does not include the '\0' itself */
}

s[len] is the character at index len. When it is zero — when the null terminator is reached — the condition is false and the loop stops. The returned len does not include the \0.

Run man 3 strlen: the return type is size_t, not int. INT_MAX, the largest value a 32-bit int can hold, is 2,147,483,647. On a 64-bit platform a string can be longer than that, and an int counter would overflow. Signed overflow is undefined behaviour in C; in practice the counter often wraps to a negative number. size_t is unsigned, so it cannot be negative, and it is wide enough to hold the size of any object, which means it can count any valid string.
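
Because the length is counted on every call, a common pattern is to measure once and keep the result in a size_t. The sketch below assumes a hypothetical helper, count_spaces, and uses the standard strlen in place of tci_strlen.

```c
#include <assert.h>
#include <string.h>

/* count_spaces is a hypothetical helper, not part of libtci. */
size_t  count_spaces(const char *s)
{
    size_t  len;
    size_t  i;
    size_t  spaces;

    assert(s != NULL);
    len = strlen(s);        /* one walk over the string, stored in a size_t */
    spaces = 0;
    i = 0;
    while (i < len) {       /* writing strlen(s) here would re-walk s on every pass */
        if (s[i] == ' ')
            spaces++;
        i++;
    }
    return (spaces);
}
```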

Add tci_strlen.c to SRCS, and the declaration to libtci.h:

size_t  tci_strlen(const char *s);

tci_strcpy

tci_strcpy copies a string from src into dst, including the null terminator. In C, assigning one string pointer to another with = copies only the pointer — both variables end up pointing at the same bytes. tci_strcpy copies the characters themselves, giving dst its own independent copy.
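
The difference between copying a pointer and copying the characters can be sketched as follows. alias_vs_copy is a hypothetical demo function, and the standard strcpy stands in for tci_strcpy.

```c
#include <assert.h>
#include <string.h>

/* alias_vs_copy is a hypothetical demo function, not part of libtci. */
int     alias_vs_copy(void)
{
    char    original[] = "cat";
    char    copy[sizeof(original)];     /* 4 bytes: 'c', 'a', 't', '\0' */
    char    *alias;

    alias = original;                   /* copies the pointer only */
    strcpy(copy, original);             /* copies the bytes, terminator included */
    original[0] = 'b';                  /* change the original in place */
    assert(alias == original);          /* alias still names the same memory */
    return (alias[0] == 'b' && copy[0] == 'c');
}
```

The function returns 1: the alias sees the change because it shares the original's bytes, while the copy keeps its own 'c'.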

Create tci_strcpy.c:

#include "libtci.h"
 
char    *tci_strcpy(char *dst, const char *src)
{
    size_t  i;
 
    i = 0;
    while (src[i]) {        /* stop when src reaches '\0' — before copying it */
        dst[i] = src[i];
        i++;
    }
    dst[i] = '\0';          /* loop exits without copying the terminator; write it manually */
    return (dst);           /* return destination for call chaining */
}

The loop copies every character until it hits \0, then writes \0 explicitly after the loop ends. The \0 write is separate because the loop condition while (src[i]) exits before copying the terminator. Without the line dst[i] = '\0', the destination would not be a valid C string.

The caller is responsible for ensuring dst has enough space. strcpy does not check — it writes however many bytes the source contains. Passing a destination too small to hold the source corrupts memory.

Run man 3 strcpy — the manual is explicit that the caller is responsible for ensuring dst is large enough; there is no bounds check inside the function.
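
Since the check is the caller's job, one possible approach is to compare lengths before copying. The wrapper below is a sketch under that assumption; copy_if_fits is a hypothetical name and the standard strlen and strcpy stand in for the libtci versions.

```c
#include <assert.h>
#include <string.h>

/* copy_if_fits is a hypothetical wrapper, not part of libtci. */
int     copy_if_fits(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL);
    if (strlen(src) >= dst_size)    /* the copy needs strlen(src) + 1 bytes in total */
        return (0);                 /* too small: leave dst untouched */
    strcpy(dst, src);               /* safe: the size was checked above */
    return (1);
}
```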

Add tci_strcpy.c to SRCS, and the declaration to libtci.h:

char    *tci_strcpy(char *dst, const char *src);

tci_strncpy

tci_strncpy copies at most n bytes from src to dst. If src is shorter than n, the remainder of dst is padded with null bytes. If src is n characters or longer, the destination is not null-terminated, a subtle trap that catches many programmers.

Create tci_strncpy.c:

#include "libtci.h"
 
char    *tci_strncpy(char *dst, const char *src, size_t n)
{
    size_t  i;
 
    i = 0;
    while (i < n && src[i]) {  /* stop at budget or end of src, whichever comes first */
        dst[i] = src[i];
        i++;
    }
    while (i < n)               /* src shorter than n: pad remaining bytes with '\0' */
        dst[i++] = '\0';        /* if src had n or more characters, this loop runs zero times: no terminator */
    return (dst);
}

The first loop copies characters until it runs out of source or runs out of budget. The second loop pads with zeros to reach exactly n bytes. If the source had n or more characters, the first loop uses the whole budget, the second loop runs zero times, and no terminator is written.

Run man 3 strncpy — the DESCRIPTION names this case explicitly: it is specified behaviour, not a bug.

Add tci_strncpy.c and its declaration:

char    *tci_strncpy(char *dst, const char *src, size_t n);
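
A common way to live with the missing terminator is to copy one byte less than the buffer holds and write the \0 by hand. The sketch below assumes a hypothetical helper, copy_truncated, and uses the standard strncpy in place of tci_strncpy.

```c
#include <assert.h>
#include <string.h>

/* copy_truncated is a hypothetical helper, not part of libtci. */
void    copy_truncated(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL && dst_size > 0);
    strncpy(dst, src, dst_size - 1);    /* fill at most dst_size - 1 bytes */
    dst[dst_size - 1] = '\0';           /* strncpy may not have written a terminator */
}
```

Unlike the BSD functions, this gives the caller no way to tell whether the source was cut short.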

BSD functions: tci_strlcpy and tci_strlcat

strcpy has no length limit: it writes however many bytes the source contains without checking the destination size. strncpy adds a limit, but its truncation behaviour is a trap: if the source has n or more characters, the destination is left without a null terminator.

strlcpy and strlcat are BSD replacements that solve both problems. They null-terminate the result whenever the buffer size is at least 1, always take the destination's total buffer size, and return the length of the string they tried to create rather than the number of bytes written. The return value enables truncation detection: if it is greater than or equal to the buffer size, the result was truncated:

char    buf[8];
size_t  needed;
 
needed = tci_strlcpy(buf, "hello world", sizeof(buf));
if (needed >= sizeof(buf)) {
    /* truncation: source was 11 characters, buffer holds 7 + '\0' */
}

Both functions originate in BSD libc. GNU libc on Linux did not ship them until version 2.38, so older systems lack them; see the note in Setup. libtci provides them so c01–c05 projects can use them on any Linux system.
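
The same truncation check can be tried without any BSD function, because the standard snprintf follows a similar return convention: it reports the length the full output would have needed. The sketch below relies on that; copy_report is a hypothetical helper, not part of libtci.

```c
#include <assert.h>
#include <stdio.h>

/* copy_report is a hypothetical helper, not part of libtci. */
int     copy_report(char *dst, size_t size, const char *src)
{
    int needed;

    assert(dst != NULL && src != NULL && size > 0);
    needed = snprintf(dst, size, "%s", src);    /* writes at most size - 1 characters plus '\0' */
    if (needed < 0)
        return (-1);                            /* output error */
    return ((size_t)needed >= size);            /* 1 means the copy was truncated */
}
```

Copying "hello world" into an 8-byte buffer this way leaves "hello w" in the buffer and returns 1.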

tci_strlcpy copies src into dst, writing at most size - 1 characters and always appending \0. It returns the length of src:

Create tci_strlcpy.c:

#include "libtci.h"
 
size_t  tci_strlcpy(char *dst, const char *src, size_t size)
{
    size_t  src_len;
 
    src_len = tci_strlen(src);               /* measure once; needed for the branch and the return */
    if (size == 0)                          /* no buffer at all: nothing to write */
        return (src_len);                   /* still report what was needed for truncation detection */
    if (src_len < size)                     /* source fits: copy characters and terminator together */
        tci_memcpy(dst, src, src_len + 1);   /* +1 includes the '\0' */
    else {                                  /* source longer than buffer: truncate */
        tci_memcpy(dst, src, size - 1);      /* size-1 reserves the last byte for '\0' */
        dst[size - 1] = '\0';               /* always terminate, even when truncated */
    }
    return (src_len);                       /* source length, not bytes written: caller detects truncation */
}

When src_len < size, the entire source fits — copy characters and terminator in one operation. When the source is longer, copy size - 1 characters and write the terminator manually. Either way, dst is always a valid C string.

Run man 3 strlcpy — on Linux this requires man-db and the BSD manual pages package; the function is not in the GNU libc manual.

Add tci_strlcpy.c and its declaration:

size_t  tci_strlcpy(char *dst, const char *src, size_t size);

tci_strlcat appends src to the end of dst, writing into at most size total bytes (including the existing content of dst). It returns dst_len + src_len — the total length the result would have been without truncation. The same comparison detects truncation: if the return value is greater than or equal to size, the append was cut short.

Create tci_strlcat.c:

#include "libtci.h"
 
size_t  tci_strlcat(char *dst, const char *src, size_t size)
{
    size_t  dst_len;
    size_t  src_len;
 
    dst_len = tci_strlen(dst);           /* find where dst ends; append starts here */
    src_len = tci_strlen(src);           /* measure source; needed for the return value */
    if (size <= dst_len)                /* buffer already full or smaller than dst: nothing to append */
        return (size + src_len);        /* BSD convention: dst counts as size bytes long in this case */
    tci_strlcpy(dst + dst_len, src, size - dst_len); /* dst+dst_len points at the '\0'; size-dst_len is remaining space */
    return (dst_len + src_len);         /* combined length without truncation: caller detects if append was cut short */
}

dst + dst_len is a pointer to the null terminator of dst, the point where appending begins. size - dst_len is the remaining space in the buffer. If size <= dst_len, the destination is already full or longer than the budget: nothing is written and the function returns size + src_len. The BSD manual specifies this case: when no terminator is found within the first size bytes, the length of dst is treated as size. The result is still greater than or equal to size, so the caller's truncation check keeps working.

Add tci_strlcat.c and its declaration:

size_t  tci_strlcat(char *dst, const char *src, size_t size);
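
The return values are easier to follow with concrete numbers. The sketch below is a stand-in written for illustration, not the libtci function: append_bounded is a hypothetical name, and it uses the standard memchr to look for dst's terminator within the first size bytes only, which is how the BSD manual describes the scan.

```c
#include <assert.h>
#include <string.h>

/* append_bounded is a hypothetical stand-in, not the libtci function. */
size_t  append_bounded(char *dst, const char *src, size_t size)
{
    const char  *end;
    size_t      dst_len;
    size_t      src_len;
    size_t      room;

    assert(dst != NULL && src != NULL);
    end = memchr(dst, '\0', size);      /* search for dst's '\0' inside the buffer only */
    src_len = strlen(src);
    if (end == NULL)                    /* no terminator within size bytes */
        return (size + src_len);        /* dst counts as size bytes long; nothing is written */
    dst_len = (size_t)(end - dst);
    room = size - dst_len - 1;          /* bytes left for characters, '\0' excluded */
    if (src_len < room)
        room = src_len;                 /* source fits: copy all of it */
    memcpy(dst + dst_len, src, room);
    dst[dst_len + room] = '\0';         /* always terminate what was written */
    return (dst_len + src_len);         /* length the full result would have had */
}
```

With an 8-byte buffer holding "abc", appending "def" returns 6 and fits. Appending "ghi" after that returns 9, which is at least 8, so the caller knows the result was cut to "abcdefg".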

tci_strcmp and tci_strncmp

In C, comparing two strings with == compares the pointers — it answers "are these the same memory location?", not "do these contain the same characters?" tci_strcmp compares the characters. It walks both strings in parallel, returning the difference between the first characters that differ. If the strings are identical, it returns zero.
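
The two questions can be put side by side. The helpers below are hypothetical names used only for this sketch, and the standard strcmp stands in for tci_strcmp.

```c
#include <assert.h>
#include <string.h>

/* same_address and same_text are hypothetical helpers, not part of libtci. */
int     same_address(const char *a, const char *b)
{
    return (a == b);                /* pointer comparison: same memory location? */
}

int     same_text(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    return (strcmp(a, b) == 0);     /* 0 from strcmp means every character matched */
}
```

Two separate arrays that both hold "doom" give 0 from same_address and 1 from same_text.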

Create tci_strcmp.c:

#include "libtci.h"
 
int     tci_strcmp(const char *s1, const char *s2)
{
    size_t  i;
 
    i = 0;
    while (s1[i] && s1[i] == s2[i])               /* advance while s1 has chars and both match */
        i++;
    return ((unsigned char)s1[i] - (unsigned char)s2[i]); /* cast avoids sign-extension on bytes above 127 */
}

The loop condition has two parts. s1[i] is truthy as long as the character is not the null terminator — it stops the loop when s1 ends. s1[i] == s2[i] stops the loop the moment the characters differ. Both must be true to keep advancing: not at the end of s1, and the characters still match. If s2 is shorter, its null terminator will not match the character in s1, and the loop exits.

After the loop, i sits at the first position where the strings differ — or at the null terminator if they were identical throughout. Subtracting s2[i] from s1[i] gives a positive number if s1's character sorts later, negative if it sorts earlier, and zero if both are the null terminator — meaning the strings are equal.

The (unsigned char) cast is necessary because plain char can be signed. A character above 127 — an accented letter, for example — would be treated as a negative value before the subtraction, producing a wrong result. Casting both sides to unsigned char first ensures a consistent ordering regardless of whether char is signed on the platform.

tci_strncmp adds an upper limit on how many bytes to compare:

Create tci_strncmp.c:

#include "libtci.h"
 
int     tci_strncmp(const char *s1, const char *s2, size_t n)
{
    size_t  i;
 
    if (n == 0)
        return (0);                                       /* zero bytes to compare: always equal */
    i = 0;
    while (i < n - 1 && s1[i] && s1[i] == s2[i])          /* n-1: reserve last step for final comparison */
        i++;
    return ((unsigned char)s1[i] - (unsigned char)s2[i]); /* cast avoids sign-extension on bytes above 127 */
}

Run man 3 strcmp — the return value is described as the sign of the difference between the first differing bytes treated as unsigned char; that is exactly what the cast in our implementation enforces.

Add both to SRCS and declare them:

int     tci_strcmp(const char *s1, const char *s2);
int     tci_strncmp(const char *s1, const char *s2, size_t n);
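
A typical use of the length limit is a prefix test. The sketch below assumes a hypothetical helper, has_prefix, and uses the standard strncmp and strlen in place of the libtci versions.

```c
#include <assert.h>
#include <string.h>

/* has_prefix is a hypothetical helper, not part of libtci. */
int     has_prefix(const char *s, const char *prefix)
{
    assert(s != NULL && prefix != NULL);
    return (strncmp(s, prefix, strlen(prefix)) == 0);   /* compare only as many bytes as the prefix has */
}
```

If s is shorter than the prefix, the comparison stops at the \0 in s and reports a difference, so the function returns 0 without reading past the end of s.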

tci_strchr and tci_strrchr

tci_strchr finds the first occurrence of a character in a string. tci_strrchr finds the last. Both return a pointer to the found character, or NULL if the character is not present.

The target character is passed as int — the same libc convention used by tci_memset. Both functions search for the low byte of the value.

Note that both functions must also search for \0. If c is zero, tci_strchr should return a pointer to the null terminator, not NULL. The loop must include the terminator in the search.

Create tci_strchr.c:

#include "libtci.h"
 
char    *tci_strchr(const char *s, int c)
{
    unsigned char  target;
 
    target = (unsigned char)c;          /* narrow to one byte before comparing */
    while (*s) {
        if ((unsigned char)*s == target)
            return ((char *)s);         /* cast strips const: libc convention */
        s++;
    }
    if ((unsigned char)*s == target)    /* check '\0' itself: c==0 must return the terminator, not NULL */
        return ((char *)s);
    return (NULL);
}

while (*s) dereferences the pointer to read the current character and uses it directly as the condition. A character with value zero — the null terminator — is falsy, so the loop stops there. It is equivalent to writing while (*s != '\0'). The loop checks each character, advancing s with s++ rather than an index counter. Both styles work; pointer increment is common in functions that walk through a string without needing to return an index. After the loop, one more check handles the case where c is the null terminator itself.

The cast (char *)s strips const to match the return type. The libc function does the same: the parameter is const char * but the return is char *, because the caller who knows the underlying string is not const can use the result to modify it.

Create tci_strrchr.c — the same logic, but walk the entire string first and keep track of the last match:

#include "libtci.h"
 
char    *tci_strrchr(const char *s, int c)
{
    unsigned char   target;
    const char      *last;
 
    target = (unsigned char)c;          /* narrow to one byte before comparing */
    last = NULL;                        /* no match found yet */
    while (*s) {
        if ((unsigned char)*s == target)
            last = s;                   /* record position; keep walking to find a later match */
        s++;
    }
    if ((unsigned char)*s == target)    /* check '\0' itself in case c == 0 */
        last = s;
    return ((char *)last);              /* NULL if never matched; cast strips const */
}

Run man 3 strchr — the manual confirms that searching for \0 must return a pointer to the null terminator; the post-loop check in both functions implements this requirement.

Add both files to SRCS and their declarations to the header.
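
Two small uses show why both directions of search are useful. The helpers below are hypothetical names for this sketch, and the standard strchr and strrchr stand in for the libtci versions.

```c
#include <assert.h>
#include <string.h>

/* file_name and value_part are hypothetical helpers, not part of libtci. */
const char  *file_name(const char *path)
{
    const char  *slash;

    assert(path != NULL);
    slash = strrchr(path, '/');     /* last '/', or NULL if there is none */
    if (slash == NULL)
        return (path);              /* no directory part: the whole path is the name */
    return (slash + 1);             /* first character after the last '/' */
}

const char  *value_part(const char *pair)
{
    const char  *eq;

    assert(pair != NULL);
    eq = strchr(pair, '=');         /* first '=', or NULL if there is none */
    if (eq == NULL)
        return (NULL);
    return (eq + 1);                /* everything after the first '=' */
}
```

file_name needs the last separator, so it searches from the end; value_part needs the first one, so a value that itself contains '=' stays intact.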

tci_strnstr

tci_strnstr searches for the string needle within haystack, but only within the first len bytes — it will not read beyond that limit or past a null terminator. The standard strstr searches the entire string with no upper bound; tci_strnstr adds a length constraint. It originates in BSD libc and is not in GNU libc on Linux — see the note in Setup. libtci includes it so c01–c05 projects can use it on Linux.

The use case is substring search within a bounded region: scanning a fixed-size buffer without risking a read past its end. Returns a pointer to the first match, or NULL if not found.

Create tci_strnstr.c:

#include "libtci.h"
 
char    *tci_strnstr(const char *haystack, const char *needle,
        size_t len)
{
    size_t  nlen;
    size_t  i;
 
    if (!*needle)                              /* empty needle matches at the start */
        return ((char *)haystack);
    nlen = tci_strlen(needle);
    if (nlen > len)                            /* needle longer than the window: impossible match */
        return (NULL);
    i = 0;
    while (i <= len - nlen && haystack[i]) {   /* stop when remaining window is too short or string ends */
        if (tci_strncmp(haystack + i, needle, nlen) == 0)
            return ((char *)haystack + i);     /* cast strips const: libc convention */
        i++;
    }
    return (NULL);
}

Add tci_strnstr.c and its declaration:

char    *tci_strnstr(const char *haystack, const char *needle,
        size_t len);
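
To see how the len window changes the result, the sketch below uses a compact stand-in built on the standard strlen and strncmp. bounded_find is a hypothetical name, not the libtci function, and it is written only to make the examples runnable on a system without strnstr.

```c
#include <assert.h>
#include <string.h>

/* bounded_find is a hypothetical stand-in, not the libtci function. */
char    *bounded_find(const char *haystack, const char *needle, size_t len)
{
    size_t  nlen;
    size_t  i;

    assert(haystack != NULL && needle != NULL);
    nlen = strlen(needle);
    if (nlen == 0)
        return ((char *)haystack);              /* empty needle matches at the start */
    i = 0;
    while (i + nlen <= len && haystack[i]) {    /* the whole needle must fit inside the window */
        if (strncmp(haystack + i, needle, nlen) == 0)
            return ((char *)haystack + i);
        i++;
    }
    return (NULL);
}
```

Searching "hello world" for "world" succeeds with len 11, because the match at index 6 ends exactly at the window's edge. With len 10 the last character of the match falls outside the window and the result is NULL.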

Build and test

Update SRCS to include all ten new files. Run make and confirm no warnings. The library now has fifteen functions. The next page adds the character classification group — ten short functions that teach an important lesson about how C represents characters. tci_atoi follows at the end of that page: it depends on tci_isspace and tci_isdigit, so it lives where those functions are declared.