thecodingidiot.com


Memory

The first group of libtci functions works with raw bytes. Not strings, not integers — bytes. A string is an interpretation we put on top of bytes. An integer is an interpretation. At the hardware level there are only bytes, and tci_memset, tci_memcpy, and tci_bzero operate at that level.

To write them, you need three types you have not seen explained yet: unsigned char, void *, and size_t.

unsigned char

A char in C holds one byte — eight bits. Eight bits means 2⁸ = 256 possible values. The question is which 256: signed char maps them to −128 to 127, unsigned char maps them to 0 to 255. The C standard does not pick one for plain char — the compiler decides, and it differs by platform. For raw memory work, always declare unsigned char explicitly.

The problem with signed char is what happens when the compiler widens it for use in an expression. C promotes small integer types to int before arithmetic. In any integer, bits run from right to left by significance — the rightmost is the low end, the leftmost is the upper end. Widening from 8 bits to 32 adds 24 new bits at the upper end; for a signed type those new bits are filled with the sign bit, the leftmost of the original eight.

Take the value 200. In eight bits:

11001000

In a signed 8-bit type the leftmost bit is 1, so the value is negative. To find the magnitude: flip all bits, then add one:

00110111 = 55  →  55 + 1 = 56

That operation — flip all bits, add one — is called two's complement[1], the encoding that signed integers use for negative values. You do not need to internalise it here; f03a/01 is all the binary background this chapter requires, and we will cover byte-level operations properly later in the curriculum.

So 11001000 means −56 as a signed type and 200 as an unsigned one. Here is what the compiler produces when it widens each to int:

11001000                              (original 8 bits)
11111111 11111111 11111111 11001000   (widened, signed: sign bit copied into upper 24)
00000000 00000000 00000000 11001000   (widened, unsigned: zeros filled into upper 24)

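The signed result is −56 as a 32-bit value; the unsigned result is 200. The stored bits have not changed, but the value the program sees has: a signed byte holding 11001000 no longer compares equal to 200, and any arithmetic or indexing done with it uses −56. When inspecting or comparing raw bytes, that silent change produces wrong results. unsigned char avoids it.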
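The widening can be observed directly. The sketch below uses two helpers, widen_signed and widen_unsigned, which are illustrative names and not part of libtci. Each takes the same bit pattern and lets C promote it to int through a different type. It assumes an 8-bit char and a two's complement platform, which is what mainstream compilers provide.

```c
#include <string.h>

/* Hypothetical helper: view the byte as signed char, then widen to int. */
int widen_signed(unsigned char byte)
{
    signed char sc;

    memcpy(&sc, &byte, 1);  /* copy the bit pattern, no value conversion */
    return (sc);            /* sign-extended: upper bits copy the sign bit */
}

/* Hypothetical helper: view the byte as unsigned char, then widen to int. */
int widen_unsigned(unsigned char byte)
{
    return (byte);          /* zero-extended: upper bits are all zero */
}
```

Under those assumptions, widen_signed(0xC8) gives −56 and widen_unsigned(0xC8) gives 200, matching the two widened bit patterns shown above.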

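Every memory function in libtci converts its pointers to unsigned char * before operating on the bytes. The C standard describes the libc versions in the same terms: memset stores c converted to unsigned char, and memchr interprets each byte it examines as an unsigned char.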

void *

void * is a pointer to "unknown type." malloc returns a void * because it has no idea what you intend to store in the memory it allocates. tci_memset and tci_memcpy take void * because they work on any kind of data — integers, structs, arrays, anything.

You cannot dereference a void * directly; the compiler has no type to interpret. To access the bytes, you convert it to unsigned char * first:

void          *s;
unsigned char *ptr;
 
ptr = s; /* implicit conversion from void * to unsigned char * */

In C, void * converts to and from any other pointer type without an explicit cast. This is intentional: it is what makes malloc usable without casting its return value.
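As a small illustration of that conversion, here is a sketch of a helper that reads one byte out of any object. byte_at is a hypothetical name used only for this example; it is not a libtci function.

```c
#include <stddef.h>

/* Hypothetical helper: read byte number `index` of any object. */
unsigned char byte_at(const void *obj, size_t index)
{
    const unsigned char *ptr;

    ptr = obj;              /* implicit conversion, no cast needed */
    return (ptr[index]);    /* the element size is now known: one byte */
}
```

The caller can pass the address of an int, a struct, or an array without casting, because any object pointer converts to const void * implicitly. Keeping index inside the object is the caller's job.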

size_t

size_t is an unsigned integer type defined in <stddef.h> — which you are already including in libtci.h. It is the type that sizeof returns, and the type that malloc takes. It is guaranteed to be wide enough to hold the size of any object in memory.

Use size_t whenever a variable counts bytes, elements, or array indices. On the platforms you are likely to build on, int is a 32-bit signed type: it cannot count past 2,147,483,647 bytes, just over 2 GB. On a 64-bit platform, a valid allocation can exceed that. The C standard guarantees that size_t can hold the size of any single object, and on mainstream platforms it matches the address width (32 bits on a 32-bit system, 64 bits on a 64-bit system). Using int as a loop counter in a memory function is not a style issue; it is a correctness issue on any buffer larger than INT_MAX bytes.
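To make the limit concrete, the sketch below checks whether a byte count would survive being stored in an int. fits_in_int is a hypothetical helper for illustration, and it assumes size_t is at least as wide as int, which holds on mainstream platforms.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: can this byte count be stored in an int? */
int fits_in_int(size_t n)
{
    return (n <= (size_t)INT_MAX);  /* compare in the unsigned, wider type */
}
```

A count of INT_MAX + 1 is an ordinary size_t value but has no int representation, which is exactly the case an int loop counter would get wrong.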

tci_memset

tci_memset fills a block of memory with a single byte value. Given a pointer s, a value c, and a count n, it writes c into every one of the n bytes starting at s. The most common use is zeroing a buffer before use — ensuring that every byte holds a known value, not whatever happened to be in that memory before, so real values have clean space to land in:

tci_memset(buf, 0, sizeof(buf));

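Every byte in buf becomes zero. sizeof(buf) gives the full size only when buf is an array in the current scope; if buf is a pointer, sizeof yields the size of the pointer itself, so pass the allocated length instead. You can fill with any byte value: 0xFF sets every byte to 255, 'A' fills a char buffer with the letter A.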

The value is passed as int — that is the libc convention, inherited from the early days when char and int were treated loosely — but it is cast to unsigned char before being written.

Create tci_memset.c:

#include "libtci.h"
 
void    *tci_memset(void *s, int c, size_t n)
{
    unsigned char   *ptr;
    size_t          i;
 
    ptr = s;                       /* implicit void * → unsigned char *; gives byte access */
    i = 0;
    while (i < n) {
        ptr[i] = (unsigned char)c; /* keep only the low 8 bits of the int argument */
        i++;
    }
    return (s);                    /* return start of region for call chaining */
}

The cast (unsigned char)c keeps only the low byte of the int, the rightmost eight bits, and discards everything above it (three bytes when int is 32 bits wide). This is what the C standard requires: memset stores the value of c converted to unsigned char, so only the low byte is written, regardless of whatever sits in the upper bytes. Passing a value outside 0–255 is valid; the extra bits are discarded by specification, not just by our implementation.
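The truncation is easy to check. The sketch below uses the standard memset from <string.h> as a stand-in, on the assumption that tci_memset behaves the same way. fill_and_peek is a hypothetical helper that fills a buffer and reports the byte that was actually stored; it expects n to be at least 1.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: fill buf with `value`, return the byte stored. */
unsigned char fill_and_peek(unsigned char *buf, size_t n, int value)
{
    memset(buf, value, n);  /* standard memset stands in for tci_memset */
    return (buf[0]);        /* requires n >= 1 */
}
```

Passing 0x141 stores 0x41, the letter 'A', because only the low eight bits survive the conversion. Passing −1 stores 0xFF for the same reason.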

The function returns s — a pointer to the start of the modified region. This is the libc convention: returning the destination pointer allows chaining calls.

Run man 3 memset — the return value is part of the specification, not an implementation detail: memset is required to return s.

Add tci_memset.c to the SRCS line in the Makefile, and add the declaration to libtci.h:

void    *tci_memset(void *s, int c, size_t n);

tci_memcpy

tci_memset fills with a value you choose — every byte becomes the same. tci_memcpy takes its values from an existing block of memory instead, copying them byte for byte into a new location. Given a destination dst, a source src, and a count n, it reproduces exactly what is in src.

The two regions must not overlap. When source and destination overlap, use tci_memmove instead — it handles that case safely.

If n is larger than either buffer, tci_memcpy will read or write past the end of it. C has no bounds checking — there is no guard that stops you.

The outcomes within your own process range from corrupted adjacent variables to overwritten return addresses on the stack. If the write strays into an unmapped page, the OS kills the process with a segmentation fault.

On Linux that is as far as the damage goes. Virtual memory gives each process its own isolated address space, enforced by the hardware — an out-of-bounds write cannot reach another process or the kernel.

On bare metal there is no such protection. The r-tier targets — the Mega Drive, PS1, Dreamcast — have no OS and no virtual memory. Every address in the system is directly reachable: other variables, the stack, and the memory-mapped registers that control video output, audio, and interrupt handling. A wild write does not produce a clean crash; the program keeps running with corrupted state, and the effects range from garbled graphics to a locked machine. Memory discipline on embedded targets is not a style concern — it is the difference between a running program and a hung one.

The caller is responsible for ensuring n does not exceed the size of either region.
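One way a caller can meet that responsibility is to clamp the count to the destination's capacity before copying. The sketch below is one possible wrapper, not part of libtci: copy_bounded and its dst_size parameter are hypothetical, and the standard memcpy stands in for tci_memcpy.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper: never writes more than dst_size bytes. */
size_t copy_bounded(void *dst, size_t dst_size, const void *src, size_t n)
{
    if (n > dst_size)
        n = dst_size;       /* clamp to the destination's capacity */
    memcpy(dst, src, n);    /* standard memcpy stands in for tci_memcpy */
    return (n);             /* number of bytes actually copied */
}
```

The wrapper only protects the destination. The caller must still make sure src holds at least n readable bytes, and a return value smaller than the requested count signals that the data was truncated.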

Create tci_memcpy.c:

#include "libtci.h"
 
void    *tci_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char       *d;
    const unsigned char *s;
    size_t              i;
 
    d = dst;        /* implicit void * → unsigned char *; gives byte access */
    s = src;
    i = 0;
    while (i < n) {
        d[i] = s[i]; /* raw byte copy; no type information, no interpretation */
        i++;
    }
    return (dst);   /* return destination for call chaining */
}

const void *src is the source; we are promising not to modify it. Assigning it to a const unsigned char * preserves that promise on the byte-level pointer.

Run man 3 memcpy — the manual specifies that behaviour is undefined if the regions overlap; that is a contract the caller must uphold, not a limitation of the implementation.

Add tci_memcpy.c to SRCS, and the declaration to libtci.h:

void    *tci_memcpy(void *dst, const void *src, size_t n);

tci_memmove

tci_memmove copies n bytes from src to dst, safely handling the case where the two regions overlap. tci_memcpy reads and writes in a single forward pass — if dst lies within src, the copy overwrites source bytes before they are read, corrupting the result. tci_memmove chooses the copy direction based on the relative positions of the two pointers: forward when dst is before src, backward when dst is after it.

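The overlap case arises naturally when shifting data within a buffer. Moving a run of bytes from position 1 to position 3 in the same array writes to a region that partly overlaps the source. A forward byte-by-byte copy such as tci_memcpy overwrites source bytes before it has read them; tci_memmove handles it correctly. Moving in the other direction, from position 3 to position 1, happens to work with a forward loop, but calling memcpy on overlapping regions is undefined behaviour either way, so use tci_memmove whenever the regions may overlap.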

The direction of the copy is the key. When dst comes before src, writing to dst[i] can never touch a source byte that hasn't been read yet; the write is always behind the read. When dst comes after src, a forward pass would overwrite source bytes before reaching them, so the copy goes backward instead.

Create tci_memmove.c:

#include "libtci.h"
 
void    *tci_memmove(void *dst, const void *src, size_t n)
{
    unsigned char       *d;
    const unsigned char *s;
 
    d = dst;
    s = src;
    if (d == s || n == 0)       /* same address or zero bytes: nothing to do */
        return (dst);
    if (d < s)                  /* dst before src: reads stay ahead of writes */
        while (n--)
            *d++ = *s++;        /* post-increment: copy byte, then advance both */
    else {                      /* dst after src: copy backward to avoid overwrite */
        d += n;                 /* position one past the last byte to copy */
        s += n;
        while (n--)
            *--d = *--s;        /* pre-decrement: step back first, then copy that byte */
    }
    return (dst);
}

When dst < src, the copy proceeds forward: the write position always trails the read position, so each source byte is read before any write can reach it. When dst > src, the copy proceeds backward from the end, so the bytes at the high end of src, the ones dst overlaps, are read before they are overwritten.
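The difference shows up on a six-byte buffer. The sketch below pairs a plain forward loop (forward_copy, the same loop as tci_memcpy) with a helper that shifts bytes using the standard memmove, which stands in for tci_memmove. Both helper names are hypothetical and exist only for this illustration.

```c
#include <stddef.h>
#include <string.h>

/* Plain forward byte copy, written out so the overlap failure is visible. */
void forward_copy(void *dst, const void *src, size_t n)
{
    unsigned char       *d;
    const unsigned char *s;
    size_t              i;

    d = dst;
    s = src;
    i = 0;
    while (i < n) {
        d[i] = s[i];    /* may read a byte an earlier iteration overwrote */
        i++;
    }
}

/* Hypothetical helper: move the first n bytes of buf up by `shift` places. */
void shift_right(void *buf, size_t n, size_t shift)
{
    unsigned char *b;

    b = buf;
    memmove(b + shift, b, n);   /* standard memmove: overlap is allowed */
}
```

Starting from "abcdef", copying four bytes from offset 0 to offset 2 with forward_copy leaves "ababab": the loop re-reads bytes it has already overwritten. shift_right on a fresh copy of the same buffer leaves "ababcd", which is the intended result.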

The d < s comparison is defined behaviour only when dst and src point into the same object; comparing pointers to unrelated objects is undefined in strict C99. In practice, on the flat address spaces used by mainstream platforms, compilers generate an ordinary address comparison. When the two pointers do refer to unrelated buffers, the regions cannot overlap, so whichever direction the comparison selects copies correctly.

Run man 3 memmove — the manual guarantees correct behaviour when regions overlap; this is the only difference from memcpy.

Add tci_memmove.c to SRCS, and the declaration to libtci.h:

void    *tci_memmove(void *dst, const void *src, size_t n);

tci_memchr

tci_memchr searches the first n bytes of a memory block for a byte value, returning a pointer to the first match or NULL if not found. It is to raw memory what tci_strchr is to strings — but where tci_strchr walks until the null terminator, tci_memchr walks exactly n bytes regardless of content. This makes it useful for binary data that may contain null bytes.

Create tci_memchr.c:

#include "libtci.h"
 
void    *tci_memchr(const void *s, int c, size_t n)
{
    const unsigned char *ptr;
    unsigned char        target;
    size_t               i;
 
    ptr = s;
    target = (unsigned char)c;     /* narrow before comparing; avoids sign-extension */
    i = 0;
    while (i < n) {
        if (ptr[i] == target)
            return ((void *)(ptr + i)); /* cast away const: libc specifies void * return */
        i++;
    }
    return (NULL);
}

The parameter is const void * — a promise not to modify the data. The return is void * (non-const), matching the libc convention: the caller who knows the underlying memory is not const can use the result to modify it. The cast (void *)(ptr + i) strips the const to produce the non-const return — the same pattern as tci_strchr.

Run man 3 memchr — the return type is void * even though the input is const void *; this is specified behaviour, not a mistake.
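To see why a fixed byte count matters for binary data, consider a buffer with a zero byte in the middle. The sketch below wraps the standard memchr, standing in for tci_memchr, in a hypothetical helper named find_offset that reports where the match sits.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: offset of the first byte equal to c, or n if absent. */
size_t find_offset(const void *buf, int c, size_t n)
{
    const unsigned char *start;
    const unsigned char *hit;

    start = buf;
    hit = memchr(buf, c, n);        /* standard memchr stands in for tci_memchr */
    if (hit == NULL)
        return (n);                 /* not found within the first n bytes */
    return ((size_t)(hit - start)); /* both pointers are in the same buffer */
}
```

For the five bytes 0x10 0x00 0x20 0x7F 0x00, searching for 0x7F returns offset 3 even though a zero byte comes first. A string function would have stopped at offset 1.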

Add tci_memchr.c to SRCS, and the declaration to libtci.h:

void    *tci_memchr(const void *s, int c, size_t n);

tci_bzero

Both tci_memset and tci_memcpy are general — any value, any source. tci_bzero is the specialised case: it always writes zero. Zeroing memory is common enough that it earned its own name.

tci_bzero zeros n bytes starting at s. It is a simpler version of tci_memset with a fixed value of zero. In practice, bzero is obsolete: it originated in BSD, POSIX.1-2001 marked it LEGACY, and POSIX.1-2008 removed it. GNU libc still provides it on Linux, declared in <strings.h>, so it will compile on your machine. It is in libtci because it is convenient shorthand and because many older C programs use it.

Run man 3 bzero: the manual marks it deprecated and refers to memset as the replacement.

Because we have tci_memset, tci_bzero is one line:

Create tci_bzero.c:

#include "libtci.h"
 
void    tci_bzero(void *s, size_t n)
{
    tci_memset(s, 0, n);    /* zero is the only value bzero ever writes */
}

Add tci_bzero.c to SRCS, and the declaration to libtci.h:

void    tci_bzero(void *s, size_t n);
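A typical use is clearing a whole struct before filling it in. The sketch below is self-contained, so it uses a stand-in named zero_bytes built on the standard memset rather than tci_bzero itself. The struct packet type is hypothetical and exists only for this example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record type, used only for this illustration. */
struct packet {
    unsigned char   header[4];
    unsigned int    length;
};

/* Hypothetical stand-in for tci_bzero, built on the standard memset. */
void zero_bytes(void *s, size_t n)
{
    memset(s, 0, n);    /* zero is the only value ever written */
}
```

Calling zero_bytes(&p, sizeof(p)) on a struct packet p clears every member, and any padding between them, in one call. Here sizeof(p) is safe because p is the object itself, not a pointer to it.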

Build and test

Each section above added one file to SRCS; the line in the Makefile should now list all five:

SRCS    = tci_memset.c tci_memcpy.c tci_memmove.c tci_memchr.c tci_bzero.c

Run make:

make

You should see five compile lines and then the ar invocation:

gcc -Wall -Wextra -g -std=c99 -c tci_memset.c -o tci_memset.o
gcc -Wall -Wextra -g -std=c99 -c tci_memcpy.c -o tci_memcpy.o
gcc -Wall -Wextra -g -std=c99 -c tci_memmove.c -o tci_memmove.o
gcc -Wall -Wextra -g -std=c99 -c tci_memchr.c -o tci_memchr.o
gcc -Wall -Wextra -g -std=c99 -c tci_bzero.c -o tci_bzero.o
ar rcs libtci.a tci_memset.o tci_memcpy.o tci_memmove.o tci_memchr.o tci_bzero.o

No warnings. The library now contains five functions. The next page adds the string operations — and the key insight about the null terminator that makes everything else work.

Footnotes

  1. Two's complement - Wikipedia