thecodingidiot.com

The Standard Library

The C standard library is a set of functions and types available on every conforming C implementation. You have already used one of them: printf from <stdio.h>. sizeof is a built-in C operator, not part of the library. This page covers the parts you need to build words.c.

Reading a line: fgets

scanf with %s stops at whitespace, which makes it unsuitable for reading a whole sentence. fgets reads up to a full line:

#include <stdio.h>
 
int main(void)
{
    char line[256];
 
    if (fgets(line, sizeof(line), stdin) == NULL)
        return (1);    /* end of input or read error: nothing to print */
    printf("You entered: %s", line);
    return (0);
}

fgets(buffer, size, stream) reads at most size - 1 characters from stream, stores them in buffer, and appends a null terminator \0. It stops after a newline and keeps that newline in the buffer if it fits. It returns buffer on success, and NULL on a read error or if it reaches end of input before reading any characters. stdin is the standard input stream, the same source scanf reads from.

char line[256] declares an array of 256 characters. An array is a fixed-size block of contiguous memory. line[0] is the first element and line[255] is the last. The \0 marks where the string ends, which is usually well before the end of the array: C strings are null-terminated by convention.

String functions: string.h

#include <string.h>
 
size_t len = strlen("hello");   /* 5 — does not count the \0 */

strlen counts characters up to (but not including) the null terminator. The return type is size_t, an unsigned integer type large enough to hold the size of any object on the platform. Print it with %zu.

Other common functions:

| Function | What it does |
|----------|--------------|
| strlen(s) | length of string s |
| strcpy(dst, src) | copy src into dst |
| strcat(dst, src) | append src to dst |
| strcmp(s1, s2) | compare; returns 0 if equal |

Be careful with strcpy and strcat: they do not check that dst is large enough, and writing past the end of an array is undefined behaviour. Whether they are safe to use comes down to whether you can guarantee the buffer is large enough.

If the sizes are known and controlled by your own code, strcpy and strcat are fine. If the input comes from the outside world (user input, a network packet, a file), use strncpy or snprintf, which take a maximum length. Note that strncpy does not null-terminate the destination if the source fills the buffer; snprintf always terminates it when the size is non-zero, which makes it the safer general-purpose choice. The rule is not "never use strcpy"; it is "never use strcpy when you cannot prove the destination is large enough."

Characters: ctype.h

#include <ctype.h>
 
isalpha('A')    /* non-zero (true) */
isdigit('3')    /* non-zero (true) */
isspace(' ')    /* non-zero (true) */
isspace('\t')   /* non-zero (true) — tab is also whitespace */
isspace('\n')   /* non-zero (true) — so is newline          */
tolower('A')    /* 'a' */
toupper('a')    /* 'A' */

These functions take an int (a character value) and return an int. They are the building blocks for parsing text character by character.


words.c

words.c reads one line from standard input and prints the number of words and the number of characters it contains. A word is a run of non-whitespace characters. Transitioning from whitespace to non-whitespace starts a new word.

#include <stdio.h>
#include <string.h>
#include <ctype.h>
 
int main(void)
{
    char   line[1024];
    int    words;
    int    chars;
    int    in_word;
    size_t i;
    size_t len;
 
    if (fgets(line, sizeof(line), stdin) == NULL) {
        printf("0 words, 0 characters\n");
        return (0);
    }
    words = 0;
    chars = 0;
    in_word = 0;
    len = strlen(line);
    i = 0;
    while (i < len) {
        if (line[i] == '\n')
            break;
        chars++;
        if (!isspace((unsigned char)line[i])) {
            if (!in_word) {
                words++;
                in_word = 1;
            }
        }
        else
            in_word = 0;
        i++;
    }
    printf("%d %s, %d %s\n", words, words == 1 ? "word" : "words",
        chars, chars == 1 ? "character" : "characters");
    return (0);
}

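The (unsigned char) cast before isspace is a correctness requirement that will make full sense once you reach types and memory in the C chapters. The short version: plain char may be a signed type, and passing a negative value (other than EOF) to a ctype.h function is undefined behaviour. For now, treat it as a rule: always cast to (unsigned char) when passing a char to any ctype.h function.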

Test it:

gcc -Wall -Wextra words.c -o words
echo "hello world foo" | ./words
echo "one" | ./words
echo "" | ./words

Expected:

3 words, 15 characters
1 word, 3 characters
0 words, 0 characters

The next page brings rand(), srand(), and a loop together into the guessing game — the chapter's final program.