3 minute read

Daily dispatches from my 12 weeks at the Recurse Center in Summer 2023

Last time on stumbling through building a DNS resolver in C, I messed around with the nifty function inet_pton(), which takes IPv4 and IPv6 addresses and converts them from Presentation to Network format, which is to say from a human-friendly format like 31.13.70.36 to a machine-friendly 32-bit integer like 520963620. Along the way we discovered the hazards of endian-ness and learned our lesson: never, ever don’t convert numbers from host-to-network byte order and back!

That goes for port numbers, too, by the way. Want to make a DNS query on port 53? Don’t forget to first convert it with htons(53)! Otherwise, if you’re on an Intel machine, you’ll end up querying port 13563. Why? 53 in hex is 0x35, so you’d think a two-byte integer would render that value as 0x0035. And you’d be right if you’re thinking the way networks think! But Intel machines are eccentric and little-endian, and they would store that integer as 0x3500 (least significant byte first), which would be interpreted as 13563 by big-endian machines and networks. So, do your conversions, folks!

One of the things I did since then, in any case, is flesh out a toy version of inet_pton(), which does the full conversion from an IPv4 address to a 32-bit integer in a REPL-style interface.

IPv4 to int

The little program isn’t that complicated, but, well, it’s in C, so the whole thing felt like a bunch of LeetCode problems. (That’s a compliment.)

Here’s the full code:

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

int main()
{
    char c, chunk[4];
    int i, chunk_as_i, chunks_count;
    uint32_t addr;

    i = 0;
    addr = 0;
    chunks_count = 0;

    while ((c = getchar()) != EOF)
    {

        /* convert chunk to int if we reach a `.` or `\n` */
        if (c == '.' || c == '\n') {
            chunk[i] = '\0';            // append null to chunk
            addr <<= 8;                 // shift addr 8 bits left
            chunk_as_i = atoi(chunk);   // convert current chunk to int
            if (chunk_as_i > 255) {     // exit on invalid chunk
                printf("Invalid IPv4 address component: %s\n", chunk);
                return 1;
            }
            addr |= atoi(chunk);        // add chunk to addr

            // reset chunk
            for (; i>0; --i) { chunk[i] = '\0'; }

            // inc chunks_count
            chunks_count++;

            // output addr and reset on newline
            if (c == '\n') {
                if (chunks_count == 4) {
                    printf("%u\n", addr);
                } else {
                    printf("Unexpected IPv4 address format ");
                    for (int j=1; j<chunks_count; ++j)
                        printf("x.");
                    printf("x\n");
                    return 1;
                }
                chunks_count = 0;
                addr = 0;
            }

        /* else append char to chunk if numeric */
        } else if (i < 3 && c >= '0' && c <= '9'){
            chunk[i++] = c;
        }
    }

    return 0;
}

There’s plenty of room for refactoring here, and probably error-detection and handling could be better, and, well, probably a lot could be better. But there are a few things that I think are especially neat that I’m psyched about.

The first is the decision not to read in an entire line at a time (i.e., up until a newline is detected), but to read in what I’m calling here a chunk at a time, meaning the parts of the IPv4 address between dots. So I’ve got chunk, a little array of chars of length 4 (the extra space being for a newline character so that I can easily treat the variable like a string), and as I traverse the input stream, I add the numeric characters I encounter to consecutive indexes. When I encounter a dot or a newline character, though, it’s time to rock and roll.

If it’s a dot, then that means we have a chunk to deal with, which leads us to my second innovation (which actaully I borrowed from a helpful recurser who was perplexed by my previous decision to work with bitstrings representing this 32-bit integer): the bitwise approach, of course! We shift addr, our output variable, 8 bits left and OR it with the integer value of the current chunk (once we ensure that it’s a valid one-byte value, that is). Then we reset chunk and our incrementer in one fell swoop, and continue along our input stream.

That just leaves the output. If we encounter a newline character, that means (in theory) that we’re finished and it’s time to print out addr. Once we make sure that there are, indeed, 4 chunks as expected, then, yeah, let ‘er rip! Otherwise, something ain’t right. ABORT PROGRAM.

Bye HAL

And that, ladies and gentlemen, is what reinventing the wheel, in a much worse and more brittle way, looks like.