6 minute read

Daily dispatches from my 12 weeks at the Recurse Center in Summer 2023

A few weeks back, a few recurser friends and I implemented a DNS resolver in Python using this excellent guide put together by a legendary RC alum. Knowing nothing about networking or how the internet works, I found this to be a wonderful and approachable introduction to the topic.

My appetite thusly whet (whetted?), I figured, hey, since I’ve also been focusing on C these past few months, why not try to re-implement this DNS resolver thing in C ? How hard could it be?

My initial thought was to take the approach of line-by-line translation, so I began dutifully building structs to handle the DNS header and DNS question as I had in the Python version. In that case, the DNS header class consisted of 6 integer attributes, and the DNS question class had 2 integer attributes and a string (the domain we want to resolve). The main thing that needed to happen with the values contained by these objects was to translate them into one long hex sequence, with each integer attribute represented as a two-byte hex value.

I was about to start implementing an integer-to-hex function so that I could encode my header and question structs to hexstrings, but then I stopped, since I realized that I’d better be damn sure that whatever C libraries I’ll be using are expecting the same information in the same format. My understanding is that these protocols are strictly defined, so I’m not expecting something wildly different. All the same, I do want to be sure that the C socket API wants, for instance, this kind of hexstring. That’s when I realized this translation would not be so straightforward after all. That’s also when I began consulting Beej’s Guide to Network Programming.

Along with a few friends also interested in this project, I made several small-step-but-giant-leaps forward today with Beej’s guide on our side. Our rough plan for a DNS resolver that attempts to circumvent higher-level abstractions like getaddreinfo() looks like this (which basically mirrors the Python tutorial that is our model):

  • open a socket with socket()
  • connect to it with connect()
  • send stuff with sendto()
  • get stuff with recv()
  • examine it and decide what to do next
  • close socket with close()

We sorted out how socket() works. But our next obstacle came when we started thinking through how to send stuff with sendto().

In the Python tutorial, the high level process looks like this:

import socket

query = build_query("www.example.com", 1)

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

sock.sendto(query, ("8.8.8.8", 53))

response, _ = sock.recvfrom(1024)

For the moment let’s ignore the build_query() part, which I’m anticipating will be the bulk of the work. Instead I’m focused on the sock.sendto() line, in particular the part where we specify that we’d like to send our query to Google’s DNS server (“8.8.8.8”) on port 53.

When we use C’s version of sendto(), we will be passing to it our socket descriptor (returned when we open our socket with socket()) a buffer string (the content we’re sending – presumably something like the header and question combo alluded to above), the length of our query in bytes, and then a pointer to a sockaddr struct (or its equivalent sockaddr_in) and its size. This last part, we realized, is where we are specifying where and how we want to send our stuff – in this case, to “8.8.8.8” on port 53. Now we’re getting somewhere! And in that sense, this sockaddr_in struct seems key to unlocking this small part of the puzzle.

Here’s what the struct looks like:

struct sockaddr_in {
    short            sin_family;   
    unsigned short   sin_port;    
    struct in_addr   sin_addr;   
    char             sin_zero[8];
};

struct in_addr {
    unsigned long s_addr;          
};

This sockaddr_in structure holds an address family (in our case AF_INET), a port number, and an IPv4 address, AKA one of those things that looks like 142.250.176.14, which is four one-byte integers strung together (i.e., each of the four component parts can range from 0-255). But what’s interesting is that this sin_addr component is not some string of four integers or something; it’s a single unsigned long, which I believe is either a 4-byte or 8-byte integer. Either way it’s a big-ass number. That means we need to translate our address’s component parts to one-byte bytestrings, concatenate them together into a single 4-byte string, and then interpret that as a single integer.

For example, if we replace each one-byte integer component of

142.250.176.14

with binary, we’d get

10001110 11111010 10110000 00001110

and if we concatenate those together into a single 4-byte binary value:

10001110111110101011000000001110

which corresponds to the integer

2,398,793,742

Now, I believe there is a convenient way to tranlsate a human-readable IPv4 address to an integer and vice-versa with inet_pton(), and we very well may avail ourselves of this convenience. But why do that when we could just . . . implement it ourselves?? In which case, ultimately what we’re going to need to do is take an IPv4 address (“8.8.8.8”), parse it into four bytestrings, concatenate the result into a single 4-byte bytestring, and return it as an integer. Then we’ll have that component part of our sockaddr_in struct and, in turn, a component of what we’ll need to pass along to sendto().

And with that, I present to you a fun REPL implementation of btoi(), our friendly bitstring-to-integer parser which will handle that last step of translating a 4-byte bytestring to an unsigned, 32-bit integer.

#include <stdio.h>
#include <stdint.h>

/* convert bitstring[32] to 32-bit uint */
uint32_t btoi(char *binstr, int len)
{
    int      i;
    uint32_t result;

    result = 0;
    for (i=0; i<len; ++i)
        if (binstr[i] == '1')
            result += 1 << (len - 1 - i);

    return result;
}

int main()
{
    char c, binstr[32];
    int i = 0;

    while ((c = getchar()) != EOF)
    {
        if (c != '\n')
        {
            if (i < 32)
                // add `c` to `binstr` and increment `i` while `i` < 32 
                binstr[i++] = c;
        } else {
            // print result
            printf("-> %u\n", btoi(binstr, i));
            // reset all `binstr` elements to null
            for (; i>0; --i)
                binstr[i] = '\0';
        }
    }

    return 0;
}

You know what? I’m proud of this little function.

First because it handles strings less than and greater than 32 characters in length, no problem. Any characters in excess of 32 just never get added to the binstr array in the first place. And for strings less than 32 characters in length, the len parameter of btoi() ensures that the most significant bit is at place len (not at place 32!).

And second because I landed on what feels like a clever solution to resetting all the binstr elements to null at the same time as I reset counter i, which I accomplish by decrementing i back to 0 and setting binstr[i] to the null string at each step. That way, I ensure there’s no garbage left over in any of those memory locations.

Here it is in action:

> 10011
19
> 001
1
> 100
4
> 100110110100101010110101011110101001011011
2605364602

The next step is to do the first part of the parsing operation, namely walking through the IPv4 address, extracting each one-byte integer, translating it to a bytestring, and concatenating the results to pass to btoi(). And at that point we will be well on our way to building a sockaddr_in struct, which means we’ll be one step closer to making a connection.