7 minute read

Daily dispatches from my 12 weeks at the Recurse Center in Summer 2023

In the last installment of Building a DNS Resolver in C, I talked about writing a component of a conversion function that will eventually translate a valid IPv4 address (e.g., one of those 31.13.70.36 things) to a single 32-bit integer – the reason being that this 32-bit integer format is the format that we need to send out when we make our DNS query using sendto(). It was only a component, however, since the btoi() function we wrote just does the part where it takes a 32-character string of 0s and 1s and converts it to its 32-bit unsigned integer equivalent. So its usage would look something like this:

printf("Bitstring %s (length %d) -> %u\n", binstr, i, btoi(binstr, i));
Bistring 00011111000011010100011000100100 (length 32) -> 520963620

To perform the entire conversion starting with an IPv4 address, there are still a few things that need to happen, namely parsing the IP address, converting each one-byte number to an 8-bit bytestring, and concatenating the result (which is what finally gets passed to btoi()).

I still intend to do this, but I wanted to explore C’s inet_pton() function (part of the arpa/inet.h library), which basically just does this all for you. Here’s a few things I learned from playing with it.

Enter: inet_pton()

P-to-N means presentation-to-network. I.e., human readable to computer readable. I.e., a human friendly IPv4 address like 31.13.70.36 to some big, inscrutable number like 520963620.

The function signature looks like this:

int inet_pton(int af, const char *restrict src, void *restrict dst);

Parameters:

  • af: address family – either AF_INET for IPv4 addresses or AF_INET6 for IPv6 addresses. I’ll be keeping it simple here with IPv4 addresses
  • src: the input string (something like 31.13.70.36 for an IPv4 address)
  • dst: the destination address where the 32-bit integer conversion of the thing in src will go. That means that this destination address must be 4 bytes in size.

Returns:

  • 1 if successful, else 0

It is my understanding that most of the time when doing networking things in C one will avail oneself of some handy structs like these:

struct sockaddr_in {
    short int          sin_family;
    unsigned short int sin_port;
    struct in_addr     sin_addr;
    unsigned char      sin_zero[8];
};

struct in_addr {
    uint32_t s_addr;
};

sockaddr_in stores the address family (AF_INET or AF_NET6 – type int), the port we want to use (also type int), and the address we want to do stuff with. This last part is type struct in_addr, which we can see has a single 32-integer member called s_addr. This is what interests us here, since, normally when converting an IPv4 address to 32-bit integer using inet_pton(), you’d store the result in the 32-bit home designated by sockaddr_in.sin_addr. Something like this:

struct sockaddr_in my_sock;

inet_pton(AF_INET, "8.8.8.8", &(my_sock.sin_addr));

Tinkering with inet_pton()

I’m trying to keep things simple, so I’m just sending the result of inet_pton() to a regular old 32-bit int, like so:

uint32_t s_addr;

inet_pton(AF_INET, "8.8.8.8", &s_addr);

Same thing, just without the extra struct-bulk.

So that’s it, right?? Let’s go!

#include <stdio.h>
#include <stdint.h>
#include <sys/socket.h>
#include <arpa/inet.h>

int main()
{
    uint32_t s_addr; // 4-bytes to hold address
    char buf[16];    // Buffer to hold max IPv4 addr plus null string terminator
    char c;
    int i = 0;

    while ((c = getchar()) != EOF)
    {
        if (c == '\n') {
            buf[i] = '\0';
            if (inet_pton(AF_INET, buf, &s_addr) == 1) {
                printf("--> %u\n", s_addr);
            }
            s_addr = 0;
            for (; i>0; --i)
                buf[i] = '\0';
        } else {
            if (i < 15)
                buf[i++] = c;
        }
    }

    return 0;
}
8.8.8.8
--> 134744072

31.13.70.36
--> 608570655

172.217.14.78
--> 1309596076

This little REPL has a 16-character buffer (that the maximum length of an IPv4 address plus a null terminator), to which it adds characters building the address string. When it sees a linebreak, it passes this buf string to inet_pton() and lets it rip! Looks good, right? All finished, right??

Well. . . no.

Meanwhile, I thought it’d be fun to just whip up a quick Python script that effectively does what inet_pton() does:

def inet_pton(ip):
    # split `ip` at dot, convert each int to 8-bit bytestring, and concatenate
    binstr = "0b" + "".join(["{:08b}".format(int(int8)) for int8 in ip.split('.')])
    # convert 32-bit bytestring to int
    int32 = int(binstr, 2)

    return binstr, int32

def display_pton(ip="8.8.8.8"):
    '''
    Pretty-print results of inet_pton()
    '''
    binstr, int32 = inet_pton(ip)
    print("{:<15} -> {} ({})".format(ip, binstr, int32))

That first part of inet_pton() is not super readable, but basically it is:

  • splitting the ip argument at the dots (1.23.4.56 -> ['1', '23', '4', '56']
  • using a list comprehension to reformat each element as an 8-bit string (['00000001', '00010111', '00000100', '00111000'])
  • joining the result and concatenating to “0b” ('0b00000001000101110000010000111000) – this is the first thing this function will return
  • and turning that thing into an integer (18285624) – the second thing this function will return

So you can do something like

display_pton("8.8.8.8")
display_pton("31.13.70.36")
display_pton("172.217.14.78")

and see a result like

8.8.8.8         -> 0b00001000000010000000100000001000 (134744072)
31.13.70.36     -> 0b00011111000011010100011000100100 (520963620)
172.217.14.78   -> 0b10101100110110010000111001001110 (2899906126)

Hey, wait a second. . . the result for 8.8.8.8 is the same as the C version of inet_pton(), but the other two IPs are yielding different results. What gives?

Network Byte Order, duh

eggs
Big-endian and . . . middle-endian? Get it together, DALL-E

This one really threw me off for a bit. I thought for certain I must have some bug or maybe there was some garbage leftover at the s_addr address in my C code that was messing up the calculation or something.

Eventually I thought to convert the integer results of both my Python and C versions back to binary:

print("Python result: {} -> {}".format(520963620, bin(520963620)))
print("C result:      {} -> {}".format(608570655, bin(608570655)))
Python result: 520963620 -> 0b11111000011010100011000100100
C result:      608570655 -> 0b100100010001100000110100011111

After staring at these for a while, I realized that, once you account for a few missing leading zeroes, the same bytes are occuring in each, but just in opposite orders!

python version: 00011111 00001101 01000110 00100100
                    A        B        C        D

c version:      00100100 01000110 00001101 00011111
                    D        C        B        A

Okay, interesting! Suddenly I’m remembering seeing, and skipping over, stuff about byte order, which has to be the issue here. Only problem is, who’s in the right? C or Python?

I thought my Python script had to be the culprit since it was the one I was responsible for. I figured that I must have put the bytes in the wrong order is all. They have to go in Network Byte Order! Which is (some googling revealed) Most Significant Byte first. Obviously!

But that’s . . . what I did, wasn’t it? In the example above (31.13.70.36), the first element (31) became the first – which is to say left-most, which is to say most significant – byte in my 32-bit string, right? So that means inet_ptoi() has it twisted.

After some further perusing of Beej’s guide, I discovered the problem. Networks like stuff in, you guessed it, Network Byte Order. That’s big-endian style. But computers are divided on the subject, apparently. Intel machines like lil’ endian, and non-intel machines like big-endian, and, well, I’m on an intel Mac. So it seems that what’s happening here is inet_pton() is doing its job of taking an IPv4 address and converting it to a 4-byte integer; it’s just that my computer is storing those four bytes in its own, slightly eccentric way, which is not the way that networks do it.

However, there’s a handy suite of functions that will happily do this conversion back and forth:

  • htons() - host to network short
  • htonl() - host to network long
  • ntohs() - network to host short
  • ntohl() - network to host long

We need to go from host to network, and we want the long here, since we’re working with 4-byte integers. Here’s the revised code:

#include <stdio.h>
#include <stdint.h>
#include <sys/socket.h>
#include <arpa/inet.h>

int main()
{
    uint32_t s_addr; // 4-bytes to hold address
    char buf[16];    // Buffer to hold max IPv4 addr plus null string terminator
    char c;
    int i = 0;

    while ((c = getchar()) != EOF)
    {
        if (c == '\n') {
            buf[i] = '\0';
            if (inet_pton(AF_INET, buf, &s_addr) == 1) {
                printf("--> Host Byte Order:    %u\n", s_addr);
                printf("--> Network Byte Order: %u\n\n", htonl(s_addr));
            }
            s_addr = 0;
            for (; i>0; --i)
                buf[i] = '\0';
        } else {
            if (i < 15)
                buf[i++] = c;
        }
    }

    return 0;
}

And here’s a sample REPL session so we can see it working:

8.8.8.8
--> Host Byte Order:    134744072
--> Network Byte Order: 134744072

31.13.70.36
--> Host Byte Order:    608570655
--> Network Byte Order: 520963620

172.217.14.78
--> Host Byte Order:    1309596076
--> Network Byte Order: 2899906126

Of course the reason “8.8.8.8” had been working before was that it’s palindromic! So, big-endian, little-endian – it’s all the same. But for the rest, we can now clearly see the discrepancy.