RC41. Building a DNS Resolver in C, Part 3: Implementing a toy version of inet_pton()
Daily dispatches from my 12 weeks at the Recurse Center in Summer 2023
Last time on stumbling through building a DNS resolver in C, I messed around with the nifty function inet_pton()
, which takes IPv4 and IPv6 addresses and converts them from Presentation to Network format, which is to say from a human-friendly format like 31.13.70.36
to a machine-friendly 32-bit integer like 520963620
. Along the way we discovered the hazards of endian-ness and learned our lesson: never, ever don’t convert numbers from host-to-network byte order and back!
That goes for port numbers, too, by the way. Want to make a DNS query on port 53? Don’t forget to first convert it with htons(53)
! Otherwise, if you’re on an Intel machine, you’ll end up querying port 13563. Why? 53 in hex is 0x35, so you’d think a two-byte integer would render that value as 0x0035. And you’d be right if you’re thinking the way networks think! But Intel machines are eccentric and little-endian, and they would store that integer as 0x3500 (least significant byte first), which would be interpreted as 13563 by big-endian machines and networks. So, do your conversions, folks!
One of the things I did since then, in any case, is flesh out a toy version of inet_pton()
, which does the full conversion from an IPv4 address to a 32-bit integer in a REPL-style interface.
The little program isn’t that complicated, but, well, it’s in C, so the whole thing felt like a bunch of LeetCode problems. (That’s a compliment.)
Here’s the full code:
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
int main()
{
char c, chunk[4];
int i, chunk_as_i, chunks_count;
uint32_t addr;
i = 0;
addr = 0;
chunks_count = 0;
while ((c = getchar()) != EOF)
{
/* convert chunk to int if we reach a `.` or `\n` */
if (c == '.' || c == '\n') {
chunk[i] = '\0'; // append null to chunk
addr <<= 8; // shift addr 8 bits left
chunk_as_i = atoi(chunk); // convert current chunk to int
if (chunk_as_i > 255) { // exit on invalid chunk
printf("Invalid IPv4 address component: %s\n", chunk);
return 1;
}
addr |= atoi(chunk); // add chunk to addr
// reset chunk
for (; i>0; --i) { chunk[i] = '\0'; }
// inc chunks_count
chunks_count++;
// output addr and reset on newline
if (c == '\n') {
if (chunks_count == 4) {
printf("%u\n", addr);
} else {
printf("Unexpected IPv4 address format ");
for (int j=1; j<chunks_count; ++j)
printf("x.");
printf("x\n");
return 1;
}
chunks_count = 0;
addr = 0;
}
/* else append char to chunk if numeric */
} else if (i < 3 && c >= '0' && c <= '9'){
chunk[i++] = c;
}
}
return 0;
}
There’s plenty of room for refactoring here, and probably error-detection and handling could be better, and, well, probably a lot could be better. But there are a few things that I think are especially neat that I’m psyched about.
The first is the decision not to read in an entire line at a time (i.e., up until a newline is detected), but to read in what I’m calling here a chunk at a time, meaning the parts of the IPv4 address between dots. So I’ve got chunk
, a little array of chars of length 4 (the extra space being for a newline character so that I can easily treat the variable like a string), and as I traverse the input stream, I add the numeric characters I encounter to consecutive indexes. When I encounter a dot or a newline character, though, it’s time to rock and roll.
If it’s a dot, then that means we have a chunk to deal with, which leads us to my second innovation (which actaully I borrowed from a helpful recurser who was perplexed by my previous decision to work with bitstrings representing this 32-bit integer): the bitwise approach, of course! We shift addr
, our output variable, 8 bits left and OR
it with the integer value of the current chunk (once we ensure that it’s a valid one-byte value, that is). Then we reset chunk
and our incrementer in one fell swoop, and continue along our input stream.
That just leaves the output. If we encounter a newline character, that means (in theory) that we’re finished and it’s time to print out addr
. Once we make sure that there are, indeed, 4 chunks as expected, then, yeah, let ‘er rip! Otherwise, something ain’t right. ABORT PROGRAM.
And that, ladies and gentlemen, is what reinventing the wheel, in a much worse and more brittle way, looks like.