<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.9.3">Jekyll</generator><link href="https://www.datadoodad.com/feed.xml" rel="self" type="application/atom+xml" /><link href="https://www.datadoodad.com/" rel="alternate" type="text/html" /><updated>2023-11-02T13:38:41-07:00</updated><id>https://www.datadoodad.com/feed.xml</id><title type="html">DATADOODAD</title><subtitle>Data science and web development project portfolio</subtitle><author><name>Zach Rottman, PhD</name></author><entry><title type="html">On Adding Executables to PATH and Becoming All-Powerful</title><link href="https://www.datadoodad.com/adding-to-path/" rel="alternate" type="text/html" title="On Adding Executables to PATH and Becoming All-Powerful" /><published>2023-09-16T00:00:00-07:00</published><updated>2023-09-16T00:00:00-07:00</updated><id>https://www.datadoodad.com/adding-to-path</id><content type="html" xml:base="https://www.datadoodad.com/adding-to-path/">&lt;p&gt;For the past few months you’ve been learning C and writing toy programs. But you’re getting sick of having to include their relative paths when you want to execute them.&lt;/p&gt;

&lt;!--more--&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;~/path/to/my/program $ ./hi
Hello, handsome genius.
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;What a hassle for a quick compliment!&lt;/p&gt;

&lt;p&gt;Wouldn’t it be cool if you could indulge your vanity just as quickly and easily as using any of the familiar utilities on which you rely? Things like &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;wc&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;dig&lt;/code&gt;?&lt;/p&gt;

&lt;p&gt;Thanks to your C journey, you have the vague but probably correct notion that those classic utilities are also written in C and that, moreover, the compiled executables are in all likelihood just stored somewhere special. So, you think, perhaps it’s possible to write your own custom utilities, if only you knew how to make them executable from anywhere in the shell.&lt;/p&gt;

&lt;p&gt;Turns out it’s super easy and high-reward, and you feel like the handsome genius you knew you always were even without the computer telling you so.&lt;/p&gt;

&lt;p&gt;Here are the steps you took along the way:&lt;/p&gt;

&lt;h3 id=&quot;1-compile-your-c-program&quot;&gt;1. Compile your C program.&lt;/h3&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;~/path/to/my/program $ cc hi.c -o hi
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;2-make-your-special-utils-directory&quot;&gt;2. Make your special utils directory&lt;/h3&gt;
&lt;p&gt;Make a directory where you’ll put your custom utility (apparently &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;~/bin&lt;/code&gt; is a common place to put these, so that’s what you do, but there are plenty of other conventions and options), and put your executable there.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;~/path/to/my/program $ mkdir ~/bin
~/path/to/my/program $ mv hi ~/bin
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;3-update-path&quot;&gt;3. Update PATH&lt;/h3&gt;
&lt;p&gt;Add your new &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;~/bin&lt;/code&gt; directory to your PATH variable, which specifies where the shell should look for commands. You can do this for the duration of your session with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;export PATH=$PATH&lt;/code&gt;, or you can add a line like this to your shell config file (i.e., &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;.bash_rc&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;.bash_profile&lt;/code&gt;, etc.). If you do the latter, reload that config file with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;source ~/path/to/.config_file&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Option 1: Add to PATH for the sesh&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;~/path/to/my/program $ export PATH=&quot;$PATH:$HOME/bin&quot;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Option 2: Permanently add to PATH&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;~/path/to/my/program $ echo 'export PATH=&quot;$PATH:$HOME/bin&quot;' &amp;gt;&amp;gt; ~/.bash_profile
~/path/to/my/program $ source ~/.bash_profile
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;4-call-your-utility&quot;&gt;4. Call your utility.&lt;/h3&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&amp;gt; hi
Hello, handsome genius.
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;5-revel-in-your-power&quot;&gt;5. Revel in your power.&lt;/h3&gt;
&lt;p&gt;You are a genius.&lt;/p&gt;</content><author><name>Zach Rottman, PhD</name></author><category term="C" /><category term="Linux" /><category term="Unix" /><summary type="html">For the past few months you’ve been learning C and writing toy programs. But you’re getting sick of having to include their relative paths when you want to execute them.</summary></entry><entry><title type="html">Type Punning</title><link href="https://www.datadoodad.com/recurse%20center/typepunning/" rel="alternate" type="text/html" title="Type Punning" /><published>2023-08-10T00:00:00-07:00</published><updated>2023-08-10T00:00:00-07:00</updated><id>https://www.datadoodad.com/recurse%20center/typepunning</id><content type="html" xml:base="https://www.datadoodad.com/recurse%20center/typepunning/">&lt;p&gt;Here are 32 bits, or 4 bytes. Seems like a pretty big number. And it is . . . sorta. Depends on how you look at it.&lt;/p&gt;

&lt;style type=&quot;text/css&quot;&gt;
  table, tr, td {
    border: none;
    text-align: center;
    padding: 0;
  }

  .array td {
    border: 1px solid black;
    padding: 0;
    margin: 0;
    width: 20px;
    height: 25px;
  }

  .bitsPlace { color: gray; }

  .one { background: #d3f6db; }
  .zero { background: #92d5e6; }

&lt;/style&gt;

&lt;table&gt;
  &lt;tr class=&quot;array&quot;&gt;
    &lt;td class=&quot;zero&quot;&gt;0&lt;/td&gt;
    &lt;td class=&quot;one&quot;&gt;1&lt;/td&gt;
    &lt;td class=&quot;zero&quot;&gt;0&lt;/td&gt;
    &lt;td class=&quot;zero&quot;&gt;0&lt;/td&gt;
    &lt;td class=&quot;zero&quot;&gt;0&lt;/td&gt;
    &lt;td class=&quot;zero&quot;&gt;0&lt;/td&gt;
    &lt;td class=&quot;zero&quot;&gt;0&lt;/td&gt;
    &lt;td class=&quot;zero&quot;&gt;0&lt;/td&gt;

    &lt;td class=&quot;one&quot;&gt;1&lt;/td&gt;
    &lt;td class=&quot;zero&quot;&gt;0&lt;/td&gt;
    &lt;td class=&quot;zero&quot;&gt;0&lt;/td&gt;
    &lt;td class=&quot;one&quot;&gt;1&lt;/td&gt;
    &lt;td class=&quot;zero&quot;&gt;0&lt;/td&gt;
    &lt;td class=&quot;zero&quot;&gt;0&lt;/td&gt;
    &lt;td class=&quot;one&quot;&gt;1&lt;/td&gt;
    &lt;td class=&quot;zero&quot;&gt;0&lt;/td&gt;

    &lt;td class=&quot;zero&quot;&gt;0&lt;/td&gt;
    &lt;td class=&quot;zero&quot;&gt;0&lt;/td&gt;
    &lt;td class=&quot;zero&quot;&gt;0&lt;/td&gt;
    &lt;td class=&quot;one&quot;&gt;1&lt;/td&gt;
    &lt;td class=&quot;one&quot;&gt;1&lt;/td&gt;
    &lt;td class=&quot;one&quot;&gt;1&lt;/td&gt;
    &lt;td class=&quot;one&quot;&gt;1&lt;/td&gt;
    &lt;td class=&quot;one&quot;&gt;1&lt;/td&gt;

    &lt;td class=&quot;one&quot;&gt;1&lt;/td&gt;
    &lt;td class=&quot;one&quot;&gt;1&lt;/td&gt;
    &lt;td class=&quot;zero&quot;&gt;0&lt;/td&gt;
    &lt;td class=&quot;one&quot;&gt;1&lt;/td&gt;
    &lt;td class=&quot;one&quot;&gt;1&lt;/td&gt;
    &lt;td class=&quot;zero&quot;&gt;0&lt;/td&gt;
    &lt;td class=&quot;one&quot;&gt;1&lt;/td&gt;
    &lt;td class=&quot;one&quot;&gt;1&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr class=&quot;bitsPlace&quot;&gt;
    &lt;td&gt;31&lt;/td&gt;
    &lt;td&gt;30&lt;/td&gt;
    &lt;td&gt;29&lt;/td&gt;
    &lt;td&gt;28&lt;/td&gt;
    &lt;td&gt;27&lt;/td&gt;
    &lt;td&gt;26&lt;/td&gt;
    &lt;td&gt;25&lt;/td&gt;
    &lt;td&gt;24&lt;/td&gt;
    &lt;td&gt;23&lt;/td&gt;
    &lt;td&gt;22&lt;/td&gt;
    &lt;td&gt;21&lt;/td&gt;
    &lt;td&gt;20&lt;/td&gt;
    &lt;td&gt;19&lt;/td&gt;
    &lt;td&gt;18&lt;/td&gt;
    &lt;td&gt;17&lt;/td&gt;
    &lt;td&gt;16&lt;/td&gt;
    &lt;td&gt;15&lt;/td&gt;
    &lt;td&gt;14&lt;/td&gt;
    &lt;td&gt;13&lt;/td&gt;
    &lt;td&gt;12&lt;/td&gt;
    &lt;td&gt;11&lt;/td&gt;
    &lt;td&gt;10&lt;/td&gt;
    &lt;td&gt;9&lt;/td&gt;
    &lt;td&gt;8&lt;/td&gt;
    &lt;td&gt;7&lt;/td&gt;
    &lt;td&gt;6&lt;/td&gt;
    &lt;td&gt;5&lt;/td&gt;
    &lt;td&gt;4&lt;/td&gt;
    &lt;td&gt;3&lt;/td&gt;
    &lt;td&gt;2&lt;/td&gt;
    &lt;td&gt;1&lt;/td&gt;
    &lt;td&gt;0&lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;

&lt;p&gt;One way to look at it is as an 32-bit integer, in which case this number is, indeed, on the large side – 1,078,530,011.&lt;/p&gt;

&lt;p&gt;Another way to look at it is as a float, which also takes up 4 bytes of space, but which is represented in memory very differently. As a float, the same sequence of bytes represents 3.141593.&lt;/p&gt;

&lt;p&gt;Now for the fun part: let’s say that you want to store a value in memory as one type, but that you sometimes want to interpret that same value as another type entirely. Enter . . .&lt;/p&gt;

&lt;h1 id=&quot;type-punning&quot;&gt;Type Punning&lt;/h1&gt;

&lt;p&gt;Before getting into why one would want to do something insane like this, it bears mentioning that doing this is generally frowned upon, since it is non-portable and can lead to unexpected behavior. Some computers and architectures are big-endian, some little-endian; some represent floats one way, others another way. So, while on my machine the underlying bits representing the integer 1,078,530,011 also represent the floating-point decimal 3.141593, the same may not be true on your machine – all depends on how types are represented.&lt;/p&gt;

&lt;p&gt;As a result, there are a number of anti-aliasing rules that put in place some guardrails to generally discourage this kind of thing.&lt;/p&gt;

&lt;p&gt;The C17 standard lays out the following guidelines in &lt;strong&gt;§6.5 Expressions&lt;/strong&gt;:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
- a type compatible with the effective type of the object,
- a qualified version of a type compatible with the effective type of the object,
- a type that is the signed or unsigned type corresponding to the effective type of the object,
- a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object
- an aggragate or union type that includes one of the aforementioned types among its members, or
- a character type.
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;In other words, a variable typed &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;int&lt;/code&gt; can be accessed as a qualified version of that type (e.g., &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;const int&lt;/code&gt;) or a signed/unsigned type corresponding to its type (e.g., &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;unsigned int&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;signed int&lt;/code&gt;) or both. Or it can be accessed as a character type. Or as a union type (more on this soon). But a value in memory typed as an &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;int&lt;/code&gt; cannot be accessed as a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;float&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Well, it sort of can, if you try really hard. It’s just that, like I said, it can lead to unexpected and weird behaviour. Here’s one way you can force it:&lt;/p&gt;

&lt;div class=&quot;language-c highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kt&quot;&gt;uint32_t&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;my_int&lt;/span&gt;   &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1078530011&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt;    &lt;span class=&quot;n&quot;&gt;my_float&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;my_int&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;printf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;my_int: %u&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\n&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;my_int&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;     &lt;span class=&quot;c1&quot;&gt;// =&amp;gt; 1078530011&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;prtinf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;my_float: %f&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\n&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;my_float&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;// =&amp;gt; 3.141593&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Here we’re basically saying: get the address of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;my_int&lt;/code&gt;, recast that address as a float pointer, and dereference the result.&lt;/p&gt;

&lt;p&gt;But if you’re really insistent on type punning, there’s a better and more above-board way of doing it, which also happens to obey the C17 rules above. We can use a union:&lt;/p&gt;

&lt;div class=&quot;language-c highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;union&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;type_pun&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;kt&quot;&gt;uint32_t&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt;    &lt;span class=&quot;n&quot;&gt;f&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;};&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;union&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;type_pun&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;u&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;u&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1078530011&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;printf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;u.i: %u&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\n&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;u&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;// =&amp;gt; 1078530011&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;printf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;u.f: %f&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\n&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;u&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;f&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;// =&amp;gt; 3.141593&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The reason this works is because all the members of the union share the same hunk of memory, so the integer &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;u.i&lt;/code&gt; and the floating-point number &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;u.f&lt;/code&gt; are both represented by the same underlying bytes. It’s just a matter of whether we access those bytes through the integer-typed member or the float-typed member.&lt;/p&gt;

&lt;p&gt;That said, there are still good reasons &lt;em&gt;not&lt;/em&gt; to do this – and those reasons are the same as before: type punning like this is non-portable and can lead to inconsistent results that are machine- and architecture-dependent. But if you really wanna pun, then, by god, pun!&lt;/p&gt;

&lt;p&gt;So . . . why would anyone ever want to do something like this?&lt;/p&gt;

&lt;h1 id=&quot;application-1-nan-boxing&quot;&gt;Application 1: NaN-Boxing&lt;/h1&gt;

&lt;p&gt;One application is nan-boxing, where you use a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;double&lt;/code&gt; (a 64-bit floating point number) either literally as a double or, in the event that it’s &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;not a number&lt;/code&gt;, as a vessel for carrying a payload.&lt;/p&gt;

&lt;p&gt;A double consists of three parts:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;a sign bit (bit 63)&lt;/li&gt;
  &lt;li&gt;11 bits representing the exponent (bits 52-62)&lt;/li&gt;
  &lt;li&gt;52 bits representing the mantissa, or fraction (bits 0-51)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To represent &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;nan&lt;/code&gt; (not a number), all 11 exponent bits are set to 1. So the idea is that we can use a double encoded as a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;nan&lt;/code&gt; to carry 52 bits of information in its mantissa – it’s just a matter of accessing them.&lt;/p&gt;

&lt;p&gt;And that’s where type punning comes in: To access those mantissa bits and extract the payload, we’d have to access that double as a 64-bit integer so we can do the appropriate bitwise mask.&lt;/p&gt;

&lt;h1 id=&quot;application-2-type-agnostic-data-structures&quot;&gt;Application 2: Type Agnostic Data Structures&lt;/h1&gt;

&lt;p&gt;Another application I stumbled upon more recently is creating type-agnostic data structures. For instance, let’s say I want to create a linked list but that I want the nodes comprising that linked list to store either integers or floats or strings. Here’s a punny approach involving what’s called tagged unions:&lt;/p&gt;

&lt;div class=&quot;language-c highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;cp&quot;&gt;#include &amp;lt;stdio.h&amp;gt;
#include &amp;lt;stdlib.h&amp;gt;
#include &amp;lt;string.h&amp;gt;
&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;enum&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;data_type&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;STRING&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;INT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;FLOAT&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;};&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;union&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;value&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt;   &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;f&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt;  &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;};&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;variable_data&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;union&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;value&lt;/span&gt;    &lt;span class=&quot;n&quot;&gt;val&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;enum&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;data_type&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;};&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Node&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;variable_data&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cargo&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Node&lt;/span&gt;          &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;next&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;};&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Node&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;create_node&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;enum&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;data_type&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;val&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Node&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;malloc&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;sizeof&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Node&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;));&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;switch&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;case&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;STRING&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cargo&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;val&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;s&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;malloc&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;strlen&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;val&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;strcpy&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cargo&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;val&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;val&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;break&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;case&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;INT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cargo&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;val&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;val&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;break&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;case&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;FLOAT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cargo&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;val&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;f&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;val&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;break&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cargo&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;type&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;next&lt;/span&gt;       &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;print_node&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Node&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;switch&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cargo&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;case&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;STRING&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;printf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;%s&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\n&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cargo&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;val&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;break&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;case&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;INT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;printf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;%d&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\n&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cargo&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;val&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;break&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;case&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;FLOAT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;printf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;%f&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\n&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cargo&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;val&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;f&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;break&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;

    &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;my_int&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;12345&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Node&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;int_node&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;create_node&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;INT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;my_int&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;print_node&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;int_node&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;    &lt;span class=&quot;c1&quot;&gt;// =&amp;gt; 12345&lt;/span&gt;

    &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;my_float&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1415&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Node&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;float_node&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;create_node&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;FLOAT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;my_float&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;print_node&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;float_node&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;  &lt;span class=&quot;c1&quot;&gt;// =&amp;gt; 3.1415&lt;/span&gt;

    &lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;my_str&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;hello, world&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Node&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;string_node&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;create_node&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;STRING&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;my_str&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;print_node&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;string_node&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;// =&amp;gt; hello, world&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Now I can link all these nodes together in a linked list, I can iterate through them and do stuff, etc.&lt;/p&gt;</content><author><name>Zach Rottman, PhD</name></author><category term="Recurse Center" /><category term="C" /><category term="type punning" /><summary type="html">Here are 32 bits, or 4 bytes. Seems like a pretty big number. And it is . . . sorta. Depends on how you look at it. 0 1 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 0 0 1 1 1 1 1 1 1 0 1 1 0 1 1 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 One way to look at it is as an 32-bit integer, in which case this number is, indeed, on the large side – 1,078,530,011. Another way to look at it is as a float, which also takes up 4 bytes of space, but which is represented in memory very differently. As a float, the same sequence of bytes represents 3.141593. Now for the fun part: let’s say that you want to store a value in memory as one type, but that you sometimes want to interpret that same value as another type entirely. Enter . . . Type Punning Before getting into why one would want to do something insane like this, it bears mentioning that doing this is generally frowned upon, since it is non-portable and can lead to unexpected behavior. Some computers and architectures are big-endian, some little-endian; some represent floats one way, others another way. So, while on my machine the underlying bits representing the integer 1,078,530,011 also represent the floating-point decimal 3.141593, the same may not be true on your machine – all depends on how types are represented. As a result, there are a number of anti-aliasing rules that put in place some guardrails to generally discourage this kind of thing. The C17 standard lays out the following guidelines in §6.5 Expressions: An object shall have its stored value accessed only by an lvalue expression that has one of the following types: - a type compatible with the effective type of the object, - a qualified version of a type compatible with the effective type of the object, - a type that is the signed or unsigned type corresponding to the effective type of the object, - a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object - an aggragate or union type that includes one of the aforementioned types among its members, or - a character type. In other words, a variable typed int can be accessed as a qualified version of that type (e.g., const int) or a signed/unsigned type corresponding to its type (e.g., unsigned int or signed int) or both. Or it can be accessed as a character type. Or as a union type (more on this soon). But a value in memory typed as an int cannot be accessed as a float. Well, it sort of can, if you try really hard. It’s just that, like I said, it can lead to unexpected and weird behaviour. Here’s one way you can force it: uint32_t my_int = 1078530011; float my_float = *(float*)&amp;amp;my_int; printf(&quot;my_int: %u\n&quot;, my_int); // =&amp;gt; 1078530011 prtinf(&quot;my_float: %f\n&quot;, my_float); // =&amp;gt; 3.141593 Here we’re basically saying: get the address of my_int, recast that address as a float pointer, and dereference the result. But if you’re really insistent on type punning, there’s a better and more above-board way of doing it, which also happens to obey the C17 rules above. We can use a union: union type_pun { uint32_t i; float f; }; union type_pun u; u.i = 1078530011; printf(&quot;u.i: %u\n&quot;, u.i); // =&amp;gt; 1078530011 printf(&quot;u.f: %f\n&quot;, u.f); // =&amp;gt; 3.141593 The reason this works is because all the members of the union share the same hunk of memory, so the integer u.i and the floating-point number u.f are both represented by the same underlying bytes. It’s just a matter of whether we access those bytes through the integer-typed member or the float-typed member. That said, there are still good reasons not to do this – and those reasons are the same as before: type punning like this is non-portable and can lead to inconsistent results that are machine- and architecture-dependent. But if you really wanna pun, then, by god, pun! So . . . why would anyone ever want to do something like this? Application 1: NaN-Boxing One application is nan-boxing, where you use a double (a 64-bit floating point number) either literally as a double or, in the event that it’s not a number, as a vessel for carrying a payload. A double consists of three parts: a sign bit (bit 63) 11 bits representing the exponent (bits 52-62) 52 bits representing the mantissa, or fraction (bits 0-51) To represent nan (not a number), all 11 exponent bits are set to 1. So the idea is that we can use a double encoded as a nan to carry 52 bits of information in its mantissa – it’s just a matter of accessing them. And that’s where type punning comes in: To access those mantissa bits and extract the payload, we’d have to access that double as a 64-bit integer so we can do the appropriate bitwise mask. Application 2: Type Agnostic Data Structures Another application I stumbled upon more recently is creating type-agnostic data structures. For instance, let’s say I want to create a linked list but that I want the nodes comprising that linked list to store either integers or floats or strings. Here’s a punny approach involving what’s called tagged unions: #include &amp;lt;stdio.h&amp;gt; #include &amp;lt;stdlib.h&amp;gt; #include &amp;lt;string.h&amp;gt; enum data_type { STRING, INT, FLOAT }; union value { int i; float f; char *s; }; struct variable_data { union value val; enum data_type type; }; struct Node { struct variable_data cargo; struct Node *next; }; struct Node* create_node(enum data_type type, void *val) { struct Node *n = malloc(sizeof(struct Node)); switch (type) { case STRING: n-&amp;gt;cargo.val.s = (char*)malloc(strlen((char*)val) + 1); strcpy(n-&amp;gt;cargo.val.s, (char*)val); break; case INT: n-&amp;gt;cargo.val.i = *(int*)val; break; case FLOAT: n-&amp;gt;cargo.val.f = *(float*)val; break; } n-&amp;gt;cargo.type = type; n-&amp;gt;next = NULL; return n; } void print_node(struct Node* n) { switch (n-&amp;gt;cargo.type) { case STRING: printf(&quot;%s\n&quot;, n-&amp;gt;cargo.val.s); break; case INT: printf(&quot;%d\n&quot;, n-&amp;gt;cargo.val.i); break; case FLOAT: printf(&quot;%f\n&quot;, n-&amp;gt;cargo.val.f); break; } } int main() { int my_int = 12345; struct Node *int_node = create_node(INT, &amp;amp;my_int); print_node(int_node); // =&amp;gt; 12345 float my_float = 3.1415; struct Node *float_node = create_node(FLOAT, &amp;amp;my_float); print_node(float_node); // =&amp;gt; 3.1415 char *my_str = &quot;hello, world&quot;; struct Node *string_node = create_node(STRING, my_str); print_node(string_node); // =&amp;gt; hello, world return 0; } Now I can link all these nodes together in a linked list, I can iterate through them and do stuff, etc.</summary></entry><entry><title type="html">String Chaos and Compiler Sleight-of-Hand</title><link href="https://www.datadoodad.com/recurse%20center/stringchaos/" rel="alternate" type="text/html" title="String Chaos and Compiler Sleight-of-Hand" /><published>2023-07-17T00:00:00-07:00</published><updated>2023-07-17T00:00:00-07:00</updated><id>https://www.datadoodad.com/recurse%20center/stringchaos</id><content type="html" xml:base="https://www.datadoodad.com/recurse%20center/stringchaos/">&lt;p&gt;We need to talk about strings.&lt;/p&gt;

&lt;p&gt;I’m working on porting my Python assembler to C, and I’m getting to the harder stuff. As I wrote about &lt;a href=&quot;https://www.datadoodad.com/recurse%20center/Assembler-Part-2/&quot;&gt;in my last post&lt;/a&gt; I settled on parallel arrays to deal with lookups for various assembly commands. Now I need to start thinking about how to support symbols.&lt;/p&gt;

&lt;p&gt;Symbols are a slightly different beast because, unlike the finite list of assembly commands, symbols can be arbitrary in number. That means I need some sort of symbols table that can grow every time the assembler discovers a new &lt;strong&gt;label&lt;/strong&gt; (a line like &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;(JUMPHERE)&lt;/code&gt;) or a &lt;strong&gt;symbol&lt;/strong&gt; (a human-friendly name for an arbitrary space in RAM, like &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;@my_var&lt;/code&gt;). As with the lookup tables from last time, the obvious choice is a hashmap, which can do inserts and lookups in constant time, but I’m leaving hashmaps for another day. So the way I see it I have two options:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;&lt;strong&gt;parallel arrays&lt;/strong&gt; (again). These have the advantage of being easy to use but the disadvantage of being static in size, which means I either have to allocate enough space to safely handle what I determine to be a reasonable number of symbols (probably not a great idea, and also inefficient for small programs with few labels) or include functionality to reallocate larger arrays in the event that the symbols outgrow the array initially allocated for them (. . .nah)&lt;/li&gt;
  &lt;li&gt;a &lt;strong&gt;linked list&lt;/strong&gt;, with each node holding a key/value pair. Linked lists also have the advantage of being relatively easy to implement, and, like the parallel arrays solution, have linear time commplexity. However, unlike arrays, they’re dynamic, which means they need only take up as much space as the number of symbols we encounter – nothing more, nothing less.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Linked list it is! My decision made, I turned towards implementation.&lt;/p&gt;

&lt;div class=&quot;language-c highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;typedef&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Node&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;        &lt;span class=&quot;n&quot;&gt;key&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt;          &lt;span class=&quot;n&quot;&gt;value&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Node&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;next&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Node&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;Node&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;create_node&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;key&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;val&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;Node&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;node&lt;/span&gt;   &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;malloc&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;sizeof&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Node&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;));&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;node&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;key&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;key&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;node&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;value&lt;/span&gt;  &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;val&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;node&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;next&lt;/span&gt;   &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;node&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;

 &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;insert_node&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Node&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;head&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Node&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;new_node&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
 &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
     &lt;span class=&quot;k&quot;&gt;while&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;head&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;next&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
         &lt;span class=&quot;n&quot;&gt;head&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;head&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;next&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
 
     &lt;span class=&quot;n&quot;&gt;head&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;next&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;new_node&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
 &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;So far, so good, right? *brushes dirt off hands*&lt;/p&gt;

&lt;p&gt;Just need a little search function specific to this assembler: it’ll perform a linear search for a given &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;key&lt;/code&gt; string and return the matching &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;value&lt;/code&gt; if it finds it. If not, it’ll insert a new node into the list with that &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;key&lt;/code&gt; and assign it a unique integer value (the arbitrary memory address that that symbol key will be associated with).&lt;/p&gt;

&lt;div class=&quot;language-c highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;search&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Node&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;head&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;target&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;default_val&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Node&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cur&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;head&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cur&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cur&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cur&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;next&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cur&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;key&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;target&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cur&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;value&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;insert_node&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;head&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;create_node&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;target&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;default_val&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;));&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;default_val&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;++&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Let’s say, then, we have the following linked list &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;key/value&lt;/code&gt; pairs (an abridged version of how the assmbler’s symbols table will be initialized):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SP/0 -&amp;gt; LCL/1 -&amp;gt; ARG/2 -&amp;gt; THIS/3 -&amp;gt; THAT/4 -&amp;gt; 
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Now we can search for things:&lt;/p&gt;

&lt;div class=&quot;language-c highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;default_val&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;16&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;   &lt;span class=&quot;c1&quot;&gt;// default value for symbol additions&lt;/span&gt;
&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;res&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;                &lt;span class=&quot;c1&quot;&gt;// search result&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;res&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;search&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;symbols&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;ARG&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;default_val&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;     &lt;span class=&quot;c1&quot;&gt;// =&amp;gt; 2&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;res&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;search&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;symbols&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;my_var&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;default_val&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;  &lt;span class=&quot;c1&quot;&gt;// =&amp;gt; 16&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;res&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;search&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;symbols&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;x&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;default_val&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;       &lt;span class=&quot;c1&quot;&gt;// =&amp;gt; 17&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;res&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;search&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;symbols&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;THAT&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;default_val&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;    &lt;span class=&quot;c1&quot;&gt;// =&amp;gt; 4&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;And now the linked list looks like this:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SP/0 -&amp;gt; LCL/1 -&amp;gt; ARG/2 -&amp;gt; THIS/3 -&amp;gt; THAT/4 -&amp;gt; my_var/16 -&amp;gt; x/17 -&amp;gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;It works! But it probably shouldn’t. Or, rather, it works because the compiler is doing a little legerdemain that obscures the reasons why this solution is not excellent.&lt;/p&gt;

&lt;h1 id=&quot;string-chaos&quot;&gt;String chaos&lt;/h1&gt;

&lt;figure&gt;
&lt;img src=&quot;/assets/images/2023-07-17/chaos-reigns.gif&quot; alt=&quot;chaos reigns&quot; width=&quot;100%;&quot; /&gt;
&lt;figcaption align=&quot;center&quot;&gt;I'm pretty sure the compiler is the fox in this metaphor&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;The problem lies in my naive approach to comparing strings: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cur-&amp;gt;key == target&lt;/code&gt;. This is not doing what I think it’s doing, that is, comparing the string of characters to which each pointer is pointing. No, what it’s doing is comparing the &lt;em&gt;pointers&lt;/em&gt;, full stop. Which, come to think of it, seems like it shouldn’t even work in the first place, since how could the string “ARG” that I’m passing into &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;search()&lt;/code&gt; have the same address as the the node whose &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;key&lt;/code&gt; is “ARG”? And yet, if we printed the addresses of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cur-&amp;gt;key&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;target&lt;/code&gt; from within the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;search()&lt;/code&gt; function, it turns out that they’re the same!&lt;/p&gt;

&lt;p&gt;Here’s a more concise illustration of this unexpected behaviour:&lt;/p&gt;

&lt;div class=&quot;language-c highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;symbol_1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;my_var&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;symbol_2&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;my_var&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;Node&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;my_node&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;create_node&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;symbol_1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;symbol_1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;symbol_2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;printf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;symbol_1 == symbol_2&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\n&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;// this gets printed&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;else&lt;/span&gt;
    &lt;span class=&quot;nf&quot;&gt;printf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;symbol_1 != symbol_2&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\n&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;printf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;symbol_1 address: %p&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\n&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;symbol_1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;printf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;symbol_2 address: %p&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\n&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;symbol_2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;printf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;my_node-&amp;gt;key:     %p&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\n&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;my_node&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;key&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The above outputs:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;symbol_1 == symbol_2
symbol_1 address: 0x10d56feb8
symbol_2 address: 0x10d56feb8
my_node-&amp;gt;key:     0x10d56feb8
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;That’s counterintuitive! After all, we’re creating two separate char pointers, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;symbol_1&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;symbol_2&lt;/code&gt;, so intuitively &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;symbol_1 != symbol_2&lt;/code&gt; (since they ought to be pointing to different places in memorry), and yet they share the same address along with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;my_node-&amp;gt;key&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Turns out there are a few things going on here. The first is called string interning, where the compiler notices that we’re initializing two char pointers with the same string. Rather than putting that same string in memory twice, the compiler does some optimizations and just points them both to the same sequence of characters. That’s why inspecting the addresses of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;symbol_1&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;symbol_2&lt;/code&gt; reveals them to be pointing at precisely the same location in memory.&lt;/p&gt;

&lt;p&gt;The second issue has to do with building the Node struct. When we set &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;node-&amp;gt;key = key&lt;/code&gt;, we are effectively assigning one char pointer (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;node.key&lt;/code&gt;) the value of another char pointer (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;key&lt;/code&gt;), which means that both are pointing to the same spot in memory.&lt;/p&gt;

&lt;p&gt;Here is a revised version copies the contents of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;key&lt;/code&gt; string to newly-allocated space in memory and performs string comparisons not by comparing their pointers (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;str_1 == str_2&lt;/code&gt;) but by comparing their contents, character-for-character (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;strcmp(str_1, str_2)&lt;/code&gt;):&lt;/p&gt;

&lt;div class=&quot;language-c highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;typedef&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Node&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;        &lt;span class=&quot;n&quot;&gt;key&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt;          &lt;span class=&quot;n&quot;&gt;value&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Node&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;next&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Node&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;Node&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;create_node&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;key&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;val&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;Node&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;node&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;malloc&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;sizeof&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Node&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;));&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;node&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;key&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;malloc&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;strlen&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;key&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;strcpy&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;node&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;key&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;key&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;node&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;value&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;val&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;node&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;next&lt;/span&gt;  &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;node&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;search&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Node&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;head&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;target&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;default_val&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Node&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cur&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;head&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cur&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cur&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cur&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;next&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;strcmp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cur&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;key&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;target&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cur&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;value&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
        &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;insert_node&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;head&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;create_node&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;target&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;default_val&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;++&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;));&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;default_val&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;err&quot;&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;In the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;create_node()&lt;/code&gt; function, we first allocate space on the heap for a new Node struct (20 bytes in my estimation – two 8-byte pointers and a 4-byte int). Then we allocate enough space starting where &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;node-&amp;gt;key&lt;/code&gt; is pointing to accommodate the string that has to fit there (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;strlen(key) + 1&lt;/code&gt;). Finally we can stick the contents of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;key&lt;/code&gt; parameter there using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;strcpy()&lt;/code&gt;. Now, if &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;search()&lt;/code&gt; checks for equality using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cur-&amp;gt;key == target&lt;/code&gt;, this comparison will be false even if the contents of each char pointer are the same, since each is pointing to a separate place in memory, which is to say that the first and second pointers are not the same. Instead, we check for equality using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;strcmp&lt;/code&gt; (which returns 0 when the contents of each string are identical).&lt;/p&gt;

&lt;h1 id=&quot;for-my-final-trick---&quot;&gt;For my final trick . . .&lt;/h1&gt;

&lt;p&gt;Here’s my full linked list implementation, which I’ve fleshed out with wrapper struct &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LinkedList&lt;/code&gt; to hold on to references to the head and tail of the linked list.&lt;/p&gt;

&lt;div class=&quot;language-c highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;cp&quot;&gt;#include &amp;lt;stdio.h&amp;gt;
#include &amp;lt;stdlib.h&amp;gt;
#include &amp;lt;string.h&amp;gt;
&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;typedef&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Node&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;        &lt;span class=&quot;n&quot;&gt;key&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt;          &lt;span class=&quot;n&quot;&gt;val&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Node&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;next&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Node&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;typedef&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;LinkedList&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;Node&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;head&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;Node&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tail&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt;   &lt;span class=&quot;n&quot;&gt;len&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;LinkedList&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;Node&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;create_node&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;key&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;val&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;Node&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;node&lt;/span&gt;  &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;malloc&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;sizeof&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Node&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;));&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;node&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;key&lt;/span&gt;   &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;malloc&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;strlen&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;key&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;strcpy&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;node&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;key&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;key&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;node&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;val&lt;/span&gt;   &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;val&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;node&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;next&lt;/span&gt;  &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;node&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;LinkedList&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;create_linked_list&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;LinkedList&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;linkedlist&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;malloc&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;sizeof&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;LinkedList&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;));&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;linkedlist&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;head&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;linkedlist&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tail&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;linkedlist&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;len&lt;/span&gt;  &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;linkedlist&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;print_linked_list&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;LinkedList&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;linkedlist&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Node&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cur&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;linkedlist&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;head&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cur&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cur&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cur&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;next&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;printf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;%s/%d -&amp;gt; &quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cur&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;key&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cur&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;val&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;printf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\n&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;append&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;LinkedList&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;linkedlist&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Node&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;new_node&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;linkedlist&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;head&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;linkedlist&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;head&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;linkedlist&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tail&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;new_node&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;else&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;linkedlist&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tail&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;next&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;new_node&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;linkedlist&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tail&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;linkedlist&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tail&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;next&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;++&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;linkedlist&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;len&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;search&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;LinkedList&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;linkedlist&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;target_key&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;default_val&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Node&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cur&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;linkedlist&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;head&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cur&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cur&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cur&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;next&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;strcmp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cur&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;key&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;target_key&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cur&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;val&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;append&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;linkedlist&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;create_node&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;target_key&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;default_val&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;));&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;default_val&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;++&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;delete_node&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;LinkedList&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;linkedlist&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;target_key&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;cm&quot;&gt;/* doesn't yet handle deletion of head node */&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;Node&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tmp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Node&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cur&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;linkedlist&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;head&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cur&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;next&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cur&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cur&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;next&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;strcmp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cur&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;next&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;key&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;target_key&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;tmp&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cur&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;next&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;next&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;free&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cur&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;next&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;key&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;free&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cur&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;next&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;cur&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;next&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tmp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;           &lt;span class=&quot;c1&quot;&gt;// return 0 if found&lt;/span&gt;
        &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;                  &lt;span class=&quot;c1&quot;&gt;// return -1 if not found&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;

    &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;default_val&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;16&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;       &lt;span class=&quot;c1&quot;&gt;// starting default val for inserts&lt;/span&gt;

    &lt;span class=&quot;c1&quot;&gt;// create empty linked list&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;LinkedList&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;symbols&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;create_linked_list&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;();&lt;/span&gt;

    &lt;span class=&quot;c1&quot;&gt;// add something&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;append&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;symbols&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;create_node&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;my_key&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;100&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;));&lt;/span&gt;

    &lt;span class=&quot;c1&quot;&gt;// search for something that doesn't exist&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;search&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;symbols&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;your_key&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;default_val&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;//  =&amp;gt; returns 16&lt;/span&gt;

    &lt;span class=&quot;c1&quot;&gt;// print&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;print_linked_list&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;symbols&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;// =&amp;gt; my_key/100 -&amp;gt; your_key/16 -&amp;gt;&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;</content><author><name>Zach Rottman, PhD</name></author><category term="Recurse Center" /><category term="C" /><category term="Building an Assembler in C" /><summary type="html">We need to talk about strings. I’m working on porting my Python assembler to C, and I’m getting to the harder stuff. As I wrote about in my last post I settled on parallel arrays to deal with lookups for various assembly commands. Now I need to start thinking about how to support symbols. Symbols are a slightly different beast because, unlike the finite list of assembly commands, symbols can be arbitrary in number. That means I need some sort of symbols table that can grow every time the assembler discovers a new label (a line like (JUMPHERE)) or a symbol (a human-friendly name for an arbitrary space in RAM, like @my_var). As with the lookup tables from last time, the obvious choice is a hashmap, which can do inserts and lookups in constant time, but I’m leaving hashmaps for another day. So the way I see it I have two options: parallel arrays (again). These have the advantage of being easy to use but the disadvantage of being static in size, which means I either have to allocate enough space to safely handle what I determine to be a reasonable number of symbols (probably not a great idea, and also inefficient for small programs with few labels) or include functionality to reallocate larger arrays in the event that the symbols outgrow the array initially allocated for them (. . .nah) a linked list, with each node holding a key/value pair. Linked lists also have the advantage of being relatively easy to implement, and, like the parallel arrays solution, have linear time commplexity. However, unlike arrays, they’re dynamic, which means they need only take up as much space as the number of symbols we encounter – nothing more, nothing less. Linked list it is! My decision made, I turned towards implementation. typedef struct Node { char* key; int value; struct Node* next; } Node; Node* create_node(char* key, int val) { Node* node = malloc(sizeof(Node)); node-&amp;gt;key = key; node-&amp;gt;value = val; node-&amp;gt;next = NULL; return node; } void insert_node(Node* head, Node* new_node) { while (head-&amp;gt;next != NULL) head = head-&amp;gt;next; head-&amp;gt;next = new_node; } So far, so good, right? *brushes dirt off hands* Just need a little search function specific to this assembler: it’ll perform a linear search for a given key string and return the matching value if it finds it. If not, it’ll insert a new node into the list with that key and assign it a unique integer value (the arbitrary memory address that that symbol key will be associated with). int search(Node* head, char* target, int* default_val) { for (Node* cur = head; cur != NULL; cur = cur-&amp;gt;next) { if(cur-&amp;gt;key == target) return cur-&amp;gt;value; } insert_node(head, create_node(target, *default_val)); return (*default_val)++; } Let’s say, then, we have the following linked list key/value pairs (an abridged version of how the assmbler’s symbols table will be initialized): SP/0 -&amp;gt; LCL/1 -&amp;gt; ARG/2 -&amp;gt; THIS/3 -&amp;gt; THAT/4 -&amp;gt; Now we can search for things: int default_val = 16; // default value for symbol additions int res; // search result res = search(symbols, &quot;ARG&quot;, &amp;amp;default_val); // =&amp;gt; 2 res = search(symbols, &quot;my_var&quot;, &amp;amp;default_val); // =&amp;gt; 16 res = search(symbols, &quot;x&quot;, &amp;amp;default_val); // =&amp;gt; 17 res = search(symbols, &quot;THAT&quot;, &amp;amp;default_val); // =&amp;gt; 4 And now the linked list looks like this: SP/0 -&amp;gt; LCL/1 -&amp;gt; ARG/2 -&amp;gt; THIS/3 -&amp;gt; THAT/4 -&amp;gt; my_var/16 -&amp;gt; x/17 -&amp;gt; It works! But it probably shouldn’t. Or, rather, it works because the compiler is doing a little legerdemain that obscures the reasons why this solution is not excellent. String chaos I'm pretty sure the compiler is the fox in this metaphor The problem lies in my naive approach to comparing strings: cur-&amp;gt;key == target. This is not doing what I think it’s doing, that is, comparing the string of characters to which each pointer is pointing. No, what it’s doing is comparing the pointers, full stop. Which, come to think of it, seems like it shouldn’t even work in the first place, since how could the string “ARG” that I’m passing into search() have the same address as the the node whose key is “ARG”? And yet, if we printed the addresses of cur-&amp;gt;key and target from within the search() function, it turns out that they’re the same! Here’s a more concise illustration of this unexpected behaviour: char *symbol_1 = &quot;my_var&quot;; char *symbol_2 = &quot;my_var&quot;; Node* my_node = create_node(symbol_1, 10); if (symbol_1 == symbol_2) printf(&quot;symbol_1 == symbol_2\n&quot;); // this gets printed else printf(&quot;symbol_1 != symbol_2\n&quot;); printf(&quot;symbol_1 address: %p\n&quot;, symbol_1); printf(&quot;symbol_2 address: %p\n&quot;, symbol_2); printf(&quot;my_node-&amp;gt;key: %p\n&quot;, my_node-&amp;gt;key); The above outputs: symbol_1 == symbol_2 symbol_1 address: 0x10d56feb8 symbol_2 address: 0x10d56feb8 my_node-&amp;gt;key: 0x10d56feb8 That’s counterintuitive! After all, we’re creating two separate char pointers, symbol_1 and symbol_2, so intuitively symbol_1 != symbol_2 (since they ought to be pointing to different places in memorry), and yet they share the same address along with my_node-&amp;gt;key. Turns out there are a few things going on here. The first is called string interning, where the compiler notices that we’re initializing two char pointers with the same string. Rather than putting that same string in memory twice, the compiler does some optimizations and just points them both to the same sequence of characters. That’s why inspecting the addresses of symbol_1 and symbol_2 reveals them to be pointing at precisely the same location in memory. The second issue has to do with building the Node struct. When we set node-&amp;gt;key = key, we are effectively assigning one char pointer (node.key) the value of another char pointer (key), which means that both are pointing to the same spot in memory. Here is a revised version copies the contents of the key string to newly-allocated space in memory and performs string comparisons not by comparing their pointers (str_1 == str_2) but by comparing their contents, character-for-character (strcmp(str_1, str_2)): typedef struct Node { char* key; int value; struct Node* next; } Node; Node* create_node(char* key, int val) { Node* node = malloc(sizeof(Node)); node-&amp;gt;key = malloc(strlen(key) + 1); strcpy(node-&amp;gt;key, key); node-&amp;gt;value = val; node-&amp;gt;next = NULL; return node; } int search(Node* head, char* target, int* default_val) { for (Node* cur = head; cur != NULL; cur = cur-&amp;gt;next) { if (strcmp(cur-&amp;gt;key, target) == 0) return cur-&amp;gt;value; } } insert_node(head, create_node(target, (*default_val)++)); return *default_val; } In the create_node() function, we first allocate space on the heap for a new Node struct (20 bytes in my estimation – two 8-byte pointers and a 4-byte int). Then we allocate enough space starting where node-&amp;gt;key is pointing to accommodate the string that has to fit there (strlen(key) + 1). Finally we can stick the contents of the key parameter there using strcpy(). Now, if search() checks for equality using cur-&amp;gt;key == target, this comparison will be false even if the contents of each char pointer are the same, since each is pointing to a separate place in memory, which is to say that the first and second pointers are not the same. Instead, we check for equality using strcmp (which returns 0 when the contents of each string are identical). For my final trick . . . Here’s my full linked list implementation, which I’ve fleshed out with wrapper struct LinkedList to hold on to references to the head and tail of the linked list. #include &amp;lt;stdio.h&amp;gt; #include &amp;lt;stdlib.h&amp;gt; #include &amp;lt;string.h&amp;gt; typedef struct Node { char* key; int val; struct Node* next; } Node; typedef struct LinkedList { Node* head; Node* tail; int len; } LinkedList; Node* create_node(char* key, int val) { Node* node = malloc(sizeof(Node)); node-&amp;gt;key = malloc(strlen(key) + 1); strcpy(node-&amp;gt;key, key); node-&amp;gt;val = val; node-&amp;gt;next = NULL; return node; } LinkedList* create_linked_list(void) { LinkedList* linkedlist = malloc(sizeof(LinkedList)); linkedlist-&amp;gt;head = NULL; linkedlist-&amp;gt;tail = NULL; linkedlist-&amp;gt;len = 0; return linkedlist; } void print_linked_list(LinkedList* linkedlist) { for (Node* cur = linkedlist-&amp;gt;head; cur != NULL; cur = cur-&amp;gt;next) printf(&quot;%s/%d -&amp;gt; &quot;, cur-&amp;gt;key, cur-&amp;gt;val); printf(&quot;\n&quot;); } int append(LinkedList* linkedlist, Node* new_node) { if (linkedlist-&amp;gt;head == NULL) { linkedlist-&amp;gt;head = linkedlist-&amp;gt;tail = new_node; } else { linkedlist-&amp;gt;tail-&amp;gt;next = new_node; linkedlist-&amp;gt;tail = linkedlist-&amp;gt;tail-&amp;gt;next; } return ++linkedlist-&amp;gt;len; } int search(LinkedList* linkedlist, char* target_key, int* default_val) { for (Node* cur = linkedlist-&amp;gt;head; cur != NULL; cur = cur-&amp;gt;next) { if (strcmp(cur-&amp;gt;key, target_key) == 0) return cur-&amp;gt;val; } append(linkedlist, create_node(target_key, *default_val)); return (*default_val)++; } int delete_node(LinkedList* linkedlist, char* target_key) { /* doesn't yet handle deletion of head node */ Node* tmp; for (Node* cur = linkedlist-&amp;gt;head; cur-&amp;gt;next != NULL; cur = cur-&amp;gt;next) { if (strcmp(cur-&amp;gt;next-&amp;gt;key, target_key) == 0) { tmp = cur-&amp;gt;next-&amp;gt;next; free(cur-&amp;gt;next-&amp;gt;key); free(cur-&amp;gt;next); cur-&amp;gt;next = tmp; return 0; // return 0 if found } } return -1; // return -1 if not found } int main() { int default_val = 16; // starting default val for inserts // create empty linked list LinkedList* symbols = create_linked_list(); // add something append(symbols, create_node(&quot;my_key&quot;, 100)); // search for something that doesn't exist search(symbols, &quot;your_key&quot;, &amp;amp;default_val); // =&amp;gt; returns 16 // print print_linked_list(symbols); // =&amp;gt; my_key/100 -&amp;gt; your_key/16 -&amp;gt;</summary></entry><entry><title type="html">Porting my assembler to C, Part II</title><link href="https://www.datadoodad.com/recurse%20center/Assembler-Part-2/" rel="alternate" type="text/html" title="Porting my assembler to C, Part II" /><published>2023-07-14T00:00:00-07:00</published><updated>2023-07-14T00:00:00-07:00</updated><id>https://www.datadoodad.com/recurse%20center/Assembler-Part-2</id><content type="html" xml:base="https://www.datadoodad.com/recurse%20center/Assembler-Part-2/">&lt;style type=&quot;text/css&quot;&gt;
  table, tr, td {
    border: none;
    text-align: center;
  }

  .array td {
    border: 1px solid black;
    /*background-color: white;*/
    width: 30px;
    height: 30px;
  }

  .bitsPlace { color: gray; }

  .i { background: #d3f6db; }
  .a { background: #92d5e6; }
  .c { background: #772d8b; color: #ccc; }
  .d { background: #5a0b4d; color: #ccc; }
  .j { background: #a1ef8b; }

&lt;/style&gt;

&lt;p&gt;I’ve been keeping that good Impossible Stuff Day energy going today and made excellent progress on my assembler project. It can’t handle labels or symbols yet, however it is happily parsing A instructions with number literals as well as the whole gamut of C instructions. It reads from a file, but for the sake of convenience it currently just emits machine code translations to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;stdout&lt;/code&gt; like so:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt; 1:    @2 --&amp;gt; 0000000000000010
 2:   D=A --&amp;gt; 1110110000010000
 3:    @3 --&amp;gt; 0000000000000011
 4: D=D+A --&amp;gt; 1110000010010000
 5:    @0 --&amp;gt; 0000000000000000
 6:   M=D --&amp;gt; 1110001100001000
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Hey, cool!&lt;/p&gt;

&lt;p&gt;Here are some of my solutions to the many little challenges along the way.&lt;/p&gt;

&lt;h1 id=&quot;trimming-leading-spaces-from-each-line&quot;&gt;Trimming leading spaces from each line&lt;/h1&gt;

&lt;p&gt;This is a feature my previous iteration did not have, which meant that the assembler assembled very little of sample .asm files that indented with reckless abandon.&lt;/p&gt;

&lt;p&gt;Here’s what I did:&lt;/p&gt;

&lt;div class=&quot;language-c highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt; &lt;span class=&quot;c1&quot;&gt;// trim leading spaces from `line_in[]`&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;j&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
 &lt;span class=&quot;k&quot;&gt;while&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;line_in&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;sc&quot;&gt;' '&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;||&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;line_in&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;sc&quot;&gt;'\t'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
     &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;++&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
 &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
     &lt;span class=&quot;k&quot;&gt;while&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;line_in&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;sc&quot;&gt;'\0'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
         &lt;span class=&quot;n&quot;&gt;line_in&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;j&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;++&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;line_in&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;++&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;];&lt;/span&gt;
     &lt;span class=&quot;n&quot;&gt;line_in&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;j&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;sc&quot;&gt;'\0'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
 &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;I am using two pointers, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;i&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;j&lt;/code&gt;, which both start at zero. I march &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;i&lt;/code&gt; along past every damn space and tab it sees and then stop. Presuming &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;i&lt;/code&gt; has indeed done some marching, which is to say that it is no longer &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;0&lt;/code&gt;, I then march &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;i&lt;/code&gt; (from wherever it is) and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;j&lt;/code&gt; (which is at zero) along until &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;line[i] == '\0'&lt;/code&gt;, and as I go I reassign whatever non-space character is at &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;line_in[i]&lt;/code&gt; to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;line_in[j]&lt;/code&gt;, in effect shifting the non-space portion of the string left.&lt;/p&gt;

&lt;h1 id=&quot;tokenizing-c-instructions&quot;&gt;Tokenizing C Instructions&lt;/h1&gt;

&lt;p&gt;Identifying A instructions is easy enough – they start with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;@&lt;/code&gt;. And identifying L instructions (i.e., labels) is also easy enough – they start with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;(&lt;/code&gt;. Everything else is a C instruction. Tokenizing a C instruction turned out to be a bit tricky, though, since they can consist of two or three parts.&lt;/p&gt;

&lt;p&gt;The general format is &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;destination&amp;gt;=&amp;lt;computation&amp;gt;;&amp;lt;jump&amp;gt;&lt;/code&gt;, however a valid command need only have &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;destination&amp;gt;&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;jump&amp;gt;&lt;/code&gt; (or both, although this is rare in practice). That means I’m often faced with a command that looks like &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;M=M-1&lt;/code&gt; (just the destination and computation commands) or that look like &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;D;JNE&lt;/code&gt; (just the computation and jump commands).&lt;/p&gt;

&lt;p&gt;At first I played around with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;strtok()&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;strsep()&lt;/code&gt;. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;strtok()&lt;/code&gt; in particular seemed promising, since you can pass it multiple characters to split on (in this case &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;=;&lt;/code&gt;), and on successive iterations it returns the next chunk of the input string. The problem I encountered, though, was how to efficiently deal with the results. If all C Instructions had three tokens, that would have been one thing, but most of the time they end up consisting of just two tokens. How, then, to know if you’re getting a destination/computation pair or a computation/jump pair?&lt;/p&gt;

&lt;p&gt;Eventually I abandoned this (although I’m sure there’s a good way of doing what I wanted to do) in favor of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;strchr()&lt;/code&gt;, which takes in a string and a search character and returns the index of the first occurrence of that character if it’s there or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NULL&lt;/code&gt; if not. Here’s what the code looked like:&lt;/p&gt;

&lt;div class=&quot;language-c highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;tokenize&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;line&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;comp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dest&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;jump&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;equal_sign&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;semicolon&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

    &lt;span class=&quot;c1&quot;&gt;// if there's an equal sign, we must have dest. and comp. tokens at least&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;equal_sign&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;strchr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;line&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;sc&quot;&gt;'='&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)))&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;equal_sign&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;sc&quot;&gt;'\0'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;strcpy&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dest&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;line&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;             &lt;span class=&quot;c1&quot;&gt;// copy everything up to the equal sign to dest&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;strcpy&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;comp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;equal_sign&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;   &lt;span class=&quot;c1&quot;&gt;// copy everything else to comp&lt;/span&gt;
        &lt;span class=&quot;c1&quot;&gt;// if comp contains a semi-colon, we have to extract the jump command&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;semicolon&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;strchr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;comp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;sc&quot;&gt;';'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)))&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
            &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;semicolon&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;sc&quot;&gt;'\0'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;          &lt;span class=&quot;c1&quot;&gt;// terminate comp where the semicolon was&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;strcpy&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;jump&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;semicolon&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;&lt;span class=&quot;c1&quot;&gt;// copy everything else to jump&lt;/span&gt;
        &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;else&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;strcpy&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;jump&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;           &lt;span class=&quot;c1&quot;&gt;// else jump is NULL&lt;/span&gt;
        &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;// if there's no equal sign, we just have comp. and jump tokens&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;else&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;semicolon&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;strchr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;line&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;sc&quot;&gt;';'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;semicolon&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;sc&quot;&gt;'\0'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;strcpy&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dest&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;               &lt;span class=&quot;c1&quot;&gt;// dest. is NULL&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;strcpy&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;comp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;line&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;             &lt;span class=&quot;c1&quot;&gt;// copy everything up to semi-colon to comp&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;strcpy&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;jump&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;semicolon&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;    &lt;span class=&quot;c1&quot;&gt;// copy everything else to jump&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;I pass this function my &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;line_in&lt;/code&gt; string, as well as empty strings to hold each of the three tokens. It is this function’s job to put stuff in those.&lt;/p&gt;

&lt;p&gt;First, I check to see if the line has an equal sign. If so, then we assume it has destination and computation tokens at the very least, so we copy everything up to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;=&lt;/code&gt; to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;dest&lt;/code&gt; and everything after it to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;comp&lt;/code&gt;. Then, we check &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;comp&lt;/code&gt; for the presence of a semicolon. If there is one, we terminate &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;comp&lt;/code&gt; where the semicolon was and copy everything after to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;jump&lt;/code&gt;. If not, then &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;comp&lt;/code&gt; is complete and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;jump&lt;/code&gt; just needs to be set to NULL.&lt;/p&gt;

&lt;p&gt;If, on the other hand, the input line does not have an equal sign, then we just go ahead and assume it only consists of computation and jump instructions. Same process here, except we set &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;dest&lt;/code&gt; to NULL, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;comp&lt;/code&gt; to everything up to where the semicolon was, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;jump&lt;/code&gt; to everything after.&lt;/p&gt;

&lt;p&gt;When I was trying to figure this out, I made this little REPL version of the tokenizer, which looks like this:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/2023-07-14/tokenizer_demo.gif&quot; alt=&quot;Tokenizer Demo&quot; /&gt;&lt;/p&gt;

&lt;h1 id=&quot;lookups&quot;&gt;Lookups&lt;/h1&gt;

&lt;p&gt;Once we have our tokens, we need some way of tranlsating them to their binary equivalents. For example, a 3-bit jump command like &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;JLT&lt;/code&gt; is encoded as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;100&lt;/code&gt;, and a 3-bit destination command like &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;MD&lt;/code&gt; is encoded as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;011&lt;/code&gt;. There are 8 permutations of both jump and destination commands, which is fitting for 3-bit numbers (\(2^3=8\)).&lt;/p&gt;

&lt;p&gt;The computation commands consist of 7 bits, but mercifully there are not \(2^7\) permutations to deal with! Just 28 permutations in this version of assembly.&lt;/p&gt;

&lt;p&gt;As another reminder, here’s what the C instruction encoding looks like:&lt;/p&gt;

&lt;table&gt;
  &lt;tr class=&quot;array&quot;&gt;
    &lt;td class=&quot;i&quot;&gt;1&lt;/td&gt;
    &lt;td&gt;x&lt;/td&gt;
    &lt;td&gt;x&lt;/td&gt;
    &lt;td class=&quot;a&quot;&gt;a&lt;/td&gt;
    &lt;td class=&quot;c&quot;&gt;c&lt;/td&gt;
    &lt;td class=&quot;c&quot;&gt;c&lt;/td&gt;
    &lt;td class=&quot;c&quot;&gt;c&lt;/td&gt;
    &lt;td class=&quot;c&quot;&gt;c&lt;/td&gt;
    &lt;td class=&quot;c&quot;&gt;c&lt;/td&gt;
    &lt;td class=&quot;c&quot;&gt;c&lt;/td&gt;
    &lt;td class=&quot;d&quot;&gt;d&lt;/td&gt;
    &lt;td class=&quot;d&quot;&gt;d&lt;/td&gt;
    &lt;td class=&quot;d&quot;&gt;d&lt;/td&gt;
    &lt;td class=&quot;j&quot;&gt;j&lt;/td&gt;
    &lt;td class=&quot;j&quot;&gt;j&lt;/td&gt;
    &lt;td class=&quot;j&quot;&gt;j&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr class=&quot;bitsPlace&quot;&gt;
    &lt;td&gt;15&lt;/td&gt;
    &lt;td&gt;14&lt;/td&gt;
    &lt;td&gt;13&lt;/td&gt;
    &lt;td&gt;12&lt;/td&gt;
    &lt;td&gt;11&lt;/td&gt;
    &lt;td&gt;10&lt;/td&gt;
    &lt;td&gt;9&lt;/td&gt;
    &lt;td&gt;8&lt;/td&gt;
    &lt;td&gt;7&lt;/td&gt;
    &lt;td&gt;6&lt;/td&gt;
    &lt;td&gt;5&lt;/td&gt;
    &lt;td&gt;4&lt;/td&gt;
    &lt;td&gt;3&lt;/td&gt;
    &lt;td&gt;2&lt;/td&gt;
    &lt;td&gt;1&lt;/td&gt;
    &lt;td&gt;0&lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;span class=&quot;i&quot;&gt;C instruction bit&lt;/span&gt;: identifies the computation instruction as such&lt;/li&gt;
  &lt;li&gt;bits 13 and 14 aren’t used&lt;/li&gt;
  &lt;li&gt;&lt;span class=&quot;a&quot;&gt;&lt;b&gt;a&lt;/b&gt; bit&lt;/span&gt;: dictates whether the ALU should accept its &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;y&lt;/code&gt; input from the A Register or from somewhere in Memory. Effectively part of the computation instruction.&lt;/li&gt;
  &lt;li&gt;&lt;span class=&quot;c&quot;&gt;&lt;b&gt;c&lt;/b&gt; bits&lt;/span&gt;: comutation instruction&lt;/li&gt;
  &lt;li&gt;&lt;span class=&quot;d&quot;&gt;&lt;b&gt;d&lt;/b&gt; bits&lt;/span&gt;: destination instruction&lt;/li&gt;
  &lt;li&gt;&lt;span class=&quot;j&quot;&gt;&lt;b&gt;j&lt;/b&gt; bits&lt;/span&gt;: jump instruction&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When I initially wrote my assembler in Python, my approach was to use dictionaries as lookup tables to match each of the 8 destination instructions with their 3-bit binary counterparts, each of the 8 jump instructions with their 3-bit binary counterparts, and each of the 28 computation instructions with their 7-bit counterparts, sort of like this:&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;dest_table&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
   &lt;span class=&quot;s&quot;&gt;'M'&lt;/span&gt;  &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'001'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
   &lt;span class=&quot;s&quot;&gt;'D'&lt;/span&gt;  &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'010'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
   &lt;span class=&quot;s&quot;&gt;'MD'&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'011'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
   &lt;span class=&quot;s&quot;&gt;'A'&lt;/span&gt;  &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'100'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
   &lt;span class=&quot;s&quot;&gt;'AM'&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'101'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
   &lt;span class=&quot;s&quot;&gt;'AD'&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'110'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
   &lt;span class=&quot;s&quot;&gt;'AMD'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'111'&lt;/span&gt;
   &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;jump_table&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
   &lt;span class=&quot;s&quot;&gt;'JGT'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'001'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
   &lt;span class=&quot;s&quot;&gt;'JEQ'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'010'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
   &lt;span class=&quot;s&quot;&gt;'JGE'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'011'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
   &lt;span class=&quot;s&quot;&gt;'JLT'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'100'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
   &lt;span class=&quot;s&quot;&gt;'JNE'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'101'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
   &lt;span class=&quot;s&quot;&gt;'JLE'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'110'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
   &lt;span class=&quot;s&quot;&gt;'JMP'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'111'&lt;/span&gt;
   &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;comp_table&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
   &lt;span class=&quot;s&quot;&gt;'0'&lt;/span&gt;  &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'0101010'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
   &lt;span class=&quot;s&quot;&gt;'1'&lt;/span&gt;  &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'0111111'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
   &lt;span class=&quot;s&quot;&gt;'-1'&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'0111010'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
   &lt;span class=&quot;s&quot;&gt;'D'&lt;/span&gt;  &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'0001100'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
   &lt;span class=&quot;s&quot;&gt;'A'&lt;/span&gt;  &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'0110000'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
   &lt;span class=&quot;s&quot;&gt;'M'&lt;/span&gt;  &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'1110000'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
   &lt;span class=&quot;s&quot;&gt;'!D'&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'0001101'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
   &lt;span class=&quot;s&quot;&gt;'!A'&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'0110001'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
   &lt;span class=&quot;s&quot;&gt;'!M'&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'1110001'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
   &lt;span class=&quot;s&quot;&gt;'-D'&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'0001111'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
   &lt;span class=&quot;s&quot;&gt;'-A'&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'0110011'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
   &lt;span class=&quot;s&quot;&gt;'-M'&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'1110011'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
   &lt;span class=&quot;s&quot;&gt;'D+1'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'0011111'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
   &lt;span class=&quot;s&quot;&gt;'A+1'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'0110111'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
   &lt;span class=&quot;s&quot;&gt;'M+1'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'1110111'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
   &lt;span class=&quot;s&quot;&gt;'D-1'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'0001110'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
   &lt;span class=&quot;s&quot;&gt;'A-1'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'0110010'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
   &lt;span class=&quot;s&quot;&gt;'M-1'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'1110010'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
   &lt;span class=&quot;s&quot;&gt;'D+A'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'0000010'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
   &lt;span class=&quot;s&quot;&gt;'D+M'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'1000010'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
   &lt;span class=&quot;s&quot;&gt;'D-A'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'0010011'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
   &lt;span class=&quot;s&quot;&gt;'D-M'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'1010011'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
   &lt;span class=&quot;s&quot;&gt;'A-D'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'0000111'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
   &lt;span class=&quot;s&quot;&gt;'M-D'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'1000111'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
   &lt;span class=&quot;s&quot;&gt;'D&amp;amp;A'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'0000000'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
   &lt;span class=&quot;s&quot;&gt;'D&amp;amp;M'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'1000000'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
   &lt;span class=&quot;s&quot;&gt;'D|A'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'0010101'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
   &lt;span class=&quot;s&quot;&gt;'D|M'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'1010101'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
   &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Then it was just a matter of concatenating the strings that resulted.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;dest&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dest_table&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;get&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;d_token&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;'000'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;jump&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;jump_table&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;get&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;j_token&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;'000'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;comp&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;comp_table&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;get&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;c_token&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;'0000000'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;c_instruction&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;''&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;join&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;([&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'111'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;comp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dest&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;jump&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;C doesn’t have hashmaps built in, of course. I considered building the data structure, but then thought it was overkill for lookups from lists of just a handful of items. Instead, I decided to use the ol’ parallel arrays method (although truth be told this concept was brand new to me!). If dictionaries/hashmaps have key/value pairs, this other method uses one array to hold the keys and the other to hold the values. To find what you need, you do a linear-time search on the keys array, hold on to the index once you find it, and then index in to the values array in constant time to get your match. It’s a far cry from the awesome O(1) time-complexity that a hashmap boasts, but, you know what? It’s fine.&lt;/p&gt;

&lt;p&gt;Here, then, is an example of one of my parallel arrays:&lt;/p&gt;

&lt;div class=&quot;language-c highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt; &lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dest_keys&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;M&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;D&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;MD&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;A&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;AM&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;AD&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;AMD&quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;};&lt;/span&gt;
 &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt;   &lt;span class=&quot;n&quot;&gt;dest_vals&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;6&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;7&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;And here is the matching lookup function:&lt;/p&gt;

&lt;div class=&quot;language-c highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;parse_dest&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dest_command&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;len&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;sizeof&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dest_vals&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;sizeof&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dest_vals&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]);&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;len&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;++&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;!&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;strcmp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dest_command&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dest_keys&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]))&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dest_vals&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;];&lt;/span&gt;
        &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;(I’m realizing all of a sudden that, while I can’t consolidate the three pairs of parallel arrays into a single parralel array – since, for instance, some destination instructions and computation instructions are identical – I &lt;em&gt;can&lt;/em&gt; get away with a single parsing function, since each one does the same thing! Duh.)&lt;/p&gt;

&lt;p&gt;Anyway, the parsing function does the thing I described: it searches through the keys array for the destination token, and if it finds a match it returns the value at that same index of the values array. Else it returns 0.&lt;/p&gt;

&lt;h1 id=&quot;building-the-c-instruction-bitwise&quot;&gt;Building the C Instruction bitwise&lt;/h1&gt;

&lt;p&gt;You may have noticed that the values array above did not consist of 3-bit strings but of integers. Aha! This is my final breakthrough of the day (which, incidentally, was a direct beneficiary of my &lt;a href=&quot;https://www.datadoodad.com/recurse%20center/RC41/&quot;&gt;excursions into network programming&lt;/a&gt;). In Python, as I mentioned, I just concatenated each of my little bitstrings together to form my C-instruction. Sounds tedious in C, though. Fortunately I realized I can build up the 16-bit integer I need using bitwise operations.&lt;/p&gt;

&lt;p&gt;Here’s how I’d do it in Python:&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c1&quot;&gt;# tokens
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;d_token&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;   &lt;span class=&quot;c1&quot;&gt;# --&amp;gt; 101 for dest. token AM
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;c_token&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;114&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;# --&amp;gt; 1110010 for comp. token M-1
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;j_token&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;   &lt;span class=&quot;c1&quot;&gt;# --&amp;gt; 000 when jump is NULL
&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;# c-instruction under construction
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;c_inst&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;7&lt;/span&gt;   &lt;span class=&quot;c1&quot;&gt;# --&amp;gt; 111, which is what our three most signficant bits will need to be
&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;c_inst&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;lt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;7&lt;/span&gt;      &lt;span class=&quot;c1&quot;&gt;# make room for the 7-bit computation token
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;c_inst&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;c_token&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;# add in computation token
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;c_inst&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;lt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;      &lt;span class=&quot;c1&quot;&gt;# make room for the 3-bit dest token
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;c_inst&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;d_token&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;# add in the destination token
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;c_inst&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;lt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;      &lt;span class=&quot;c1&quot;&gt;# make room for the 3-bit jump token
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;c_inst&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;j_token&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;# add in the jump token
&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;c_inst&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;                   &lt;span class=&quot;c1&quot;&gt;# =&amp;gt; 64680
&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;{:016b}&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;format&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;c_inst&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;# =&amp;gt; 1111110010101000
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Voila! Now our 16-bit C instruction is represented by a single 16-bit unsigned integer. Yes, I’ll eventually need to convert this to a 16-bit bitstring, &lt;a href=&quot;https://www.datadoodad.com/recurse%20center/RC43/&quot;&gt;but I already built that conversion function&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Here’s my C version:&lt;/p&gt;

&lt;div class=&quot;language-c highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;build_C_COMMAND&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;line_in&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;line_out&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;kt&quot;&gt;uint16_t&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;7&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dest&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;comp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;jump&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dest_command&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;};&lt;/span&gt;
    &lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;comp_command&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;};&lt;/span&gt;
    &lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;jump_command&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;};&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;tokenize&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;line_in&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;comp_command&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dest_command&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;jump_command&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;dest&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;parse_dest&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dest_command&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;comp&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;parse_comp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;comp_command&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;jump&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;parse_jump&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;jump_command&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

    &lt;span class=&quot;c1&quot;&gt;// set output bits&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;7&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;lt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;7&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;         &lt;span class=&quot;c1&quot;&gt;// set most signifiant 3 bits to 111&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;comp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;lt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;     &lt;span class=&quot;c1&quot;&gt;// set 7 comp bits&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dest&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;lt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;     &lt;span class=&quot;c1&quot;&gt;// set 3 dest bits&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;jump&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;                &lt;span class=&quot;c1&quot;&gt;// set 3 jump bits&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;itob&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;line_out&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;16&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;    &lt;span class=&quot;c1&quot;&gt;// convert to binstring&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Next on the list is handling symbols and labels.&lt;/p&gt;</content><author><name>Zach Rottman, PhD</name></author><category term="Recurse Center" /><category term="Nand2Tetris" /><category term="C" /><category term="Building an Assembler in C" /><summary type="html">I’ve been keeping that good Impossible Stuff Day energy going today and made excellent progress on my assembler project. It can’t handle labels or symbols yet, however it is happily parsing A instructions with number literals as well as the whole gamut of C instructions. It reads from a file, but for the sake of convenience it currently just emits machine code translations to stdout like so: 1: @2 --&amp;gt; 0000000000000010 2: D=A --&amp;gt; 1110110000010000 3: @3 --&amp;gt; 0000000000000011 4: D=D+A --&amp;gt; 1110000010010000 5: @0 --&amp;gt; 0000000000000000 6: M=D --&amp;gt; 1110001100001000 Hey, cool! Here are some of my solutions to the many little challenges along the way. Trimming leading spaces from each line This is a feature my previous iteration did not have, which meant that the assembler assembled very little of sample .asm files that indented with reckless abandon. Here’s what I did: // trim leading spaces from `line_in[]` i = 0; j = 0; while (line_in[i] == ' ' || line_in[i] == '\t') i++; if (i &amp;gt; 0) { while (line_in[i] != '\0') line_in[j++] = line_in[i++]; line_in[j] = '\0'; } I am using two pointers, i and j, which both start at zero. I march i along past every damn space and tab it sees and then stop. Presuming i has indeed done some marching, which is to say that it is no longer 0, I then march i (from wherever it is) and j (which is at zero) along until line[i] == '\0', and as I go I reassign whatever non-space character is at line_in[i] to line_in[j], in effect shifting the non-space portion of the string left. Tokenizing C Instructions Identifying A instructions is easy enough – they start with @. And identifying L instructions (i.e., labels) is also easy enough – they start with (. Everything else is a C instruction. Tokenizing a C instruction turned out to be a bit tricky, though, since they can consist of two or three parts. The general format is &amp;lt;destination&amp;gt;=&amp;lt;computation&amp;gt;;&amp;lt;jump&amp;gt;, however a valid command need only have &amp;lt;destination&amp;gt; or &amp;lt;jump&amp;gt; (or both, although this is rare in practice). That means I’m often faced with a command that looks like M=M-1 (just the destination and computation commands) or that look like D;JNE (just the computation and jump commands). At first I played around with strtok() and strsep(). strtok() in particular seemed promising, since you can pass it multiple characters to split on (in this case =;), and on successive iterations it returns the next chunk of the input string. The problem I encountered, though, was how to efficiently deal with the results. If all C Instructions had three tokens, that would have been one thing, but most of the time they end up consisting of just two tokens. How, then, to know if you’re getting a destination/computation pair or a computation/jump pair? Eventually I abandoned this (although I’m sure there’s a good way of doing what I wanted to do) in favor of strchr(), which takes in a string and a search character and returns the index of the first occurrence of that character if it’s there or NULL if not. Here’s what the code looked like: void tokenize(char *line, char *comp, char *dest, char *jump) { char *equal_sign, *semicolon; // if there's an equal sign, we must have dest. and comp. tokens at least if ((equal_sign = strchr(line, '='))) { *equal_sign = '\0'; strcpy(dest, line); // copy everything up to the equal sign to dest strcpy(comp, equal_sign + 1); // copy everything else to comp // if comp contains a semi-colon, we have to extract the jump command if ((semicolon = strchr(comp, ';'))) { *semicolon = '\0'; // terminate comp where the semicolon was strcpy(jump, semicolon + 1);// copy everything else to jump } else { strcpy(jump, &quot;&quot;); // else jump is NULL } // if there's no equal sign, we just have comp. and jump tokens } else { semicolon = strchr(line, ';'); *semicolon = '\0'; strcpy(dest, &quot;&quot;); // dest. is NULL strcpy(comp, line); // copy everything up to semi-colon to comp strcpy(jump, semicolon + 1); // copy everything else to jump } } I pass this function my line_in string, as well as empty strings to hold each of the three tokens. It is this function’s job to put stuff in those. First, I check to see if the line has an equal sign. If so, then we assume it has destination and computation tokens at the very least, so we copy everything up to the = to dest and everything after it to comp. Then, we check comp for the presence of a semicolon. If there is one, we terminate comp where the semicolon was and copy everything after to jump. If not, then comp is complete and jump just needs to be set to NULL. If, on the other hand, the input line does not have an equal sign, then we just go ahead and assume it only consists of computation and jump instructions. Same process here, except we set dest to NULL, comp to everything up to where the semicolon was, and jump to everything after. When I was trying to figure this out, I made this little REPL version of the tokenizer, which looks like this: Lookups Once we have our tokens, we need some way of tranlsating them to their binary equivalents. For example, a 3-bit jump command like JLT is encoded as 100, and a 3-bit destination command like MD is encoded as 011. There are 8 permutations of both jump and destination commands, which is fitting for 3-bit numbers (\(2^3=8\)). The computation commands consist of 7 bits, but mercifully there are not \(2^7\) permutations to deal with! Just 28 permutations in this version of assembly. As another reminder, here’s what the C instruction encoding looks like: 1 x x a c c c c c c d d d j j j 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 C instruction bit: identifies the computation instruction as such bits 13 and 14 aren’t used a bit: dictates whether the ALU should accept its y input from the A Register or from somewhere in Memory. Effectively part of the computation instruction. c bits: comutation instruction d bits: destination instruction j bits: jump instruction When I initially wrote my assembler in Python, my approach was to use dictionaries as lookup tables to match each of the 8 destination instructions with their 3-bit binary counterparts, each of the 8 jump instructions with their 3-bit binary counterparts, and each of the 28 computation instructions with their 7-bit counterparts, sort of like this: dest_table = { 'M' :'001', 'D' :'010', 'MD' :'011', 'A' :'100', 'AM' :'101', 'AD' :'110', 'AMD':'111' } jump_table = { 'JGT':'001', 'JEQ':'010', 'JGE':'011', 'JLT':'100', 'JNE':'101', 'JLE':'110', 'JMP':'111' } comp_table = { '0' :'0101010', '1' :'0111111', '-1' :'0111010', 'D' :'0001100', 'A' :'0110000', 'M' :'1110000', '!D' :'0001101', '!A' :'0110001', '!M' :'1110001', '-D' :'0001111', '-A' :'0110011', '-M' :'1110011', 'D+1':'0011111', 'A+1':'0110111', 'M+1':'1110111', 'D-1':'0001110', 'A-1':'0110010', 'M-1':'1110010', 'D+A':'0000010', 'D+M':'1000010', 'D-A':'0010011', 'D-M':'1010011', 'A-D':'0000111', 'M-D':'1000111', 'D&amp;amp;A':'0000000', 'D&amp;amp;M':'1000000', 'D|A':'0010101', 'D|M':'1010101', } Then it was just a matter of concatenating the strings that resulted. dest = dest_table.get(d_token, '000') jump = jump_table.get(j_token, '000') comp = comp_table.get(c_token, '0000000') c_instruction = ''.join(['111', comp, dest, jump]) C doesn’t have hashmaps built in, of course. I considered building the data structure, but then thought it was overkill for lookups from lists of just a handful of items. Instead, I decided to use the ol’ parallel arrays method (although truth be told this concept was brand new to me!). If dictionaries/hashmaps have key/value pairs, this other method uses one array to hold the keys and the other to hold the values. To find what you need, you do a linear-time search on the keys array, hold on to the index once you find it, and then index in to the values array in constant time to get your match. It’s a far cry from the awesome O(1) time-complexity that a hashmap boasts, but, you know what? It’s fine. Here, then, is an example of one of my parallel arrays: char *dest_keys[] = { &quot;M&quot;, &quot;D&quot;, &quot;MD&quot;, &quot;A&quot;, &quot;AM&quot;, &quot;AD&quot;, &quot;AMD&quot; }; int dest_vals[] = { 1, 2, 3, 4, 5, 6, 7 }; And here is the matching lookup function: int parse_dest(char *dest_command) { int len = sizeof(dest_vals)/sizeof(dest_vals[0]); for (int i=0; i&amp;lt;len; ++i) { if (!strcmp(dest_command, dest_keys[i])) { return dest_vals[i]; } } return 0; } (I’m realizing all of a sudden that, while I can’t consolidate the three pairs of parallel arrays into a single parralel array – since, for instance, some destination instructions and computation instructions are identical – I can get away with a single parsing function, since each one does the same thing! Duh.) Anyway, the parsing function does the thing I described: it searches through the keys array for the destination token, and if it finds a match it returns the value at that same index of the values array. Else it returns 0. Building the C Instruction bitwise You may have noticed that the values array above did not consist of 3-bit strings but of integers. Aha! This is my final breakthrough of the day (which, incidentally, was a direct beneficiary of my excursions into network programming). In Python, as I mentioned, I just concatenated each of my little bitstrings together to form my C-instruction. Sounds tedious in C, though. Fortunately I realized I can build up the 16-bit integer I need using bitwise operations. Here’s how I’d do it in Python: # tokens d_token = 5 # --&amp;gt; 101 for dest. token AM c_token = 114 # --&amp;gt; 1110010 for comp. token M-1 j_token = 0 # --&amp;gt; 000 when jump is NULL # c-instruction under construction c_inst = 7 # --&amp;gt; 111, which is what our three most signficant bits will need to be c_inst &amp;lt;&amp;lt;= 7 # make room for the 7-bit computation token c_inst |= c_token # add in computation token c_inst &amp;lt;&amp;lt;= 3 # make room for the 3-bit dest token c_inst |= d_token # add in the destination token c_inst &amp;lt;&amp;lt;= 3 # make room for the 3-bit jump token c_inst |= j_token # add in the jump token print(c_inst) # =&amp;gt; 64680 print(&quot;{:016b}&quot;.format(c_inst)) # =&amp;gt; 1111110010101000 Voila! Now our 16-bit C instruction is represented by a single 16-bit unsigned integer. Yes, I’ll eventually need to convert this to a 16-bit bitstring, but I already built that conversion function. Here’s my C version: void build_C_COMMAND(char *line_in, char *line_out) { uint16_t out = 7; int dest, comp, jump; char dest_command[4] = {0}; char comp_command[4] = {0}; char jump_command[4] = {0}; tokenize(line_in, comp_command, dest_command, jump_command); dest = parse_dest(dest_command); comp = parse_comp(comp_command); jump = parse_jump(jump_command); // set output bits out = 7; out &amp;lt;&amp;lt;= 7; // set most signifiant 3 bits to 111 out |= comp; out &amp;lt;&amp;lt;= 3; // set 7 comp bits out |= dest; out &amp;lt;&amp;lt;= 3; // set 3 dest bits out |= jump; // set 3 jump bits itob(out, line_out, 16); // convert to binstring } Next on the list is handling symbols and labels.</summary></entry><entry><title type="html">RC43. Impossible Stuff Day: Porting my assembler to C, Part I</title><link href="https://www.datadoodad.com/recurse%20center/RC43/" rel="alternate" type="text/html" title="RC43. Impossible Stuff Day: Porting my assembler to C, Part I" /><published>2023-07-13T00:00:00-07:00</published><updated>2023-07-13T00:00:00-07:00</updated><id>https://www.datadoodad.com/recurse%20center/RC43</id><content type="html" xml:base="https://www.datadoodad.com/recurse%20center/RC43/">&lt;style type=&quot;text/css&quot;&gt;
  table, tr, td {
    border: none;
    text-align: center;
  }

  .array td {
    border: 1px solid black;
    /*background-color: white;*/
    width: 30px;
    height: 30px;
  }

  .bitsPlace { color: gray; }

  .i { background: #d3f6db; }
  .a { background: #92d5e6; }
  .c { background: #772d8b; color: #ccc; }
  .d { background: #5a0b4d; color: #ccc; }
  .j { background: #a1ef8b; }

&lt;/style&gt;

&lt;p&gt;A few weeks back I built an assembler in Python as a project for Nand2Tetris. When I was first plotting my approach and feeling ambitious, I thought, hey, maybe I should write this in C. But every time I thought about how to actually do that, it felt, well, impossible, so in the end I just used Python to do my dirty work.&lt;/p&gt;

&lt;p&gt;Today, however, was Impossible Stuff Day, which meant it was time for me to dive in and see how far I could get.&lt;/p&gt;

&lt;h1 id=&quot;assembler-preambler&quot;&gt;Assembler Preamble(r)&lt;/h1&gt;

&lt;p&gt;The assembler’s job is to take an assembly program like this one from n2t&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;@R0
D=M
@R1
D=D-M
@OUTPUT_FIRST
D;JGT
@R1
D=M
@OUTPUT_D
0;JMP
(OUTPUT_FIRST)
@R0
D=M
(OUTPUT_D)
@R2
M=D
(INFINITE_LOOP)
@INFINITE_LOOP
0;JMP
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;and translate it into machine language like this&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;0000000000000000
1111110000010000
0000000000000001
1111010011010000
0000000000001010
1110001100000001
0000000000000001
1111110000010000
0000000000001100
1110101010000111
0000000000000000
1111110000010000
0000000000000010
1110001100001000
0000000000001110
1110101010000111
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;In the case of nand2tetris, we’re working with a 16-bit machine, which means each instruction is 16-bits long.&lt;/p&gt;

&lt;p&gt;I wrote about assembly basics &lt;a href=&quot;https://www.datadoodad.com/recurse%20center/RC26/&quot;&gt;a few weeks ago&lt;/a&gt; when building the CPU to this computer, which is the thing that actually does stuff with the 16-bit codes above. It’s worth revisiting some of the key ideas here.&lt;/p&gt;

&lt;p&gt;In the nand2tetris dialect of assembly, there are two types of commands:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;address commands (they start with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;@&lt;/code&gt; and load a value into the CPU’s A Register)&lt;/li&gt;
  &lt;li&gt;compute commands (the other stuff, which tells the CPU which registers to read from, what to do with those values, and where to store the result)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;(There are also labels – the things in parentheses – which aren’t really commands but label line-numbers so that the program can jump around later on.)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Address commands&lt;/strong&gt; are the simplest, since converting them to 16-bit strings is a matter of converting the number to a 15-bit integer and sticking a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;0&lt;/code&gt; on the front end, which is what identifies the instruction as an A Instruction.&lt;/p&gt;

&lt;table&gt;
  &lt;tr class=&quot;array&quot;&gt;
    &lt;td class=&quot;i&quot;&gt;0&lt;/td&gt;
    &lt;td&gt;v&lt;/td&gt;
    &lt;td&gt;v&lt;/td&gt;
    &lt;td&gt;v&lt;/td&gt;
    &lt;td&gt;v&lt;/td&gt;
    &lt;td&gt;v&lt;/td&gt;
    &lt;td&gt;v&lt;/td&gt;
    &lt;td&gt;v&lt;/td&gt;
    &lt;td&gt;v&lt;/td&gt;
    &lt;td&gt;v&lt;/td&gt;
    &lt;td&gt;v&lt;/td&gt;
    &lt;td&gt;v&lt;/td&gt;
    &lt;td&gt;v&lt;/td&gt;
    &lt;td&gt;v&lt;/td&gt;
    &lt;td&gt;v&lt;/td&gt;
    &lt;td&gt;v&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr class=&quot;bitsPlace&quot;&gt;
    &lt;td&gt;15&lt;/td&gt;
    &lt;td&gt;14&lt;/td&gt;
    &lt;td&gt;13&lt;/td&gt;
    &lt;td&gt;12&lt;/td&gt;
    &lt;td&gt;11&lt;/td&gt;
    &lt;td&gt;10&lt;/td&gt;
    &lt;td&gt;9&lt;/td&gt;
    &lt;td&gt;8&lt;/td&gt;
    &lt;td&gt;7&lt;/td&gt;
    &lt;td&gt;6&lt;/td&gt;
    &lt;td&gt;5&lt;/td&gt;
    &lt;td&gt;4&lt;/td&gt;
    &lt;td&gt;3&lt;/td&gt;
    &lt;td&gt;2&lt;/td&gt;
    &lt;td&gt;1&lt;/td&gt;
    &lt;td&gt;0&lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;span class=&quot;i&quot;&gt;the most significant bit&lt;/span&gt; (set to 0) identifies the instruction as an A instruction&lt;/li&gt;
  &lt;li&gt;the least significant 15 bits represent a 15-bit unsigned integer&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;C Instructions&lt;/strong&gt; are trickier, since they have three component parts that are encoded into various chunks of the 16-bit instruction string. For sample C-instructions like “AD=M+1” or “D;JEQ” or “D=1” or “D=M-1;JLT”:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;everything to the left of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;=&lt;/code&gt; is the destination part of the command – i.e., where the ALU’s output needs to go. (&lt;strong&gt;D&lt;/strong&gt; is the destination in the command &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;D=M+1&lt;/code&gt;.) This piece of information gets encoded as a 3-bit sequence and spans bits 3-5 of the resulting instruction string.&lt;/li&gt;
  &lt;li&gt;everything to the right of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;=&lt;/code&gt; and to the left of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;;&lt;/code&gt; is the computation command – i.e., what computation the ALU performs on its inputs. (&lt;strong&gt;M+1&lt;/strong&gt; is the computation in the command &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;D=M+1&lt;/code&gt;.) There are a lot of options here, so this piece of information gets encoded as a 7-bit sequence and spans bits 6-12 of the resulting instruction string.&lt;/li&gt;
  &lt;li&gt;everything right of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;;&lt;/code&gt; is the jump command, which, if present, loads the value in the A Register (in this case representing a program line number) into the Program Counter, thus “jumping” to another line in the program. (&lt;strong&gt;JGT&lt;/strong&gt; specifies the jump in the command &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;D;JGT&lt;/code&gt;.) This piece of information gets encoded as a 3-bit sequence and spans bits 0-2 in the resulting instruction string.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s complicated, but the whole thing looks like this:&lt;/p&gt;

&lt;table&gt;
  &lt;tr class=&quot;array&quot;&gt;
    &lt;td class=&quot;i&quot;&gt;1&lt;/td&gt;
    &lt;td&gt;x&lt;/td&gt;
    &lt;td&gt;x&lt;/td&gt;
    &lt;td class=&quot;a&quot;&gt;a&lt;/td&gt;
    &lt;td class=&quot;c&quot;&gt;c&lt;/td&gt;
    &lt;td class=&quot;c&quot;&gt;c&lt;/td&gt;
    &lt;td class=&quot;c&quot;&gt;c&lt;/td&gt;
    &lt;td class=&quot;c&quot;&gt;c&lt;/td&gt;
    &lt;td class=&quot;c&quot;&gt;c&lt;/td&gt;
    &lt;td class=&quot;c&quot;&gt;c&lt;/td&gt;
    &lt;td class=&quot;d&quot;&gt;d&lt;/td&gt;
    &lt;td class=&quot;d&quot;&gt;d&lt;/td&gt;
    &lt;td class=&quot;d&quot;&gt;d&lt;/td&gt;
    &lt;td class=&quot;j&quot;&gt;j&lt;/td&gt;
    &lt;td class=&quot;j&quot;&gt;j&lt;/td&gt;
    &lt;td class=&quot;j&quot;&gt;j&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr class=&quot;bitsPlace&quot;&gt;
    &lt;td&gt;15&lt;/td&gt;
    &lt;td&gt;14&lt;/td&gt;
    &lt;td&gt;13&lt;/td&gt;
    &lt;td&gt;12&lt;/td&gt;
    &lt;td&gt;11&lt;/td&gt;
    &lt;td&gt;10&lt;/td&gt;
    &lt;td&gt;9&lt;/td&gt;
    &lt;td&gt;8&lt;/td&gt;
    &lt;td&gt;7&lt;/td&gt;
    &lt;td&gt;6&lt;/td&gt;
    &lt;td&gt;5&lt;/td&gt;
    &lt;td&gt;4&lt;/td&gt;
    &lt;td&gt;3&lt;/td&gt;
    &lt;td&gt;2&lt;/td&gt;
    &lt;td&gt;1&lt;/td&gt;
    &lt;td&gt;0&lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;span class=&quot;i&quot;&gt;the most significant bit&lt;/span&gt; (set to 1) identifies the computation instruction as such&lt;/li&gt;
  &lt;li&gt;bits 13 and 14 aren’t used&lt;/li&gt;
  &lt;li&gt;&lt;span class=&quot;a&quot;&gt;the &lt;b&gt;a&lt;/b&gt; bit&lt;/span&gt; (bit 12) dictates whether the ALU should accept its &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;y&lt;/code&gt; input from the A Register or from somewhere in Memory. This is effectively part of the computation instruction, which is why I grouped it that way in describing it above.&lt;/li&gt;
  &lt;li&gt;&lt;span class=&quot;c&quot;&gt;the 6 &lt;b&gt;c&lt;/b&gt; bits&lt;/span&gt; (bits 6-11) specify the ALU function and are fed into its 6 function inputs &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;zx&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;nx&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;zy&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ny&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;f&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;no&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;span class=&quot;d&quot;&gt;the 3 &lt;b&gt;d&lt;/b&gt; bits&lt;/span&gt; (bits 3-5) specify the destination for the ALU’s output, either &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;M&lt;/code&gt; (memory), &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;A&lt;/code&gt; (A register), or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;D&lt;/code&gt; (D register), or some combination of the three.&lt;/li&gt;
  &lt;li&gt;&lt;span class=&quot;j&quot;&gt;the 3 &lt;b&gt;j&lt;/b&gt; bits&lt;/span&gt; (bits 0-2) specify any jump command based on whether the ALU’s output is &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt; 0&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;= 0&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;== 0&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;gt;= 0&lt;/code&gt;, or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;gt; 0&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1 id=&quot;assembler-in-c&quot;&gt;Assembler in C&lt;/h1&gt;

&lt;p&gt;Right. So how far did I get porting my assembler to C today? Well, not really that far, I suppose. Basically, I handled the A-Instructions (and only in the case where A-Instructions consist of Number literals, at that) – i.e., taking a command like &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;@53&lt;/code&gt; and emitting &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;0000000000110101&lt;/code&gt; or a command like &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;@9&lt;/code&gt; and emitting &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;0000000000001001&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Here’s what my code in action looks like at the moment:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt; 1:    @2 --&amp;gt; 0000000000000010
 2:   D=A --&amp;gt; C Command
 3:    @3 --&amp;gt; 0000000000000011
 4: D=D+A --&amp;gt; C Command
 5:    @0 --&amp;gt; 0000000000000000
 6:   M=D --&amp;gt; C Command
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Of course the C Commands aren’t yet implemented, so just placeholders for now. But the A Instructions are doing their thing and doing it right.&lt;/p&gt;

&lt;p&gt;It doesn’t seem like much, and maybe it isn’t. But getting here meant bumping into many, MANY walls, which meant making HUGE leaps.&lt;/p&gt;

&lt;p&gt;Here are some highlights:&lt;/p&gt;

&lt;h3 id=&quot;reading-in-a-file&quot;&gt;Reading in a file&lt;/h3&gt;

&lt;p&gt;Here’s how.&lt;/p&gt;

&lt;div class=&quot;language-c highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;cp&quot;&gt;#include &amp;lt;stdio.h&amp;gt;
#include &amp;lt;stdlib.h&amp;gt;
#include &amp;lt;string.h&amp;gt;
&lt;/span&gt;
&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;argc&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;**&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;argv&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;// exit if argc is not as expected&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;argc&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;printf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;usage: assembler &amp;lt;path&amp;gt;&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\n&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;EXIT_FAILURE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;// macro defined in stdlib.h&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;

    &lt;span class=&quot;kt&quot;&gt;FILE&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;f&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;line_in&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;256&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;];&lt;/span&gt;
    &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;linecount&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

    &lt;span class=&quot;c1&quot;&gt;// open file passed as CL arg&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;f&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;fopen&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;argv&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;r&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

    &lt;span class=&quot;c1&quot;&gt;// check for errors opening file&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;f&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;printf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;Error opening file %s&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\n&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;argv&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]);&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;EXIT_FAILURE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;

    &lt;span class=&quot;c1&quot;&gt;// loop through input file and parse&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;while&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;fgets&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;line_in&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;sizeof&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;line_in&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;f&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;

        &lt;span class=&quot;c1&quot;&gt;// remove trailing newline or carriage return&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;line_in&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;strcspn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;line_in&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\n\r&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;sc&quot;&gt;'\0'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

        &lt;span class=&quot;c1&quot;&gt;// strip comments&lt;/span&gt;
        &lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;comment_pos&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;strstr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;line_in&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;//&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;comment_pos&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
            &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;comment_pos&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;sc&quot;&gt;'\0'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

        &lt;span class=&quot;c1&quot;&gt;// do stuff if line not blank&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;line_in&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;sc&quot;&gt;'\0'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;printf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;%2d: %5s --&amp;gt; &quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;++&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;linecount&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;line_in&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;// print line_in&lt;/span&gt;
        &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;

    &lt;span class=&quot;c1&quot;&gt;// close file&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;fclose&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;f&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;err&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;All this does is read in the file passed along as a command-line argument and print it back out to stdout (along with a line number), stripping comments and removing blank lines as it does so.&lt;/p&gt;

&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;fgets()&lt;/code&gt; is the bread-and-butter of this version of a file reader, although I’ve been advised to consider &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;getline()&lt;/code&gt;. To remove trailing newlines and carriage returns, I’m using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;strcspn()&lt;/code&gt; to reset instances of those with a string terminator. To strip comments, I’m using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;strstr()&lt;/code&gt; in a somewhat similar way to return a pointer to the location where a comment is found and reassigning that index a string terminator. As long as there’s something in the resulting line, (i.e., the first character isn’t &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;\0&lt;/code&gt;), then I print it!&lt;/p&gt;

&lt;h3 id=&quot;converting-integers-to-bitstrings&quot;&gt;Converting integers to bitstrings&lt;/h3&gt;

&lt;p&gt;Then there’s the issue of converting integers to bitstrings. Here’s how.&lt;/p&gt;

&lt;div class=&quot;language-c highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;itob&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;uint16_t&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;num&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;len&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;len&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;++&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;len&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;num&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;?&lt;/span&gt; &lt;span class=&quot;sc&quot;&gt;'1'&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;sc&quot;&gt;'0'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;num&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;build_A_COMMAND&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;line_in&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;line_out&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;kt&quot;&gt;uint16_t&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;atoi&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;line_in&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;  &lt;span class=&quot;c1&quot;&gt;// convert line[1:] to int&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt; &lt;span class=&quot;mh&quot;&gt;0x7fff&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;         &lt;span class=&quot;c1&quot;&gt;// set MSB to 0 if i&amp;gt;32767&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;itob&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;line_out&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;16&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;  &lt;span class=&quot;c1&quot;&gt;// convert i to 15+1-bit string and save to output&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;First, the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;build_A_COMMAND()&lt;/code&gt; function, which I call only after determining a line is an A Instruction (i.e., it starts with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;@&lt;/code&gt;). This function takes two strings as arguments – the assembly line it’s parsing and the binary line it’s building. It converts the input line (after the initial &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;@&lt;/code&gt; character) to an 16-bit integer, then it resets the most significant bit to 0 (a hack-y way of ensuring that the most significant bit is 0, but some better error-handling is needed here to validate that the input integer is max 15-bits to begin with). Lastly it calls &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;itob()&lt;/code&gt;, which is my fun little integer-to-bitstring converter.&lt;/p&gt;

&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;itob()&lt;/code&gt; checks to see if the least significant bit of the 16-bit integer above is 1, and then sets output string &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;b[15]&lt;/code&gt; to 1 or 0 accordingly. Then it shifts the 16-bit integer right one bit and continues along for the full 16 bits, writing the bitstring from least [15] to most [0] significant bit as it goes.&lt;/p&gt;

&lt;h1 id=&quot;next-steps&quot;&gt;Next steps&lt;/h1&gt;
&lt;p&gt;I’ve already scaffolded out the various functions for building C Instructions. That means the next big step is sorting out some sort of lookup table to connect a command component (e.g., a jump command like &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;JNE&lt;/code&gt;) to its binary equivalen (in this case &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;101&lt;/code&gt;). In Python that was a simple matter of dictionaries. Here, I’m considering implementing a simple hash table to do that.&lt;/p&gt;

&lt;p&gt;After that, it’s a matter of implementing functionality to handle symbols and labels, which requires another hash table.&lt;/p&gt;</content><author><name>Zach Rottman, PhD</name></author><category term="Recurse Center" /><category term="Nand2Tetris" /><category term="C" /><category term="Building an Assembler in C" /><summary type="html">A few weeks back I built an assembler in Python as a project for Nand2Tetris. When I was first plotting my approach and feeling ambitious, I thought, hey, maybe I should write this in C. But every time I thought about how to actually do that, it felt, well, impossible, so in the end I just used Python to do my dirty work. Today, however, was Impossible Stuff Day, which meant it was time for me to dive in and see how far I could get. Assembler Preamble(r) The assembler’s job is to take an assembly program like this one from n2t @R0 D=M @R1 D=D-M @OUTPUT_FIRST D;JGT @R1 D=M @OUTPUT_D 0;JMP (OUTPUT_FIRST) @R0 D=M (OUTPUT_D) @R2 M=D (INFINITE_LOOP) @INFINITE_LOOP 0;JMP and translate it into machine language like this 0000000000000000 1111110000010000 0000000000000001 1111010011010000 0000000000001010 1110001100000001 0000000000000001 1111110000010000 0000000000001100 1110101010000111 0000000000000000 1111110000010000 0000000000000010 1110001100001000 0000000000001110 1110101010000111 In the case of nand2tetris, we’re working with a 16-bit machine, which means each instruction is 16-bits long. I wrote about assembly basics a few weeks ago when building the CPU to this computer, which is the thing that actually does stuff with the 16-bit codes above. It’s worth revisiting some of the key ideas here. In the nand2tetris dialect of assembly, there are two types of commands: address commands (they start with @ and load a value into the CPU’s A Register) compute commands (the other stuff, which tells the CPU which registers to read from, what to do with those values, and where to store the result) (There are also labels – the things in parentheses – which aren’t really commands but label line-numbers so that the program can jump around later on.) Address commands are the simplest, since converting them to 16-bit strings is a matter of converting the number to a 15-bit integer and sticking a 0 on the front end, which is what identifies the instruction as an A Instruction. 0 v v v v v v v v v v v v v v v 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 the most significant bit (set to 0) identifies the instruction as an A instruction the least significant 15 bits represent a 15-bit unsigned integer C Instructions are trickier, since they have three component parts that are encoded into various chunks of the 16-bit instruction string. For sample C-instructions like “AD=M+1” or “D;JEQ” or “D=1” or “D=M-1;JLT”: everything to the left of the = is the destination part of the command – i.e., where the ALU’s output needs to go. (D is the destination in the command D=M+1.) This piece of information gets encoded as a 3-bit sequence and spans bits 3-5 of the resulting instruction string. everything to the right of the = and to the left of the ; is the computation command – i.e., what computation the ALU performs on its inputs. (M+1 is the computation in the command D=M+1.) There are a lot of options here, so this piece of information gets encoded as a 7-bit sequence and spans bits 6-12 of the resulting instruction string. everything right of the ; is the jump command, which, if present, loads the value in the A Register (in this case representing a program line number) into the Program Counter, thus “jumping” to another line in the program. (JGT specifies the jump in the command D;JGT.) This piece of information gets encoded as a 3-bit sequence and spans bits 0-2 in the resulting instruction string. That’s complicated, but the whole thing looks like this: 1 x x a c c c c c c d d d j j j 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 the most significant bit (set to 1) identifies the computation instruction as such bits 13 and 14 aren’t used the a bit (bit 12) dictates whether the ALU should accept its y input from the A Register or from somewhere in Memory. This is effectively part of the computation instruction, which is why I grouped it that way in describing it above. the 6 c bits (bits 6-11) specify the ALU function and are fed into its 6 function inputs zx, nx, zy, ny, f, and no. the 3 d bits (bits 3-5) specify the destination for the ALU’s output, either M (memory), A (A register), or D (D register), or some combination of the three. the 3 j bits (bits 0-2) specify any jump command based on whether the ALU’s output is &amp;lt; 0, &amp;lt;= 0, == 0, &amp;gt;= 0, or &amp;gt; 0. Assembler in C Right. So how far did I get porting my assembler to C today? Well, not really that far, I suppose. Basically, I handled the A-Instructions (and only in the case where A-Instructions consist of Number literals, at that) – i.e., taking a command like @53 and emitting 0000000000110101 or a command like @9 and emitting 0000000000001001. Here’s what my code in action looks like at the moment: 1: @2 --&amp;gt; 0000000000000010 2: D=A --&amp;gt; C Command 3: @3 --&amp;gt; 0000000000000011 4: D=D+A --&amp;gt; C Command 5: @0 --&amp;gt; 0000000000000000 6: M=D --&amp;gt; C Command Of course the C Commands aren’t yet implemented, so just placeholders for now. But the A Instructions are doing their thing and doing it right. It doesn’t seem like much, and maybe it isn’t. But getting here meant bumping into many, MANY walls, which meant making HUGE leaps. Here are some highlights: Reading in a file Here’s how. #include &amp;lt;stdio.h&amp;gt; #include &amp;lt;stdlib.h&amp;gt; #include &amp;lt;string.h&amp;gt; int main(int argc, char **argv) { // exit if argc is not as expected if (argc != 2) { printf(&quot;usage: assembler &amp;lt;path&amp;gt;\n&quot;); return EXIT_FAILURE; // macro defined in stdlib.h } FILE *f; char line_in[256]; int linecount = 0; // open file passed as CL arg f = fopen(argv[1], &quot;r&quot;); // check for errors opening file if (f == NULL) { printf(&quot;Error opening file %s\n&quot;, argv[1]); return EXIT_FAILURE; } // loop through input file and parse while (fgets(line_in, sizeof line_in, f) != NULL) { // remove trailing newline or carriage return line_in[strcspn(line_in, &quot;\n\r&quot;)] = '\0'; // strip comments char *comment_pos = strstr(line_in, &quot;//&quot;); if (comment_pos != NULL) *comment_pos = '\0'; // do stuff if line not blank if (line_in[0] != '\0') printf(&quot;%2d: %5s --&amp;gt; &quot;, ++linecount, line_in); // print line_in } } // close file fclose(f); return 0; } All this does is read in the file passed along as a command-line argument and print it back out to stdout (along with a line number), stripping comments and removing blank lines as it does so. fgets() is the bread-and-butter of this version of a file reader, although I’ve been advised to consider getline(). To remove trailing newlines and carriage returns, I’m using strcspn() to reset instances of those with a string terminator. To strip comments, I’m using strstr() in a somewhat similar way to return a pointer to the location where a comment is found and reassigning that index a string terminator. As long as there’s something in the resulting line, (i.e., the first character isn’t \0), then I print it! Converting integers to bitstrings Then there’s the issue of converting integers to bitstrings. Here’s how. void itob(uint16_t num, char *b, int len) { for (int i=0; i&amp;lt;len; ++i) { b[len-i-1] = ((num &amp;amp; 1) == 1) ? '1' : '0'; num &amp;gt;&amp;gt;= 1; } } void build_A_COMMAND(char *line_in, char *line_out) { uint16_t i; i = atoi(line_in + 1); // convert line[1:] to int i = i &amp;amp; 0x7fff; // set MSB to 0 if i&amp;gt;32767 itob(i, line_out, 16); // convert i to 15+1-bit string and save to output } First, the build_A_COMMAND() function, which I call only after determining a line is an A Instruction (i.e., it starts with @). This function takes two strings as arguments – the assembly line it’s parsing and the binary line it’s building. It converts the input line (after the initial @ character) to an 16-bit integer, then it resets the most significant bit to 0 (a hack-y way of ensuring that the most significant bit is 0, but some better error-handling is needed here to validate that the input integer is max 15-bits to begin with). Lastly it calls itob(), which is my fun little integer-to-bitstring converter. itob() checks to see if the least significant bit of the 16-bit integer above is 1, and then sets output string b[15] to 1 or 0 accordingly. Then it shifts the 16-bit integer right one bit and continues along for the full 16 bits, writing the bitstring from least [15] to most [0] significant bit as it goes. Next steps I’ve already scaffolded out the various functions for building C Instructions. That means the next big step is sorting out some sort of lookup table to connect a command component (e.g., a jump command like JNE) to its binary equivalen (in this case 101). In Python that was a simple matter of dictionaries. Here, I’m considering implementing a simple hash table to do that. After that, it’s a matter of implementing functionality to handle symbols and labels, which requires another hash table.</summary></entry><entry><title type="html">RC41. Building a DNS Resolver in C, Part 3: Implementing a toy version of inet_pton()</title><link href="https://www.datadoodad.com/recurse%20center/RC41/" rel="alternate" type="text/html" title="RC41. Building a DNS Resolver in C, Part 3: Implementing a toy version of inet_pton()" /><published>2023-07-11T00:00:00-07:00</published><updated>2023-07-11T00:00:00-07:00</updated><id>https://www.datadoodad.com/recurse%20center/RC41</id><content type="html" xml:base="https://www.datadoodad.com/recurse%20center/RC41/">&lt;p&gt;Last time on &lt;a href=&quot;https://www.datadoodad.com/recurse%20center/RC39-5/&quot;&gt;stumbling through building a DNS resolver in C&lt;/a&gt;, I messed around with the nifty function &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;inet_pton()&lt;/code&gt;, which takes IPv4 and IPv6 addresses and converts them from &lt;strong&gt;P&lt;/strong&gt;resentation to &lt;strong&gt;N&lt;/strong&gt;etwork format, which is to say from a human-friendly format like &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;31.13.70.36&lt;/code&gt; to a machine-friendly 32-bit integer like &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;520963620&lt;/code&gt;. Along the way we discovered the hazards of endian-ness and learned our lesson: never, ever don’t convert numbers from host-to-network byte order and back!&lt;/p&gt;

&lt;p&gt;That goes for port numbers, too, by the way. Want to make a DNS query on port 53? Don’t forget to first convert it with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;htons(53)&lt;/code&gt;! Otherwise, if you’re on an Intel machine, you’ll end up querying port 13563. Why? 53 in hex is 0x35, so you’d think a two-byte integer would render that value as 0x0035. And you’d be right if you’re thinking the way networks think! But Intel machines are eccentric and little-endian, and they would store that integer as 0x3500 (least significant byte first), which would be interpreted as 13563 by big-endian machines and networks. So, do your conversions, folks!&lt;/p&gt;

&lt;p&gt;One of the things I did since then, in any case, is flesh out a toy version of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;inet_pton()&lt;/code&gt;, which does the full conversion from an IPv4 address to a 32-bit integer in a REPL-style interface.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/RC41_ipv4_to_i.gif&quot; alt=&quot;IPv4 to int&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The little program isn’t that complicated, but, well, it’s in C, so the whole thing felt like a bunch of LeetCode problems. (That’s a compliment.)&lt;/p&gt;

&lt;p&gt;Here’s the full code:&lt;/p&gt;

&lt;div class=&quot;language-c highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;cp&quot;&gt;#include &amp;lt;stdio.h&amp;gt;
#include &amp;lt;stdint.h&amp;gt;
#include &amp;lt;stdlib.h&amp;gt;
&lt;/span&gt;
&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;chunk&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;];&lt;/span&gt;
    &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;chunk_as_i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;chunks_count&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;kt&quot;&gt;uint32_t&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;addr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;addr&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;chunks_count&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;while&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;c&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;getchar&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;())&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;EOF&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;

        &lt;span class=&quot;cm&quot;&gt;/* convert chunk to int if we reach a `.` or `\n` */&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;c&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;sc&quot;&gt;'.'&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;||&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;c&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;sc&quot;&gt;'\n'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;chunk&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;sc&quot;&gt;'\0'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;            &lt;span class=&quot;c1&quot;&gt;// append null to chunk&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;addr&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;lt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;8&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;                 &lt;span class=&quot;c1&quot;&gt;// shift addr 8 bits left&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;chunk_as_i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;atoi&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;chunk&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;   &lt;span class=&quot;c1&quot;&gt;// convert current chunk to int&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;chunk_as_i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;255&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;     &lt;span class=&quot;c1&quot;&gt;// exit on invalid chunk&lt;/span&gt;
                &lt;span class=&quot;n&quot;&gt;printf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;Invalid IPv4 address component: %s&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\n&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;chunk&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
                &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
            &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;addr&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;atoi&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;chunk&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;        &lt;span class=&quot;c1&quot;&gt;// add chunk to addr&lt;/span&gt;

            &lt;span class=&quot;c1&quot;&gt;// reset chunk&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;--&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;chunk&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;sc&quot;&gt;'\0'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;

            &lt;span class=&quot;c1&quot;&gt;// inc chunks_count&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;chunks_count&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;++&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

            &lt;span class=&quot;c1&quot;&gt;// output addr and reset on newline&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;c&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;sc&quot;&gt;'\n'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
                &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;chunks_count&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
                    &lt;span class=&quot;n&quot;&gt;printf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;%u&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\n&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;addr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
                &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;else&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
                    &lt;span class=&quot;n&quot;&gt;printf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;Unexpected IPv4 address format &quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
                    &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;j&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;j&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;chunks_count&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;++&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;j&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
                        &lt;span class=&quot;n&quot;&gt;printf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;x.&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
                    &lt;span class=&quot;n&quot;&gt;printf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;x&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\n&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
                    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
                &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
                &lt;span class=&quot;n&quot;&gt;chunks_count&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
                &lt;span class=&quot;n&quot;&gt;addr&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
            &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;

        &lt;span class=&quot;cm&quot;&gt;/* else append char to chunk if numeric */&lt;/span&gt;
        &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;else&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;c&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;=&lt;/span&gt; &lt;span class=&quot;sc&quot;&gt;'0'&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;c&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;=&lt;/span&gt; &lt;span class=&quot;sc&quot;&gt;'9'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;){&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;chunk&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;++&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
        &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;There’s plenty of room for refactoring here, and probably error-detection and handling could be better, and, well, probably a lot could be better. But there are a few things that I think are especially neat that I’m psyched about.&lt;/p&gt;

&lt;p&gt;The first is the decision not to read in an entire line at a time (i.e., up until a newline is detected), but to read in what I’m calling here a chunk at a time, meaning the parts of the IPv4 address between dots. So I’ve got &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;chunk&lt;/code&gt;, a little array of chars of length 4 (the extra space being for a newline character so that I can easily treat the variable like a string), and as I traverse the input stream, I add the numeric characters I encounter to consecutive indexes. When I encounter a dot or a newline character, though, it’s time to rock and roll.&lt;/p&gt;

&lt;p&gt;If it’s a dot, then that means we have a chunk to deal with, which leads us to my second innovation (which actaully I borrowed from a helpful recurser who was perplexed by my previous decision to work with bitstrings representing this 32-bit integer): the bitwise approach, of course! We shift &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;addr&lt;/code&gt;, our output variable, 8 bits left and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OR&lt;/code&gt; it with the integer value of the current chunk (once we ensure that it’s a valid one-byte value, that is). Then we reset &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;chunk&lt;/code&gt; and our incrementer in one fell swoop, and continue along our input stream.&lt;/p&gt;

&lt;p&gt;That just leaves the output. If we encounter a newline character, that means (in theory) that we’re finished and it’s time to print out &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;addr&lt;/code&gt;. Once we make sure that there are, indeed, 4 chunks as expected, then, yeah, let ‘er rip! Otherwise, something ain’t right. ABORT PROGRAM.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/RC41_hal.gif&quot; alt=&quot;Bye HAL&quot; width=&quot;100%&quot; /&gt;&lt;/p&gt;

&lt;p&gt;And that, ladies and gentlemen, is what reinventing the wheel, in a much worse and more brittle way, looks like.&lt;/p&gt;</content><author><name>Zach Rottman, PhD</name></author><category term="Recurse Center" /><category term="C" /><category term="DNS" /><summary type="html">Last time on stumbling through building a DNS resolver in C, I messed around with the nifty function inet_pton(), which takes IPv4 and IPv6 addresses and converts them from Presentation to Network format, which is to say from a human-friendly format like 31.13.70.36 to a machine-friendly 32-bit integer like 520963620. Along the way we discovered the hazards of endian-ness and learned our lesson: never, ever don’t convert numbers from host-to-network byte order and back! That goes for port numbers, too, by the way. Want to make a DNS query on port 53? Don’t forget to first convert it with htons(53)! Otherwise, if you’re on an Intel machine, you’ll end up querying port 13563. Why? 53 in hex is 0x35, so you’d think a two-byte integer would render that value as 0x0035. And you’d be right if you’re thinking the way networks think! But Intel machines are eccentric and little-endian, and they would store that integer as 0x3500 (least significant byte first), which would be interpreted as 13563 by big-endian machines and networks. So, do your conversions, folks! One of the things I did since then, in any case, is flesh out a toy version of inet_pton(), which does the full conversion from an IPv4 address to a 32-bit integer in a REPL-style interface. The little program isn’t that complicated, but, well, it’s in C, so the whole thing felt like a bunch of LeetCode problems. (That’s a compliment.) Here’s the full code: #include &amp;lt;stdio.h&amp;gt; #include &amp;lt;stdint.h&amp;gt; #include &amp;lt;stdlib.h&amp;gt; int main() { char c, chunk[4]; int i, chunk_as_i, chunks_count; uint32_t addr; i = 0; addr = 0; chunks_count = 0; while ((c = getchar()) != EOF) { /* convert chunk to int if we reach a `.` or `\n` */ if (c == '.' || c == '\n') { chunk[i] = '\0'; // append null to chunk addr &amp;lt;&amp;lt;= 8; // shift addr 8 bits left chunk_as_i = atoi(chunk); // convert current chunk to int if (chunk_as_i &amp;gt; 255) { // exit on invalid chunk printf(&quot;Invalid IPv4 address component: %s\n&quot;, chunk); return 1; } addr |= atoi(chunk); // add chunk to addr // reset chunk for (; i&amp;gt;0; --i) { chunk[i] = '\0'; } // inc chunks_count chunks_count++; // output addr and reset on newline if (c == '\n') { if (chunks_count == 4) { printf(&quot;%u\n&quot;, addr); } else { printf(&quot;Unexpected IPv4 address format &quot;); for (int j=1; j&amp;lt;chunks_count; ++j) printf(&quot;x.&quot;); printf(&quot;x\n&quot;); return 1; } chunks_count = 0; addr = 0; } /* else append char to chunk if numeric */ } else if (i &amp;lt; 3 &amp;amp;&amp;amp; c &amp;gt;= '0' &amp;amp;&amp;amp; c &amp;lt;= '9'){ chunk[i++] = c; } } return 0; } There’s plenty of room for refactoring here, and probably error-detection and handling could be better, and, well, probably a lot could be better. But there are a few things that I think are especially neat that I’m psyched about. The first is the decision not to read in an entire line at a time (i.e., up until a newline is detected), but to read in what I’m calling here a chunk at a time, meaning the parts of the IPv4 address between dots. So I’ve got chunk, a little array of chars of length 4 (the extra space being for a newline character so that I can easily treat the variable like a string), and as I traverse the input stream, I add the numeric characters I encounter to consecutive indexes. When I encounter a dot or a newline character, though, it’s time to rock and roll. If it’s a dot, then that means we have a chunk to deal with, which leads us to my second innovation (which actaully I borrowed from a helpful recurser who was perplexed by my previous decision to work with bitstrings representing this 32-bit integer): the bitwise approach, of course! We shift addr, our output variable, 8 bits left and OR it with the integer value of the current chunk (once we ensure that it’s a valid one-byte value, that is). Then we reset chunk and our incrementer in one fell swoop, and continue along our input stream. That just leaves the output. If we encounter a newline character, that means (in theory) that we’re finished and it’s time to print out addr. Once we make sure that there are, indeed, 4 chunks as expected, then, yeah, let ‘er rip! Otherwise, something ain’t right. ABORT PROGRAM. And that, ladies and gentlemen, is what reinventing the wheel, in a much worse and more brittle way, looks like.</summary></entry><entry><title type="html">RC39ish. Building a DNS Resolver in C, Part 2: Fun with inet_pton()</title><link href="https://www.datadoodad.com/recurse%20center/RC39-5/" rel="alternate" type="text/html" title="RC39ish. Building a DNS Resolver in C, Part 2: Fun with inet_pton()" /><published>2023-07-08T00:00:00-07:00</published><updated>2023-07-08T00:00:00-07:00</updated><id>https://www.datadoodad.com/recurse%20center/RC39-5</id><content type="html" xml:base="https://www.datadoodad.com/recurse%20center/RC39-5/">&lt;p&gt;In the &lt;a href=&quot;https://www.datadoodad.com/recurse%20center/RC38/&quot;&gt;last installment of Building a DNS Resolver in C&lt;/a&gt;, I talked about writing a component of a conversion function that will eventually translate a valid IPv4 address (e.g., one of those 31.13.70.36 things) to a single 32-bit integer – the reason being that this 32-bit integer format is the format that we need to send out when we make our DNS query using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sendto()&lt;/code&gt;. It was only a component, however, since the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;btoi()&lt;/code&gt; function we wrote just does the part where it takes a 32-character string of 0s and 1s and converts it to its 32-bit unsigned integer equivalent. So its usage would look something like this:&lt;/p&gt;

&lt;div class=&quot;language-c highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;printf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;Bitstring %s (length %d) -&amp;gt; %u&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\n&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;binstr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;btoi&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;binstr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Bistring 00011111000011010100011000100100 (length 32) -&amp;gt; 520963620
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;To perform the entire conversion starting with an IPv4 address, there are still a few things that need to happen, namely parsing the IP address, converting each one-byte number to an 8-bit bytestring, and concatenating the result (which is what finally gets passed to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;btoi()&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;I still intend to do this, but I wanted to explore C’s &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;inet_pton()&lt;/code&gt; function (part of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;arpa/inet.h&lt;/code&gt; library), which basically just does this all for you. Here’s a few things I learned from playing with it.&lt;/p&gt;

&lt;h1 id=&quot;enter-inet_pton&quot;&gt;Enter: inet_pton()&lt;/h1&gt;

&lt;p&gt;P-to-N means presentation-to-network. I.e., human readable to computer readable. I.e., a human friendly IPv4 address like 31.13.70.36 to some big, inscrutable number like 520963620.&lt;/p&gt;

&lt;p&gt;The function signature looks like this:&lt;/p&gt;

&lt;div class=&quot;language-c highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;inet_pton&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;af&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;const&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;kr&quot;&gt;restrict&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;src&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;kr&quot;&gt;restrict&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dst&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Parameters:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;af&lt;/code&gt;: address family – either AF_INET for IPv4 addresses or AF_INET6 for IPv6 addresses. I’ll be keeping it simple here with IPv4 addresses&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;src&lt;/code&gt;: the input string (something like &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;31.13.70.36&lt;/code&gt; for an IPv4 address)&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;dst&lt;/code&gt;: the destination address where the 32-bit integer conversion of the thing in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;src&lt;/code&gt; will go. That means that this destination address must be 4 bytes in size.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Returns:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;1 if successful, else 0&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It is my understanding that most of the time when doing networking things in C one will avail oneself of some handy structs like these:&lt;/p&gt;

&lt;div class=&quot;language-c highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sockaddr_in&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;kt&quot;&gt;short&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt;          &lt;span class=&quot;n&quot;&gt;sin_family&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;kt&quot;&gt;unsigned&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;short&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sin_port&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;in_addr&lt;/span&gt;     &lt;span class=&quot;n&quot;&gt;sin_addr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;kt&quot;&gt;unsigned&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt;      &lt;span class=&quot;n&quot;&gt;sin_zero&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;8&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;];&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;};&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;in_addr&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;kt&quot;&gt;uint32_t&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s_addr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sockaddr_in&lt;/code&gt; stores the address family (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AF_INET&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AF_NET6&lt;/code&gt; – type &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;int&lt;/code&gt;), the port we want to use (also type &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;int&lt;/code&gt;), and the address we want to do stuff with. This last part is type &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;struct in_addr&lt;/code&gt;, which we can see has a single 32-integer member called &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;s_addr&lt;/code&gt;. This is what interests us here, since, normally when converting an IPv4 address to 32-bit integer using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;inet_pton()&lt;/code&gt;, you’d store the result in the 32-bit home designated by &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sockaddr_in.sin_addr&lt;/code&gt;. Something like this:&lt;/p&gt;

&lt;div class=&quot;language-c highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sockaddr_in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;my_sock&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;inet_pton&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;AF_INET&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;8.8.8.8&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;my_sock&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sin_addr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h1 id=&quot;tinkering-with-inet_pton&quot;&gt;Tinkering with inet_pton()&lt;/h1&gt;

&lt;p&gt;I’m trying to keep things simple, so I’m just sending the result of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;inet_pton()&lt;/code&gt; to a regular old 32-bit int, like so:&lt;/p&gt;

&lt;div class=&quot;language-c highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kt&quot;&gt;uint32_t&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s_addr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;inet_pton&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;AF_INET&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;8.8.8.8&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;s_addr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Same thing, just without the extra struct-bulk.&lt;/p&gt;

&lt;p&gt;So that’s it, right?? Let’s go!&lt;/p&gt;

&lt;div class=&quot;language-c highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;cp&quot;&gt;#include &amp;lt;stdio.h&amp;gt;
#include &amp;lt;stdint.h&amp;gt;
#include &amp;lt;sys/socket.h&amp;gt;
#include &amp;lt;arpa/inet.h&amp;gt;
&lt;/span&gt;
&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;kt&quot;&gt;uint32_t&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s_addr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;// 4-bytes to hold address&lt;/span&gt;
    &lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;buf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;16&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;];&lt;/span&gt;    &lt;span class=&quot;c1&quot;&gt;// Buffer to hold max IPv4 addr plus null string terminator&lt;/span&gt;
    &lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;while&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;c&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;getchar&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;())&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;EOF&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;c&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;sc&quot;&gt;'\n'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;buf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;sc&quot;&gt;'\0'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;inet_pton&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;AF_INET&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;buf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;s_addr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
                &lt;span class=&quot;n&quot;&gt;printf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;--&amp;gt; %u&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\n&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s_addr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
            &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;s_addr&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;--&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
                &lt;span class=&quot;n&quot;&gt;buf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;sc&quot;&gt;'\0'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
        &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;else&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;15&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
                &lt;span class=&quot;n&quot;&gt;buf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;++&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
        &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;8.8.8.8
--&amp;gt; 134744072

31.13.70.36
--&amp;gt; 608570655

172.217.14.78
--&amp;gt; 1309596076
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This little REPL has a 16-character buffer (that the maximum length of an IPv4 address plus a null terminator), to which it adds characters building the address string. When it sees a linebreak, it passes this &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;buf&lt;/code&gt; string to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;inet_pton()&lt;/code&gt; and lets it rip! Looks good, right? All finished, right??&lt;/p&gt;

&lt;h1 id=&quot;well----no&quot;&gt;Well. . .  no.&lt;/h1&gt;

&lt;p&gt;Meanwhile, I thought it’d be fun to just whip up a quick Python script that effectively does what &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;inet_pton()&lt;/code&gt; does:&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;inet_pton&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ip&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;# split `ip` at dot, convert each int to 8-bit bytestring, and concatenate
&lt;/span&gt;    &lt;span class=&quot;n&quot;&gt;binstr&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;0b&quot;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;join&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;([&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;{:08b}&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;format&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;int&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;int8&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;int8&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ip&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;split&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'.'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)])&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;# convert 32-bit bytestring to int
&lt;/span&gt;    &lt;span class=&quot;n&quot;&gt;int32&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;int&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;binstr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;binstr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;int32&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;display_pton&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ip&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;8.8.8.8&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;s&quot;&gt;'''
    Pretty-print results of inet_pton()
    '''&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;binstr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;int32&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;inet_pton&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ip&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;{:&amp;lt;15} -&amp;gt; {} ({})&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;format&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ip&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;binstr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;int32&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;That first part of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;inet_pton()&lt;/code&gt; is not super readable, but basically it is:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;splitting the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ip&lt;/code&gt; argument at the dots (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;1.23.4.56&lt;/code&gt; -&amp;gt; &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;['1', '23', '4', '56']&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;using a list comprehension to reformat each element as an 8-bit string (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;['00000001', '00010111', '00000100', '00111000']&lt;/code&gt;)&lt;/li&gt;
  &lt;li&gt;joining the result and concatenating to “0b” (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;'0b00000001000101110000010000111000&lt;/code&gt;) – this is the first thing this function will return&lt;/li&gt;
  &lt;li&gt;and turning that thing into an integer (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;18285624&lt;/code&gt;) – the second thing this function will return&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So you can do something like&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;display_pton&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;8.8.8.8&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;display_pton&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;31.13.70.36&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;display_pton&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;172.217.14.78&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;and see a result like&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;8.8.8.8         -&amp;gt; 0b00001000000010000000100000001000 (134744072)
31.13.70.36     -&amp;gt; 0b00011111000011010100011000100100 (520963620)
172.217.14.78   -&amp;gt; 0b10101100110110010000111001001110 (2899906126)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Hey, wait a second. . . the result for 8.8.8.8 is the same as the C version of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;inet_pton()&lt;/code&gt;, but the other two IPs are yielding different results. What gives?&lt;/p&gt;

&lt;h1 id=&quot;network-byte-order-duh&quot;&gt;Network Byte Order, duh&lt;/h1&gt;

&lt;figure&gt;
&lt;img src=&quot;/assets/images/RC39-5_eggs.png&quot; alt=&quot;eggs&quot; /&gt;
&lt;figcaption align=&quot;center&quot;&gt;Big-endian and . . . middle-endian? Get it together, DALL-E&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;This one really threw me off for a bit. I thought for certain I must have some bug or maybe there was some garbage leftover at the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;s_addr&lt;/code&gt; address in my C code that was messing up the calculation or &lt;em&gt;something&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Eventually I thought to convert the integer results of both my Python and C versions back to binary:&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;Python result: {} -&amp;gt; {}&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;format&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;520963620&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;bin&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;520963620&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)))&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;C result:      {} -&amp;gt; {}&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;format&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;608570655&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;bin&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;608570655&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Python result: 520963620 -&amp;gt; 0b11111000011010100011000100100
C result:      608570655 -&amp;gt; 0b100100010001100000110100011111
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;After staring at these for a while, I realized that, once you account for a few missing leading zeroes, the same bytes are occuring in each, but just in opposite orders!&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;python version: 00011111 00001101 01000110 00100100
                    A        B        C        D

c version:      00100100 01000110 00001101 00011111
                    D        C        B        A
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Okay, interesting! Suddenly I’m remembering seeing, and skipping over, stuff about byte order, which has to be the issue here. Only problem is, who’s in the right? C or Python?&lt;/p&gt;

&lt;p&gt;I thought my Python script had to be the culprit since it was the one I was responsible for. I figured that I must have put the bytes in the wrong order is all. They have to go in Network Byte Order! Which is (some googling revealed) Most Significant Byte first. Obviously!&lt;/p&gt;

&lt;p&gt;But that’s . . . what I did, wasn’t it? In the example above (31.13.70.36), the first element (31) became the first – which is to say left-most, which is to say most significant – byte in my 32-bit string, right? So that means &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;inet_ptoi()&lt;/code&gt; has it twisted.&lt;/p&gt;

&lt;p&gt;After some further perusing of &lt;a href=&quot;https://beej.us/guide/bgnet/html/split/man-pages.html#structsockaddrman&quot;&gt;Beej’s guide&lt;/a&gt;, I discovered the problem. Networks like stuff in, you guessed it, Network Byte Order. That’s big-endian style. But computers are divided on the subject, apparently. Intel machines like lil’ endian, and non-intel machines like big-endian, and, well, I’m on an intel Mac. So it seems that what’s happening here is &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;inet_pton()&lt;/code&gt; is doing its job of taking an IPv4 address and converting it to a 4-byte integer; it’s just that my computer is storing those four bytes in its own, slightly eccentric way, which is not the way that networks do it.&lt;/p&gt;

&lt;p&gt;However, there’s a handy suite of functions that will happily do this conversion back and forth:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;htons()&lt;/code&gt; - host to network short&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;htonl()&lt;/code&gt; - host to network long&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ntohs()&lt;/code&gt; - network to host short&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ntohl()&lt;/code&gt; - network to host long&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We need to go from host to network, and we want the long here, since we’re working with 4-byte integers. Here’s the revised code:&lt;/p&gt;

&lt;div class=&quot;language-c highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;cp&quot;&gt;#include &amp;lt;stdio.h&amp;gt;
#include &amp;lt;stdint.h&amp;gt;
#include &amp;lt;sys/socket.h&amp;gt;
#include &amp;lt;arpa/inet.h&amp;gt;
&lt;/span&gt;
&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;kt&quot;&gt;uint32_t&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s_addr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;// 4-bytes to hold address&lt;/span&gt;
    &lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;buf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;16&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;];&lt;/span&gt;    &lt;span class=&quot;c1&quot;&gt;// Buffer to hold max IPv4 addr plus null string terminator&lt;/span&gt;
    &lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;while&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;c&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;getchar&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;())&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;EOF&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;c&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;sc&quot;&gt;'\n'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;buf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;sc&quot;&gt;'\0'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;inet_pton&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;AF_INET&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;buf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;s_addr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
                &lt;span class=&quot;n&quot;&gt;printf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;--&amp;gt; Host Byte Order:    %u&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\n&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s_addr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
                &lt;span class=&quot;n&quot;&gt;printf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;--&amp;gt; Network Byte Order: %u&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\n\n&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;htonl&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;s_addr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;));&lt;/span&gt;
            &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;s_addr&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;--&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
                &lt;span class=&quot;n&quot;&gt;buf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;sc&quot;&gt;'\0'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
        &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;else&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;15&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
                &lt;span class=&quot;n&quot;&gt;buf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;++&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
        &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;And here’s a sample REPL session so we can see it working:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;8.8.8.8
--&amp;gt; Host Byte Order:    134744072
--&amp;gt; Network Byte Order: 134744072

31.13.70.36
--&amp;gt; Host Byte Order:    608570655
--&amp;gt; Network Byte Order: 520963620

172.217.14.78
--&amp;gt; Host Byte Order:    1309596076
--&amp;gt; Network Byte Order: 2899906126
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Of course the reason “8.8.8.8” had been working before was that it’s palindromic! So, big-endian, little-endian – it’s all the same. But for the rest, we can now clearly see the discrepancy.&lt;/p&gt;</content><author><name>Zach Rottman, PhD</name></author><category term="Recurse Center" /><category term="C" /><category term="DNS" /><category term="python" /><summary type="html">In the last installment of Building a DNS Resolver in C, I talked about writing a component of a conversion function that will eventually translate a valid IPv4 address (e.g., one of those 31.13.70.36 things) to a single 32-bit integer – the reason being that this 32-bit integer format is the format that we need to send out when we make our DNS query using sendto(). It was only a component, however, since the btoi() function we wrote just does the part where it takes a 32-character string of 0s and 1s and converts it to its 32-bit unsigned integer equivalent. So its usage would look something like this: printf(&quot;Bitstring %s (length %d) -&amp;gt; %u\n&quot;, binstr, i, btoi(binstr, i)); Bistring 00011111000011010100011000100100 (length 32) -&amp;gt; 520963620 To perform the entire conversion starting with an IPv4 address, there are still a few things that need to happen, namely parsing the IP address, converting each one-byte number to an 8-bit bytestring, and concatenating the result (which is what finally gets passed to btoi()). I still intend to do this, but I wanted to explore C’s inet_pton() function (part of the arpa/inet.h library), which basically just does this all for you. Here’s a few things I learned from playing with it. Enter: inet_pton() P-to-N means presentation-to-network. I.e., human readable to computer readable. I.e., a human friendly IPv4 address like 31.13.70.36 to some big, inscrutable number like 520963620. The function signature looks like this: int inet_pton(int af, const char *restrict src, void *restrict dst); Parameters: af: address family – either AF_INET for IPv4 addresses or AF_INET6 for IPv6 addresses. I’ll be keeping it simple here with IPv4 addresses src: the input string (something like 31.13.70.36 for an IPv4 address) dst: the destination address where the 32-bit integer conversion of the thing in src will go. That means that this destination address must be 4 bytes in size. Returns: 1 if successful, else 0 It is my understanding that most of the time when doing networking things in C one will avail oneself of some handy structs like these: struct sockaddr_in { short int sin_family; unsigned short int sin_port; struct in_addr sin_addr; unsigned char sin_zero[8]; }; struct in_addr { uint32_t s_addr; }; sockaddr_in stores the address family (AF_INET or AF_NET6 – type int), the port we want to use (also type int), and the address we want to do stuff with. This last part is type struct in_addr, which we can see has a single 32-integer member called s_addr. This is what interests us here, since, normally when converting an IPv4 address to 32-bit integer using inet_pton(), you’d store the result in the 32-bit home designated by sockaddr_in.sin_addr. Something like this: struct sockaddr_in my_sock; inet_pton(AF_INET, &quot;8.8.8.8&quot;, &amp;amp;(my_sock.sin_addr)); Tinkering with inet_pton() I’m trying to keep things simple, so I’m just sending the result of inet_pton() to a regular old 32-bit int, like so: uint32_t s_addr; inet_pton(AF_INET, &quot;8.8.8.8&quot;, &amp;amp;s_addr); Same thing, just without the extra struct-bulk. So that’s it, right?? Let’s go! #include &amp;lt;stdio.h&amp;gt; #include &amp;lt;stdint.h&amp;gt; #include &amp;lt;sys/socket.h&amp;gt; #include &amp;lt;arpa/inet.h&amp;gt; int main() { uint32_t s_addr; // 4-bytes to hold address char buf[16]; // Buffer to hold max IPv4 addr plus null string terminator char c; int i = 0; while ((c = getchar()) != EOF) { if (c == '\n') { buf[i] = '\0'; if (inet_pton(AF_INET, buf, &amp;amp;s_addr) == 1) { printf(&quot;--&amp;gt; %u\n&quot;, s_addr); } s_addr = 0; for (; i&amp;gt;0; --i) buf[i] = '\0'; } else { if (i &amp;lt; 15) buf[i++] = c; } } return 0; } 8.8.8.8 --&amp;gt; 134744072 31.13.70.36 --&amp;gt; 608570655 172.217.14.78 --&amp;gt; 1309596076 This little REPL has a 16-character buffer (that the maximum length of an IPv4 address plus a null terminator), to which it adds characters building the address string. When it sees a linebreak, it passes this buf string to inet_pton() and lets it rip! Looks good, right? All finished, right?? Well. . . no. Meanwhile, I thought it’d be fun to just whip up a quick Python script that effectively does what inet_pton() does: def inet_pton(ip): # split `ip` at dot, convert each int to 8-bit bytestring, and concatenate binstr = &quot;0b&quot; + &quot;&quot;.join([&quot;{:08b}&quot;.format(int(int8)) for int8 in ip.split('.')]) # convert 32-bit bytestring to int int32 = int(binstr, 2) return binstr, int32 def display_pton(ip=&quot;8.8.8.8&quot;): ''' Pretty-print results of inet_pton() ''' binstr, int32 = inet_pton(ip) print(&quot;{:&amp;lt;15} -&amp;gt; {} ({})&quot;.format(ip, binstr, int32)) That first part of inet_pton() is not super readable, but basically it is: splitting the ip argument at the dots (1.23.4.56 -&amp;gt; ['1', '23', '4', '56'] using a list comprehension to reformat each element as an 8-bit string (['00000001', '00010111', '00000100', '00111000']) joining the result and concatenating to “0b” ('0b00000001000101110000010000111000) – this is the first thing this function will return and turning that thing into an integer (18285624) – the second thing this function will return So you can do something like display_pton(&quot;8.8.8.8&quot;) display_pton(&quot;31.13.70.36&quot;) display_pton(&quot;172.217.14.78&quot;) and see a result like 8.8.8.8 -&amp;gt; 0b00001000000010000000100000001000 (134744072) 31.13.70.36 -&amp;gt; 0b00011111000011010100011000100100 (520963620) 172.217.14.78 -&amp;gt; 0b10101100110110010000111001001110 (2899906126) Hey, wait a second. . . the result for 8.8.8.8 is the same as the C version of inet_pton(), but the other two IPs are yielding different results. What gives? Network Byte Order, duh Big-endian and . . . middle-endian? Get it together, DALL-E This one really threw me off for a bit. I thought for certain I must have some bug or maybe there was some garbage leftover at the s_addr address in my C code that was messing up the calculation or something. Eventually I thought to convert the integer results of both my Python and C versions back to binary: print(&quot;Python result: {} -&amp;gt; {}&quot;.format(520963620, bin(520963620))) print(&quot;C result: {} -&amp;gt; {}&quot;.format(608570655, bin(608570655))) Python result: 520963620 -&amp;gt; 0b11111000011010100011000100100 C result: 608570655 -&amp;gt; 0b100100010001100000110100011111 After staring at these for a while, I realized that, once you account for a few missing leading zeroes, the same bytes are occuring in each, but just in opposite orders! python version: 00011111 00001101 01000110 00100100 A B C D c version: 00100100 01000110 00001101 00011111 D C B A Okay, interesting! Suddenly I’m remembering seeing, and skipping over, stuff about byte order, which has to be the issue here. Only problem is, who’s in the right? C or Python? I thought my Python script had to be the culprit since it was the one I was responsible for. I figured that I must have put the bytes in the wrong order is all. They have to go in Network Byte Order! Which is (some googling revealed) Most Significant Byte first. Obviously! But that’s . . . what I did, wasn’t it? In the example above (31.13.70.36), the first element (31) became the first – which is to say left-most, which is to say most significant – byte in my 32-bit string, right? So that means inet_ptoi() has it twisted. After some further perusing of Beej’s guide, I discovered the problem. Networks like stuff in, you guessed it, Network Byte Order. That’s big-endian style. But computers are divided on the subject, apparently. Intel machines like lil’ endian, and non-intel machines like big-endian, and, well, I’m on an intel Mac. So it seems that what’s happening here is inet_pton() is doing its job of taking an IPv4 address and converting it to a 4-byte integer; it’s just that my computer is storing those four bytes in its own, slightly eccentric way, which is not the way that networks do it. However, there’s a handy suite of functions that will happily do this conversion back and forth: htons() - host to network short htonl() - host to network long ntohs() - network to host short ntohl() - network to host long We need to go from host to network, and we want the long here, since we’re working with 4-byte integers. Here’s the revised code: #include &amp;lt;stdio.h&amp;gt; #include &amp;lt;stdint.h&amp;gt; #include &amp;lt;sys/socket.h&amp;gt; #include &amp;lt;arpa/inet.h&amp;gt; int main() { uint32_t s_addr; // 4-bytes to hold address char buf[16]; // Buffer to hold max IPv4 addr plus null string terminator char c; int i = 0; while ((c = getchar()) != EOF) { if (c == '\n') { buf[i] = '\0'; if (inet_pton(AF_INET, buf, &amp;amp;s_addr) == 1) { printf(&quot;--&amp;gt; Host Byte Order: %u\n&quot;, s_addr); printf(&quot;--&amp;gt; Network Byte Order: %u\n\n&quot;, htonl(s_addr)); } s_addr = 0; for (; i&amp;gt;0; --i) buf[i] = '\0'; } else { if (i &amp;lt; 15) buf[i++] = c; } } return 0; } And here’s a sample REPL session so we can see it working: 8.8.8.8 --&amp;gt; Host Byte Order: 134744072 --&amp;gt; Network Byte Order: 134744072 31.13.70.36 --&amp;gt; Host Byte Order: 608570655 --&amp;gt; Network Byte Order: 520963620 172.217.14.78 --&amp;gt; Host Byte Order: 1309596076 --&amp;gt; Network Byte Order: 2899906126 Of course the reason “8.8.8.8” had been working before was that it’s palindromic! So, big-endian, little-endian – it’s all the same. But for the rest, we can now clearly see the discrepancy.</summary></entry><entry><title type="html">RC38. Building a DNS Resolver in C, Part 1 (of many)</title><link href="https://www.datadoodad.com/recurse%20center/RC38/" rel="alternate" type="text/html" title="RC38. Building a DNS Resolver in C, Part 1 (of many)" /><published>2023-07-06T00:00:00-07:00</published><updated>2023-07-06T00:00:00-07:00</updated><id>https://www.datadoodad.com/recurse%20center/RC38</id><content type="html" xml:base="https://www.datadoodad.com/recurse%20center/RC38/">&lt;p&gt;A few weeks back, a few recurser friends and I implemented a DNS resolver in Python using &lt;a href=&quot;https://implement-dns.wizardzines.com/&quot;&gt;this excellent guide&lt;/a&gt; put together by a legendary RC alum. Knowing nothing about networking or how the internet works, I found this to be a wonderful and approachable introduction to the topic.&lt;/p&gt;

&lt;p&gt;My appetite thusly whet (whetted?), I figured, hey, since I’ve also been focusing on C these past few months, why not try to re-implement this DNS resolver thing in C ? How hard could it be?&lt;/p&gt;

&lt;p&gt;My initial thought was to take the approach of line-by-line translation, so I began dutifully building structs to handle the DNS header and DNS question as I had in the Python version. In that case, the DNS header class consisted of 6 integer attributes, and the DNS question class had 2 integer attributes and a string (the domain we want to resolve). The main thing that needed to happen with the values contained by these objects was to translate them into one long hex sequence, with each integer attribute represented as a two-byte hex value.&lt;/p&gt;

&lt;p&gt;I was about to start implementing an integer-to-hex function so that I could encode my header and question structs to hexstrings, but then I stopped, since I realized that I’d better be damn sure that whatever C libraries I’ll be using are expecting the same information in the same format. My understanding is that these protocols are strictly defined, so I’m not expecting something wildly different. All the same, I do want to be sure that the C socket API wants, for instance, this kind of hexstring.  That’s when I realized this translation would not be so straightforward after all. That’s also when I began consulting &lt;a href=&quot;https://beej.us/guide/bgnet/&quot;&gt;Beej’s Guide to Network Programming&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Along with a few friends also interested in this project, I made several small-step-but-giant-leaps forward today with Beej’s guide on our side. Our rough plan for a DNS resolver that attempts to circumvent higher-level abstractions like &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;getaddreinfo()&lt;/code&gt; looks like this (which basically mirrors the Python tutorial that is our model):&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;open a socket with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;socket()&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;connect to it with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;connect()&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;send stuff with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sendto()&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;get stuff with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;recv()&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;examine it and decide what to do next&lt;/li&gt;
  &lt;li&gt;close socket with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;close()&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We sorted out how &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;socket()&lt;/code&gt; works. But our next obstacle came when we started thinking through how to send stuff with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sendto()&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;In the Python tutorial, the high level process looks like this:&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;socket&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;query&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;build_query&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;www.example.com&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;sock&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;socket&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;socket&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;socket&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;AF_INET&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;socket&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;SOCK_DGRAM&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;sock&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sendto&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;query&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;8.8.8.8&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;53&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;response&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;_&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sock&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;recvfrom&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1024&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;For the moment let’s ignore the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;build_query()&lt;/code&gt; part, which I’m anticipating will be the bulk of the work. Instead I’m focused on the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sock.sendto()&lt;/code&gt; line, in particular the part where we specify that we’d like to send our query to Google’s DNS server (“8.8.8.8”) on port 53.&lt;/p&gt;

&lt;p&gt;When we use C’s version of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sendto()&lt;/code&gt;, we will be passing to it our socket descriptor (returned when we open our socket with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;socket()&lt;/code&gt;) a buffer string (the content we’re sending – presumably something like the header and question combo alluded to above), the length of our query in bytes, and then a pointer to a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sockaddr&lt;/code&gt; struct (or its equivalent &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sockaddr_in&lt;/code&gt;) and its size. This last part, we realized, is where we are specifying where and how we want to send our stuff – in this case, to “8.8.8.8” on port 53. Now we’re getting somewhere! And in that sense, this &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sockaddr_in&lt;/code&gt; struct seems key to unlocking this small part of the puzzle.&lt;/p&gt;

&lt;p&gt;Here’s what the struct looks like:&lt;/p&gt;
&lt;div class=&quot;language-c highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sockaddr_in&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;kt&quot;&gt;short&lt;/span&gt;            &lt;span class=&quot;n&quot;&gt;sin_family&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;   
    &lt;span class=&quot;kt&quot;&gt;unsigned&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;short&lt;/span&gt;   &lt;span class=&quot;n&quot;&gt;sin_port&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;    
    &lt;span class=&quot;k&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;in_addr&lt;/span&gt;   &lt;span class=&quot;n&quot;&gt;sin_addr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;   
    &lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt;             &lt;span class=&quot;n&quot;&gt;sin_zero&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;8&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;];&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;};&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;in_addr&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;kt&quot;&gt;unsigned&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;long&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s_addr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;          
&lt;span class=&quot;p&quot;&gt;};&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sockaddr_in&lt;/code&gt; structure holds an address family (in our case &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AF_INET&lt;/code&gt;), a port number, and an IPv4 address, AKA one of those things that looks like &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;142.250.176.14&lt;/code&gt;, which is four one-byte integers strung together (i.e., each of the four component parts can range from 0-255). But what’s interesting is that this &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sin_addr&lt;/code&gt; component is not some string of four integers or something; it’s a single unsigned long, which I believe is either a 4-byte or 8-byte integer. Either way it’s a big-ass number. &lt;em&gt;That&lt;/em&gt; means we need to translate our address’s component parts to one-byte bytestrings, concatenate them together into a single 4-byte string, and then interpret that as a single integer.&lt;/p&gt;

&lt;p&gt;For example, if we replace each one-byte integer component of&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;142.250.176.14
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;with binary, we’d get&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;10001110 11111010 10110000 00001110
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;and if we concatenate those together into a single 4-byte binary value:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;10001110111110101011000000001110
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;which corresponds to the integer&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;2,398,793,742
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Now, I believe there is a convenient way to tranlsate a human-readable IPv4 address to an integer and vice-versa with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;inet_pton()&lt;/code&gt;, and we very well may avail ourselves of this convenience. But why do that when we could just . . . implement it ourselves?? In which case, ultimately what we’re going to need to do is take an IPv4 address (“8.8.8.8”), parse it into four bytestrings, concatenate the result into a single 4-byte bytestring, and return it as an integer. &lt;em&gt;Then&lt;/em&gt; we’ll have that component part of our &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sockaddr_in&lt;/code&gt; struct and, in turn, a component of what we’ll need to pass along to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sendto()&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;And with that, I present to you a fun REPL implementation of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;btoi()&lt;/code&gt;, our friendly bitstring-to-integer parser which will handle that last step of translating a 4-byte bytestring to an unsigned, 32-bit integer.&lt;/p&gt;

&lt;div class=&quot;language-c highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;cp&quot;&gt;#include &amp;lt;stdio.h&amp;gt;
#include &amp;lt;stdint.h&amp;gt;
&lt;/span&gt;
&lt;span class=&quot;cm&quot;&gt;/* convert bitstring[32] to 32-bit uint */&lt;/span&gt;
&lt;span class=&quot;kt&quot;&gt;uint32_t&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;btoi&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;binstr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;len&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt;      &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;kt&quot;&gt;uint32_t&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;result&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;result&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;len&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;++&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;binstr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;sc&quot;&gt;'1'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;result&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;len&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;result&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;binstr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;32&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;];&lt;/span&gt;
    &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;while&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;c&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;getchar&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;())&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;EOF&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;c&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;sc&quot;&gt;'\n'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;32&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
                &lt;span class=&quot;c1&quot;&gt;// add `c` to `binstr` and increment `i` while `i` &amp;lt; 32 &lt;/span&gt;
                &lt;span class=&quot;n&quot;&gt;binstr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;++&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
        &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;else&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
            &lt;span class=&quot;c1&quot;&gt;// print result&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;printf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;-&amp;gt; %u&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\n&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;btoi&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;binstr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;));&lt;/span&gt;
            &lt;span class=&quot;c1&quot;&gt;// reset all `binstr` elements to null&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;--&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
                &lt;span class=&quot;n&quot;&gt;binstr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;sc&quot;&gt;'\0'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
        &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;You know what? I’m proud of this little function.&lt;/p&gt;

&lt;p&gt;First because it handles strings less than and greater than 32 characters in length, no problem. Any characters in excess of 32 just never get added to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;binstr&lt;/code&gt; array in the first place. And for strings less than 32 characters in length, the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;len&lt;/code&gt; parameter of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;btoi()&lt;/code&gt; ensures that the most significant bit is at place &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;len&lt;/code&gt; (not at place 32!).&lt;/p&gt;

&lt;p&gt;And second because I landed on what feels like a clever solution to resetting all the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;binstr&lt;/code&gt; elements to null at the same time as I reset counter &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;i&lt;/code&gt;, which I accomplish by decrementing &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;i&lt;/code&gt; back to 0 and setting &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;binstr[i]&lt;/code&gt; to the null string at each step. That way, I ensure there’s no garbage left over in any of those memory locations.&lt;/p&gt;

&lt;p&gt;Here it is in action:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&amp;gt; 10011
19
&amp;gt; 001
1
&amp;gt; 100
4
&amp;gt; 100110110100101010110101011110101001011011
2605364602
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The next step is to do the first part of the parsing operation, namely walking through the IPv4 address, extracting each one-byte integer, translating it to a bytestring, and concatenating the results to pass to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;btoi()&lt;/code&gt;. And at that point we will be well on our way to building a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sockaddr_in&lt;/code&gt; struct, which means we’ll be one step closer to making a connection.&lt;/p&gt;</content><author><name>Zach Rottman, PhD</name></author><category term="Recurse Center" /><category term="C" /><category term="DNS" /><summary type="html">A few weeks back, a few recurser friends and I implemented a DNS resolver in Python using this excellent guide put together by a legendary RC alum. Knowing nothing about networking or how the internet works, I found this to be a wonderful and approachable introduction to the topic. My appetite thusly whet (whetted?), I figured, hey, since I’ve also been focusing on C these past few months, why not try to re-implement this DNS resolver thing in C ? How hard could it be? My initial thought was to take the approach of line-by-line translation, so I began dutifully building structs to handle the DNS header and DNS question as I had in the Python version. In that case, the DNS header class consisted of 6 integer attributes, and the DNS question class had 2 integer attributes and a string (the domain we want to resolve). The main thing that needed to happen with the values contained by these objects was to translate them into one long hex sequence, with each integer attribute represented as a two-byte hex value. I was about to start implementing an integer-to-hex function so that I could encode my header and question structs to hexstrings, but then I stopped, since I realized that I’d better be damn sure that whatever C libraries I’ll be using are expecting the same information in the same format. My understanding is that these protocols are strictly defined, so I’m not expecting something wildly different. All the same, I do want to be sure that the C socket API wants, for instance, this kind of hexstring. That’s when I realized this translation would not be so straightforward after all. That’s also when I began consulting Beej’s Guide to Network Programming. Along with a few friends also interested in this project, I made several small-step-but-giant-leaps forward today with Beej’s guide on our side. Our rough plan for a DNS resolver that attempts to circumvent higher-level abstractions like getaddreinfo() looks like this (which basically mirrors the Python tutorial that is our model): open a socket with socket() connect to it with connect() send stuff with sendto() get stuff with recv() examine it and decide what to do next close socket with close() We sorted out how socket() works. But our next obstacle came when we started thinking through how to send stuff with sendto(). In the Python tutorial, the high level process looks like this: import socket query = build_query(&quot;www.example.com&quot;, 1) sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM) sock.sendto(query, (&quot;8.8.8.8&quot;, 53)) response, _ = sock.recvfrom(1024) For the moment let’s ignore the build_query() part, which I’m anticipating will be the bulk of the work. Instead I’m focused on the sock.sendto() line, in particular the part where we specify that we’d like to send our query to Google’s DNS server (“8.8.8.8”) on port 53. When we use C’s version of sendto(), we will be passing to it our socket descriptor (returned when we open our socket with socket()) a buffer string (the content we’re sending – presumably something like the header and question combo alluded to above), the length of our query in bytes, and then a pointer to a sockaddr struct (or its equivalent sockaddr_in) and its size. This last part, we realized, is where we are specifying where and how we want to send our stuff – in this case, to “8.8.8.8” on port 53. Now we’re getting somewhere! And in that sense, this sockaddr_in struct seems key to unlocking this small part of the puzzle. Here’s what the struct looks like: struct sockaddr_in { short sin_family; unsigned short sin_port; struct in_addr sin_addr; char sin_zero[8]; }; struct in_addr { unsigned long s_addr; }; This sockaddr_in structure holds an address family (in our case AF_INET), a port number, and an IPv4 address, AKA one of those things that looks like 142.250.176.14, which is four one-byte integers strung together (i.e., each of the four component parts can range from 0-255). But what’s interesting is that this sin_addr component is not some string of four integers or something; it’s a single unsigned long, which I believe is either a 4-byte or 8-byte integer. Either way it’s a big-ass number. That means we need to translate our address’s component parts to one-byte bytestrings, concatenate them together into a single 4-byte string, and then interpret that as a single integer. For example, if we replace each one-byte integer component of 142.250.176.14 with binary, we’d get 10001110 11111010 10110000 00001110 and if we concatenate those together into a single 4-byte binary value: 10001110111110101011000000001110 which corresponds to the integer 2,398,793,742 Now, I believe there is a convenient way to tranlsate a human-readable IPv4 address to an integer and vice-versa with inet_pton(), and we very well may avail ourselves of this convenience. But why do that when we could just . . . implement it ourselves?? In which case, ultimately what we’re going to need to do is take an IPv4 address (“8.8.8.8”), parse it into four bytestrings, concatenate the result into a single 4-byte bytestring, and return it as an integer. Then we’ll have that component part of our sockaddr_in struct and, in turn, a component of what we’ll need to pass along to sendto(). And with that, I present to you a fun REPL implementation of btoi(), our friendly bitstring-to-integer parser which will handle that last step of translating a 4-byte bytestring to an unsigned, 32-bit integer. #include &amp;lt;stdio.h&amp;gt; #include &amp;lt;stdint.h&amp;gt; /* convert bitstring[32] to 32-bit uint */ uint32_t btoi(char *binstr, int len) { int i; uint32_t result; result = 0; for (i=0; i&amp;lt;len; ++i) if (binstr[i] == '1') result += 1 &amp;lt;&amp;lt; (len - 1 - i); return result; } int main() { char c, binstr[32]; int i = 0; while ((c = getchar()) != EOF) { if (c != '\n') { if (i &amp;lt; 32) // add `c` to `binstr` and increment `i` while `i` &amp;lt; 32 binstr[i++] = c; } else { // print result printf(&quot;-&amp;gt; %u\n&quot;, btoi(binstr, i)); // reset all `binstr` elements to null for (; i&amp;gt;0; --i) binstr[i] = '\0'; } } return 0; } You know what? I’m proud of this little function. First because it handles strings less than and greater than 32 characters in length, no problem. Any characters in excess of 32 just never get added to the binstr array in the first place. And for strings less than 32 characters in length, the len parameter of btoi() ensures that the most significant bit is at place len (not at place 32!). And second because I landed on what feels like a clever solution to resetting all the binstr elements to null at the same time as I reset counter i, which I accomplish by decrementing i back to 0 and setting binstr[i] to the null string at each step. That way, I ensure there’s no garbage left over in any of those memory locations. Here it is in action: &amp;gt; 10011 19 &amp;gt; 001 1 &amp;gt; 100 4 &amp;gt; 100110110100101010110101011110101001011011 2605364602 The next step is to do the first part of the parsing operation, namely walking through the IPv4 address, extracting each one-byte integer, translating it to a bytestring, and concatenating the results to pass to btoi(). And at that point we will be well on our way to building a sockaddr_in struct, which means we’ll be one step closer to making a connection.</summary></entry><entry><title type="html">RC35. Preliminary Tinkerings with the OpenAI API</title><link href="https://www.datadoodad.com/recurse%20center/RC35/" rel="alternate" type="text/html" title="RC35. Preliminary Tinkerings with the OpenAI API" /><published>2023-07-03T00:00:00-07:00</published><updated>2023-07-03T00:00:00-07:00</updated><id>https://www.datadoodad.com/recurse%20center/RC35</id><content type="html" xml:base="https://www.datadoodad.com/recurse%20center/RC35/">&lt;style type=&quot;text/css&quot;&gt;
.highlight pre{
  white-space: pre-wrap;
}

&lt;/style&gt;

&lt;p&gt;I’ve been curious to do a little tinkering with the OpenAI API since early on in batch, when a fellow recurser casually demo’ed a GPT pipeline and suddenly all was demystified, but I never got around to it. And then last week another recurser showed off a project involving multiple LLM and stable diffusion pipelines. So today I let loose.&lt;/p&gt;

&lt;p&gt;Maybe it would be cool, I thought, to run a Google search and then scrape each of the results and have GPT summarize the text from each hit? This is redundant on the one hand, since, if it’s summaries of popular or relevant websites that you want, just asking ChatGPT would be easier. But what if you wanted GPT to be able to summarize more recent material from after it was trained? I figured this way you can run your search and get a quick survey of Google’s top hits before deciding whether to click through. As an added bonus, this prototype preserves the source of the summary in the form of the origin URL, so you have a citation if you want (I do).&lt;/p&gt;

&lt;p&gt;Totalling a couple dozen lines of code, this program&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;takes as a CLI argument the search query&lt;/li&gt;
  &lt;li&gt;uses &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;requests&lt;/code&gt; to get the JSON-formatted results of the search using the Google API&lt;/li&gt;
  &lt;li&gt;parses the JSON response and loops through each of the hits, extracting the text with BeautifulSoup&lt;/li&gt;
  &lt;li&gt;passes the lightly cleaned page text to the OpenAI API along with some system messages specifying a request for a summary of the page text in light of the original query&lt;/li&gt;
  &lt;li&gt;prints GPT-3.5’s results to stdout&lt;/li&gt;
&lt;/ul&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&amp;gt; python3 search.py &quot;fourth of july 2023 los angeles&quot;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Result 1:
65+ 4th of July Celebrations in Los Angeles 2023
https://momsla.com/best-fourth-of-july-celebrations-in-los-angeles/
In 2023, there are several Fourth of July celebrations happening in Los Angeles. One of the largest free parties is the Gloria Molina Grand Park's 4th of July Block Party, which includes live music, a digital playground for children, and a drone show instead of traditional fireworks. The Los Angeles Dodgers will also be playing a game against the Pittsburgh Pirates, followed by a fireworks show. Universal Studios Hollywood will have a July 4th Fireworks Spectacular included with admission tickets, and the Hollywood Bowl will have a special performance by The Beach Boys with fireworks. Disneyland will have its own celebration, and LEGOLAND California will have special activities and a fireworks show. Other events include Councilman Bob Blumenfield's July 4th Extravaganza in Woodland Hills and various events in the Conejo Valley and Thousand Oaks area.

Result 2:
The Best Fourth of July Events in Los Angeles | Discover Los Angeles
https://www.discoverlosangeles.com/things-to-do/the-best-fourth-of-july-events-in-los-angeles
In Los Angeles, there are several exciting Fourth of July events to celebrate Independence Day. One of the signature events is the Fourth of July Block Party at Grand Park, which features food, music, and performances by artists like Maya Jupiter and DJ Ethos. The block party also includes a dazzling drone show over The Music Center Plaza. Another option is La Lo La Rooftop at the AC Hotel DTLA, which is hosting &quot;A Celebration of the California Cowboy&quot; with a family-style Santa Maria BBQ, live music, and Red, White &amp;amp; Blue Sangria. Dodger Stadium offers Friday Night Fireworks after select home games, including on July 4th when the Dodgers host the Pittsburgh Pirates. Alamo Drafthouse in Downtown LA is screening the classic film &quot;Jaws&quot; on July 4th, complete with interactive experiences and themed drinks. Universal Studios Hollywood celebrates with a fireworks display, live music performances, and specially themed décor. Additionally, the Hollywood Bowl is hosting The Beach Boys

Result 6:
4th of July Fireworks in Los Angeles &amp;amp; Best Spots to See Them in 2023
https://www.timeout.com/los-angeles/things-to-do/4th-of-july-events-where-to-see-fireworks-in-la
In 2023, there are several places in Los Angeles where you can see Fourth of July fireworks. One popular spot is the Rose Bowl in Pasadena, where an MLS match between the L.A. Galaxy and LAFC will be followed by a fireworks show. While tickets for the match may be sold out, you can still watch the fireworks from the areas around the Rose Bowl. Another great location is Marina del Rey, where fireworks explode over the marina channel. Spectators can gather at Burton Chace Park, Fisherman's Village, Marina &quot;Mother's&quot; Beach, waterfront hotels and restaurants, and even on boats to enjoy the show. Keep in mind that these locations may be crowded, so plan accordingly.

Result 7:
July Fourth Fireworks Spectacular with The Beach Boys | Hollywood ...
https://www.hollywoodbowl.com/events/performances/2277/2023-07-04/july-fourth-fireworks-spectacular-with-the-beach-boys
On July 4, 2023, the Hollywood Bowl in Los Angeles will be hosting the July Fourth Fireworks Spectacular with The Beach Boys. This event is a favorite summer tradition in LA and will feature The Beach Boys performing their chart-topping hits such as &quot;Surfin' USA,&quot; &quot;I Get Around,&quot; and &quot;Fun, Fun, Fun.&quot; The Hollywood Bowl Orchestra, conducted by Thomas Wilkins, will accompany the band, and there will be a spectacular fireworks display to end the night. The event will take place at 7:30 PM, and there are discounted tickets available for children aged 12 and under.

Result 9:
Celebrate Independence Day Safely at a Public Fireworks Display ...
https://www.lafd.org/news/celebrate-independence-day-safely-public-fireworks-display
On July 4, 2023, there will be several professional fireworks shows in Los Angeles. Some of the locations include the Los Angeles Civic Center - Grand Park 4th of July Block Party, Dodger Stadium, Hollywood Bowl, Marina del Rey, Pacific Palisades, Porter Ranch, San Pedro, and Woodland Hills. It is important to note that all fireworks, including &quot;safe and sane&quot; varieties, are illegal for personal use in the City of Los Angeles. Attending a professional fireworks show is the safest and smartest way to celebrate Independence Day. For more information on public admission, parking, and other details, it is recommended to contact the specific venue or organization hosting the show.

Result 10:
Where to See 4th of July Fireworks in Los Angeles 2023
https://mommypoppins.com/los-angeles-kids/4th-of-july-fireworks-los-angeles-orange-county
There are several places in Los Angeles where you can see fireworks on the Fourth of July in 2023. Some popular locations include the Rose Bowl in Pasadena, where they host an annual fireworks show, and the Hollywood Bowl, which also puts on a spectacular display. Other options include the Grand Park in downtown LA, where they have a free Fourth of July Block Party with live music and fireworks, and the Queen Mary in Long Beach, which hosts a fireworks show over the water. Additionally, many cities in the surrounding area, such as Santa Monica and Marina del Rey, also have their own fireworks displays.
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;I mean… Am I ever going to use this to perform a search? Of any kind? Nope. It’s definitely tedious and annoying and not worth the pennies each search costs. But, the exercise served its purpose, and my toes are now wet.&lt;/p&gt;

&lt;p&gt;Next up: experimenting with ways of cumulatively summarizing large texts using GPT.&lt;/p&gt;</content><author><name>Zach Rottman, PhD</name></author><category term="Recurse Center" /><category term="GPT" /><summary type="html">I’ve been curious to do a little tinkering with the OpenAI API since early on in batch, when a fellow recurser casually demo’ed a GPT pipeline and suddenly all was demystified, but I never got around to it. And then last week another recurser showed off a project involving multiple LLM and stable diffusion pipelines. So today I let loose. Maybe it would be cool, I thought, to run a Google search and then scrape each of the results and have GPT summarize the text from each hit? This is redundant on the one hand, since, if it’s summaries of popular or relevant websites that you want, just asking ChatGPT would be easier. But what if you wanted GPT to be able to summarize more recent material from after it was trained? I figured this way you can run your search and get a quick survey of Google’s top hits before deciding whether to click through. As an added bonus, this prototype preserves the source of the summary in the form of the origin URL, so you have a citation if you want (I do). Totalling a couple dozen lines of code, this program takes as a CLI argument the search query uses requests to get the JSON-formatted results of the search using the Google API parses the JSON response and loops through each of the hits, extracting the text with BeautifulSoup passes the lightly cleaned page text to the OpenAI API along with some system messages specifying a request for a summary of the page text in light of the original query prints GPT-3.5’s results to stdout &amp;gt; python3 search.py &quot;fourth of july 2023 los angeles&quot; Result 1: 65+ 4th of July Celebrations in Los Angeles 2023 https://momsla.com/best-fourth-of-july-celebrations-in-los-angeles/ In 2023, there are several Fourth of July celebrations happening in Los Angeles. One of the largest free parties is the Gloria Molina Grand Park's 4th of July Block Party, which includes live music, a digital playground for children, and a drone show instead of traditional fireworks. The Los Angeles Dodgers will also be playing a game against the Pittsburgh Pirates, followed by a fireworks show. Universal Studios Hollywood will have a July 4th Fireworks Spectacular included with admission tickets, and the Hollywood Bowl will have a special performance by The Beach Boys with fireworks. Disneyland will have its own celebration, and LEGOLAND California will have special activities and a fireworks show. Other events include Councilman Bob Blumenfield's July 4th Extravaganza in Woodland Hills and various events in the Conejo Valley and Thousand Oaks area. Result 2: The Best Fourth of July Events in Los Angeles | Discover Los Angeles https://www.discoverlosangeles.com/things-to-do/the-best-fourth-of-july-events-in-los-angeles In Los Angeles, there are several exciting Fourth of July events to celebrate Independence Day. One of the signature events is the Fourth of July Block Party at Grand Park, which features food, music, and performances by artists like Maya Jupiter and DJ Ethos. The block party also includes a dazzling drone show over The Music Center Plaza. Another option is La Lo La Rooftop at the AC Hotel DTLA, which is hosting &quot;A Celebration of the California Cowboy&quot; with a family-style Santa Maria BBQ, live music, and Red, White &amp;amp; Blue Sangria. Dodger Stadium offers Friday Night Fireworks after select home games, including on July 4th when the Dodgers host the Pittsburgh Pirates. Alamo Drafthouse in Downtown LA is screening the classic film &quot;Jaws&quot; on July 4th, complete with interactive experiences and themed drinks. Universal Studios Hollywood celebrates with a fireworks display, live music performances, and specially themed décor. Additionally, the Hollywood Bowl is hosting The Beach Boys Result 6: 4th of July Fireworks in Los Angeles &amp;amp; Best Spots to See Them in 2023 https://www.timeout.com/los-angeles/things-to-do/4th-of-july-events-where-to-see-fireworks-in-la In 2023, there are several places in Los Angeles where you can see Fourth of July fireworks. One popular spot is the Rose Bowl in Pasadena, where an MLS match between the L.A. Galaxy and LAFC will be followed by a fireworks show. While tickets for the match may be sold out, you can still watch the fireworks from the areas around the Rose Bowl. Another great location is Marina del Rey, where fireworks explode over the marina channel. Spectators can gather at Burton Chace Park, Fisherman's Village, Marina &quot;Mother's&quot; Beach, waterfront hotels and restaurants, and even on boats to enjoy the show. Keep in mind that these locations may be crowded, so plan accordingly. Result 7: July Fourth Fireworks Spectacular with The Beach Boys | Hollywood ... https://www.hollywoodbowl.com/events/performances/2277/2023-07-04/july-fourth-fireworks-spectacular-with-the-beach-boys On July 4, 2023, the Hollywood Bowl in Los Angeles will be hosting the July Fourth Fireworks Spectacular with The Beach Boys. This event is a favorite summer tradition in LA and will feature The Beach Boys performing their chart-topping hits such as &quot;Surfin' USA,&quot; &quot;I Get Around,&quot; and &quot;Fun, Fun, Fun.&quot; The Hollywood Bowl Orchestra, conducted by Thomas Wilkins, will accompany the band, and there will be a spectacular fireworks display to end the night. The event will take place at 7:30 PM, and there are discounted tickets available for children aged 12 and under. Result 9: Celebrate Independence Day Safely at a Public Fireworks Display ... https://www.lafd.org/news/celebrate-independence-day-safely-public-fireworks-display On July 4, 2023, there will be several professional fireworks shows in Los Angeles. Some of the locations include the Los Angeles Civic Center - Grand Park 4th of July Block Party, Dodger Stadium, Hollywood Bowl, Marina del Rey, Pacific Palisades, Porter Ranch, San Pedro, and Woodland Hills. It is important to note that all fireworks, including &quot;safe and sane&quot; varieties, are illegal for personal use in the City of Los Angeles. Attending a professional fireworks show is the safest and smartest way to celebrate Independence Day. For more information on public admission, parking, and other details, it is recommended to contact the specific venue or organization hosting the show. Result 10: Where to See 4th of July Fireworks in Los Angeles 2023 https://mommypoppins.com/los-angeles-kids/4th-of-july-fireworks-los-angeles-orange-county There are several places in Los Angeles where you can see fireworks on the Fourth of July in 2023. Some popular locations include the Rose Bowl in Pasadena, where they host an annual fireworks show, and the Hollywood Bowl, which also puts on a spectacular display. Other options include the Grand Park in downtown LA, where they have a free Fourth of July Block Party with live music and fireworks, and the Queen Mary in Long Beach, which hosts a fireworks show over the water. Additionally, many cities in the surrounding area, such as Santa Monica and Marina del Rey, also have their own fireworks displays. I mean… Am I ever going to use this to perform a search? Of any kind? Nope. It’s definitely tedious and annoying and not worth the pennies each search costs. But, the exercise served its purpose, and my toes are now wet. Next up: experimenting with ways of cumulatively summarizing large texts using GPT.</summary></entry><entry><title type="html">RC32. Where the fear (of pointers) has gone there will be nothing.</title><link href="https://www.datadoodad.com/recurse%20center/RC32/" rel="alternate" type="text/html" title="RC32. Where the fear (of pointers) has gone there will be nothing." /><published>2023-06-28T00:00:00-07:00</published><updated>2023-06-28T00:00:00-07:00</updated><id>https://www.datadoodad.com/recurse%20center/RC32</id><content type="html" xml:base="https://www.datadoodad.com/recurse%20center/RC32/">&lt;style type=&quot;text/css&quot;&gt;
  table, tr, td {
    border: none;
    text-align: center;
    font-weight: bold;
  }

  table {
    margin: 10px auto;
  }

  .register {
    border: 1px solid black;
    /*background-color: white;*/
    width: 100px;
    height: 30px;
    background-color: #CDD6D0;
  }

  .address { 
    color: #D6A99A; 
  }

  .variable {
    color: #E16036;
  }

&lt;/style&gt;

&lt;p&gt;I’m scared of pointers, okay?&lt;/p&gt;

&lt;!--more--&gt;

&lt;figure&gt;
&lt;img src=&quot;/assets/images/RC32_evilpointer-1.png&quot; alt=&quot;Evil Pointer&quot; /&gt;
&lt;figcaption align=&quot;center&quot;&gt;Evil pointer&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;There, I said it!&lt;/p&gt;

&lt;p&gt;I got into whole damned business by building websites and stuff – HTML, maybe some PHP, a little JavaScript, a relational database or two if I was lucky. Then more recently Python has been my daily driver. Python, okay? Understand? So now I’m looking at a bunch of C code filled with ampersands and asterisks, and I’m weighing the possibility that my computer might, I don’t know, melt or explode or something on compilation. I’m scared, I tell you!&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/RC32_winkies.gif&quot; alt=&quot;Winkies&quot; width=&quot;100%&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Ah but as young Paul Atreides once said, “Fear is the mind-killer.”&lt;/p&gt;

&lt;p&gt;Alright Muad’Dib, time to face the fear and permit it to pass over me and through me and all that.&lt;/p&gt;

&lt;h1 id=&quot;the-box-of-pain&quot;&gt;The Box of Pain&lt;/h1&gt;

&lt;p&gt;The concept of a register having a particular address where it stores some value is actually familiar thanks to my time with nand2tetris these past 6 weeks. Nevertheless, it strikes me as a great deal of power to wield, having this kind of direct access to computer memory and the sacred knowledge of where, precisely, the very stuff of software resides. So, to ease into the whole thing I created my own version of a simple swap function while following K&amp;amp;R.&lt;/p&gt;

&lt;figure&gt;
&lt;img src=&quot;/assets/images/RC32_evilpointer-2.png&quot; alt=&quot;Evil Pointer&quot; /&gt;
&lt;figcaption align=&quot;center&quot;&gt;Another evil pointer&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;Here’s something that will not work.&lt;/p&gt;

&lt;div class=&quot;language-c highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;swap&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tmp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;tmp&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tmp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;y&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;8&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;printf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;Before swap: x = %d, y = %d&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\n&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;swap&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;printf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;After swap:  x = %d, y = %d&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\n&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Before swap: x = 100, y = 200
After swap:  x = 100, y = 200
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Here, I’ve created a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;swap()&lt;/code&gt; function that takes two integers, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;a&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;b&lt;/code&gt;, as arguments. After assigning the value at &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;a&lt;/code&gt; to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tmp&lt;/code&gt;, I assign the value of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;b&lt;/code&gt; to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;a&lt;/code&gt; and then &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tmp&lt;/code&gt; (the value formerly known as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;a&lt;/code&gt;) to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;b&lt;/code&gt;. Seems promising, but it doesn’t work, since calling &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;swap(x, y)&lt;/code&gt; passes copies of the values, and not the values themselves, to the function. So &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tmp&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;a&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;b&lt;/code&gt; indeed do their little dance together, but &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;y&lt;/code&gt; are never affected.&lt;/p&gt;

&lt;p&gt;As an alternative, we can pass not &lt;em&gt;values&lt;/em&gt; to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;swap&lt;/code&gt; but the &lt;em&gt;addresses&lt;/em&gt; where those values are stored, and from the body of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;swap()&lt;/code&gt; shuffle the things in these addresses around.&lt;/p&gt;

&lt;div class=&quot;language-c highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;swap&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;px&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;py&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tmp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;tmp&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;px&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;px&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;py&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;py&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tmp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;y&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;8&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;printf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;Before swap: x = %d, y = %d&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\n&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;swap&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;printf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;After swap:  x = %d, y = %d&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\n&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Before swap: x = 100, y = 200
After swap:  x = 200, y = 100
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Here, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;amp;x&lt;/code&gt; is saying “the address of the register where the integer ‘x’ is.” Same with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;amp;y&lt;/code&gt;. This is called &lt;em&gt;referencing&lt;/em&gt;, I suppose since we’re retrieving not the values assigned to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;y&lt;/code&gt; but their references.&lt;/p&gt;

&lt;p&gt;Next, we’re passing these pointers to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;swap()&lt;/code&gt; as arguments &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;*px&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;*py&lt;/code&gt; (‘p’ for pointer). &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;*px&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;*py&lt;/code&gt; are just addresses that &lt;em&gt;point to&lt;/em&gt; two places in memory.&lt;/p&gt;

&lt;p&gt;In the body of the function, when we assign &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;*px&lt;/code&gt; to a new integer variable called &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tmp&lt;/code&gt;, what we’re doing is saying “hey, get me the thing at this address &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;*px&lt;/code&gt; and assign it to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tmp&lt;/code&gt;, got it?” This is called &lt;em&gt;dereferencing&lt;/em&gt;. We continue to deref stuff: “Hey, also, go ahead and assign the integer at address &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;*py&lt;/code&gt; to address &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;*px&lt;/code&gt;,” we say, “and while you’re at it do me a favor and put that integer &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tmp&lt;/code&gt; at address &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;*py&lt;/code&gt;. Thanks!”&lt;/p&gt;

&lt;p&gt;Now when we call &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;swap()&lt;/code&gt;, we’re not manipulating copies of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;y&lt;/code&gt;; instead, we have direct access to their locations in memory, and we’re using that knowledge to move things from one box to another.&lt;/p&gt;

&lt;h1 id=&quot;i-must-not-fear&quot;&gt;I must not fear&lt;/h1&gt;

&lt;figure&gt;
&lt;img src=&quot;/assets/images/RC32_evilpointer-3.png&quot; alt=&quot;Evil Pointer&quot; /&gt;
&lt;figcaption align=&quot;center&quot;&gt;Another!&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;Okay, so, here’s a simple diagram of what is happening here (I think).&lt;/p&gt;

&lt;p&gt;Here’s my RAM.&lt;/p&gt;

&lt;table&gt;
  &lt;tr&gt;
    &lt;td class=&quot;address&quot;&gt;0&lt;/td&gt;
    &lt;td class=&quot;register&quot;&gt;3&lt;/td&gt;
    &lt;td class=&quot;variable&quot;&gt;x&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td class=&quot;address&quot;&gt;1&lt;/td&gt;
    &lt;td class=&quot;register&quot;&gt;8&lt;/td&gt;
    &lt;td class=&quot;variable&quot;&gt;y&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td class=&quot;address&quot;&gt;2&lt;/td&gt;
    &lt;td class=&quot;register&quot;&gt;&lt;/td&gt;
    &lt;td class=&quot;variable&quot;&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td class=&quot;address&quot;&gt;3&lt;/td&gt;
    &lt;td class=&quot;register&quot;&gt;&lt;/td&gt;
    &lt;td class=&quot;variable&quot;&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td class=&quot;address&quot;&gt;4&lt;/td&gt;
    &lt;td class=&quot;register&quot;&gt;&lt;/td&gt;
    &lt;td class=&quot;variable&quot;&gt;&lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;

&lt;p&gt;It has 5 (!) memory registers, with addresses ranging from 0 to 4. I’ve assigned the integer value 3 to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x&lt;/code&gt;, which the compiler stuck at address &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;@0&lt;/code&gt;, which presumably was free, and I’ve assigned the integer value 8 to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;y&lt;/code&gt;, which the compiler stuck at the next available address, which in this case is &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;@1&lt;/code&gt;. Great!&lt;/p&gt;

&lt;p&gt;Next order of business: let’s pass the &lt;em&gt;addresses&lt;/em&gt; for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;y&lt;/code&gt; to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;swap()&lt;/code&gt; like so: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;swap(&amp;amp;x, &amp;amp;y)&lt;/code&gt;. Once we’re inside swap, we’ll initialize a new integer, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tmp&lt;/code&gt;, which the compiler conveniently puts at the next available place in RAM. (Note that there’s no value stored there yet.)&lt;/p&gt;

&lt;table&gt;
  &lt;tr&gt;
    &lt;td class=&quot;address&quot;&gt;0&lt;/td&gt;
    &lt;td class=&quot;register&quot;&gt;3&lt;/td&gt;
    &lt;td class=&quot;variable&quot;&gt;x&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td class=&quot;address&quot;&gt;1&lt;/td&gt;
    &lt;td class=&quot;register&quot;&gt;8&lt;/td&gt;
    &lt;td class=&quot;variable&quot;&gt;y&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td class=&quot;address&quot;&gt;2&lt;/td&gt;
    &lt;td class=&quot;register&quot;&gt;&lt;/td&gt;
    &lt;td class=&quot;variable&quot;&gt;tmp&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td class=&quot;address&quot;&gt;3&lt;/td&gt;
    &lt;td class=&quot;register&quot;&gt;&lt;/td&gt;
    &lt;td class=&quot;variable&quot;&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td class=&quot;address&quot;&gt;4&lt;/td&gt;
    &lt;td class=&quot;register&quot;&gt;&lt;/td&gt;
    &lt;td class=&quot;variable&quot;&gt;&lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;

&lt;p&gt;When we assigned &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;*px&lt;/code&gt; to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tmp&lt;/code&gt;, then, we’re saying we want to assign the value &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;@0&lt;/code&gt; (where &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;*px&lt;/code&gt; is pointing) to wherever &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tmp&lt;/code&gt; is:&lt;/p&gt;

&lt;table&gt;
  &lt;tr&gt;
    &lt;td class=&quot;address&quot;&gt;0&lt;/td&gt;
    &lt;td class=&quot;register&quot;&gt;3&lt;/td&gt;
    &lt;td class=&quot;variable&quot;&gt;x&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td class=&quot;address&quot;&gt;1&lt;/td&gt;
    &lt;td class=&quot;register&quot;&gt;8&lt;/td&gt;
    &lt;td class=&quot;variable&quot;&gt;y&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td class=&quot;address&quot;&gt;2&lt;/td&gt;
    &lt;td class=&quot;register&quot;&gt;3&lt;/td&gt;
    &lt;td class=&quot;variable&quot;&gt;tmp&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td class=&quot;address&quot;&gt;3&lt;/td&gt;
    &lt;td class=&quot;register&quot;&gt;&lt;/td&gt;
    &lt;td class=&quot;variable&quot;&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td class=&quot;address&quot;&gt;4&lt;/td&gt;
    &lt;td class=&quot;register&quot;&gt;&lt;/td&gt;
    &lt;td class=&quot;variable&quot;&gt;&lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;

&lt;p&gt;Then we assign &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;*px = *py&lt;/code&gt;, setting the register &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;@0&lt;/code&gt; to whatever value is on the register &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;@1&lt;/code&gt;.&lt;/p&gt;

&lt;table&gt;
  &lt;tr&gt;
    &lt;td class=&quot;address&quot;&gt;0&lt;/td&gt;
    &lt;td class=&quot;register&quot;&gt;&lt;strike&gt;3&lt;/strike&gt; 8&lt;/td&gt;
    &lt;td class=&quot;variable&quot;&gt;x&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td class=&quot;address&quot;&gt;1&lt;/td&gt;
    &lt;td class=&quot;register&quot;&gt;8&lt;/td&gt;
    &lt;td class=&quot;variable&quot;&gt;y&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td class=&quot;address&quot;&gt;2&lt;/td&gt;
    &lt;td class=&quot;register&quot;&gt;3&lt;/td&gt;
    &lt;td class=&quot;variable&quot;&gt;tmp&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td class=&quot;address&quot;&gt;3&lt;/td&gt;
    &lt;td class=&quot;register&quot;&gt;&lt;/td&gt;
    &lt;td class=&quot;variable&quot;&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td class=&quot;address&quot;&gt;4&lt;/td&gt;
    &lt;td class=&quot;register&quot;&gt;&lt;/td&gt;
    &lt;td class=&quot;variable&quot;&gt;&lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;

&lt;p&gt;Finally, we assign &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;*py = tmp&lt;/code&gt;, which sets register &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;@1&lt;/code&gt; to whatever &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tmp&lt;/code&gt; is.&lt;/p&gt;

&lt;table&gt;
  &lt;tr&gt;
    &lt;td class=&quot;address&quot;&gt;0&lt;/td&gt;
    &lt;td class=&quot;register&quot;&gt;8&lt;/td&gt;
    &lt;td class=&quot;variable&quot;&gt;x&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td class=&quot;address&quot;&gt;1&lt;/td&gt;
    &lt;td class=&quot;register&quot;&gt;&lt;strike&gt;8&lt;/strike&gt; 3&lt;/td&gt;
    &lt;td class=&quot;variable&quot;&gt;y&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td class=&quot;address&quot;&gt;2&lt;/td&gt;
    &lt;td class=&quot;register&quot;&gt;3&lt;/td&gt;
    &lt;td class=&quot;variable&quot;&gt;tmp&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td class=&quot;address&quot;&gt;3&lt;/td&gt;
    &lt;td class=&quot;register&quot;&gt;&lt;/td&gt;
    &lt;td class=&quot;variable&quot;&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td class=&quot;address&quot;&gt;4&lt;/td&gt;
    &lt;td class=&quot;register&quot;&gt;&lt;/td&gt;
    &lt;td class=&quot;variable&quot;&gt;&lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;

&lt;p&gt;Hopefully this is mostly accurate.&lt;/p&gt;

&lt;p&gt;Anyway, what’s cool is that we can actually print out the addresses if we want, so&lt;/p&gt;

&lt;div class=&quot;language-c highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;y&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;8&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;printf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;@x = %p&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\n&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;printf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;@y = %p&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\n&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;will yield&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;@x = 0x7ff7bcc93918
@y = 0x7ff7bcc93914
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;And this is neat for another reason, too, since we can see that &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;y&lt;/code&gt; are being stored 4 addresses away from one another – which makes total sense, since integer types take up, you guessed it, 4 bytes of space! So the compiler seems to be saying, “Cool, you’re declaring two integers here? Each needs four bytes of space, and I’m gonna store them right next to each other – at addresses 140,702,000,953,620 and 140,702,000,953,624 if you want to write it down.”&lt;/p&gt;

&lt;h1 id=&quot;other-stuff&quot;&gt;Other stuff&lt;/h1&gt;

&lt;ul&gt;
  &lt;li&gt;Finished implementing DNS in python with the Implement DNS in a Weekend crew. Spent a little time reimplementing it in C (which is what led me down this pointer hole in the first place), but, well, that might take a while. Definitely bumping into walls and learning a lot.&lt;/li&gt;
  &lt;li&gt;Paired on some C stuff, such as the above, and felt like I’m having one of those tiny steps/giant leaps situations&lt;/li&gt;
  &lt;li&gt;Group meeting and assembler code review for nand2tetris&lt;/li&gt;
  &lt;li&gt;Fun pairing sesh with someone doing wacky unthinkables in P5.js&lt;/li&gt;
&lt;/ul&gt;</content><author><name>Zach Rottman, PhD</name></author><category term="Recurse Center" /><category term="C" /><summary type="html">I’m scared of pointers, okay?</summary></entry></feed>