14 May 2017

How to Setup a Fable-Elmish Project (May 1 2017)

Table of Contents

How to Setup a Fable-Elmish Project (May 1 2017)

This is a short post that details the steps needed to setup a basic project with the following tools:

  1. .NET Core and F#
  2. Fable-Elmish
  3. Working Time Travel Debugger on Firefox and Chrome.

At the end, you should have the sample Fable-Elmish project (showing navigation and counter example) running in a browser with a working time travel debugger.

These instructions are working as of 1 May 2017.

NOTE: If you are already familiar with Fable-Elmish, this post will probably not help you. However, I would greatly appreciate any tips or corrections to improve these instructions.

Setup .NET Core and F#

The first step I took was to setup .NET Core. I downloaded and installed 3 packages, in the given order.

IDE

Install the IDE of your choice. I decided to go with Visual Studio Code and Ionide.

At a minimum, install the following extensions in VS Code to take full advantage of Ionide's F# development environment. - Ionide-FAKE - Ionide-fsharp - Ionide-Paket

There are lots of additional extensions available in VS Code that can make your life easier during the development process. Feel free to experiement!

.NET Core

Install .NET Core from the Microsoft Website. - Pick the appropriate version for your environment. - I did this on a Windows 7 64-bit machine, and the x64 version worked correctly for me. - At the time that I installed it, my version of the SDK was 1.0.3.

Microsoft Build Tools

Install Microsoft Build Tools 2015 from the Microsoft Website.

F# 4.0

Install Microsoft Visual F# 4.0 Tools RTM from the Microsoft Website.

At the time that I downloaded this, F# 4.1 was available. However, I used F# 4.0 based on advice in an issue in the Ionide GitHub project.

Setup Fable-Elmish

Install yarn

Yarn is a Node.js package manager, similar to npm. Fable / Fable-Elmish are using yarn, so you will need to install it.

I already had Node and npm installed, so I installed yarn with the following command.

npm install --global yarn

If you do not have npm, then you can download a yarn installer from the Yarn Website. Please note that I'm not sure if you will need npm as well, so keep in mind that you may need to install it later.

Setup a new project with Fable-Elmish

Setting up Fable-Elmish in a way that you can use it for a new project requires a bit of CLI work. Here are the steps.

  1. Open a command prompt / shell.

  2. Install the Fable-Elmish templates. This is the command to install the templates for Fable-Elmish based on the React framework.

    dotnet new -i "Fable.Template.Elmish.React::*"
  3. Create a new Fable-Elmish project.

    dotnet new fable-elmish-react -n project-name
  4. Go to your new project directory.

    cd project-name
  5. Install your Node-based dependencies. Running yarn without parameters is equivalent to the npm install command.

    yarn
  6. When I ran the commands, I received a warning from yarn about the version of my remotedev dependency. When I looked at package.json, I saw that my dependency was at version ^0.2.7 but my devDependency was at version ^0.2.5.
    • I fixed this situation by changing the devDependency version to ^0.2.7.
  7. Restore your NuGet-based dependencies.

    dotnet restore

Setup debugging environment

Chrome

  1. Install the React Developer Tools from the Chrome Web Store.

  2. Install the Redux DevTools from the Chrome Web Store.

Firefox

  1. Install the React Developer Tools from the Mozilla Add-ons site. When installed, this add-on will be called React Devtools.

  2. Install the Redux DevTools from the Mozilla Add-ons site.

Opera, IE, Safari, etc.

I have not tried these instructions myself, so please take with a grain of salt.

  1. React Developer Tools offers a standalone version for browsers / environments other than Chrome and Firefox. You can find the installation and usage instructions on the react-devtools GitHub Website.

  2. zalmoxisus, the person who has released the Redux DevTools extensions for Chrome and Firefox, has a package called remote-redux-devtools for use with browsers / environments other than Chrome and Firefox. You can find the installation and usage instructions on the remote-redux-devtools GitHub website.

Run the sample

  1. To run the sample, go to the project directory and run the following command.

    dotnet fable npm-run start
  2. Open your browser (I recommend Chrome or Firefox) and go to http://localhost:8080/.

  3. To open the debugger:
    • If you are using Chrome or Firefox, you will see a small green icon. Clicking on it will bring up the Redux DevTools panel / window that will show state changes (if you've taken any actions in your browser).
    • If you decided to use the remote-redux-devtools, you may have you modify the default app. If it doesn't work by default, see the fable-elmish debugger GitHub Website for more details.
  4. If you want to build the project for release, run the following command.

    dotnet fable npm-run build

Next steps

At this point, the basics should be setup and the project is ready for your custom code.

If you are stuck, an excellent resource is the Fable Gitter channel, where a number of Fable and Fable-Elmish developers often hang out.

If you are not familiar with the Elm architecture, that is a good next step.

This default version also does not give you certain often-requested features, such as using Rollup instead of Webpack.

Conclusion

If you are still reading, I hope this guide has helped you setup your first Fable-Elmish project.

If you have any suggestions, tips, or corrections, please leave them as a comment below.

See you next time!

07 May 2017

AOC 2016 - Day 7

Table of Contents

AOC 2016 - Day 7

Problem statement on the Advent of Code website.

Code for this post can be found on GitHub.

Background

  • On Day 1, we located Easter Bunny headquarters.
  • On Day 2, we broke the code to use the bathroom in Easter Bunny HQ.
  • On Day 3, we helped the EBHQ's design department by removing invalid triangles from the list.
  • On Day 4, we figured out which rooms in the information kiosk are valid.
  • On Day 5, we figured out how to get past security doors by calculating the code.
  • On Day 6, we figured out how to communicate with Santa through the Easter Bunnies' jamming technology.

On Day 7, we will figure out which of the EBHQ's computers use encryption on the network.

Problem - Part 1

For Part 1, we need to find the IP addresses (aka computers) that support the TLS (transport-layer snooping) protocol.

We determine this by parsing the IP address and checking for certain patterns.

Solution - Part 1

When I read the problem description, Day 7 sounded very similar, at least in form, to the Day 4 problem.

The problem has multiple parts:

  1. Extract supernet and hypernet sequences from an IP address.
  2. Check for an ABBA pattern.
  3. Check an entire IP address.
  4. Connect the pieces.

Extract supernet and hypernet sequences from an IP address

The first part of the problem is understanding the format of the IP address. IP addresses are strings of characters alternating between being outside and inside square brackets. Here are some examples of IP addresses:

  • abba[mnop]qrst
  • abcd[bddb]xyyx[bddb]xyyx
  • ioxxoj[asdfgh]zxcvbn

This solution, just like the Day 4 solution, stretched and expanded my regular expression skills. I would like to thank Will Boyd for the Regex Storm Regex Tester. This site made it much faster and easier to quickly test a regular expression (and its expected results) against a corpus of test data.

After a number of attempts, I ended up with two sets of regular expressions.

For supernet strings (whose proper name I did not know until Part 2), I used the following strategy:

  • Remove all text between square brackets [].
  • Break the resulting compound string into a list of strings.
  let outerStrings =
    Regex.Replace (str, @"\[\w+\]", " ")
    |> fun s -> s.Split ([|' '|])

For hypernet strings, I used the following strategy:

  • Extract all text that is between square brackets [].
  • Remove the brackets.
  let hypernetStrings =
    Regex.Matches (str, @"\[\w+\]")
    |> Seq.cast
    |> Seq.map (fun (m:Match) -> Regex.Match(m.Value, @"\w+"))
    |> Seq.cast<Match>
    |> Seq.map (fun x -> x.Value)

Check for an ABBA pattern

An ABBA (Autonomous Bridge Bypass Annotation) pattern is a palindrome of exactly 4 characters within a string of length 4 or more.

let isABBA (str:string) =
  str
  |> Seq.windowed 4
  |> Seq.exists (fun s -> s.[0] = s.[3] && s.[1] = s.[2] && s.[0] <> s.[1])

Check an entire IP address

For Part 1, we have two primary requirements.

  • The supernet sequences must contain at least one ABBA.
  • The hypernet sequences must not contain any ABBAs.

The check for this is a one-liner.

  (Seq.exists isABBA outerStrings) && (Seq.exists isABBA hypernetStrings |> not)

Connect the pieces

The final function that puts it all together is responsible for the following.

  • Read the list of IP addresses.
  • Find the addresses that support TLS.
let day7part1 filename =
  File.ReadAllLines filename
  |> Seq.filter supportsTLS
  |> Seq.length

When you run this function, you get 105, which is the correct number of IP addresses that support TLS.

Problem - Part 2

In Part 2, we have to determine which IP addresses support the SSL (super-secret listening) protocol.

The problem for Part 2 keeps the concept of supernet and hypernet sequences, but it changes the checks we need to perform on them. Instead of checking for ABBAs, we have to now check for ABAs and their dependent BABs.

Solution - Part 2

Part 2 introduces two new types of patterns in IP addresses:

  1. ABA
    • Area-Broadcast Accessor
    • Follows the pattern of 'First char', 'Second char', 'First char'
    • E.g. aba, xyx, pup
  2. BAB
    • Byte Allocation Block
    • Contains the same characters as an ABA, but is reversed positions.
    • E.g. bab, yxy, upu

The requirements are as follows.

  1. There must be at least one ABA in the supernet sequences.
  2. There must be at least one BAB in the hypernet sequences.

My strategy for the Part 2 solution has slightly different steps.

  1. Extract supernet and hypernet sequences from an IP address.
  2. Check if a supernet sequence has at least one ABA.
  3. Check if a hypernet sequence has at least one BAB.
  4. Check a single IP address.
  5. Connect the pieces.

Extract supernet and hypernet sequences from an IP address

Extracting the supernet and hypernet sequences from an IP address follows exactly the same logic as the Part 1 problem. I extracted this code into two new functions and refactored the Part 1 code to use these new functions.

/// Extract supernet sequences from an IP address.
let supernetSeq ip =
  Regex.Replace (ip, @"\[\w+\]", " ")
  |> fun s -> s.Split ([|' '|])

/// Extract hypernet sequences from an IP address.
let hypernetSeq ip =
  Regex.Matches (ip, @"\[\w+\]")
  |> Seq.cast<Match>
  |> Seq.map (fun m -> Regex.Match(m.Value, @"\w+"))
  |> Seq.cast<Match>
  |> Seq.map (fun x -> x.Value)

Check if a supernet sequence has at least one ABA

The function hasABA takes a string of any length and tries to find at least one 3-character substring which matches the ABA pattern, which is:

  • The first and third character must be the same.
  • The first and second character must be different.
let hasABA (str:string) =
  str
  |> Seq.windowed 3
  |> Seq.exists (fun x -> x.[0] = x.[2] && x.[0] <> x.[1])

Check if a hypernet sequence has at least one BAB

To check if a hypernet sequence contains a BAB, I decided to use the Set collection. My overall strategy is as follows.

  1. Get all ABAs from the supernet sequences.
  2. Flip the ABAs (i.e. turn them into BABs).
  3. Get all the BABs from the hypernet sequences.
  4. Check if the Sets intersect.

First (and third), here is the code to get ABAs (or BABs) from IP addresses.

let getABAs (str:string) =
  str
  |> Seq.windowed 3
  |> Seq.map (System.String)
  |> Seq.filter hasABA

Second, here is the code to flip an ABA.

let flipABA (str:string) = sprintf "%c%c%c" str.[1] str.[0] str.[1]

Finally, I connected these pieces in the hasBAB function.

let hasBAB abas hnStr =
  let babs =
    hnStr
    |> getABAs
    |> Set.ofSeq
  abas
  |> Set.ofSeq
  |> Set.map flipABA
  |> Set.intersect babs
  |> (Set.isEmpty >> not)

Check a single IP address

The function supportsSSL is responsible for ensuring that the supernet sequences contains at least one ABA and that the hypernet sequences contain at least one BAB.

let supportsSSL line =
  let abas =
    supernetSeq line
    |> Seq.collect getABAs
  (supernetSeq line |> Seq.exists hasABA)
  && (hypernetSeq line |> Seq.exists (hasBAB abas))

Connect the pieces

The last function, day7part2, is responsible for linking the input file to supportsSSL and for calculating the final answer (i.e. the number of IP addresses that support SSL).

let day7part2 filename =
  File.ReadAllLines filename
  |> Seq.filter supportsSSL
  |> Seq.length

Lessons learned

Thanks to Day 7 being a relatively simple problem, my goal here was to expand my knowledge of testing.

  1. "If at first you don't succeed, try, try again." I once again worked on making Expecto and FsCheck work together.
    • As with Day 6, I did not end up using FsCheck - however, this was mainly because I could not get a good generator written for IP addresses.
    • In the end, I used the provided tests to ensure that my code is working correctly.
    • Since the time that I finished Day 7, I actually rewrote the solution to Day 2 using a variety of graph data structures (and corresponding algorithms) and used FsCheck extensively there and with the updated MD5 testing I did. I feel a little more comfortable with FsCheck now, though I am still not completely clear on how to generate restricted data (e.g. positive ints) other than through rather rudimentary methods.

My goal with Day 8 (and beyond) is to start using new platforms (specifically, Fable, Elmish, Suave, etc.). I'm extremely excited to be doing this.

See you next time!

03 May 2017

AOC 2016 - Day 6

Table of Contents

AOC 2016 - Day 6

Problem Statement on Advent of Code website.

Code for this post can be found on GitHub.

Background

  • On Day 1, we located Easter Bunny headquarters.
  • On Day 2, we broke the code to use the bathroom in Easter Bunny HQ.
  • On Day 3, we helped the EBs' design department by removing invalid triangles from the list.
  • On Day 4, we figured out which rooms in the information kiosk are valid.
  • On Day 5, we figured out how to get past security doors by calculating the code.

On Day 6, we will figure out how to communicate with Santa through the Easter Bunnies' jamming technology.

Problem - Part 1

For Part 1, we are given a list of garbled versions of a message that we sent to Santa. Per the protocol, we switched to a simple repetition code to get the message through.

Our goal is to figure out the original message.

Solution - Part 1

Compared to the rather more difficult problem on Day 5, the solution to Part 1 is relatively simple.

The problem states that we have to find the most frequent character in each column and, by stringing these characters together, we will derive the original message.

To solve this problem, I performed three explicit steps.

  1. Calculate the length of the longest string from the list of garbled messages.
    • I did not want to assume that all strings are the same length.
  2. Find the most frequent character for each position from the subset of strings that are long enough for that position.
  3. Combine the characters to construct the final answer.

Calculate length of the longest garbled message

To calculate the length of the longest garbled message, I used a simple function to take the list, map each string message to its length, and then get the largest number.

let longestLength list =
  list
  |> Seq.map (String.length)
  |> Seq.max

Find the most frequent character in a given position

To find the most frequent character in a given position, I performed the following steps.

  1. Filter the list of strings and only retain those strings that are long enough to have a character in the desired position.
  2. Count the number of occurrences of each character.
  3. Sort the list so that the character with the highest number of occurrences is at the head of the list.
  4. Return the character.
let mostFreqentCharAt pos list =
  list
  |> Seq.filter (fun line -> String.length line >= (pos + 1))
  |> Seq.map (Seq.item pos)
  |> Seq.countBy id
  |> Seq.sortByDescending snd
  |> Seq.head
  |> fst

Collect the most frequent characters in each position

The final collection function is responsible for the following steps.

  1. Read the garbled messages from a file.
  2. Find the most frequent character in each position, based on the length of the longest message.
  3. Concatenate the MFCs to get the original message.
let day6part1 filename =
  let lines = File.ReadAllLines filename
  seq {
    for i = 0 to (longestLength lines - 1) do
      yield (mostFreqentCharAt i lines |> string)
  }
  |> Seq.reduce (+)

If you run the above function, you will get tsreykjj, which is the correct answer.

Problem - Part 2

The problem statement for Part 2 only makes a slight change to the problem in Part 1. Instead of the most frequent character, we had agreed to use a modified repetition code when communicating with Santa and actually want the least frequent character in each position.

Unfortunately, I chose not to make the "frequency" function a parameter to the day6part1 function in the original implementation. Because of this, I had two options:

  1. Go back and make the frequency function a parameter so that I can re-use the function.
  2. Implement another version of the function using the new leastFreqentCharAt function.

I chose to go with option 2 for expediency sake. This is because I was experimenting with two other things for the Day 6 project.

  1. Using a F# project and .fs files instead of .fsx files
  2. Using the Expecto testing framework

Solution - Part 2

For the Part 2 solution, I kept the function that finds the longest string in the list of garbled messages. I had to reimplement two functions.

  1. A function to find the least frequent character in a given position.
  2. The collection function to derive the original message.

Find the least frequent character in a given position

This function is very similar to Part 1's implementation, except that my sort now puts items in ascending instead of descending order.

let leastFreqentCharAt pos list =
  list
  |> Seq.filter (fun line -> String.length line >= (pos + 1))
  |> Seq.map (Seq.item pos)
  |> Seq.countBy id
  |> Seq.sortBy snd
  |> Seq.head
  |> fst

The F# library functions for collections truly make it easy to perform the types of transformations requested in this problem.

Collect the least-frequent characters in each position

This collection function is identical in all ways to the Part 1 solution except for the frequency function that it uses.

let day6part2 filename =
  let lines = File.ReadAllLines filename
  seq {
    for i = 0 to (longestLength lines - 1) do
      yield (leastFreqentCharAt i lines |> string)
  }
  |> Seq.reduce (+)

When you run this function, you get hnfbujie which is, in fact, the correct answer.

Considerations for Day 6

When it came to the Day 6 problem, I wanted to try some new things to continue the learning process with F#. To that end, I incorporated two new pieces into my solution.

  1. Writing the code in .fs files as part of a "proper" project that compiles to an executable.
  2. Writing tests using the Expecto testing library.

Using a F# project template

For the most part, using a F# project template and .fs files was seamless. I was already familiar with Paket from previous projects, so the only difference was that I had to invoke my build script when I wanted to test my code.

However, I ran into an extremely frustrating bug that almost caused me to give up on Ionide. For whatever reason, Ionide is unable to understand references to DLLs in FsProj files and, thus, is unable to provide any Intellisense support for downloaded NuGet packages. This problem was so frustrating that I actually downloaded, installed, and used Visual Studio 2017 Community Edition to finish off the Day 6 implementation (which is why the indentation scheme in the source files is different for Day 6 compared to all previous solutions).

Based on some searching and reading, I am not the only person having this problem (1, 2, 3, etc.). I can only hope that the authors are able to resolve the issue soon. Until this problem is fixed, I have stopped using F# projects and .fs files and have gone back to using F# scripts (.fsx files).

Using a testing framework

For Day 6, I did quite a bit of research to find a testing framework that I would be able to use with my solution. I specifically did this with the Day 6 problem because the solutions were relatively simple compared to previous Days.

Here is a list of the frameworks that I considered.

  1. Expecto
  2. Fuchu
  3. xUnit.net
  4. NUnit
  5. FsCheck, without a framework

In the end, I chose to go with Expecto. This was a rather difficult transition for me because I bit off a little more than I could chew - I tried to use both Expecto and FsCheck for the Day 6 problem. In hindsight, this was a terrible idea for my first-time use of Expecto. Long story short, I was unable to get Expecto+FsCheck to work in a reasonable amount of time and settled for implementing the AOC-provided tests in Expecto. My frustration stemmed, in no small part, from a fundamental misunderstanding of FsCheck's purpose and capabilities.

Despite the large amount of frustration I felt while trying to get FsCheck to work, I was able to get basic testing working and even built a simple driver to either run tests (default) or calculate the Part 1 / Part 2 answer.

[<EntryPoint>]
let main argv =
  match argv with
  | [| first |] when first = "p1" ->
    let ans = Solution.day6part1 @"..\SixMain\SixMain\puzzle_input.txt"
    printfn "Day 6 Part 1: %A" ans
    0
  | [| first |] when first = "p2" ->
    let ans = Solution.day6part2 @"..\SixMain\SixMain\puzzle_input.txt"
    printfn "Day 6 Part 2: %A" ans
    0
  | _ -> Tests.runTestsInAssembly defaultConfig argv

My actual tests were also simple and encoded only the AOC-provided tests and results. Here is the Part 1 test.

[<Tests>]
let testsPart1 =
  testList "part1"
    [ testCase "easter" <| fun _ ->
        let answer = "easter"
        let funcAns = Solution.day6part1 @"..\SixMain\SixMain\test_input1.txt"
        Expect.isTrue (answer = funcAns) @"test input answer should be 'easter'" ]

And here is the Part 2 test.

[<Tests>]
let testsPart2 =
  testList "part2"
    [ testCase "advent" <| fun _ ->
        let answer = "advent"
        let funcAns = Solution.day6part2 @"..\SixMain\SixMain\test_input1.txt"
        Expect.isTrue (answer = funcAns) @"test input answer should be 'advent'" ]

Lessons learned

While not strictly F#-related, I learned a few lessons from this exercise.

  1. When incorporating new frameworks, take it one step at a time. It's better to integrate one framework and become comfortable with it before trying to add the next one.
    • Trying to do Expecto+FsCheck while being unfamiliar with both frameworks (and, in FsCheck's case, being unfamiliar with its purpose and limitations) was simply an exercise in frustration.
  2. When dealing with tooling issues, ask for help early and often.
    • When I first encountered the Intellisense issues in Ionide, I thought I had done something wrong (esp. since all GitHub issues I found at first were closed).
    • However, it turns out that people are still experiencing these issues.

See you next time!

29 April 2017

AOC 2016 - Day 5

Table of Contents

AOC 2016 - Day 5

Problem statement on Advent of Code website.

Code for this post can be found on GitHub.

Background

  • On Day 1, we located Easter Bunny headquarters.
  • On Day 2, we broke the code to use the bathroom in Easter Bunny HQ.
  • On Day 3, we helped the EBs' design department by removing invalid triangles from the list.
  • On Day 4, we figured out which rooms in the information kiosk are valid.

On Day 5, we will figure out how to get past security doors by calculating the code.

Problem - Part 1

For Part 1, we are given two pieces of information.

  1. The door ID
  2. An increasing integer index, starting at 0.

For each MD5 hash that starts with 00000 (five 0's), the 6th digit indicates the next character of the password. If we find eight hashes that start with five 0's, that provides the password.

Solution - Part 1

My strategy to solve this problem was two-fold.

  1. Write an MD5 algorithm in F#.
    1. I outlined my solution for a functional MD5 algorithm in F# in a previous blog post.
  2. Write a function to collect the required hashes and construct the final answer.

The vast majority of the functionality for this solution is in the MD5 algorithm, which I won't describe here again.

Collection function

Once I had an MD5 algorithm, I used the Seq module and its library functions to collect the required hashes and calculate the door password.

The strategy for the function is simple.

  1. Start by producing indices starting at 0.
  2. Map all the strings to hashes.
  3. Remove all hashes that don't match the criteria (i.e. hashes that don't start with 00000).
  4. Once we have eight hashes, stop producing indices and get the final answer.
let day5part1input = "cxdnnyjw"

let day5part1criteria = "00000"

let day5part1 input crit (md5sum:string -> string) =
  Seq.initInfinite (fun i -> input + string i)
  |> Seq.map md5sum
  |> Seq.filter (fun m -> m.StartsWith crit)
  |> Seq.take 8
  |> Seq.map (Seq.item 5 >> string)
  |> Seq.reduce ( + )

When I ran the function, I got F77A0E6E, which is the correct answer.

Problem - Part 2

The problem statement for Part 2 still uses the MD5 function, but adds a twist to the password computation. We still have the same information from the problem.

  1. The door ID
  2. An increasing integer index, starting at 0.

We are still only interested in hashes that start with five 0's, i.e. 00000. The sixth character of the hash now indicates the position in the solution and the seventh character is what goes into that position.

If we get multiple hashes that meet the criteria and indicate the same position, all but the first match are ignored. Hashes that meet the criteria but indicate an invalid position, i.e. anything other than 0-7, are also ignored.

Solution - Part 2

To solve Part 2, I wanted to use a similar approach to Part 1. However, I needed to way to determine when the algorithm had collected enough values to form a complete answer.

My overall strategy was as follows.

  1. Start by producing indices starting at 0.
  2. Map all the strings to hashes.
  3. Remove all hashes that don't match the criteria.
  4. Once I have a character for each position of the answer, stop producing indices and get the final answer.

Storing the pieces of the answer

I chose to use a Dictionary<int, string> collection to hold the values that would form the answer. The keys represent the position in the answer and the values are the single-character strings that, when concatenated, would form the answer.

First, I created a value that would hold the relevant pieces.

let answer = Dictionary<int, string> 8

Second, I created an add function that would only add values to the Dictionary if a value for the key was not already present.

let add (d:Dictionary<_,_>) kv =
  if not <| d.ContainsKey(fst kv) then
    d.Add(fst kv, snd kv)
  else ()

Third, I created a notComplete function to check whether I had collected enough values for an answer.

let notComplete (d:Dictionary<_,_>) = d.Count < 8

Collecting the pieces of the answer

Once I had decided on a storage strategy, I wrote a function to add values to the Dictionary until I had enough to get an answer.

The overall strategy for this part of the logic depends on the Seq.takeWhile function. This function will terminate the infinite sequence generator once the Dictionary is "full".

Seq.initInfinite (fun i -> input + string i)
|> Seq.map md5sum
|> Seq.filter
  (fun m ->
    (m.StartsWith crit)
      && (m.[5] = '0' || m.[5] = '1' || m.[5] = '2' || m.[5] = '3'
      || m.[5] = '4' || m.[5] = '5' || m.[5] = '6' || m.[5] = '7'))
|> Seq.map (fun m -> m.[5] |> string |> System.Int32.Parse, m.[6] |> string)
|> Seq.takeWhile (fun _ -> notComplete answer)
|> Seq.iter (fun m -> add answer m)

Producing the answer

As the last step, I pulled the values from the Dictionary and concatenated the strings to get the final answer.

I chose to explicitly pull the values based on key lookups because, while there are library functions to get a collection of the values, I did not see any guarantees about the order that the items would be in.

[
  for i in 0 .. 7 do
    yield answer.[i]
]
|> List.reduce ( + )

When I ran this function, I got back 999828EC which is, in fact, the correct answer.

Some side notes

Mutable vs. Immutable data structures

As you may have noticed, I am using a mutable data structure to store the needed values for the Part 2 solution. At the time, I struggled to find a "pure" functional way of solving the problem but couldn't come up with something off hand (and, to tell you the truth, I didn't try too hard since I had just finished the MD5 algorithm and wanted to be done with this problem).

However, the Part 2 algorithm can easily be rewritten using a tail-recursive function and immutable data structures. The tail-recursive part is important because, while I don't know how many hashes the algorithm has to go through, I do know that even the OOTB MD5 algorithm takes over 2 minutes to calculate the answer (and my functional MD5 algorithm takes over 7 minutes). From that, I assume that the stack would get very deep if the function was not tail-recursive.

Seq vs. ParallelSeq

In order to speed up the calculation, I attempted to use the ParallelSeq library to take advantage of my CPU's multiple cores. Using the ParallelSeq library did speed up my calculations considerably.

However, one thing I learned is that the ParallelSeq library does not take care of putting the calculated values back in their original order. Instead, it is the client's responsibility to track the order of values (if that is important - and in this case it was).

As an example, when I used the ParallelSeq library, the system produced C9E29889 as the answer for Part 2. However, the correct answer for Part 2 is 999828EC, which are the same characters but in the correct order.

In the end, I removed ParallelSeq from this solution but it is a library that I intend to investigate in the future.

Lessons learned

Here are the lessons I learned from this exercise.

  1. Take the time to implement a solution the way you want, so that you do not later have regrets about it. In this instance, I regret choosing a mutable data structure and imperative algorithm to solve Part 2 and it's something I hope to rectify in the near future.
  2. Just as when I implemented the MD5 algorithm in F#, it is critical to be extra vigilant when reading code. Much more so than any other technique, reading and understanding code is the fastest way to find and fix most bugs.

See you next time!

PS: Solution - Part 2 (Pure functional implementation)

After I wrote this blog post, but before posting it, I decided to rewrite the Part 2 solution using immutable data structures.

Rewriting the function was quite simple, and relied on the following strategy.

  1. Define a completion check to terminate the recursive function.
  2. Define a tail-recursive function to calculate the 8 characters of the answer.
  3. Combine the 8 characters to form the final answer.

Here is the code for the pure functional implementation of Part 2, broken into 3 chunks.

let day5part2alt input crit (md5sum : string -> string) =
  let complete l = l |> List.filter ((=) "") |> List.isEmpty

  let rec collector i acc =
    if complete acc then acc
    else
      let m = md5sum (input + string i)
      if (m.StartsWith crit) then
        match m.[5] with
        | '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' ->
          List.mapi (fun idx elt ->
            if elt = "" && idx = (m.[5] |> string |> System.Int32.Parse) then
              string m.[6]
            else elt) acc
          |> collector (i + 1)
        | _ -> collector (i + 1) acc
      else collector (i + 1) acc

  [ ""; ""; ""; ""; ""; ""; ""; "" ]
  |> collector 0
  |> List.reduce (+)

Running this function provides the same answer as above, i.e. 999828EC.

Performance comparison with day5part2

I compared day5part2 and day5part2alt to see whether my new implementation matches up to the previous one.

On average, I received the following results.

day5part2 vs. day5part2alt, raw numbers.
Function Real CPU Gen0 Gen1 Gen2

day5part2

174.758

174.689

21623

18

2

day5part2alt

166.528

166.437

20590

17

1

And here are the same results, converted to percentages.

day5part2 vs. day5part2alt, percentages.
Function Real CPU Gen0 Gen1 Gen2

day5part2

105%

105%

105%

106%

200%

day5part2alt

100%

100%

100%

100%

100%

Ignoring the Gen2 result, the mutable version is actually 5% worse across the board, in both processing time and increased garbage collection. It pays to be immutable.

See you next time!

22 April 2017

MD5 in F#, Functionally

Table of Contents

MD5 in F#, Functionally

Code for this post can be found on GitHub.

The Advent of Code Day 5 problem presented a very interesting challenge - it required an MD5 hash function to get the final answer.

As my goal with the AOC challenges is to increase my knowledge of F#, I chose to:

  • Implement my own version of the MD5 algorithm
  • Implement it functionally (no mutation, if possible)

Out-of-the-box implementation

.NET has an out-of-the-box implementation of MD5, which can be accessed quite easily from F#. It resides in the System.Security.Cryptography namespace. Here is a function that takes in a string and outputs its hash as a string.

let md5ootb (msg: string) =
  use md5 = System.Security.Cryptography.MD5.Create()
  msg
  |> System.Text.Encoding.ASCII.GetBytes
  |> md5.ComputeHash
  |> Seq.map (fun c -> c.ToString("X2"))
  |> Seq.reduce ( + )

This helps immensely in two ways:

  • It makes it very easy to check whether my hash function is working correctly because its output must match the OOTB MD5 algorithm's output.
  • It provides a baseline against which to compare performance.

Planning the implementation

First, let me just say that I have never implemented a "cryptographic" algorithm, such as a hash function, encryption algorithm, etc. I knew that the implementation would be more difficult than a normal algorithm and so I spent a considerable amount of time reading the following resources.

  1. RFC 1321: The MD5 Message-Digest Algorithm
  2. Wikipedia: MD5
  3. Rosetta Code: MD5/Implementation

The first two resources helped me understand the flow of logic. The last resource is a compilation of MD5 implementations collected in one spot and, luckily, written in multiple languages. This gave me different perspectives on how the code can be broken up, alternatives for implementation, etc.

I relied especially heavily on the Java, C#, Python, and Haskell implementations because those are the languages I am most familiar with.

For reference, here is the entire listing from Wikipedia.

'//Note: All variables are unsigned 32 bit and wrap modulo 2^32 when calculating
var int[64] s, K

'//s specifies the per-round shift amounts
s[ 0..15] := { 7, 12, 17, 22,  7, 12, 17, 22,  7, 12, 17, 22,  7, 12, 17, 22 }
s[16..31] := { 5,  9, 14, 20,  5,  9, 14, 20,  5,  9, 14, 20,  5,  9, 14, 20 }
s[32..47] := { 4, 11, 16, 23,  4, 11, 16, 23,  4, 11, 16, 23,  4, 11, 16, 23 }
s[48..63] := { 6, 10, 15, 21,  6, 10, 15, 21,  6, 10, 15, 21,  6, 10, 15, 21 }

'//Use binary integer part of the sines of integers (Radians) as constants:
for i from 0 to 63
    K[i] := floor(2^32 × abs(sin(i + 1)))
end for
'//(Or just use the following precomputed table):
'// ... (I did not use a pre-computed table, so I've removed it for brevity sake)

'//Initialize variables:
var int a0 := 0x67452301   //A
var int b0 := 0xefcdab89   //B
var int c0 := 0x98badcfe   //C
var int d0 := 0x10325476   //D

'//Pre-processing: adding a single 1 bit
append "1" bit to message
'// Notice: the input bytes are considered as bits strings,
'//  where the first bit is the most significant bit of the byte.[48]


'//Pre-processing: padding with zeros
append "0" bit until message length in bits ≡ 448 (mod 512)
append original length in bits mod (2 pow 64) to message


'//Process the message in successive 512-bit chunks:
for each 512-bit chunk of message
    break chunk into sixteen 32-bit words M[j], 0 ≤ j ≤ 15
'//Initialize hash value for this chunk:
    var int A := a0
    var int B := b0
    var int C := c0
    var int D := d0
'//Main loop:
    for i from 0 to 63
        if 0 ≤ i ≤ 15 then
            F := (B and C) or ((not B) and D)
            g := i
        else if 16 ≤ i ≤ 31
            F := (D and B) or ((not D) and C)
            g := (5×i + 1) mod 16
        else if 32 ≤ i ≤ 47
            F := B xor C xor D
            g := (3×i + 5) mod 16
        else if 48 ≤ i ≤ 63
            F := C xor (B or (not D))
            g := (7×i) mod 16
'//Be wary of the below definitions of a,b,c,d
        dTemp := D
        D := C
        C := B
        B := B + leftrotate((A + F + K[i] + M[g]), s[i])
        A := dTemp
    end for
'//Add this chunk's hash to result so far:
    a0 := a0 + A
    b0 := b0 + B
    c0 := c0 + C
    d0 := d0 + D
end for

var char digest[16] := a0 append b0 append c0 append d0 '//(Output is in little-endian)

'//leftrotate function definition
leftrotate (x, c)
    return (x << c) binary or (x >> (32-c));

Considerations

There are a few concerns I had about this implementation.

  1. The algorithm assumes that all values are stored in the little-endian format.
  2. The algorithm requires a bitwise-left-rotate function, which F# does not provide.
  3. How to store a 128-bit hash.

Initially, and I'm not sure why I thought this, I was under the assumption that Intel CPUs use the big-endian format. However, they actually use little-endian, which makes that consideration irrelevant.

It took me an embarrassing amount of time to hunt down a bug where I forgot that the F# left-shift operator <<< is NOT a left-rotate operator. Once I narrowed down on the bug, it was easy enough to implement a function to perform the rotations. However, this was a good lesson in never assuming anything when reading code - my eyes were just glazing over the <<< operator, even though the problem was staring me in the face.

Finally, I chose to store the values as 4 uint32 integers. This is similar to what the RFC authors, Wikipedia authors, and other implementations did on Rosetta Code. While F# does have the uint64 type, it would have required translating all the 32-bit-based pseudo-code / code (introducing an element of risk) and I'm not sure what benefits, if any, it would have provided.

Mapping from Wikipedia to F#

I started my analysis and design phase by taking the pseudo-code from Wikipedia and mapping it to how I wanted to break up my implementation. I tried to use the same names for variables and, for consistency's sake, I ended up using the Wikipedia naming scheme.

Oddly enough, Wikipedia uses different variable names compared to the RFC pseudo-code and I did not find a good reason for why the original writers did this.

Mapping from Wikipedia's pseudo-code to my F# implementation (function and value signatures).
# Wikipedia F#

1

'//s specifies the per-round shift
'//amounts
var int[64] s
val s : int list

2

'//Use binary integer part of the sines
'//of integers (Radians) as constants:
var int[64] K
val k : uint32 list

3

'//Initialize variables:
var int a0
var int b0
var int c0
var int d0
type MD5 =
  {a: uint32;
   b: uint32;
   c: uint32;
   d: uint32;}
val initialMD5 : MD5

4

'//Pre-processing: adding a single 1 bit
append "1" bit to message

'//Pre-processing: padding with zeros
append "0" bit until message length in
bits ≡ 448 (mod 512)

append original length in bits mod (2
pow 64) to message
val padMessage : msg:byte [] -> byte []

5

'//Process the message in successive
'//512-bit chunks:
for each 512-bit chunk of message
// Array.chunkBySize and Array.fold
val md5sum : msg:string -> string

6

'//Initialize hash value for this chunk:
var int A, B, C, D

'//Main loop:
for i from 0 to 63
  '// NOT THE CONTENTS OF THE LOOP
end for

'//Add this chunk's hash to result so
'//far:
a0, b0, c0, d0
// List.fold
val md5plus : m:MD5 -> bs:byte [] -> MD5

7

'//Main loop: (contents)
if ...
else if ...
else if ...
else if ...

dTemp, D, C, B, A
val md5round : msg:uint32 [] -> MD5 ->
i:int -> MD5

// Includes:
val fghi : (uint32 -> uint32 -> uint32
-> uint32) list

val gIdxs : int list

Implementation

Step 1: Define the shift in each of the 64 rounds

On the Wikipedia page, the per-round shift amounts are hard-coded as a 64-value int array.

s[ 0..15] := { 7, 12, 17, 22,  7, 12, 17, 22,  7, 12, 17, 22,  7, 12, 17, 22 }
s[16..31] := { 5,  9, 14, 20,  5,  9, 14, 20,  5,  9, 14, 20,  5,  9, 14, 20 }
s[32..47] := { 4, 11, 16, 23,  4, 11, 16, 23,  4, 11, 16, 23,  4, 11, 16, 23 }
s[48..63] := { 6, 10, 15, 21,  6, 10, 15, 21,  6, 10, 15, 21,  6, 10, 15, 21 }

In F#, I used some convenience functions in the List module to achieve the same result.

let s =
  [[7; 12; 17; 22]; [5; 9; 14; 20]; [4; 11; 16; 23]; [6; 10; 15; 21]]
  |> List.collect (List.replicate 4)
  |> List.concat

val s : int list

Step 2: Calculate the binary integer portion of integer sines

Wikipedia provides two methods of calculating the constants that are based on the sine function. First, here is the description from the RFC.

... constructed from the sine function. Let T[i] denote the i-th element of the table, which is equal to the integer part of 4294967296 times abs(sin(i)), where i is in radians.

Wikipedia's two methods of defining K[i] (Wikipedia's name for T[i]) are as follows:

for i from 0 to 63
    K[i] := floor(2^32 × abs(sin(i + 1)))
end for
'//(Or just use the following precomputed table):
K[ 0.. 3] := { 0xd76aa478, 0xe8c7b756, 0x242070db, 0xc1bdceee }
K[ 4.. 7] := { 0xf57c0faf, 0x4787c62a, 0xa8304613, 0xfd469501 }
K[ 8..11] := { 0x698098d8, 0x8b44f7af, 0xffff5bb1, 0x895cd7be }
K[12..15] := { 0x6b901122, 0xfd987193, 0xa679438e, 0x49b40821 }
K[16..19] := { 0xf61e2562, 0xc040b340, 0x265e5a51, 0xe9b6c7aa }
K[20..23] := { 0xd62f105d, 0x02441453, 0xd8a1e681, 0xe7d3fbc8 }
K[24..27] := { 0x21e1cde6, 0xc33707d6, 0xf4d50d87, 0x455a14ed }
K[28..31] := { 0xa9e3e905, 0xfcefa3f8, 0x676f02d9, 0x8d2a4c8a }
K[32..35] := { 0xfffa3942, 0x8771f681, 0x6d9d6122, 0xfde5380c }
K[36..39] := { 0xa4beea44, 0x4bdecfa9, 0xf6bb4b60, 0xbebfbc70 }
K[40..43] := { 0x289b7ec6, 0xeaa127fa, 0xd4ef3085, 0x04881d05 }
K[44..47] := { 0xd9d4d039, 0xe6db99e5, 0x1fa27cf8, 0xc4ac5665 }
K[48..51] := { 0xf4292244, 0x432aff97, 0xab9423a7, 0xfc93a039 }
K[52..55] := { 0x655b59c3, 0x8f0ccc92, 0xffeff47d, 0x85845dd1 }
K[56..59] := { 0x6fa87e4f, 0xfe2ce6e0, 0xa3014314, 0x4e0811a1 }
K[60..63] := { 0xf7537e82, 0xbd3af235, 0x2ad7d2bb, 0xeb86d391 }

For my solution, I chose to go the function route instead of hard-coding the table into the source code. This was primarily for brevity's sake. A side benefit was that it allowed me to put in a long and oddly satisfying function chain.

let k =
  [1. .. 64.] |> List.map (sin >> abs >> (( * ) (2.**32.)) >> floor >> uint32)

val k : uint32 list

Step 3: Define the types

I defined one new type for this problem, MD5. This record holds an MD5 hash as 4 uint32 values.

type MD5 =
  {
    a : uint32
    b : uint32
    c : uint32
    d : uint32
  }

let initialMD5 =
  {
    a = 0x67452301u
    b = 0xefcdab89u
    c = 0x98badcfeu
    d = 0x10325476u
  }

Step 4: Padding the message

The MD5 algorithm doesn't require a message of a certain length, but instead adds padding so that the final message length is a multiple of 512 bits (or 64 bytes).

There are 3 steps involved in padding:

'//Pre-processing: adding a single 1 bit
append "1" bit to message

'//Pre-processing: padding with zeros
append "0" bit until message length in bits ≡ 448 (mod 512)
append original length in bits mod (2 pow 64) to message

This part actually gave me some trouble because I thought that I had to add one bit (containing 1) to my message in the MSB (most significant bit) slot, and then start appending 0s or the message length immediately after that. That turns out not to be the case, and reading the RFC and other implementations on Rosetta Code provided a good solution to the problem.

Essentially, following the message, we need to add a 0x80 16-bit word followed by the length (restricted to 8 bytes). The intervening space, if any, is filled with 0 bytes.

let padMessage (msg : byte []) =
  let msgLen = Array.length msg
  let msgLenInBits = (uint64 msgLen) * 8UL

  let lastSegmentSize =
    let m = msgLen % 64
    if m = 0 then 64
    else m

  let padLen =
    64 - lastSegmentSize + (if lastSegmentSize >= 56 then 64
                            else 0)

  [|
    yield 128uy
    for i in 2..padLen - 8 do
      yield 0uy
    for i in 0..7 do
      yield ((msgLenInBits >>> (8 * i)) |> byte)
  |]
  |> Array.append msg

val padMessage : msg:byte [] -> byte []

This is the one part of the implementation that had the highest potential to be "impure". Specifically, the part that constructs the padding. However, F# array comprehensions removed the need to mutate an array to construct the padding before it is appended to the msg.

Step 7: The core of the main loop (Yes, I know this is out of order)

The Wikipedia pseudo-code is written in an imperative manner where the main loops are encountered before we get to the meat of the hash construction algorithm. However, in order to explain my construction, I will start by looking at the implementation for a single iteration of the "Main loop", and then build my way out from there.

The Wikipedia pseudo-code for this step is as follows.

        if 0 ≤ i ≤ 15 then
            F := (B and C) or ((not B) and D)
            g := i
        else if 16 ≤ i ≤ 31
            F := (D and B) or ((not D) and C)
            g := (5×i + 1) mod 16
        else if 32 ≤ i ≤ 47
            F := B xor C xor D
            g := (3×i + 5) mod 16
        else if 48 ≤ i ≤ 63
            F := C xor (B or (not D))
            g := (7×i) mod 16
'//Be wary of the below definitions of a,b,c,d
        dTemp := D
        D := C
        C := B
        B := B + leftrotate((A + F + K[i] + M[g]), s[i])
        A := dTemp

In F#, I have three separate constructs that we need to look at. First, I followed the technique used by a few other implementations and collected the functions represented by F in a function list. This allows index-based access to the correct function.

let fxyz x y z : uint32 = (x &&& y) ||| (~~~x &&& z)
let gxyz x y z : uint32 = (z &&& x) ||| (~~~z &&& y)
let hxyz x y z : uint32 = x ^^^ y ^^^ z
let ixyz x y z : uint32 = y ^^^ (x ||| ~~~z)

let fghi =
  [fxyz; gxyz; hxyz; ixyz]
  |> List.collect (List.replicate 16)

val fxyz : x:uint32 -> y:uint32 -> z:uint32 -> uint32
val gxyz : x:uint32 -> y:uint32 -> z:uint32 -> uint32
val hxyz : x:uint32 -> y:uint32 -> z:uint32 -> uint32
val ixyz : x:uint32 -> y:uint32 -> z:uint32 -> uint32
val fghi : (uint32 -> uint32 -> uint32 -> uint32) list

Second, I turned g into a set of functions and, as above, collected the pre-computed indices in a list.

let g1Idx = id
let g2Idx i = (5*i + 1) % 16
let g3Idx i = (3*i + 5) % 16
let g4Idx i = (7*i) % 16

let gIdxs =
  [g1Idx; g2Idx; g3Idx; g4Idx]
  |> List.collect (List.replicate 16)
  |> List.map2 (fun idx func -> func idx) [0..63]

val g1Idx : ('a -> 'a)
val g2Idx : i:int -> int
val g3Idx : i:int -> int
val g4Idx : i:int -> int
val gIdxs : int list

Finally, I implemented a single round of the MD5 computation.

let md5round (msg:uint32[]) {MD5.a=a; MD5.b=b; MD5.c=c; MD5.d=d} i =
  let rotateL32 r x = (x<<<r) ||| (x>>>(32-r))
  let f = fghi.[i] b c d
  let a' = b + (a + f + k.[i] + msg.[gIdxs.[i]] |> rotateL32 s.[i])
  {a=d; b=a'; c=b; d=c}

val md5round : msg:uint32 [] -> MD5 -> i:int -> MD5

The md5round function takes in the computed-so-far MD5 hash and an index, calculates the updated hash, and passes it back to the caller.

Step 6: Hash a single 512-bit chunk

This step actually implements the "Main loop", using the logic we coded for a single iteration of the loop.

Here is Wikipedia's pseudo-code, although I have removed the text (replaced by a comment) that is part of Step 7.

for each 512-bit chunk of message
    break chunk into sixteen 32-bit words M[j], 0 ≤ j ≤ 15
//Initialize hash value for this chunk:
    var int A := a0
    var int B := b0
    var int C := c0
    var int D := d0
//Main loop:
    for i from 0 to 63
        '// SEE STEP 7
    end for
'//Add this chunk's hash to result so far:
    a0 := a0 + A
    b0 := b0 + B
    c0 := c0 + C
    d0 := d0 + D
end for

Note that although Wikipedia uses the labels a0, b0, c0, and d0, these are mutable values that permanently change with each 512-bit chunk that is processed. Thus, each new 512-bit chunk starts with these 4 values that have been changed by all the previous chunks that have been processed.

The flow in this part of the code, without the details from Step 7, can be broken down into 3 parts:

  1. Start with an initial value.
  2. Run an algorithm a limited number of times, wherein each execution of the algorithm uses the results from the previous run as the input.
  3. Take the final value and continue with the overall flow.

This is a perfect scenario for a fold operation. Here is the F# code from my implementation.

let md5plus m (bs:byte[]) =
  let msg =
    bs
    |> Array.chunkBySize 4
    |> Array.take 16
    |> Array.map (fun elt -> System.BitConverter.ToUInt32(elt, 0))
  let m' = List.fold (md5round msg) m [0..63]
  {a=m.a+m'.a; b=m.b+m'.b; c=m.c+m'.c; d=m.d+m'.d}

val md5plus : m:MD5 -> bs:byte [] -> MD5

md5plus expects the caller to pass in the initial value for the fold and the message that must be processed.

md5round operates on a message (that will not change for a single 512-bit chunk), an MD5 value, and an index. Those are the values that md5plus is using for the fold.

At the end, md5plus passes back an updated MD5 hash which incorporates the changes derived from processing the given 512-bit chunk.

Step 5: Process the entire message and derive a single MD5 hash

The last step is to bring together all the individual pieces into a single algorithm that can:

  • Take in a message of any length
  • Pad the message correctly
  • Calculate the MD5 hashes for each 512-bit chunk
  • Produce a single MD5 hash.

Wikipedia's pseudo-code for this step is quite simple.

var char digest[16] := a0 append b0 append c0 append d0
'//(Output is in little-endian)

However, since I built the solution as a set of functions, this final function must do a little more work.

let md5sum (msg: string) =
  System.Text.Encoding.ASCII.GetBytes msg
  |> padMessage
  |> Array.chunkBySize 64
  |> Array.fold md5plus initialMD5
  |> (fun {MD5.a=a; MD5.b=b; MD5.c=c; MD5.d=d} ->
    System.BitConverter.GetBytes a
    |> (fun x -> System.BitConverter.GetBytes b |> Array.append x)
    |> (fun x -> System.BitConverter.GetBytes c |> Array.append x)
    |> (fun x -> System.BitConverter.GetBytes d |> Array.append x))
  |> Array.map (sprintf "%02X")
  |> Array.reduce ( + )

val md5sum : msg:string -> string

The main part of the algorithm is just the first 4 lines:

let md5sum (msg: string) =
  System.Text.Encoding.ASCII.GetBytes msg
  |> padMessage
  |> Array.chunkBySize 64
  |> Array.fold md5plus initialMD5

The rest of the function is devoted to converting the MD5 hash into a user-friendly representation.

Testing

Based on the testing I've done so far, this algorithm works correctly and produces exactly the same results as the OOTB MD5 algorithm.

I first started by implementing the "standard" tests (originally specified in the RFC) and comparing the results with the OOTB algorithm.

Then, I added an FsCheck test that compared my algorithm against the OOTB one for 10,000 tests - all passed without any problems. Now that I've finally learned how to properly use FsCheck at a basic level, I'm starting to find it difficult to completely trust my results unless all the FsCheck tests pass. I don't think that this is necessarily a bad thing, because it tends to make explicit assumptions that I made in my code (e.g. for my MD5 algorithm, I assume that the original message is an actual string and not just a NULL).

Performance

Raw data for performance measurements on GitHub.

I measured performance by using the AOC Day 5, Part 1 problem, which computes thousands of MD5 hashes. Here are the results, derived using F# interactive's #time directive. I ran each test 3 times and averaged the results.

Average run-time and garbage collection performance of the algorithms, in seconds
Algorithm Real CPU Gen0 Gen1 Gen2
OOTB 53.37 53.37 6780.33 6.00 0.67
Functional 172.79 172.76 34754.00 17.67 1.67

Here is the same chart, but with percentages.

Average run-time and garbage collection performance of the algorithms as a percentage, with the OOTB algorithm as the baseline.
Algorithm Real CPU Gen0 Gen1 Gen2
OOTB 100% 100% 100% 100% 100%
Functional 324% 324% 513% 294% 250%

Obviously, my functional version of this algorithm is a LOT worse than the OOTB implementation.

Lessons Learned

  1. Be extremely careful of assumptions when reading code. Each character and symbol should be analyzed to ensure that it is appropriate for the task at hand.
  2. Define the tee function at the beginning of each project because it WILL be useful at some point.
  3. It's been a goal of mine to implement a "cryptographic" algorithm for a few years now, but I never found the time before. I have to say that it was a lot of fun and I can't wait to do it again!
    • Not strictly a lesson, unless that lesson is "have fun with hobby projects".

See you next time!