Clojure Markdown Parsing Benchmarks

Jul 13th, 2020

I am working on setting up a new system for publishing content. I have a few different categories of content that I’m interested in creating. I’ll have to determine exactly what the taxonomy will be, but the broad categories will probably be computers, mountain biking, and more personal stuff including relationships and religion. The first step towards this new system is just to replace the technology behind this website.

This website is currently generated statically using a very old version of Jekyll/Octopress. Static site generation is really nice, but I think I’m going to want to add some more interactive features like small applications. Therefore, I decided to replace this static site generation approach with a Clojure application.

Since these posts are currently all written in markdown and then parsed and rendered into HTML before being served statically via nginx, I wanted to check how expensive it would be to parse and render the markdown into HTML on every page load. To evaluate, I used a couple of handy Clojure libraries – markdown-clj and criterium. Using markdown-clj, it was fairly trivial to replicate the functionality of the markdown processing of Octopress. It even has the ability to parse the metadata at the top of the markdown files. For example, this is the metadata that I have at the top of this post:

---
layout: post
title: "Clojure Markdown Parsing Benchmarks"
date: 2020-07-13 10:45:32 -0700
comments: true
categories: clojure programming
---

To parse that metadata, I simply had to pass in the :parse-meta? true option when parsing, like this:

(md/md-to-html file-name writer :parse-meta? true :reference-links? true)

Then the metadata is parsed nicely into a map for me:

:metadata #ordered/map ([:layout "post"] [:title "Clojure Markdown Parsing Benchmarks"] [:date "2020-07-13 10:45:32 -0700"] [:comments true] [:categories "clojure programming"])

You can see more detail in the source code on github.

Finally, I created an uberjar using lein uberjar, uploaded it to the DigitalOcean machine I intend to use, and ran the benchmark using criterium:

(crit/with-progress-reporting (crit/bench (md/parse-post "posts/2012-03-13-on-the-uncertainty-of-everything.md") :verbose))

Because of this issue, I also had to call flush afterwards to get the output to display correctly. Again, you can see more detail on github.

Once I ran the benchmark, criterium gave me some useful results:

Evaluation count : 3240 in 60 samples of 54 calls.
      Execution time sample mean : 19.447409 ms
             Execution time mean : 19.449686 ms
Execution time sample std-deviation : 909.567764 µs
    Execution time std-deviation : 928.443124 µs
   Execution time lower quantile : 18.431349 ms ( 2.5%)
   Execution time upper quantile : 21.718663 ms (97.5%)
                   Overhead used : 2.936410 ns

Found 5 outliers in 60 samples (8.3333 %)
        low-severe       4 (6.6667 %)
        low-mild         1 (1.6667 %)
 Variance from outliers : 33.6000 % Variance is moderately inflated by outliers

I can see there that it takes about 20ms to parse a typical markdown file for one of my blog posts. That would mean that, ignoring other overhead for serving a webpage, I could serve about 50 pages per second. That seems more than acceptable for the amount of traffic I expect to receive on this blog.
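The estimate above is just the reciprocal of the per-page parse time. As a rough sketch (pages-per-second is a hypothetical helper, not part of the benchmark code, and it assumes pages are rendered one at a time with no other overhead), the mean and the 97.5% quantile from the criterium report work out like this:

```clojure
(defn pages-per-second
  "Sequential throughput when each page costs `ms-per-page` milliseconds."
  [ms-per-page]
  (/ 1000.0 ms-per-page))

;; Mean execution time from the report: a little over 51 pages per second.
(pages-per-second 19.449686)

;; 97.5% quantile from the report: about 46 pages per second.
(pages-per-second 21.718663)
```

Even the slow end of the distribution stays in the mid-40s, so the round figure of 50 pages per second is a fair summary.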

New Habits

Jul 12th, 2020

This blog is ancient, but let’s start a new habit with it. Let’s publish something. Every day. It doesn’t have to be on the blog, but writing is one of the easier things I can do. I also have a bunch of videos on my hard drive that I could edit and upload to YouTube. I just want to be creating something that I can point to every day.

I am actually a bit surprised that I was able to get the tooling for this blog back up and running after all these years. All it took was installing rbenv, an old version of Ruby – 1.9.3 – and an old version of bundler – 1.0.14 – and then everything worked. I hope to get a new site up before too long. I’ve started working on it here, but as you can see it has stalled. It’s hard to find time for things like that with a baby!

The Great Dvorak Distraction

Mar 24th, 2014

I am pretty fast at typing. I am by no means the fastest ever, but I can hold my own. I don’t really worry about how fast I can type, but I do slightly worry about repetitive strain injuries. Therefore, I have allowed myself to be distracted from learning to read faster. I have been playing around (yet again) with learning the Dvorak Simplified Keyboard layout, and I am (still) convinced that it is dramatically better than the traditional QWERTY layout.

The Dvorak layout is basically a keyboard layout that makes sense. Instead of the letters and symbols being more or less randomly placed as in the common QWERTY layout, the keyboard is laid out keeping two goals in mind:

Frequently used keys should be easier to reach with the stronger fingers (e.g. not the pinky).
Keys that are frequently pressed sequentially (e.g. consonants are usually followed by a vowel and vice versa) should be pressed by opposite hands.

These two goals make everything ergonomic and efficient. If you would like more details, then I strongly suggest you read this comic about it. It is very informative and mildly entertaining and by the end of it you may share my opinion of the superiority of Dvorak. Or you may not. But you should still read it. Of course, you can always find more detail in the wikipedia entry.
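The second goal, hand alternation, is easy to measure roughly in code. This is a minimal sketch, not a serious layout analysis: alternation-rate is a hypothetical helper, the left-hand letter sets are transcribed from the standard QWERTY and Dvorak layouts assuming normal touch typing, and the sample word is deliberately chosen to show the effect at its most extreme.

```clojure
(require '[clojure.string :as str])

;; Letters typed by the left hand on each layout (standard touch typing).
;; Every other letter is assumed to belong to the right hand.
(def qwerty-left (set "qwertasdfgzxcvb"))
(def dvorak-left (set "pyaoeuiqjkx"))

(def letters (set "abcdefghijklmnopqrstuvwxyz"))

(defn alternation-rate
  "Fraction of adjacent letter pairs in `word` typed by opposite hands.
  `left-hand` is the set of letters assigned to the left hand."
  [left-hand word]
  (let [hands (->> (str/lower-case word)
                   (filter letters)                       ; ignore non-letters
                   (map #(if (left-hand %) :left :right)))
        pairs (partition 2 1 hands)]                      ; sliding window of 2
    (if (empty? pairs)
      0
      (/ (count (filter (fn [[a b]] (not= a b)) pairs))
         (count pairs)))))

;; QWERTY: every letter of "minimum" is on the right hand, so no alternation.
(alternation-rate qwerty-left "minimum")

;; Dvorak: the vowels are on the left, m and n on the right, so every pair alternates.
(alternation-rate dvorak-left "minimum")
```

A single word proves nothing about a layout as a whole (Dvorak actually alternates less than QWERTY on "the"), so a fair comparison would run the same function over a large sample of real text.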

Despite its rationality, learning the Dvorak layout is difficult. I have spent a considerable amount of time on it at one point or another, but I have spent almost 20 years learning the normal layout… This makes it difficult for me to switch over to Dvorak completely, which is what I would need to do in order to really get good at it. This isn’t the first time I’ve tried. In fact, I selected the Das Keyboards that I normally use (yes, I have two) with alternative keyboard layouts in mind. There are no markings on the keyboard, so I feel like it doesn’t matter as much that I keep the QWERTY layout.

Even with prior practice, it is very hard to type at half speed during my everyday tasks. To illustrate, I’ll show you some of my results from typeracer.com (which is awesome). This is how fast I type normally:

[Image: normal typeracer]

And this is how fast I type with Dvorak:

[Image: Dvorak typeracer]

If you can imagine with me, typing with Dvorak basically feels like one of those dreams where you are trying to escape bigfoot, but you can’t run any faster than slow motion. It’s slightly depressing.

It gets worse. Shortcut keys. You have to develop completely new habits of using shortcut keys. Control-c for copying and control-v for pasting are no longer right next to each other. You could remap them, but I would not recommend it. Most shortcut keys start with letters for the thing you want to do (e.g. c for copy), so it’s not too hard to remember them. It’s just hard to develop the new habits.

Furthermore, all of the symbols ((), [], +, =, etc.) are in different locations, and, since I don’t type most of those very often, they are more difficult to learn. Coding is especially difficult. Coding is just typing, after all. Unfortunately, coding does slow my typing down, and when I’m using Dvorak it’s even worse. As evidence, I’ll show you some test results from typing.io, which has typing tests with real code and is also awesome. Here is normal layout:

[Image: normal typing.io]

And this is with Dvorak:

[Image: Dvorak typing.io]

As you can see, slightly depressing. In order to improve the situation, I thought it would be good to find a layout that put the commonly used coding symbols within easy reach. I did find such a layout. It’s called Programmer Dvorak Keyboard Layout. Don’t use it. I abandoned it for a couple of reasons. First of all, it rearranges all of the numbers. Why? I’m not sure (I guess the Dvorak layout normally had them arranged that way), but it makes it incredibly confusing because instead of 1, 2, 3 we now have 7, 5, 3… Not only that, but it puts the numbers in a shift position! So, to type a 7, I have to press shift and what used to be the 2 key. This was the deal breaker for me. All keyboard shortcuts that included a number were unusable for me on my Mac. I literally could not do the shortcut because command-shift-4 (for example) is its own shortcut combo.

So, I am still learning the Dvorak layout, but it’s hard. I was originally going to write this post using it, but by the time I had done the typing tests, I figured I didn’t want to waste the time. It has taken me a while to get up around 40 wpm, but not as long as I thought — probably a week for the initial learning and then a couple of weeks to improve speed. I plan on continuing to get faster at it here and there, and hopefully I can replace QWERTY completely one day. Because I really do believe in it.

One Week of Reading

Feb 19th, 2014

Well, I have been trying to read faster for about a week. My progress has been … disappointing. Granted, I have not actually “practiced” that much, but I have done a fair bit of regular reading, and most of the time I am conscious of trying to read faster. However, according to the first resource I consumed since embarking on this speed-reading adventure, I should spend time practicing if I really want to get fast.

I believe I have at least identified my main problem areas. These areas are subvocalization (“speaking” words in your head) and backtracking. I think subvocalization is definitely my biggest hurdle, and I’m really not sure how I will overcome it, but I am certainly going to try. I believe the first thing I’m going to try doing is just looking at words as fast as I can for a while, and see if that improves my speed of comprehension.

For reference, below is a screenshot of another test I took at freereadingtest.com. I only got 303 wpm and 75% accuracy. Ever so slightly better than last week, but nothing remarkable. Hopefully next week is more encouraging!

[Image: Freereadingtest.com week two]

On Learning to Read … Faster

Feb 10th, 2014

I am a slow reader. I have known this for a while now. I have even tried to do something about it at one point, but a recent post by my cousin made me realize just how much potential reading speed I’m missing out on. She took the Staples reading test that has been going around recently and found that she reads at 748 words per minute.

[Image: My cousin's reading test]

By comparison, I took some tests and found that I read somewhere between 250 and 300 words per minute. And my reading comprehension is not always so good…

[Image: My Staples test]

[Image: My freereadingtest.com test]

I took tests both at the Staples website and at freereadingtest.com (I did the Level 9 test). I want to improve my reading speed considerably. I figure that there are two ways to read more: spend more time reading and read faster. I’ll start with reading faster, since that allows for reading more no matter the time spent reading.

I plan on posting weekly on my progress and what I’ve learned and practiced. As a sneak peek for next week, here is a screenshot from a reading test I took when I was reading out loud. Tune in next week for my progress report. Hopefully I will be over 300 words per minute, but I guess we’ll see!

[Image: Reading out loud]

On the Uncertainty of Everything

Mar 13th, 2012

I watched a series of videos of an interview with Richard Feynman today. I highly recommend that you make the time to check them out. (If you don’t know who Richard Feynman is, you should look him up.) While watching the interview, I gained insight into a number of ideas.

One of these is the nature of knowing things. Feynman shares an anecdote of one interaction with his father. He asked his father why a ball in a wagon will roll to the back of the wagon when the wagon is pulled forward. When I heard this question, I immediately thought to myself, “Inertia.” But Feynman’s father answered, “No one knows… The accepted belief is that things that are moving tend to stay moving and things that are at rest tend to stay at rest unless something pushes them, but no one knows why that is.” I found that very interesting and enlightening. There is a difference between knowing the name of something or being able to describe it, and knowing the why and the how of something. Why do objects at rest tend to stay at rest and objects in motion tend to stay in motion? Feynman expounds further on what it means to Know something throughout the interview.

I gained the other insight from Feynman’s discussion of his beliefs. He comments on both the ideas he believes in and the certainty with which he believes them. In fact, he has some very interesting things to say about his beliefs in God and religion in general. At one point, he says that the view of a scientist is that “we don’t know what’s true, everything is possibly wrong.” He then goes on to imply that this is a problem for religion. I agree with his premise. I don’t think we can know anything with absolute certainty. I am much more certain of some things than others, but I am not absolutely certain of anything. All of my beliefs depend on how my mind processes information; how my senses perceive and experience the world. I can be certain of nothing.

I do not see uncertainty as a problem for religion. This is how I see it: once one has accepted that there is no certainty, then there are two fundamental choices: either adopt faith in something or embrace the reality of not knowing. In reality, almost everyone chooses the former to one extent or another. Some choose a faith in the existence of a creator God and His love of mankind while others accept a faith in the absence of such a being, believing only in what they can sense and perceive (for even that is a type of faith, and as with all faith, some have more of it than others). Richard Feynman chose to accept the reality of not knowing. He states in the interview that being uncertain about things doesn’t scare him. Neither does it scare me, yet I choose to have faith in something that cannot be studied by science.

My beliefs are based on my faith in three things. First of all, that God exists. Second, that the Bible was inspired by God and, interpreted correctly (ay, there’s the rub), is true. Third, that I can experience reality and reason about it using my mind and body. Everything else I believe (to greater or lesser degrees of certainty) depends on and sprouts forth from these three things.

I find it useful to break things down until they are as clear as I can make them in my mind. I hope that may be the case for some of you and that this post will encourage you to do the same. I would be very interested to hear your thoughts on the matter.

Why No Touchie: Upgrading to Ubuntu 11.10

Nov 2nd, 2011

I upgraded to Ubuntu 11.10 this evening and the first thing I noticed was that my touchpad no longer worked… Of course, I immediately suspected that “touchpad-indicator”, the application I installed to disable my touchpad when I have a normal mouse plugged in, was to blame. However, I decided to do some searching around first and found a couple of forum posts about similar problems. After reading about their woes, I decided to just try removing touchpad-indicator (sudo apt-get remove touchpad-indicator). After running that command line and restarting, everything worked fine. I don’t know what changed in the upgrade. Perhaps it would work again after reinstalling, but I’ll probably wait until having the touchpad enabled annoys me again before trying.

I also ran into a problem with the “ffi” Ruby gem. As I mentioned in the previous post, I am running this blog on Octopress, which is built on Ruby. When I would try to run rake generate (“generate” is the Octopress Rakefile target that builds all of the HTML from the markdown that I write these posts in), I would get something like the following error:

in `require': libffi.so.5: cannot open shared object file: No such file or directory

After some investigation, I found that this was easily fixed by running sudo gem pristine ffi.

Other than these two little issues, Ubuntu 11.10 has been pretty good to me so far. Of course, I’ve only used it for an hour or so, but it appears that it hasn’t totally hosed my system, so that’s a start. I have noticed that the alt-tab behavior between workspaces is now different. It will now switch to the next most recently used application no matter what workspace that application is in whereas before it would restrict alt-tab switching to the current workspace. I’ll have to see if I get used to that. If I don’t, I may end up looking around for a setting to change it (I would imagine it would be configurable). I guess we’ll see what other issues I run up against in the following weeks.

First!

Sep 30th, 2011

First post! Aww yeeeeah, I win. Seriously, though, these are the beginnings of a new blog which I have formed using the Octopress platform (some of you may have already figured that out from the theme). You can see the source of any changes I’ve made (mostly just creating my own posts/pages) here: https://github.com/xonev/StevenOxley.com.

If you would just like to learn more about me, you can go to the About page.

I still haven’t figured out exactly what I’m going to be making this blog about. Undoubtedly, the majority of the posts will be technical in nature, but I may also throw in the random “just-for-fun” post now and then. Hopefully we’ll all be enriched by whatever it turns into.

Bytes and Bikes

Plus other interesting stuff. But mostly computer software and mountain biking.