Toggle a track in iTunes with a hotkey

I have a lot of music in iTunes, particularly because I collect full albums instead of individual songs. The checkbox next to a song in iTunes controls whether or not the track is automatically played in sequential or shuffle mode.

iTunes checkboxes

I wanted a way to toggle the checkboxes easily, so I could uncheck and skip a track without opening the iTunes window, therefore tuning my iTunes library to what I like.

The solution is toggletrack.

When the toggletrack AppleScript runs, the currently playing track in iTunes is toggled. If unchecked (disabled from playing automatically), a Growl notification appears like this:

toggletrack unchecked

If I continue listening and decide I like the track after all, I can rerun the script to re-check the track:

toggletrack checked

I assigned the script to a hotkey using Quicksilver.

Quicksilver triggers

Downloads

git:

git clone git://github.com/msparks/toggletrack.git

or: .zip | .tar.gz

RGB LED Library

In my last post, I showed off the rgblinear demo. As the next step to making this more usable in Arduino projects, I’ve abstracted the RGB LED control code for setting colors and fading into a separate library, rgbled.

The updated rgblinear example is included in examples/rgblinear/.

Scanning the color spectrum

As part of a future project, I wrote some Arduino demo code that fades through the color spectrum using an RGB LED. Here’s a video:

The code is available on GitHub.

DS1620 digital thermometer library for Arduino

Working on a DS1620 temperature sensor library for the Arduino

My evening project last night was a C++ library to read the Maxim DS1620 digital thermometer. I’ve released the code on GitHub, and documented the project on my website.

Currently, I have a DS1620 connected to an Arduino, which is in turn connected to my laptop. I wrote a Python script to read the output from the Arduino, and submits data to pachube.com, a site that collects data from sensors and things. It has an easy to use API, and it’s definitely a lot easier than managing data collection and graphing manually.

Here’s a graph of my room’s temperature, using data from pachube:

Using my laptop as a bridge between the Arduino and the Internet is obviously inconvenient. Right now, my Python script automatically handles serial disconnection and reconnection, so I don’t need to restart my script if I disconnect the Arduino and reconnect it at a later time. However, I naturally lose data in the meantime. I’m waiting on some parts from Sparkfun, namely an Arduino Ethernet Shield, so that I can make the Arduino post to pachube directly.

Today’s neat thing: autossh+rscreen

I use irssi and bitlbee for my IRC and IM connections. I run irssi on a remote server, so I log in via ssh and use screen to keep the process persistent. Whenever I go somewhere (classes, meetings, etc.), I have to log into my server again and resume my screen session. I wanted a way to do this automatically.

Enter autossh. It’s a nice utility that reconnects ssh tunnels. There’s an included script, ‘rscreen’, that reattaches remote screen sessions, which is just what I wanted.

Now whenever I move around, I type ‘~.’ in the terminal window that was previously attached to screen to force the connection to die, and autossh/rscreen will reestablish the ssh connection and reattach to my screen session.

I recommend trying it. Beware, though, the default ‘rscreen’ script will remap your screen escape key (default is ^A) to something else. Take a quick look at the script before you run it.

New Infinite Machine and site design

Today I’m launching two things simultaneously: a new site design of quadpoint.org and a rewrite of Infinite Machine, the software behind quadpoint.org.

Old: Old

New: New

Back in February, I implemented something I called ‘Infinite Caching’ in infmx. It was basically a feature that allowed for virtually infinite cache expire times. Thus, all of the pages on my site would stay in memory, completely rendered, and requests theoretically never hit the disk because the cache was always warm. Whenever a content update was made, I ran a little utility to update the cache asynchronously. I had a git hook set up to run the utility whenever I pushed into the production content repository, so that changes would show up soon right after the push. It was pretty slick, or so I thought.

A couple of months ago I got the idea of an even better caching scheme: the persistent cache. The idea was that static HTML pages should be generated and served by the webserver directly instead of inserting the pages into a memory cache and served by Django. I hacked a persistent cache implementation into my cache-updating utility in July.

Not even a week later I had an epiphany: why is my setup so complicated? Why am I running Django when all of my pages are pre-rendered and in a cache? My dynamic web application was actually just generating the same static content over and over. So a week after implementing the pcache, I started ripping out all of the dynamic bits and rewriting Infinite Machine to implement Jekyll’s interface. Jekyll has the same idea: produce static pages from formatted content. (Aside: I considered switching to Jekyll, but it lacks some of the features that I use in infmx, like breadcrumbs and intrawiki linking.) Now, infmx produces static content just like it did before, except it writes it all to an output directory on the filesystem that can be served up by a webserver. The only dependency on Django now is for the template engine, which is only required during document compilation, and not required on the server.

There are a few major (and obvious) benefits to this. Firstly, it’s static content served directly by the webserver. Even though I was previously serving pages straight from memcached, the fastcgi + django + python + memcached overhead was significant. The same content served from a static file is a full order of magnitude faster: ~800 requests/second for django+memcached versus 8000 requests/second with static. For a 6KB page, that’s over 50MB/s. In other words, I’ll saturate my network link before the webserver (nginx) becomes a bottleneck.

Second, no more Django on my server. That means one less thing to fail and one less thing to make sure is running when the box reboots. My deployment process is now much easier, too. To update, I basically do:

% infmx && ./deploy.sh prod

where deploy.sh is a simple wrapper around rsync to mirror my local copy to my server. Simple.

The takeaway is this: think hard about what you’re doing with your web application. In the early days of the web, everything was static HTML. Then, scripting languages like PHP came along and everything was dynamic. Now that browsers and JavaScript no longer suck, we’re converging to a reasonable hybrid of the two. Try to avoid doing anything ‘smart’ on the fly. Preprocess as much as you can and let your webserver do the work. For fancy dynamic things, JavaScript and a lightweight server-side backend work well.

launching libgithub v1.0

Today I’m launching libgithub, a JavaScript library for displaying interesting GitHub data on websites.

Right now, it can generate badges for one or more commits to a project, and looks something like this:

libgithub badge

Commit badges are great for showing recent activity on a project page. A user can quickly get a feel for a project’s freshness without going to GitHub to see the commit history. For instance, I use commit badges on my projects index for this reason. Badges are also useful for a sidebar/widget display. My friend Jesse Crouch made a demo showing how libgithub badges can be used in a sidebar.

Badges are just a start; more widgets belong in libgithub. Read more about the project, how to use it, and how to contribute at the libgithub project page.

Engine Yard hashing contest post-mortem

It’s over. Finally. On July 20-21, 2009, Engine Yard held a contest to find the smallest hamming distance to a given SHA-1 hash by hashing permutations of 12 words from a given dictionary, and optionally 5 ASCII characters appended to the end. I’ve spent the last several evenings writing, rewriting, and optimizing my C implementation and distributed infrastructure.

This is my post-mortem of the project.

About the contest

Before I explain what I did specifically, I’ll explain what the above paragraph means. A hash function simply takes some input and produces some fixed-size output. They are deterministic, too, so the same input will always produce the same output. Hash values are essentially just very large numbers that input values map to. In the case of SHA-1, the output is 160 bits long, meaning that SHA-1 hash values range from 0 to 2160 - 1 = 1461501637330902918203684832716283019655932542975 (base 10).

The given phrase for the contest was “I would much rather hear more about your whittling project” which hashes to 1249c4b7f578204f10798c0269f8488280fb9981 in hexadecimal (base 16). In binary (base 2), this is:

0001001001001001110001001011011111110101
0111100000100000010011110001000001111001
1000110000000010011010011111100001001000
1000001010000000111110111001100110000001

To reiterate, the goal of the contest was to find a phrase of 12 words (and optionally 5 ASCII characters at the end) that would produce a SHA-1 hash with the fewest bit flips away from the challenge hash. The number of bit flips between hashes is called the hamming distance. For example, the hamming distance between “00011” and “10010” is 2; two bits need to be flipped in “00011” to turn it into “10010”.

Hash functions are supposed to generate fairly well-distributed hashes such that even a very small change (say, one bit flip in the input) will produce an entirely different hash output. This ‘randomness’ makes the math quite easy: if we assume that if we have a 50% chance of correctly guessing each of the 160 bits when generating the hash for a random phrase, the result is a binomial distribution. This means that for each phrase, we’re essentially flipping 160 coins and hoping to get as many heads as possible.

Some math

So, the question is, how hard is it, really, to get all heads, or even close? The short answer is: hard. For this contest, the winning entry had a hamming distance of 30. Generating hashes at random, the probability of getting a 30 is 1 in:

To even come close to a 30, one would need a very large number of powerful processors, or some hardware designed to do calculations. Steve Worley released a CUDA implementation for the contest on nVidia’s forums a few days ago. High-end nVidia GPUs were able to process hundreds of millions of hashes per second, whereas lowly CPU-only implementations could only do a few million per second per core.

My implementation

I wrote my implementation in C. It started out in C++, but after simplifying my searching algorithm, I removed a lot of complexity and dropped down to pure C. The fastest I was able to get out of my implementation was 3.2M hashes/sec/core on a Xeon 2.13GHz.

My code is available on GitHub: http://github.com/msparks/sham

My general searching algorithm:

  1. Pick a random set of 12 words
  2. For 1,000,000 iterations, randomize one of the ASCII characters at the end of the phrase per iteration.
  3. Calculate SHA-1.
  4. Repeat

My main goal was to do as little work as possible each iteration, and just crank out as many hashes as the processor could compute. Thus, I did not vary capitalization of letters, and only randomized one ASCII character per iteration, instead of all 5.

Here are some other things I did to reduce time per iteration, in no particular order:

  • Used OpenSSL for hashing. In the interest of time, correctness, and speed, I did not try to implement SHA-1 for this contest.
  • Compiler optimizations. -O3. A given.
  • Kept a SHA_CTX structure around for the current phrase. SHA1_Update() in libcrypto will calculate chunks of input at a time, and the result context can be stored in memory to be updated later and finalized. I picked a new set of 12 words every million iterations, and when I picked a new set, I calculated the SHA_CTX for that set and kept it in memory. Thus, each iteration only needed to run the characters since the last SHA chunk boundary and the following 5 ASCII characters, instead of the whole string.
  • Pick sets ‘intelligently’ based on length. pick_minimal_set() in my implementation would repeatedly pick a random set of 12 words until the length of the resulting phrase was within 5 characters of a chunk boundary (512 bits; 64 bytes). Thus, each iteration would have to SHA a maximum of 10 characters.
  • Use gcc builtin. Counting bits manually is slow. Use __builtin_popcount*() to do it for you.
  • Profile. I used Shark.

Distributed Infrastructure

Calculating SHA-1s from a well-known fixed dictionary is an embarrassingly parallel problem. Nevertheless, I still wanted an easy way to monitor all of the running instances. I wrote a simple Python client to send result phrases to a matching Python server sitting on my server. It computed the hamming distance and showed the new minimum distances as they came in.

Here’s the log from my monitoring server, showing timestamps and lowest distances. Hostnames removed.

2009-07-20 12:04:53,357 - *** shamserver starting ***
2009-07-20 12:04:53,357 - challenge phrase: I would much rather hear more about your whittling project
2009-07-20 12:04:53,357 - challenge hash: 1249c4b7f578204f10798c0269f8488280fb9981
2009-07-20 12:05:46,233 - [xxxxx] distance: 46
2009-07-20 12:05:46,233 - [xxxxx] phrase:
pieprzyk sass git utc regex command sqlserver sysoev threading newman 
sockets lampson ][;?=
2009-07-20 12:05:56,453 - [xxxxx] distance: 45
2009-07-20 12:05:56,453 - [xxxxx] phrase:
record rake oscon rfc1157 clone processes applet procs frozen inline chain 
resource ^QvH5
2009-07-20 12:06:12,061 - [xxxxx] distance: 44
2009-07-20 12:06:12,061 - [xxxxx] phrase:
debug chore amp renders behlendorf reports kurtz url joy icon response 
bigtable &CVpu
2009-07-20 12:06:30,838 - [xxxxx] distance: 43
2009-07-20 12:06:30,838 - [xxxxx] phrase:
lampson dependency key xkcd syslogs assoc mailers uniq json atbash helper 
browser 4c/3Q
2009-07-20 12:06:51,466 - [xxxxx] distance: 38
2009-07-20 12:06:51,466 - [xxxxx] phrase:
rfc959 error hillis lib debug href proc watson capistrano hawkes deprecate 
iverson 1,*D@
2009-07-20 12:11:49,624 - [xxxxx] distance: 36
2009-07-20 12:11:49,624 - [xxxxx] phrase:
textmate ubuntu montulli winograd paths joshp binary rails friedman tests 
kindi oikarinen dg'7O
2009-07-20 14:00:27,171 - [xxxxx] distance: 35
2009-07-20 14:00:27,171 - [xxxxx] phrase:
stearns sub nodoc whitfield thread parse beta solo erb kurtz torvalds 
nitems E,giH
2009-07-20 14:23:22,954 - [xxxxx] distance: 34
2009-07-20 14:23:22,954 - [xxxxx] phrase:
dystopia services syntax regexp heinemeier computes metaprogramming 
rescue summit yin record wadler 3[G^_
2009-07-20 20:44:23,431 - [xxxxx] distance: 32
2009-07-20 20:44:23,431 - [xxxxx] phrase:
blocks rfc2810 shannon nodes host rdoc wep joshp size perform applet 
rescue HlKLr

The lowest distance I found was 32. I estimate that I calculated about 25-30 trillion SHA-1s in the 30-hour period.

Final thoughts

Overall, the contest was a blast. I’ve done nothing else for the last few evenings, but writing the code and optimizing was a great learning experience. Using compiler builtins and Apple’s Shark were new to me. Shark is especially awesome and generally more accessible than gprof for profiling.

In hindsight, I should have focused on a CUDA implementation early on. Maybe for the next challenge in September.

Now that it’s over, I can go back to being productive for a while. :)

alphasign 1.0!

I just pushed out a new version of alphasign, my API for talking to LED signs like the Betabrite. With this new version I added a lot of documentation and examples, which are fairly useful for an API. The generated documentation can be found here. This is also my first project on PyPI.

Unfortunately, while I’m at college, I am away from my Betabrite, so I can’t add support for the DOTS files (essentially bitmaps for the sign) for quite a while. Maybe someone will send me a patch. :)