A powerful Big Data trio: Spark, Parquet and Avro

Note: A cleaner, more efficient way to handle Avro objects in Spark can be seen in this gist. I love open-source projects that play nicely with others; no one likes to be locked into a single data processing framework or programming language. Mature open-source projects build software with integration and openness in mind to allow engineers to attack Big Data problems from a number of different angles using the most appropriate tool for the job. [Read More]

Playing with matches and CIGARs

Aligned reads in a SAM or BAM file typically have a Compact Idiosyncratic Gapped Alignment Report (CIGAR) string that expresses how the read is mapped to the reference genome.
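To make the idea concrete, here is a minimal Python sketch of how a CIGAR string can be split into its (length, operator) pairs. The function names `parse_cigar`, `reference_span` and `read_length` are hypothetical helpers written for this illustration, not part of any SAM/BAM library; the operator set and the rules for which operators consume the read or the reference follow the SAM specification.

```python
import re

# The nine CIGAR operators defined in the SAM specification.
_CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string into a list of (length, operator) pairs."""
    if cigar == "*":  # "*" means the CIGAR is unavailable
        return []
    pairs = [(int(n), op) for n, op in _CIGAR_RE.findall(cigar)]
    # Rebuild the string to reject input the regex did not fully consume.
    if "".join(f"{n}{op}" for n, op in pairs) != cigar:
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    return pairs


def reference_span(pairs):
    """Reference bases covered: M, D, N, = and X consume the reference."""
    return sum(n for n, op in pairs if op in "MDN=X")


def read_length(pairs):
    """Read bases described: M, I, S, = and X consume the read."""
    return sum(n for n, op in pairs if op in "MIS=X")


pairs = parse_cigar("3S6M1D4M2I5M")
print(pairs)                  # [(3, 'S'), (6, 'M'), (1, 'D'), (4, 'M'), (2, 'I'), (5, 'M')]
print(reference_span(pairs))  # 16
print(read_length(pairs))     # 20
```

In this made-up example the read is 20 bases long but covers 16 bases of the reference, because soft-clipped and inserted bases do not consume the reference while the deletion does.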

Table of CIGAR Operators

When I first read the CIGAR operator table (above), I was confused by two things:

  1. the match, M, operator description, “alignment match (can be a sequence match or mismatch)”, struck me as odd.
  2. the relationship between the M, = and X operators isn’t explained in the spec.
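One way to picture the relationship is the rough Python sketch below. It assumes a gapless alignment of two equal-length strings, and the helper names `extended_cigar` and `collapse_to_m` are made up for this illustration. The = and X operators record whether each aligned base matches the reference, while M covers both cases without distinguishing them, so any run of = and X can be collapsed into a single M.

```python
import re
from itertools import groupby


def extended_cigar(read, ref):
    """Build an =/X CIGAR for a gapless alignment of equal-length strings."""
    if len(read) != len(ref):
        raise ValueError("this sketch only handles gapless, equal-length alignments")
    ops = ("=" if a == b else "X" for a, b in zip(read.upper(), ref.upper()))
    # Run-length encode consecutive identical operators.
    return "".join(f"{len(list(run))}{op}" for op, run in groupby(ops))


def collapse_to_m(cigar):
    """Rewrite = and X as M and merge adjacent runs of the same operator."""
    merged = []
    for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        op = "M" if op in "=X" else op
        if merged and merged[-1][1] == op:
            merged[-1][0] += int(n)
        else:
            merged.append([int(n), op])
    return "".join(f"{n}{op}" for n, op in merged)


detailed = extended_cigar("ACGTTCGA", "ACGATCGA")
print(detailed)                 # 3=1X4=
print(collapse_to_m(detailed))  # 8M
```

Going from =/X to M loses information: "8M" only says that eight read bases are aligned to eight reference bases, not which of them agree.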

I hope this blog post helps others with the same questions.

[Read More]

Chabot 50K Trail Run Race Report

I ran my first 50K today – the Chabot 50K Trail Run. The volunteers at Inside Trail Racing impressed me with their professionalism, friendliness and genuine concern for my well-being. Inside Trail Racing put on one of the best trail runs I’ve been a part of. As an example, a volunteer at the Two Rocks aid station (~mile 23) gave me one of her personal water bottles when she saw I wasn’t carrying one (I had forgotten mine at home). [Read More]

Late 2009 iMac HDD Replacement

The hard drive in my iMac (Late 2009 27”) died last weekend and I decided to replace it myself. Here are some quick tips if you find yourself in the same situation. Not sure if disk errors are your problem? Boot your Mac and press “Command-V” during startup for verbose boot output. If the disk is failing, you’ll see messages about “Disk I/O Error” during boot.

There’s a great tutorial on iFixit that explains step-by-step how to replace the drive. I found my iMac had a 3.5” Hitachi Model HDE721010SLA330 SATA 3.0 Gb/s drive once I cracked it open. You can replace the drive with any 3.5” SATA drive you like. I chose to replace it with a comparable Western Digital drive that had more cache.

[Read More]

Marin Headlands Marathon Race Report

I ran the Headlands Marathon today. It was my favorite marathon to date – beautiful scenery, wildlife, the sound of crashing waves, cool ocean breezes and friendly runners and staff. On the way out, there was an owl sitting on a branch close enough to the trail to touch. It sat there calmly on the branch looking at us pass as if to say, “What are you doing running around out here at 7am? [Read More]

St. Louis Stadium Run Race Report

Stadium Run T-Shirt

The picture above is from an old race t-shirt that I found in my closet. Luckily, it had the date of the run on it so it made it easier to search the web for information about the race. The race was a staggered start based on age. Older racers got to start first and younger racers had to try and catch them. [Read More]