I love open-source projects that play nicely with others; no one likes to be locked into a single data processing framework or programming language. Mature open-source projects build software with integration and openness in mind to allow engineers to attack Big Data problems from a number of different angles using the most appropriate tool for the job. This post explains how to combine Spark, Parquet and Avro to create a fast, flexible and scalable data analysis system.
Aligned reads in a SAM or BAM file typically have a Compact Idiosyncratic Gapped Alignment Report (CIGAR) string that expresses how the read is mapped to the reference genome.
When I first read the CIGAR operator table (above), I was confused by two things:
- the match, M, operator description, “alignment match (can be a sequence match or mismatch)”, struck me as odd.
- the relationship between the M, = and X operators isn’t explained in the spec.
I hope this blog post helps others with the same questions.
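To make the operators concrete, here is a minimal Python sketch that splits a CIGAR string into (length, operator) pairs. It is an illustration only, not code from the SAM tooling; the function names `parse_cigar` and `count_alignment_columns` are hypothetical, and the operator set is assumed to be the nine operators listed in the spec's table (M, I, D, N, S, H, P, = and X).

```python
import re

# Assumed operator set: the nine operators in the SAM spec's CIGAR table.
_CIGAR_VALID = re.compile(r"(\d+[MIDNSHP=X])+")
_CIGAR_PAIR = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string into a list of (length, operator) pairs."""
    if not _CIGAR_VALID.fullmatch(cigar):
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    return [(int(length), op) for length, op in _CIGAR_PAIR.findall(cigar)]


def count_alignment_columns(cigar):
    """Count positions where a read base is paired with a reference base.

    M means "match or mismatch", so it is counted together with the
    more specific = (sequence match) and X (sequence mismatch).
    """
    return sum(length for length, op in parse_cigar(cigar) if op in "M=X")


if __name__ == "__main__":
    print(parse_cigar("3M1I3M1D5M"))
    print(count_alignment_columns("3M1I3M1D5M"))
    print(count_alignment_columns("3=1X2="))
```

The last two calls show the point that confused me: a read written with M alone and a read written with = and X are counted the same way, because M does not say whether the bases actually agree.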
I ran my first 50K today: the Chabot 50K Trail Run. The volunteers at Inside Trail Racing impressed me with their professionalism, friendliness and genuine concern for my well-being. Inside Trail Racing put on one of the best trail runs I’ve been a part of. As an example, a volunteer at the Two Rocks aid station (~mile 23) gave me one of her personal water bottles when she saw I wasn’t carrying one (I had forgotten mine at home).
The weather couldn’t have been better either. At race start (8:30am PST), the temperature was 45F and was soon in the mid-50s, sunny with mild, cool breezes. Perfect.
The hard drive in my iMac (Late 2009 27”) died last weekend and I decided to replace it myself. Here are some quick tips if you find yourself in the same situation. Not sure if disk errors are your problem? Boot your Mac and press “Command-V” during startup for verbose boot output. You’ll see messages about “Disk I/O Error” during boot.
There’s a great tutorial on iFixit that explains step-by-step how to replace the drive. I found my iMac had a 3.5” Hitachi Model HDE721010SLA330 SATA 3.0 Gb/s drive once I cracked it open. You can replace the drive with any 3.5” SATA drive you like. I chose to replace it with a comparable Western Digital drive that had more cache.
Open source has been a part of Berkeley culture since the 1970s, when Bill Joy assembled the original Berkeley Software Distribution (BSD). As a reader of this blog, you probably know first-hand the time and effort it takes to create quality open-source software.
Over the last year, the AMPLab has seen exciting growth in the number of users and contributors. To keep code quality high, I’ve been hired to build a team of full-time engineers. I need to fill two software engineering positions immediately. Both positions require strong Linux skills and familiarity with EC2 and git. One position requires experience with one or more of Scala, Java, C++, Hadoop, Hive and NoSQL databases, while the other position will focus on automation, where knowledge of scripting, Maven, Jenkins, and rpm/deb packaging is important.
This post is now hosted directly on the Cloudera blog. Cloudera will be able to provide much better support and more timely answers to your questions than I can.
The picture to the right is from an old race t-shirt that I found in my closet. Luckily, it had the date of the run on it, which made it easier to search the web for information about the race.
The race had a staggered start based on age. Older racers got to start first and younger racers had to try to catch them. The race started outside of Busch Stadium and ended inside the stadium at center field.