Using ffmpeg to convert videos for import into iPhoto

If iPhoto is unable to import a video, you can convert it into a format it understands using e.g.

    $ ffmpeg -i "$input_path" -vcodec libx264 -preset medium \
        -acodec aac -pix_fmt yuv420p "$output_path"
    $ timestamp=`GetFileInfo -m "$input_path"`
    $ SetFile -d "$timestamp" -m "$timestamp" "$output_path"

You can set the preset to slow if you want more compression, or fast if you want it to convert faster. The GetFileInfo and SetFile utilities are not part of ffmpeg; they ship with Apple's Xcode command line tools and are used here to read the original file's modification date and apply it to the converted file as both its creation and modification date. [Read More]
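To convert several files at once, the same commands can be wrapped in a small shell function. This is only a sketch under stated assumptions: the function names `iphoto_output_path` and `convert_for_iphoto` and the `-iphoto.mp4` suffix are made up for illustration, and the script assumes ffmpeg, GetFileInfo and SetFile are all on your PATH.

```shell
#!/bin/sh
# Sketch of a batch wrapper around the commands above.
# Function names and the "-iphoto.mp4" suffix are illustrative assumptions.

# Print the output path for an input file: strip the last extension
# and append "-iphoto.mp4".
iphoto_output_path() {
    printf '%s\n' "${1%.*}-iphoto.mp4"
}

# Convert one file, then copy its modification time onto the result.
convert_for_iphoto() {
    input_path=$1
    output_path=$(iphoto_output_path "$input_path")

    # Same ffmpeg options as the single-file example.
    ffmpeg -i "$input_path" -vcodec libx264 -preset medium \
        -acodec aac -pix_fmt yuv420p "$output_path" || return 1

    # GetFileInfo and SetFile come with Apple's developer tools (macOS only).
    timestamp=$(GetFileInfo -m "$input_path") || return 1
    SetFile -d "$timestamp" -m "$timestamp" "$output_path"
}

# Example usage:
#   for f in *.avi; do convert_for_iphoto "$f"; done
```

Writing to a new file name rather than overwriting keeps the original intact, which is useful if a conversion turns out badly and you want to retry with a different preset.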

Easter Eggs, Bats, and Bubba

Yes, his name was Bubba. Yes, this story is true.

“You boys get over here!” my stepfather shouted. The boys stopped and turned. My stomach turned too. These four boys always wanted to fight me. I just had to dodge them on the weekends I visited my Mom, and for months this had worked. But today they surprised me as I left Easter services: they cornered me and beat me with a baseball bat. I know how terrible that sounds, but you should know it was a measured violence. [Read More]

Your DNA holds over 60 zettabytes of data

Your DNA holds over 60 zettabytes of data. That’s about 5,000 times the estimated information content of all human knowledge. There are four nucleobases in DNA, adenine [A], cytosine [C], thymine [T] and guanine [G], which require 2 bits each to store. Each haploid cell (sperm or egg) in your body contains 3,234.83 million base pairs. Your somatic cells have twice as many base pairs, with one set coming from your dad and the other coming from dear old mom. There are an estimated 37. [Read More]
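The per-cell storage implied by these figures can be worked out directly. The following is a rough sketch that assumes 8 bits per byte and decimal (SI) units; the totals are derived here from the numbers quoted above rather than taken from the post.

```latex
\begin{align*}
\text{haploid genome} &= 3{,}234.83 \times 10^{6}\ \text{base pairs} \times 2\ \tfrac{\text{bits}}{\text{base pair}}
  = 6{,}469.66 \times 10^{6}\ \text{bits} \\
  &= \frac{6{,}469.66 \times 10^{6}}{8}\ \text{bytes} \approx 808.7\ \text{MB} \\[4pt]
\text{somatic (diploid) cell} &= 2 \times 808.7\ \text{MB} \approx 1.62\ \text{GB}
\end{align*}
```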

Introduction to Base Quality Score Recalibration (BQSR)

Thanks to Chris Hartl for writing the initial implementation of BQSR for ADAM and for taking the time to share his knowledge of BQSR with me over cappuccino at People’s Cafe. Hopefully this post will help others who are trying to understand how BQSR works. Drop a comment if you have any questions. DNA sequencing machines provide an estimate of the quality of each base (e.g. A, C, T or G) that they read. [Read More]

A powerful Big Data trio: Spark, Parquet and Avro

Note: A cleaner, more efficient way to handle Avro objects in Spark can be seen in this gist. I love open-source projects that play nicely with others; no one likes to be locked into a single data processing framework or programming language. Mature open-source projects build software with integration and openness in mind to allow engineers to attack Big Data problems from a number of different angles using the most appropriate tool for the job. [Read More]