Notes on CCRMA MIR Workshop, Summer 2011

These are my personal notes on the MIR Workshop that I attended at CCRMA at Stanford University in the summer of 2011. There is a course wiki that has much more detailed info. https://ccrma.stanford.edu/wiki/MIR_workshop_2011

Note: I am not going to link to much in my notes, as all of those links are in the wiki.

The Knoll – home of the Center for Computer Research in Music and Acoustics (CCRMA) at Stanford University

Day 1

Today’s lectures were presented by Jay LeBoeuf and Rebecca Fiebrink. The early lecture dealt with the three main components of any MIR process: segmentation, feature extraction, and analysis/decision making. My main takeaway from this lecture was that the choice of segmentation method and the choice of features to extract are important decisions driven by the goal of the project. Different segmentation methods and features work best for different types of applications. My note on segmentation was that the segments should have musically relevant boundaries and be separated by some perceptual cue.

We looked at five different features to extract. Zero crossing rate is pretty easy to calculate and, in certain contexts, seemed more useful than I had imagined. We also looked at the spectral moments: centroid, bandwidth/spread, skewness, and kurtosis. We did some Matlab tutorials using custom toolboxes that extract these features. The labs are on the wiki.
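To make the five features concrete, here is a minimal sketch in Python with NumPy rather than the Matlab toolboxes from the labs. The function names, the Hann window, and the test signals are illustrative assumptions, not code from the workshop.

```python
import numpy as np

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    signs = np.sign(frame)
    signs[signs == 0] = 1  # assumption: treat exact zeros as positive
    return np.mean(signs[1:] != signs[:-1])

def spectral_moments(frame, sample_rate):
    """Centroid, spread, skewness, and kurtosis of the magnitude spectrum."""
    magnitudes = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    weights = magnitudes / np.sum(magnitudes)  # treat the spectrum as a distribution
    centroid = np.sum(freqs * weights)
    spread = np.sqrt(np.sum((freqs - centroid) ** 2 * weights))
    skewness = np.sum((freqs - centroid) ** 3 * weights) / spread ** 3
    kurtosis = np.sum((freqs - centroid) ** 4 * weights) / spread ** 4
    return centroid, spread, skewness, kurtosis

# Two made-up test frames: a 440 Hz sine and white noise.
sample_rate = 8000
t = np.arange(1024) / sample_rate
tone = np.sin(2 * np.pi * 440 * t)
noise = np.random.default_rng(0).standard_normal(1024)

tone_zcr = zero_crossing_rate(tone)
noise_zcr = zero_crossing_rate(noise)
tone_centroid, tone_spread, _, _ = spectral_moments(tone, sample_rate)
noise_centroid, noise_spread, _, _ = spectral_moments(noise, sample_rate)

print("ZCR      tone / noise:", tone_zcr, noise_zcr)
print("centroid tone / noise:", tone_centroid, noise_centroid)
print("spread   tone / noise:", tone_spread, noise_spread)
```

The noisy frame should show a much higher zero crossing rate and a much wider spread than the pure tone, which is the kind of contrast that makes these cheap features useful for telling pitched material from noisy material.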

Prof. Fiebrink talked about Weka and her program based on Weka, the Wekinator. She explained some supervised learning algorithms, specifically k-NN (k-nearest neighbors). The Wekinator looks to be a very useful tool that may prove crucial to my dissertation project.
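The idea behind k-NN fits in a few lines. This is a generic sketch in Python, not Weka's or the Wekinator's implementation; the feature values and the "kick"/"snare" labels are invented for illustration.

```python
import numpy as np
from collections import Counter

def knn_classify(train_features, train_labels, query, k=3):
    """Label a query by majority vote among its k nearest training examples."""
    distances = np.linalg.norm(train_features - query, axis=1)  # Euclidean distance
    nearest = np.argsort(distances)[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical training data: [zero crossing rate, spectral centroid in kHz]
train_features = np.array([
    [0.02, 0.30], [0.03, 0.40], [0.04, 0.35],   # made-up "kick" examples
    [0.30, 2.50], [0.35, 3.00], [0.28, 2.80],   # made-up "snare" examples
])
train_labels = ["kick", "kick", "kick", "snare", "snare", "snare"]

low_sound = knn_classify(train_features, train_labels, np.array([0.05, 0.50]))
bright_sound = knn_classify(train_features, train_labels, np.array([0.32, 2.70]))
print(low_sound, bright_sound)
```

In practice the features usually need to be scaled to comparable ranges first, since otherwise the feature with the largest numeric range dominates the distance.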

Day 1 has made me confident that I will find the tools I need for my dissertation project, and that I will be able to understand them enough to use them.

posted Monday, 27 June 2011, 10 PM PDT

Day 2

The Day 2 lectures were by Stephen Pope and Leigh Smith. Most of today’s code examples were in C/C++, and the general topic was a more in-depth look at feature extraction. We started with a demo of MusicEngine, which is a similarity filter that uses content-derived metadata. My big takeaway from the demo was that the selection and weighting of feature vectors is one of the most significant factors in how effective a program is. Knowing which perceived qualities you want to sort by, and which feature vectors best discriminate those perceived qualities, is crucial. I think developing an idea of those correlations is one of the keys to being good at MIR.
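A toy example of why the weighting matters so much. This is a sketch in Python under my own assumptions (a weighted Euclidean distance over two normalized features); it is not how MusicEngine works, and the library values are invented.

```python
import numpy as np

def rank_by_similarity(library, query, weights):
    """Return library row indices, most similar first, under a weighted Euclidean distance."""
    diffs = library - query
    distances = np.sqrt(np.sum(weights * diffs ** 2, axis=1))
    return np.argsort(distances)

# Hypothetical tracks, columns: [tempo, brightness], both normalized to 0..1
library = np.array([
    [0.90, 0.10],   # track 0: fast and dark
    [0.20, 0.80],   # track 1: slow and bright
    [0.50, 0.50],   # track 2: in between
])
query = np.array([0.85, 0.85])  # fast and bright

tempo_first = rank_by_similarity(library, query, np.array([1.0, 0.1]))
brightness_first = rank_by_similarity(library, query, np.array([0.1, 1.0]))
print("weighting tempo:     ", tempo_first)
print("weighting brightness:", brightness_first)
```

The same query and the same library produce opposite rankings depending only on the weights, so the "most similar" track is really a statement about which perceived quality the weights favor.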

In the later lecture, Leigh Smith talked about onset detection and beat mapping. It was an excellent presentation, but pretty complicated. There are lots of Day 2 lecture slides on the wiki.

Much of the lab time was spent installing libraries that are used in the C/C++ code. Going through that process a number of times was good for me. I am much more confident with it now. I got the examples to work, and I think some tweaking of those examples will end up being part of my final project this week.

Day 3

The Day 3 lectures were by Stephen Pope and Steve Tjoa. In the morning, Pope talked about second-stage processing. Once we have a huge set of feature data, we need to prune it and/or smooth it. The bottom line is determining which features give us information gain.
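Information gain can be computed directly for a single feature. The sketch below is in Python and assumes the simplest case, one feature split at a fixed threshold; the "speech"/"music" labels and feature values are made up, and this is not code from the lecture.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(feature_values, labels, threshold):
    """Entropy reduction from splitting the labels at a feature threshold."""
    below = labels[feature_values <= threshold]
    above = labels[feature_values > threshold]
    n = len(labels)
    remainder = sum(len(part) / n * entropy(part)
                    for part in (below, above) if len(part) > 0)
    return entropy(labels) - remainder

# Hypothetical data: 4 speech frames followed by 4 music frames
labels = np.array(["speech"] * 4 + ["music"] * 4)
informative = np.array([0.10, 0.20, 0.15, 0.25, 0.80, 0.90, 0.85, 0.70])
uninformative = np.array([0.10, 0.90, 0.20, 0.80, 0.10, 0.90, 0.20, 0.80])

gain_informative = information_gain(informative, labels, threshold=0.5)
gain_uninformative = information_gain(uninformative, labels, threshold=0.5)
print(gain_informative, gain_uninformative)
```

A feature that separates the classes cleanly recovers the full bit of label entropy, while one that is unrelated to the labels recovers none, which makes it a candidate for pruning.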

In the later lecture Tjoa explained Non-negative Matrix Factorization, which is used for source separation in polyphonic audio files. He explained the math to us, and we ran the code in Matlab, and it still seems like magic. In the simpler examples the amount of source separation is really astounding. Check out his slides from Day 3 on the wiki.
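For a feel of what the factorization does, here is a minimal sketch in Python using the standard multiplicative update rules for squared error. It is an assumption that this matches the variant shown in the lecture, and the tiny "spectrogram" is invented: two spectral templates that switch on and off over eight frames.

```python
import numpy as np

def nmf(V, rank, iterations=500, seed=0):
    """Factor non-negative V (bins x frames) into W (bins x rank) and H (rank x frames)."""
    rng = np.random.default_rng(seed)
    eps = 1e-9  # guards against division by zero
    W = rng.random((V.shape[0], rank)) + eps
    H = rng.random((rank, V.shape[1])) + eps
    for _ in range(iterations):
        # Multiplicative updates keep W and H non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Hypothetical magnitude spectrogram built from two "sources"
templates = np.array([
    [1.0, 0.5, 0.0, 0.0, 0.0, 0.0],   # source A: energy in the low bins
    [0.0, 0.0, 0.0, 1.0, 0.8, 0.3],   # source B: energy in the high bins
]).T                                   # shape: 6 bins x 2 sources
activations = np.array([
    [1.0, 1.0, 0.5, 0.5, 0.0, 0.0, 0.0, 0.0],   # when source A sounds
    [0.0, 0.0, 1.0, 1.0, 1.0, 0.5, 0.5, 0.2],   # when source B sounds
])
V = templates @ activations

W, H = nmf(V, rank=2)
relative_error = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print("relative reconstruction error:", relative_error)
```

Each column of W is a learned spectral template and each row of H is its activation over time, so multiplying one column by its row gives an estimate of that source's spectrogram on its own.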

NB: John Chowning was hanging out at CCRMA today. We spoke briefly. He remembered me from his visit to LSU, and we had a fun conversation about subwoofers. What a nice man.

Updated Wednesday 29 June 2011, 10 PM PDT

Days 4 & 5

As the week went on, we moved to some higher-level concepts. Day 4’s lecturer was George Tzanetakis. He started the day talking about hearing and pitch perception, and symbolic musical representations (like notation, MIDI, etc.). Most of the technical aspects of his lecture dealt with pitch detection, and the idea of mapping chroma (pitch class) as opposed to actual pitch. In the afternoon we looked at Marsyas, which is a powerful and flexible (and fairly complex) MIR toolset that George wrote.
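The pitch versus chroma distinction is easy to see in code. This is a small Python sketch using the usual equal-tempered MIDI convention (A4 = 440 Hz = note 69); the function names are my own and it has nothing to do with the Marsyas API.

```python
import math

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def frequency_to_midi(freq_hz, a4_hz=440.0):
    """Fractional MIDI note number for a frequency, assuming equal temperament."""
    return 69 + 12 * math.log2(freq_hz / a4_hz)

def frequency_to_chroma(freq_hz):
    """Pitch class index 0..11 (0 = C), discarding the octave."""
    return round(frequency_to_midi(freq_hz)) % 12

for freq in (220.0, 440.0, 880.0, 261.63):
    chroma = frequency_to_chroma(freq)
    print(freq, "Hz ->", PITCH_CLASSES[chroma])
```

The three A's an octave apart all land on the same chroma, which is exactly the information that a chroma representation keeps and an absolute pitch representation does not collapse.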

The Day 5 lecture was by Douglas Eck, from Google Research. He talked about music recommendation methods, based on various models of both human- and machine-generated data. He made a great argument for the inclusion of content-generated data in the recommender algorithms. In the lab we compared various ways of looking at, and making recommendations from, the CAL500 dataset.

We closed the day with a nice tour of CCRMA. It is quite a facility. There is a 3D listening room with ambisonic capabilities. We listened to some music by Fernando Lopez-Lezcano that was made for that space and took great advantage of the speakers above and below the listening position. We also got a bit of a history lesson, hearing about some of the seminal work done at CCRMA, and seeing one of the late Max Mathews’s Radio Batons.

All in all, it has been a great week. I learned a lot, and I feel like I have found a starting point with some of the tools for my dissertation project. If you are considering attending a CCRMA workshop (or studying at CCRMA as a degree-seeking student), I highly recommend it.

Updated Friday 1 July 2011, 10:40 PM PDT