(The following article is the result of joint collaboration between DailyTech, Last.fm, and Sun Microsystems.)
What’s it like to run what is perhaps the largest repository of information on the music people listen to? DailyTech recently had the opportunity to sit down with Last.fm cofounder Richard Jones to talk about operations, Last.fm’s future plans, and the challenges of cleaning up millions of misspelled artist names.
Last.fm is a music portal with an impressively large repository of information on music, musicians, and the habits of those who listen to them. It originally started out as two projects: Audioscrobbler, which allowed users to chart their music-listening habits through a media-player plugin, and Last.fm, an internet radio station and music community site.
After working together closely for some time, Audioscrobbler and Last.fm joined forces and moved into the same East London, UK-based office. In 2007 CBS purchased Last.fm for £140 million, keeping current management in place and allowing the site to continue with its own identity.
Sitting atop a Mountain of Data
Last.fm users’ listening data, the size of which numbers into the hundreds of terabytes, is Last.fm’s “greatest asset,” says Jones – and playing with that data is one of the most fun things about working at the company.
“There’s so much knowledge and so many things that you can extract from that database,” he says. “We’re always looking at it in different ways and always sort of thinking, ‘What happened if we tried this, or what happened if we tried that?’, and we can actually go back to the raw data and run some numbers and come up with some other ideas.”
Because the Audioscrobbler plugin reports song names and artists exactly as they’re entered in users’ music tags, dealing with all the different variations and spellings for a single artist or song is one of Last.fm’s “biggest challenges”. Staying on top of the so-called cleanliness problem is an important but ultimately never-ending battle: “For everything we fix, another 10,000 people scrobble the song with the wrong spelling,” says Jones.
To that end, Last.fm says it recently added music fingerprinting to the data that Audioscrobbler submits: in addition to the text names of music, the scrobbler now reports an audio fingerprint which has, according to Jones, provided immense assistance in helping to clean up user-submitted data.
“It is a huge challenge; the common numbers are something like 300 million different tracks that we’ve recorded (that’s in tons of different spellings), and about 20 million different artists – but obviously not all of those are valid,” says Jones. “That’s the challenge: we still haven’t quite answered the question of how many unique artists there really are – there’s obviously much less than what we actually have because of all the misspellings. It’s an ongoing problem and it will never be solved, because there’s always new music being released as well and so you have to constantly keep updating the system.”
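To illustrate why a fingerprint helps with the spelling problem, here is a minimal sketch in Python. It is not Last.fm's actual method, which the interview does not describe. It assumes each submission arrives as a (fingerprint ID, artist name) pair and simply treats the most frequently submitted spelling for a given fingerprint as the canonical one. The function name, the fingerprint IDs, and the sample data are all hypothetical.

```python
from collections import Counter, defaultdict


def canonical_names(scrobbles):
    """Map each fingerprint ID to the spelling most often submitted with it.

    `scrobbles` is an iterable of (fingerprint_id, artist_name) pairs.
    Both the pair format and the majority-vote rule are assumptions
    made for this illustration.
    """
    spellings = defaultdict(Counter)
    for fingerprint_id, artist in scrobbles:
        # Count every spelling seen for the same audio fingerprint.
        spellings[fingerprint_id][artist.strip()] += 1
    # Keep the single most common spelling per fingerprint.
    return {fp: counts.most_common(1)[0][0] for fp, counts in spellings.items()}


# Hypothetical submissions: same audio, different tag spellings.
scrobbles = [
    ("fp-001", "The Beatles"),
    ("fp-001", "Beatles, The"),
    ("fp-001", "The Beatles"),
    ("fp-002", "Bjork"),
    ("fp-002", "Björk"),
    ("fp-002", "Björk"),
]

canonical = canonical_names(scrobbles)
print(canonical)
```

A simple majority vote like this would still let a popular misspelling win, which fits Jones's point that the cleanup is never finished.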
Power-Sipping Servers to Run it All
Powering the site’s massive number-crunching and storage requirements is a server farm of roughly 350 to 400 machines, consisting mostly of off-the-shelf Intel and AMD hardware. Finding adequate amounts of electricity to power the site’s growth is increasingly difficult, says Jones, and to that end he’s switched from local suppliers for his server hardware to a more power-efficient blade architecture from Sun Microsystems.
“We just got some new low-power blades that we’ve put in to do web serving, and our main database – with which we use PostgreSQL – is also on Sun hardware,” says Jones. “Sun seems to make a good range of servers that are quite conscious on the power requirements.”
Last.fm’s controversial “Recently listened tracks” feature
One of Last.fm’s more controversial features is its ability to display the music a user is listening to in near real time: songs appear in a profile’s “recently listened tracks” list seconds after they’re submitted. There are a number of privacy concerns over such a feature: bosses checking up on employees, ex boy/girlfriends stalking former partners, or people just checking to see if someone’s at their computer.
The feature’s been with Audioscrobbler since the very beginning, says Jones, and despite privacy concerns the “recently listened tracks” list is still one of the site’s “most popular features that people actually talk about.”
“Some people are a bit concerned about it, but part of our service is to broadcast your music tastes to the world. So it’s [a big part] of what we do: [users are] actually saying to the world, ‘this is what I am listening to right now,’ and Last.fm wouldn’t be the same without it.”
That being said, Last.fm this year rolled out the ability to hide all real-time data on a user’s profile – so those with privacy concerns can time-delay the world’s view of their listening habits.
One particularly interesting side-effect of the service is in the case of stolen laptops: “We get emails once or twice a month saying, ‘my laptop was stolen, and I can see the person who stole it is playing music on my iTunes right now,’ and then we have actually helped the police track down people’s laptops … from the scrobbling feed on their account.”
“We don’t make a point of logging the IP address,” he says, “but when [thefts have] happened we put a watch on the account, allowing us to collect the IP address the next time it’s used.”
While it’s not really the intended use of the service, says Jones, thieves listening to music on an Audioscrobbler-powered media player have helped police in the U.S., UK, and other countries track down users’ stolen laptops.
To Be Continued…
A full transcript of the interview, which includes hints at Last.fm’s future plans, insight into how it aggregates user submissions, and some behind-the-scenes thoughts on its controversial July redesign, will appear within the next few days. Stay tuned…