Apache Mahout Performance Issues

With the gracious help of the Mahout community on its mailing list, we found a solution to my problem. All of the code for the solution was committed to Mahout 0.6. More details can be found in the corresponding JIRA ticket.

Using VisualVM, I found that the performance bottleneck was the computation of item-item similarities. @Sean addressed this with a very simple but effective fix (see the SVN commit for more details).

Additionally, we discussed how to improve SamplingCandidateItemsStrategy to allow finer control over the sampling rate.

Finally, I tested my application with the aforementioned fixes. Every recommendation took less than 1.5 seconds, with the overwhelming majority taking less than 500 ms. Mahout could easily handle 100 recommendations per second (I did not try to stress it beyond that).


Small suggestion: your last snippet should use GenericBooleanPrefItemBasedRecommender.

For your data set, the item-based algorithm should be best.

This sounds a little slow, and taking minutes is way too long. The culprit is lumpy data: the time per recommendation can scale with the number of ratings a user has provided, so a few very active users can be far slower than the rest.

Look at SamplingCandidateItemsStrategy. It lets you limit the amount of work done when gathering candidate items by sampling in the face of particularly dense data. You can plug it into GenericBooleanPrefItemBasedRecommender instead of using the default strategy. I think this will give you a lever to increase speed and also make response time more predictable.
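To make the sampling idea concrete, here is a minimal sketch in plain Java. It is not Mahout's implementation and does not use the Mahout API: the class CandidateSampling, its methods, and the maxUsersPerItem parameter are hypothetical names chosen for illustration. It assumes candidate items are gathered by looking at the other users who share an item with the current user, and it caps how many of those users are examined per item, which bounds the work for dense items.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.Set;

// Hypothetical illustration only; not part of Mahout.
final class CandidateSampling {

    // Gathers candidate items for one user: every item seen by users who share
    // an item with this user, examining at most maxUsersPerItem users per item.
    static Set<Long> candidates(List<Long> userItems,
                                Map<Long, List<Long>> usersByItem,
                                Map<Long, List<Long>> itemsByUser,
                                int maxUsersPerItem,
                                Random random) {
        Set<Long> result = new HashSet<>();
        for (Long item : userItems) {
            List<Long> users = usersByItem.getOrDefault(item, Collections.emptyList());
            for (Long other : sample(users, maxUsersPerItem, random)) {
                result.addAll(itemsByUser.getOrDefault(other, Collections.emptyList()));
            }
        }
        // Items the user already has are not candidates.
        result.removeAll(userItems);
        return result;
    }

    // Returns the list itself when it is small enough, otherwise a uniform
    // random subset of exactly max elements (max is assumed non-negative).
    static <T> List<T> sample(List<T> source, int max, Random random) {
        if (source.size() <= max) {
            return source;
        }
        List<T> copy = new ArrayList<>(source);
        Collections.shuffle(copy, random);
        return copy.subList(0, max);
    }
}
```

With a cap in place, the cost of gathering candidates grows with the number of items the user has rated times the cap, rather than with the popularity of those items. The trade-off is that some candidates may be missed on any given call, which is why a tunable sampling rate is useful.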