How do you use the Cassandra tool sstableloader?

The Austin Cassandra Users Group just had a presentation on this: http://www.slideshare.net/alex_araujo/etl-with-cassandra-streaming-bulk-loading/


Played around with sstableloader, read the source code, and finally figured out how to run sstableloader on the same machine that hosts a running cassandra node. There are two key points to get this running. First you need to create a copy of the cassandra install folder for sstableloader. This is becase sstableloader reads the yaml file to figure out what ipaddress to use for gossip, and the existing yaml file is being used by Cassandra. The second point is that you'll need to create a new loopback ipaddress (something like 127.0.0.2) on your machine. Once this is done, change the yaml file in the copied Cassandra install folder to listen to this ipaddress.

I wrote a tutorial going more into detail about how to do this here: http://geekswithblogs.net/johnsPerfBlog/archive/2011/07/26/how-to-use-cassandrs-sstableloader.aspx

Tags:

Cassandra