2012-08-06

clojure-hbase with filter

The project I'm working on was switched from MySQL to HBase to cope with a predicted massive growth in data. Thanks to David Santiago's brilliant clojure-hbase project, I can access HBase from Clojure. Here are some tips if you have to work in an unfriendly enterprise environment like I do. After all, Clojure is our joy! :)
  • Tip #1. Customised HBase CLASSPATH
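Forget about resources/* in your Leiningen project. Instead, run lein jar and write a run.sh that puts the system's Hadoop and HBase configuration directories and jars on the classpath: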
#!/bin/bash
CP=$(lein classpath)
java -cp "/etc/hadoop/conf:/usr/lib/hadoop/*:/usr/lib/hadoop/lib/*:/etc/hbase/conf:/usr/lib/hbase/hbase.jar:/usr/lib/hbase/lib/*:./target/*:$CP" myproject.core "$@"
  • Tip #2. Filter
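You can pass a filter to hb/get and hb/scan. The example below scans every row whose key starts with "U-" and returns only the columns whose qualifier begins with "M-d5". The stop row "U." works because the stop row is exclusive and "." is the byte that comes right after "-".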
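(ns myproject.core
  (:require [clojure-hbase.core :as hb])
  (:import [org.apache.hadoop.hbase.client HTable]
           [org.apache.hadoop.hbase.filter ColumnPrefixFilter]
           [org.apache.hadoop.hbase.util Bytes]))

;; Build a filter that keeps only columns whose qualifier starts with prefix.
(defn prefix-filter
  [prefix]
  (ColumnPrefixFilter. (Bytes/toBytes prefix)))

;; Scan rows in the range ["U-", "U.") and print each filtered result.
(defn test-scanner
  [^HTable mr]
  (hb/with-scanner [results (hb/scan mr
                                     :start-row "U-"
                                     :stop-row "U."
                                     :filter (prefix-filter "M-d5"))]
    (doseq [result (seq results)]
      (println result))))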
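
If you scan other row prefixes, you don't have to work out the stop row by hand. Here is a minimal sketch of a helper that derives it from the prefix by bumping the last character. The name prefix-stop-row is my own and is not part of clojure-hbase, and it assumes a non-empty ASCII prefix whose last character is not the highest possible byte value.

```clojure
;; Hypothetical helper (not part of clojure-hbase): derive an exclusive
;; stop row for a prefix scan by incrementing the prefix's last character.
;; Assumes a non-empty ASCII prefix.
(defn prefix-stop-row
  [^String prefix]
  (let [last-idx (dec (count prefix))
        head     (subs prefix 0 last-idx)
        last-ch  (.charAt prefix last-idx)]
    (str head (char (inc (int last-ch))))))

(println (prefix-stop-row "U-")) ; prints U.
```

You could then pass (prefix-stop-row "U-") as the :stop-row value instead of the hard-coded "U.".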