miaoski on cybersecurity: 2012-08

The project I'm working on was switched from MySQL to HBase to cope with a predicted massive growth in data. Thanks to David Santiago and his brilliant clojure-hbase project, I can access HBase from Clojure. Here are some tips if you have to work in an ~~unfriendly~~ enterprise environment like I do. After all, Clojure is our joy! :)

Tip #1. Customised HBase CLASSPATH

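Forget about resources/* in your Leiningen project. Just run lein jar and write a run.sh that puts the cluster's own Hadoop and HBase configuration directories and jars on the classpath, ahead of the project jar and its dependencies: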

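#!/bin/bash
# Project dependencies as resolved by Leiningen
CP=$(lein classpath)
# System Hadoop/HBase config and jars first, then the project jar and its dependencies.
java -cp "/etc/hadoop/conf:/usr/lib/hadoop/*:/usr/lib/hadoop/lib/*:/etc/hbase/conf:/usr/lib/hbase/hbase.jar:/usr/lib/hbase/lib/*:./target/*:$CP" myproject.core "$@"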
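If the client cannot find the cluster, a quick check is whether the configuration files are actually visible on the classpath that run.sh builds. The following is a minimal sketch, not part of the original setup: the helper name classpath-resource is made up for illustration, and it assumes the usual file names hbase-site.xml and core-site.xml live in the configuration directories listed above.

```clojure
(require '[clojure.java.io :as io])

;; Hypothetical helper: returns the URL of the named resource if it is
;; visible on the current classpath, or nil if it is not.
(defn classpath-resource
  [resource-name]
  (io/resource resource-name))

;; Assumed file names; adjust to whatever your cluster actually uses.
(doseq [f ["hbase-site.xml" "core-site.xml"]]
  (println f "->" (or (classpath-resource f) "NOT FOUND on classpath")))
```

Run it through the same run.sh (or a REPL started with the same -cp value); a "NOT FOUND" line suggests the corresponding conf directory is missing from the classpath.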

Tip #2. Filter

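You can pass a filter to hb/get and hb/scan. The scan below returns every row whose key starts with "U-", keeping only the columns whose qualifier begins with "M-d5". The stop row "U." works because "." is the character right after "-" in ASCII, and the stop row is exclusive, so the range covers exactly the keys prefixed with "U-".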

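;; Namespace name taken from run.sh; adjust to your project.
(ns myproject.core
  (:require [clojure-hbase.core :as hb])
  (:import [org.apache.hadoop.hbase.client HTable]
           [org.apache.hadoop.hbase.filter ColumnPrefixFilter]
           [org.apache.hadoop.hbase.util Bytes]))

;; Keep only the columns whose qualifier starts with prefix.
(defn prefix-filter
  [^String prefix]
  (ColumnPrefixFilter. (Bytes/toBytes prefix)))

;; Scan the row keys in ["U-", "U.") and print each result.
(defn test-scanner
  [^HTable mr]
  (hb/with-scanner [results (hb/scan mr
                                     :start-row "U-"
                                     :stop-row "U."
                                     :filter (prefix-filter "M-d5"))]
    (doseq [result results]
      (println result))))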
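The "U-" / "U." pair can be derived instead of typed by hand. Here is a minimal sketch of that idea; prefix-stop-row is a hypothetical helper, not something clojure-hbase provides, and it assumes a non-empty ASCII prefix whose last character is not the highest possible value.

```clojure
;; Hypothetical helper: the smallest string that sorts after every key
;; starting with prefix, obtained by incrementing the last character.
;; Assumes a non-empty ASCII prefix.
(defn prefix-stop-row
  [^String prefix]
  (let [n (dec (count prefix))]
    (str (subs prefix 0 n)
         (char (inc (int (.charAt prefix n)))))))

;; "-" is followed by "." in ASCII, so this prints U.
(println (prefix-stop-row "U-"))
```

With it, the scan options would presumably become :start-row prefix and :stop-row (prefix-stop-row prefix), using the same hb/scan options as in the example.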


2012-08-06

clojure-hbase with filter
