2013-03-08

Access HBase in a map-reduce job (Clojure)

When a Hadoop cluster is secured, a map-reduce job must obtain an authentication token before it can access HBase. Here's how we do it in Java:
if (User.isSecurityEnabled()) {
    try {
        User.getCurrent().obtainAuthTokenForJob(job.getConfiguration(), job);
    } catch (IOException ioe) {
        LOG.error(job.getJobName()+ ": Failed to obtain current user.");
    } catch (InterruptedException ie) {
        LOG.info(job.getJobName()+ ": Interrupted obtaining user authentication token");
        Thread.currentThread().interrupt();  // restore the interrupt flag for callers
    }
}
So, how do we do the same in Clojure? First, use clojure-hadoop to create the map-reduce job. You cannot use (defjob/defjob) here, because we need to add a step (obtaining the token) before the job is submitted. Second, the job has to be built from an HBase configuration, that is, the equivalent of Job job = new Job(HBaseConfiguration.create()). An example follows.
(defn- hbase-set-kerberos
  "Obtains an HBase authentication token for job when security is enabled."
  [^Job job]
  (when (User/isSecurityEnabled)
    (try
      (.obtainAuthTokenForJob (User/getCurrent) (.getConfiguration job) job)
      (catch IOException e
        ;; keep the original exception as the cause
        (throw (IOException. "Failed to obtain current user." e)))
      (catch InterruptedException e
        (throw (InterruptedException. "Interrupted obtaining user authentication token"))))))
 
(defn tool-run [^Tool this args]
  (doto (Job. (HBaseConfiguration/create))
    (.setJarByClass (.getClass this))
    (.setJobName (str "hbfeeder.mrjob: " (second args)))
    (.setMapperClass (Class/forName "hbfeeder.mrjob_mapper"))
    (.setReducerClass (Class/forName "hbfeeder.mrjob_reducer"))
    (.setNumReduceTasks 0)
    (.setInputFormatClass TextInputFormat)
    (.setOutputFormatClass NullOutputFormat)
    (FileInputFormat/setInputPaths ^String (second args))
    (hbase-set-kerberos)       ; the key step: obtain the HBase token before the job is submitted
    (.waitForCompletion true))
  0)
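The snippet above leaves out its imports. A namespace declaration along the following lines would be needed for it to compile. This is only a sketch: the namespace name hbfeeder.mrjob is an assumption inferred from the class names used in tool-run, and it assumes an HBase version whose User class still provides isSecurityEnabled and obtainAuthTokenForJob.

```clojure
;; Assumed namespace name, inferred from "hbfeeder.mrjob_mapper" above.
(ns hbfeeder.mrjob
  (:import [java.io IOException]
           [org.apache.hadoop.hbase HBaseConfiguration]
           [org.apache.hadoop.hbase.security User]
           [org.apache.hadoop.mapreduce Job]
           [org.apache.hadoop.mapreduce.lib.input FileInputFormat TextInputFormat]
           [org.apache.hadoop.mapreduce.lib.output NullOutputFormat]
           [org.apache.hadoop.util Tool]))
```

The mapper and reducer classes are looked up by name with Class/forName, so they do not need to be imported here.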
A runnable example can be found in clj-hbase-mapper-example on GitHub.
