If you run Hadoop shell commands on the console or use them in scripts, you will hate how slow they are, because every command loads and starts a new JVM. A command like "hadoop fs -ls /tmp/abc" usually takes 3~4 seconds on my VirtualBox VM running CentOS 6.5 with 8 virtual cores and 12GB of memory.
$ time hadoop fs -ls /tmp/abc
Found 2 items
drwxrwxrwx - bwang supergroup 0 2014-03-10 16:25 /tmp/abc/2014-03-10
drwxr-xr-x - bwang supergroup 0 2014-03-14 14:57 /tmp/abc/567
real 0m3.632s
user 0m4.146s
sys 0m2.650s
I have been curious whether Nailgun could save me time when running those commands. I finally tried it today, and it turned out to be pretty easy.
- Install Nailgun: just clone it from GitHub and follow the instructions in README.md. I only ran "mvn clean package" and "make".
$ cd ~/git/nailgun
$ mvn clean package
$ make
$ ls
Makefile nailgun-examples ng README.md
nailgun-client nailgun-server pom.xml
$ ls nailgun-server/target/
apidocs nailgun-server-0.9.2-SNAPSHOT.jar
classes nailgun-server-0.9.2-SNAPSHOT-javadoc.jar
javadoc-bundle-options nailgun-server-0.9.2-SNAPSHOT-sources.jar
maven-archiver surefire
maven-status
- Start the Nailgun server: the trick is that you need to put the Hadoop classpath on the server's classpath.
$ java -cp `hadoop classpath`:/home/bwang/git/nailgun/nailgun-server/target/nailgun-server-0.9.2-SNAPSHOT.jar com.martiansoftware.nailgun.NGServer
NGServer 0.9.2-SNAPSHOT started on all interfaces, port 2113.
- Set up aliases: you can set up aliases so that you type the same hadoop shell commands as before, but they run through Nailgun.
$ alias hadoop='$HOME/git/nailgun/ng'
$ hadoop ng-alias fs org.apache.hadoop.fs.FsShell
$ hadoop ng-alias
fs org.apache.hadoop.fs.FsShell
ng-alias com.martiansoftware.nailgun.builtins.NGAlias
Displays and manages command aliases
ng-cp com.martiansoftware.nailgun.builtins.NGClasspath
Displays and manages the current system classpath
ng-stats com.martiansoftware.nailgun.builtins.NGServerStats
Displays nail statistics
ng-stop com.martiansoftware.nailgun.builtins.NGStop
Shuts down the nailgun server
ng-version com.martiansoftware.nailgun.builtins.NGVersion
Displays the server version number.
$ time hadoop fs -ls /tmp/abc
Found 2 items
drwxrwxrwx - bwang supergroup 0 2014-03-10 16:25 /tmp/abc/2014-03-10
drwxr-xr-x - bwang supergroup 0 2014-03-14 14:57 /tmp/abc/567
real 0m0.046s
user 0m0.000s
sys 0m0.008s
- Create some shell scripts so that you don't have to remember those long commands.
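The server startup and alias steps above could be wrapped in a small script along these lines. This is only a sketch under assumptions: the function names (ng_server_cmd, ng_start, hfs), the NAILGUN_HOME, NAILGUN_JAR and NG_LOG variables, the log path, and the two-second wait are all made up for illustration, and the paths simply mirror the ones used in this post. It also assumes the ng client exits with a non-zero status when it cannot reach the server.

```shell
#!/bin/sh
# Hypothetical helpers; source this file from your shell profile.
# The default paths mirror the ones in this post and are assumptions.
NAILGUN_HOME="${NAILGUN_HOME:-$HOME/git/nailgun}"
NAILGUN_JAR="${NAILGUN_JAR:-$NAILGUN_HOME/nailgun-server/target/nailgun-server-0.9.2-SNAPSHOT.jar}"

# Print the server command line for a given classpath (does not run it).
ng_server_cmd() {
    printf 'java -cp %s:%s com.martiansoftware.nailgun.NGServer\n' "$1" "$NAILGUN_JAR"
}

# Start the Nailgun server in the background with the Hadoop classpath,
# then register the "fs" alias for FsShell.
ng_start() {
    hcp=$(hadoop classpath) || return 1
    nohup java -cp "$hcp:$NAILGUN_JAR" com.martiansoftware.nailgun.NGServer \
        >"${NG_LOG:-/tmp/ngserver.log}" 2>&1 &
    sleep 2   # crude wait for the server to start listening (assumption)
    "$NAILGUN_HOME/ng" ng-alias fs org.apache.hadoop.fs.FsShell
}

# Run an fs command through Nailgun if the server answers,
# otherwise fall back to the regular (slow) hadoop command.
hfs() {
    if "$NAILGUN_HOME/ng" ng-version >/dev/null 2>&1; then
        "$NAILGUN_HOME/ng" fs "$@"
    else
        hadoop fs "$@"
    fi
}
```

After sourcing the file, run "ng_start" once per session and then use "hfs -ls /tmp/abc" in place of "hadoop fs -ls /tmp/abc". Using a separate name such as hfs instead of aliasing hadoop itself keeps the real hadoop command available for the fallback and for starting the server.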