My Tech Notes

Compile and install Thrift 0.8.0 on CentOS 5.8 and 6

When you want to install Thrift on CentOS 5.8 or 6, it is better to download the tarball from the Apache Thrift website. Don't check out the source code from SVN or Git, because CentOS 5.8 and 6 don't ship the version of Autoconf needed to run Thrift's ./bootstrap.sh. The tarball doesn't need ./bootstrap.sh because ./configure is already generated. Just follow the installation guide.
I had trouble compiling Thrift on CentOS 5.8 with Ruby 1.9.3 and 1.8.7 under RVM.

  • For 1.9.3, I ran "bundle install", but it failed while compiling mongrel-1.1.5, which is not compatible with Ruby 1.9.3. Using mongrel-1.2.0pre2 works. You need to rerun "./configure" if you change the Ruby version in RVM, and when you install, you have to pass your ANT_HOME and rvm_path if root doesn't have them.
    vi thrift-0.8.0/lib/rb/thrift.gemspec

    s.add_development_dependency "mongrel", "1.2.0pre2"

    cd thrift-0.8.0/lib/rb
    bundle install
    cd ../../..
    ./configure
    make
    sudo ANT_HOME=$ANT_HOME rvm_path=$rvm_path bash -c "source /home/bewang/.rvm/scripts/rvm && rvm use 1.9.3 && make install"
  • For 1.8.7, "bundle install" ran without any problem, but the build sometimes failed with this rspec error:
    Spec::Mocks::MockExpectationError in 'Thrift::UNIXSocket should raise an error when read times out'
    expected :select with (any args) once, but received it twice
    /home/bewang/temp/thrift-0.8.0/lib/rb/spec/socket_spec_shared.rb:94:
You can disable Ruby support if you don't need it or have trouble building it:
sudo yum install automake libtool flex bison pkgconfig gcc-c++ boost-devel libevent-devel zlib-devel python-devel ruby-devel
tar zxvf thrift-0.8.0.tar.gz
cd thrift-0.8.0
./configure --without-ruby
make
sudo ANT_HOME=$ANT_HOME make install


Hive Server 2 in CDH4.1

I gave Hive Server 2 in CDH 4.1 a try and encountered a couple of issues:
  • Hive Server 2 supports LDAP and Kerberos, but only simple bind for LDAP. Unfortunately our LDAP server only supports SASL. If you get an error saying "Error validating the login", you may have the same issue I had. Try ldapsearch on the command line to verify that you can access the LDAP server, and take a look at /etc/ldap.conf if you use CentOS.
ldapsearch -Z -x "uid=bewang"
  • Setting up the JDBC driver seems straightforward. Unfortunately it is not. I didn't find a document saying which jars should be copied. I added hive-jdbc, hive-service, and libthrift, and got an error saying "java.lang.NoClassDefFoundError: org/slf4j/LoggerFactory" in Eclipse. If I dismissed the error dialog and clicked "Test Connection" again, I got a confusing error saying "java.lang.NoClassDefFoundError: Could not initialize class org.apache.thrift.transport.TSocket", a class which is in libthrift.jar. Actually, you need to add slf4j-api to the JAR list in Eclipse. You also need to add commons-logging if you want to run queries.
  • You can use a new CLI tool called beeline, located at /usr/lib/hive/bin/beeline (see the connection sketch after this list).
  • Start Hive Server 2
    1. Install hive-server2 from the Cloudera CDH4 yum repository, then run: sudo /sbin/service hive-server2 start
    2. Or run it directly: hive --service hiveserver2
  • The Hive Server 2 JDBC driver doesn't work well with Eclipse Data Tools. You cannot see databases because of the error below. Also, when I ran a select statement, I got a lot of "Method not supported" messages in the SQL status window, and it seemed the query would never complete. But you can cancel the query and still see the result.

  • java.sql.SQLException: Method not supported
    at org.apache.hive.jdbc.HiveDatabaseMetaData.supportsMixedCaseIdentifiers(HiveDatabaseMetaData.java:922)
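
    As mentioned in the beeline item above, here is a minimal beeline session sketch. The host name is a placeholder, 10000 is just the usual HiveServer2 default port, and the path is the CDH layout:

    /usr/lib/hive/bin/beeline
    beeline> !connect jdbc:hive2://hive-server:10000 myuser mypassword org.apache.hive.jdbc.HiveDriver
    0: jdbc:hive2://hive-server:10000> show databases;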

    Setup Tomcat on CentOS for Windows Authentication using SPNEGO

    I set up a Tomcat server on a Linux box with SPNEGO so that users can single sign-on to the server without typing their passwords. You can follow the instructions at http://spnego.sourceforge.net/spnego_tomcat.html. Although that tutorial uses Windows as the example, the steps are the same on Linux.

    The big problem I faced was my company's network settings:
    • There are two networks: corp.mycompany.com and lab.mycompany.com.
    • lab trusts corp, but corp doesn't trust lab
    The goal is: "users from a Windows machine in corp can access the Tomcat server in lab without typing a username and password."

    Here is the question: where should you create the pre-auth account, in lab or in corp?
    I tried creating a service account in lab's AD and registered SPNs in lab. It didn't work. When I accessed the hello_spnego.jsp page from a Windows machine in corp, I always got the dialog asking for a username and password. This is because I had enabled downgrade to basic authentication for NTLM. If I disabled basic authentication, I got a 500 error.
    I used Wireshark to capture the packets and found the traffic below:
    1. Browser sends GET /hello_spnego.jsp
    2. Server returns 401 Unauthorized with Negotiate
    3. HTTP/1.1 401 Unauthorized
      Server: Apache-Coyote/1.1\r\n
      WWW-Authenticate: Negotiate\r\n
      WWW-Authenticate: Basic realm="LAB.MYCOMPANY.COM"\r\n
    4. Client sends KRB5 TGS-REQ
    5. Client receives KRB5 KRB Error: KRB5KDC_ERR_S_PRINCIPAL_UNKNOWN
    6. Kerberos KRB-ERROR
      Pvno: 5
      MSG Type: KRB-ERROR(30)
      stime: 2012-10-10 23:04:48 (UTC)
      susec: 394362
      error_code: KRB5KDC_ERR_S_PRINCIPAL_UNKNOWN
      Realm: CORP.MYCOMPANY.COM
      Server Name (Service and Instance): HTTP/tomcat.lab.mycompany.com
    7. Browser sends GET /hello_spnego.jsp HTTP/1.1, NTLMSSP_NEGOTIATE
    Obviously, the machine in corp tries to query its own realm CORP.MYCOMPANY.COM to find the server SPN HTTP/tomcat.lab.mycompany.com. That means we should register the SPNs in corp. After creating a new service account and registering the SPNs in corp, I changed the pre-auth account in web.xml to serviceaccount@CORP.MYCOMPANY.COM, and everything worked.
    Then I tried the keytab method because I don't like putting a username/password in plaintext in web.xml. There are still a lot of pitfalls in this step. Here is the working version of my login.conf:
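
    For reference, registering the SPNs in corp came down to something like this on a corp domain controller or admin workstation (a sketch; the account and host names are the ones used in this example, and both the FQDN and the short host name are typically registered):

    setspn -A HTTP/tomcat.lab.mycompany.com serviceaccount
    setspn -A HTTP/tomcat serviceaccount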
    spnego-client {
    com.sun.security.auth.module.Krb5LoginModule required;
    };

    spnego-server {
    com.sun.security.auth.module.Krb5LoginModule
    required
    useKeyTab=true
    keyTab="conf/appserver.keytab"
    principal="serviceaccount@CORP.MYCOMPANY.COM"
    storeKey=true
    isInitiator=false;
    };
    and krb5.conf.
    [libdefaults]
    default_realm = LAB.MYCOMPANY.COM
    default_tgs_enctypes = arcfour-hmac-md5 des-cbc-crc des-cbc-md5 des3-hmac-sha1
    default_tkt_enctypes = arcfour-hmac-md5 des-cbc-crc des-cbc-md5 des3-hmac-sha1
    clockskew = 300

    [realms]
    LAB.MYCOMPANY.COM = {
    kdc = kdc1.lab.mycompany.com
    kdc = kdc2.lab.mycompany.com
    default_domain = lab.mycompany.com
    }

    [domain_realm]
    lab.mycompany.com = LAB.MYCOMPANY.COM
    .lab.mycompany.com = LAB.MYCOMPANY.COM
    You may encounter different issues if something is wrong. Here is my experience:
    1. If you don't quote the principal (i.e. principal=serviceaccount@CORP.MYCOMPANY.COM without the quotes), you will get a configuration error. The message is misleading, because line 9 of login.conf is the keyTab line:
      Caused by: java.io.IOException: Configuration Error:
      Line 9: expected [option key], found [null]
    2. When you use ktab, the first thing to know is that only the Windows JDK ships this tool; the Linux RPM from Oracle doesn't have it.
    3. You should use the service account in the corp network, not lab, to generate the keytab file, like this:
      ktab -a serviceaccount@CORP.MYCOMPANY.COM -k appserver.keytab
    4. Make sure your
    5. I also encountered this error: KrbException: Specified version of key is not available (44). It turned out that the keytab file I generated had kvno=1 while the expected kvno was 2. You can use Wireshark to capture the KRB5 TGS-REP packet, and it will tell you which kvno is expected:
      Ticket
      Tkt-vno: 5
      Realm: LAB.MYCOMPANY.COM
      Server Name ....
      enc-part rc5-hmac
      Encryption type: ...
      Kvno: 2 *** Here it is
      enc-part: ...
    6. You have to run the ktab command multiple times to reach the correct kvno, as described at http://dmdaa.wordpress.com/2010/05/08/how-to-get-needed-kvno-for-keytab-file-created-by-java-ktab-utility/ and sketched after this list. You can use ktab -l to check the kvno:

      ktab -l -k appserver.keytab
    7. The JDK version doesn't seem to matter: a keytab file generated by JDK 7 worked with JDK 1.6.0_32.
    8. I also got the Checksum error below if I used my lab service account (serviceaccount@lab.mycompany.com) in the pre-auth fields or the keytab.
      SEVERE: Servlet.service() for servlet [jsp] in context with path [] threw exception [GSSException: Failure unspecified at GSS-API level (Mechanism level: Checksum failed)] with root cause
      java.security.GeneralSecurityException: Checksum failed
      at sun.security.krb5.internal.crypto.dk.ArcFourCrypto.decrypt(ArcFourCrypto.java:388)
      at sun.security.krb5.internal.crypto.ArcFourHmac.decrypt(ArcFourHmac.java:74)
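
    As mentioned in item 6 above, reaching the expected kvno is just a matter of adding the same principal repeatedly and checking the result (a sketch; the principal, keytab name, and target kvno of 2 are the ones from this example):

    ktab -a serviceaccount@CORP.MYCOMPANY.COM -k appserver.keytab   # adds an entry with kvno 1
    ktab -a serviceaccount@CORP.MYCOMPANY.COM -k appserver.keytab   # adds an entry with kvno 2
    ktab -l -k appserver.keytab                                     # verify the kvno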

    DB2 Client Install on Linux


    1. db2ls shows the current installation:
      [bewang@logs]$ db2ls
      Install Path                       Level   Fix Pack   Special Install Number   Install Date                  Installer UID 
      ---------------------------------------------------------------------------------------------------------------------
      /opt/ibm/db2/V9.7                 9.7.0.2        2                            Tue Oct  4 17:08:48 2011 PDT             0 
    2. db2ls -q -b shows the installed components:
      [bewang@logs]$ db2ls -q -b /opt/ibm/db2/V9.7

      Install Path : /opt/ibm/db2/V9.7

      Feature Response File ID Level Fix Pack Feature Description
      ---------------------------------------------------------------------------------------------------------------------
      BASE_CLIENT 9.7.0.2 2 Base client support
      JAVA_SUPPORT 9.7.0.2 2 Java support
      LDAP_EXPLOITATION 9.7.0.2 2 DB2 LDAP support
    3. To remove the client, run db2idrop first, then db2_deinstall (see the sketch below).
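
      A rough sketch of the removal, assuming the install path shown above and a hypothetical instance name db2inst1:

      sudo /opt/ibm/db2/V9.7/instance/db2idrop db2inst1
      sudo /opt/ibm/db2/V9.7/install/db2_deinstall -a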

    Import users to Cloudera Hue 2 from Hue 1.2.0


    I thought it would not be too hard to import users from Hue 1.2.0 into Hue 2 (CDH 4.1), because this page doesn't mention any special steps: https://ccp.cloudera.com/display/CDH4DOC/Hue+Installation#HueInstallation-UpgradingHuefromCDH3toCDH4.

    But I was wrong. After importing auth_user, auth_group, and auth_user_groups and successfully logging on with my old username and password, I got "Server Error (500)" and could not find any error message in the log files.

    It turns out that you have to create records in the useradmin_grouppermission and useradmin_userprofile tables for each user in Hue 2 (CDH 4.1). Here are the queries:


    mysql> insert into useradmin_grouppermission(hue_permission_id,group_id) select hp.id, g.id from useradmin_huepermission hp inner join auth_group g;
    mysql> insert into useradmin_userprofile (user_id, creation_method, home_directory) select u.id, 'HUE', concat('/user/', u.username) from auth_user u;
    You may need to exclude id 1 if you have already created the superuser. It is better not to create the superuser when you run syncdb the first time. You can do it like this:
    • drop database hue
    • create database hue
    • build/env/bin/hue syncdb
    • answer no when asked whether to create a superuser

    • You just installed Django's auth system, which means you don't have any superusers defined.
      Would you like to create one now? (yes/no): no
    • mysqldump -uxxx -pxxxx -h old_db --compact --no-create-info --disable-keys hue auth_group auth_user auth_user_groups auth_user_user_permissions > hue_user.sql
    • mysql -uyyy -pyyyy -h new_db -D hue < hue_user.sql
    • run the above insert queries

    Hue 2.0 failed when syncdb with a MySQL database

    If you use MySQL as the HUE 2.0 database and the default charset of the database is UTF8, you will have trouble running syncdb, because the jobsub migration ${HUE_DIR}/apps/jobsub/src/jobsub/migrations/0002_auto__add_ooziestreamingaction__add_oozieaction__add_oozieworkflow__ad.py defines some columns as varchar(32678), which is not supported in MySQL with UTF8 (the maximum length is 21844). Here is how you can fix it:
    • Don't use utf8 as the default. Change /etc/my.cnf to use latin1; with latin1 a varchar can hold 64K. After syncdb, just change the tables' charset back to utf8 if you still want utf8.
    • Edit the migration to replace every 32678 with 12678 before running syncdb (for example with the one-liner below). It doesn't matter what value you change it to, because the developers already have the fix in migration 0003_xxx in the same directory and the table will be fixed anyway. Don't forget to delete the 0002_*.pyc file in the directory.
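
    A possible way to do the replacement (a sketch; the directory is the migrations path mentioned above, and the 0002_* glob matches the migration and its compiled .pyc):

    cd ${HUE_DIR}/apps/jobsub/src/jobsub/migrations
    sed -i 's/32678/12678/g' 0002_*.py   # shrink the oversized varchar columns
    rm -f 0002_*.pyc                     # drop the stale compiled migration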

    Tips of Jenkins and Nexus

    • Fingerprint issue in Jenkins: delete the directory D:\.jenkins\fingerprints if you see errors like the following.

    Waiting for Jenkins to finish collecting data
    ERROR: Asynchronous execution failure
    java.util.concurrent.ExecutionException: hudson.util.IOException2: Unable to read D:\.jenkins\fingerprints\42\e9\40d5d2d822f4dc04c65053e630ab.xml
    ...
    Caused by: hudson.util.IOException2: Unable to read D:\.jenkins\fingerprints\42\e9\40d5d2d822f4dc04c65053e630ab.xml
    ...
    Caused by: com.thoughtworks.xstream.io.StreamException: : only whitespace content allowed before start tag and not \u0 (position: START_DOCUMENT seen \u0... @1:1)
    ...
    Caused by: org.xmlpull.v1.XmlPullParserException: only whitespace content allowed before start tag and not \u0 (position: START_DOCUMENT seen \u0... @1:1)
  • Maven (3.0.3) cannot resolve snapshot artifacts from Nexus: enable snapshots for the repository in ~/.m2/settings.xml, as shown below. The only thing that isn't clear to me is whether this also enables snapshots for other repositories because of the mirror setting.

    My ~/.m2/settings.xml:

    <settings xmlns="http://maven.apache.org/SETTINGS/1.0.0"
              xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
              xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0
                                  http://maven.apache.org/xsd/settings-1.0.0.xsd">
      <localRepository>/home/bewang/temp/maven/.m2/repository</localRepository>

      <profiles>
        <profile>
          <id>nexus</id>
          <activation>
            <activeByDefault>true</activeByDefault>
          </activation>
          <repositories>
            <repository>
              <id>my-repo</id>
              <url>http://nexus-server:8080/nexus/content/groups/public/</url>
              <releases>
                <enabled>true</enabled>
              </releases>
              <snapshots>
                <enabled>true</enabled>
              </snapshots>
            </repository>
          </repositories>
        </profile>
      </profiles>

      <mirrors>
        <mirror>
          <id>nexus-public</id>
          <mirrorOf>*,!eclipselink-repo</mirrorOf>
          <url>http://nexus-server:8080/nexus/content/groups/public</url>
        </mirror>
        <mirror>
          <id>nexus-eclipselink</id>
          <mirrorOf>eclipselink-repo</mirrorOf>
          <url>http://nexus-server:8080/nexus/content/repositories/eclipselink-maven-mirror</url>
        </mirror>
      </mirrors>
    </settings>

    Start Puppet Master As Non-Root User


    My company has a lengthy process if we need root permission to deploy something onto a Linux server: 3-4 days for approval. Sounds impossible? Unfortunately it is true. And I don't want to run my puppet modules with root permission, because then they have to be managed by the IT team and it could take even longer to deploy and maintain those modules. My goal is to avoid needing root permission as much as I can.

    Actually it is possible to run multiple puppet agents on one machine. One agent is managed by IT, which handles the stuff I don't care about, such as NTP. Another agent runs under my control as a non-root user with some sudo privileges, like running /sbin/service. And I can run my own puppet master and configure my agent to connect to that master to do the deployment.

    It is actually pretty simple, even without any configuration file; just provide confdir, vardir and server on the command line.


    puppet master --confdir=/home/bewang/.puppet --vardir=/home/bewang/.puppet/var --modulepath=/home/bewang/modules --pluginsync --no-daemonize --debug --verbose
    puppet agent --confdir=/home/bewang/.puppet --vardir=/home/bewang/.puppet/var --server=my-pp-master --test
    Of course, when you run the puppet agent for the first time, it will fail because the certificate is not signed yet. After running the following commands on the master, everything works.

    puppet cert --confdir=/home/bewang/.puppet --vardir=/home/bewang/.puppet/var list
    puppet cert --confdir=/home/bewang/.puppet --vardir=/home/bewang/.puppet/var sign my-pp-agent-host-name
    For a service resource, you can provide a customized start command like this:

    service { "my-service":
    ensure => running,
    start => "sudo /sbin/service my-service start",
    }

    Using Puppet to Manage Java Application Deployment


    Deployment of a Java application can be done in different ways. We can:

    • Build one jar with your application and all dependent jars exploded into it. You don't have to worry about your classpath.
    • Build a tarball. It is easy using the Maven Assembly and AppAssembler plugins to build a tarball/zip that includes all dependent jars and wrapper scripts for running the application or service. AppAssembler generates the classpath in the wrapper scripts.
    • Build a war and deploy it to a container.

    All of the above methods have the same issue: you deploy the same jars again and again. One of my projects uses a lot of third-party libraries: Spring, Drools, Quartz, EclipseLink and Apache CXF. The whole zip file is about 67MB with all transitive dependencies, but my application itself is actually less than 1MB. Every time I have to build and store such a large file, and include all the dependent jars without any change. Even though hard drives are much cheaper today, this is still not a good practice.

    It is easy to deploy my application jar and replace only the old jar each time if I don't update any dependency. Otherwise, it is still inconvenient because you have to change multiple jars.

    The ideal way is to deploy the application jar and the dependent jars into a directory other than the application directory on the target server, and rebuild the classpath each time a new version or a dependency change is deployed. This has the following advantages:

    • Dependent jars only need to be deployed once.
    • Different applications can share the same dependencies.
    • Multiple versions can exist on the target server, so it is much easier to roll back to an old version.
    • You save a lot of space and network traffic.

    You probably already guessed the answer: Maven. Using Maven, the Maven dependency plugin, and the Maven local repository, you can implement such a system simply and easily.

    I have a working puppet module that can do the following:

    • set up Maven from a tarball;
    • set up Maven settings.xml, the local repository, and your Maven repository on the intranet;
    • resolve the dependencies of a Maven artifact, i.e. download the jars into the local repository;
    • set up symlinks to the dependent jars in a directory so that different applications don't need their own copies;
    • generate a file containing the classpath (essentially what the Maven goals sketched below do).
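
    Under the hood, the resolution and classpath steps boil down to standard Maven dependency-plugin goals; a rough sketch, run against the application's pom with hypothetical output locations:

    # copy all dependencies of the project into a shared directory
    mvn dependency:copy-dependencies -DoutputDirectory=/opt/deploy/lib
    # write the resolved classpath to a file
    mvn dependency:build-classpath -Dmdep.outputFile=/opt/deploy/my-app.classpath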

    I will publish it to github once I get the time.

    You can find similar puppet modules on GitHub, like puppet-nexus and puppet-maven, but they just copy the specified jars to a directory.

    CDH4.1.0 Hive Debug Option Bug


    Hive 0.9.0 supports remote debugging. Run the following command, and Hive will suspend and listen on port 8000.


    hive --debug
    But there is a bug in Hive in CDH4.1.0 which blocks you from using this option. You will get this error message:

    [bewang@myserver ~]$ hive --debug
    ERROR: Cannot load this JVM TI agent twice, check your java command line for duplicate jdwp options.
    Error occurred during initialization of VM
    agent library failed to init: jdwp

    By setting xtrace, I found that "-XX:+UseParallelGC -agentlib:jdwp=transport=dt_socket,server=y,address=8000,suspend=y" is actually added twice. In commit 54abc308164314a6fae0ef0b2f2241a6d4d9f058, HADOOP_CLIENT_OPTS is appended to HADOOP_OPTS; unfortunately the same thing is already done in $HADOOP_HOME/bin/hadoop.


    --- a/bin/hive
    +++ b/bin/hive
    @@ -216,6 +216,7 @@ if [ "$DEBUG" ]; then
    else
    get_debug_params "$DEBUG"
    export HADOOP_CLIENT_OPTS="$HADOOP_CLIENT_OPTS $HIVE_MAIN_CLIENT_DEBUG_OPTS"
    + export HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
    fi
    fi

    Removing this line will fix the issue.
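
    If you want to see the duplication yourself, a quick sketch (the launcher path assumes a CDH-style install; adjust it for your layout):

    bash -x /usr/lib/hive/bin/hive --debug 2>&1 | grep -- '-agentlib:jdwp'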

    Bad performance of Hive meta store for tables with large number of partitions


    Just found this article Batch fetching - optimizing object graph loading

    We have some tables with 15K ~ 20K partitions. If I run a query scanning a lot of partitions, Hive can take more than 10 minutes to submit the mapred job.

    The problem is caused by ObjectStore.getPartitionsByNames, which is called when the Hive semantic analyzer tries to prune partitions. This method sends a lot of queries to our MySQL database to retrieve ALL information about the partitions. Because MPartition and MStorageDescriptor objects are converted into Partition and StorageDescriptor, every field is accessed during the conversion; in other words, even fields that have nothing to do with partition pruning, such as BucketCols, are loaded. In our case, 10 queries are sent to the database for each partition, and each query may take 40ms.

    This is the well-known ORM 1+N problem, and it makes for a really bad user experience.

    If we assembled the Partition objects manually instead, it would only need about 10 queries per group of partitions (the default group size is 300). In our environment, that works out to about 40 seconds for 30K partitions: 30K / 300 groups * 10 queries * 40ms.

    I tried this approach:

    1. Fetch MPartition with a fetch group and fetch_size_greedy, so that one query gets MPartition's primary fields and caches the MStorageDescriptor.
    2. Collect all descriptors into a list "msds", then run another query to fetch MStorageDescriptor with a filter like "msds.contains(this)"; all cached descriptors are refreshed in one query instead of N queries.

    This works well for 1-1 relations, but not for 1-N relations like MPartition.values. I didn't find a way to populate those fields in just one query.

    Because the JDO mapping doesn't work well with the conversion (MPartition -> Partition), I'm wondering if it is worth doing this instead:

    1. Query each table (PARTITIONS, SDS, etc.) directly in SQL.
    2. Assemble the Partition objects.

    This is a hack and the code will be really ugly. But I didn't find JDO support for "FETCH JOIN" or batch fetch.

    Hive Metastore Configuration


    Recently I wrote a post, Bad performance of Hive meta store for tables with large number of partitions, and then ran tests in our environment. Here is what I found:

    • Don't configure a Hive client to access a remote MySQL database directly as follows. The performance is really bad, especially when you query a table with a large number of partitions.

      <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://mysql_server/hive_meta</value>
      </property>
      <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
      </property>
      <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>hive_user</value>
      </property>
      <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>password</value>
      </property>

  • Start the Hive metastore service on the same server where the Hive MySQL database is running.
    • On the database server, use the same configuration as above.
    • Start the hive metastore service:
    hive --service metastore
    # If you use CDH:
    yum install hive-metastore
    /sbin/service hive-metastore start
  • On the Hive client machines, use the following configuration.

      <property>
        <name>hive.metastore.uris</name>
        <value>thrift://mysql_server:9083</value>
      </property>

  • Don't worry if you see this error message:
    ERROR conf.HiveConf: Found both hive.metastore.uris and javax.jdo.option.ConnectionURL Recommended to have exactly one of those config key in configuration
    The reason this matters: when Hive does partition pruning, it reads a list of partitions. The current metastore implementation uses JDO to query the metastore database:
    1. Get a list of partition names using db.getPartitionNames().
    2. Then call db.getPartitionsByNames(List<String> partNames). If the list is too large, it is loaded in multiple batches, 300 partitions per batch by default. The JDO calls go like this:
      • For one MPartition object:
      • Send 1 query to retrieve MPartition's basic fields.
      • Send 1 query to retrieve the MStorageDescriptor.
      • Send 1 query to retrieve data from PART_PARAMS.
      • Send 1 query to retrieve data from PARTITION_KEY_VALS.
      • ...
      • In total, about 10 queries for one MPartition. Because each MPartition is converted into a Partition before being sent back, all fields are populated.
    3. One query takes about 40ms in my environment, so you can calculate how long this takes for thousands of partitions; for example, 2,000 partitions means 2,000 * 10 * 40ms, roughly 13 minutes.
    4. Using a remote Hive metastore service, all those queries happen locally on the database server, so each query doesn't take as long and performance improves significantly. But there are still a lot of queries.

    I also wrote an ObjectStore using EclipseLink JPA with @BatchFetch. Here are the test results; it is at least 6 times faster than the remote metastore service, and can be even faster.

    Partitions   JDO (Remote MySQL)   Remote Service   EclipseLink (Remote MySQL)
    10           6,142                353              569
    100          57,076               3,914            940
    200          116,216              5,254            1,211
    500          287,416              21,385           3,711
    1000         574,606              39,846           6,652
    3000         -                    132,645          19,518

    Puppet require vs. include vs. class


    According to the Puppet reference for include and require:

    • Both include and require are functions;
    • Both include and require "evaluate one or more classes";
    • Neither include nor require can handle parameterized classes;
    • require is a superset of include, because it "adds the required class as a dependency";
    • require can cause a "nasty dependency cycle";
    • require is "largely unnecessary"; see the Puppet language guide:
      Puppet also has a require function, which can be used inside class definitions and which does implicitly declare a class, in the same way that the include function does. This function doesn’t play well with parameterized classes. The require function is largely unnecessary, as class-level dependencies can be managed in other ways.
    • We can include a class multiple times, but we cannot declare a class multiple times.

      class inner {
        notice("I'm inner")

        file { "/tmp/abc":
          ensure => directory
        }
      }

      class outer_a {
        # include inner
        class { "inner": }

        notice("I'm outer_a")
      }

      class outer_b {
        # include inner
        class { "inner": }

        notice("I'm outer_b")
      }

      include outer_a
      include outer_b

      Duplicate declaration: Class[Inner] is already declared in file /home/bewang/temp/puppet/require.pp at line 11; cannot redeclare at /home/bewang/temp/puppet/require.pp:18 on node pmaster.puppet-test.com
    • You can safely include a class multiple times; the first two examples below pass, but you cannot declare class inner after outer_a or outer_b has been evaluated, as in the third example:

      class inner {
      }

      class outer_a {
        include inner
      }

      class outer_b {
        include inner
      }

      class { "inner": }
      include outer_a
      include outer_b

      class { "inner": }
      class { "outer_a": }
      class { "outer_b": }

      include outer_a
      include outer_b
      class { "inner": } # Duplicate declaration error

    Have to read all from Ruby PTY output

    I wrote a Ruby script to call ssh-copy-id to distribute my public key to a list of hosts. It took me a while to make it work. The code is simple: read the password, then call ssh-copy-id for each host to copy the public key. Of course, the code doesn't handle all scenarios, like the key already being copied or a wrong password. The tricky parts are "cp_out.readlines" and "Process.wait(pid)": if you don't read all the data from cp_out (i.e. comment out cp_out.readlines), the spawned process won't return.

    #!/usr/bin/env ruby
    require 'rubygems'

    require 'pty'
    require 'expect'
    require 'io/console'

    hosts_file = ARGV[0] || "hosts"

    print "Password:"
    password = $stdin.noecho(&:gets)   # read the password without echoing it
    password.chomp!
    puts

    $expect_verbose = true
    File.open(hosts_file).each do |host|
      host.chomp!
      print "Copying id to #{host} ... "
      begin
        PTY.spawn("ssh-copy-id #{host}") do |cp_out, cp_in, pid|
          begin
            pattern = /#{host}'s password:/
            cp_out.expect(pattern, 10) do |m|
              cp_in.printf("#{password}\n")
            end
            # Drain all remaining output; otherwise the spawned process never returns.
            cp_out.readlines
          rescue Errno::EIO
            # The child closed its side of the pty; nothing more to read.
          ensure
            Process.wait(pid)
          end
        end
      rescue PTY::ChildExited => e
        puts "Exited: #{e.status}"
      end
      status = $?
      if status == 0
        puts "Done!"
      else
        puts "Failed with exit code #{status}!"
      end
    end

    Bash single quote escape in running a Hive query

    If you want to run this Hive query over SSH, you will probably get a headache figuring out how to escape it:

    select * from my_table where date_column = '2013-01-11';
    The correct command looks like this:

    ssh myserver 'hive -e "select * from my_table where date_column = '"'2013-01-11'"';"'
    What a hack! Here is how bash reads it:

    1. ssh
    2. myserver
    3. hive -e "select * from my_table where date_column =
    4. '2013-01-11'
    5. ;"
    Parts 3, 4 and 5 are concatenated into a single argument, so the remote command becomes:

    hive -e "select * from my_table where date_column = '2013-01-11';"

    HBase LeaseException issue

    Excerpt from http://hbase.apache.org/book.html. If you google "hbase LeaseException", this page may not show up on the first page of results.

    12.5.2. LeaseException when calling Scanner.next

    In some situations clients that fetch data from a RegionServer get a LeaseException instead of the usual Section 12.5.1, “ScannerTimeoutException or UnknownScannerException”. Usually the source of the exception is org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:230) (line number may vary). It tends to happen in the context of a slow/freezing RegionServer#next call. It can be prevented by having hbase.rpc.timeout > hbase.regionserver.lease.period. Harsh J investigated the issue as part of the mailing list thread HBase, mail # user - Lease does not exist exceptions

    Scala SBT


    Requirements: 


    1. Include scala-library.jar in the zip file because the target machine may not have Scala installed.
    2. Include scripts and configuration files in the zip, just like a Maven assembly.
    3. Include the application jar in the zip file.

     Lessons:

    1. Keys.
    • SettingKey and TaskKey
    • You cannot reference a key directly; you need to map over a tuple like this:
      (sbt_key) => { key_val => }
    • projectBin is the jar
    • managed
  • Must have a Project
  • libraryDependencies and resolvers defined in Build have no effect for the project
  • IO.zip doesn't preserve the permissions of shell scripts
  • Builds Java code perfectly
  • Custom cleanFiles
  • Define a custom task like dist
  • Useful commands of sbt

    settings // list settings keys
    tasks // list all tasks
    show clean-files // check value of settings keys
    inspect clean-files // check more information than show
    • SVN color diff

      Git-SVN

      Here are some notes from playing with git-svn:
      • Clone a project. You can run the following commands if the project follows the standard layout with trunk/tags/branches.
        root
        |- ProjectA
        | |- branches
        | |- tags
        | \- trunk
        |- OtherProject
      • Download trunk, tags and branches. You will find the same directories as in the SVN repository.
        git svn clone -r XXXX https://subversion.mycompany.com/ProjectA
      • Init a new project. This creates the directory projectA and initializes it as a git repository.
        git svn init https://subversion.mycompany.com/projectA projectA
      • Download trunk only. The directory has only the files under trunk.
        git svn clone -s -r XXXX https://subversion.mycompany.com/ProjectA
        • Provide the revision number using "-r xxxx"; otherwise you have to wait a long time until git-svn finishes scanning all revisions.
        • Use the revision of the project (96365) instead of the latest repository revision (96381); otherwise you will end up with an empty working directory. You can find the revision number in ViewVC like this:
          Index of /ProjectA

          Files shown: 0
          Directory revision: 96365 (of 96381)
          Sticky Revision:

      Extract compressed Tar

      • Extract GZIP compressed TAR file
        tar xzvf file.tar.gz
      • Extract BZ2 compressed TAR file
        tar xvjpf file.tar.bz2
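      • With a reasonably recent GNU tar, extraction can also auto-detect the compression, so one form covers both:
        tar xvf file.tar.gz
        tar xvf file.tar.bz2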