My Tech Notes


Yarn MapReduce Log Level

February 27, 2014, 12:47 am

When you write a Pig or Hive UDF, a debug log may be very useful. You don't have to ask your administrator for help. Setting property 'mapreduce.map.log.level' or 'mapreduce.reduce.log.level' to...
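As a rough sketch of how these two properties might be passed on a command line: the excerpt is cut off before it names a level, so DEBUG is an assumption here, and log_level_opts and my_script.pig are hypothetical names used only for illustration.

```shell
# Hypothetical helper: print -D options that set the same log level
# for both map and reduce tasks.
log_level_opts() {
  printf '%s\n' "-Dmapreduce.map.log.level=$1 -Dmapreduce.reduce.log.level=$1"
}

# Assumed level: DEBUG (the excerpt does not show the value).
log_level_opts DEBUG

# Possible use, assuming the pig launcher is on PATH and my_script.pig exists:
#   pig $(log_level_opts DEBUG) my_script.pig
```

The helper only builds the option string, so it can be checked without a cluster before it is pasted into a real job submission.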


Parquet "java.lang.NoClassDefFoundError: org/apache/thrift/TEnum"

March 13, 2014, 4:20 pm

If you encounter this problem using Cloudera parcels, here is the solution according to this page. org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.NoClassDefFoundError:...


Run hadoop shell command Super Fast

March 14, 2014, 11:04 pm

If you run Hadoop shell commands on the console or use them in a script, you will hate that every command loads and starts a JVM. A command like "hadoop fs -ls /tmp/abc" usually takes...


Add new hard disk on CentOS6

March 23, 2014, 1:49 pm

Add a new hard disk. Use the disk tool to format the hard disk using "Master boot record". Start system-config-lvm. Initialize the entity. Add it to a volume group. Select the volume in "Volume Groups->group->Logical...


"Cannot get schema from loadFunc parquet.pig.ParquetLoader"

March 25, 2014, 2:23 pm

If you get this error: 2014-03-25 14:17:48,933 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2245: Cannot get schema from loadFunc parquet.pig.ParquetLoader Details at logfile:...


Git subtree to manage vim plugins

April 10, 2014, 11:34 am

I found these blog posts, http://blogs.atlassian.com/2013/05/alternatives-to-git-submodule-git-subtree and http://endot.org/2011/05/18/git-submodules-vs-subtrees-for-vim-plugins/, about how to use Git...


Run Spark Shell locally

July 21, 2014, 10:52 am

If you want to run Spark to access the local file system, here is a simple way:
HADOOP_CONF_DIR=. MASTER=local spark-shell
If you don't set HADOOP_CONF_DIR, Spark will use /etc/hadoop/conf, which may...


HCatalog and Parquet

July 22, 2014, 11:00 am

I'm trying to use Sqoop to import Teradata tables into Impala's Parquet tables. Because Sqoop doesn't support writing Parquet files directly, it SEEMS very promising to use Sqoop HCatalog to write...


Teradata create a table like

July 24, 2014, 2:29 pm

create table target_db.target_table as src_db.src_table with no data;
create table target_db.target_table as (select * from src_db.src_view) with no data;


Links of Hadoop Cluster Hardware

July 28, 2014, 3:48 pm

Hardware
Cloudera: How-to: Select the Right Hardware for Your New Hadoop Cluster
Hortonworks: Best Practices for Selecting Apache Hadoop Hardware
Hortonworks: Chapter 1. Hardware Recommendations For...


Parquet Schema Incompatible between Pig and Hive

August 14, 2014, 10:31 pm

When you use Pig to process data and load it into a Hive table, you need to be careful about the Pig namespace. A complex Pig script may end up with a namespace after a group-by. And...


Set replication for files in Hadoop

August 15, 2014, 1:57 pm

Change the replication of existing files: hadoop fs -setrep -R -w 2 /data-dir
Set replication when loading a file: hadoop fs -Ddfs.replication=2 -put local_file dfs_file


Impala-shell may have control sequence in its output

August 28, 2014, 2:45 pm

Assume you have a table that has three rows. What is the result in the output file /tmp/my_table_count? 3? Actually, it is not. There is a control sequence "ESC[?1034h" on my terminal. $ impala-shell -B -q...
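The excerpt stops before showing how the post deals with this, so the following is only one possible workaround, not necessarily the author's: filter the captured output through sed to drop terminal control sequences. The function name strip_control_sequences is hypothetical, and the printf line merely simulates the polluted output instead of calling impala-shell.

```shell
# Hypothetical helper: remove CSI control sequences such as ESC[?1034h
# from whatever arrives on stdin.
strip_control_sequences() {
  esc=$(printf '\033')   # a literal ESC character, portable across sed versions
  sed "s/${esc}\[[?0-9;]*[A-Za-z]//g"
}

# Simulated output: the control sequence followed by the row count.
printf '\033[?1034h3\n' | strip_control_sequences
```

In a real script the filter would sit between the impala-shell command and the redirect to the output file, so the file holds only the count.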


Add Terminals View

October 16, 2014, 11:47 am

I really like Eclipse Luna's new feature "Terminals". It is convenient because you don't have to leave Eclipse to open a terminal and type your command. And the Mylyn task still records the time you spend on...


Solarized Light (Scala) for Eclipse

October 22, 2014, 1:09 pm

I just installed ScalaIDE 4.0.0-RC1 on my Eclipse Luna. I really like the "Solarized Light (Scala)" color theme shown in the release notes. You need to install "Eclipse Color Theme Plugin", otherwise you...


Build scalatest-eclipse-plugin for Scala-IDE 4.0.0 RC1

October 24, 2014, 10:34 am

I upgraded Scala-IDE in Eclipse Luna to 4.0.0 RC1, but couldn't find the Scalatest plugin. It is inconvenient and time-consuming to run "mvn test" every time I change a test. I decided to build Scalatest...


"NoSuchMethodError:...

November 3, 2014, 9:43 am

When I tried to update a Cassandra table using spark-cassandra-connector in a Spark application, I encountered this problem. The reason is that there are multiple versions of com.google.guava:guava...


Spark and Parquet with large block size

December 23, 2014, 4:22 pm

One issue when I run a Spark application in YARN cluster mode is that my executor container is killed because its memory usage exceeds the memory limit. The NodeManager's log shows the following messages:...


How to resolve "A schema cannot contain two global components with the same...

April 13, 2015, 11:29 pm

When I have a project using spring-data-cassandra, the header of the XML configuration file applicationContext.xml looks like this: xmlns:context="http://www.springframework.org/schema/context"...


Create Fedora 22 bootable USB

June 16, 2015, 3:39 pm

Download the netinst ISO file.
Log in as root: su -
Find out which device maps to the USB flash drive, using dmesg | tail or fdisk -l.
NOTE: use /dev/sdd instead of /dev/sdd1 because the latter is the...
