Yarn MapReduce Log Level
When you write a Pig or Hive UDF, a debug log may be very useful. You don't have to ask your administrator for help. Setting property 'mapreduce.map.log.level' or 'mapreduce.reduce.log.level' to...
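
As a rough sketch of how the two properties named above could be passed on the command line: the excerpt is cut off before it names a value, so `DEBUG` here is an assumption, and `my_udf_script.pig` is a hypothetical script name. The block only prints the command; drop the `echo` to run it on a machine with Pig installed.

```shell
# Assumption: DEBUG is the level you want; the property names come from the text above.
LEVEL=DEBUG
OPTS="-Dmapreduce.map.log.level=${LEVEL} -Dmapreduce.reduce.log.level=${LEVEL}"

# my_udf_script.pig is a hypothetical script name.
# Java-style -D properties are placed before the other Pig options.
CMD="pig ${OPTS} -f my_udf_script.pig"

# Dry run: show the command instead of executing it.
echo "$CMD"
```

Passing the properties per job keeps the change local to your own run, which fits the point that no administrator is needed.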
Parquet "java.lang.NoClassDefFoundError: org/apache/thrift/TEnum"
If you encounter this problem using Cloudera parcels, here is the solution according to this page: org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.NoClassDefFoundError:...
Run Hadoop shell commands super fast
If you run Hadoop shell commands on the console or use them in a script, you will hate that every command loads and starts a JVM. A command like "hadoop fs -ls /tmp/abc" usually takes...
Add a new hard disk on CentOS 6
1. Add a new hard disk.
2. Use the disk tool to format the hard disk using "Master boot record".
3. Start system-config-lvm.
4. Initialize the entity.
5. Add it to a volume group.
6. Select the volume in "Volume Groups->group->Logical...
"Cannot get schema from loadFunc parquet.pig.ParquetLoader"
If you get this error:
2014-03-25 14:17:48,933 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2245: Cannot get schema from loadFunc parquet.pig.ParquetLoader
Details at logfile:...
Git subtree to manage vim plugins
I found these blog posts, http://blogs.atlassian.com/2013/05/alternatives-to-git-submodule-git-subtree and http://endot.org/2011/05/18/git-submodules-vs-subtrees-for-vim-plugins/, about how to use Git...
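
A minimal sketch of the subtree approach, assuming `git subtree` is installed. It uses a throwaway local repository as a stand-in for a real plugin's upstream, so every path and name here (`demo-plugin`, `dotfiles`, `.vim/bundle/...`) is hypothetical, not taken from the linked posts.

```shell
work=$(mktemp -d)
# Identity for the demo commits only.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# A throwaway local repository standing in for a plugin's upstream.
git init -q "$work/plugin"
echo '" demo plugin' > "$work/plugin/demo.vim"
git -C "$work/plugin" add demo.vim
git -C "$work/plugin" commit -q -m "plugin: initial commit"
branch=$(git -C "$work/plugin" rev-parse --abbrev-ref HEAD)

# The repository that tracks your Vim configuration.
git init -q "$work/dotfiles"
cd "$work/dotfiles"
echo 'set nocompatible' > vimrc
git add vimrc
git commit -q -m "dotfiles: initial commit"

# Pull the plugin in as a subtree, squashing its history into one commit.
git subtree add --prefix=.vim/bundle/demo-plugin --squash "$work/plugin" "$branch"

ls .vim/bundle/demo-plugin
```

With a real plugin you would replace `"$work/plugin"` with the plugin's repository URL; later updates would use `git subtree pull` with the same `--prefix`.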
Run Spark Shell locally
If you want to run Spark to access the local file system, here is a simple way:
HADOOP_CONF_DIR=. MASTER=local spark-shell
If you don't give HADOOP_CONF_DIR, Spark will use /etc/hadoop/conf which may...
HCatalog and Parquet
I'm trying to use Sqoop to import Teradata tables into Impala's Parquet tables. Because Sqoop doesn't support writing Parquet files directly, it SEEMS very promising to use Sqoop HCatalog to write...
Teradata create a table like
create table target_db.target_table as src_db.src_table with no data;
create table target_db.target_table as (select * from src_db.src_view) with no data;
Links of Hadoop Cluster Hardware
Hardware
Cloudera: How-to: Select the Right Hardware for Your New Hadoop Cluster
Hortonworks: Best Practices for Selecting Apache Hadoop Hardware
Hortonworks: Chapter 1. Hardware Recommendations For...
Parquet Schema Incompatible between Pig and Hive
When you use Pig to process data and put it into a Hive table, you need to be careful about the Pig namespace. Your complex Pig scripts may end up with a namespace after a group-by. And...
Set replication for files in Hadoop
Change the replication of existing files:
hadoop fs -setrep -R -w 2 /data-dir
Set replication when loading a file:
hadoop fs -Ddfs.replication=2 -put local_file dfs_file
Impala-shell may have control sequence in its output
Assume you have a table that has three rows. What is the result in the output file /tmp/my_table_count? 3? Actually it is not. There is a control sequence "ESC[?1034h" on my terminal. $ impala-shell -B -q...
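
One way to cope with this, sketched under assumptions: the impala-shell output is simulated with `printf` so the snippet runs anywhere, and the control sequence is assumed to sit in front of the count. This is a generic cleanup with `sed`, not necessarily the fix the article goes on to describe.

```shell
# Simulated polluted output: ESC[?1034h followed by the count (assumed layout).
raw=$(printf '\033[?1034h3\n')

# Hold a literal ESC character so the sed pattern stays portable.
esc=$(printf '\033')

# Remove every occurrence of ESC[?1034h from the text.
clean=$(printf '%s\n' "$raw" | sed "s/${esc}\[?1034h//g")

echo "$clean"
```

In a real script the `printf` that fills `raw` would be replaced by the command whose output you capture, or the `sed` filter would be applied to the output file afterwards.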
Add Terminals View
I really like Eclipse Luna's new feature "Terminals". It is convenient because you don't have to leave Eclipse to open a terminal and type your command. And the Mylyn task still records the time you spend on...
Solarized Light (Scala) for Eclipse
I just installed ScalaIDE 4.0.0-RC1 on my Eclipse Luna. I really like the "Solarized Light (Scala)" color theme as shown in the release note. You need to install "Eclipse Color Theme Plugin", otherwise you...
Build scalatest-eclipse-plugin for Scala-IDE 4.0.0 RC1
I upgraded Scala-IDE in Eclipse Luna to 4.0.0 RC1, but didn't find the ScalaTest plugin. It is inconvenient and time-consuming to run "mvn test" every time I change a test. I decided to build Scalatest...
"NoSuchMethodError:...
When I tried to update a Cassandra table using spark-cassandra-connector in a Spark application, I encountered this problem. The reason is that there are multiple versions of com.google.guava:guava...
Spark and Parquet with large block size
One issue when I run a Spark application in YARN cluster mode is that my executor container is killed because its memory usage exceeds the memory limits. The NodeManager's log shows the following messages:...
How to resolve "A schema cannot contain two global components with the same...
When I have a project using spring-data-cassandra, the XML configuration applicationContext.xml's header looks like below. xmlns:context="http://www.springframework.org/schema/context"...
Create Fedora 22 bootable USB
1. Download the netinst ISO file.
2. Log in as root: su -
3. Find out which device maps to the USB flash drive:
   - using dmesg | tail
   - using fdisk -l
NOTE: use /dev/sdd instead of /dev/sdd1 because the latter one is the...