Browsing all 90 articles

Yarn MapReduce Log Level

When you write a Pig or Hive UDF, a debug log can be very useful, and you don't have to ask your administrator for help. Setting the property 'mapreduce.map.log.level' or 'mapreduce.reduce.log.level' to...
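For example, a minimal sketch of setting the level from Pig and Hive (the script body, table name, and the DEBUG level are placeholders):

  # Pig: the SET statement forwards the property to the MapReduce job
  pig -e "SET mapreduce.map.log.level 'DEBUG'; A = LOAD '/tmp/in' AS (line:chararray); DUMP A;"

  # Hive: same idea for the reduce side
  hive -e "set mapreduce.reduce.log.level=DEBUG; select count(*) from my_table;"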

View Article


Parquet "java.lang.NoClassDefFoundError: org/apache/thrift/TEnum"

If you encounter this problem using Cloudera parcels, here is the solution according to this page: org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.NoClassDefFoundError:...
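The teaser only shows the error; as a hedged sketch of the usual remedy for a missing org.apache.thrift class, put a thrift jar on the job classpath (the parcel jar path and version below are assumptions; verify them on your cluster):

  # Assumed location of libthrift inside the CDH parcel
  THRIFT_JAR=/opt/cloudera/parcels/CDH/jars/libthrift-0.9.0.jar
  # Visible to the client JVM...
  export HADOOP_CLASSPATH=$THRIFT_JAR:$HADOOP_CLASSPATH
  # ...and shipped with the job (-libjars requires the job to use Tool/GenericOptionsParser)
  hadoop jar my_job.jar com.example.MyJob -libjars $THRIFT_JAR /in /out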

View Article


Run hadoop shell command Super Fast

If you run Hadoop shell commands on the console or use them to write a script, you will hate it, because a JVM is loaded and started for every command. A command like "hadoop fs -ls /tmp/abc" usually takes...
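One JVM-free alternative, not necessarily the approach the article settles on, is to talk to WebHDFS directly with curl (the NameNode host and port 50070 are assumptions):

  # Time the regular client: most of it is JVM startup
  time hadoop fs -ls /tmp/abc

  # The same listing over WebHDFS, no JVM involved
  curl -s "http://namenode.example.com:50070/webhdfs/v1/tmp/abc?op=LISTSTATUS"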

View Article

Add new hard disk on CentOS6

Add a new hard disk. In the disk tool, format the hard disk using "Master boot record". Start system-config-lvm. Initialize the entity. Add it to a volume group. Select the volume in "Volume Groups->group->Logical...
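The command-line equivalent of those GUI steps looks roughly like this (a hedged sketch; the device /dev/sdb, the group vg_data, the volume lv_data, and the size are assumptions):

  # Initialize the new disk as an LVM physical volume
  pvcreate /dev/sdb
  # Add it to an existing volume group (or create one with vgcreate)
  vgextend vg_data /dev/sdb
  # Carve out a logical volume, format it, and mount it
  lvcreate -L 100G -n lv_data vg_data
  mkfs.ext4 /dev/vg_data/lv_data
  mkdir -p /data && mount /dev/vg_data/lv_data /data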

View Article

"Cannot get schema from loadFunc parquet.pig.ParquetLoader"

If you get this error: 2014-03-25 14:17:48,933 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2245: Cannot get schema from loadFunc parquet.pig.ParquetLoader. Details at logfile:...
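The teaser does not show the fix, but one common cause of ERROR 2245 is that the Parquet Pig jars are not registered. A hedged sketch (the bundle jar path and version, and the input path, are assumptions):

  # Register the Parquet Pig bundle before using the loader
  pig -e "REGISTER /opt/cloudera/parcels/CDH/jars/parquet-pig-bundle-1.2.5.jar;
          A = LOAD '/user/me/data' USING parquet.pig.ParquetLoader();
          DESCRIBE A;"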

View Article


Git subtree to manage vim plugins

I found these blog posts, http://blogs.atlassian.com/2013/05/alternatives-to-git-submodule-git-subtree and http://endot.org/2011/05/18/git-submodules-vs-subtrees-for-vim-plugins/, about how to use Git...
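A minimal git-subtree example for a Vim plugin, roughly the pattern those posts describe (the plugin repository and the prefix are illustrative):

  # Add a plugin as a subtree under .vim/bundle, squashing its history
  git subtree add --prefix .vim/bundle/nerdtree https://github.com/scrooloose/nerdtree.git master --squash

  # Later, pull upstream updates for that plugin
  git subtree pull --prefix .vim/bundle/nerdtree https://github.com/scrooloose/nerdtree.git master --squash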

View Article

Run Spark Shell locally

If you want to run Spark against the local file system, here is a simple way: HADOOP_CONF_DIR=. MASTER=local spark-shell. If you don't set HADOOP_CONF_DIR, Spark will use /etc/hadoop/conf, which may...
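For example (the local file path is a placeholder):

  # Point Spark at the current directory for Hadoop conf and use a local master
  HADOOP_CONF_DIR=. MASTER=local spark-shell

  # Inside the shell, read a local file instead of HDFS:
  #   scala> sc.textFile("file:///tmp/test.txt").count()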

View Article

HCatalog and Parquet

I'm trying to use Sqoop to import Teradata tables into Impala's Parquet tables. Because Sqoop doesn't support writing Parquet files directly, it SEEMS very promising to use Sqoop's HCatalog integration to write...
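A hedged sketch of the kind of Sqoop HCatalog invocation involved (the connection string, credentials, table names, and the storage stanza are assumptions):

  # Import a Teradata table through HCatalog into a Parquet-backed Hive table
  sqoop import \
    --connect jdbc:teradata://td-host/DATABASE=src_db \
    --username me --password-file /user/me/.pw \
    --table SRC_TABLE \
    --hcatalog-database target_db \
    --hcatalog-table target_table \
    --create-hcatalog-table \
    --hcatalog-storage-stanza 'stored as parquetfile'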

View Article


Teradata create a table like

create table target_db.target_table as src_db.src_table with no data;
create table target_db.target_table as (select * from src_db.src_view) with no data;

View Article


Links of Hadoop Cluster Hardware

Hardware
Cloudera: How-to: Select the Right Hardware for Your New Hadoop Cluster
Hortonworks: Best Practices for Selecting Apache Hadoop Hardware
Hortonworks: Chapter 1. Hardware Recommendations For...

View Article

Parquet Schema Incompatible between Pig and Hive

When you use Pig to process data and put it into a Hive table, you need to be careful about the Pig namespace. It is possible that your complex Pig scripts carry a namespace prefix after a group-by. And...
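For instance, after a group-by the field names pick up a relation prefix (such as A::value) unless you rename them, and Hive's Parquet reader will not match those against its own column names. A hedged Pig sketch, run through pig -e with illustrative relation, field, and path names:

  # The AS clauses strip the A:: prefix from the output schema
  pig -e "A = LOAD '/user/me/in' AS (key:chararray, value:long);
          B = GROUP A BY key;
          C = FOREACH B GENERATE group AS key, SUM(A.value) AS total;
          STORE C INTO '/user/me/out' USING parquet.pig.ParquetStorer();"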

View Article

Set replication for files in Hadoop

Change an existing file's replication: hadoop fs -setrep -R -w 2 /data-dir
Set replication when loading a file: hadoop fs -Ddfs.replication=2 -put local_file dfs_file

View Article

Impala-shell may have control sequence in its output

Assume you have a table with three rows; what is the result in the output file /tmp/my_table_count? 3? Actually it is not: there is a control sequence "ESC[?1034h" in the output on my terminal. $ impala-shell -B -q...
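One hedged workaround, not necessarily the one the article uses, is to strip the sequence before writing the file (the table name is an assumption; GNU sed syntax):

  # The escape sequence comes from the terminal library, not from the query result
  impala-shell -B -q 'select count(*) from my_table' 2>/dev/null \
    | sed 's/\x1b\[?1034h//g' > /tmp/my_table_count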

View Article


Add Terminals View

I really like Eclipse Luna's new feature "Terminals". It is convenient because you don't have to leave Eclipse and open a terminal to type your commands. And the Mylyn task still records the time you spend on...

View Article

Solarized Light (Scala) for Eclipse

Just installed ScalaIDE 4.0.0-RC1 on my Eclipse Luna. I really like the "Solarized Light (Scala)" color theme as shown in the release notes. You need to install the "Eclipse Color Theme Plugin", otherwise you...

View Article


Build scalatest-eclipse-plugin for Scala-IDE 4.0.0 RC1

I upgraded Scala-IDE in Eclipse Luna to 4.0.0 RC1, but didn't find the Scalatest plugin. It is inconvenient and time-consuming to run "mvn test" every time I change a test. I decided to build Scalatest...

View Article

"NoSuchMethodError:...

When I tried to update a Cassandra table using spark-cassandra-connector in a Spark application, I encountered this problem. The reason is that there are multiple versions of com.google.guava:guava...
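A quick way to see which artifacts drag in the conflicting Guava versions, assuming a Maven build:

  # List every dependency path that pulls in com.google.guava:guava
  mvn dependency:tree -Dincludes=com.google.guava:guava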

View Article


Spark and Parquet with large block size

One issue when I run a Spark application in yarn-cluster mode is that my executor container is killed because it exceeds the memory limits. The NodeManager's log shows the following messages:...
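A hedged mitigation for that kind of container kill is to give the executor more off-heap headroom; the value and application names below are illustrative (Spark 1.x on YARN assumed):

  # Reserve extra off-heap room so large Parquet buffers don't push the
  # container past YARN's limit (1024 MB is just an example)
  spark-submit --master yarn-cluster \
    --conf spark.yarn.executor.memoryOverhead=1024 \
    --class com.example.MyApp my-app.jar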

View Article

How to resolve "A schema cannot contain two global components with the same...

When I have a project using spring-data-cassandra, the header of the XML configuration applicationContext.xml looks like the snippet below. xmlns:context="http://www.springframework.org/schema/context"...

View Article

Create Fedora 22 bootable USB

Download the netinst ISO file. Log in as root: su -. Find out which device maps to the USB flash drive, using dmesg | tail or fdisk -l. NOTE: use /dev/sdd instead of /dev/sdd1 because the latter is the...
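The write step itself, as a hedged sketch (the ISO file name is an assumption, and /dev/sdd is the device from the note above; double-check it before running):

  # Write the ISO to the whole device (not a partition), then flush buffers
  dd if=Fedora-22-netinst.iso of=/dev/sdd bs=8M
  sync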

View Article