It took me a while to figure out how to write an Avro file which can be imported into Hive and Impala.
- There are several OutputFormats in Avro 1.7.3: AvroOutputFormat, AvroKeyOutputFormat, AvroKeyValueOutputFormat and AvroSequenceFileOutputFormat. Which one writes files that Hive can import? Use AvroKeyOutputFormat in your MapReduce job to output Avro container files.
- You cannot specify any of the above output formats in the STORED AS clause of a Hive CREATE TABLE statement, because they don't implement HiveOutputFormat.
- Follow the example on this page: https://cwiki.apache.org/confluence/display/Hive/AvroSerDe. Unfortunately, if you search Google for "Avro Hive", it shows you this page instead: https://cwiki.apache.org/Hive/avroserde-working-with-avro-from-hive.html, which has a wrong example, and you will get an error message like:
  FAILED: Error in metadata: Cannot validate serde: org.apache.hadoop.hive.serde2.AvroSerDe
  FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
  What's wrong? The serde name should be org.apache.hadoop.hive.serde2.avro.AvroSerDe.
- You don't have to define the columns, because Hive can get them from the Avro schema.
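To tie the Hive-side points together, here is a minimal sketch in Java (the language of the MapReduce job) that assembles the CREATE TABLE statement for an Avro-backed table. The class name, method name and example schema path are hypothetical; the serde and input/output format class names follow the Hive AvroSerDe wiki page. Note that the statement has no column list, and that it uses Hive's own Avro container formats in STORED AS rather than AvroKeyOutputFormat.

```java
import java.util.regex.Pattern;

// Sketch: builds the Hive DDL for an Avro-backed table. The class name,
// method name and example paths are hypothetical; the class names inside
// the DDL are the ones documented on the Hive AvroSerDe wiki page.
class AvroHiveDdl {
    // Correct serde: note the ".avro" package segment that is missing from
    // the outdated example (org.apache.hadoop.hive.serde2.AvroSerDe).
    static final String SERDE =
        "org.apache.hadoop.hive.serde2.avro.AvroSerDe";
    // Hive-side formats for Avro container files; unlike AvroKeyOutputFormat,
    // these implement Hive's interfaces and can appear in STORED AS.
    static final String INPUT_FORMAT =
        "org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat";
    static final String OUTPUT_FORMAT =
        "org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat";

    // Plain unquoted Hive identifier, to keep the generated DDL well-formed.
    private static final Pattern IDENTIFIER =
        Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    // No column list is emitted: Hive derives the columns from the Avro
    // schema referenced by the avro.schema.url table property.
    static String createTableDdl(String table, String schemaUrl) {
        if (!IDENTIFIER.matcher(table).matches()) {
            throw new IllegalArgumentException("invalid table name: " + table);
        }
        if (schemaUrl.indexOf('\'') >= 0) {
            throw new IllegalArgumentException("schema URL must not contain a quote");
        }
        return String.join("\n",
            "CREATE TABLE " + table,
            "  ROW FORMAT SERDE",
            "  '" + SERDE + "'",
            "  STORED AS INPUTFORMAT",
            "  '" + INPUT_FORMAT + "'",
            "  OUTPUTFORMAT",
            "  '" + OUTPUT_FORMAT + "'",
            "  TBLPROPERTIES ('avro.schema.url'='" + schemaUrl + "');");
    }

    public static void main(String[] args) {
        // Hypothetical table name and schema location, for illustration only.
        System.out.println(
            createTableDdl("events", "hdfs:///user/hive/schemas/events.avsc"));
    }
}
```

Running it prints the CREATE TABLE statement to paste into the Hive shell. The Avro container files written by the MapReduce job can then be moved into the table with LOAD DATA INPATH.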