Channel: My Tech Notes

Avro

It took me a while to figure out how to write an Avro file that can be imported into Hive and Impala.

  • There are several OutputFormat classes in Avro 1.7.3: AvroOutputFormat, AvroKeyOutputFormat, AvroKeyValueOutputFormat and AvroSequenceFileOutputFormat. Which one produces files Hive can import? Use AvroKeyOutputFormat in your MapReduce job to output Avro container files.
  • You cannot specify any of the above output formats in the "STORED AS" clause of a Hive CREATE TABLE statement, because they don't implement HiveOutputFormat.
  • Follow the example on this page: https://cwiki.apache.org/confluence/display/Hive/AvroSerDe. Unfortunately, if you search for "Avro Hive", Google shows you this page instead: https://cwiki.apache.org/Hive/avroserde-working-with-avro-from-hive.html. It has a wrong example, and you will get an error message like:

        FAILED: Error in metadata: Cannot validate serde: org.apache.hadoop.hive.serde2.AvroSerDe
        FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask

    What's wrong? The SerDe class name is missing the avro package: it should be org.apache.hadoop.hive.serde2.avro.AvroSerDe, not org.apache.hadoop.hive.serde2.AvroSerDe.
  • You don't have to define the columns, because Hive can derive them from the Avro schema.
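
AvroKeyOutputFormat writes Avro object container files: a header (the magic bytes Obj\x01, a metadata map holding the JSON schema and the codec, and a 16-byte sync marker) followed by blocks of binary-encoded records, each closed by the same sync marker. To see what Hive ends up reading, or to produce a tiny test file without running a MapReduce job, here is a minimal stdlib-only Python sketch of that layout as described in the Avro 1.7 specification. It assumes a flat record schema whose fields are only int, long or string, and the null codec; the function names and the users.avro file name are hypothetical. For real data, use AvroKeyOutputFormat or the Avro library's own writer instead.

```python
import json
import os


def encode_long(n):
    """Avro int/long encoding: zig-zag, then base-128 varint (low 7 bits first)."""
    z = (n << 1) ^ (n >> 63)  # zig-zag maps signed 64-bit values to unsigned
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)  # set the continuation bit
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_bytes(data):
    """Avro bytes/string encoding: length as a long, then the raw bytes."""
    return encode_long(len(data)) + data


def encode_field(avro_type, value):
    # Only the primitive types this sketch supports.
    if avro_type in ("int", "long"):
        return encode_long(value)
    if avro_type == "string":
        return encode_bytes(value.encode("utf-8"))
    raise ValueError(f"type not supported by this sketch: {avro_type!r}")


def encode_container(schema, records, sync=None):
    """Return an Avro object container file (null codec) holding one data block."""
    sync = os.urandom(16) if sync is None else sync
    meta = {
        "avro.schema": json.dumps(schema).encode("utf-8"),
        "avro.codec": b"null",
    }
    # Header: magic, metadata map (one block of entries + 0 terminator), sync marker.
    out = bytearray(b"Obj\x01")
    out += encode_long(len(meta))
    for key, value in meta.items():
        out += encode_bytes(key.encode("utf-8")) + encode_bytes(value)
    out += encode_long(0)
    out += sync
    # A record is the concatenation of its field encodings, in schema order.
    body = b"".join(
        encode_field(field["type"], record[field["name"]])
        for record in records
        for field in schema["fields"]
    )
    # Data block: object count, byte size of the objects, the objects, sync marker.
    out += encode_long(len(records)) + encode_long(len(body)) + body + sync
    return bytes(out)


if __name__ == "__main__":
    # Hypothetical schema and output file name, for illustration only.
    user_schema = {
        "type": "record",
        "name": "User",
        "fields": [
            {"name": "name", "type": "string"},
            {"name": "age", "type": "int"},
        ],
    }
    with open("users.avro", "wb") as fh:
        fh.write(encode_container(user_schema, [{"name": "alice", "age": 30}]))
```

Passing a fixed sync marker makes the output deterministic, which is handy for comparing against a file produced by the MapReduce job.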

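Putting the last three points together, a table over the files written by the job needs the avro package in the SerDe name, the AvroContainerInputFormat/AvroContainerOutputFormat classes that the cwiki example uses in "STORED AS", and the schema instead of a column list. The Python sketch below only assembles such a statement following that pattern. The table name, the HDFS location, the use of an EXTERNAL table, and the choice of avro.schema.literal (the cwiki page also shows avro.schema.url) are assumptions for illustration, and avro_table_ddl is a hypothetical helper.

```python
import json

# Class names as used in the example on the Hive AvroSerDe cwiki page.
AVRO_SERDE = "org.apache.hadoop.hive.serde2.avro.AvroSerDe"  # note the ".avro." segment
AVRO_INPUT_FORMAT = "org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat"
AVRO_OUTPUT_FORMAT = "org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat"


def avro_table_ddl(table, location, schema):
    """Build CREATE EXTERNAL TABLE DDL over a directory of Avro container files.

    No column list is emitted: the AvroSerDe takes the columns from the schema
    passed in the avro.schema.literal table property.
    """
    # Escape single quotes so the JSON fits inside a single-quoted Hive string.
    literal = json.dumps(schema).replace("'", "\\'")
    return "\n".join([
        f"CREATE EXTERNAL TABLE {table}",
        f"  ROW FORMAT SERDE '{AVRO_SERDE}'",
        f"  STORED AS INPUTFORMAT '{AVRO_INPUT_FORMAT}'",
        f"  OUTPUTFORMAT '{AVRO_OUTPUT_FORMAT}'",
        f"  LOCATION '{location}'",
        f"  TBLPROPERTIES ('avro.schema.literal'='{literal}');",
    ])


if __name__ == "__main__":
    # Hypothetical table name, HDFS directory and schema, for illustration only.
    user_schema = {
        "type": "record",
        "name": "User",
        "fields": [
            {"name": "name", "type": "string"},
            {"name": "age", "type": "int"},
        ],
    }
    print(avro_table_ddl("users", "/user/me/avro_out", user_schema))
```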