Channel: My Tech Notes

Scala and Java collection interoperability in MapReduce job

In Scala, you can import scala.collection.JavaConversions._ to make collections interoperable between Scala and Java, for example:

scala.collection.Iterable <=> java.lang.Iterable
I usually prefer the Scala collection API because it is concise and powerful, but be careful: it may not work in all cases. I ran into this problem when I wrote a Scala MapReduce job:

// do something on values(1)
values.drop(1).foreach { v =>
...
}
The code tries to handle the first element differently from the rest. It worked perfectly in the combiner but failed in the reducer, even though both use values.drop(1).foreach. The reason, I believe, is that the iterable in the reducer is backed by a file, and the file position cannot go back. Reading the first element already advances the position by one; when drop(1) is then called in Scala, the position moves forward once more, so two elements are actually dropped.
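The behaviour described above can be reproduced without Hadoop. The following is only a sketch under stated assumptions: SinglePassIterable is a hypothetical stand-in for the reducer's values (it hands out the same underlying iterator on every call, which is how a forward-only, file-backed sequence would behave), and it uses scala.jdk.CollectionConverters from Scala 2.13 with explicit .asScala calls rather than the implicit JavaConversions import. It also shows one possible workaround: take a single iterator and use it for both the first element and the rest.

```scala
import java.{lang => jl, util => ju}
import scala.jdk.CollectionConverters._

// Hypothetical stand-in for the reducer's values: iterator() always returns
// the same underlying iterator, so the data can be traversed only once.
final class SinglePassIterable[A](underlying: ju.Iterator[A]) extends jl.Iterable[A] {
  override def iterator(): ju.Iterator[A] = underlying
}

object SinglePassDemo {
  // A fresh single-pass sequence containing 1, 2, 3, 4.
  def sample(): jl.Iterable[Int] =
    new SinglePassIterable(Iterator(1, 2, 3, 4).asJava)

  // Two traversals: head asks for an iterator and consumes one element,
  // then drop(1) asks for an iterator again and skips one more.
  def headThenDrop(values: jl.Iterable[Int]): (Int, List[Int]) = {
    val scalaValues = values.asScala
    val first = scalaValues.head
    val rest = scalaValues.drop(1).toList
    (first, rest)
  }

  // One traversal: take a single iterator and use it for both parts.
  // Assumes at least one value is present.
  def singleIterator(values: jl.Iterable[Int]): (Int, List[Int]) = {
    val it = values.iterator().asScala
    val first = it.next()
    val rest = it.toList
    (first, rest)
  }
}
```

With this stand-in, headThenDrop(sample()) should return 1 as the first element and only 3 and 4 as the rest, silently losing 2, while singleIterator(sample()) returns 1 followed by 2, 3, 4. A combiner whose values can be iterated from the start more than once would not show the difference, which would match the observation that the same code worked there.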
