map applies a function to each element of the RDD, one element at a time. Sample code:
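// An RDD of the integers 1 to 9, split into 3 partitions
val a = sc.parallelize(1 to 9, 3)

// Called once per element: pairs the element with its double
def mapDoubleFunc(a: Int): (Int, Int) = {
  (a, a * 2)
}

val mapResult = a.map(mapDoubleFunc)
println(mapResult.collect().mkString)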
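mapPartitions applies a function once per partition; the function receives an iterator over that partition's elements and returns a new iterator. Sample code: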
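// An RDD of the integers 1 to 9, split into 3 partitions
val a = sc.parallelize(1 to 9, 3)

// Called once per partition with an iterator over that partition's elements
def doubleFunc(iter: Iterator[Int]): Iterator[(Int, Int)] = {
  var res = List[(Int, Int)]()
  while (iter.hasNext) {
    val cur = iter.next()
    // Prepending to the list reverses the element order within each partition
    res = (cur, cur * 2) :: res
  }
  res.iterator
}

val result = a.mapPartitions(doubleFunc)
println(result.collect().mkString)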
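The doubleFunc example collects every result of a partition into a List before returning it. As a minimal sketch of an alternative, the function passed to mapPartitions can transform the iterator lazily with Iterator.map, which avoids holding a whole partition in memory and keeps the elements in their original order. The object name MapPartitionsLazy, the helper doubleLazy, and the local[*] master setting are assumptions made for illustration and are not part of the original example.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object MapPartitionsLazy {
  // Hypothetical helper: transforms the partition iterator lazily,
  // so no intermediate List is built for the partition
  def doubleLazy(iter: Iterator[Int]): Iterator[(Int, Int)] =
    iter.map(cur => (cur, cur * 2))

  def main(args: Array[String]): Unit = {
    // Assumption: a local master, used only so the sketch runs standalone
    val conf = new SparkConf().setAppName("MapPartitionsLazy").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val a = sc.parallelize(1 to 9, 3)
      val result = a.mapPartitions(doubleLazy)
      println(result.collect().mkString)
    } finally {
      sc.stop()
    }
  }
}
```

Because doubleLazy is a plain function on iterators, it can also be checked without a cluster by calling it on an ordinary Scala Iterator.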