Introduction

This post briefly shows how to read Parquet-format data with a Spark program and save it in plain-text format.

Code

The code is as follows:

val hdfsPath: scala.Predef.String = args(0)
val savePath = args(1)

val sparkConf = new SparkConf().setAppName("Extract")
val sqlContext = new SQLContext(new SparkContext(sparkConf))
val parquet = sqlContext.read.parquet(hdfsPath)
parquet.select(parquet("someField")).rdd.saveAsTextFile(savePath)

Problem

Exception in thread "main" java.lang.NoSuchMethodError: org.apache.spark.sql.DataFrameReader.load(Ljava/lang/String;)Lorg/apache/spark/sql/DataFrame;

Solution

Declare the path argument explicitly as scala.Predef.String:

val hdfsPath: scala.Predef.String = args(0)

Complete Code

import org.apache.spark.sql.SQLContext
import org.apache.spark.{SparkConf, SparkContext}


object ParseParquet {
  def main(args: Array[String]): Unit = {
    // Declaring the type explicitly as scala.Predef.String avoids the
    // NoSuchMethodError described above
    val hdfsPath: scala.Predef.String = args(0)
    val savePath = args(1)

    val sparkConf = new SparkConf().setAppName("Extract")

    val sqlContext = new SQLContext(new SparkContext(sparkConf))
    // The default data source is parquet, so load() reads Parquet files here
    val parquet = sqlContext.read.load(hdfsPath)
    // coalesce(10) reduces the number of output part-files to 10
    parquet.select(parquet("field")).rdd.coalesce(10).saveAsTextFile(savePath)
  }
}
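
To run the program above, a typical approach is to package it into a jar and launch it with spark-submit. The jar name, master URL, and HDFS paths below are hypothetical placeholders; adjust them to your environment.

```shell
# Submit the job to a YARN cluster (assumed; use local[*] or a standalone
# master URL as appropriate). The two program arguments are the input
# Parquet path and the text output path, matching args(0) and args(1).
spark-submit \
  --class ParseParquet \
  --master yarn \
  parse-parquet.jar \
  hdfs:///data/input.parquet \
  hdfs:///data/output-text
```

Note that saveAsTextFile will fail if the output directory already exists, so remove it (or pick a fresh path) before re-running the job.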