Introduction
This post briefly shows how to use a Spark program to read Parquet-format data and save it as plain text.
Code
The code is as follows:
val hdfsPath: scala.Predef.String = args(0)
val savePath = args(1)
val sparkConf = new SparkConf().setAppName("Extract")
val sqlContext = new SQLContext(new SparkContext(sparkConf))
val parquet = sqlContext.read.parquet(hdfsPath)
parquet.select(parquet("someField")).rdd.saveAsTextFile(savePath)
Problem
When the job runs, it fails with:
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.spark.sql.DataFrameReader.load(Ljava/lang/String;)Lorg/apache/spark/sql/DataFrame;
Solution
Declare the path explicitly as scala.Predef.String:
val hdfsPath: scala.Predef.String = args(0)
Complete Code
import org.apache.spark.sql.SQLContext
import org.apache.spark.{SparkConf, SparkContext}

object ParseParquet {
  def main(args: Array[String]): Unit = {
    // Explicit scala.Predef.String annotation, as described above
    val hdfsPath: scala.Predef.String = args(0)
    val savePath = args(1)
    val sparkConf = new SparkConf().setAppName("Extract")
    val sqlContext = new SQLContext(new SparkContext(sparkConf))
    // Read the Parquet data (Parquet is the default format for read.load)
    val parquet = sqlContext.read.load(hdfsPath)
    // Keep one column, reduce output to 10 partitions, and save as text
    parquet.select(parquet("field")).rdd.coalesce(10).saveAsTextFile(savePath)
  }
}
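Once packaged into a jar, the job can be launched along these lines. This is only a sketch: the jar name `parse-parquet.jar`, the master setting, and the HDFS paths are placeholders not taken from the original post, so adjust them to your own build and cluster.

```shell
# Hypothetical spark-submit invocation for the ParseParquet job above.
# args(0) = input Parquet path, args(1) = text output path.
spark-submit \
  --class ParseParquet \
  --master yarn \
  parse-parquet.jar \
  hdfs:///path/to/input.parquet \
  hdfs:///path/to/output
```

Note that the output path must not already exist, or saveAsTextFile will fail.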