SparkSession, a new concept in Spark 2.0
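SparkSession is the new entry point introduced in Spark 2.0. It gives users a single way into Spark's functionality and lets them write Spark programs with the DataFrame and Dataset APIs, so most interaction with Spark can go through one SparkSession object.
SparkSession is a new concept in Spark 2.0. Earlier versions required creating a SparkConf and a SparkContext (plus an SQLContext for SQL work) before doing anything with Spark. In Spark 2.0, SparkConf, SparkContext, and SQLContext are encapsulated in SparkSession, so the same work can be done through SparkSession alone.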
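To make the encapsulation concrete, here is a minimal sketch that reaches the older entry points from a SparkSession. The object name `EncapsulatedContexts`, the helper `localSession`, the app name, and the `local[2]` master are assumptions chosen so the example runs on one machine; they are not from the original article. Creating the session itself is covered in the next part.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.{SQLContext, SparkSession}

object EncapsulatedContexts {
  // Hypothetical helper: a local session so the sketch runs without a cluster.
  def localSession(): SparkSession =
    SparkSession.builder()
      .master("local[2]")
      .appName("EncapsulatedContextsSketch")
      .getOrCreate()

  def main(args: Array[String]): Unit = {
    val spark = localSession()

    // The session owns a SparkContext; no separate SparkConf/SparkContext setup is needed.
    val sc: SparkContext = spark.sparkContext
    println(sc.appName)

    // SparkConf values are visible through the context's configuration.
    println(sc.getConf.get("spark.master"))

    // The pre-2.0 SQLContext is still exposed for code that expects one,
    // and it points back to the session that created it.
    val sqlContext: SQLContext = spark.sqlContext
    println(sqlContext.sparkSession eq spark)

    spark.stop()
  }
}
```

Code written against the older APIs can therefore keep receiving a `SparkContext` or `SQLContext`, while new code holds only the session.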
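In Spark 2.0, a SparkSession is created through its builder. The following code creates a SparkSession, or returns the existing SparkSession if one is already active: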
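import org.apache.spark.sql.SparkSession

// Not an s-interpolated string: Scala passes ${system:user.dir} through literally.
val warehouseLocation = "file:${system:user.dir}/spark-warehouse"
val spark = SparkSession
  .builder()
  .appName("SparkSessionZipsExample")
  .config("spark.sql.warehouse.dir", warehouseLocation)
  .enableHiveSupport()
  .getOrCreate()

Once the SparkSession has been created, its API can be used to set runtime configuration options: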
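spark.conf.set("spark.sql.shuffle.partitions", "6")
// Executor memory is fixed when executors launch; setting it here does not resize them,
// and newer Spark releases may reject this call.
spark.conf.set("spark.executor.memory", "2g")

SparkSession also gives access to the catalog metadata, for example to list databases and tables: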
spark.catalog.listDatabases.show(false)
spark.catalog.listTables.show(false)
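The configuration and catalog calls above can also be read back programmatically instead of only printed. The following is a minimal sketch under stated assumptions: the object `ConfAndCatalogSketch`, its helper methods, the `numbers` view, and the `local[2]` master are hypothetical and exist only to make the example self-contained. It runs without Hive support, so the catalog is the in-memory one.

```scala
import org.apache.spark.sql.SparkSession

object ConfAndCatalogSketch {
  // Set a runtime SQL option on a live session and read it back.
  def shufflePartitions(spark: SparkSession): String = {
    spark.conf.set("spark.sql.shuffle.partitions", "6")
    spark.conf.get("spark.sql.shuffle.partitions")
  }

  // Register a tiny temporary view, then list table names through the catalog.
  def tableNames(spark: SparkSession): Seq[String] = {
    spark.range(3).createOrReplaceTempView("numbers")
    // Catalog calls return Datasets, so they can be collected as well as shown.
    spark.catalog.listTables().collect().map(_.name).toSeq
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("ConfAndCatalogSketch")
      .getOrCreate()

    println(shufflePartitions(spark))
    println(tableNames(spark).mkString(", "))
    println(spark.catalog.currentDatabase)

    spark.stop()
  }
}
```

Collecting the catalog results is useful when a job needs to check whether a table or view exists before querying it.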
With SparkSession you can read a JSON file and get back a DataFrame:
val jsonFile = args(0)
val zipsDF = spark.read.json(jsonFile)
zipsDF.filter(zipsDF.col("pop") > 40000).show(10)

SparkSession can also run SQL queries. Register the DataFrame as a temporary view, then query it by name:
zipsDF.createOrReplaceTempView("zips_table")
zipsDF.cache()
val resultsDF = spark.sql("SELECT city, pop, state, zip FROM zips_table")
resultsDF.show(10)

None of these steps creates a SparkContext directly; the SparkSession is the only handle the code uses.
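The JSON and SQL steps above can be tried without an input file. This minimal sketch assumes a few made-up rows in place of the file passed as `args(0)`; the object `ZipsSqlSketch`, the helper `largeCities`, the sample values, and the `local[2]` master are all hypothetical. It relies on `spark.read.json` accepting a `Dataset[String]`, which is available from Spark 2.2 onward.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object ZipsSqlSketch {
  // Made-up sample rows standing in for the JSON file used in the article.
  val sampleJson: Seq[String] = Seq(
    """{"zip":"00001","city":"ALPHA","state":"AA","pop":52000}""",
    """{"zip":"00002","city":"BETA","state":"BB","pop":12000}""",
    """{"zip":"00003","city":"GAMMA","state":"AA","pop":41000}"""
  )

  // Load the rows, register a temporary view, and filter with SQL.
  def largeCities(spark: SparkSession, minPop: Long): DataFrame = {
    import spark.implicits._
    val zipsDF = spark.read.json(sampleJson.toDS())
    zipsDF.createOrReplaceTempView("zips_table")
    spark.sql(
      s"SELECT city, pop, state, zip FROM zips_table WHERE pop > $minPop ORDER BY pop DESC"
    )
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("ZipsSqlSketch")
      .getOrCreate()

    largeCities(spark, 40000L).show(10)

    spark.stop()
  }
}
```

Moving the population filter into the SQL text expresses the same condition as the `filter` call on the DataFrame, which shows that both styles operate on the same data through one session.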
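In short, SparkSession is the unified entry point for working with Spark. Runtime configuration, catalog metadata, data loading, and SQL queries all go through a single SparkSession, including Hive access when Hive support is enabled, which makes it the starting point for Spark 2.0 programs.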
Original link: http://www.raincent.com/content-85-7196-1.html
Reposted from: http://rdgy.baihongyu.com/