Thanks. I added that as well.
I also needed to add a hard-coded import beforehand (spark-shell brings
this in automatically, but a compiled program has to do it explicitly):

import spark.implicits._

val flightDf =
  spark.read.parquet("/data/flight-data/parquet/2010-summary.parquet/")
val flights = flightDf.as[flight]

flights.show(3)

+-----------------+-------------------+-----+
|DEST_COUNTRY_NAME|ORIGIN_COUNTRY_NAME|count|
+-----------------+-------------------+-----+
|    United States|            Romania|    1|
|    United States|            Ireland|  264|
|    United States|              India|   69|
+-----------------+-------------------+-----+
only showing top 3 rows


Process finished with exit code 0
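
For completeness, here is roughly what the fixed program looks like end to
end (a sketch based on the code quoted below, with the show(3) call added;
paths and names unchanged):

import org.apache.spark.sql.SparkSession

object chapter2 {

  // a case class describing one row; Spark derives an Encoder for it
  case class flight (DEST_COUNTRY_NAME: String,
                     ORIGIN_COUNTRY_NAME: String,
                     count: BigInt)

  def main(args: Array[String]): Unit = {

    val spark = SparkSession.builder.master("local[*]")
      .appName("chapter2").getOrCreate

    // needed for the implicit Encoders used by .as[flight] and map
    import spark.implicits._

    val flightDf =
      spark.read.parquet("/data/flight-data/parquet/2010-summary.parquet/")
    val flights = flightDf.as[flight]

    // show prints to stdout; take on its own only returns an Array[flight]
    flights.filter(flight_row => flight_row.ORIGIN_COUNTRY_NAME != "Canada")
      .map(flight_row => flight_row)
      .show(3)

    spark.stop()
  }
}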

Backbutton.co.uk
¯\_(ツ)_/¯
♡۶Java♡۶RMI ♡۶
Make Use Method {MUM}
makeuse.org
<http://www.backbutton.co.uk>


On Thu, 26 Mar 2020 at 19:18, Reynold Xin <r...@databricks.com> wrote:

> bcc dev, +user
>
> You need to print out the result. Take itself doesn't print. You only got
> the results printed to the console because the Scala REPL automatically
> prints the returned value from take.
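>
> Something along these lines (just a rough sketch reusing the flights
> Dataset from your program) would make the rows visible when you run it as
> an application rather than in the shell:
>
>   flights.filter(flight_row => flight_row.ORIGIN_COUNTRY_NAME != "Canada")
>     .take(3)
>     .foreach(println)   // or simply flights.show(3)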
>
>
> On Thu, Mar 26, 2020 at 12:15 PM, Zahid Rahman <zahidr1...@gmail.com>
> wrote:
>
>> I am running the same code with the same libraries but am not getting the
>> same output.
>> scala>  case class flight (DEST_COUNTRY_NAME: String,
>>      |                      ORIGIN_COUNTRY_NAME:String,
>>      |                      count: BigInt)
>> defined class flight
>>
>> scala>     val flightDf = spark.read.parquet
>> ("/data/flight-data/parquet/2010-summary.parquet/")
>> flightDf: org.apache.spark.sql.DataFrame = [DEST_COUNTRY_NAME: string,
>> ORIGIN_COUNTRY_NAME: string ... 1 more field]
>>
>> scala> val flights = flightDf.as[flight]
>> flights: org.apache.spark.sql.Dataset[flight] = [DEST_COUNTRY_NAME:
>> string, ORIGIN_COUNTRY_NAME: string ... 1 more field]
>>
>> scala> flights.filter(flight_row => flight_row.ORIGIN_COUNTRY_NAME !=
>> "Canada").map(flight_row => flight_row).take(3)
>>
>> *res0: Array[flight] = Array(flight(United States,Romania,1),
>> flight(United States,Ireland,264), flight(United States,India,69))*
>>
>>
>> <!------------------------------------------------------------------------------------------------------------------------------
>> 20/03/26 19:09:00 INFO SparkContext: Running Spark version 3.0.0-preview2
>> 20/03/26 19:09:00 INFO ResourceUtils:
>> ==============================================================
>> 20/03/26 19:09:00 INFO ResourceUtils: Resources for spark.driver:
>>
>> 20/03/26 19:09:00 INFO ResourceUtils:
>> ==============================================================
>> 20/03/26 19:09:00 INFO SparkContext: Submitted application:  chapter2
>> 20/03/26 19:09:00 INFO SecurityManager: Changing view acls to: kub19
>> 20/03/26 19:09:00 INFO SecurityManager: Changing modify acls to: kub19
>> 20/03/26 19:09:00 INFO SecurityManager: Changing view acls groups to:
>> 20/03/26 19:09:00 INFO SecurityManager: Changing modify acls groups to:
>> 20/03/26 19:09:00 INFO SecurityManager: SecurityManager: authentication
>> disabled; ui acls disabled; users  with view permissions: Set(kub19);
>> groups with view permissions: Set(); users  with modify permissions:
>> Set(kub19); groups with modify permissions: Set()
>> 20/03/26 19:09:00 INFO Utils: Successfully started service 'sparkDriver'
>> on port 46817.
>> 20/03/26 19:09:00 INFO SparkEnv: Registering MapOutputTracker
>> 20/03/26 19:09:00 INFO SparkEnv: Registering BlockManagerMaster
>> 20/03/26 19:09:00 INFO BlockManagerMasterEndpoint: Using
>> org.apache.spark.storage.DefaultTopologyMapper for getting
>> topology information
>> 20/03/26 19:09:00 INFO BlockManagerMasterEndpoint:
>> BlockManagerMasterEndpoint up
>> 20/03/26 19:09:00 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
>> 20/03/26 19:09:00 INFO DiskBlockManager: Created local directory at
>> /tmp/blockmgr-0baf5097-2595-4542-99e2-a192d7baf37c
>> 20/03/26 19:09:00 INFO MemoryStore: MemoryStore started with capacity
>> 127.2 MiB
>> 20/03/26 19:09:01 INFO SparkEnv: Registering OutputCommitCoordinator
>> 20/03/26 19:09:01 WARN Utils: Service 'SparkUI' could not bind on port
>> 4040. Attempting port 4041.
>> 20/03/26 19:09:01 INFO Utils: Successfully started service 'SparkUI' on
>> port 4041.
>> 20/03/26 19:09:01 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at
>> http://localhost:4041
>> 20/03/26 19:09:01 INFO Executor: Starting executor ID driver on host
>> localhost
>> 20/03/26 19:09:01 INFO Utils: Successfully started service
>> 'org.apache.spark.network.netty.NettyBlockTransferService' on
>> port 38135.
>> 20/03/26 19:09:01 INFO NettyBlockTransferService: Server created on
>> localhost:38135
>> 20/03/26 19:09:01 INFO BlockManager: Using
>> org.apache.spark.storage.RandomBlockReplicationPolicy for
>> block replication policy
>> 20/03/26 19:09:01 INFO BlockManagerMaster: Registering BlockManager
>> BlockManagerId(driver, localhost, 38135, None)
>> 20/03/26 19:09:01 INFO BlockManagerMasterEndpoint: Registering block
>> manager localhost:38135 with 127.2 MiB RAM, BlockManagerId(driver,
>> localhost, 38135, None)
>> 20/03/26 19:09:01 INFO BlockManagerMaster: Registered BlockManager
>> BlockManagerId(driver, localhost, 38135, None)
>> 20/03/26 19:09:01 INFO BlockManager: Initialized BlockManager:
>> BlockManagerId(driver, localhost, 38135, None)
>> 20/03/26 19:09:01 INFO SharedState: Setting hive.metastore.warehouse.dir
>> ('null') to the value of spark.sql.warehouse.dir
>> ('file:/home/kub19/spark-3.0.0-preview2-bin-hadoop2.7/projects/spark-warehouse').
>> 20/03/26 19:09:01 INFO SharedState: Warehouse path is
>> 'file:/home/kub19/spark-3.0.0-preview2-bin-hadoop2.7/projects/spark-warehouse'.
>> 20/03/26 19:09:02 INFO SparkContext: Starting job: parquet at
>> chapter2.scala:18
>> 20/03/26 19:09:02 INFO DAGScheduler: Got job 0 (parquet at
>> chapter2.scala:18) with 1 output partitions
>> 20/03/26 19:09:02 INFO DAGScheduler: Final stage: ResultStage 0 (parquet
>> at chapter2.scala:18)
>> 20/03/26 19:09:02 INFO DAGScheduler: Parents of final stage: List()
>> 20/03/26 19:09:02 INFO DAGScheduler: Missing parents: List()
>> 20/03/26 19:09:02 INFO DAGScheduler: Submitting ResultStage 0
>> (MapPartitionsRDD[1] at parquet at chapter2.scala:18), which has no missing
>> parents
>> 20/03/26 19:09:02 INFO MemoryStore: Block broadcast_0 stored as values in
>> memory (estimated size 72.8 KiB, free 127.1 MiB)
>> 20/03/26 19:09:02 INFO MemoryStore: Block broadcast_0_piece0 stored as
>> bytes in memory (estimated size 25.9 KiB, free 127.1 MiB)
>> 20/03/26 19:09:02 INFO BlockManagerInfo: Added broadcast_0_piece0 in
>> memory on localhost:38135 (size: 25.9 KiB, free: 127.2 MiB)
>> 20/03/26 19:09:02 INFO SparkContext: Created broadcast 0 from broadcast
>> at DAGScheduler.scala:1206
>> 20/03/26 19:09:02 INFO DAGScheduler: Submitting 1 missing tasks from
>> ResultStage 0 (MapPartitionsRDD[1] at parquet at chapter2.scala:18) (first
>> 15 tasks are for partitions Vector(0))
>> 20/03/26 19:09:02 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
>> 20/03/26 19:09:02 INFO TaskSetManager: Starting task 0.0 in stage 0.0
>> (TID 0, localhost, executor driver, partition 0, PROCESS_LOCAL, 7560 bytes)
>> 20/03/26 19:09:02 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
>> 20/03/26 19:09:02 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0).
>> 1840 bytes result sent to driver
>> 20/03/26 19:09:02 INFO TaskSetManager: Finished task 0.0 in stage 0.0
>> (TID 0) in 204 ms on localhost (executor driver) (1/1)
>> 20/03/26 19:09:02 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose
>> tasks have all completed, from pool
>> 20/03/26 19:09:02 INFO DAGScheduler: ResultStage 0 (parquet at
>> chapter2.scala:18) finished in 0.304 s
>> 20/03/26 19:09:02 INFO DAGScheduler: Job 0 is finished. Cancelling
>> potential speculative or zombie tasks for this job
>> 20/03/26 19:09:02 INFO TaskSchedulerImpl: Killing all running tasks in
>> stage 0: Stage finished
>> 20/03/26 19:09:02 INFO DAGScheduler: Job 0 finished: parquet at
>> chapter2.scala:18, took 0.332643 s
>> 20/03/26 19:09:03 INFO BlockManagerInfo: Removed broadcast_0_piece0 on
>> localhost:38135 in memory (size: 25.9 KiB, free: 127.2 MiB)
>> 20/03/26 19:09:04 INFO V2ScanRelationPushDown:
>> Pushing operators to parquet
>> file:/data/flight-data/parquet/2010-summary.parquet
>> Pushed Filters:
>> Post-Scan Filters:
>> Output: DEST_COUNTRY_NAME#0, ORIGIN_COUNTRY_NAME#1, count#2L
>>
>> 20/03/26 19:09:04 INFO MemoryStore: Block broadcast_1 stored as values in
>> memory (estimated size 290.0 KiB, free 126.9 MiB)
>> 20/03/26 19:09:04 INFO MemoryStore: Block broadcast_1_piece0 stored as
>> bytes in memory (estimated size 24.3 KiB, free 126.9 MiB)
>> 20/03/26 19:09:04 INFO BlockManagerInfo: Added broadcast_1_piece0 in
>> memory on localhost:38135 (size: 24.3 KiB, free: 127.2 MiB)
>> 20/03/26 19:09:04 INFO SparkContext: Created broadcast 1 from take at
>> chapter2.scala:20
>> 20/03/26 19:09:04 INFO MemoryStore: Block broadcast_2 stored as values in
>> memory (estimated size 290.1 KiB, free 126.6 MiB)
>> 20/03/26 19:09:04 INFO MemoryStore: Block broadcast_2_piece0 stored as
>> bytes in memory (estimated size 24.3 KiB, free 126.6 MiB)
>> 20/03/26 19:09:04 INFO BlockManagerInfo: Added broadcast_2_piece0 in
>> memory on localhost:38135 (size: 24.3 KiB, free: 127.2 MiB)
>> 20/03/26 19:09:04 INFO SparkContext: Created broadcast 2 from take at
>> chapter2.scala:20
>> 20/03/26 19:09:04 INFO CodeGenerator: Code generated in 159.155401 ms
>> 20/03/26 19:09:04 INFO SparkContext: Starting job: take at
>> chapter2.scala:20
>> 20/03/26 19:09:04 INFO DAGScheduler: Got job 1 (take at
>> chapter2.scala:20) with 1 output partitions
>> 20/03/26 19:09:04 INFO DAGScheduler: Final stage: ResultStage 1 (take at
>> chapter2.scala:20)
>> 20/03/26 19:09:04 INFO DAGScheduler: Parents of final stage: List()
>> 20/03/26 19:09:04 INFO DAGScheduler: Missing parents: List()
>> 20/03/26 19:09:04 INFO DAGScheduler: Submitting ResultStage 1
>> (MapPartitionsRDD[5] at take at chapter2.scala:20), which has no missing
>> parents
>> 20/03/26 19:09:04 INFO MemoryStore: Block broadcast_3 stored as values in
>> memory (estimated size 22.7 KiB, free 126.6 MiB)
>> 20/03/26 19:09:04 INFO MemoryStore: Block broadcast_3_piece0 stored as
>> bytes in memory (estimated size 8.1 KiB, free 126.6 MiB)
>> 20/03/26 19:09:04 INFO BlockManagerInfo: Added broadcast_3_piece0 in
>> memory on localhost:38135 (size: 8.1 KiB, free: 127.1 MiB)
>> 20/03/26 19:09:04 INFO SparkContext: Created broadcast 3 from broadcast
>> at DAGScheduler.scala:1206
>> 20/03/26 19:09:04 INFO DAGScheduler: Submitting 1 missing tasks from
>> ResultStage 1 (MapPartitionsRDD[5] at take at chapter2.scala:20) (first 15
>> tasks are for partitions Vector(0))
>> 20/03/26 19:09:04 INFO TaskSchedulerImpl: Adding task set 1.0 with 1 tasks
>> 20/03/26 19:09:04 INFO TaskSetManager: Starting task 0.0 in stage 1.0
>> (TID 1, localhost, executor driver, partition 0, PROCESS_LOCAL, 7980 bytes)
>> 20/03/26 19:09:04 INFO Executor: Running task 0.0 in stage 1.0 (TID 1)
>> 20/03/26 19:09:05 INFO FilePartitionReader: Reading file path:
>> file:///data/flight-data/parquet/2010-summary.parquet/part-r-00000-1a9822ba-b8fb-4d8e-844a-ea30d0801b9e.gz.parquet,
>> range: 0-3921, partition values: [empty row]
>> 20/03/26 19:09:05 INFO ZlibFactory: Successfully loaded & initialized
>> native-zlib library
>> 20/03/26 19:09:05 INFO CodecPool: Got brand-new decompressor [.gz]
>> 20/03/26 19:09:05 INFO Executor: Finished task 0.0 in stage 1.0 (TID 1).
>> 1762 bytes result sent to driver
>> 20/03/26 19:09:05 INFO TaskSetManager: Finished task 0.0 in stage 1.0
>> (TID 1) in 219 ms on localhost (executor driver) (1/1)
>> 20/03/26 19:09:05 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose
>> tasks have all completed, from pool
>> 20/03/26 19:09:05 INFO DAGScheduler: ResultStage 1 (take at
>> chapter2.scala:20) finished in 0.235 s
>> 20/03/26 19:09:05 INFO DAGScheduler: Job 1 is finished. Cancelling
>> potential speculative or zombie tasks for this job
>> 20/03/26 19:09:05 INFO TaskSchedulerImpl: Killing all running tasks in
>> stage 1: Stage finished
>> 20/03/26 19:09:05 INFO DAGScheduler: Job 1 finished: take at
>> chapter2.scala:20, took 0.238010 s
>> 20/03/26 19:09:05 INFO CodeGenerator: Code generated in 17.77886 ms
>> 20/03/26 19:09:05 INFO SparkUI: Stopped Spark web UI at
>> http://localhost:4041
>> 20/03/26 19:09:05 INFO MapOutputTrackerMasterEndpoint:
>> MapOutputTrackerMasterEndpoint stopped!
>> 20/03/26 19:09:05 INFO MemoryStore: MemoryStore cleared
>> 20/03/26 19:09:05 INFO BlockManager: BlockManager stopped
>> 20/03/26 19:09:05 INFO BlockManagerMaster: BlockManagerMaster stopped
>> 20/03/26 19:09:05 INFO
>> OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:
>> OutputCommitCoordinator stopped!
>> 20/03/26 19:09:05 INFO SparkContext: Successfully stopped SparkContext
>> 20/03/26 19:09:05 INFO ShutdownHookManager: Shutdown hook called
>> 20/03/26 19:09:05 INFO ShutdownHookManager: Deleting directory
>> /tmp/spark-6d99677e-ae1b-4894-aa32-3a79fb0b4307
>>
>> Process finished with exit code 0
>>
>> <!----------------------------------------------------------------------------------------------------------
>>
>> import org.apache.spark.sql.SparkSession
>>
>> object chapter2 {
>>
>>   // define a specific data type class, then manipulate it with the filter
>>   // and map functions; Spark derives an Encoder for this class
>>   case class flight (DEST_COUNTRY_NAME: String,
>>                      ORIGIN_COUNTRY_NAME:String,
>>                      count: BigInt)
>>
>>   def main(args: Array[String]): Unit = {
>>
>>     // outside an interactive shell, a SparkSession has to be created
>>     // explicitly here to avoid IntelliJ errors
>>     val spark = SparkSession.builder.master("local[*]")
>>       .appName(" chapter2").getOrCreate
>>
>>     // looks like a hard-coded workaround
>>     import spark.implicits._
>>     val flightDf = 
>> spark.read.parquet("/data/flight-data/parquet/2010-summary.parquet/")
>>     val flights = flightDf.as[flight]
>>     flights.filter(flight_row => flight_row.ORIGIN_COUNTRY_NAME != 
>> "Canada").map(flight_row => flight_row).take(3)
>>
>>     spark.stop()
>>   }
>> }
>>
>>
>> Backbutton.co.uk
>> ¯\_(ツ)_/¯
>> ♡۶Java♡۶RMI ♡۶
>> Make Use Method {MUM}
>> makeuse.org
>>
>
>
