Hello. I am working on a simple solution using Spark SQL. I am writing streaming data to persistent tables using a HiveContext. Writing to a persistent non-partitioned table works well: I update the table from Spark Streaming, and the output is available via the Hive Thrift/JDBC server.
I create a table that looks like the following:

0: jdbc:hive2://localhost:10000> describe windows_event;
+--------------------------+---------------------+----------+
|         col_name         |      data_type      | comment  |
+--------------------------+---------------------+----------+
| target_entity            | string              | NULL     |
| target_entity_type       | string              | NULL     |
| date_time_utc            | timestamp           | NULL     |
| machine_ip               | string              | NULL     |
| event_id                 | string              | NULL     |
| event_data               | map<string,string>  | NULL     |
| description              | string              | NULL     |
| event_record_id          | string              | NULL     |
| level                    | string              | NULL     |
| machine_name             | string              | NULL     |
| sequence_number          | string              | NULL     |
| source                   | string              | NULL     |
| source_machine_name      | string              | NULL     |
| task_category            | string              | NULL     |
| user                     | string              | NULL     |
| additional_data          | map<string,string>  | NULL     |
| windows_event_time_bin   | timestamp           | NULL     |
| # Partition Information  |                     |          |
| # col_name               | data_type           | comment  |
| windows_event_time_bin   | timestamp           | NULL     |
+--------------------------+---------------------+----------+

However, when I create a partitioned table and write data using the following:

hiveWindowsEvents.foreachRDD( rdd => {
  val eventsDataFrame = rdd.toDF()
  eventsDataFrame.write.mode(SaveMode.Append).saveAsTable("windows_event")
})

the data is written as though the table were not partitioned (everything lands in /user/hive/warehouse/windows_event/file.gz.parquet). Because the data does not follow the partition scheme, it is neither partitioned nor accessible through the partitions.

Is there a straightforward way to write to partitioned tables using Spark SQL? I understand that read performance is far better for partitioned data - are there other performance improvements that might be preferable to partitioning?

Regards,

Bryan Jeffrey
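P.S. For concreteness, this is the kind of partition-aware write I expected to work. It is only a sketch: it assumes DataFrameWriter.partitionBy is honored when appending to a Hive table, and that hiveContext is the HiveContext I mentioned above.

```scala
// Sketch: write each streaming micro-batch into the partitioned table,
// naming the partition column explicitly so Spark lays files out as
// .../windows_event/windows_event_time_bin=.../part-*.gz.parquet
hiveWindowsEvents.foreachRDD { rdd =>
  import hiveContext.implicits._           // assumed HiveContext in scope
  val eventsDataFrame = rdd.toDF()
  eventsDataFrame.write
    .mode(SaveMode.Append)
    .partitionBy("windows_event_time_bin") // partition column from the table above
    .saveAsTable("windows_event")
}
```

If partitionBy combined with saveAsTable is not supported against an existing Hive table, the other route I am aware of is insertInto("windows_event"), which relies on the table's own partition spec and expects the partition column to be the last column of the DataFrame - but I have not confirmed which of the two is intended here.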