OK, fine, but dt is the column used for partitioning the table. This is what I get in Hive itself:
use test;
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
drop table if exists amo_bi_events;

CREATE EXTERNAL TABLE `amo_bi_events`(
  `event_type` string COMMENT '', `timestamp` string COMMENT '',
  `event_valid` int COMMENT '', `event_subtype` string COMMENT '',
  `user_ip` string COMMENT '', `user_id` string COMMENT '',
  `cookie_status` string COMMENT '', `profile_status` string COMMENT '',
  `user_status` string COMMENT '', `previous_timestamp` string COMMENT '',
  `user_agent` string COMMENT '', `referer` string COMMENT '',
  `uri` string COMMENT '', `request_elapsed` bigint COMMENT '',
  `browser_languages` string COMMENT '', `acamp_id` int COMMENT '',
  `creative_id` int COMMENT '', `location_id` int COMMENT '',
  `pcamp_id` int COMMENT '', `pdomain_id` int COMMENT '',
  `country` string COMMENT '', `region` string COMMENT '',
  `dma` int COMMENT '', `city` string COMMENT '',
  `zip` string COMMENT '', `isp` string COMMENT '',
  `line_speed` string COMMENT '', `gender` string COMMENT '',
  `year_of_birth` int COMMENT '', `behaviors_read` string COMMENT '',
  `behaviors_written` string COMMENT '', `key_value_pairs` string COMMENT '',
  `acamp_candidates` int COMMENT '', `tag_format` string COMMENT '',
  `optimizer_name` string COMMENT '', `optimizer_version` string COMMENT '',
  `optimizer_ip` string COMMENT '', `pixel_id` int COMMENT '',
  `video_id` string COMMENT '', `video_network_id` int COMMENT '',
  `video_time_watched` bigint COMMENT '', `video_percentage_watched` int COMMENT '',
  `conversion_valid_sale` int COMMENT '', `conversion_sale_amount` float COMMENT '',
  `conversion_commission_amount` float COMMENT '', `conversion_step` int COMMENT '',
  `conversion_currency` string COMMENT '', `conversion_attribution` int COMMENT '',
  `conversion_offer_id` string COMMENT '', `custom_info` string COMMENT '',
  `frequency` int COMMENT '', `recency_seconds` int COMMENT '',
  `cost` float COMMENT '', `revenue` float COMMENT '',
  `optimizer_acamp_id` int COMMENT '', `optimizer_creative_id` int COMMENT '',
  `optimizer_ecpm` float COMMENT '', `event_id` string COMMENT '',
  `impression_id` string COMMENT '', `diagnostic_data` string COMMENT '',
  `user_profile_mapping_source` string COMMENT '', `latitude` float COMMENT '',
  `longitude` float COMMENT '', `area_code` int COMMENT '',
  `gmt_offset` string COMMENT '', `in_dst` string COMMENT '',
  `proxy_type` string COMMENT '', `mobile_carrier` string COMMENT '',
  `pop` string COMMENT '', `hostname` string COMMENT '',
  `profile_ttl` string COMMENT '', `timestamp_iso` string COMMENT '',
  `reference_id` string COMMENT '', `identity_organization` string COMMENT '',
  `identity_method` string COMMENT '', `mappable_id` string COMMENT '',
  `profile_expires` string COMMENT '', `video_player_iframed` int COMMENT '',
  `video_player_in_view` int COMMENT '', `video_player_width` int COMMENT '',
  `video_player_height` int COMMENT '', `host_domain` string COMMENT '',
  `browser_type` string COMMENT '', `browser_device_cat` string COMMENT '',
  `browser_family` string COMMENT '', `browser_name` string COMMENT '',
  `browser_version` string COMMENT '', `browser_major_version` string COMMENT '',
  `browser_minor_version` string COMMENT '', `os_family` string COMMENT '',
  `os_name` string COMMENT '', `os_version` string COMMENT '',
  `os_major_version` string COMMENT '', `os_minor_version` string COMMENT '')
PARTITIONED BY (`dt` timestamp)
STORED AS PARQUET;

desc formatted amo_bi_events;

The output in Hive is as follows:

# col_name                      data_type       comment
event_type                      string
timestamp                       string
event_valid                     int
event_subtype                   string
user_ip                         string
user_id                         string
cookie_status                   string
profile_status                  string
user_status                     string
previous_timestamp              string
user_agent                      string
referer                         string
uri                             string
request_elapsed                 bigint
browser_languages               string
acamp_id                        int
creative_id                     int
location_id                     int
pcamp_id                        int
pdomain_id                      int
country                         string
region                          string
dma                             int
city                            string
zip                             string
isp                             string
line_speed                      string
gender                          string
year_of_birth                   int
behaviors_read                  string
behaviors_written               string
key_value_pairs                 string
acamp_candidates                int
tag_format                      string
optimizer_name                  string
optimizer_version               string
optimizer_ip                    string
pixel_id                        int
video_id                        string
video_network_id                int
video_time_watched              bigint
video_percentage_watched        int
conversion_valid_sale           int
conversion_sale_amount          float
conversion_commission_amount    float
conversion_step                 int
conversion_currency             string
conversion_attribution          int
conversion_offer_id             string
custom_info                     string
frequency                       int
recency_seconds                 int
cost                            float
revenue                         float
optimizer_acamp_id              int
optimizer_creative_id           int
optimizer_ecpm                  float
event_id                        string
impression_id                   string
diagnostic_data                 string
user_profile_mapping_source     string
latitude                        float
longitude                       float
area_code                       int
gmt_offset                      string
in_dst                          string
proxy_type                      string
mobile_carrier                  string
pop                             string
hostname                        string
profile_ttl                     string
timestamp_iso                   string
reference_id                    string
identity_organization           string
identity_method                 string
mappable_id                     string
profile_expires                 string
video_player_iframed            int
video_player_in_view            int
video_player_width              int
video_player_height             int
host_domain                     string
browser_type                    string
browser_device_cat              string
browser_family                  string
browser_name                    string
browser_version                 string
browser_major_version           string
browser_minor_version           string
os_family                       string
os_name                         string
os_version                      string
os_major_version                string
os_minor_version                string

# Partition Information
# col_name                      data_type       comment
dt                              timestamp

# Detailed Table Information
Database:               test
Owner:                  hduser
CreateTime:             Fri Jun 03 19:03:20 BST 2016
LastAccessTime:         UNKNOWN
Retention:              0
Location:               hdfs://rhes564:9000/user/hive/warehouse/test.db/amo_bi_events
Table Type:             EXTERNAL_TABLE
Table Parameters:
        EXTERNAL                TRUE
        transient_lastDdlTime   1464977000

# Storage Information
SerDe Library:          org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
InputFormat:            org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
OutputFormat:           org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
Compressed:             No
Num Buckets:            -1
Bucket Columns:         []
Sort Columns:           []
Storage Desc Params:
        serialization.format    1
Time taken: 0.397 seconds, Fetched: 124 row(s)

So effectively that table is partitioned by dt in no time. Now what I don't understand is whether that table is already partitioned, since, as you said, the table already exists!

Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com

On 3 June 2016 at 18:48, Benjamin Kim <bbuil...@gmail.com> wrote:

> Mich,
>
> I am using .withColumn to add another column "dt" that is a reformatted
> version of an existing column "timestamp". The partitioned-by column is
> "dt".
>
> We are using Spark 1.6.0 on CDH 5.7.0.
>
> Thanks,
> Ben
>
> On Jun 3, 2016, at 10:33 AM, Mich Talebzadeh <mich.talebza...@gmail.com>
> wrote:
>
> What version of Spark are you using?
>
> On 3 June 2016 at 17:51, Mich Talebzadeh <mich.talebza...@gmail.com>
> wrote:
>
>> OK, what is the new column called? You are basically adding a new
>> column to an already existing table.
>>
>> On 3 June 2016 at 17:04, Benjamin Kim <bbuil...@gmail.com> wrote:
>>
>>> The table already exists.
>>>
>>> [same CREATE EXTERNAL TABLE `amo_bi_events` DDL as quoted above,
>>> PARTITIONED BY (`dt` timestamp) STORED AS PARQUET]
>>>
>>> Thanks,
>>> Ben
>>>
>>> On Jun 3, 2016, at 8:47 AM, Mich Talebzadeh <mich.talebza...@gmail.com>
>>> wrote:
>>>
>>> Hang on, are you saving this as a new table?
>>>
>>> On 3 June 2016 at 14:13, Benjamin Kim <bbuil...@gmail.com> wrote:
>>>
>>>> Does anyone know how to save data in a DataFrame to a table partitioned
>>>> using an existing column reformatted into a derived column?
>>>>
>>>> val partitionedDf = df.withColumn("dt",
>>>>   concat(substring($"timestamp", 1, 10), lit(" "),
>>>>          substring($"timestamp", 12, 2), lit(":00")))
>>>>
>>>> sqlContext.setConf("hive.exec.dynamic.partition", "true")
>>>> sqlContext.setConf("hive.exec.dynamic.partition.mode", "nonstrict")
>>>> partitionedDf.write
>>>>   .mode(SaveMode.Append)
>>>>   .partitionBy("dt")
>>>>   .saveAsTable("ds.amo_bi_events")
>>>>
>>>> I am getting an ArrayIndexOutOfBoundsException. There are 83 columns in
>>>> the destination table, but after adding the derived column the error
>>>> mentions 84. I assumed that the column used for the partition would not
>>>> be counted.
>>>>
>>>> Can someone please help?
>>>>
>>>> Thanks,
>>>> Ben
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>>> For additional commands, e-mail: user-h...@spark.apache.org
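
For what it's worth, a commonly suggested workaround for this kind of column-count mismatch is a sketch only, not a tested fix: it assumes Spark 1.6 with a HiveContext named sqlContext, the test.amo_bi_events table from the Hive session above (the original snippet writes to ds.amo_bi_events; the database name is an assumption here), and a source DataFrame df as in the question. The idea is to avoid saveAsTable() against a Hive table created outside Spark, and instead align the DataFrame with the table's column order, partition column last, and write by position with insertInto():

```scala
// Sketch, assuming Spark 1.6 with a HiveContext (sqlContext) and the
// amo_bi_events table from this thread. insertInto() matches columns by
// position, so the DataFrame must list the 83 data columns in DDL order
// with the partition column "dt" last; for a Hive table, .columns on the
// table DataFrame returns exactly that order.
import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.functions.{col, concat, lit, substring}
import sqlContext.implicits._

val partitionedDf = df.withColumn("dt",
  concat(substring($"timestamp", 1, 10), lit(" "),
         substring($"timestamp", 12, 2), lit(":00")))

sqlContext.setConf("hive.exec.dynamic.partition", "true")
sqlContext.setConf("hive.exec.dynamic.partition.mode", "nonstrict")

// Reorder to the table's own schema (data columns first, "dt" last),
// then append by position into the existing partitioned table.
val tableCols = sqlContext.table("test.amo_bi_events").columns
partitionedDf
  .select(tableCols.map(col): _*)
  .write
  .mode(SaveMode.Append)
  .insertInto("test.amo_bi_events")
```

Note also that the DDL declares `dt` as timestamp while the derived column is a "yyyy-MM-dd HH:00" string; if string-valued partitions are acceptable, declaring the partition as `dt` string may avoid a further type mismatch.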
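
As a quick sanity check on whether the table really is partitioned by dt (again a sketch, assuming the same HiveContext as above): SHOW PARTITIONS fails on an unpartitioned table, so it confirms both that the table is partitioned and which dt values have actually been loaded.

```scala
// Sketch: confirms amo_bi_events is partitioned and lists loaded dt values.
// SHOW PARTITIONS raises an error if the table is not partitioned.
sqlContext.sql("use test")
sqlContext.sql("show partitions amo_bi_events").collect().foreach(println)
```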