OK, fine, but dt is the column used for partitioning the table.

This is what I get in Hive itself:

use test;
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
drop table if exists amo_bi_events;
CREATE EXTERNAL TABLE `amo_bi_events`(
   `event_type` string COMMENT '',
   `timestamp` string COMMENT '',
   `event_valid` int COMMENT '',
   `event_subtype` string COMMENT '',
   `user_ip` string COMMENT '',
   `user_id` string COMMENT '',
   `cookie_status` string COMMENT '',
   `profile_status` string COMMENT '',
   `user_status` string COMMENT '',
   `previous_timestamp` string COMMENT '',
   `user_agent` string COMMENT '',
   `referer` string COMMENT '',
   `uri` string COMMENT '',
   `request_elapsed` bigint COMMENT '',
   `browser_languages` string COMMENT '',
   `acamp_id` int COMMENT '',
   `creative_id` int COMMENT '',
   `location_id` int COMMENT '',
   `pcamp_id` int COMMENT '',
   `pdomain_id` int COMMENT '',
   `country` string COMMENT '',
   `region` string COMMENT '',
   `dma` int COMMENT '',
   `city` string COMMENT '',
   `zip` string COMMENT '',
   `isp` string COMMENT '',
   `line_speed` string COMMENT '',
   `gender` string COMMENT '',
   `year_of_birth` int COMMENT '',
   `behaviors_read` string COMMENT '',
   `behaviors_written` string COMMENT '',
   `key_value_pairs` string COMMENT '',
   `acamp_candidates` int COMMENT '',
   `tag_format` string COMMENT '',
   `optimizer_name` string COMMENT '',
   `optimizer_version` string COMMENT '',
   `optimizer_ip` string COMMENT '',
   `pixel_id` int COMMENT '',
   `video_id` string COMMENT '',
   `video_network_id` int COMMENT '',
   `video_time_watched` bigint COMMENT '',
   `video_percentage_watched` int COMMENT '',
   `conversion_valid_sale` int COMMENT '',
   `conversion_sale_amount` float COMMENT '',
   `conversion_commission_amount` float COMMENT '',
   `conversion_step` int COMMENT '',
   `conversion_currency` string COMMENT '',
   `conversion_attribution` int COMMENT '',
   `conversion_offer_id` string COMMENT '',
   `custom_info` string COMMENT '',
   `frequency` int COMMENT '',
   `recency_seconds` int COMMENT '',
   `cost` float COMMENT '',
   `revenue` float COMMENT '',
   `optimizer_acamp_id` int COMMENT '',
   `optimizer_creative_id` int COMMENT '',
   `optimizer_ecpm` float COMMENT '',
   `event_id` string COMMENT '',
   `impression_id` string COMMENT '',
   `diagnostic_data` string COMMENT '',
   `user_profile_mapping_source` string COMMENT '',
   `latitude` float COMMENT '',
   `longitude` float COMMENT '',
   `area_code` int COMMENT '',
   `gmt_offset` string COMMENT '',
   `in_dst` string COMMENT '',
   `proxy_type` string COMMENT '',
   `mobile_carrier` string COMMENT '',
   `pop` string COMMENT '',
   `hostname` string COMMENT '',
   `profile_ttl` string COMMENT '',
   `timestamp_iso` string COMMENT '',
   `reference_id` string COMMENT '',
   `identity_organization` string COMMENT '',
   `identity_method` string COMMENT '',
   `mappable_id` string COMMENT '',
   `profile_expires` string COMMENT '',
   `video_player_iframed` int COMMENT '',
   `video_player_in_view` int COMMENT '',
   `video_player_width` int COMMENT '',
   `video_player_height` int COMMENT '',
   `host_domain` string COMMENT '',
   `browser_type` string COMMENT '',
   `browser_device_cat` string COMMENT '',
   `browser_family` string COMMENT '',
   `browser_name` string COMMENT '',
   `browser_version` string COMMENT '',
   `browser_major_version` string COMMENT '',
   `browser_minor_version` string COMMENT '',
   `os_family` string COMMENT '',
   `os_name` string COMMENT '',
   `os_version` string COMMENT '',
   `os_major_version` string COMMENT '',
   `os_minor_version` string COMMENT '')
 PARTITIONED BY (`dt` timestamp)
 STORED AS PARQUET;
desc formatted amo_bi_events;

The output in Hive is as follows:

# col_name              data_type               comment
event_type              string
timestamp               string
event_valid             int
event_subtype           string
user_ip                 string
user_id                 string
cookie_status           string
profile_status          string
user_status             string
previous_timestamp      string
user_agent              string
referer                 string
uri                     string
request_elapsed         bigint
browser_languages       string
acamp_id                int
creative_id             int
location_id             int
pcamp_id                int
pdomain_id              int
country                 string
region                  string
dma                     int
city                    string
zip                     string
isp                     string
line_speed              string
gender                  string
year_of_birth           int
behaviors_read          string
behaviors_written       string
key_value_pairs         string
acamp_candidates        int
tag_format              string
optimizer_name          string
optimizer_version       string
optimizer_ip            string
pixel_id                int
video_id                string
video_network_id        int
video_time_watched      bigint
video_percentage_watched        int
conversion_valid_sale   int
conversion_sale_amount  float
conversion_commission_amount    float
conversion_step         int
conversion_currency     string
conversion_attribution  int
conversion_offer_id     string
custom_info             string
frequency               int
recency_seconds         int
cost                    float
revenue                 float
optimizer_acamp_id      int
optimizer_creative_id   int
optimizer_ecpm          float
event_id                string
impression_id           string
diagnostic_data         string
user_profile_mapping_source     string
latitude                float
longitude               float
area_code               int
gmt_offset              string
in_dst                  string
proxy_type              string
mobile_carrier          string
pop                     string
hostname                string
profile_ttl             string
timestamp_iso           string
reference_id            string
identity_organization   string
identity_method         string
mappable_id             string
profile_expires         string
video_player_iframed    int
video_player_in_view    int
video_player_width      int
video_player_height     int
host_domain             string
browser_type            string
browser_device_cat      string
browser_family          string
browser_name            string
browser_version         string
browser_major_version   string
browser_minor_version   string
os_family               string
os_name                 string
os_version              string
os_major_version        string
os_minor_version        string
# Partition Information
# col_name              data_type               comment
dt                      timestamp
# Detailed Table Information
Database:               test
Owner:                  hduser
CreateTime:             Fri Jun 03 19:03:20 BST 2016
LastAccessTime:         UNKNOWN
Retention:              0
Location:
hdfs://rhes564:9000/user/hive/warehouse/test.db/amo_bi_events
Table Type:             EXTERNAL_TABLE
Table Parameters:
        EXTERNAL                TRUE
        transient_lastDdlTime   1464977000
# Storage Information
SerDe Library:
org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
InputFormat:
org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
OutputFormat:
org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
Compressed:             No
Num Buckets:            -1
Bucket Columns:         []
Sort Columns:           []
Storage Desc Params:
        serialization.format    1
Time taken: 0.397 seconds, Fetched: 124 row(s)

So effectively that table is created and partitioned by dt in no time.
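
As an aside, the dt value being partitioned on is derived from the timestamp string in the Spark snippet quoted later in the thread. In plain Scala that derivation amounts to the following sketch; note the input format "2016-06-03T18:48:23" is an assumption, since the thread never shows a sample value:

```scala
// Plain-Scala equivalent of withColumn("dt", concat(substring($"timestamp", 1, 10),
// lit(" "), substring($"timestamp", 12, 2), lit(":00"))):
// take the date part, a space, the two-digit hour, then floor to ":00".
// Scala's String.substring is 0-based and end-exclusive, unlike the
// 1-based substring in Spark SQL, hence the shifted indices.
def deriveDt(ts: String): String =
  ts.substring(0, 10) + " " + ts.substring(11, 13) + ":00"

// deriveDt("2016-06-03T18:48:23")  // "2016-06-03 18:00"
```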

Now what I don't understand is whether that table is already partitioned,
as you said the table already exists!
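
For what it is worth, one thing that often bites with a pre-existing partitioned Hive table: DataFrameWriter.saveAsTable wants to manage the table definition itself, whereas appending into an existing table is usually done with insertInto, with the partition column moved to the last position (Hive expects dynamic-partition columns last). A hedged sketch, assuming Spark 1.6 with a HiveContext bound to `sqlContext` and the `partitionedDf` from Ben's snippet; the Spark calls are shown as comments since they need a live cluster:

```scala
// Helper: move the partition column to the end of the column list,
// as Hive dynamic partitioning requires for positional inserts.
def partitionLast(cols: Seq[String], part: String): Seq[String] =
  cols.filterNot(_ == part) :+ part

// Sketch only (assumes Spark 1.6 + HiveContext; table name as in Ben's code):
// val reordered = partitionedDf.selectExpr(
//   partitionLast(partitionedDf.columns, "dt"): _*)
// sqlContext.setConf("hive.exec.dynamic.partition", "true")
// sqlContext.setConf("hive.exec.dynamic.partition.mode", "nonstrict")
// reordered.write.mode(SaveMode.Append).insertInto("ds.amo_bi_events")
```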

Dr Mich Talebzadeh



LinkedIn:
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 3 June 2016 at 18:48, Benjamin Kim <bbuil...@gmail.com> wrote:

> Mich,
>
> I am using .withColumn to add another column “dt” that is a reformatted
> version of an existing column “timestamp”. The partition column is
> “dt”.
>
> We are using Spark 1.6.0 in CDH 5.7.0.
>
> Thanks,
> Ben
>
> On Jun 3, 2016, at 10:33 AM, Mich Talebzadeh <mich.talebza...@gmail.com>
> wrote:
>
> What version of Spark are you using?
>
> Dr Mich Talebzadeh
>
>
> On 3 June 2016 at 17:51, Mich Talebzadeh <mich.talebza...@gmail.com>
> wrote:
>
>> OK, what is the new column called? You are basically adding a new
>> column to an already existing table.
>>
>>
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> On 3 June 2016 at 17:04, Benjamin Kim <bbuil...@gmail.com> wrote:
>>
>>> The table already exists.
>>>
>>>  CREATE EXTERNAL TABLE `amo_bi_events`(
>>>    [... column definitions identical to the CREATE TABLE statement at
>>>    the top of this message ...]
>>>  PARTITIONED BY (`dt` timestamp)
>>>  STORED AS PARQUET;
>>>
>>> Thanks,
>>> Ben
>>>
>>>
>>> On Jun 3, 2016, at 8:47 AM, Mich Talebzadeh <mich.talebza...@gmail.com>
>>> wrote:
>>>
>>> Hang on, are you saving this as a new table?
>>>
>>> Dr Mich Talebzadeh
>>>
>>>
>>>
>>>
>>> On 3 June 2016 at 14:13, Benjamin Kim <bbuil...@gmail.com> wrote:
>>>
>>>> Does anyone know how to save data in a DataFrame to a table partitioned
>>>> using an existing column reformatted into a derived column?
>>>>
>>>>     val partitionedDf = df.withColumn("dt",
>>>>       concat(substring($"timestamp", 1, 10), lit(" "),
>>>>              substring($"timestamp", 12, 2), lit(":00")))
>>>>
>>>>     sqlContext.setConf("hive.exec.dynamic.partition", "true")
>>>>     sqlContext.setConf("hive.exec.dynamic.partition.mode", "nonstrict")
>>>>     partitionedDf.write
>>>>       .mode(SaveMode.Append)
>>>>       .partitionBy("dt")
>>>>       .saveAsTable("ds.amo_bi_events")
>>>>
>>>> I am getting an ArrayIndexOutOfBounds error. There are 83 columns in
>>>> the destination table, but after adding the derived column, the error
>>>> references 84 columns. I assumed that the column used for the
>>>> partition would not be counted.
>>>>
>>>> Can someone please help?
>>>>
>>>> Thanks,
>>>> Ben
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>>> For additional commands, e-mail: user-h...@spark.apache.org
>>>>
>>>>
>>>
>>>
>>
>
>
