Re: dynamic Partition not splitting properly

Hamza Asad Sun, 16 Jun 2013 06:09:28 -0700

Im trying to create partition table (dynamically) from old non partitioned
table. the query is as follow


*INSERT OVERWRITE TABLE new_events_details Partition (event_date) SELECT
id, event_id, user_id, intval_1, intval_2, intval_3, intval_4, intval_5,
intval_6, intval_7, intval_8, intval_9, intval_10, intval_11, intval_12,
intval_13, intval_14, intval_15, intval_16, intval_17, intval_18,
intval_19, intval_20, intval_21, intval_22, intval_23, intval_24,
intval_25, intval_26 , to_date(event_date) FROM events_details;*

After waiting for more then 2 hours, following exceptions raised and
further executions stops

*spark.SparkException: Job failed: ResultTask(0, 1063) failed:
ExceptionFailure(org.apache.hadoop.hive.ql.metadata.HiveException:
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
/tmp/hive-hadoop/hive_2013-06-16_15-08-26_985_3160022698353542666/_task_tmp.-ext-10000/event_date=2013-02-22/_tmp.001063_0
could only be replicated to 0 nodes, instead of 1
    at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1558)
    at
org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:696)
    at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
    at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)
)
    at
spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:529)
    at
spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:527)
    at
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:527)
    at
spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:497)
    at spark.scheduler.DAGScheduler.run(DAGScheduler.scala:269)
    at spark.scheduler.DAGScheduler$$anon$1.run(DAGScheduler.scala:90)
FAILED: Execution Error, return code -101 from shark.execution.SparkTask
*
why it giving me exception?


On Fri, Jun 14, 2013 at 1:38 PM, Hamza Asad <[email protected]> wrote:

> ok .. got it.. Thanx :)
> p.s Nitin, have u any idea of indexes? i have emailed issue with subject "
> *index not working*", can u see it please?
>
>
> On Fri, Jun 14, 2013 at 1:30 PM, Nitin Pawar <[email protected]>wrote:
>
>> thats a wrong query
>>
>> insert into table table_name partition (partition_names) select cols,
>> do_data(event_date) from table
>>
>> this is how it should look like
>> hive will take care of inserting into respective partitions after you
>> enable dynamic partitions
>>
>>
>> On Fri, Jun 14, 2013 at 1:21 PM, Hamza Asad <[email protected]>wrote:
>>
>>> i 'm executing following command but it fail to recognize partition
>>> column
>>> *INSERT INTO TABLE rc_partition_table
>>> PARTITION (to_date(event_date))
>>> SELECT * FROM events_details*
>>>
>>>
>>> On Fri, Jun 14, 2013 at 12:41 PM, Nitin Pawar 
>>> <[email protected]>wrote:
>>>
>>>> just use hive split function for strings and get the value extracted.
>>>> by the way I am not sure why the to_date function is failing
>>>> stringto_date(string timestamp)Returns the date part of a timestamp
>>>> string: to_date("1970-01-01 00:00:00") = "1970-01-01"
>>>> also, I am sorry i might have misread your question
>>>> what do you mean by "partition column does not accepts
>>>> to_date(event_date) form "
>>>>
>>>>
>>>>
>>>> On Fri, Jun 14, 2013 at 1:04 PM, Hamza Asad <[email protected]>wrote:
>>>>
>>>>> sample row of my data is
>>>>> *591269735,1,1022,2012-06-24
>>>>> 11:08:10.9,null,2,null,null,null,null,null,null,null,null,12,null,null,2,null,null,null,null,4,1,null,null,null,null,null,null
>>>>> *
>>>>>
>>>>> and i want to partition it according to date i.e *2012-06-24 *skiping
>>>>> the hh:mm:ss.ff part
>>>>>
>>>>>
>>>>> On Fri, Jun 14, 2013 at 12:27 PM, Nitin Pawar <[email protected]
>>>>> > wrote:
>>>>>
>>>>>> can you provide whats your data and what you want it to look like ?
>>>>>>
>>>>>>
>>>>>> On Fri, Jun 14, 2013 at 12:31 PM, Hamza Asad 
>>>>>> <[email protected]>wrote:
>>>>>>
>>>>>>> which UDF? it does not take to_date(event_date) column
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Jun 14, 2013 at 11:54 AM, Nitin Pawar <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> use already existing UDFs to split or transform your values the way
>>>>>>>> you want
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Jun 14, 2013 at 12:09 PM, Hamza Asad <
>>>>>>>> [email protected]> wrote:
>>>>>>>>
>>>>>>>>> OIC. I got it. Thanx alot nitin :). One more thing i want to ask
>>>>>>>>> related this issue, if old table contains event_date in format 
>>>>>>>>> "2012-06-24
>>>>>>>>> 06:04:11.9" then how can i partition it according to date part only? 
>>>>>>>>> As
>>>>>>>>> partition column does not accepts to_date(event_date) form.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, Jun 13, 2013 at 5:07 PM, Nitin Pawar <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> If the input column value is NULL or empty string, the row will be 
>>>>>>>>>> put into a special partition, whose name is controlled by the hive 
>>>>>>>>>> parameter hive.exec.default.dynamic.partition.name. The default 
>>>>>>>>>> value is `__HIVE_DEFAULT_PARTITION__`. Basically this partition will 
>>>>>>>>>> contain all
>>>>>>>>>> "bad" rows whose value are not valid partition names.
>>>>>>>>>>
>>>>>>>>>> so basically you do following things
>>>>>>>>>>
>>>>>>>>>> when you create a partitioned table, your partitioned column is 
>>>>>>>>>> normally at the end of the table, so when you are inserting data 
>>>>>>>>>> into this partitioned table, I would recommend using the column 
>>>>>>>>>> names in place select * from
>>>>>>>>>>
>>>>>>>>>> so your insert query should look like
>>>>>>>>>>
>>>>>>>>>> set hive.exec.dynamic.partition=true;
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> set hive.exec.dynamic.partition.mode=nonstrict;
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> insert overwrite table new_table partition(event_date) select col1, 
>>>>>>>>>> col2 .... coln, event_date from old_table;
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Thu, Jun 13, 2013 at 5:24 PM, Hamza Asad <
>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> when i browse it in browser, all the data is in *
>>>>>>>>>>> event_date=__HIVE_DEFAULT_PARTITION__<http://10.0.0.14:50075/browseDirectory.jsp?dir=%2Fvar%2Flog%2Fpring%2Fhive%2Fwarehouse%2Fnydus.db%2Fnew_rc_partition_cluster_table%2Fevent_date%3D__HIVE_DEFAULT_PARTITION__&namenodeInfoPort=50070>
>>>>>>>>>>> *, rest of the files does not contains data
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Jun 13, 2013 at 4:52 PM, Nitin Pawar <
>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> what do you mean when you say "it wont split correctly" ?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Jun 13, 2013 at 5:19 PM, Hamza Asad <
>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> what if i have data of more then 500 days then how can i
>>>>>>>>>>>>> create partition on date column by specifying each and every 
>>>>>>>>>>>>> date? (i knw
>>>>>>>>>>>>> that does not happens in dynamic partition but on dynamic 
>>>>>>>>>>>>> partition, it
>>>>>>>>>>>>> wont splits correctly).
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Jun 13, 2013 at 4:12 PM, Nitin Pawar <
>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> you can partition existing table unless the hdfs data is laid
>>>>>>>>>>>>>> out in partitioned fashion.
>>>>>>>>>>>>>> your best bet is create a new partitioned table
>>>>>>>>>>>>>> enable dynamic paritionining
>>>>>>>>>>>>>> read from old table and write into new table
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> you can verify the new partitions by using command "show
>>>>>>>>>>>>>> partitions"
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, Jun 13, 2013 at 4:40 PM, Hamza Asad <
>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> now i created partition table like
>>>>>>>>>>>>>>> *CREATE TABLE new_rc_partition_cluster_table(
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>   id int,
>>>>>>>>>>>>>>>   event_id int,
>>>>>>>>>>>>>>>   user_id BIGINT,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>   intval_1 int ,
>>>>>>>>>>>>>>>   intval_2 int,
>>>>>>>>>>>>>>>   intval_3 int,
>>>>>>>>>>>>>>>   intval_4 int,
>>>>>>>>>>>>>>>   intval_5 int,
>>>>>>>>>>>>>>>   intval_6 int,
>>>>>>>>>>>>>>>   intval_7 int,
>>>>>>>>>>>>>>>   intval_8 int,
>>>>>>>>>>>>>>>   intval_9 int,
>>>>>>>>>>>>>>>   intval_10 int,
>>>>>>>>>>>>>>>   intval_11 int,
>>>>>>>>>>>>>>>   intval_12 int,
>>>>>>>>>>>>>>>   intval_13 int,
>>>>>>>>>>>>>>>   intval_14 int,
>>>>>>>>>>>>>>>   intval_15 int,
>>>>>>>>>>>>>>>   intval_16 int,
>>>>>>>>>>>>>>>   intval_17 int,
>>>>>>>>>>>>>>>   intval_18 int,
>>>>>>>>>>>>>>>   intval_19 int,
>>>>>>>>>>>>>>>   intval_20 int,
>>>>>>>>>>>>>>>   intval_21 int,
>>>>>>>>>>>>>>>   intval_22 int,
>>>>>>>>>>>>>>>   intval_23 int,
>>>>>>>>>>>>>>>   intval_24 int,
>>>>>>>>>>>>>>>   intval_25 int,
>>>>>>>>>>>>>>>   intval_26 int)
>>>>>>>>>>>>>>>   PARTITIONED BY (event_date string)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> CLUSTERED BY(id) INTO 256 BUCKETS
>>>>>>>>>>>>>>> ROW FORMAT DELIMITED
>>>>>>>>>>>>>>>  FIELDS TERMINATED BY ','
>>>>>>>>>>>>>>> STORED AS RCFile;
>>>>>>>>>>>>>>> *
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> rest of the commands are same. But this time INSERT
>>>>>>>>>>>>>>> OVERWRITE query executed and took time but when i queries from 
>>>>>>>>>>>>>>> that table,
>>>>>>>>>>>>>>> it results none as it does not contains data. why is this so? 
>>>>>>>>>>>>>>> and also
>>>>>>>>>>>>>>> please tell me how can i partition my existing table dynamicaly 
>>>>>>>>>>>>>>> on date so
>>>>>>>>>>>>>>> that data splits equally without mentioning date explicitly?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wed, Jun 12, 2013 at 6:50 PM, Nitin Pawar <
>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> you did not create partitioned table. You just created a
>>>>>>>>>>>>>>>> bucketed table.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> refer to partitioned table created
>>>>>>>>>>>>>>>> something like
>>>>>>>>>>>>>>>> partitioned by (event_date string)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Wed, Jun 12, 2013 at 7:17 PM, Hamza Asad <
>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> i have created table after enabling dynamic partition. i
>>>>>>>>>>>>>>>>> partitioned it on date but it is not splitting data datewise. 
>>>>>>>>>>>>>>>>> Below is the
>>>>>>>>>>>>>>>>> query of table created and data insert
>>>>>>>>>>>>>>>>> CREATE TABLE rc_partition_cluster_table(
>>>>>>>>>>>>>>>>>   id int,
>>>>>>>>>>>>>>>>>   event_id int,
>>>>>>>>>>>>>>>>>   user_id BIGINT,
>>>>>>>>>>>>>>>>>   event_date string,
>>>>>>>>>>>>>>>>>   intval_1 int )
>>>>>>>>>>>>>>>>> CLUSTERED BY(id) INTO 256 BUCKETS
>>>>>>>>>>>>>>>>> ROW FORMAT DELIMITED
>>>>>>>>>>>>>>>>>  FIELDS TERMINATED BY ','
>>>>>>>>>>>>>>>>> STORED AS RCFile;
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> set hive.exec.dynamic.partition=true;
>>>>>>>>>>>>>>>>> set hive.exec.dynamic.partition.mode=nonstrict;
>>>>>>>>>>>>>>>>> set hive.exec.max.dynamic.partitions=1000;
>>>>>>>>>>>>>>>>> set hive.exec.max.dynamic.partitions.pernode=1000;
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> INSERT OVERWRITE TABLE rc_partition_cluster_table
>>>>>>>>>>>>>>>>> Partition (event_date)
>>>>>>>>>>>>>>>>> SELECT * FROM events_details;
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> why it is not working fine?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>> *Muhammad Hamza Asad*
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> Nitin Pawar
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> *Muhammad Hamza Asad*
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> Nitin Pawar
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> *Muhammad Hamza Asad*
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Nitin Pawar
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> *Muhammad Hamza Asad*
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Nitin Pawar
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> *Muhammad Hamza Asad*
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Nitin Pawar
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> *Muhammad Hamza Asad*
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Nitin Pawar
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> *Muhammad Hamza Asad*
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Nitin Pawar
>>>>
>>>
>>>
>>>
>>> --
>>> *Muhammad Hamza Asad*
>>>
>>
>>
>>
>> --
>> Nitin Pawar
>>
>
>
>
> --
> *Muhammad Hamza Asad*
>



-- 
*Muhammad Hamza Asad*

Re: dynamic Partition not splitting properly

Reply via email to