Re: Access to wiki (documenting locking requirements).

2015-10-22 Thread Lefty Leverenz
Done. Welcome to the Hive wiki team, Elliot! -- Lefty On Thu, Oct 22, 2015 at 5:58 AM, Elliot West wrote: > Hi, > > May I have access to edit the wiki? My confluence user name is 'teabot'. > > I've been looking briefly at ALTER TABLE CONCATENATE and noticed that the > operation isn't listed o

Hive on Spark

2015-10-22 Thread Jone Zhang
1. How can I set the storage level when I use Hive on Spark? 2. Does Spark have any plan to choose dynamically between Hive on MapReduce and Hive on Spark, based on SQL features? Thanks in advance. Best regards

Re: Need suggestions on processing JSON junk (e.g., invalid double quotes) data using HIVE

2015-10-22 Thread Sam Joe
Hi, Please see the logs given below: hive> SELECT t.retweeted_screen_name, >Sum(retweets) AS total_retweets, >Count(*) AS tweet_count > FROM (SELECT retweeted_status.user.screen_name AS retweeted_screen_name, >retweeted_status.text, >

Need suggestions on processing JSON junk (e.g., invalid double quotes) data using HIVE

2015-10-22 Thread Sam Joe
Hi, After streaming Twitter data to HDFS using Flume, I'm trying to analyze it using some Hive queries. The data is in JSON format but is not clean: it has double quotes (") in the wrong places, which causes the Hive queries to fail. I am getting the following error: Failed with exception java.io.IOException:o
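
One way to keep malformed records from failing a query is to separate them out before the data reaches the table. The sketch below is only an illustration, not something proposed in the thread: it assumes each record is a single JSON object on its own line, and the function name `split_valid_json` is hypothetical.

```python
import json
from typing import Iterable, List, Tuple


def split_valid_json(lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split raw lines into (parseable JSON records, rejected records).

    Assumption: one JSON document per line. Blank lines are skipped.
    """
    valid: List[str] = []
    rejected: List[str] = []
    for line in lines:
        text = line.strip()
        if not text:
            continue
        try:
            json.loads(text)
        except json.JSONDecodeError:
            # A stray double quote inside a string value typically lands here.
            rejected.append(text)
        else:
            valid.append(text)
    return valid, rejected


if __name__ == "__main__":
    sample = [
        '{"text": "a clean tweet"}',
        '{"text": "a tweet with a "stray" quote"}',
    ]
    good, bad = split_valid_json(sample)
    print(len(good), "valid;", len(bad), "rejected")
```

Keeping the rejected lines in a separate list makes it possible to inspect or repair them later instead of silently dropping them.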

Re: Hive function to convert numeric IP address to "dot" format?

2015-10-22 Thread Vikas Parashar
Hi Mark, You can segregate it at the input level. Could you please explain how your data is ingested into the DB? On Fri, Oct 23, 2015 at 12:14 AM, Mark Sunderlin wrote: > Does hive have a built in function to return a dotted-quad representation > of an IP address given a network address as an int

Hive function to convert numeric IP address to "dot" format?

2015-10-22 Thread Mark Sunderlin
Does Hive have a built-in function to return a dotted-quad representation of an IP address given a network address as an integer as input? If not, does anyone have SQL they would be willing to share that does this? What I am looking for: My data is in the below "raw" format; I want it in the "d
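
The arithmetic behind the conversion being asked about can be sketched as follows. This is an illustration only, not an answer taken from the thread: it assumes the integer is an unsigned 32-bit IPv4 address with the most significant byte first, and the function name `int_to_dotted_quad` is hypothetical.

```python
def int_to_dotted_quad(address: int) -> str:
    """Render an unsigned 32-bit IPv4 address as a dotted-quad string.

    Assumption: the most significant byte is the first octet.
    """
    if not 0 <= address <= 0xFFFFFFFF:
        raise ValueError("address must fit in an unsigned 32-bit integer")
    # Shift each octet down to the low byte, then mask off everything else.
    octets = [(address >> shift) & 0xFF for shift in (24, 16, 8, 0)]
    return ".".join(str(octet) for octet in octets)


if __name__ == "__main__":
    print(int_to_dotted_quad(3232235777))  # 192.168.1.1
```

The same result can be reached with integer division and modulo by 256, which is useful in SQL dialects that lack bit-shift operators.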

Re: Hive s3 external table with sub directories

2015-10-22 Thread Sergey Shelukhin
I don’t think Hive picks up partitions automatically in this scenario. Maybe a ticket could be added to add partitions based on some additional syntax, as this seems to be an occasionally used scenario. I’ve seen msck used as a hack to “restore” partitions into metastore (it will find the direct

Re: Hi, Hive People urgent question about [Distribute By] function

2015-10-22 Thread Gopal Vijayaraghavan
> so do you think if we want the same result from Hive and Spark or the >other framework, how could we try this one ? There's a special backwards compat slow codepath that gets triggered if you do set mapred.reduce.tasks=199; (or any number). This will produce the exact same hash-code as the jav

Re: Hi, Hive People urgent question about [Distribute By] function

2015-10-22 Thread Philip Lee
Thanks for your help. So if we want the same result from Hive and Spark, or from another framework, how could we achieve that? Could you explain in detail? Regards, Philip On Thu, Oct 22, 2015 at 6:25 PM, Gopal Vijayaraghavan wrote: > > > When applying [Distribute By] on Hive to the

Re: Hi, Hive People urgent question about [Distribute By] function

2015-10-22 Thread Gopal Vijayaraghavan
> When applying [Distribute By] on Hive to the framework, the function >should be partitionByHash on Flink. This is to spread out all the rows >distributed by a hash key from Object Class in Java. Hive does not use the Object hashCode - the identityHashCode is inconsistent, so Object.hashCode() .
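
The point about identity hashes can be illustrated with a small sketch of value-based hash partitioning. This is not Hive's implementation; it is a generic illustration that reuses the well-known `java.lang.String.hashCode` formula as an example of a hash that depends only on the value. The function names and the choice of string keys are assumptions.

```python
def java_string_hash(value: str) -> int:
    """Compute the same result as java.lang.String.hashCode for a string.

    The hash is h = 31 * h + unit over UTF-16 code units, with 32-bit
    signed wraparound.
    """
    data = value.encode("utf-16-be")
    h = 0
    for i in range(0, len(data), 2):
        unit = int.from_bytes(data[i:i + 2], "big")
        h = (31 * h + unit) & 0xFFFFFFFF
    # Reinterpret the unsigned 32-bit value as a signed Java int.
    return h - 0x100000000 if h >= 0x80000000 else h


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition index in [0, num_partitions)."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    # Clear the sign bit so the modulo result is never negative.
    return (java_string_hash(key) & 0x7FFFFFFF) % num_partitions


if __name__ == "__main__":
    for key in ("abc", "hive", "flink"):
        print(key, "->", partition_for(key, 4))
```

Because the hash is derived from the key's content, every process that sees the same key sends it to the same partition, which an identity-based hash cannot guarantee.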

Hi, Hive People urgent question about [Distribute By] function

2015-10-22 Thread Philip Lee
Hello, I am working on Flink and Spark while majoring in Computer Science in Berlin. I have an important question. It comes from what I am working on these days, which is translating Hive queries to Flink. When applying [Distribute By] on Hive to the framework, the function should be partitionByHash

Locking when using the Metastore/HCatalog APIs.

2015-10-22 Thread Elliot West
I notice from the Hive locking wiki page that locks may be acquired for a range of HQL DDL operations. I wanted to know how the locking scheme is mapped to, or employed by, the equivalent operations in the Metastore and HCatalog APIs. Consider the

Hive s3 external table with sub directories

2015-10-22 Thread Hafiz Mujadid
I have the following S3 directory structure. Data/ Year=2015/ Month=01/ Day=01/ files Day=02/ files Month=02/ Day=01/ files Day=02/ files . .
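
For a Year=/Month=/Day= layout like the one above, partitions can be registered explicitly with `ALTER TABLE ... ADD PARTITION`. The sketch below only builds the statement text from a directory path. It is an illustration under stated assumptions: the table name `events` and the bucket `s3://example-bucket` are hypothetical, and partition values are quoted as strings.

```python
def add_partition_statement(table: str, base_location: str, relative_dir: str) -> str:
    """Build a Hive ADD PARTITION statement from a key=value directory path.

    Assumptions: every path segment has the form key=value, and the
    partition columns accept quoted string values.
    """
    relative = relative_dir.strip("/")
    pairs = []
    for segment in relative.split("/"):
        key, sep, value = segment.partition("=")
        if not sep or not key or not value:
            raise ValueError(f"not a key=value segment: {segment!r}")
        pairs.append((key, value))
    spec = ", ".join(f"{key}='{value}'" for key, value in pairs)
    location = f"{base_location.rstrip('/')}/{relative}"
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION ({spec}) LOCATION '{location}';"
    )


if __name__ == "__main__":
    # Hypothetical table and bucket names, used only for illustration.
    print(add_partition_statement(
        "events", "s3://example-bucket/Data", "Year=2015/Month=01/Day=01"))
```

Generating one statement per leaf directory keeps the metastore in step with the files without relying on automatic discovery.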

Access to wiki (documenting locking requirements).

2015-10-22 Thread Elliot West
Hi, May I have access to edit the wiki? My confluence user name is 'teabot'. I've been looking briefly at ALTER TABLE CONCATENATE and noticed that the operation isn't listed on the Hive/Locking wiki page even though it acquires an exclusi

the number of files after merging

2015-10-22 Thread patcharee
Hi, I am using the ALTER command below to merge the partitioned ORC files in one partition: alter table X partition(zone=1,z=1,year=2009,month=1) CONCATENATE; - How can I control the number of files after merging? I would like to get only one file per partition. - Is it possible to concatenate the w

RE: Double quotes in csv data

2015-10-22 Thread michael.england
Hi, Is there an easier way to do this with a SerDe? The data volumes could easily reach multi-terabyte level so it would be nice if Hive could handle this. Thanks, Michael From: Vikas Parashar [mailto:para.vi...@gmail.com] Sent: 21 October 2015 16:29 To: user@hive.apache.org Subject: Re: Double