Re: Embedding Hive

2012-04-24 Thread Zizon Qiu
I tried the following, and it works: SessionState session = new SessionState(conf); /* the conf should carry enough information, such as access to the Hive metastore database (MySQL) */ session.setIsSilent(true); session.setIsVerbose(true); SessionState.start(session); Driver driver = new

Re: Embedding Hive

2012-04-24 Thread Dilip Joseph
You can directly embed the Hive client library in your Java program, and use it without running a Hive service. My blog post at http://csgrad.blogspot.com/2010/04/to-use-language-other-than-java-say.html describes how to run Hive queries from Jython. Something very similar should work for Java. D

Embedding Hive

2012-04-24 Thread Vinod Singh
Hello, I would like to embed Hive (client) in my application to execute a sequence of queries. Right now I do it using the CLI (hive -f myScript.sql). The problem with this approach is that I do not get a return / error code to know the status of the query programmatically. So my question is what is the bes
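For the CLI approach described above, a wrapper process can at least read the exit status of whatever command it launches. The following is a minimal sketch, not the poster's method: the helper name `run_and_check` is hypothetical, and it assumes the `hive` executable reports failure through a non-zero exit status, which this thread does not confirm. The Python interpreter is used as a stand-in command so the sketch runs without Hive installed.

```python
import subprocess
import sys


def run_and_check(command):
    """Run a command and return (exit_status, stdout, stderr)."""
    result = subprocess.run(command, capture_output=True, text=True)
    return result.returncode, result.stdout, result.stderr


# Stand-in command: a child Python process that exits with status 3.
# With Hive installed, the command would instead be something like
# ["hive", "-f", "myScript.sql"] (assumption: hive sets a non-zero
# exit status when a query fails).
demo_command = [sys.executable, "-c", "import sys; sys.exit(3)"]
code, out, err = run_and_check(demo_command)
print("exit status:", code)
```

A caller can then branch on `code != 0` and log `err` to see why the script failed.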

Facing some problems while running the Hive jobs in Amazon EMR

2012-04-24 Thread Bhavesh Shah
Hello all, I am new to Amazon services and have started learning about Amazon EMR, S3 and EC2. I have been reading about EMR because I want to deploy my task using Amazon EMR only. But I am facing some problems while running the sample Hive program: 1) I created the bucket in which I s

Re: subquery + lateral view fails without count

2012-04-24 Thread Mark Grover
Hi Ruben, Looks like pastie is down (http://pastie.org/) because of recent DDOS attacks. Can you please post your queries elsewhere? Mark Mark Grover, Business Intelligence Analyst OANDA Corporation www: oanda.com www: fxtrade.com e: mgro...@oanda.com "Best Trading Platform" - World Finance

Hive + input format versioning dilemma

2012-04-24 Thread Nerius Landys
Hi, I'm running some Hive jobs on Amazon Elastic MapReduce. The versions for Hive and Hadoop as reported by the instance: hadoop@ip-10-64-33-113:~$ hive --version Hive version 0.7.1.4 hadoop@ip-10-64-33-113:~$ hadoop version Hadoop 0.20.205 I'm able to specify a table and execute a query using

RE: Possible to return different number of columns than parameters specified in custom UDTF?

2012-04-24 Thread Ryabin, Thomas
I figured out how to do this. The problem is that you need to add the number of fields you want to be returned to the StandardStructObjectInspector that is returned in the initialize() method of the UDTF. -Thomas From: Ryabin, Thomas Sent: Tuesday, April 24, 2012 12:46 PM To: user@hive.apa

Re: Doubts related to Amazon EMR

2012-04-24 Thread Mark Grover
Hi Bhavesh, If you copy your jar over to the master node of your EMR cluster and install Sqoop like Kyle suggested, you can run your jar on the master node, just like you did on your local cluster before. Just make sure that the Hive JDBC drivers are available to the jar and that you connect to the Hive

question about Hive 'recover partitions' on AWS S3

2012-04-24 Thread Tony Burton
Hi, Is it ever possible to omit the partition variable name when discovering partitions? I'm sure I've seen this demonstrated but of course when it's needed, I can't find it. Can anyone clarify? I have a number of date-named directories in Amazon AWS S3, containing data stored in sequen

Possible to return different number of columns than parameters specified in custom UDTF?

2012-04-24 Thread Ryabin, Thomas
Hello, I have created a custom UDTF called "test_udtf". This function takes 4 parameters. Right now I can use this function like so in the following query: SELECT test_udtf(product, store, 'test0', 'test1') as (col0, col1, col2, col3) from products join stores; The problem is that I want t

RE: When/how to use partitions and buckets usefully?

2012-04-24 Thread Ruben de Vries
I put the (rather big) log in a GitHub gist: https://gist.github.com/2480893 I also attached the plan.xml it was using to the gist. When loading the members_map (11 million records, 320 MB, 30 B per record), it seems to take about 198 B per record in the members_map, resulting in crashing aro
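The figures quoted above can be turned into a rough memory estimate. This is only back-of-the-envelope arithmetic on the numbers in the post (11 million records, about 30 B per record on disk, about 198 B per record in memory); the variable names are illustrative and the actual heap size of the Hive client is not stated in the snippet.

```python
# Figures taken from the post; everything else is plain arithmetic.
RECORDS = 11_000_000
ON_DISK_BYTES_PER_RECORD = 30     # approximate on-disk size per record
IN_MEMORY_BYTES_PER_RECORD = 198  # poster's estimate for the hash table

on_disk_bytes = RECORDS * ON_DISK_BYTES_PER_RECORD
in_memory_bytes = RECORDS * IN_MEMORY_BYTES_PER_RECORD
expansion = IN_MEMORY_BYTES_PER_RECORD / ON_DISK_BYTES_PER_RECORD

MIB = 1024 * 1024
print(f"on disk:   {on_disk_bytes / MIB:.0f} MiB")
print(f"in memory: {in_memory_bytes / MIB:.0f} MiB")
print(f"expansion: {expansion:.1f}x")
```

Under these assumptions the in-memory hash table needs roughly 2 GiB, several times the on-disk size, which would explain a client-side crash if the JVM heap is smaller than that.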

Re: external table on flume log files in S3

2012-04-24 Thread Bejoy KS
Hi Soren, If you can collect or order the log files into date-based subdirectories in S3, then you can partition the table based on date. With partitions you can query a subset of your data based on date. You can organize the data into date folders during Flume ingestion itself. Regards Bejoy K
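The date-based layout suggested above can be sketched as a small helper that derives the S3 folder for a given day and the matching Hive statement to register it as a partition. Everything named here is an assumption for illustration: the table name `flume_logs`, the bucket `s3://example-bucket/logs`, and the partition column `dt` do not appear in the thread.

```python
from datetime import date


def partition_location(base_uri, day):
    """Return the date-based folder for one day, e.g. .../dt=2012-04-24/."""
    return f"{base_uri.rstrip('/')}/dt={day.isoformat()}/"


def add_partition_statement(table, base_uri, day):
    """Build the HiveQL that registers one day's folder as a partition."""
    location = partition_location(base_uri, day)
    return (
        f"ALTER TABLE {table} ADD PARTITION (dt='{day.isoformat()}') "
        f"LOCATION '{location}'"
    )


# Hypothetical table and bucket names, used only for illustration.
print(add_partition_statement(
    "flume_logs", "s3://example-bucket/logs", date(2012, 4, 24)))
```

A query that filters on `dt` would then read only the folders for the matching days rather than the whole bucket.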

external table on flume log files in S3

2012-04-24 Thread Søren
Hi Hive community, We are collecting huge amounts of data into Amazon S3 using Flume. In Elastic MapReduce, we have so far managed to create an external Hive table on JSON-formatted, gzipped log files in S3 using a customized SerDe. The log files are collected and stored in one single folder wi

Re: When/how to use partitions and buckets usefully?

2012-04-24 Thread Bejoy Ks
Hi Ruben, The operation you are seeing in your log is the preparation of a hash table of the smaller table. This hash table file is compressed and loaded into the Distributed Cache, and from there it is used for map-side joins. From your console log the hash table size/data size has gone to nearly 1.5
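The mechanism described above (hash the smaller table, then probe it while streaming the larger one) can be illustrated in a few lines. This is a conceptual sketch only, not Hive's implementation: the function names are hypothetical, the column names `memberId`, `gender` and `date_int` are borrowed from the query elsewhere in this thread, and the sample rows are made up.

```python
from collections import defaultdict


def build_hash_table(small_rows, key):
    """Index the smaller table by join key; this must fit in memory."""
    table = defaultdict(list)
    for row in small_rows:
        table[row[key]].append(row)
    return table


def map_side_join(big_rows, small_table, key):
    """Stream the larger table and look up each row in the hash table."""
    for row in big_rows:
        for match in small_table.get(row[key], []):
            yield {**match, **row}


# Made-up sample data for illustration.
members_map = [
    {"memberId": 1, "gender": "f"},
    {"memberId": 2, "gender": "m"},
]
visit_stats = [
    {"memberId": 1, "date_int": 20120423},
    {"memberId": 2, "date_int": 20120424},
    {"memberId": 3, "date_int": 20120424},  # no matching member, dropped
]

lookup = build_hash_table(members_map, "memberId")
joined = list(map_side_join(visit_stats, lookup, "memberId"))
print(joined)
```

The sketch shows why the size of the smaller table matters: the whole `lookup` structure lives in memory before any row of the larger table is processed.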

RE: Possible to use regex column specification with WHERE clause?

2012-04-24 Thread Ryabin, Thomas
This seems like it would work. Thanks, Thomas -Original Message- From: Nitin Pawar [mailto:nitinpawar...@gmail.com] Sent: Tuesday, April 24, 2012 3:31 AM To: user@hive.apache.org Subject: Re: Possible to use regex column specification with WHERE clause? you may want to have a programmat

Re: removing hdfs table data directory does not throw error in hive

2012-04-24 Thread Nitin Pawar
Looks like a good use case. Created an improvement request: https://issues.apache.org/jira/browse/HIVE-2980 On Tue, Apr 24, 2012 at 9:10 AM, Sukhendu Chakraborty < sukhendu.chakrabo...@gmail.com> wrote: > Thanks Nitin. I am aware of what Hive is doing. The question is, is it > okay not return an

FW: When/how to use partitions and buckets usefully?

2012-04-24 Thread Ruben de Vries
Here are both tables: $ hdfs -count /user/hive/warehouse/hyves_goldmine.db/members_map 1 1 247231757 hdfs://localhost:54310/user/hive/warehouse/hyves_goldmine.db/members_map $ hdfs -count /user/hive/warehouse/hyves_goldmine.db/visit_stats 442 441

Re: Possible to use regex column specification with WHERE clause?

2012-04-24 Thread Nitin Pawar
Shashwat, I think he wanted to put a regex in the WHERE clause to derive a column name, instead of in the SELECT clause. On Tue, Apr 24, 2012 at 4:45 PM, shashwat shriparv < dwivedishash...@gmail.com> wrote: > Use this to generate probable strings Some examples are here : > > regexp_extract(s, '^([a-z

Re: Possible to use regex column specification with WHERE clause?

2012-04-24 Thread shashwat shriparv
Use this to generate probable strings. Some examples: regexp_extract(s, '^([a-zA-Z0-9]{2}\.)?([a-zA-Z0-9]{3}-?){3}') select regexp_extract(request, ' (\\S*) HTTP', 1) from logfile; select regexp_extract('junk:text:ua123','ua[0-9]+',0) from dual; and pass it in your query with an OR condition.
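To see what the last two patterns above extract, the same expressions can be tried outside Hive. The sketch below uses Python's `re` module as a stand-in for Hive's regex engine, which is an assumption that holds for these simple patterns but not necessarily for all syntax. The helper mirrors only the argument order of `regexp_extract`; returning `None` when nothing matches is a choice made for this sketch, not a statement about Hive's behaviour, and the sample request line is made up.

```python
import re


def regexp_extract(subject, pattern, index=1):
    """Return capture group `index` of the first match, or None."""
    match = re.search(pattern, subject)
    return match.group(index) if match else None


# Made-up log line; in the HiveQL string literal the backslash is
# doubled ('\\S'), while a Python raw string needs only one.
request = "GET /index.html HTTP/1.1"
print(regexp_extract(request, r" (\S*) HTTP", 1))

# Group 0 is the whole match.
print(regexp_extract("junk:text:ua123", r"ua[0-9]+", 0))
```

The first call yields the requested path and the second yields `ua123`, the whole matched substring.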

Re: When/how to use partitions and buckets usefully?

2012-04-24 Thread Nitin Pawar
This operation is erroring out on the Hive client itself before starting a map, so splitting to mappers is out of the question. Can you do a dfs count for the members_map table's HDFS location and tell us the result? On Tue, Apr 24, 2012 at 2:06 PM, Ruben de Vries wrote: > Hmm I must be doing something

RE: When/how to use partitions and buckets usefully?

2012-04-24 Thread Ruben de Vries
Hmm I must be doing something wrong, the members_map table is 300ish MB. When I execute the following query: SELECT /*+ MAPJOIN(members_map) */ date_int, members_map.gender AS gender, 'generic', COUNT( memberId ) AS unique, SUM( `generic`['count'] ) AS count, SUM( `gen

Re: When/how to use partitions and buckets usefully?

2012-04-24 Thread Bejoy Ks
Hi Ruben, The map join hint is provided to Hive using the "MAPJOIN" keyword, as in: SELECT /*+ MAPJOIN(b) */ a.key, a.value FROM a join b on a.key = b.key. To use a map-side join, some Hive configuration properties need to be enabled. For plain map-side joins: hive> SET hive.auto.convert.join=true; Latest versions

Re: Lifecycle and Configuration of a hive UDF

2012-04-24 Thread Justin Coffey
Hi Mark, Looks great to me! Thanks for adding it. -Justin On Tue, Apr 24, 2012 at 5:55 AM, Mark Grover wrote: > Added a tiny blurb here: > https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-UDFinternals > Comments/suggestions welcome! > > Thanks for brin

Re: Possible to use regex column specification with WHERE clause?

2012-04-24 Thread Nitin Pawar
You may want to take a programmatic approach to this and provide Hive with a final query. You can solve this by resolving your regular expression outside Hive and then providing the resulting query to Hive. On 4/23/12, Ryabin, Thomas wrote: > Hi, > > > > I know that it is possible to

Re: When/how to use partitions and buckets usefully?

2012-04-24 Thread Nitin Pawar
If you are doing a map-side join, make sure the table members_map is small enough to fit in memory. On 4/24/12, Ruben de Vries wrote: > Wow thanks everyone for the nice feedback! > > I can force a mapside join by doing /*+ STREAMTABLE(members_map) */ right? > > > Cheers, > > Ruben de Vries > > ---