Re: hive query error

2013-08-21 Thread 闫昆
Thanks, Bing, I found it. 2013/8/22 Bing Li > By default, hive.log should exist in /tmp/. > Also, it could be set in $HIVE_HOME/conf/hive-log4j.properties and > hive-exec-log4j.properties > - hive.log.dir > - hive.log.file > > > 2013/8/22 闫昆 > >> hi all >> when exec hive query throw exception as

Re: only one mapper

2013-08-21 Thread Rajesh Balamohan
Good to hear that. On Thu, Aug 22, 2013 at 9:02 AM, 闫昆 wrote: > thanks all i move lzo index to hive directory is work fine . > thanks > > > 2013/8/22 Rajesh Balamohan > >> Create the LZO index after moving the file to hive directory (i.e after >> executing your LOAD DATA* statement). Index fi

FileNotFoundException

2013-08-21 Thread 闫昆
Hi all, I executed the following: hive (zh_site)> select id, > longitude, > latitude, > month, > round(avg(sd), 2) as avgsd > from data_zh > where id between '5' and '6' > and mon

Re: only one mapper

2013-08-21 Thread 闫昆
Thanks, all. I moved the LZO index into the Hive directory and it works fine. Thanks. 2013/8/22 Rajesh Balamohan > Create the LZO index after moving the file to hive directory (i.e after > executing your LOAD DATA* statement). Index file is needed only during job > execution and if its not present in the same d

High GC time, how to tune?

2013-08-21 Thread ch huang
hive> select page_url, concat_ws('&', map_keys(UNION_MAP(MAP(first_category, 'fcategory')))) as fcategorys, token, concat_ws('&', map_keys(UNION_MAP(MAP(concat(original_category, ',', weight), 'dummy')))) as r from media_visit_info group by page_url, token; # jstat -gcutil 28409 1000 10 S0 S1

Re: only one mapper

2013-08-21 Thread Rajesh Balamohan
Create the LZO index after moving the file to the Hive directory (i.e., after executing your LOAD DATA* statement). The index file is needed only during job execution, and if it is not present in the same directory, the large file will not be split. On Thu, Aug 22, 2013 at 7:11 AM, 闫昆 wrote: > In hive i u
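Rajesh's point, that the index only counts if it sits next to the data file, can be checked mechanically. A minimal Python sketch, assuming you already have the directory listing as a list of path strings (for example copied from `hadoop fs -ls` output); it does not talk to HDFS, and the warehouse path in the example is hypothetical.

```python
def lzo_index_status(paths):
    """Return {lzo_path: has_index} for every .lzo file in `paths`.

    `paths` is a plain list of file paths from a directory listing;
    this sketch only inspects strings and never touches HDFS.
    """
    present = set(paths)
    status = {}
    for path in paths:
        if path.endswith(".lzo"):
            # The index has to be in the SAME directory, named <file>.lzo.index;
            # an index left behind in the old directory does not count.
            status[path] = (path + ".index") in present
    return status


# Hypothetical listing after LOAD DATA moved only the data file into the
# warehouse directory, leaving the index behind in /data_split.
listing = [
    "/user/hive/warehouse/data_zh/data_rowkey.lzo",
    "/data_split/data_rowkey.lzo.index",
]
print(lzo_index_status(listing))
# {'/user/hive/warehouse/data_zh/data_rowkey.lzo': False}
```

A `False` entry is the case described above: the file is still readable, but without its index it will not be split across mappers.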

Re: hive query error

2013-08-21 Thread Bing Li
By default, hive.log should be in /tmp/. Its location can also be set in $HIVE_HOME/conf/hive-log4j.properties and hive-exec-log4j.properties through these properties: - hive.log.dir - hive.log.file 2013/8/22 闫昆 > hi all > when exec hive query throw exception as follow > I donnot know where is error log I found $HIVE_HOME/
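Bing's answer can be illustrated with a small resolver that computes where hive.log ends up. A minimal Python sketch, assuming a simplified key=value properties format and assuming log4j 1.x-style `${name}` substitution in which a JVM system property is preferred over a value defined in the file (treat that precedence as an assumption and verify it for your version). The sample file content and user name are hypothetical; check the hive.log.dir and hive.log.file values in your own hive-log4j.properties.

```python
import re

_VAR = re.compile(r"\$\{([^}]+)\}")


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and #/! comments.

    Simplified: no line continuations, escapes, or ':' separators.
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def resolve(template, props, system_props, max_passes=10):
    """Expand ${name}; a system property is preferred over a file property."""
    def lookup(match):
        name = match.group(1)
        if name in system_props:
            return system_props[name]
        return props.get(name, match.group(0))  # leave unknown vars as-is

    for _ in range(max_passes):
        expanded = _VAR.sub(lookup, template)
        if expanded == template:
            break
        template = expanded
    return template


def hive_log_path(properties_text, system_props):
    props = parse_properties(properties_text)
    return resolve("${hive.log.dir}/${hive.log.file}", props, system_props)


# Hypothetical excerpt of hive-log4j.properties plus JVM system properties.
SAMPLE = """
hive.log.dir=${java.io.tmpdir}/${user.name}
hive.log.file=hive.log
"""
print(hive_log_path(SAMPLE, {"java.io.tmpdir": "/tmp", "user.name": "hadoop"}))
# /tmp/hadoop/hive.log
```

With a value like the sample, the log lands in a per-user directory under /tmp rather than in $HIVE_HOME/logs.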

hive query error

2013-08-21 Thread 闫昆
Hi all, when I execute a Hive query it throws the exception below. I do not know where the error log is; $HIVE_HOME/logs does not exist. Total MapReduce jobs = 1 Launching Job 1 out of 1 Number of reduce tasks not specified. Estimated from input data size: 3 In order to change the average load for a reducer (in

execute hive query error "Error: GC overhead limit exceeded"

2013-08-21 Thread ch huang
Hi all, I executed a query but it failed. Does anyone know what happened? BTW, I am using the YARN framework. 2013-08-22 09:47:09,893 Stage-1 map = 28%, reduce = 1%, Cumulative CPU 4140.64 sec 2013-08-22 09:47:10,952 Stage-1 map = 28%, reduce = 1%, Cumulative CPU 4140.72 sec 2013-08-22 09:47:12,008 Stage-1 map = 28

Re: only one mapper

2013-08-21 Thread 闫昆
In Hive I used SET mapreduce.input.fileinputformat.split.maxsize=134217728; but it had no effect. I found that when I use LOAD DATA INPATH '/data_split/data_rowkey.lzo' OVERWRITE INTO TABLE data_zh, the HDFS data is moved into the Hive directory. I used CREATE EXTERNAL TABLE, but the issue is that data_rowkey.lzo.index is also exi

Re: First/last in npath

2013-08-21 Thread Justin Workman
Confirmed that this does not work. I get the following error: "Non-constant expressions for array indexes not supported". FWIW, I think I have written a UDF that will work for what I want. I still have some work to do to make sure it gets and returns the correct data type of the field being retur

Re: First/last in npath

2013-08-21 Thread Justin Workman
I believe I tried that, both in the return argument and the outer query. If memory serves me, I got an error about the array index needing to be a constant value. I will try again when I get back to a computer. Sent from my iPhone On Aug 21, 2013, at 6:48 PM, Harish Butani wrote: Can you try t

Re: First/last in npath

2013-08-21 Thread Harish Butani
Can you try this: select search_terms, productid, clicks_to_product from npath ( on clicks distributed by sessionid sort by timestamp arg1('SEARCH.NOTPRODUCT*.PRODUCT'), arg2('SEARCH'), arg3(page = 'SEARCH'), arg4('PRODUCT'), arg5(p

Re: Last time request for cwiki update privileges

2013-08-21 Thread Ashutosh Chauhan
I am not able to find this ID in cwiki. Did you create an account on cwiki.apache.org? On Wed, Aug 21, 2013 at 2:59 PM, Mikhail Antonov wrote: > mantonov

Re: Last time request for cwiki update privileges

2013-08-21 Thread Mikhail Antonov
mantonov 2013/8/21 Ashutosh Chauhan > Hey Mikhail, > > Sure. Whats ur cwiki id? > > Thanks, > Ashutosh > > > On Wed, Aug 21, 2013 at 1:58 PM, Mikhail Antonov wrote: > >> Can I also get the edit privilege for wiki please? >> >> I'd like to add some details about LDAP authentication.. >> >> Mikha

Re: Last time request for cwiki update privileges

2013-08-21 Thread Ashutosh Chauhan
Hey Mikhail, sure. What's your cwiki ID? Thanks, Ashutosh On Wed, Aug 21, 2013 at 1:58 PM, Mikhail Antonov wrote: > Can I also get the edit privilege for wiki please? > > I'd like to add some details about LDAP authentication.. > > Mikhail > > > 2013/8/21 Stephen Sprague > >> Sanjay gets some lo

Re: Last time request for cwiki update privileges

2013-08-21 Thread Mikhail Antonov
Can I also get edit privileges for the wiki, please? I'd like to add some details about LDAP authentication. Mikhail 2013/8/21 Stephen Sprague > Sanjay gets some love after all! :) > > > On Tue, Aug 20, 2013 at 4:00 PM, Sanjay Subramanian < > sanjay.subraman...@wizecommerce.com> wrote: > >> Th

Re: single output file per partition?

2013-08-21 Thread Igor Tatarinov
Actually, using a temp table doesn't work either. Apparently, a single mapper can read from multiple partitions (and output multiple files). There is no way to force a single mapper per partition. On Wed, Aug 21, 2013 at 11:12 AM, Igor Tatarinov wrote: > Using a single bucket per partition seem

Re: First/last in npath

2013-08-21 Thread Justin Workman
Assuming clickstream-type data, I want to get the search terms from the first search request and return the product ID that was eventually viewed, along with the number of clicks it took to reach the product. So something like this: select search_terms, productid, clicks_to_product from npath ( on clicks
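The pattern used in this thread, SEARCH.NOTPRODUCT*.PRODUCT, and the "first search terms, last product" result can be sketched without npath. A minimal Python sketch, assuming one session's clicks already sorted by timestamp, hypothetical field names page, search_terms, and productid, NOTPRODUCT meaning any page other than PRODUCT, and clicks_to_product meaning the number of clicks after the first search up to and including the product view. It is not Hive's npath implementation: it maps each click to a symbol, runs a regular expression, and then reads path[0] and path[-1], the first/last access the thread found hard to express with constant-only array indexes.

```python
import re


def _symbol(click):
    # Hypothetical encoding: S = SEARCH, P = PRODUCT, N = any other page.
    if click["page"] == "SEARCH":
        return "S"
    if click["page"] == "PRODUCT":
        return "P"
    return "N"


def search_to_product(session_clicks):
    """Return (first_search_terms, productid, clicks_to_product) per match.

    `session_clicks` is one session's clicks, sorted by timestamp.
    NOTPRODUCT is anything except PRODUCT, so it includes further searches.
    """
    symbols = "".join(_symbol(c) for c in session_clicks)
    results = []
    for m in re.finditer(r"S[SN]*P", symbols):
        path = session_clicks[m.start():m.end()]
        first, last = path[0], path[-1]
        results.append((first["search_terms"], last["productid"], len(path) - 1))
    return results


session = [
    {"page": "SEARCH", "search_terms": "red shoes", "productid": None},
    {"page": "SEARCH", "search_terms": "red sneakers", "productid": None},
    {"page": "CATEGORY", "search_terms": None, "productid": None},
    {"page": "PRODUCT", "search_terms": None, "productid": "p42"},
]
print(search_to_product(session))
# [('red shoes', 'p42', 3)]
```

Because the pattern starts at the first SEARCH and later searches fall under NOTPRODUCT, the reported terms are those of the first search, which is the behaviour asked for above.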

Re: single output file per partition?

2013-08-21 Thread Sanjay Subramanian
Hi, I tried File Crusher with LZO but it does not work. I have LZO correctly configured in production, and my jobs run daily using LZO compression. I like Crusher, so I will look into why it's not working. Thanks to Edward, the code is there to tweak :-) and test locally. sanjay From: Stephen S

Re: only one mapper

2013-08-21 Thread Sanjay Subramanian
Hi, try this setting in your Hive query: SET mapreduce.input.fileinputformat.split.maxsize=; If you set this value "low", the MR job will use this size to split the input LZO files and you will get multiple mappers (make sure the input LZO files are indexed, i.e. the .lzo.index files are created).
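The effect of the split settings can be estimated with the arithmetic Hadoop's FileInputFormat uses: splitSize = max(minSize, min(maxSize, blockSize)), and a file is cut into splitSize chunks, with the last chunk allowed to run up to 10% over. A minimal Python sketch of that arithmetic for a single file; it ignores combining input formats (Igor's point that LZO files are combinable) and treats an unindexed .lzo file as unsplittable, so it is an estimate, not Hive's actual planner.

```python
SPLIT_SLOP = 1.1  # the last split may be up to 10% larger than split_size
MB = 1024 * 1024


def split_size(block_size, min_size=1, max_size=2**63 - 1):
    # splitSize = max(minSize, min(maxSize, blockSize))
    return max(min_size, min(max_size, block_size))


def count_splits(file_size, block_size, min_size=1, max_size=2**63 - 1,
                 splittable=True):
    """Estimate the input splits for ONE non-empty file, FileInputFormat-style."""
    if not splittable:
        return 1  # e.g. an .lzo file without a matching .lzo.index
    size = split_size(block_size, min_size, max_size)
    splits, remaining = 0, file_size
    while remaining / size > SPLIT_SLOP:
        splits += 1
        remaining -= size
    if remaining > 0:
        splits += 1
    return splits


print(count_splits(2048 * MB, 128 * MB))                    # 16
print(count_splits(2048 * MB, 128 * MB, max_size=64 * MB))  # 32
print(count_splits(2048 * MB, 128 * MB, splittable=False))  # 1
```

For the numbers in the original question, a 2 GB file with 128 MB blocks gives 16 splits when indexed and 1 when not. Note that 134217728 bytes, the value tried elsewhere in this thread, is exactly 128 MB, the block size, so by this formula it leaves the split size unchanged; a "low" value means one below the block size.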

Re: single output file per partition?

2013-08-21 Thread Stephen Sprague
I see. I'll have to punt then. However, there is an after-the-fact file crusher Ed Capriolo wrote a while back: https://github.com/edwardcapriolo/filecrush YMMV On Wed, Aug 21, 2013 at 11:12 AM, Igor Tatarinov wrote: > Using a single bucket per partition seems to create a single reducer

Re: using hive with multiple schemas

2013-08-21 Thread Sanjay Subramanian
Some ideas to get you started: CREATE EXTERNAL TABLE IF NOT EXISTS names(fullname STRING,address STRING,phone STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' CREATE EXTERNAL TABLE IF NOT EXISTS names_detail(id BIGINT, fullname STRING,address STRING,gender STRING, phone STRING) ROW FORMAT DE

Re: First/last in npath

2013-08-21 Thread Harish Butani
Can you provide details on what you want to do? You may be able to express this by stacking queries: execute npath in a subquery in the FROM clause and then apply windowing in an outer SELECT. Also, you get the 'path' object back from npath, so you can apply array indexing to it. regards, Harish. On

Re: single output file per partition?

2013-08-21 Thread Igor Tatarinov
Using a single bucket per partition seems to create a single reducer, which is too slow. I've tried enforcing a small-files merge, but that didn't work; I still got multiple output files. Creating a temp table and then "combining" the multiple files into one using a simple select * is the only option

Re: only one mapper

2013-08-21 Thread pandees waran
Hi Edward, could you please explain this: "Snappy + SequenceFile is a better option than LZO"? Thanks, Pandeeswaran On Wed, Aug 21, 2013 at

Re: only one mapper

2013-08-21 Thread Edward Capriolo
LZO files are only splittable if you index them. Sequence files compressed with LZO are splittable without being indexed. Snappy + SequenceFile is a better option than LZO. On Wed, Aug 21, 2013 at 1:39 PM, Igor Tatarinov wrote: > LZO files are combinable so check your max split setting. > > ht

Re: only one mapper

2013-08-21 Thread Igor Tatarinov
LZO files are combinable, so check your max split setting. http://mail-archives.apache.org/mod_mbox/hive-user/201107.mbox/%3c4e328964.7000...@gmail.com%3E igor decide.com On Wed, Aug 21, 2013 at 2:17 AM, 闫昆 wrote: > hi all when i use hive > hive job make only one mapper actually my file split

Re: using hive with multiple schemas

2013-08-21 Thread Stephen Sprague
Yeah. Database design is always subjective, so everybody has an opinion about it. But if you're just starting out, I would recommend you kinda follow the rules as you would in a traditional relational database system. So two different datasets would mean two different tables in both Hive and an Rdb d

Re: joining 2 tables

2013-08-21 Thread Stephen Sprague
I'm not sure you'd call that a join. That just looks like two tables side by side in some random order. The only way to get that (that I can see) is if there is some kind of function between the two "ids" in the two tables. That way you could join on A.id1 = function(B.id2); otherwise the onl
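Stephen's suggestion, joining on A.id1 = function(B.id2), can be sketched outside Hive as a hash join on a computed key. A minimal Python sketch; the function id2 - 2 is purely hypothetical, chosen only because it pairs the sample ids in Sandeep's example (1 with 3, 2 with 4). Without a real relationship like this between the ids, any pairing is arbitrary.

```python
def join_on_function(table_a, table_b, key_fn):
    """Hash join: pair each row a with every row b where a[0] == key_fn(b[0])."""
    buckets = {}
    for row_b in table_b:
        buckets.setdefault(key_fn(row_b[0]), []).append(row_b)
    return [row_a + row_b
            for row_a in table_a
            for row_b in buckets.get(row_a[0], [])]


table_a = [(1, "a", "b"), (2, "c", "d")]  # id1, var1, var2
table_b = [(3, "e", "f"), (4, "g", "h")]  # id2, var3, var4

# Hypothetical relationship between the ids: id1 = id2 - 2.
for row in join_on_function(table_a, table_b, lambda id2: id2 - 2):
    print(row)
# (1, 'a', 'b', 3, 'e', 'f')
# (2, 'c', 'd', 4, 'g', 'h')
```

Swapping in the identity function returns no rows for this data, which is the point above: the output only exists if some function links the two ids.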

Re: Last time request for cwiki update privileges

2013-08-21 Thread Stephen Sprague
Sanjay gets some love after all! :) On Tue, Aug 20, 2013 at 4:00 PM, Sanjay Subramanian < sanjay.subraman...@wizecommerce.com> wrote: > Thanks Ashutosh > > From: Ashutosh Chauhan mailto:hashut...@apache.org>> > Reply-To: "user@hive.apache.org" < > user@hive.apache.or

Re: single output file per partition?

2013-08-21 Thread Stephen Sprague
Hi Igor, lots of ideas there! I can't speak for them all, but let me first confirm: "cluster by X into 1 bucket" didn't work? I would have thought that would do it. On Tue, Aug 20, 2013 at 2:29 PM, Igor Tatarinov wrote: > What's the best way to enforce a single output file per par

Re: First/last in npath

2013-08-21 Thread Edward Capriolo
If you cannot find an open JIRA issue for this functionality, then no one is currently working on it. On Wed, Aug 21, 2013 at 1:43 AM, Justin Workman wrote: > When is it expect to support lead/lag/first_value/last_value in the > npath result statement? > > Thanks > > > Sent from my iPhone

using hive with multiple schemas

2013-08-21 Thread Chris Driscol
Hi - I just started to get my feet wet with Hive and have a question that I have not been able to find an answer to. Suppose I have 2 CSV files: >cat Schema1.csv Name, Address, Phone Chris, address1, 999-999- and >cat Schema2.csv Id, Name, Address, Gender, Phone 13, Tom, address2, male, 888-

Re: No java compiler available exception for HWI

2013-08-21 Thread Edward Capriolo
We really should precompile the JSPs. There is a JIRA on this somewhere. On Tuesday, August 20, 2013, Bing Li wrote: > Hi, Eric et al > Did you resolve this failure? > I'm using Hive-0.11.0, and get the same error when access to HWI via browser. > > I already set the following properties in hive-s

joining 2 tables

2013-08-21 Thread Sandeep Nemuri
Hi all, I want to join two tables. I have:
table_A:
id1 var1 var2
1   a    b
2   c    d
table_B:
id2 var3 var4
3   e    f
4   g    h
The expected output is:
id1 var1 var2 id2 var3 var4
1   a    b    3   e    f
2   c    d    4   g    h
Thanks in advance. -- Regards, Sandeep Nemuri

only one mapper

2013-08-21 Thread 闫昆
Hi all, when I use Hive, the job creates only one mapper, although my file is split into 18 blocks. My block size is 128MB and the data size is 2GB. I use LZO compression: I created file.lzo and built the index file.lzo.index. I use Hive 0.10.0. Total MapReduce jobs = 1 Launching Job 1 out of 1 Number of reduce tasks is set to