[jira] [Created] (HIVE-4475) Switch RCFile default to LazyBinaryColumnarSerDe

2013-05-02 Thread Gunther Hagleitner (JIRA)
Gunther Hagleitner created HIVE-4475:


 Summary: Switch RCFile default to LazyBinaryColumnarSerDe
 Key: HIVE-4475
 URL: https://issues.apache.org/jira/browse/HIVE-4475
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner


For most workloads it seems LazyBinaryColumnarSerDe (binary) will perform 
better than ColumnarSerDe (text). I'm not sure why ColumnarSerDe is the default; 
my guess is that it's for historical reasons. I suggest switching the default.
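For context, the serde can already be chosen explicitly per table; a sketch in HiveQL (the table and column names are made up for illustration):

```sql
-- Create an RCFile table that uses the binary columnar serde explicitly,
-- rather than relying on the default (hypothetical example table).
CREATE TABLE clicks_rc (user_id BIGINT, url STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe'
STORED AS RCFILE;
```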

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4440) SMB Operator spills to disk like it's 1999

2013-05-02 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-4440:
-

Attachment: HIVE-4440.2.patch

 SMB Operator spills to disk like it's 1999
 --

 Key: HIVE-4440
 URL: https://issues.apache.org/jira/browse/HIVE-4440
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
 Attachments: HIVE-4440.1.patch, HIVE-4440.2.patch


 I was recently looking into a performance issue with a query that used an SMB 
 join and was running really slow. It turns out that the SMB join by default 
 caches only 100 values per key before spilling to disk. That seems overly 
 conservative to me. Changing the parameter resulted in a ~5x speedup - quite 
 significant.
 The parameter is hive.mapjoin.bucket.cache.size, which right now is only used 
 by the SMB operator as far as I can tell.
 The parameter was introduced originally (3 yrs ago) for the map join operator 
 (looks like pre-SMB) and set to 100 to avoid OOM. That seems to have been in 
 a different context though where you had to avoid running out of memory with 
 the cached hash table in the same process, I think.
 Two things I'd like to propose:
 a) Rename it to what it does: hive.smbjoin.cache.rows
 b) Set it to something less restrictive: 10000
 If you string together a 5 table SMB join with a map join and a map-side 
 group by aggregation you might still run out of memory, but the renamed 
 parameter should be easier to find and reduce. For most queries, I would 
 think that 10000 is still a reasonable number to cache (on the reduce side we 
 use 25000 for shuffle joins).
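At the session level, the proposal above would look something like this (the second parameter name is the proposed rename and does not exist until the patch lands):

```sql
-- Today: raise the cache used by the SMB join operator.
set hive.mapjoin.bucket.cache.size=10000;
-- Under the proposed rename (hypothetical until HIVE-4440 is committed):
set hive.smbjoin.cache.rows=10000;
```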



[jira] [Commented] (HIVE-4440) SMB Operator spills to disk like it's 1999

2013-05-02 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13647395#comment-13647395
 ] 

Gunther Hagleitner commented on HIVE-4440:
--

Thanks :-)

Patch .2 honors the old parameter unless it's at the default in which case it 
uses the new one. I also put documentation around it. 

You bring up a good point, but are you sure it's necessary to support both in 
this case though? It's just slightly ugly in the code and requires us to move 
in again to remove later. My thinking is this: If you use the old parameter, 
it's probably because you needed to up it to get better performance - in this 
case the new default should most likely be ok for you. Do you think there's 
going to be cases where this falls flat?
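The back-compat rule described above ("honor the old parameter unless it's at the default") can be sketched as follows; the class and method names are made up, only the resolution rule comes from the comment:

```java
// Hypothetical sketch of the fallback rule discussed above: use the legacy
// value only if the user changed it from its old default of 100.
public class CacheSizeResolver {
    static final int OLD_DEFAULT = 100;

    static int resolve(int legacyValue, int newValue) {
        // A non-default legacy value means the user tuned it deliberately.
        return legacyValue != OLD_DEFAULT ? legacyValue : newValue;
    }

    public static void main(String[] args) {
        System.out.println(resolve(100, 10000)); // legacy at default -> 10000
        System.out.println(resolve(500, 10000)); // legacy overridden -> 500
    }
}
```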



[jira] [Commented] (HIVE-335) External Tables should have the option to be marked Read Only

2013-05-02 Thread Michael Koehnlein (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13647542#comment-13647542
 ] 

Michael Koehnlein commented on HIVE-335:


This would be useful for me, too. We have data on HDFS that belongs to a system 
user account, and our normal users should be able to analyze it as an external 
table. As it is now, the users would need HDFS write permissions on the data 
directory if they want to create an external table for that directory 
themselves, although they really only need read permissions. Of course that's 
not a big obstacle, since we can just let the system user create the external 
table. It certainly would be nice to get pure read access via external tables, 
though.

 External Tables should have the option to be marked Read Only
 -

 Key: HIVE-335
 URL: https://issues.apache.org/jira/browse/HIVE-335
 Project: Hive
  Issue Type: Improvement
  Components: Metastore, Query Processor
Reporter: Richard Lee

 When creating an External Table, it'd be awesome to have the option of NOT 
 allowing writes to it (disallow any INSERTs, or UPDATEs if hive ever allows 
 them). Adding and Dropping Partitions should still be allowed.
 This will enable hive to play well with external data stores other than 
 hdfs where data should be non-malleable.
 I'd recommend the following syntax, which applies ONLY to external tables:
 CREATE EXTERNAL [READONLY] TABLE ...
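Under the proposed syntax a full statement might read as follows; READONLY is the proposed, not-yet-existing keyword, and the schema and location are illustrative:

```sql
-- Hypothetical: INSERT/UPDATE would be rejected on this table,
-- while ADD/DROP PARTITION would still be allowed.
CREATE EXTERNAL READONLY TABLE raw_logs (line STRING)
PARTITIONED BY (ds STRING)
LOCATION '/data/system-user/raw_logs';
```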



[jira] [Commented] (HIVE-4471) Build fails with hcatalog checkstyle error

2013-05-02 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13647646#comment-13647646
 ] 

Ashutosh Chauhan commented on HIVE-4471:


+1. [~traviscrawford] would you like to take a look?

 Build fails with hcatalog checkstyle error
 --

 Key: HIVE-4471
 URL: https://issues.apache.org/jira/browse/HIVE-4471
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
 Attachments: HIVE-4471.1.patch, HIVE-4471.2.patch


 This is the output:
 checkstyle:
  [echo] hcatalog
 [checkstyle] Running Checkstyle 5.5 on 412 files
 [checkstyle] 
 /home/jenkins/jenkins-slave/workspace/Hive-trunk-h0.21/hive/hcatalog/src/test/.gitignore:1:
  Missing a header - not enough lines in file.
 BUILD FAILED
 /home/jenkins/jenkins-slave/workspace/Hive-trunk-h0.21/hive/build.xml:296: 
 The following error occurred while executing this line:
 /home/jenkins/jenkins-slave/workspace/Hive-trunk-h0.21/hive/build.xml:298: 
 The following error occurred while executing this line:
 /home/jenkins/jenkins-slave/workspace/Hive-trunk-h0.21/hive/hcatalog/build.xml:109:
  The following error occurred while executing this line:
 /home/jenkins/jenkins-slave/workspace/Hive-trunk-h0.21/hive/hcatalog/build-support/ant/checkstyle.xml:32:
  Got 1 errors and 0 warnings.



[jira] [Commented] (HIVE-4421) Improve memory usage by ORC dictionaries

2013-05-02 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13647660#comment-13647660
 ] 

Phabricator commented on HIVE-4421:
---

ashutoshc has accepted the revision HIVE-4421 [jira] Improve memory usage by 
ORC dictionaries.

  +1 will commit if tests pass.

REVISION DETAIL
  https://reviews.facebook.net/D10545

BRANCH
  h-4421

ARCANIST PROJECT
  hive

To: JIRA, ashutoshc, omalley


 Improve memory usage by ORC dictionaries
 

 Key: HIVE-4421
 URL: https://issues.apache.org/jira/browse/HIVE-4421
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Fix For: 0.11.0

 Attachments: HIVE-4421.D10545.1.patch, HIVE-4421.D10545.2.patch, 
 HIVE-4421.D10545.3.patch, HIVE-4421.D10545.4.patch


 Currently, for tables with many string columns, it is possible to 
 significantly underestimate the memory used by the ORC dictionaries and cause 
 the query to run out of memory in the task. 



[jira] [Updated] (HIVE-4455) HCatalog build directories get included in tar file produced by ant tar

2013-05-02 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-4455:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed trunk version as well. Thanks, Alan!

 HCatalog build directories get included in tar file produced by ant tar
 -

 Key: HIVE-4455
 URL: https://issues.apache.org/jira/browse/HIVE-4455
 Project: Hive
  Issue Type: Bug
  Components: Build Infrastructure, HCatalog
Affects Versions: 0.11.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Blocker
 Fix For: 0.11.0

 Attachments: buildbloat.patch, HIVE-4455.patch, HIVE-4455-trunk.patch


 The excludes in the tar target aren't properly excluding the build 
 directories in HCatalog



[jira] [Updated] (HIVE-4461) hcatalog jars not getting published to maven repo

2013-05-02 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-4461:
---

   Resolution: Fixed
Fix Version/s: 0.11.0
   Status: Resolved  (was: Patch Available)

Marking this as resolved, as per Alan's comments.

 hcatalog jars not getting published to maven repo
 -

 Key: HIVE-4461
 URL: https://issues.apache.org/jira/browse/HIVE-4461
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Affects Versions: 0.11.0
Reporter: Ashutosh Chauhan
Assignee: Alan Gates
 Fix For: 0.11.0

 Attachments: HIVE-4461.patch






[jira] [Commented] (HIVE-4392) Illogical InvalidObjectException thrown when using multiple aggregate functions with star columns

2013-05-02 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13647690#comment-13647690
 ] 

Ashutosh Chauhan commented on HIVE-4392:


OK. Let's go ahead with this patch then. [~navis] Do you want to update the 
patch with these tests, or shall I go ahead with testing it for commit?

 Illogical InvalidObjectException thrown when using multiple aggregate 
 functions with star columns 
 --

 Key: HIVE-4392
 URL: https://issues.apache.org/jira/browse/HIVE-4392
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
 Environment: Apache Hadoop 0.20.1
 Apache Hive Trunk
Reporter: caofangkun
Assignee: Navis
Priority: Minor
 Attachments: HIVE-4392.D10431.1.patch, HIVE-4392.D10431.2.patch, 
 HIVE-4392.D10431.3.patch, HIVE-4392.D10431.4.patch


 For Example:
 hive (default)> create table liza_1 as 
select *, sum(key), sum(value) 
from new_src;
 Total MapReduce jobs = 1
 Launching Job 1 out of 1
 Number of reduce tasks determined at compile time: 1
 In order to change the average load for a reducer (in bytes):
   set hive.exec.reducers.bytes.per.reducer=number
 In order to limit the maximum number of reducers:
   set hive.exec.reducers.max=number
 In order to set a constant number of reducers:
   set mapred.reduce.tasks=number
 Starting Job = job_201304191025_0003, Tracking URL = 
 http://hd17-vm5:51030/jobdetails.jsp?jobid=job_201304191025_0003
 Kill Command = /home/zongren/hadoop-current/bin/../bin/hadoop job  -kill 
 job_201304191025_0003
 Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 
 1
 2013-04-22 11:09:28,017 Stage-1 map = 0%,  reduce = 0%
 2013-04-22 11:09:34,054 Stage-1 map = 0%,  reduce = 100%
 2013-04-22 11:09:37,074 Stage-1 map = 100%,  reduce = 100%
 Ended Job = job_201304191025_0003
 Moving data to: hdfs://hd17-vm5:9101/user/zongren/hive/liza_1
 FAILED: Error in metadata: InvalidObjectException(message:liza_1 is not a 
 valid object name)
 FAILED: Execution Error, return code 1 from 
 org.apache.hadoop.hive.ql.exec.DDLTask
 MapReduce Jobs Launched: 
 Job 0: Reduce: 1   HDFS Read: 0 HDFS Write: 12 SUCCESS
 Total MapReduce CPU Time Spent: 0 msec
 hive (default)> create table liza_1 as 
select *, sum(key), sum(value) 
from new_src   
group by key, value;
 Total MapReduce jobs = 1
 Launching Job 1 out of 1
 Number of reduce tasks not specified. Estimated from input data size: 1
 In order to change the average load for a reducer (in bytes):
   set hive.exec.reducers.bytes.per.reducer=number
 In order to limit the maximum number of reducers:
   set hive.exec.reducers.max=number
 In order to set a constant number of reducers:
   set mapred.reduce.tasks=number
 Starting Job = job_201304191025_0004, Tracking URL = 
 http://hd17-vm5:51030/jobdetails.jsp?jobid=job_201304191025_0004
 Kill Command = /home/zongren/hadoop-current/bin/../bin/hadoop job  -kill 
 job_201304191025_0004
 Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 
 1
 2013-04-22 11:11:58,945 Stage-1 map = 0%,  reduce = 0%
 2013-04-22 11:12:01,964 Stage-1 map = 0%,  reduce = 100%
 2013-04-22 11:12:04,982 Stage-1 map = 100%,  reduce = 100%
 Ended Job = job_201304191025_0004
 Moving data to: hdfs://hd17-vm5:9101/user/zongren/hive/liza_1
 FAILED: Error in metadata: InvalidObjectException(message:liza_1 is not a 
 valid object name)
 FAILED: Execution Error, return code 1 from 
 org.apache.hadoop.hive.ql.exec.DDLTask
 MapReduce Jobs Launched: 
 Job 0: Reduce: 1   HDFS Read: 0 HDFS Write: 0 SUCCESS
 Total MapReduce CPU Time Spent: 0 msec
 But the following two queries work:
 hive (default)> create table liza_1 as select * from new_src;
 Total MapReduce jobs = 3
 Launching Job 1 out of 3
 Number of reduce tasks is set to 0 since there's no reduce operator
 Starting Job = job_201304191025_0006, Tracking URL = 
 http://hd17-vm5:51030/jobdetails.jsp?jobid=job_201304191025_0006
 Kill Command = /home/zongren/hadoop-current/bin/../bin/hadoop job  -kill 
 job_201304191025_0006
 Hadoop job information for Stage-1: number of mappers: 0; number of reducers:  0
 2013-04-22 11:15:00,681 Stage-1 map = 0%,  reduce = 0%
 2013-04-22 11:15:03,697 Stage-1 map = 100%,  reduce = 100%
 Ended Job = job_201304191025_0006
 Stage-4 is selected by condition resolver.
 Stage-3 is filtered out by condition resolver.
 Stage-5 is filtered out by condition resolver.
 Moving data to: 
 hdfs://hd17-vm5:9101/user/zongren/hive-scratchdir/hive_2013-04-22_11-14-54_632_6709035018023861094/-ext-10001
 Moving data to: hdfs://hd17-vm5:9101/user/zongren/hive/liza_1
 Table default.liza_1 stats: 

[jira] [Resolved] (HIVE-4182) doAS does not work with HiveServer2 in non-kerberos mode with local job

2013-05-02 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan resolved HIVE-4182.


   Resolution: Fixed
Fix Version/s: 0.11.0

Fixed via HIVE-4315

 doAS does not work with HiveServer2 in non-kerberos mode with local job
 ---

 Key: HIVE-4182
 URL: https://issues.apache.org/jira/browse/HIVE-4182
 Project: Hive
  Issue Type: Bug
Reporter: Thejas M Nair
  Labels: HiveServer2
 Fix For: 0.11.0

 Attachments: HIVE-4182.1.patch


 When HiveServer2 is configured without kerberos security enabled, and the 
 query gets launched as a local map-reduce job, the job runs as the user the 
 hive server is running as, instead of the user who submitted the query.



[jira] [Created] (HIVE-4476) HiveMetaStore caches the creation of a default db in a static way

2013-05-02 Thread Brock Noland (JIRA)
Brock Noland created HIVE-4476:
--

 Summary: HiveMetaStore caches the creation of a default db in a 
static way
 Key: HIVE-4476
 URL: https://issues.apache.org/jira/browse/HIVE-4476
 Project: Hive
  Issue Type: Improvement
Affects Versions: 0.10.0, 0.11.0
Reporter: Brock Noland
Priority: Minor


Currently HiveMetaStore.HMSHandler has a static flag set to true if the JVM has 
ever created a default db:

https://github.com/apache/hive/blob/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L176

However, when testing it's nice to be able to create multiple HiveMetastore 
instances in a single JVM. Perhaps we should add a flag 
hive.metastore.always.create.default.db or something similar.
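The multi-instance problem described above can be sketched like this; the class and field names below are made up for illustration, only the static-flag pattern comes from the report:

```java
// Hypothetical illustration of why a static "created" flag breaks tests that
// build several metastore handlers in one JVM: the second instance silently
// skips creating its own default db because the flag is shared JVM-wide.
public class StaticFlagDemo {
    static class HmsHandler {
        private static boolean defaultDbCreated = false; // shared across instances
        int creations = 0;

        HmsHandler() {
            if (!defaultDbCreated) {
                creations++;               // simulate creating the default db
                defaultDbCreated = true;   // sticks for the rest of the JVM
            }
        }
    }

    public static void main(String[] args) {
        HmsHandler first = new HmsHandler();
        HmsHandler second = new HmsHandler(); // never creates a default db
        System.out.println(first.creations + "," + second.creations);
    }
}
```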



[jira] [Commented] (HIVE-4476) HiveMetaStore caches the creation of a default db in a static way

2013-05-02 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13647744#comment-13647744
 ] 

Brock Noland commented on HIVE-4476:


Perhaps the use of checkForDefaultDb in that class just needs to be modified.



[jira] [Commented] (HIVE-4474) Column access not tracked properly for partitioned tables

2013-05-02 Thread Gang Tim Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13647763#comment-13647763
 ] 

Gang Tim Liu commented on HIVE-4474:


running test.

 Column access not tracked properly for partitioned tables
 -

 Key: HIVE-4474
 URL: https://issues.apache.org/jira/browse/HIVE-4474
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Samuel Yuan
Assignee: Samuel Yuan
 Attachments: HIVE-4474.1.patch.txt


 The set of columns recorded as being accessed is incorrect for partitioned 
 tables. The index of an accessed column is a position in the list of 
 non-partition columns, but a list of all columns is being used right now to 
 do the lookup.
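One way such a mismatch between the two lists can surface, sketched with made-up data (the actual column orderings Hive uses may differ; this just illustrates an index valid in one list resolving to the wrong name in another):

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration: an index that is a position in the
// non-partition column list names the wrong column when used against a
// list that also contains partition columns (assumed here to come first).
public class ColumnIndexDemo {
    public static void main(String[] args) {
        List<String> dataCols = Arrays.asList("id", "name");       // non-partition columns
        List<String> allCols  = Arrays.asList("ds", "id", "name"); // partition col "ds" first (assumption)

        int accessedIndex = 0; // recorded against dataCols; meant to be "id"
        System.out.println(dataCols.get(accessedIndex)); // correct lookup: id
        System.out.println(allCols.get(accessedIndex));  // wrong lookup: ds
    }
}
```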



[jira] [Created] (HIVE-4477) remove redundant copy of arithmetic filter unit test testColOpScalarNumericFilterNullAndRepeatingLogic

2013-05-02 Thread Eric Hanson (JIRA)
Eric Hanson created HIVE-4477:
-

 Summary: remove redundant copy of arithmetic filter unit test 
testColOpScalarNumericFilterNullAndRepeatingLogic
 Key: HIVE-4477
 URL: https://issues.apache.org/jira/browse/HIVE-4477
 Project: Hive
  Issue Type: Sub-task
Reporter: Eric Hanson
Assignee: Eric Hanson


The same test got ported to two different files.



[jira] [Updated] (HIVE-4448) Fix metastore warehouse incorrect location on Windows in unit tests

2013-05-02 Thread Shuaishuai Nie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shuaishuai Nie updated HIVE-4448:
-

Summary: Fix metastore warehouse incorrect location on Windows in unit 
tests  (was: Fix metastore warehouse incorrect path on Windows in unit tests)

 Fix metastore warehouse incorrect location on Windows in unit tests
 ---

 Key: HIVE-4448
 URL: https://issues.apache.org/jira/browse/HIVE-4448
 Project: Hive
  Issue Type: Bug
  Components: Testing Infrastructure
Affects Versions: 0.11.0
 Environment: Windows
Reporter: Shuaishuai Nie
Assignee: Shuaishuai Nie
 Attachments: HIVE-4448.1.patch


 Unit test cases that do not use QTestUtil pass an incompatible Windows path 
 for METASTOREWAREHOUSE to HiveConf, which results in creating the 
 /test/data/warehouse folder in the wrong location on Windows. This folder 
 is not deleted at the beginning of the unit test, and its contents will 
 cause unit tests to fail if the same test case is run repeatedly. The root 
 cause of this problem is that for a path like 
 pfile://C:\hive\build\ql/test/data/warehouse, the C:\hive\build\ part 
 will be parsed as the authority of the path and removed from the path 
 string. The patch fixes this problem and makes the unit test results 
 consistent between Windows and Linux.



[jira] [Updated] (HIVE-4477) remove redundant copy of arithmetic filter unit test testColOpScalarNumericFilterNullAndRepeatingLogic

2013-05-02 Thread Eric Hanson (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Hanson updated HIVE-4477:
--

Attachment: HIVE-4477.1.patch



[jira] [Updated] (HIVE-3959) Update Partition Statistics in Metastore Layer

2013-05-02 Thread Gang Tim Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu updated HIVE-3959:
---

Attachment: HIVE-3959.patch.11.txt

 Update Partition Statistics in Metastore Layer
 --

 Key: HIVE-3959
 URL: https://issues.apache.org/jira/browse/HIVE-3959
 Project: Hive
  Issue Type: Improvement
  Components: Metastore, Statistics
Reporter: Bhushan Mandhani
Assignee: Gang Tim Liu
Priority: Minor
 Attachments: HIVE-3959.patch.1, HIVE-3959.patch.11.txt, 
 HIVE-3959.patch.2, HIVE-3959.patch.9.txt


 When partitions are created using queries (insert overwrite and insert 
 into), the StatsTask updates all stats. However, when partitions are added 
 directly through metadata-only operations (either the CLI or direct calls 
 to the Thrift Metastore), no stats are populated even if 
 hive.stats.reliable is set to true. This puts us in a situation where we 
 can't decide if stats are truly reliable or not.
 We propose that the fast stats (numFiles and totalSize), which don't 
 require a scan of the data, should always be populated and be completely 
 reliable. For now we are still excluding rowCount and rawDataSize because 
 that would make these operations very expensive; currently they are quick 
 metadata-only ops.



Review Request: remove redundant copy of arithmetic filter unit test testColOpScalarNumericFilterNullAndRepeatingLogic

2013-05-02 Thread Eric Hanson

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10906/
---

Review request for hive.


Description
---

remove redundant copy of arithmetic filter unit test 
testColOpScalarNumericFilterNullAndRepeatingLogic


This addresses bug HIVE-4477.
https://issues.apache.org/jira/browse/HIVE-4477


Diffs
-

  
ql/src/test/org/apache/hadoop/hive/ql/exec/vector/TestVectorFilterOperator.java 
3ad6c7f 

Diff: https://reviews.apache.org/r/10906/diff/


Testing
---


Thanks,

Eric Hanson



[jira] [Updated] (HIVE-4477) remove redundant copy of arithmetic filter unit test testColOpScalarNumericFilterNullAndRepeatingLogic

2013-05-02 Thread Eric Hanson (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Hanson updated HIVE-4477:
--

Status: Patch Available  (was: Open)



[jira] [Commented] (HIVE-4477) remove redundant copy of arithmetic filter unit test testColOpScalarNumericFilterNullAndRepeatingLogic

2013-05-02 Thread Eric Hanson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13647840#comment-13647840
 ] 

Eric Hanson commented on HIVE-4477:
---

Code review available at https://reviews.apache.org/r/10906/



[jira] [Created] (HIVE-4478) In ORC, add boolean noNulls flag to column stripe metadata

2013-05-02 Thread Eric Hanson (JIRA)
Eric Hanson created HIVE-4478:
-

 Summary: In ORC, add boolean noNulls flag to column stripe metadata
 Key: HIVE-4478
 URL: https://issues.apache.org/jira/browse/HIVE-4478
 Project: Hive
  Issue Type: Sub-task
Reporter: Eric Hanson
Assignee: Owen O'Malley


Currently, the stripe metadata for ORC contains the min and max value for each 
column in the stripe. This will be used for stripe elimination. However, an 
additional bit of metadata, noNulls (true/false), is needed to help speed up 
vectorized query execution by as much as 30%. 

The vectorized QE code has a Boolean flag for each column vector called 
noNulls. If this is true, all the null-checking logic is skipped. For simple 
filters and arithmetic expressions, this can save on the order of 30% of the 
time.

Once this noNulls stripe metadata is available, the vectorized iterator for ORC 
can be updated to avoid the expense of loading the isNull bitmap, and to 
efficiently set the noNulls flag for each column vector.
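The fast path that the noNulls flag enables can be sketched like this; the method and class names are made up, only the skip-the-null-checks idea comes from the description:

```java
// Hypothetical sketch of the noNulls fast path: when a column vector is
// known to contain no nulls, the per-row null check disappears entirely.
public class NoNullsDemo {
    static long sum(long[] vector, boolean[] isNull, boolean noNulls) {
        long s = 0;
        if (noNulls) {
            for (long v : vector) s += v;        // tight loop, no null checks
        } else {
            for (int i = 0; i < vector.length; i++) {
                if (!isNull[i]) s += vector[i];  // per-row null check
            }
        }
        return s;
    }

    public static void main(String[] args) {
        long[] values = {1, 2, 3};
        System.out.println(sum(values, new boolean[3], true));
    }
}
```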



[jira] [Updated] (HIVE-4478) In ORC, add boolean noNulls flag to column stripe metadata

2013-05-02 Thread Eric Hanson (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Hanson updated HIVE-4478:
--

Description: 
Currently, the stripe metadata for ORC contains the min and max value for each 
column in the stripe. This will be used for stripe elimination. However, an 
additional bit of metadata for each column in each stripe, noNulls 
(true/false), is needed to help speed up vectorized query execution by as much 
as 30%. 

The vectorized QE code has a Boolean flag for each column vector called 
noNulls. If this is true, all the null-checking logic is skipped for that 
column for a VectorizedRowBatch when an operation is performed on that column. 
For simple filters and arithmetic expressions, this can save on the order of 
30% of the time.

Once this noNulls stripe metadata is available, the vectorized iterator 
(reader) for ORC can be updated to avoid the expense of loading the isNull 
bitmap, and to efficiently set the noNulls flag for each column vector.

  was:
Currently, the stripe metadata for ORC contains the min and max value for each 
column in the stripe. This will be used for stripe elimination. However, an 
additional bit of metadata, noNulls (true/false), is needed to help speed up 
vectorized query execution as much as 30%. 

The vectorized QE code has a Boolean flag for each column vector called 
noNulls. If this is true, all the null-checking logic is skipped. For simple 
filters and arithmetic expressions, this can save on the order of 30% of the 
time.

Once this noNulls stripe metadata is available, the vectorized iterator for ORC 
can be updated to avoid the expense of loading the isNull bitmap and to 
efficiently set the noNulls flag for each column vector.


 In ORC, add boolean noNulls flag to column stripe metadata
 --

 Key: HIVE-4478
 URL: https://issues.apache.org/jira/browse/HIVE-4478
 Project: Hive
  Issue Type: Sub-task
Reporter: Eric Hanson
Assignee: Owen O'Malley

 Currently, the stripe metadata for ORC contains the min and max values for 
 each column in the stripe. This will be used for stripe elimination. However, 
 an additional bit of metadata per column per stripe, noNulls (true/false), 
 is needed to help speed up vectorized query execution by as much as 30%. 
 The vectorized QE code has a Boolean flag for each column vector called 
 noNulls. If this is true, all the null-checking logic is skipped for that 
 column for a VectorizedRowBatch when an operation is performed on that 
 column. For simple filters and arithmetic expressions, this can save on the 
 order of 30% of the time.
 Once this noNulls stripe metadata is available, the vectorized iterator 
 (reader) for ORC can be updated to avoid the expense of loading the isNull 
 bitmap and to efficiently set the noNulls flag for each column vector.
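The speedup mechanism described above can be sketched in a few lines of Java. This is an illustrative sketch only: the class and method names below are hypothetical, not Hive's actual ColumnVector API. When noNulls is true, the per-row isNull test vanishes from the inner loop, which is where the roughly 30% saving comes from.

```java
// Illustrative sketch only: names are hypothetical, not Hive's ColumnVector API.
public class NoNullsSketch {

    // Sum a long column vector; the noNulls flag selects a branch-free fast path.
    public static long sumColumn(long[] vector, boolean[] isNull,
                                 boolean noNulls, int size) {
        long sum = 0;
        if (noNulls) {
            // Fast path: no per-row null check at all.
            for (int i = 0; i < size; i++) {
                sum += vector[i];
            }
        } else {
            // Slow path: consult the isNull bitmap for every row.
            for (int i = 0; i < size; i++) {
                if (!isNull[i]) {
                    sum += vector[i];
                }
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        long[] v = {1, 2, 3, 4};
        boolean[] nulls = {false, true, false, false};
        System.out.println(sumColumn(v, nulls, true, 4));  // all rows summed
        System.out.println(sumColumn(v, nulls, false, 4)); // row 1 skipped as null
    }
}
```

With the proposed stripe-level noNulls metadata, the reader could set the flag once per batch instead of scanning the bitmap row by row.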



[jira] [Commented] (HIVE-4376) Document ORC file format in Hive wiki

2013-05-02 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13647934#comment-13647934
 ] 

Lefty Leverenz commented on HIVE-4376:
--

Done.  You can find the ORC wikidoc here:  
[https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ORC].

It's in the [Language 
Manual|https://cwiki.apache.org/confluence/display/Hive/LanguageManual] under a 
stub for File Formats.  Information about other file formats would also be 
helpful.

 Document ORC file format in Hive wiki
 -

 Key: HIVE-4376
 URL: https://issues.apache.org/jira/browse/HIVE-4376
 Project: Hive
  Issue Type: Bug
  Components: Documentation, Serializers/Deserializers
Affects Versions: 0.11.0
Reporter: Lefty Leverenz
Assignee: Lefty Leverenz
  Labels: wiki

 Add a wiki documenting the Optimized Row Columnar file format for Hive 
 release 0.11 ([HIVE-3874|https://issues.apache.org/jira/browse/HIVE-3874]).



Need to track docs for future releases

2013-05-02 Thread Lefty Leverenz
Now that all the Hive docs are in the wiki, we can't commit new
documentation to trunk or branch.  But we don't want to add docs to the
wiki prematurely, so there's an increased likelihood that we'll lose track
of some doc requirements for future releases.  Does anyone know of a good
way to ensure that no doc gets left behind?

One possibility is to use labels on JIRAs that need future documentation.
 When HIVE-# gets committed with "Fix in 0.12" and still needs docs, it
would get a label such as "doc-needed-v0.12" which can be used to find all
the doc requirements at release time.

That might be the simplest solution, although I see two problems:  if the
fix number gets changed, the label has to change too; and sometimes people
enter a label that seems right to them but doesn't match exactly.

Another possibility is to use JIRAs, either adding a child JIRA for each
closed JIRA that still needs doc or using an umbrella JIRA for each
upcoming release.

An ideal solution would automatically spew out a list of JIRAs that need
docs for a given release number, either on request or when the release
happens.  Is that technically possible?

– Lefty


[jira] [Commented] (HIVE-4466) Fix continue.on.failure in unit tests to -well- continue on failure in unit tests

2013-05-02 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13647991#comment-13647991
 ] 

Ashutosh Chauhan commented on HIVE-4466:


+1 will commit if tests pass.

 Fix continue.on.failure in unit tests to -well- continue on failure in unit 
 tests
 -

 Key: HIVE-4466
 URL: https://issues.apache.org/jira/browse/HIVE-4466
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
 Attachments: HIVE-4466.1.patch


 continue.on.failure is no longer hooked up to anything in the build scripts. 
 more importantly, the only choice right now is to continue through a module 
 and then fail.



[jira] [Created] (HIVE-4479) Child expressions are not being evaluated hierarchically in a few templates.

2013-05-02 Thread Jitendra Nath Pandey (JIRA)
Jitendra Nath Pandey created HIVE-4479:
--

 Summary: Child expressions are not being evaluated hierarchically 
in a few templates.
 Key: HIVE-4479
 URL: https://issues.apache.org/jira/browse/HIVE-4479
 Project: Hive
  Issue Type: Sub-task
Reporter: Jitendra Nath Pandey
Assignee: Jitendra Nath Pandey


FilterColumnCompareColumn.txt, FilterStringColumnCompareScalar.txt and 
ScalarArithmeticColumn.txt are not evaluating the child expressions.
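For context, evaluating "hierarchically" means each generated expression must first evaluate its children so their output columns are materialized before the parent reads them. A minimal sketch of that pattern, using hypothetical class names rather than Hive's actual generated template classes:

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch: each expression evaluates its children before itself,
// which is the step the templates named above were skipping.
abstract class VectorExpression {
    final List<VectorExpression> children;
    VectorExpression(VectorExpression... kids) { this.children = Arrays.asList(kids); }

    final void evaluate(long[][] columns) {
        for (VectorExpression child : children) {
            child.evaluate(columns);   // recurse first: hierarchical evaluation
        }
        evaluateSelf(columns);
    }
    abstract void evaluateSelf(long[][] columns);
}

// Writes columns[out][i] = columns[in][i] + 1 for every row.
class AddOne extends VectorExpression {
    final int in, out;
    AddOne(int in, int out, VectorExpression... kids) {
        super(kids); this.in = in; this.out = out;
    }
    void evaluateSelf(long[][] c) {
        for (int i = 0; i < c[in].length; i++) {
            c[out][i] = c[in][i] + 1;
        }
    }
}

public class HierarchicalEvalSketch {
    public static void main(String[] args) {
        long[][] cols = { {1, 2, 3}, new long[3], new long[3] };
        // The parent reads column 1, which is produced by its child from column 0;
        // without the child recursion, column 1 would still be all zeros.
        VectorExpression expr = new AddOne(1, 2, new AddOne(0, 1));
        expr.evaluate(cols);
        System.out.println(Arrays.toString(cols[2]));
    }
}
```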



[jira] [Updated] (HIVE-4479) Child expressions are not being evaluated hierarchically in a few templates.

2013-05-02 Thread Jitendra Nath Pandey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jitendra Nath Pandey updated HIVE-4479:
---

Attachment: HIVE-4479.1.patch

 Child expressions are not being evaluated hierarchically in a few templates.
 

 Key: HIVE-4479
 URL: https://issues.apache.org/jira/browse/HIVE-4479
 Project: Hive
  Issue Type: Sub-task
Reporter: Jitendra Nath Pandey
Assignee: Jitendra Nath Pandey
 Attachments: HIVE-4479.1.patch


 FilterColumnCompareColumn.txt, FilterStringColumnCompareScalar.txt and 
 ScalarArithmeticColumn.txt are not evaluating the child expressions.



[jira] [Created] (HIVE-4480) Implement partition support for vectorized query execution

2013-05-02 Thread Sarvesh Sakalanaga (JIRA)
Sarvesh Sakalanaga created HIVE-4480:


 Summary: Implement partition support for vectorized query execution
 Key: HIVE-4480
 URL: https://issues.apache.org/jira/browse/HIVE-4480
 Project: Hive
  Issue Type: Sub-task
Reporter: Sarvesh Sakalanaga
Assignee: Sarvesh Sakalanaga






[jira] [Updated] (HIVE-4480) Implement partition support for vectorized query execution

2013-05-02 Thread Sarvesh Sakalanaga (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sarvesh Sakalanaga updated HIVE-4480:
-

Description: Add support for eager deserialization of row data using serde 
in the RecordReader layer. Also add support for partitions in this layer so 
that the vectorized batch is populated correctly.

 Implement partition support for vectorized query execution
 --

 Key: HIVE-4480
 URL: https://issues.apache.org/jira/browse/HIVE-4480
 Project: Hive
  Issue Type: Sub-task
Reporter: Sarvesh Sakalanaga
Assignee: Sarvesh Sakalanaga

 Add support for eager deserialization of row data using serde in the 
 RecordReader layer. Also add support for partitions in this layer so that the 
 vectorized batch is populated correctly.



[jira] [Resolved] (HIVE-4454) Support partitioned tables in vectorized query execution.

2013-05-02 Thread Jitendra Nath Pandey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jitendra Nath Pandey resolved HIVE-4454.


Resolution: Duplicate

Duplicate of HIVE-4480.

 Support partitioned tables in vectorized query execution.
 -

 Key: HIVE-4454
 URL: https://issues.apache.org/jira/browse/HIVE-4454
 Project: Hive
  Issue Type: Sub-task
Reporter: Jitendra Nath Pandey
Assignee: Jitendra Nath Pandey

 Partitioned tables are very common use case. Vectorized code path should 
 support that.



[jira] [Created] (HIVE-4481) Vectorized row batch should be initialized with additional columns to hold intermediate output.

2013-05-02 Thread Jitendra Nath Pandey (JIRA)
Jitendra Nath Pandey created HIVE-4481:
--

 Summary: Vectorized row batch should be initialized with 
additional columns to hold intermediate output.
 Key: HIVE-4481
 URL: https://issues.apache.org/jira/browse/HIVE-4481
 Project: Hive
  Issue Type: Sub-task
Reporter: Jitendra Nath Pandey
Assignee: Jitendra Nath Pandey


Vectorized row batch should be initialized with additional columns to hold 
intermediate output.



[jira] [Updated] (HIVE-4481) Vectorized row batch should be initialized with additional columns to hold intermediate output.

2013-05-02 Thread Jitendra Nath Pandey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jitendra Nath Pandey updated HIVE-4481:
---

Attachment: HIVE-4481.1.patch

 Vectorized row batch should be initialized with additional columns to hold 
 intermediate output.
 ---

 Key: HIVE-4481
 URL: https://issues.apache.org/jira/browse/HIVE-4481
 Project: Hive
  Issue Type: Sub-task
Reporter: Jitendra Nath Pandey
Assignee: Jitendra Nath Pandey
 Attachments: HIVE-4481.1.patch


 Vectorized row batch should be initialized with additional columns to hold 
 intermediate output.



[jira] [Commented] (HIVE-4479) Child expressions are not being evaluated hierarchically in a few templates.

2013-05-02 Thread Jitendra Nath Pandey (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13648034#comment-13648034
 ] 

Jitendra Nath Pandey commented on HIVE-4479:


Review board entry: https://reviews.apache.org/r/10908/

 Child expressions are not being evaluated hierarchically in a few templates.
 

 Key: HIVE-4479
 URL: https://issues.apache.org/jira/browse/HIVE-4479
 Project: Hive
  Issue Type: Sub-task
Reporter: Jitendra Nath Pandey
Assignee: Jitendra Nath Pandey
 Attachments: HIVE-4479.1.patch


 FilterColumnCompareColumn.txt, FilterStringColumnCompareScalar.txt and 
 ScalarArithmeticColumn.txt are not evaluating the child expressions.



[jira] [Created] (HIVE-4482) Template file VectorUDAFAvg.txt missing from public branch; CodeGen.java fails

2013-05-02 Thread Eric Hanson (JIRA)
Eric Hanson created HIVE-4482:
-

 Summary: Template file VectorUDAFAvg.txt missing from public 
branch; CodeGen.java fails
 Key: HIVE-4482
 URL: https://issues.apache.org/jira/browse/HIVE-4482
 Project: Hive
  Issue Type: Sub-task
Reporter: Eric Hanson
Assignee: Remus Rusanu


In vectorization branch, file
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/VectorUDAFAvg.txt
is missing. So CodeGen.java doesn't run to completion, because it references 
that file.



[jira] [Updated] (HIVE-4392) Illogical InvalidObjectException throwed when use mulit aggregate functions with star columns

2013-05-02 Thread Phabricator (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-4392:
--

Attachment: HIVE-4392.D10431.5.patch

navis updated the revision HIVE-4392 [jira] Illogical InvalidObjectException 
throwed when use mulit aggregate functions with star columns.

  Added tests

Reviewers: ashutoshc, JIRA

REVISION DETAIL
  https://reviews.facebook.net/D10431

CHANGE SINCE LAST DIFF
  https://reviews.facebook.net/D10431?vs=33177&id=33285#toc

AFFECTED FILES
  metastore/src/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java
  metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
  metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java
  ql/src/java/org/apache/hadoop/hive/ql/parse/PTFTranslator.java
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
  ql/src/test/queries/clientpositive/ctas_colname.q
  ql/src/test/results/clientpositive/ctas_colname.q.out

To: JIRA, ashutoshc, navis
Cc: hbutani


 Illogical InvalidObjectException throwed when use mulit aggregate functions 
 with star columns 
 --

 Key: HIVE-4392
 URL: https://issues.apache.org/jira/browse/HIVE-4392
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
 Environment: Apache Hadoop 0.20.1
 Apache Hive Trunk
Reporter: caofangkun
Assignee: Navis
Priority: Minor
 Attachments: HIVE-4392.D10431.1.patch, HIVE-4392.D10431.2.patch, 
 HIVE-4392.D10431.3.patch, HIVE-4392.D10431.4.patch, HIVE-4392.D10431.5.patch


 For Example:
 hive (default)> create table liza_1 as 
select *, sum(key), sum(value) 
from new_src;
 Total MapReduce jobs = 1
 Launching Job 1 out of 1
 Number of reduce tasks determined at compile time: 1
 In order to change the average load for a reducer (in bytes):
   set hive.exec.reducers.bytes.per.reducer=number
 In order to limit the maximum number of reducers:
   set hive.exec.reducers.max=number
 In order to set a constant number of reducers:
   set mapred.reduce.tasks=number
 Starting Job = job_201304191025_0003, Tracking URL = 
 http://hd17-vm5:51030/jobdetails.jsp?jobid=job_201304191025_0003
 Kill Command = /home/zongren/hadoop-current/bin/../bin/hadoop job  -kill 
 job_201304191025_0003
 Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 
 1
 2013-04-22 11:09:28,017 Stage-1 map = 0%,  reduce = 0%
 2013-04-22 11:09:34,054 Stage-1 map = 0%,  reduce = 100%
 2013-04-22 11:09:37,074 Stage-1 map = 100%,  reduce = 100%
 Ended Job = job_201304191025_0003
 Moving data to: hdfs://hd17-vm5:9101/user/zongren/hive/liza_1
 FAILED: Error in metadata: InvalidObjectException(message:liza_1 is not a 
 valid object name)
 FAILED: Execution Error, return code 1 from 
 org.apache.hadoop.hive.ql.exec.DDLTask
 MapReduce Jobs Launched: 
 Job 0: Reduce: 1   HDFS Read: 0 HDFS Write: 12 SUCCESS
 Total MapReduce CPU Time Spent: 0 msec
 hive (default)> create table liza_1 as 
select *, sum(key), sum(value) 
from new_src   
group by key, value;
 Total MapReduce jobs = 1
 Launching Job 1 out of 1
 Number of reduce tasks not specified. Estimated from input data size: 1
 In order to change the average load for a reducer (in bytes):
   set hive.exec.reducers.bytes.per.reducer=number
 In order to limit the maximum number of reducers:
   set hive.exec.reducers.max=number
 In order to set a constant number of reducers:
   set mapred.reduce.tasks=number
 Starting Job = job_201304191025_0004, Tracking URL = 
 http://hd17-vm5:51030/jobdetails.jsp?jobid=job_201304191025_0004
 Kill Command = /home/zongren/hadoop-current/bin/../bin/hadoop job  -kill 
 job_201304191025_0004
 Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 
 1
 2013-04-22 11:11:58,945 Stage-1 map = 0%,  reduce = 0%
 2013-04-22 11:12:01,964 Stage-1 map = 0%,  reduce = 100%
 2013-04-22 11:12:04,982 Stage-1 map = 100%,  reduce = 100%
 Ended Job = job_201304191025_0004
 Moving data to: hdfs://hd17-vm5:9101/user/zongren/hive/liza_1
 FAILED: Error in metadata: InvalidObjectException(message:liza_1 is not a 
 valid object name)
 FAILED: Execution Error, return code 1 from 
 org.apache.hadoop.hive.ql.exec.DDLTask
 MapReduce Jobs Launched: 
 Job 0: Reduce: 1   HDFS Read: 0 HDFS Write: 0 SUCCESS
 Total MapReduce CPU Time Spent: 0 msec
 But the following two queries work:
 hive (default)> create table liza_1 as select * from new_src;
 Total MapReduce jobs = 3
 Launching Job 1 out of 3
 Number of reduce tasks is set to 0 since there's no reduce operator
 Starting Job = job_201304191025_0006, Tracking URL = 
 http://hd17-vm5:51030/jobdetails.jsp?jobid=job_201304191025_0006
 

[jira] [Commented] (HIVE-4392) Illogical InvalidObjectException throwed when use mulit aggregate functions with star columns

2013-05-02 Thread Navis (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13648069#comment-13648069
 ] 

Navis commented on HIVE-4392:
-

Added tests. Did not change aggregation columns.

 Illogical InvalidObjectException throwed when use mulit aggregate functions 
 with star columns 
 --

 Key: HIVE-4392
 URL: https://issues.apache.org/jira/browse/HIVE-4392
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
 Environment: Apache Hadoop 0.20.1
 Apache Hive Trunk
Reporter: caofangkun
Assignee: Navis
Priority: Minor
 Attachments: HIVE-4392.D10431.1.patch, HIVE-4392.D10431.2.patch, 
 HIVE-4392.D10431.3.patch, HIVE-4392.D10431.4.patch, HIVE-4392.D10431.5.patch


 For Example:
 hive (default)> create table liza_1 as 
select *, sum(key), sum(value) 
from new_src;
 Total MapReduce jobs = 1
 Launching Job 1 out of 1
 Number of reduce tasks determined at compile time: 1
 In order to change the average load for a reducer (in bytes):
   set hive.exec.reducers.bytes.per.reducer=number
 In order to limit the maximum number of reducers:
   set hive.exec.reducers.max=number
 In order to set a constant number of reducers:
   set mapred.reduce.tasks=number
 Starting Job = job_201304191025_0003, Tracking URL = 
 http://hd17-vm5:51030/jobdetails.jsp?jobid=job_201304191025_0003
 Kill Command = /home/zongren/hadoop-current/bin/../bin/hadoop job  -kill 
 job_201304191025_0003
 Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 
 1
 2013-04-22 11:09:28,017 Stage-1 map = 0%,  reduce = 0%
 2013-04-22 11:09:34,054 Stage-1 map = 0%,  reduce = 100%
 2013-04-22 11:09:37,074 Stage-1 map = 100%,  reduce = 100%
 Ended Job = job_201304191025_0003
 Moving data to: hdfs://hd17-vm5:9101/user/zongren/hive/liza_1
 FAILED: Error in metadata: InvalidObjectException(message:liza_1 is not a 
 valid object name)
 FAILED: Execution Error, return code 1 from 
 org.apache.hadoop.hive.ql.exec.DDLTask
 MapReduce Jobs Launched: 
 Job 0: Reduce: 1   HDFS Read: 0 HDFS Write: 12 SUCCESS
 Total MapReduce CPU Time Spent: 0 msec
 hive (default)> create table liza_1 as 
select *, sum(key), sum(value) 
from new_src   
group by key, value;
 Total MapReduce jobs = 1
 Launching Job 1 out of 1
 Number of reduce tasks not specified. Estimated from input data size: 1
 In order to change the average load for a reducer (in bytes):
   set hive.exec.reducers.bytes.per.reducer=number
 In order to limit the maximum number of reducers:
   set hive.exec.reducers.max=number
 In order to set a constant number of reducers:
   set mapred.reduce.tasks=number
 Starting Job = job_201304191025_0004, Tracking URL = 
 http://hd17-vm5:51030/jobdetails.jsp?jobid=job_201304191025_0004
 Kill Command = /home/zongren/hadoop-current/bin/../bin/hadoop job  -kill 
 job_201304191025_0004
 Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 
 1
 2013-04-22 11:11:58,945 Stage-1 map = 0%,  reduce = 0%
 2013-04-22 11:12:01,964 Stage-1 map = 0%,  reduce = 100%
 2013-04-22 11:12:04,982 Stage-1 map = 100%,  reduce = 100%
 Ended Job = job_201304191025_0004
 Moving data to: hdfs://hd17-vm5:9101/user/zongren/hive/liza_1
 FAILED: Error in metadata: InvalidObjectException(message:liza_1 is not a 
 valid object name)
 FAILED: Execution Error, return code 1 from 
 org.apache.hadoop.hive.ql.exec.DDLTask
 MapReduce Jobs Launched: 
 Job 0: Reduce: 1   HDFS Read: 0 HDFS Write: 0 SUCCESS
 Total MapReduce CPU Time Spent: 0 msec
 But the following two queries work:
 hive (default)> create table liza_1 as select * from new_src;
 Total MapReduce jobs = 3
 Launching Job 1 out of 3
 Number of reduce tasks is set to 0 since there's no reduce operator
 Starting Job = job_201304191025_0006, Tracking URL = 
 http://hd17-vm5:51030/jobdetails.jsp?jobid=job_201304191025_0006
 Kill Command = /home/zongren/hadoop-current/bin/../bin/hadoop job  -kill 
 job_201304191025_0006
 Hadoop job information for Stage-1: number of mappers: 0; number of reducers:  0
 2013-04-22 11:15:00,681 Stage-1 map = 0%,  reduce = 0%
 2013-04-22 11:15:03,697 Stage-1 map = 100%,  reduce = 100%
 Ended Job = job_201304191025_0006
 Stage-4 is selected by condition resolver.
 Stage-3 is filtered out by condition resolver.
 Stage-5 is filtered out by condition resolver.
 Moving data to: 
 hdfs://hd17-vm5:9101/user/zongren/hive-scratchdir/hive_2013-04-22_11-14-54_632_6709035018023861094/-ext-10001
 Moving data to: hdfs://hd17-vm5:9101/user/zongren/hive/liza_1
 Table default.liza_1 stats: [num_partitions: 0, num_files: 0, num_rows: 0, 
 total_size: 0, raw_data_size: 0]
 MapReduce Jobs 

[jira] [Updated] (HIVE-4462) Finish support for modulo (%) operator for vectorized arithmetic

2013-05-02 Thread Eric Hanson (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Hanson updated HIVE-4462:
--

Attachment: HIVE-4462.1.patch

 Finish support for modulo (%) operator for vectorized arithmetic
 

 Key: HIVE-4462
 URL: https://issues.apache.org/jira/browse/HIVE-4462
 Project: Hive
  Issue Type: Sub-task
Reporter: Eric Hanson
Assignee: Eric Hanson
 Attachments: HIVE-4462.1.patch


 Support for vectorized modulo (%) is missing in CodeGen.java for several 
 situations, e.g. most ColArithmeticScalar situations. This is to add modulo 
 operator for all necessary situations.
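 The missing generated classes would follow the same tight-loop shape as the
 other vectorized arithmetic expressions. A hypothetical sketch of a
 long-column-modulo-long-scalar kernel (names are illustrative, not the actual
 generated code; divide-by-zero handling is omitted for brevity):

```java
// Illustrative sketch of a column-modulo-scalar vectorized expression.
// Class and method names are assumptions, not Hive's generated classes.
public class LongColModuloLongScalarSketch {

    // Apply input[i] % scalar across the batch in one tight loop.
    public static void evaluate(long[] input, long scalar, long[] output, int size) {
        for (int i = 0; i < size; i++) {
            output[i] = input[i] % scalar;
        }
    }

    public static void main(String[] args) {
        long[] in = {10, 11, 12};
        long[] out = new long[3];
        evaluate(in, 3, out, 3);
        System.out.println(java.util.Arrays.toString(out)); // [1, 2, 0]
    }
}
```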



Review Request: finish support for vectorized Modulo (%) operator

2013-05-02 Thread Eric Hanson

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10911/
---

Review request for hive.


Description
---

finish support for vectorized Modulo (%) operator


This addresses bug HIVE-4462.
https://issues.apache.org/jira/browse/HIVE-4462


Diffs
-

  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/gen/DoubleColModuloDoubleColumn.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/gen/DoubleColModuloDoubleScalar.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/gen/DoubleColModuloLongColumn.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/gen/DoubleColModuloLongScalar.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/gen/LongColModuloDoubleColumn.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/gen/LongColModuloDoubleScalar.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/CodeGen.java
 9279101 
  
ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/TestVectorScalarColArithmetic.java
 7c8b9c3 

Diff: https://reviews.apache.org/r/10911/diff/


Testing
---


Thanks,

Eric Hanson



[jira] [Commented] (HIVE-4462) Finish support for modulo (%) operator for vectorized arithmetic

2013-05-02 Thread Eric Hanson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13648083#comment-13648083
 ] 

Eric Hanson commented on HIVE-4462:
---

Code review available at https://reviews.apache.org/r/10911/

 Finish support for modulo (%) operator for vectorized arithmetic
 

 Key: HIVE-4462
 URL: https://issues.apache.org/jira/browse/HIVE-4462
 Project: Hive
  Issue Type: Sub-task
Reporter: Eric Hanson
Assignee: Eric Hanson
 Attachments: HIVE-4462.1.patch


 Support for vectorized modulo (%) is missing in CodeGen.java for several 
 situations, e.g. most ColArithmeticScalar situations. This is to add modulo 
 operator for all necessary situations.



[jira] [Updated] (HIVE-4462) Finish support for modulo (%) operator for vectorized arithmetic

2013-05-02 Thread Eric Hanson (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Hanson updated HIVE-4462:
--

Status: Patch Available  (was: Open)

 Finish support for modulo (%) operator for vectorized arithmetic
 

 Key: HIVE-4462
 URL: https://issues.apache.org/jira/browse/HIVE-4462
 Project: Hive
  Issue Type: Sub-task
Reporter: Eric Hanson
Assignee: Eric Hanson
 Attachments: HIVE-4462.1.patch


 Support for vectorized modulo (%) is missing in CodeGen.java for several 
 situations, e.g. most ColArithmeticScalar situations. This is to add modulo 
 operator for all necessary situations.



[jira] [Updated] (HIVE-4480) Implement partition support for vectorized query execution

2013-05-02 Thread Sarvesh Sakalanaga (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sarvesh Sakalanaga updated HIVE-4480:
-

Attachment: Hive-4480.1.patch

 Implement partition support for vectorized query execution
 --

 Key: HIVE-4480
 URL: https://issues.apache.org/jira/browse/HIVE-4480
 Project: Hive
  Issue Type: Sub-task
Reporter: Sarvesh Sakalanaga
Assignee: Sarvesh Sakalanaga
 Attachments: Hive-4480.1.patch


 Add support for eager deserialization of row data using serde in the 
 RecordReader layer. Also add support for partitions in this layer so that the 
 vectorized batch is populated correctly.



[jira] [Commented] (HIVE-4480) Implement partition support for vectorized query execution

2013-05-02 Thread Sarvesh Sakalanaga (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13648105#comment-13648105
 ] 

Sarvesh Sakalanaga commented on HIVE-4480:
--

Patch uploaded

 Implement partition support for vectorized query execution
 --

 Key: HIVE-4480
 URL: https://issues.apache.org/jira/browse/HIVE-4480
 Project: Hive
  Issue Type: Sub-task
Reporter: Sarvesh Sakalanaga
Assignee: Sarvesh Sakalanaga
 Attachments: Hive-4480.1.patch


 Add support for eager deserialization of row data using serde in the 
 RecordReader layer. Also add support for partitions in this layer so that the 
 vectorized batch is populated correctly.



[jira] [Created] (HIVE-4483) Input format to read vector data from RC

2013-05-02 Thread Sarvesh Sakalanaga (JIRA)
Sarvesh Sakalanaga created HIVE-4483:


 Summary: Input format to read vector data from RC
 Key: HIVE-4483
 URL: https://issues.apache.org/jira/browse/HIVE-4483
 Project: Hive
  Issue Type: Sub-task
Reporter: Sarvesh Sakalanaga
Assignee: Sarvesh Sakalanaga






[jira] [Created] (HIVE-4484) Current hive is slower than previous versions

2013-05-02 Thread Navis (JIRA)
Navis created HIVE-4484:
---

 Summary: Current hive is slower than previous versions
 Key: HIVE-4484
 URL: https://issues.apache.org/jira/browse/HIVE-4484
 Project: Hive
  Issue Type: Task
 Environment: ubuntu 10.10, 4G, i7-8core
Reporter: Navis


Comparing logs for various patches, I've found query execution has become slower 
than before. For example (picked from tests that have not changed):

{noformat}
ppr_pushdown.q
135~140 sec : 2012-03-27 ~ 2012-07-17
140~160 sec : ~ 2012-11-28
160~220 sec : ~ 2013-03-30
220~250 sec : ~ current (HIVE-4392)

join_nulls.q
295~310 sec : 2012-03-27 ~ 2012-07-17
310~330 sec : ~ 2012-11-28
330~370 sec : ~ 2013-03-30
400~460 sec : ~ current (HIVE-4392)
{noformat}

This explains much of the recently prolonged test times. It might be from 
changes to the test framework, but it still needs investigation before adding 
more functionality into Hive.



[jira] [Updated] (HIVE-4480) Implement partition support for vectorized query execution

2013-05-02 Thread Sarvesh Sakalanaga (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sarvesh Sakalanaga updated HIVE-4480:
-

Status: Patch Available  (was: Open)

 Implement partition support for vectorized query execution
 --

 Key: HIVE-4480
 URL: https://issues.apache.org/jira/browse/HIVE-4480
 Project: Hive
  Issue Type: Sub-task
Reporter: Sarvesh Sakalanaga
Assignee: Sarvesh Sakalanaga
 Attachments: Hive-4480.1.patch


 Add support for eager deserialization of row data using serde in the 
 RecordReader layer. Also add support for partitions in this layer so that the 
 vectorized batch is populated correctly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-4485) beeline prints null as empty strings

2013-05-02 Thread Thejas M Nair (JIRA)
Thejas M Nair created HIVE-4485:
---

 Summary: beeline prints null as empty strings
 Key: HIVE-4485
 URL: https://issues.apache.org/jira/browse/HIVE-4485
 Project: Hive
  Issue Type: Bug
Reporter: Thejas M Nair
Assignee: Thejas M Nair


 beeline is printing nulls as empty strings. 
This is inconsistent with the Hive CLI and other databases, which print null as 
the string NULL.
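For illustration, the CLI-style behavior this issue asks for can be sketched as follows (a hypothetical formatter, not beeline's actual code):

```python
def render_cell(value, null_string="NULL"):
    """Render a result-set cell the way the Hive CLI does:
    SQL NULL becomes the literal string NULL, not an empty string."""
    return null_string if value is None else str(value)

row = (1, None, "x")
print("\t".join(render_cell(v) for v in row))  # prints: 1	NULL	x
# The behavior this issue reports would instead yield "1\t\tx",
# making NULL indistinguishable from the empty string.
```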



--


[jira] [Updated] (HIVE-4485) beeline prints null as empty strings

2013-05-02 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-4485:


Component/s: HiveServer2

 beeline prints null as empty strings
 

 Key: HIVE-4485
 URL: https://issues.apache.org/jira/browse/HIVE-4485
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Reporter: Thejas M Nair
Assignee: Thejas M Nair

  beeline is printing nulls as empty strings. 
 This is inconsistent with the Hive CLI and other databases, which print null as 
 the string NULL.

--


[jira] [Updated] (HIVE-4377) Add more comment to https://reviews.facebook.net/D1209 (HIVE-2340)

2013-05-02 Thread Phabricator (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-4377:
--

Attachment: HIVE-4377.D10377.2.patch

navis updated the revision HIVE-4377 [jira] Add more comment to 
https://reviews.facebook.net/D1209 (HIVE-2340).

  Added more comments

Reviewers: JIRA

REVISION DETAIL
  https://reviews.facebook.net/D10377

CHANGE SINCE LAST DIFF
  https://reviews.facebook.net/D10377?vs=32445&id=33291#toc

AFFECTED FILES
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java
  ql/src/test/queries/clientpositive/reduce_deduplicate_extended.q
  ql/src/test/results/clientpositive/reduce_deduplicate_extended.q.out

To: JIRA, navis
Cc: njain


 Add more comment to https://reviews.facebook.net/D1209 (HIVE-2340)
 --

 Key: HIVE-4377
 URL: https://issues.apache.org/jira/browse/HIVE-4377
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Gang Tim Liu
Assignee: Navis
 Attachments: HIVE-4377.D10377.1.patch, HIVE-4377.D10377.2.patch


 Thanks a lot for addressing the optimization in HIVE-2340. Awesome!
 Since we are developing at a very fast pace, it would be really useful to
 think about maintainability and testing of the large codebase. Highlights 
 applicable to D1209:
   1.  Javadoc for all public/private functions, except for
 setters/getters. For any complex function, clear examples (input/output)
 would really help.
   2.  Especially for query optimizations, it might be a good idea to have
 a simple working query at the top, along with the expected changes, e.g.
 the operator tree for that query at each step, or a detailed explanation
 at the top.
   3.  If possible, the test name (.q file) where the function is being
 invoked, or the query which would potentially test that scenario, if it
 is a query processor change.
   4.  Comments in each test (.q file) that include the JIRA number,
 what it is trying to test, and assumptions about each query.
   5.  Reduce the output of each test: whenever a query outputs more
 than 10 results, there should be a reason. Otherwise, each query result
 should be bounded by 10 rows.
 Thanks a lot

--