[jira] [Updated] (HIVE-4485) beeline prints null as empty strings

2013-05-03 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-4485:


Attachment: HIVE-4485.1.patch

HIVE-4485.1.patch: initial patch. Makes the null string configurable. The test 
still needs fixing and improvement.


 beeline prints null as empty strings
 

 Key: HIVE-4485
 URL: https://issues.apache.org/jira/browse/HIVE-4485
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Attachments: HIVE-4485.1.patch


  Beeline prints nulls as empty strings. 
 This is inconsistent with the Hive CLI and other databases, which print null 
 as the string NULL.
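
The configurable null rendering mentioned for the initial patch could look roughly like the sketch below. `NullDisplay`, `render`, and the `nullString` parameter are hypothetical names chosen for illustration; they are not taken from the Beeline source or from HIVE-4485.1.patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not Beeline code: prepares one result row for display.
final class NullDisplay {
    // Replaces each SQL NULL (a Java null) with the configured placeholder.
    static List<String> render(List<String> row, String nullString) {
        List<String> out = new ArrayList<>();
        for (String cell : row) {
            out.add(cell == null ? nullString : cell);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> row = Arrays.asList("1", null, "");
        // The empty string stays empty; only the true null is replaced.
        System.out.println(render(row, "NULL"));
    }
}
```

Keeping the placeholder as a parameter lets a caller distinguish a real empty string from a null, which is the distinction the report says is lost today.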

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4486) FetchOperator slows down SMB map joins by 50% when there are many partitions

2013-05-03 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-4486:
--

Summary: FetchOperator slows down SMB map joins by 50% when there are many 
partitions  (was: FetchOperator slows down SMB map joins with many files)

 FetchOperator slows down SMB map joins by 50% when there are many partitions
 

 Key: HIVE-4486
 URL: https://issues.apache.org/jira/browse/HIVE-4486
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
 Environment: Ubuntu LXC 12.10
Reporter: Gopal V
Priority: Minor

 While looking at log files for SMB joins in hive, it was noticed that the 
 actual join op didn't show up as a significant fraction of the time spent. 
 Most of the time was spent parsing configuration files.
 To confirm, I put log lines in the HiveConf constructor and eventually made 
 the following edit to the code
 {code}
 --- ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java
 +++ ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java
 @@ -648,8 +648,7 @@ public ObjectInspector getOutputObjectInspector() throws HiveException {
     * @return list of file status entries
     */
    private FileStatus[] listStatusUnderPath(FileSystem fs, Path p) throws IOException {
 -    HiveConf hiveConf = new HiveConf(job, FetchOperator.class);
 -    boolean recursive = hiveConf.getBoolVar(HiveConf.ConfVars.HADOOPMAPREDINPUTDIRRECURSIVE);
 +    boolean recursive = false;
      if (!recursive) {
        return fs.listStatus(p);
      }
 {code}
 And re-ran my query to compare timings.
 ||Before||After||
 |Cumulative CPU| 731.07 sec|386.0 sec|
 |Total time | 347.66 seconds | 218.855 seconds | 
 |
 The query used was 
 {code}INSERT OVERWRITE LOCAL DIRECTORY
 '/grid/0/smb/'
 select inv_item_sk
 from
  inventory inv
  join store_sales ss on (ss.ss_item_sk = inv.inv_item_sk)
 limit 10
 ;
 {code}
 This was on a scale=2 TPC-DS data set, where both store_sales and inventory 
 are bucketed into 4 buckets, with store_sales split into 7 partitions and 
 inventory into 261 partitions.
 78% of all CPU time was spent within new HiveConf(). The YourKit profiler 
 runs are attached.



[jira] [Created] (HIVE-4486) FetchOperator slows down SMB map joins with many files

2013-05-03 Thread Gopal V (JIRA)
Gopal V created HIVE-4486:
-

 Summary: FetchOperator slows down SMB map joins with many files
 Key: HIVE-4486
 URL: https://issues.apache.org/jira/browse/HIVE-4486
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
 Environment: Ubuntu LXC 12.10
Reporter: Gopal V
Priority: Minor


While looking at log files for SMB joins in hive, it was noticed that the 
actual join op didn't show up as a significant fraction of the time spent. Most 
of the time was spent parsing configuration files.

To confirm, I put log lines in the HiveConf constructor and eventually made the 
following edit to the code

{code}
--- ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java
+++ ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java
@@ -648,8 +648,7 @@ public ObjectInspector getOutputObjectInspector() throws HiveException {
    * @return list of file status entries
    */
   private FileStatus[] listStatusUnderPath(FileSystem fs, Path p) throws IOException {
-    HiveConf hiveConf = new HiveConf(job, FetchOperator.class);
-    boolean recursive = hiveConf.getBoolVar(HiveConf.ConfVars.HADOOPMAPREDINPUTDIRRECURSIVE);
+    boolean recursive = false;
     if (!recursive) {
       return fs.listStatus(p);
     }
{code}

And re-ran my query to compare timings.

||Before||After||
|Cumulative CPU| 731.07 sec|386.0 sec|
|Total time | 347.66 seconds | 218.855 seconds | 
|

The query used was 

{code}INSERT OVERWRITE LOCAL DIRECTORY
'/grid/0/smb/'
select inv_item_sk
from
 inventory inv
 join store_sales ss on (ss.ss_item_sk = inv.inv_item_sk)
limit 10
;
{code}

This was on a scale=2 TPC-DS data set, where both store_sales and inventory are 
bucketed into 4 buckets, with store_sales split into 7 partitions and inventory 
into 261 partitions.

78% of all CPU time was spent within new HiveConf(). The YourKit profiler runs 
are attached.
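
One way to avoid paying the configuration-parsing cost on every directory listing is to read the flag once and reuse it. The sketch below is self-contained and does not use Hive or Hadoop classes: `CostlyConf` is a hypothetical stand-in for an expensive-to-construct configuration object, and `CachedRecursiveFlag` is a hypothetical stand-in for the operator. It is an illustration of the idea only, not the fix that was applied for this issue.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch, not Hive code: reads an expensive flag once and reuses it.
final class CachedRecursiveFlag {
    // Stand-in for a configuration object that is costly to construct.
    static final class CostlyConf {
        static final AtomicInteger CONSTRUCTIONS = new AtomicInteger();
        private final boolean recursive;

        CostlyConf(boolean recursive) {
            CONSTRUCTIONS.incrementAndGet(); // imagine config files being parsed here
            this.recursive = recursive;
        }

        boolean isRecursive() {
            return recursive;
        }
    }

    private Boolean recursive; // null until the flag has been read once

    boolean isRecursive() {
        if (recursive == null) {
            recursive = new CostlyConf(false).isRecursive();
        }
        return recursive;
    }

    public static void main(String[] args) {
        CachedRecursiveFlag flag = new CachedRecursiveFlag();
        for (int partition = 0; partition < 261; partition++) {
            flag.isRecursive(); // one lookup per partition directory
        }
        System.out.println("conf objects built: " + CostlyConf.CONSTRUCTIONS.get());
    }
}
```

With the flag cached, the number of configuration objects built no longer grows with the number of partitions, which is the cost the profile above points at.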



[jira] [Updated] (HIVE-4486) FetchOperator slows down SMB map joins by 50% when there are many partitions

2013-05-03 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-4486:
--

Attachment: smb-profile.html

Attached the YourKit profile (HTML).

 FetchOperator slows down SMB map joins by 50% when there are many partitions
 

 Key: HIVE-4486
 URL: https://issues.apache.org/jira/browse/HIVE-4486
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
 Environment: Ubuntu LXC 12.10
Reporter: Gopal V
Priority: Minor
 Attachments: smb-profile.html





[jira] [Updated] (HIVE-4486) FetchOperator slows down SMB map joins by 50% when there are many partitions

2013-05-03 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-4486:
--

Description: 
While looking at log files for SMB joins in hive, it was noticed that the 
actual join op didn't show up as a significant fraction of the time spent. Most 
of the time was spent parsing configuration files.

To confirm, I put log lines in the HiveConf constructor and eventually made the 
following edit to the code

{code}
--- ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java
+++ ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java
@@ -648,8 +648,7 @@ public ObjectInspector getOutputObjectInspector() throws HiveException {
    * @return list of file status entries
    */
   private FileStatus[] listStatusUnderPath(FileSystem fs, Path p) throws IOException {
-    HiveConf hiveConf = new HiveConf(job, FetchOperator.class);
-    boolean recursive = hiveConf.getBoolVar(HiveConf.ConfVars.HADOOPMAPREDINPUTDIRRECURSIVE);
+    boolean recursive = false;
     if (!recursive) {
       return fs.listStatus(p);
     }
{code}

And re-ran my query to compare timings.

|| ||Before||After||
|Cumulative CPU|731.07 sec|386.0 sec|
|Total time|347.66 sec|218.855 sec|

The query used was 

{code}INSERT OVERWRITE LOCAL DIRECTORY
'/grid/0/smb/'
select inv_item_sk
from
 inventory inv
 join store_sales ss on (ss.ss_item_sk = inv.inv_item_sk)
limit 10
;
{code}

This was on a scale=2 TPC-DS data set, where both store_sales and inventory are 
bucketed into 4 buckets, with store_sales split into 7 partitions and inventory 
into 261 partitions.

78% of all CPU time was spent within new HiveConf(). The YourKit profiler runs 
are attached.

  was:
While looking at log files for SMB joins in hive, it was noticed that the 
actual join op didn't show up as a significant fraction of the time spent. Most 
of the time was spent parsing configuration files.

To confirm, I put log lines in the HiveConf constructor and eventually made the 
following edit to the code

{code}
--- ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java
+++ ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java
@@ -648,8 +648,7 @@ public ObjectInspector getOutputObjectInspector() throws 
HiveException {
* @return list of file status entries
*/
   private FileStatus[] listStatusUnderPath(FileSystem fs, Path p) throws 
IOException {
-HiveConf hiveConf = new HiveConf(job, FetchOperator.class);
-boolean recursive = 
hiveConf.getBoolVar(HiveConf.ConfVars.HADOOPMAPREDINPUTDIRRECURSIVE);
+boolean recursive = false;
 if (!recursive) {
   return fs.listStatus(p);
 }
{code}

And re-ran my query to compare timings.

||Before||After||
|Cumulative CPU| 731.07 sec|386.0 sec|
|Total time | 347.66 seconds | 218.855 seconds | 
|

The query used was 

{code}INSERT OVERWRITE LOCAL DIRECTORY
'/grid/0/smb/'
select inv_item_sk
from
 inventory inv
 join store_sales ss on (ss.ss_item_sk = inv.inv_item_sk)
limit 10
;
{code}

This was on a scale=2 TPC-DS data set, where both store_sales and inventory are 
bucketed into 4 buckets, with store_sales split into 7 partitions and inventory 
into 261 partitions.

78% of all CPU time was spent within new HiveConf(). The YourKit profiler runs 
are attached.


 FetchOperator slows down SMB map joins by 50% when there are many partitions
 

 Key: HIVE-4486
 URL: https://issues.apache.org/jira/browse/HIVE-4486
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
 Environment: Ubuntu LXC 12.10
Reporter: Gopal V
Priority: Minor
 Attachments: smb-profile.html


 While looking at log files for SMB joins in hive, it was noticed that the 
 actual join op didn't show up as a significant fraction of the time spent. 
 Most of the time was spent parsing configuration files.
 To confirm, I put log lines in the HiveConf constructor and eventually made 
 the following edit to the code
 {code}
 --- ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java
 +++ ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java
 @@ -648,8 +648,7 @@ public ObjectInspector getOutputObjectInspector() throws 
 HiveException {
 * @return list of file status entries
 */
private FileStatus[] listStatusUnderPath(FileSystem fs, Path p) throws 
 IOException {
 -HiveConf hiveConf = new HiveConf(job, FetchOperator.class);
 -boolean recursive = 
 hiveConf.getBoolVar(HiveConf.ConfVars.HADOOPMAPREDINPUTDIRRECURSIVE);
 +boolean recursive = false;
  if (!recursive) {
return fs.listStatus(p);
  }
 {code}
 And re-ran my query to compare timings.
 || ||Before||After||
 |Cumulative CPU| 731.07 sec|386.0 sec|
 |Total time | 347.66 seconds | 218.855 seconds | 
 |
 The query used was 
 

[jira] [Created] (HIVE-4487) Hive does not set explicit permissions on hive.exec.scratchdir

2013-05-03 Thread Joey Echeverria (JIRA)
Joey Echeverria created HIVE-4487:
-

 Summary: Hive does not set explicit permissions on 
hive.exec.scratchdir
 Key: HIVE-4487
 URL: https://issues.apache.org/jira/browse/HIVE-4487
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.10.0
Reporter: Joey Echeverria


The hive.exec.scratchdir defaults to /tmp/hive-${user.name}, but when Hive 
creates this directory it doesn't set any explicit permission on it. This means 
if you have the default HDFS umask setting of 022, then these directories end 
up being world readable. These permissions also get applied to the staging 
directories and their files, thus leaving inter-stage data world readable.

This can cause a potential leak of data especially when operating on a Kerberos 
enabled cluster. Hive should probably default these directories to only be 
readable by the owner.
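
The effect of the umask can be worked through with a little arithmetic: the effective mode is the requested mode with the umask bits cleared. The sketch below assumes a requested directory mode of 0777; `UmaskDemo` and `apply` are hypothetical names, and this is plain bit arithmetic rather than a call into HDFS.

```java
// Hypothetical illustration of umask arithmetic; it does not touch any file system.
final class UmaskDemo {
    // Clears every permission bit that is set in the umask.
    static int apply(int requestedMode, int umask) {
        return requestedMode & ~umask & 0777;
    }

    public static void main(String[] args) {
        // umask 022 keeps the read and execute bits for group and others.
        System.out.println(Integer.toOctalString(apply(0777, 022)));
        // A stricter umask of 077 keeps access for the owner only.
        System.out.println(Integer.toOctalString(apply(0777, 077)));
    }
}
```

Under the stated assumption, a umask of 022 yields mode 755 (world readable), while 077 yields 700 (owner only).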



[jira] [Updated] (HIVE-4487) Hive does not set explicit permissions on hive.exec.scratchdir

2013-05-03 Thread Joey Echeverria (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joey Echeverria updated HIVE-4487:
--

Description: 
The hive.exec.scratchdir defaults to /tmp/hive-$\{user.name\}, but when Hive 
creates this directory it doesn't set any explicit permission on it. This means 
if you have the default HDFS umask setting of 022, then these directories end 
up being world readable. These permissions also get applied to the staging 
directories and their files, thus leaving inter-stage data world readable.

This can cause a potential leak of data especially when operating on a Kerberos 
enabled cluster. Hive should probably default these directories to only be 
readable by the owner.

  was:
The hive.exec.scratchdir defaults to /tmp/hive-${user.name}, but when Hive 
creates this directory it doesn't set any explicit permission on it. This means 
if you have the default HDFS umask setting of 022, then these directories end 
up being world readable. These permissions also get applied to the staging 
directories and their files, thus leaving inter-stage data world readable.

This can cause a potential leak of data especially when operating on a Kerberos 
enabled cluster. Hive should probably default these directories to only be 
readable by the owner.


 Hive does not set explicit permissions on hive.exec.scratchdir
 --

 Key: HIVE-4487
 URL: https://issues.apache.org/jira/browse/HIVE-4487
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.10.0
Reporter: Joey Echeverria

 The hive.exec.scratchdir defaults to /tmp/hive-$\{user.name\}, but when Hive 
 creates this directory it doesn't set any explicit permission on it. This 
 means if you have the default HDFS umask setting of 022, then these 
 directories end up being world readable. These permissions also get applied 
 to the staging directories and their files, thus leaving inter-stage data 
 world readable.
 This can cause a potential leak of data especially when operating on a 
 Kerberos enabled cluster. Hive should probably default these directories to 
 only be readable by the owner.



[jira] [Updated] (HIVE-4486) FetchOperator slows down SMB map joins by 50% when there are many partitions

2013-05-03 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-4486:
--

Affects Version/s: 0.12.0

 FetchOperator slows down SMB map joins by 50% when there are many partitions
 

 Key: HIVE-4486
 URL: https://issues.apache.org/jira/browse/HIVE-4486
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.12.0
 Environment: Ubuntu LXC 12.10
Reporter: Gopal V
Priority: Minor
 Attachments: smb-profile.html





[jira] [Commented] (HIVE-4487) Hive does not set explicit permissions on hive.exec.scratchdir

2013-05-03 Thread Joey Echeverria (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13648435#comment-13648435
 ] 

Joey Echeverria commented on HIVE-4487:
---

The current workaround is to set the umask in hive-site.xml:
{code:xml}
  <property>
    <name>fs.permissions.umask-mode</name>
    <value>077</value>
  </property>
{code}

 Hive does not set explicit permissions on hive.exec.scratchdir
 --

 Key: HIVE-4487
 URL: https://issues.apache.org/jira/browse/HIVE-4487
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.10.0
Reporter: Joey Echeverria




[jira] [Created] (HIVE-4488) BucketizedHiveInputFormat is pessimistic with SMB split generation

2013-05-03 Thread Gopal V (JIRA)
Gopal V created HIVE-4488:
-

 Summary: BucketizedHiveInputFormat is pessimistic with SMB split 
generation
 Key: HIVE-4488
 URL: https://issues.apache.org/jira/browse/HIVE-4488
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.12.0
 Environment: Ubuntu LXC
Reporter: Gopal V


BucketizedHiveInputFormat generates fewer splits than possible when faced with 
a table structure where both tables are partitioned.

When debugging query82 from the TPC-DS spec, there were 7 partitions in the lhs 
(store_sales) and 8 partitions in the rhs (inventory), with 1 bucket each.

Only 7 splits are generated for the mappers, instead of a potential 56.

{code}
13/05/01 07:08:22 INFO mapred.FileInputFormat: Total input paths to process : 1
13/05/01 07:08:22 INFO io.BucketizedHiveInputFormat: 7 bucketized splits 
generated from 344 original splits.
{code}

The loop that generates the splits is as follows

{code}
InputSplit[] iss = inputFormat.getSplits(newjob, 0);
if (iss != null && iss.length > 0) {
  numOrigSplits += iss.length;
  result.add(new BucketizedHiveInputSplit(iss, inputFormatClass
  .getName()));
}
{code}

As is clear from the above, the more granular (per-file/per-partition) splits 
returned by getSplits() are all added to a single bucket split.

Logically, in our mapper we get 

{code}
store_sales(2003)/00_1)
join MergeQueue(
  inv(1998-01-01)/00_0
  inv(1998-01-08)/00_0
  inv(1998-01-15)/00_0
  inv(1998-01-22)/00_0
  inv(1998-01-29)/00_0
  inv(1998-02-05)/00_0
  inv(1998-02-12)/00_0
  inv(1998-02-19)/00_0
  inv(1998-02-26)/00_0
  )
{code}

Where ideally, we could've used a CombineFileInputFormat to get node locality 
for the merge queue inputs (viz BucketizedHiveInputSplit).

This would be far better at generating splits and at getting more out of 
short-circuit reads.
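
The grouping behavior described above can be reproduced with a small stand-alone sketch. It uses plain strings in place of input splits and nested lists in place of BucketizedHiveInputSplit, so `SplitGrouping` and `onePerPath` are hypothetical names and nothing here calls Hive or Hadoop APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the quoted loop; strings stand in for input splits.
final class SplitGrouping {
    // All original splits found under one input path end up in one combined split.
    static List<List<String>> onePerPath(List<List<String>> splitsPerPath) {
        List<List<String>> result = new ArrayList<>();
        for (List<String> iss : splitsPerPath) {
            if (iss != null && !iss.isEmpty()) {
                result.add(new ArrayList<>(iss));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> perPath = Arrays.asList(
            Arrays.asList("a0", "a1", "a2"),
            Arrays.asList("b0", "b1"));
        // Five original splits collapse into two combined splits.
        System.out.println(onePerPath(perPath).size());
    }
}
```

The number of combined splits tracks the number of input paths rather than the number of original splits, which matches the "7 bucketized splits generated from 344 original splits" log line.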



[jira] [Resolved] (HIVE-4479) Child expressions are not being evaluated hierarchically in a few templates.

2013-05-03 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan resolved HIVE-4479.


   Resolution: Fixed
Fix Version/s: vectorization-branch

Committed to branch. Thanks, Jitendra!

 Child expressions are not being evaluated hierarchically in a few templates.
 

 Key: HIVE-4479
 URL: https://issues.apache.org/jira/browse/HIVE-4479
 Project: Hive
  Issue Type: Sub-task
Reporter: Jitendra Nath Pandey
Assignee: Jitendra Nath Pandey
 Fix For: vectorization-branch

 Attachments: HIVE-4479.1.patch


 FilterColumnCompareColumn.txt, FilterStringColumnCompareScalar.txt and 
 ScalarArithmeticColumn.txt are not evaluating the child expressions.



[jira] [Resolved] (HIVE-4481) Vectorized row batch should be initialized with additional columns to hold intermediate output.

2013-05-03 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan resolved HIVE-4481.


   Resolution: Fixed
Fix Version/s: vectorization-branch

Committed to branch. Thanks, Jitendra!

 Vectorized row batch should be initialized with additional columns to hold 
 intermediate output.
 ---

 Key: HIVE-4481
 URL: https://issues.apache.org/jira/browse/HIVE-4481
 Project: Hive
  Issue Type: Sub-task
Reporter: Jitendra Nath Pandey
Assignee: Jitendra Nath Pandey
 Fix For: vectorization-branch

 Attachments: HIVE-4481.1.patch


 Vectorized row batch should be initialized with additional columns to hold 
 intermediate output.



[jira] [Updated] (HIVE-4477) remove redundant copy of arithmetic filter unit test testColOpScalarNumericFilterNullAndRepeatingLogic

2013-05-03 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-4477:
---

   Resolution: Fixed
Fix Version/s: vectorization-branch
   Status: Resolved  (was: Patch Available)

Committed to branch. Thanks, Eric!

 remove redundant copy of arithmetic filter unit test 
 testColOpScalarNumericFilterNullAndRepeatingLogic
 --

 Key: HIVE-4477
 URL: https://issues.apache.org/jira/browse/HIVE-4477
 Project: Hive
  Issue Type: Sub-task
Reporter: Eric Hanson
Assignee: Eric Hanson
 Fix For: vectorization-branch

 Attachments: HIVE-4477.1.patch


 The same test was ported to 2 different files.



[jira] [Updated] (HIVE-4480) Implement partition support for vectorized query execution

2013-05-03 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-4480:
---

   Resolution: Fixed
Fix Version/s: vectorization-branch
   Status: Resolved  (was: Patch Available)

Committed to branch. Thanks, Sarvesh!

 Implement partition support for vectorized query execution
 --

 Key: HIVE-4480
 URL: https://issues.apache.org/jira/browse/HIVE-4480
 Project: Hive
  Issue Type: Sub-task
Reporter: Sarvesh Sakalanaga
Assignee: Sarvesh Sakalanaga
 Fix For: vectorization-branch

 Attachments: Hive-4480.1.patch


 Add support for eager deserialization of row data using serde in the 
 RecordReader layer. Also add support for partitions in this layer so that the 
 vectorized batch is populated correctly.



[jira] [Created] (HIVE-4489) beeline always return the same error message twice

2013-05-03 Thread Chaoyu Tang (JIRA)
Chaoyu Tang created HIVE-4489:
-

 Summary: beeline always return the same error message twice
 Key: HIVE-4489
 URL: https://issues.apache.org/jira/browse/HIVE-4489
 Project: Hive
  Issue Type: Bug
  Components: Clients
Affects Versions: 0.10.0
Reporter: Chaoyu Tang
Priority: Minor
 Fix For: 0.11.0


Beeline always returns the same error message twice. For example, if I try to 
create a table a2 that already exists, it prints two identical messages, which 
is not user friendly.
{{{
beeline> !connect jdbc:hive2://localhost:1 scott tiger org.apache.hive.jdbc.HiveDriver
Connecting to jdbc:hive2://localhost:1
Connected to: Hive (version 0.10.0)
Driver: Hive (version 0.10.0-cdh4.2.1)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://localhost:1> create table a2 (value int);
Error: Error while processing statement: FAILED: Execution Error, return code 1 
from org.apache.hadoop.hive.ql.exec.DDLTask (state=08S01,code=1)
Error: Error while processing statement: FAILED: Execution Error, return code 1 
from org.apache.hadoop.hive.ql.exec.DDLTask (state=08S01,code=1)
}}}



[jira] [Updated] (HIVE-4489) beeline always return the same error message twice

2013-05-03 Thread Chaoyu Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chaoyu Tang updated HIVE-4489:
--

Description: 
Beeline always returns the same error message twice. For example, if I try to 
create a table a2 that already exists, it prints two identical messages, which 
is not user friendly.
{code}
beeline> !connect jdbc:hive2://localhost:1 scott tiger org.apache.hive.jdbc.HiveDriver
Connecting to jdbc:hive2://localhost:1
Connected to: Hive (version 0.10.0)
Driver: Hive (version 0.10.0-cdh4.2.1)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://localhost:1> create table a2 (value int);
Error: Error while processing statement: FAILED: Execution Error, return code 1 
from org.apache.hadoop.hive.ql.exec.DDLTask (state=08S01,code=1)
Error: Error while processing statement: FAILED: Execution Error, return code 1 
from org.apache.hadoop.hive.ql.exec.DDLTask (state=08S01,code=1)
{code}

  was:
Beeline always returns the same error message twice. For example, if I try to 
create a table a2 that already exists, it prints two identical messages, which 
is not user friendly.
{{{
beeline !connect jdbc:hive2://localhost:1 scott tiger 
org.apache.hive.jdbc.HiveDriver
Connecting to jdbc:hive2://localhost:1
Connected to: Hive (version 0.10.0)
Driver: Hive (version 0.10.0-cdh4.2.1)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://localhost:1 create table a2 (value int);
Error: Error while processing statement: FAILED: Execution Error, return code 1 
from org.apache.hadoop.hive.ql.exec.DDLTask (state=08S01,code=1)
Error: Error while processing statement: FAILED: Execution Error, return code 1 
from org.apache.hadoop.hive.ql.exec.DDLTask (state=08S01,code=1)
}}}


 beeline always return the same error message twice
 --

 Key: HIVE-4489
 URL: https://issues.apache.org/jira/browse/HIVE-4489
 Project: Hive
  Issue Type: Bug
  Components: Clients
Affects Versions: 0.10.0
Reporter: Chaoyu Tang
Priority: Minor
  Labels: newbie
 Fix For: 0.11.0

   Original Estimate: 0h
  Remaining Estimate: 0h

 Beeline always returns the same error message twice. For example, if I try to 
 create a table a2 that already exists, it prints two identical messages, which 
 is not user friendly.
 {code}
 beeline !connect jdbc:hive2://localhost:1 scott tiger 
 org.apache.hive.jdbc.HiveDriver
 Connecting to jdbc:hive2://localhost:1
 Connected to: Hive (version 0.10.0)
 Driver: Hive (version 0.10.0-cdh4.2.1)
 Transaction isolation: TRANSACTION_REPEATABLE_READ
 0: jdbc:hive2://localhost:1 create table a2 (value int);
 Error: Error while processing statement: FAILED: Execution Error, return code 
 1 from org.apache.hadoop.hive.ql.exec.DDLTask (state=08S01,code=1)
 Error: Error while processing statement: FAILED: Execution Error, return code 
 1 from org.apache.hadoop.hive.ql.exec.DDLTask (state=08S01,code=1)
 {code}



[jira] [Updated] (HIVE-4489) beeline always return the same error message twice

2013-05-03 Thread Chaoyu Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chaoyu Tang updated HIVE-4489:
--

Fix Version/s: (was: 0.11.0)

 beeline always return the same error message twice
 --

 Key: HIVE-4489
 URL: https://issues.apache.org/jira/browse/HIVE-4489
 Project: Hive
  Issue Type: Bug
  Components: Clients
Affects Versions: 0.10.0
Reporter: Chaoyu Tang
Priority: Minor
  Labels: newbie
   Original Estimate: 0h
  Remaining Estimate: 0h

 Beeline always returns the same error message twice. For example, if I try to 
 create a table a2 that already exists, it prints the exact same message twice, 
 which is not very user friendly.
 {code}
 beeline !connect jdbc:hive2://localhost:1 scott tiger 
 org.apache.hive.jdbc.HiveDriver
 Connecting to jdbc:hive2://localhost:1
 Connected to: Hive (version 0.10.0)
 Driver: Hive (version 0.10.0-cdh4.2.1)
 Transaction isolation: TRANSACTION_REPEATABLE_READ
 0: jdbc:hive2://localhost:1 create table a2 (value int);
 Error: Error while processing statement: FAILED: Execution Error, return code 
 1 from org.apache.hadoop.hive.ql.exec.DDLTask (state=08S01,code=1)
 Error: Error while processing statement: FAILED: Execution Error, return code 
 1 from org.apache.hadoop.hive.ql.exec.DDLTask (state=08S01,code=1)
 {code}



[jira] [Updated] (HIVE-4489) beeline always return the same error message twice

2013-05-03 Thread Chaoyu Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chaoyu Tang updated HIVE-4489:
--

Attachment: HIVE-4489.patch

Removed the duplicate error logging in the lower-level exception catch block, so 
that only the top-level catch block prints the error.

 beeline always return the same error message twice
 --

 Key: HIVE-4489
 URL: https://issues.apache.org/jira/browse/HIVE-4489
 Project: Hive
  Issue Type: Bug
  Components: Clients
Affects Versions: 0.10.0
Reporter: Chaoyu Tang
Priority: Minor
  Labels: newbie
 Attachments: HIVE-4489.patch

   Original Estimate: 0h
  Remaining Estimate: 0h

 Beeline always returns the same error message twice. For example, if I try to 
 create a table a2 that already exists, it prints the exact same message twice, 
 which is not very user friendly.
 {code}
 beeline !connect jdbc:hive2://localhost:1 scott tiger 
 org.apache.hive.jdbc.HiveDriver
 Connecting to jdbc:hive2://localhost:1
 Connected to: Hive (version 0.10.0)
 Driver: Hive (version 0.10.0-cdh4.2.1)
 Transaction isolation: TRANSACTION_REPEATABLE_READ
 0: jdbc:hive2://localhost:1 create table a2 (value int);
 Error: Error while processing statement: FAILED: Execution Error, return code 
 1 from org.apache.hadoop.hive.ql.exec.DDLTask (state=08S01,code=1)
 Error: Error while processing statement: FAILED: Execution Error, return code 
 1 from org.apache.hadoop.hive.ql.exec.DDLTask (state=08S01,code=1)
 {code}



Review Request: HIVE-4489: beeline always return the same error message twice

2013-05-03 Thread Chaoyu Tang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10917/
---

Review request for hive.


Description
---

Beeline always returns the same error message twice because the error is 
logged both in an inner exception catch block and again in the outer catch 
block that re-catches it.
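
The double reporting described above can be reproduced in a minimal Java 
sketch. Every name here (DuplicateErrorDemo, report, innerBefore, innerAfter, 
runTopLevel) is hypothetical and is not taken from Commands.java; the sketch 
only illustrates the general pattern of reporting once, at the top level.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; none of these names come from Beeline's sources.
class DuplicateErrorDemo {
    // Collects every reported message so the number of reports can be checked.
    static final List<String> reported = new ArrayList<>();

    static void report(String msg) {
        reported.add(msg);
        System.out.println("Error: " + msg);
    }

    // Stand-in for a statement that fails on the server.
    static void execute() throws Exception {
        throw new Exception("Execution Error, return code 1");
    }

    // Before: the inner catch block reports the error and then rethrows it.
    static void innerBefore() throws Exception {
        try {
            execute();
        } catch (Exception e) {
            report(e.getMessage()); // first report
            throw e;                // the outer handler will report it again
        }
    }

    // After: the inner method lets the exception propagate without reporting.
    static void innerAfter() throws Exception {
        execute();
    }

    interface Step {
        void run() throws Exception;
    }

    // Top-level handler: the single place that should report the error.
    // Returns how many times the error was reported for this run.
    static int runTopLevel(Step step) {
        reported.clear();
        try {
            step.run();
        } catch (Exception e) {
            report(e.getMessage());
        }
        return reported.size();
    }

    public static void main(String[] args) {
        System.out.println("before: " + runTopLevel(DuplicateErrorDemo::innerBefore) + " report(s)");
        System.out.println("after: " + runTopLevel(DuplicateErrorDemo::innerAfter) + " report(s)");
    }
}
```

With the inner report removed, the user sees the message once while the 
exception still reaches the top-level handler.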


This addresses bug HIVE-4489.
https://issues.apache.org/jira/browse/HIVE-4489


Diffs
-

  beeline/src/java/org/apache/hive/beeline/Commands.java 8e2a52f 

Diff: https://reviews.apache.org/r/10917/diff/


Testing
---

Have done the tests.


Thanks,

Chaoyu Tang



[jira] [Commented] (HIVE-4474) Column access not tracked properly for partitioned tables

2013-05-03 Thread Gang Tim Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13648537#comment-13648537
 ] 

Gang Tim Liu commented on HIVE-4474:


Committed. Thanks, Samuel Yuan.

 Column access not tracked properly for partitioned tables
 -

 Key: HIVE-4474
 URL: https://issues.apache.org/jira/browse/HIVE-4474
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Samuel Yuan
Assignee: Samuel Yuan
 Attachments: HIVE-4474.1.patch.txt


 The columns recorded as being accessed are incorrect for partitioned tables. 
 The index of accessed columns is a position in the list of non-partition 
 columns, but a list of all columns is being used right now to do the lookup.



[jira] [Updated] (HIVE-3959) Update Partition Statistics in Metastore Layer

2013-05-03 Thread Gang Tim Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu updated HIVE-3959:
---

Attachment: (was: HIVE-3959.patch.9.txt)

 Update Partition Statistics in Metastore Layer
 --

 Key: HIVE-3959
 URL: https://issues.apache.org/jira/browse/HIVE-3959
 Project: Hive
  Issue Type: Improvement
  Components: Metastore, Statistics
Reporter: Bhushan Mandhani
Assignee: Gang Tim Liu
Priority: Minor
 Attachments: HIVE-3959.patch.1, HIVE-3959.patch.11.txt, 
 HIVE-3959.patch.12.txt, HIVE-3959.patch.2


 When partitions are created using queries (insert overwrite and insert 
 into) then the StatsTask updates all stats. However, when partitions are 
 added directly through metadata-only partitions (either CLI or direct calls 
 to Thrift Metastore) no stats are populated even if hive.stats.reliable is 
 set to true. This puts us in a situation where we can't decide if stats are 
 truly reliable or not.
 We propose that the fast stats (numFiles and totalSize) which don't require 
 a scan of the data should always be populated and be completely reliable. For 
 now we are still excluding rowCount and rawDataSize because that will make 
 these operations very expensive. Currently they are quick metadata-only ops.  
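
 As a rough illustration of why numFiles and totalSize are cheap, the Java 
 sketch below derives both from file metadata alone. It uses a local 
 directory as a stand-in for a partition location, which is an assumption 
 made only for this example; the class and method names (FastStatsDemo, 
 fastStats, demo) are hypothetical and are not Metastore APIs.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch: a local directory stands in for a partition location.
class FastStatsDemo {
    // Returns {numFiles, totalSize} from file metadata only; no file content is read.
    static long[] fastStats(Path partitionDir) {
        long numFiles = 0;
        long totalSize = 0;
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(partitionDir)) {
            for (Path p : entries) {
                if (Files.isRegularFile(p)) {
                    numFiles++;
                    totalSize += Files.size(p);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new long[] { numFiles, totalSize };
    }

    // Builds a throwaway "partition" holding files of 10 and 5 bytes and measures it.
    static long[] demo() {
        try {
            Path dir = Files.createTempDirectory("partition");
            Files.write(dir.resolve("part-0"), new byte[10]);
            Files.write(dir.resolve("part-1"), new byte[5]);
            return fastStats(dir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long[] stats = demo();
        System.out.println("numFiles=" + stats[0] + " totalSize=" + stats[1]);
    }
}
```

 rowCount and rawDataSize have no such shortcut, since they depend on the 
 contents of each file rather than on its directory entry.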



[jira] [Updated] (HIVE-3959) Update Partition Statistics in Metastore Layer

2013-05-03 Thread Gang Tim Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu updated HIVE-3959:
---

Attachment: HIVE-3959.patch.12.txt

 Update Partition Statistics in Metastore Layer
 --

 Key: HIVE-3959
 URL: https://issues.apache.org/jira/browse/HIVE-3959
 Project: Hive
  Issue Type: Improvement
  Components: Metastore, Statistics
Reporter: Bhushan Mandhani
Assignee: Gang Tim Liu
Priority: Minor
 Attachments: HIVE-3959.patch.1, HIVE-3959.patch.11.txt, 
 HIVE-3959.patch.12.txt, HIVE-3959.patch.2


 When partitions are created using queries (insert overwrite and insert 
 into) then the StatsTask updates all stats. However, when partitions are 
 added directly through metadata-only partitions (either CLI or direct calls 
 to Thrift Metastore) no stats are populated even if hive.stats.reliable is 
 set to true. This puts us in a situation where we can't decide if stats are 
 truly reliable or not.
 We propose that the fast stats (numFiles and totalSize) which don't require 
 a scan of the data should always be populated and be completely reliable. For 
 now we are still excluding rowCount and rawDataSize because that will make 
 these operations very expensive. Currently they are quick metadata-only ops.  



[jira] [Commented] (HIVE-4251) Indices can't be built on tables whose schema info comes from SerDe

2013-05-03 Thread Mark Wagner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13648665#comment-13648665
 ] 

Mark Wagner commented on HIVE-4251:
---

Hi Steven,

Indexing on the field of a record/struct isn't supported yet. That's also the 
case for other metadata like cluster, sort, and skew columns. I've been taking 
a look at that recently, and will open up a JIRA to discuss/track. I tried your 
second case and got the same issue as you. It seems to be an unrelated issue 
that is preventing group by using a struct as a key. These are both issues that 
affect all storage formats though, so we should discuss them in their own JIRAs.

Can you confirm that you're able to create indices on top level primitive 
columns of Avro tables with this patch?

Thanks,
Mark

 Indices can't be built on tables whose schema info comes from SerDe
 ---

 Key: HIVE-4251
 URL: https://issues.apache.org/jira/browse/HIVE-4251
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.10.0, 0.11.0, 0.10.1
Reporter: Mark Wagner
Assignee: Mark Wagner
 Fix For: 0.11.0, 0.10.1

 Attachments: HIVE-4251.1.patch, HIVE-4251.2.patch


 Building indices on tables that get their schema information from the 
 deserializer (e.g. Avro-backed tables) doesn't work because the check that 
 the column exists doesn't use the correct API.
 {code}
 hive> describe doctors;
 OK
 # col_name    data_type    comment

 number        int          from deserializer
 first_name    string       from deserializer
 last_name     string       from deserializer
 Time taken: 0.215 seconds, Fetched: 5 row(s)
 hive> create index doctors_index on table doctors(number) as 'compact' with 
 deferred rebuild; 
 FAILED: Error in metadata: java.lang.RuntimeException: Check the index 
 columns, they should appear in the table being indexed.
 FAILED: Execution Error, return code 1 from 
 org.apache.hadoop.hive.ql.exec.DDLTask
 {code}



[jira] [Commented] (HIVE-3746) TRowSet resultset structure should be column-oriented

2013-05-03 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13648667#comment-13648667
 ] 

Carl Steinbach commented on HIVE-3746:
--

bq. If an application has requested a single row, and the client has requested 
n rows from the server in an effort to reduce round trips, then n-1 intervening 
values from the first column must be cached off somewhere before the first 
value for the second column can be accessed.

If the fetch size is n, then the client is going to end up storing n rows in 
memory regardless of whether the result set is represented in a row-major or 
column-major format. Put another way, the unit of data transfer between the 
server and client is a variable sized resultset. The client has the option of 
setting the result size very low in order to achieve lower latency, or making 
it very large in order to get higher overall throughput. However, the key 
limitation is that the client is not able to provide access to any of the rows 
contained in a resultset until the entire resultset has been transferred from 
the server to the client. This limitation is a consequence of the fact that 
we're using a message oriented RPC layer (Thrift) to handle communication and 
data transfer between the client and server.
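
To make the memory argument concrete, here is a small Java sketch that holds 
the same batch in both layouts. It is only an illustration under the 
assumption that the whole batch has already arrived; the names 
(ResultSetLayoutDemo, toColumnMajor, rowAt, valueCount) are hypothetical and 
do not correspond to the TRowSet Thrift definitions.

```java
import java.util.Arrays;

// Hypothetical illustration of row-major versus column-major batches.
class ResultSetLayoutDemo {
    // Transposes a row-major batch (rows[r][c]) into a column-major batch (cols[c][r]).
    static Object[][] toColumnMajor(Object[][] rows, int numCols) {
        Object[][] cols = new Object[numCols][rows.length];
        for (int r = 0; r < rows.length; r++) {
            for (int c = 0; c < numCols; c++) {
                cols[c][r] = rows[r][c];
            }
        }
        return cols;
    }

    // Reassembles row r from a fully received column-major batch.
    static Object[] rowAt(Object[][] cols, int r) {
        Object[] row = new Object[cols.length];
        for (int c = 0; c < cols.length; c++) {
            row[c] = cols[c][r];
        }
        return row;
    }

    // Counts the values a batch holds in memory, whatever its layout.
    static int valueCount(Object[][] batch) {
        int n = 0;
        for (Object[] part : batch) {
            n += part.length;
        }
        return n;
    }

    public static void main(String[] args) {
        Object[][] rows = { {1, "a"}, {2, "b"}, {3, "c"} };
        Object[][] cols = toColumnMajor(rows, 2);
        System.out.println(Arrays.toString(rowAt(cols, 0)));
        System.out.println(valueCount(rows) + " values row-major, "
                + valueCount(cols) + " values column-major");
    }
}
```

Both layouts hold the same number of values for a fetch size of n, and once 
the batch is complete any row can be read by index from either one.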

 TRowSet resultset structure should be column-oriented
 -

 Key: HIVE-3746
 URL: https://issues.apache.org/jira/browse/HIVE-3746
 Project: Hive
  Issue Type: Sub-task
  Components: Server Infrastructure
Reporter: Carl Steinbach
Assignee: Carl Steinbach
  Labels: HiveServer2





[jira] [Created] (HIVE-4490) HS2 - 'select null ..' fails with NPE

2013-05-03 Thread Thejas M Nair (JIRA)
Thejas M Nair created HIVE-4490:
---

 Summary: HS2 - 'select null ..' fails with NPE
 Key: HIVE-4490
 URL: https://issues.apache.org/jira/browse/HIVE-4490
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Reporter: Thejas M Nair


E.g., from beeline:
{code}
 select null, i from t1 ;
Error: Error running query: java.lang.NullPointerException (state=,code=0)
Error: Error running query: java.lang.NullPointerException (state=,code=0)
{code}

In HS2 log
org.apache.hive.service.cli.HiveSQLException: Error running query: 
java.lang.NullPointerException
at 
org.apache.hive.service.cli.operation.SQLOperation.run(SQLOperation.java:113)
at 
org.apache.hive.service.cli.session.HiveSessionImpl.executeStatement(HiveSessionImpl.java:169)
at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:62)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1178)
at 
org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:524)
at 
org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:57)
at $Proxy8.executeStatement(Unknown Source)
at 
org.apache.hive.service.cli.CLIService.executeStatement(CLIService.java:148)
at 
org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:203)
at 
org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1133)
at 
org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1118)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at 
org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge20S.java:565)
at 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)




[jira] [Created] (HIVE-4491) Grouping by a struct throws an exception

2013-05-03 Thread Mark Wagner (JIRA)
Mark Wagner created HIVE-4491:
-

 Summary: Grouping by a struct throws an exception
 Key: HIVE-4491
 URL: https://issues.apache.org/jira/browse/HIVE-4491
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: Mark Wagner
Assignee: Mark Wagner


Queries that require a shuffle with a struct as the key result in an exception: 
{code}Caused by: java.lang.RuntimeException: Hash code on complex types not 
supported yet.
at 
org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.hashCode(ObjectInspectorUtils.java:528)
at 
org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:226)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:531)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:859)
at 
org.apache.hadoop.hive.ql.exec.GroupByOperator.forward(GroupByOperator.java:1066)
at 
org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1118)
... 13 more
{code}
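
For context, a shuffle needs a hash of the key to pick a reducer, which is 
why a struct key fails here. The Java sketch below shows one generic way to 
hash a struct by combining the hashes of its fields. It is not Hive's 
implementation; the names (StructHashDemo, structHash, partition) and the 
use of a List to model a struct are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; a List models a struct and nested Lists model nested structs.
class StructHashDemo {
    // Combines field hashes in order, recursing into nested structs.
    static int structHash(List<?> fields) {
        int h = 0;
        for (Object f : fields) {
            int fieldHash;
            if (f == null) {
                fieldHash = 0;
            } else if (f instanceof List) {
                fieldHash = structHash((List<?>) f);
            } else {
                fieldHash = f.hashCode();
            }
            h = 31 * h + fieldHash;
        }
        return h;
    }

    // Maps a struct key to a reducer index in [0, numReducers).
    static int partition(List<?> key, int numReducers) {
        return (structHash(key) & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        List<Object> key = Arrays.asList("x", 1, Arrays.asList(2, null));
        System.out.println("reducer " + partition(key, 4));
    }
}
```

Equal structs produce equal hashes, so rows with the same key reach the same 
reducer.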



[jira] [Assigned] (HIVE-4491) Grouping by a struct throws an exception

2013-05-03 Thread Mark Wagner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Wagner reassigned HIVE-4491:
-

Assignee: (was: Mark Wagner)

 Grouping by a struct throws an exception
 

 Key: HIVE-4491
 URL: https://issues.apache.org/jira/browse/HIVE-4491
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: Mark Wagner

 Queries that require a shuffle with a struct as the key result in an 
 exception: {code}Caused by: java.lang.RuntimeException: Hash code on complex 
 types not supported yet.
   at 
 org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.hashCode(ObjectInspectorUtils.java:528)
   at 
 org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:226)
   at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:531)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:859)
   at 
 org.apache.hadoop.hive.ql.exec.GroupByOperator.forward(GroupByOperator.java:1066)
   at 
 org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1118)
   ... 13 more
 {code}



[jira] [Updated] (HIVE-4491) Grouping by a struct throws an exception

2013-05-03 Thread Mark Wagner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Wagner updated HIVE-4491:
--

Attachment: demonstration.txt

A full demonstration, using the table created in the create_struct_table.q test.

 Grouping by a struct throws an exception
 

 Key: HIVE-4491
 URL: https://issues.apache.org/jira/browse/HIVE-4491
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: Mark Wagner
 Attachments: demonstration.txt


 Queries that require a shuffle with a struct as the key result in an 
 exception: {code}Caused by: java.lang.RuntimeException: Hash code on complex 
 types not supported yet.
   at 
 org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.hashCode(ObjectInspectorUtils.java:528)
   at 
 org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:226)
   at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:531)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:859)
   at 
 org.apache.hadoop.hive.ql.exec.GroupByOperator.forward(GroupByOperator.java:1066)
   at 
 org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1118)
   ... 13 more
 {code}



[jira] [Assigned] (HIVE-4491) Grouping by a struct throws an exception

2013-05-03 Thread Mark Wagner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Wagner reassigned HIVE-4491:
-

Assignee: Mark Wagner

 Grouping by a struct throws an exception
 

 Key: HIVE-4491
 URL: https://issues.apache.org/jira/browse/HIVE-4491
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: Mark Wagner
Assignee: Mark Wagner
 Attachments: demonstration.txt


 Queries that require a shuffle with a struct as the key result in an 
 exception: {code}Caused by: java.lang.RuntimeException: Hash code on complex 
 types not supported yet.
   at 
 org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.hashCode(ObjectInspectorUtils.java:528)
   at 
 org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:226)
   at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:531)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:859)
   at 
 org.apache.hadoop.hive.ql.exec.GroupByOperator.forward(GroupByOperator.java:1066)
   at 
 org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1118)
   ... 13 more
 {code}



[jira] [Resolved] (HIVE-4491) Grouping by a struct throws an exception

2013-05-03 Thread Mark Wagner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Wagner resolved HIVE-4491.
---

Resolution: Duplicate

My mistake. This is a duplicate of HIVE-2517

 Grouping by a struct throws an exception
 

 Key: HIVE-4491
 URL: https://issues.apache.org/jira/browse/HIVE-4491
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: Mark Wagner
Assignee: Mark Wagner
 Attachments: demonstration.txt


 Queries that require a shuffle with a struct as the key result in an 
 exception: {code}Caused by: java.lang.RuntimeException: Hash code on complex 
 types not supported yet.
   at 
 org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.hashCode(ObjectInspectorUtils.java:528)
   at 
 org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:226)
   at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:531)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:859)
   at 
 org.apache.hadoop.hive.ql.exec.GroupByOperator.forward(GroupByOperator.java:1066)
   at 
 org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1118)
   ... 13 more
 {code}



[jira] [Commented] (HIVE-4490) HS2 - 'select null ..' fails with NPE

2013-05-03 Thread Prasad Mujumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13648673#comment-13648673
 ] 

Prasad Mujumdar commented on HIVE-4490:
---

Looks like duplicate of HIVE-4172

 HS2 - 'select null ..' fails with NPE
 -

 Key: HIVE-4490
 URL: https://issues.apache.org/jira/browse/HIVE-4490
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Reporter: Thejas M Nair

 Eg, from beeline 
 {code}
  select null, i from t1 ;
 Error: Error running query: java.lang.NullPointerException (state=,code=0)
 Error: Error running query: java.lang.NullPointerException (state=,code=0)
 {code}
 In HS2 log
 org.apache.hive.service.cli.HiveSQLException: Error running query: 
 java.lang.NullPointerException
 at 
 org.apache.hive.service.cli.operation.SQLOperation.run(SQLOperation.java:113)
 at 
 org.apache.hive.service.cli.session.HiveSessionImpl.executeStatement(HiveSessionImpl.java:169)
 at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at 
 org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:62)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:396)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1178)
 at 
 org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:524)
 at 
 org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:57)
 at $Proxy8.executeStatement(Unknown Source)
 at 
 org.apache.hive.service.cli.CLIService.executeStatement(CLIService.java:148)
 at 
 org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:203)
 at 
 org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1133)
 at 
 org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1118)
 at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
 at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
 at 
 org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge20S.java:565)
 at 
 org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)



[jira] [Created] (HIVE-4492) Revert HIVE-4322

2013-05-03 Thread Samuel Yuan (JIRA)
Samuel Yuan created HIVE-4492:
-

 Summary: Revert HIVE-4322
 Key: HIVE-4492
 URL: https://issues.apache.org/jira/browse/HIVE-4492
 Project: Hive
  Issue Type: Bug
  Components: Metastore, Thrift API
Reporter: Samuel Yuan
Assignee: Samuel Yuan


See HIVE-4432 and HIVE-4433. It's possible to work around these issues but a 
better solution is probably to roll back the fix and change the API to use a 
primitive type as the map key (in a backwards-compatible manner).



[jira] [Commented] (HIVE-4433) Fix C++ Thrift bindings broken in HIVE-4322

2013-05-03 Thread Samuel Yuan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13648683#comment-13648683
 ] 

Samuel Yuan commented on HIVE-4433:
---

I'm thinking it's possible to work around this by defining '' since it's 
present in the auto-generated header file. Given that other language bindings 
might also have been broken by HIVE-4322 though it's probably better to change 
the map key to a primitive type instead. I have filed HIVE-4492 to revert the 
original change.

 Fix C++ Thrift bindings broken in HIVE-4322
 ---

 Key: HIVE-4433
 URL: https://issues.apache.org/jira/browse/HIVE-4433
 Project: Hive
  Issue Type: Bug
  Components: Metastore, Thrift API
Affects Versions: 0.12.0
Reporter: Carl Steinbach
Assignee: Samuel Yuan
Priority: Blocker
 Fix For: 0.12.0






[jira] [Created] (HIVE-4493) Implement filter for string column compared to string column

2013-05-03 Thread Eric Hanson (JIRA)
Eric Hanson created HIVE-4493:
-

 Summary: Implement filter for string column compared to string 
column
 Key: HIVE-4493
 URL: https://issues.apache.org/jira/browse/HIVE-4493
 Project: Hive
  Issue Type: Sub-task
Reporter: Eric Hanson
Assignee: Eric Hanson






[jira] [Updated] (HIVE-4493) Implement vectorized filter for string column compared to string column

2013-05-03 Thread Eric Hanson (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Hanson updated HIVE-4493:
--

Summary: Implement vectorized filter for string column compared to string 
column  (was: Implement filter for string column compared to string column)

 Implement vectorized filter for string column compared to string column
 ---

 Key: HIVE-4493
 URL: https://issues.apache.org/jira/browse/HIVE-4493
 Project: Hive
  Issue Type: Sub-task
Reporter: Eric Hanson
Assignee: Eric Hanson





[jira] [Created] (HIVE-4494) ORC map columns get class cast exception in some context

2013-05-03 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-4494:
---

 Summary: ORC map columns get class cast exception in some context
 Key: HIVE-4494
 URL: https://issues.apache.org/jira/browse/HIVE-4494
 Project: Hive
  Issue Type: Bug
Reporter: Owen O'Malley
Assignee: Owen O'Malley


Setting up the test case like:

{quote}
create table map_text (
  name string,
  m map<string,string>
) row format delimited
fields terminated by '|'
collection items terminated by ','
map keys terminated by ':';

create table map_orc (
  name string,
  m map<string,string>
) stored as orc;

cat map.txt
name1|key11:value11,key12:value12,key13:value13
name2|key21:value21,key22:value22,key23:value23
name3|key31:value31,key32:value32,key33:value33

load data local inpath 'map.txt' into table map_text;

insert overwrite table map_orc select * from map_text;
{quote}

Selecting the name column from map_orc throws the following exception:

{quote}
java.lang.RuntimeException: Error in configuring object
at 
org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
at 
org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
at 
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:431)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:371)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1195)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
... 9 more
Caused by: java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
... 14 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
... 17 more
Caused by: java.lang.RuntimeException: Map operator initialization failed
at org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:121)
... 22 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.hive.ql.io.orc.OrcStruct$OrcMapObjectInspector cannot be cast to org.apache.hadoop.hive.serde2.objectinspector.SettableMapObjectInspector
at org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:522)
at org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:90)
... 22 more
Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.ql.io.orc.OrcStruct$OrcMapObjectInspector cannot be cast to org.apache.hadoop.hive.serde2.objectinspector.SettableMapObjectInspector
at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.getConverter(ObjectInspectorConverters.java:144)
at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$StructConverter.init(ObjectInspectorConverters.java:307)
at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.getConverter(ObjectInspectorConverters.java:138)
at org.apache.hadoop.hive.ql.exec.MapOperator.initObjectInspector(MapOperator.java:270)
at org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:482)
... 23 more
{quote}



[jira] [Created] (HIVE-4495) Implement vectorized string substr

2013-05-03 Thread Timothy Chen (JIRA)
Timothy Chen created HIVE-4495:
--

 Summary: Implement vectorized string substr
 Key: HIVE-4495
 URL: https://issues.apache.org/jira/browse/HIVE-4495
 Project: Hive
  Issue Type: Sub-task
Reporter: Timothy Chen






[jira] [Updated] (HIVE-4495) Implement vectorized string substr

2013-05-03 Thread Eric Hanson (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Hanson updated HIVE-4495:
--

Assignee: Eric Hanson

 Implement vectorized string substr
 --

 Key: HIVE-4495
 URL: https://issues.apache.org/jira/browse/HIVE-4495
 Project: Hive
  Issue Type: Sub-task
Reporter: Timothy Chen
Assignee: Eric Hanson





[jira] [Commented] (HIVE-4194) JDBC2: HiveDriver should not throw RuntimeException when passed an invalid URL

2013-05-03 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13648844#comment-13648844
 ] 

Richard Ding commented on HIVE-4194:


[~cwsteinbach] Thejas is right about acceptsURL as part of the java.sql.Driver 
interface. I also prefer the simple change to fix this simple issue, and leave 
the package visibility changes to another JIRA. What do you think?

 JDBC2: HiveDriver should not throw RuntimeException when passed an invalid URL
 --

 Key: HIVE-4194
 URL: https://issues.apache.org/jira/browse/HIVE-4194
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2, JDBC
Affects Versions: 0.11.0
Reporter: Richard Ding
Assignee: Richard Ding
 Fix For: 0.11.0

 Attachments: HIVE-4194.patch


 As per the JDBC 3.0 spec (section 9.2):
 If the Driver implementation understands the URL, it will return a
 Connection object; otherwise it returns null.
 Currently the HiveConnection constructor throws IllegalArgumentException if
 the URL string doesn't start with jdbc:hive2. HiveDriver.connect should
 catch this exception and return null.
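
 The behavior requested above can be sketched as follows. This is a minimal
 illustration, not the attached patch: the class name NullOnUnknownUrlDriver
 and the openConnection helper are hypothetical, and the sketch checks the URL
 prefix up front instead of catching IllegalArgumentException. Only the
 jdbc:hive2 prefix comes from the issue text.

```java
import java.sql.Connection;
import java.sql.Driver;
import java.sql.DriverPropertyInfo;
import java.sql.SQLException;
import java.sql.SQLFeatureNotSupportedException;
import java.util.Properties;
import java.util.logging.Logger;

/** Illustrative driver; the class name and openConnection helper are hypothetical. */
final class NullOnUnknownUrlDriver implements Driver {
    // URL prefix taken from the issue description.
    private static final String URL_PREFIX = "jdbc:hive2";

    @Override
    public boolean acceptsURL(String url) {
        return url != null && url.startsWith(URL_PREFIX);
    }

    @Override
    public Connection connect(String url, Properties info) throws SQLException {
        // Return null, rather than throwing, for URLs meant for another driver.
        if (!acceptsURL(url)) {
            return null;
        }
        return openConnection(url, info);
    }

    // Placeholder: a real driver would open a network connection here.
    private Connection openConnection(String url, Properties info) throws SQLException {
        throw new SQLException("No server available in this sketch: " + url);
    }

    @Override
    public DriverPropertyInfo[] getPropertyInfo(String url, Properties info) {
        return new DriverPropertyInfo[0];
    }

    @Override
    public int getMajorVersion() { return 0; }

    @Override
    public int getMinorVersion() { return 0; }

    @Override
    public boolean jdbcCompliant() { return false; }

    @Override
    public Logger getParentLogger() throws SQLFeatureNotSupportedException {
        throw new SQLFeatureNotSupportedException("Method not supported");
    }
}
```

 Returning null for a foreign URL lets java.sql.DriverManager move on to the
 next registered driver, which is why a RuntimeException from connect is a
 problem in practice.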



[jira] [Updated] (HIVE-4489) beeline always return the same error message twice

2013-05-03 Thread Chaoyu Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chaoyu Tang updated HIVE-4489:
--

Attachment: (was: HIVE-4489.patch)

 beeline always return the same error message twice
 --

 Key: HIVE-4489
 URL: https://issues.apache.org/jira/browse/HIVE-4489
 Project: Hive
  Issue Type: Bug
  Components: Clients
Affects Versions: 0.10.0
Reporter: Chaoyu Tang
Priority: Minor
  Labels: newbie
 Attachments: HIVE-4489.patch

   Original Estimate: 0h
  Remaining Estimate: 0h

 Beeline always returns the same error message twice. For example, if I try to
 create a table a2 that already exists, it prints the exact same message twice,
 which is not user-friendly.
 {code}
 beeline !connect jdbc:hive2://localhost:1 scott tiger 
 org.apache.hive.jdbc.HiveDriver
 Connecting to jdbc:hive2://localhost:1
 Connected to: Hive (version 0.10.0)
 Driver: Hive (version 0.10.0-cdh4.2.1)
 Transaction isolation: TRANSACTION_REPEATABLE_READ
 0: jdbc:hive2://localhost:1 create table a2 (value int);
 Error: Error while processing statement: FAILED: Execution Error, return code 
 1 from org.apache.hadoop.hive.ql.exec.DDLTask (state=08S01,code=1)
 Error: Error while processing statement: FAILED: Execution Error, return code 
 1 from org.apache.hadoop.hive.ql.exec.DDLTask (state=08S01,code=1)
 {code}



[jira] [Updated] (HIVE-4489) beeline always return the same error message twice

2013-05-03 Thread Chaoyu Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chaoyu Tang updated HIVE-4489:
--

Attachment: HIVE-4489.patch

 beeline always return the same error message twice
 --

 Key: HIVE-4489
 URL: https://issues.apache.org/jira/browse/HIVE-4489
 Project: Hive
  Issue Type: Bug
  Components: Clients
Affects Versions: 0.10.0
Reporter: Chaoyu Tang
Priority: Minor
  Labels: newbie
 Attachments: HIVE-4489.patch

   Original Estimate: 0h
  Remaining Estimate: 0h

 Beeline always returns the same error message twice. For example, if I try to
 create a table a2 that already exists, it prints the exact same message twice,
 which is not user-friendly.
 {code}
 beeline !connect jdbc:hive2://localhost:1 scott tiger 
 org.apache.hive.jdbc.HiveDriver
 Connecting to jdbc:hive2://localhost:1
 Connected to: Hive (version 0.10.0)
 Driver: Hive (version 0.10.0-cdh4.2.1)
 Transaction isolation: TRANSACTION_REPEATABLE_READ
 0: jdbc:hive2://localhost:1 create table a2 (value int);
 Error: Error while processing statement: FAILED: Execution Error, return code 
 1 from org.apache.hadoop.hive.ql.exec.DDLTask (state=08S01,code=1)
 Error: Error while processing statement: FAILED: Execution Error, return code 
 1 from org.apache.hadoop.hive.ql.exec.DDLTask (state=08S01,code=1)
 {code}



[jira] [Updated] (HIVE-4492) Revert HIVE-4322

2013-05-03 Thread Samuel Yuan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Samuel Yuan updated HIVE-4492:
--

Attachment: HIVE-4492.1.patch.txt

 Revert HIVE-4322
 

 Key: HIVE-4492
 URL: https://issues.apache.org/jira/browse/HIVE-4492
 Project: Hive
  Issue Type: Bug
  Components: Metastore, Thrift API
Reporter: Samuel Yuan
Assignee: Samuel Yuan
 Attachments: HIVE-4492.1.patch.txt


 See HIVE-4432 and HIVE-4433. It's possible to work around these issues but a 
 better solution is probably to roll back the fix and change the API to use 
 a primitive type as the map key (in a backwards-compatible manner).



[jira] [Commented] (HIVE-4194) JDBC2: HiveDriver should not throw RuntimeException when passed an invalid URL

2013-05-03 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13648869#comment-13648869
 ] 

Carl Steinbach commented on HIVE-4194:
--

Sounds good to me. +1.

I don't have access to a build farm so I'll leave the testing and commit work 
to someone else.

 JDBC2: HiveDriver should not throw RuntimeException when passed an invalid URL
 --

 Key: HIVE-4194
 URL: https://issues.apache.org/jira/browse/HIVE-4194
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2, JDBC
Affects Versions: 0.11.0
Reporter: Richard Ding
Assignee: Richard Ding
 Fix For: 0.11.0

 Attachments: HIVE-4194.patch


 As per the JDBC 3.0 spec (section 9.2):
 If the Driver implementation understands the URL, it will return a
 Connection object; otherwise it returns null.
 Currently the HiveConnection constructor throws IllegalArgumentException if
 the URL string doesn't start with jdbc:hive2. HiveDriver.connect should
 catch this exception and return null.



[jira] [Work started] (HIVE-3959) Update Partition Statistics in Metastore Layer

2013-05-03 Thread Gang Tim Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-3959 started by Gang Tim Liu.

 Update Partition Statistics in Metastore Layer
 --

 Key: HIVE-3959
 URL: https://issues.apache.org/jira/browse/HIVE-3959
 Project: Hive
  Issue Type: Improvement
  Components: Metastore, Statistics
Reporter: Bhushan Mandhani
Assignee: Gang Tim Liu
Priority: Minor
 Attachments: HIVE-3959.patch.1, HIVE-3959.patch.11.txt, 
 HIVE-3959.patch.12.txt, HIVE-3959.patch.2


 When partitions are created using queries (insert overwrite and insert 
 into) then the StatsTask updates all stats. However, when partitions are 
 added directly through metadata-only operations (either CLI or direct calls
 to Thrift Metastore) no stats are populated even if hive.stats.reliable is 
 set to true. This puts us in a situation where we can't decide if stats are 
 truly reliable or not.
 We propose that the fast stats (numFiles and totalSize) which don't require 
 a scan of the data should always be populated and be completely reliable. For 
 now we are still excluding rowCount and rawDataSize because that will make 
 these operations very expensive. Currently they are quick metadata-only ops.  
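
 A minimal sketch of why the fast stats are cheap: both can be derived from a
 directory listing without opening any data file. The class FastStatsSketch is
 hypothetical, and the local filesystem (java.nio.file) stands in for the
 Hadoop FileSystem API that the metastore would actually use; this is not the
 attached patch.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical holder for the two "fast" partition stats named in the issue. */
final class FastStatsSketch {
    final long numFiles;
    final long totalSize;

    private FastStatsSketch(long numFiles, long totalSize) {
        this.numFiles = numFiles;
        this.totalSize = totalSize;
    }

    /** Computes numFiles and totalSize from file metadata only; no file is read. */
    static FastStatsSketch of(Path partitionDir) throws IOException {
        long files = 0;
        long bytes = 0;
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(partitionDir)) {
            for (Path entry : entries) {
                if (Files.isRegularFile(entry)) {
                    files++;
                    bytes += Files.size(entry);
                }
            }
        }
        return new FastStatsSketch(files, bytes);
    }
}
```

 rowCount and rawDataSize have no equivalent shortcut, since they require
 reading the rows themselves.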



[jira] [Updated] (HIVE-3959) Update Partition Statistics in Metastore Layer

2013-05-03 Thread Gang Tim Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu updated HIVE-3959:
---

Status: Patch Available  (was: In Progress)

 Update Partition Statistics in Metastore Layer
 --

 Key: HIVE-3959
 URL: https://issues.apache.org/jira/browse/HIVE-3959
 Project: Hive
  Issue Type: Improvement
  Components: Metastore, Statistics
Reporter: Bhushan Mandhani
Assignee: Gang Tim Liu
Priority: Minor
 Attachments: HIVE-3959.patch.1, HIVE-3959.patch.11.txt, 
 HIVE-3959.patch.12.txt, HIVE-3959.patch.2


 When partitions are created using queries (insert overwrite and insert 
 into) then the StatsTask updates all stats. However, when partitions are 
 added directly through metadata-only operations (either CLI or direct calls
 to Thrift Metastore) no stats are populated even if hive.stats.reliable is 
 set to true. This puts us in a situation where we can't decide if stats are 
 truly reliable or not.
 We propose that the fast stats (numFiles and totalSize) which don't require 
 a scan of the data should always be populated and be completely reliable. For 
 now we are still excluding rowCount and rawDataSize because that will make 
 these operations very expensive. Currently they are quick metadata-only ops.  



[jira] [Created] (HIVE-4496) JDBC2 won't compile with JDK7

2013-05-03 Thread Chris Drome (JIRA)
Chris Drome created HIVE-4496:
-

 Summary: JDBC2 won't compile with JDK7
 Key: HIVE-4496
 URL: https://issues.apache.org/jira/browse/HIVE-4496
 Project: Hive
  Issue Type: Bug
  Components: JDBC
Affects Versions: 0.12.0
Reporter: Chris Drome
Assignee: Chris Drome


HiveServer2 related JDBC does not compile with JDK7. Related to HIVE-3384.



[jira] [Commented] (HIVE-3384) HIVE JDBC module won't compile under JDK1.7 as new methods added in JDBC specification

2013-05-03 Thread Chris Drome (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13648913#comment-13648913
 ] 

Chris Drome commented on HIVE-3384:
---

The error is not related to this patch. Rather it is associated with new code 
added in 0.11.

Please refer to HIVE-4496.

 HIVE JDBC module won't compile under JDK1.7 as new methods added in JDBC 
 specification
 --

 Key: HIVE-3384
 URL: https://issues.apache.org/jira/browse/HIVE-3384
 Project: Hive
  Issue Type: Bug
  Components: JDBC
Affects Versions: 0.10.0
Reporter: Weidong Bian
Assignee: Chris Drome
Priority: Minor
 Fix For: 0.11.0

 Attachments: D6873-0.9.1.patch, D6873.1.patch, D6873.2.patch, 
 D6873.3.patch, D6873.4.patch, D6873.5.patch, D6873.6.patch, D6873.7.patch, 
 HIVE-3384-0.10.patch, HIVE-3384-2012-12-02.patch, HIVE-3384-2012-12-04.patch, 
 HIVE-3384.2.patch, HIVE-3384-branch-0.9.patch, HIVE-3384.patch, 
 HIVE-JDK7-JDBC.patch


 The jdbc module can't be compiled with JDK 7 because JDK 7 adds new abstract
 methods to the JDBC specification.
 Some error info:
  error: HiveCallableStatement is not abstract and does not override abstract
 method <T>getObject(String,Class<T>) in CallableStatement
 .
 .
 .



[jira] [Created] (HIVE-4497) beeline module tests don't get run by default

2013-05-03 Thread Thejas M Nair (JIRA)
Thejas M Nair created HIVE-4497:
---

 Summary: beeline module tests don't get run by default
 Key: HIVE-4497
 URL: https://issues.apache.org/jira/browse/HIVE-4497
 Project: Hive
  Issue Type: Bug
  Components: CLI, HiveServer2
Reporter: Thejas M Nair
Assignee: Thejas M Nair


beeline tests are not getting run by default.
See 
https://builds.apache.org/job/Hive-trunk-h0.21/lastCompletedBuild/testReport/




[jira] [Updated] (HIVE-4497) beeline module tests don't get run by default

2013-05-03 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-4497:


Attachment: HIVE-4497.1.patch

HIVE-4497.1.patch - adds beeline to iterate.hive.tests in build.properties


 beeline module tests don't get run by default
 -

 Key: HIVE-4497
 URL: https://issues.apache.org/jira/browse/HIVE-4497
 Project: Hive
  Issue Type: Bug
  Components: CLI, HiveServer2
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Attachments: HIVE-4497.1.patch


 beeline tests are not getting run by default.
 See 
 https://builds.apache.org/job/Hive-trunk-h0.21/lastCompletedBuild/testReport/



[jira] [Updated] (HIVE-4496) JDBC2 won't compile with JDK7

2013-05-03 Thread Chris Drome (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Drome updated HIVE-4496:
--

Attachment: HIVE-4496.patch

Attached trunk patch.

 JDBC2 won't compile with JDK7
 -

 Key: HIVE-4496
 URL: https://issues.apache.org/jira/browse/HIVE-4496
 Project: Hive
  Issue Type: Bug
  Components: JDBC
Affects Versions: 0.12.0
Reporter: Chris Drome
Assignee: Chris Drome
 Attachments: HIVE-4496.patch


 HiveServer2 related JDBC does not compile with JDK7. Related to HIVE-3384.



[jira] [Commented] (HIVE-4496) JDBC2 won't compile with JDK7

2013-05-03 Thread Chris Drome (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13648953#comment-13648953
 ] 

Chris Drome commented on HIVE-4496:
---

Phabricator ticket: https://reviews.facebook.net/D10647

 JDBC2 won't compile with JDK7
 -

 Key: HIVE-4496
 URL: https://issues.apache.org/jira/browse/HIVE-4496
 Project: Hive
  Issue Type: Bug
  Components: JDBC
Affects Versions: 0.12.0
Reporter: Chris Drome
Assignee: Chris Drome
 Attachments: HIVE-4496.patch


 HiveServer2 related JDBC does not compile with JDK7. Related to HIVE-3384.



[jira] [Updated] (HIVE-4496) JDBC2 won't compile with JDK7

2013-05-03 Thread Chris Drome (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Drome updated HIVE-4496:
--

Fix Version/s: 0.12.0
   Status: Patch Available  (was: Open)

Ported the HIVE-3384 patch to the HS2 JDBC code.

 JDBC2 won't compile with JDK7
 -

 Key: HIVE-4496
 URL: https://issues.apache.org/jira/browse/HIVE-4496
 Project: Hive
  Issue Type: Bug
  Components: JDBC
Affects Versions: 0.12.0
Reporter: Chris Drome
Assignee: Chris Drome
 Fix For: 0.12.0

 Attachments: HIVE-4496.patch


 HiveServer2 related JDBC does not compile with JDK7. Related to HIVE-3384.



[jira] [Created] (HIVE-4498) TestBeeLineWithArgs.testPositiveScriptFile fails

2013-05-03 Thread Thejas M Nair (JIRA)
Thejas M Nair created HIVE-4498:
---

 Summary: TestBeeLineWithArgs.testPositiveScriptFile fails
 Key: HIVE-4498
 URL: https://issues.apache.org/jira/browse/HIVE-4498
 Project: Hive
  Issue Type: Bug
  Components: CLI, HiveServer2, JDBC
Reporter: Thejas M Nair


TestBeeLineWithArgs.testPositiveScriptFile fails -
{code}
   [junit] 0: jdbc:hive2://localhost:1  STARTED 
testBreakOnErrorScriptFile
[junit] Output: Connecting to jdbc:hive2://localhost:1
[junit] Connected to: Hive (version 0.12.0-SNAPSHOT)
[junit] Driver: Hive (version 0.12.0-SNAPSHOT)
[junit] Transaction isolation: TRANSACTION_REPEATABLE_READ
[junit] Beeline version 0.12.0-SNAPSHOT by Apache Hive
[junit] ++
[junit] | database_name  |
[junit] ++
[junit] ++
[junit] No rows selected (0.899 seconds)
[junit] Closing: org.apache.hive.jdbc.HiveConnection
[junit]
[junit]  FAILED testPositiveScriptFile (ERROR) (2s)

{code}



[jira] [Updated] (HIVE-4497) beeline module tests don't get run by default

2013-05-03 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-4497:


Status: Patch Available  (was: Open)

 beeline module tests don't get run by default
 -

 Key: HIVE-4497
 URL: https://issues.apache.org/jira/browse/HIVE-4497
 Project: Hive
  Issue Type: Bug
  Components: CLI, HiveServer2
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Attachments: HIVE-4497.1.patch


 beeline tests are not getting run by default.
 See 
 https://builds.apache.org/job/Hive-trunk-h0.21/lastCompletedBuild/testReport/



[jira] [Commented] (HIVE-4497) beeline module tests don't get run by default

2013-05-03 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13648965#comment-13648965
 ] 

Carl Steinbach commented on HIVE-4497:
--

This is a duplicate of HIVE-4357 (which I never got around to testing), but 
that patch positions beeline after ql, and it makes more sense to run it after 
jdbc as is done here.

+1 (someone else needs to test and commit since I don't have a build farm).

 beeline module tests don't get run by default
 -

 Key: HIVE-4497
 URL: https://issues.apache.org/jira/browse/HIVE-4497
 Project: Hive
  Issue Type: Bug
  Components: CLI, HiveServer2
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Attachments: HIVE-4497.1.patch


 beeline tests are not getting run by default.
 See 
 https://builds.apache.org/job/Hive-trunk-h0.21/lastCompletedBuild/testReport/



[jira] [Updated] (HIVE-4485) beeline prints null as empty strings

2013-05-03 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-4485:


Attachment: HIVE-4485.2.patch

Making the null string configurable is probably over-engineering at this point.
HIVE-4485.2.patch - Simpler patch that does not make it configurable.


 beeline prints null as empty strings
 

 Key: HIVE-4485
 URL: https://issues.apache.org/jira/browse/HIVE-4485
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Attachments: HIVE-4485.1.patch, HIVE-4485.2.patch


 beeline is printing nulls as empty strings.
 This is inconsistent with the Hive CLI and other databases, which print null
 as the string NULL.
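
 The expected output can be sketched as below. NullRenderingSketch and its
 methods are hypothetical names for illustration, not Beeline's actual classes
 or the attached patch; the only detail taken from the issue is that a SQL
 null should print as the string NULL, which keeps it distinguishable from an
 empty string.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper showing null-vs-empty rendering for a result row. */
final class NullRenderingSketch {
    // Fixed text, matching the simpler, non-configurable approach.
    static final String NULL_TEXT = "NULL";

    /** Renders one column value; null becomes "NULL", everything else toString(). */
    static String render(Object value) {
        return value == null ? NULL_TEXT : value.toString();
    }

    /** Renders every column of a row, preserving column order. */
    static List<String> renderRow(List<?> row) {
        List<String> out = new ArrayList<>();
        for (Object value : row) {
            out.add(render(value));
        }
        return out;
    }
}
```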



[jira] [Commented] (HIVE-4435) Column stats: Distinct value estimator should use hash functions that are pairwise independent

2013-05-03 Thread Shreepadma Venugopalan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13648987#comment-13648987
 ] 

Shreepadma Venugopalan commented on HIVE-4435:
--

Can a committer take a look at this?

 Column stats: Distinct value estimator should use hash functions that are 
 pairwise independent
 --

 Key: HIVE-4435
 URL: https://issues.apache.org/jira/browse/HIVE-4435
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Affects Versions: 0.10.0
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
 Attachments: chart_1(1).png, HIVE-4435.1.patch


 The current implementation of the Flajolet-Martin estimator for estimating the
 number of distinct values doesn't use hash functions that are pairwise
 independent. This is problematic because the input values don't distribute 
 uniformly. When run on large TPC-H data sets, this leads to a huge 
 discrepancy for primary key columns. Primary key columns are typically a 
 monotonically increasing sequence.
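
 For readers unfamiliar with the term, a textbook pairwise-independent family
 is h(x) = (a*x + b) mod p, with p prime and a, b drawn at random from
 [0, p-1]. The sketch below illustrates that construction only; it is not the
 code in HIVE-4435.1.patch, and the class name, the choice of p = 2^31 - 1 and
 the rank helper are assumptions made for the example.

```java
import java.util.Random;

/** Textbook (a*x + b) mod p hash; names and the prime are illustrative choices. */
final class PairwiseHashSketch {
    // Mersenne prime 2^31 - 1; small enough that a * x cannot overflow a long.
    static final long P = 2147483647L;

    private final long a;
    private final long b;

    /** Coefficients are reduced into [0, P-1]. */
    PairwiseHashSketch(long a, long b) {
        this.a = Math.floorMod(a, P);
        this.b = Math.floorMod(b, P);
    }

    /** Draws approximately uniform coefficients from the given generator. */
    static PairwiseHashSketch random(Random rng) {
        return new PairwiseHashSketch(rng.nextLong(), rng.nextLong());
    }

    /** Maps x to [0, P-1]; x is first reduced mod P so negatives are handled. */
    long hash(long x) {
        long xm = Math.floorMod(x, P);
        return (a * xm + b) % P;
    }

    /** Flajolet-Martin style rank: index of the lowest set bit (64 for zero). */
    static int rank(long hashValue) {
        return Long.numberOfTrailingZeros(hashValue);
    }
}
```

 With such a family, a monotonically increasing key column is scattered across
 the hash range instead of keeping its sequential structure, which is the
 property the estimator relies on.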



[jira] [Updated] (HIVE-4497) beeline module tests don't get run by default

2013-05-03 Thread Rob Weltman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rob Weltman updated HIVE-4497:
--

Resolution: Duplicate
Status: Resolved  (was: Patch Available)

Patch available at https://issues.apache.org/jira/browse/HIVE-4357


 beeline module tests don't get run by default
 -

 Key: HIVE-4497
 URL: https://issues.apache.org/jira/browse/HIVE-4497
 Project: Hive
  Issue Type: Bug
  Components: CLI, HiveServer2
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Attachments: HIVE-4497.1.patch


 beeline tests are not getting run by default.
 See 
 https://builds.apache.org/job/Hive-trunk-h0.21/lastCompletedBuild/testReport/
