[jira] [Updated] (HIVE-4485) beeline prints null as empty strings
[ https://issues.apache.org/jira/browse/HIVE-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-4485: Attachment: HIVE-4485.1.patch HIVE-4485.1.patch - initial patch. Makes the null string configurable. Test needs fixing/improvement. beeline prints null as empty strings Key: HIVE-4485 URL: https://issues.apache.org/jira/browse/HIVE-4485 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-4485.1.patch beeline is printing nulls as empty strings. This is inconsistent with the Hive CLI and other databases, which print null as the string NULL. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
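The patch note above says the fix direction is to make the null placeholder configurable. A minimal standalone sketch of that idea follows; the class and method names are illustrative, not Beeline's actual API:

```java
public class NullDisplayDemo {
    // Hypothetical formatter: render SQL NULL as a configurable string
    // instead of hard-coding the empty string.
    static String render(Object value, String nullString) {
        return value == null ? nullString : value.toString();
    }

    public static void main(String[] args) {
        System.out.println(render(null, ""));     // old beeline behavior: empty string
        System.out.println(render(null, "NULL")); // CLI-consistent behavior
        System.out.println(render(42, "NULL"));   // non-null values are unaffected
    }
}
```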
[jira] [Updated] (HIVE-4486) FetchOperator slows down SMB map joins by 50% when there are many partitions
[ https://issues.apache.org/jira/browse/HIVE-4486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-4486: -- Summary: FetchOperator slows down SMB map joins by 50% when there are many partitions (was: FetchOperator slows down SMB map joins with many files) FetchOperator slows down SMB map joins by 50% when there are many partitions Key: HIVE-4486 URL: https://issues.apache.org/jira/browse/HIVE-4486 Project: Hive Issue Type: Bug Components: Query Processor Environment: Ubuntu LXC 12.10 Reporter: Gopal V Priority: Minor While looking at log files for SMB joins in hive, it was noticed that the actual join op didn't show up as a significant fraction of the time spent. Most of the time was spent parsing configuration files. To confirm, I put log lines in the HiveConf constructor and eventually made the following edit to the code {code} --- ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java +++ ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java @@ -648,8 +648,7 @@ public ObjectInspector getOutputObjectInspector() throws HiveException { * @return list of file status entries */ private FileStatus[] listStatusUnderPath(FileSystem fs, Path p) throws IOException { -HiveConf hiveConf = new HiveConf(job, FetchOperator.class); -boolean recursive = hiveConf.getBoolVar(HiveConf.ConfVars.HADOOPMAPREDINPUTDIRRECURSIVE); +boolean recursive = false; if (!recursive) { return fs.listStatus(p); } {code} And re-ran my query to compare timings. ||Before||After|| |Cumulative CPU| 731.07 sec|386.0 sec| |Total time | 347.66 seconds | 218.855 seconds | | The query used was {code}INSERT OVERWRITE LOCAL DIRECTORY '/grid/0/smb/' select inv_item_sk from inventory inv join store_sales ss on (ss.ss_item_sk = inv.inv_item_sk) limit 10 ; {code} On a scale=2 tpcds data-set, where both store_sales inventory are bucketed into 4 buckets, with store_sales split into 7 partitions and inventory into 261 partitions. 
78% of all CPU time was spent within new HiveConf(). The YourKit profiler runs are attached.
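The measurements above point at repeated HiveConf construction, not the join itself. One obvious remedy (a sketch only, not the committed Hive fix) is to build the expensive configuration object once and reuse it across partitions, rather than reconstructing it inside listStatusUnderPath on every call:

```java
// Sketch: hoist an expensive config construction out of a per-partition
// hot path by caching it in a field. The Conf class stands in for
// HiveConf, whose constructor parses configuration XML files.
public class CachedConfDemo {
    static int constructions = 0;

    static class Conf {
        final boolean recursive;
        Conf() { constructions++; recursive = false; }
    }

    private Conf cachedConf; // built once, reused for every partition

    boolean isRecursive() {
        if (cachedConf == null) {
            cachedConf = new Conf();
        }
        return cachedConf.recursive;
    }

    public static void main(String[] args) {
        CachedConfDemo op = new CachedConfDemo();
        for (int i = 0; i < 261; i++) { // one call per partition, as in the report
            op.isRecursive();
        }
        System.out.println(constructions); // the config is constructed only once
    }
}
```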
[jira] [Created] (HIVE-4486) FetchOperator slows down SMB map joins with many files
Gopal V created HIVE-4486: - Summary: FetchOperator slows down SMB map joins with many files Key: HIVE-4486 URL: https://issues.apache.org/jira/browse/HIVE-4486 Project: Hive Issue Type: Bug Components: Query Processor Environment: Ubuntu LXC 12.10 Reporter: Gopal V Priority: Minor While looking at log files for SMB joins in hive, it was noticed that the actual join op didn't show up as a significant fraction of the time spent. Most of the time was spent parsing configuration files. To confirm, I put log lines in the HiveConf constructor and eventually made the following edit to the code {code} --- ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java +++ ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java @@ -648,8 +648,7 @@ public ObjectInspector getOutputObjectInspector() throws HiveException { * @return list of file status entries */ private FileStatus[] listStatusUnderPath(FileSystem fs, Path p) throws IOException { -HiveConf hiveConf = new HiveConf(job, FetchOperator.class); -boolean recursive = hiveConf.getBoolVar(HiveConf.ConfVars.HADOOPMAPREDINPUTDIRRECURSIVE); +boolean recursive = false; if (!recursive) { return fs.listStatus(p); } {code} And re-ran my query to compare timings. ||Before||After|| |Cumulative CPU| 731.07 sec|386.0 sec| |Total time | 347.66 seconds | 218.855 seconds | | The query used was {code}INSERT OVERWRITE LOCAL DIRECTORY '/grid/0/smb/' select inv_item_sk from inventory inv join store_sales ss on (ss.ss_item_sk = inv.inv_item_sk) limit 10 ; {code} On a scale=2 tpcds data-set, where both store_sales inventory are bucketed into 4 buckets, with store_sales split into 7 partitions and inventory into 261 partitions. 78% of all CPU time was spent within new HiveConf(). The yourkit profiler runs are attached. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4486) FetchOperator slows down SMB map joins by 50% when there are many partitions
[ https://issues.apache.org/jira/browse/HIVE-4486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-4486: -- Attachment: smb-profile.html attach yourkit profile (HTML) FetchOperator slows down SMB map joins by 50% when there are many partitions Key: HIVE-4486 URL: https://issues.apache.org/jira/browse/HIVE-4486 Project: Hive Issue Type: Bug Components: Query Processor Environment: Ubuntu LXC 12.10 Reporter: Gopal V Priority: Minor Attachments: smb-profile.html While looking at log files for SMB joins in hive, it was noticed that the actual join op didn't show up as a significant fraction of the time spent. Most of the time was spent parsing configuration files. To confirm, I put log lines in the HiveConf constructor and eventually made the following edit to the code {code} --- ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java +++ ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java @@ -648,8 +648,7 @@ public ObjectInspector getOutputObjectInspector() throws HiveException { * @return list of file status entries */ private FileStatus[] listStatusUnderPath(FileSystem fs, Path p) throws IOException { -HiveConf hiveConf = new HiveConf(job, FetchOperator.class); -boolean recursive = hiveConf.getBoolVar(HiveConf.ConfVars.HADOOPMAPREDINPUTDIRRECURSIVE); +boolean recursive = false; if (!recursive) { return fs.listStatus(p); } {code} And re-ran my query to compare timings. ||Before||After|| |Cumulative CPU| 731.07 sec|386.0 sec| |Total time | 347.66 seconds | 218.855 seconds | | The query used was {code}INSERT OVERWRITE LOCAL DIRECTORY '/grid/0/smb/' select inv_item_sk from inventory inv join store_sales ss on (ss.ss_item_sk = inv.inv_item_sk) limit 10 ; {code} On a scale=2 tpcds data-set, where both store_sales inventory are bucketed into 4 buckets, with store_sales split into 7 partitions and inventory into 261 partitions. 78% of all CPU time was spent within new HiveConf(). The yourkit profiler runs are attached. 
[jira] [Updated] (HIVE-4486) FetchOperator slows down SMB map joins by 50% when there are many partitions
[ https://issues.apache.org/jira/browse/HIVE-4486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-4486: -- Description: While looking at log files for SMB joins in hive, it was noticed that the actual join op didn't show up as a significant fraction of the time spent. Most of the time was spent parsing configuration files. To confirm, I put log lines in the HiveConf constructor and eventually made the following edit to the code {code} --- ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java +++ ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java @@ -648,8 +648,7 @@ public ObjectInspector getOutputObjectInspector() throws HiveException { * @return list of file status entries */ private FileStatus[] listStatusUnderPath(FileSystem fs, Path p) throws IOException { -HiveConf hiveConf = new HiveConf(job, FetchOperator.class); -boolean recursive = hiveConf.getBoolVar(HiveConf.ConfVars.HADOOPMAPREDINPUTDIRRECURSIVE); +boolean recursive = false; if (!recursive) { return fs.listStatus(p); } {code} And re-ran my query to compare timings. || ||Before||After|| |Cumulative CPU| 731.07 sec|386.0 sec| |Total time | 347.66 seconds | 218.855 seconds | | The query used was {code}INSERT OVERWRITE LOCAL DIRECTORY '/grid/0/smb/' select inv_item_sk from inventory inv join store_sales ss on (ss.ss_item_sk = inv.inv_item_sk) limit 10 ; {code} On a scale=2 tpcds data-set, where both store_sales inventory are bucketed into 4 buckets, with store_sales split into 7 partitions and inventory into 261 partitions. 78% of all CPU time was spent within new HiveConf(). The yourkit profiler runs are attached. was: While looking at log files for SMB joins in hive, it was noticed that the actual join op didn't show up as a significant fraction of the time spent. Most of the time was spent parsing configuration files. 
To confirm, I put log lines in the HiveConf constructor and eventually made the following edit to the code {code} --- ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java +++ ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java @@ -648,8 +648,7 @@ public ObjectInspector getOutputObjectInspector() throws HiveException { * @return list of file status entries */ private FileStatus[] listStatusUnderPath(FileSystem fs, Path p) throws IOException { -HiveConf hiveConf = new HiveConf(job, FetchOperator.class); -boolean recursive = hiveConf.getBoolVar(HiveConf.ConfVars.HADOOPMAPREDINPUTDIRRECURSIVE); +boolean recursive = false; if (!recursive) { return fs.listStatus(p); } {code} And re-ran my query to compare timings. ||Before||After|| |Cumulative CPU| 731.07 sec|386.0 sec| |Total time | 347.66 seconds | 218.855 seconds | | The query used was {code}INSERT OVERWRITE LOCAL DIRECTORY '/grid/0/smb/' select inv_item_sk from inventory inv join store_sales ss on (ss.ss_item_sk = inv.inv_item_sk) limit 10 ; {code} On a scale=2 tpcds data-set, where both store_sales inventory are bucketed into 4 buckets, with store_sales split into 7 partitions and inventory into 261 partitions. 78% of all CPU time was spent within new HiveConf(). The yourkit profiler runs are attached. FetchOperator slows down SMB map joins by 50% when there are many partitions Key: HIVE-4486 URL: https://issues.apache.org/jira/browse/HIVE-4486 Project: Hive Issue Type: Bug Components: Query Processor Environment: Ubuntu LXC 12.10 Reporter: Gopal V Priority: Minor Attachments: smb-profile.html While looking at log files for SMB joins in hive, it was noticed that the actual join op didn't show up as a significant fraction of the time spent. Most of the time was spent parsing configuration files. 
To confirm, I put log lines in the HiveConf constructor and eventually made the following edit to the code {code} --- ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java +++ ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java @@ -648,8 +648,7 @@ public ObjectInspector getOutputObjectInspector() throws HiveException { * @return list of file status entries */ private FileStatus[] listStatusUnderPath(FileSystem fs, Path p) throws IOException { -HiveConf hiveConf = new HiveConf(job, FetchOperator.class); -boolean recursive = hiveConf.getBoolVar(HiveConf.ConfVars.HADOOPMAPREDINPUTDIRRECURSIVE); +boolean recursive = false; if (!recursive) { return fs.listStatus(p); } {code} And re-ran my query to compare timings. || ||Before||After|| |Cumulative CPU| 731.07 sec|386.0 sec| |Total time | 347.66 seconds | 218.855 seconds | | The query used was
[jira] [Created] (HIVE-4487) Hive does not set explicit permissions on hive.exec.scratchdir
Joey Echeverria created HIVE-4487: - Summary: Hive does not set explicit permissions on hive.exec.scratchdir Key: HIVE-4487 URL: https://issues.apache.org/jira/browse/HIVE-4487 Project: Hive Issue Type: Bug Affects Versions: 0.10.0 Reporter: Joey Echeverria The hive.exec.scratchdir defaults to /tmp/hive-${user.name}, but when Hive creates this directory it doesn't set any explicit permission on it. This means if you have the default HDFS umask setting of 022, then these directories end up being world readable. These permissions also get applied to the staging directories and their files, thus leaving inter-stage data world readable. This can cause a potential leak of data especially when operating on a Kerberos enabled cluster. Hive should probably default these directories to only be readable by the owner.
[jira] [Updated] (HIVE-4487) Hive does not set explicit permissions on hive.exec.scratchdir
[ https://issues.apache.org/jira/browse/HIVE-4487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joey Echeverria updated HIVE-4487: -- Description: The hive.exec.scratchdir defaults to /tmp/hive-$\{user.name\}, but when Hive creates this directory it doesn't set any explicit permission on it. This means if you have the default HDFS umask setting of 022, then these directories end up being world readable. These permissions also get applied to the staging directories and their files, thus leaving inter-stage data world readable. This can cause a potential leak of data especially when operating on a Kerberos enabled cluster. Hive should probably default these directories to only be readable by the owner. was: The hive.exec.scratchdir defaults to /tmp/hive-${user.name}, but when Hive creates this directory it doesn't set any explicit permission on it. This means if you have the default HDFS umask setting of 022, then these directories end up being world readable. These permissions also get applied to the staging directories and their files, thus leaving inter-stage data world readable. This can cause a potential leak of data especially when operating on a Kerberos enabled cluster. Hive should probably default these directories to only be readable by the owner. Hive does not set explicit permissions on hive.exec.scratchdir -- Key: HIVE-4487 URL: https://issues.apache.org/jira/browse/HIVE-4487 Project: Hive Issue Type: Bug Affects Versions: 0.10.0 Reporter: Joey Echeverria The hive.exec.scratchdir defaults to /tmp/hive-$\{user.name\}, but when Hive creates this directory it doesn't set any explicit permission on it. This means if you have the default HDFS umask setting of 022, then these directories end up being world readable. These permissions also get applied to the staging directories and their files, thus leaving inter-stage data world readable. This can cause a potential leak of data especially when operating on a Kerberos enabled cluster. 
Hive should probably default these directories to only be readable by the owner.
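A local-filesystem analogue of the proposed default (owner-only scratch directories) can be sketched with java.nio. This is illustrative only, assuming a POSIX filesystem; Hive's actual code path goes through the Hadoop FileSystem API, not java.nio:

```java
import java.nio.file.*;
import java.nio.file.attribute.*;
import java.util.Set;

public class ScratchDirDemo {
    // Create a scratch-style directory and set explicit owner-only
    // permissions instead of relying on the process umask.
    static String createOwnerOnlyDir() throws Exception {
        Path dir = Files.createTempDirectory("hive-scratch-demo");
        Set<PosixFilePermission> ownerOnly = PosixFilePermissions.fromString("rwx------");
        Files.setPosixFilePermissions(dir, ownerOnly);
        return PosixFilePermissions.toString(Files.getPosixFilePermissions(dir));
    }

    public static void main(String[] args) throws Exception {
        System.out.println(createOwnerOnlyDir()); // rwx------
    }
}
```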
[jira] [Updated] (HIVE-4486) FetchOperator slows down SMB map joins by 50% when there are many partitions
[ https://issues.apache.org/jira/browse/HIVE-4486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-4486: -- Affects Version/s: 0.12.0 FetchOperator slows down SMB map joins by 50% when there are many partitions Key: HIVE-4486 URL: https://issues.apache.org/jira/browse/HIVE-4486 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.12.0 Environment: Ubuntu LXC 12.10 Reporter: Gopal V Priority: Minor Attachments: smb-profile.html While looking at log files for SMB joins in hive, it was noticed that the actual join op didn't show up as a significant fraction of the time spent. Most of the time was spent parsing configuration files. To confirm, I put log lines in the HiveConf constructor and eventually made the following edit to the code {code} --- ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java +++ ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java @@ -648,8 +648,7 @@ public ObjectInspector getOutputObjectInspector() throws HiveException { * @return list of file status entries */ private FileStatus[] listStatusUnderPath(FileSystem fs, Path p) throws IOException { -HiveConf hiveConf = new HiveConf(job, FetchOperator.class); -boolean recursive = hiveConf.getBoolVar(HiveConf.ConfVars.HADOOPMAPREDINPUTDIRRECURSIVE); +boolean recursive = false; if (!recursive) { return fs.listStatus(p); } {code} And re-ran my query to compare timings. || ||Before||After|| |Cumulative CPU| 731.07 sec|386.0 sec| |Total time | 347.66 seconds | 218.855 seconds | | The query used was {code}INSERT OVERWRITE LOCAL DIRECTORY '/grid/0/smb/' select inv_item_sk from inventory inv join store_sales ss on (ss.ss_item_sk = inv.inv_item_sk) limit 10 ; {code} On a scale=2 tpcds data-set, where both store_sales inventory are bucketed into 4 buckets, with store_sales split into 7 partitions and inventory into 261 partitions. 78% of all CPU time was spent within new HiveConf(). The yourkit profiler runs are attached. 
[jira] [Commented] (HIVE-4487) Hive does not set explicit permissions on hive.exec.scratchdir
[ https://issues.apache.org/jira/browse/HIVE-4487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13648435#comment-13648435 ] Joey Echeverria commented on HIVE-4487: --- The current workaround is to set the umask in hive-site.xml: {code:xml} <property> <name>fs.permissions.umask-mode</name> <value>077</value> </property> {code} Hive does not set explicit permissions on hive.exec.scratchdir -- Key: HIVE-4487 URL: https://issues.apache.org/jira/browse/HIVE-4487 Project: Hive Issue Type: Bug Affects Versions: 0.10.0 Reporter: Joey Echeverria The hive.exec.scratchdir defaults to /tmp/hive-$\{user.name\}, but when Hive creates this directory it doesn't set any explicit permission on it. This means if you have the default HDFS umask setting of 022, then these directories end up being world readable. These permissions also get applied to the staging directories and their files, thus leaving inter-stage data world readable. This can cause a potential leak of data especially when operating on a Kerberos enabled cluster. Hive should probably default these directories to only be readable by the owner.
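The effect of that umask workaround follows from POSIX semantics (effective mode = requested mode & ~umask), which HDFS mirrors for new files and directories. A small check of the two settings discussed above:

```java
public class UmaskDemo {
    // Effective permission bits after applying a umask (POSIX rule).
    static int effective(int mode, int umask) {
        return mode & ~umask;
    }

    public static void main(String[] args) {
        // Default HDFS umask 022: directories come out world-readable (755).
        System.out.println(Integer.toOctalString(effective(0777, 0022)));
        // Workaround umask 077: owner-only (700).
        System.out.println(Integer.toOctalString(effective(0777, 0077)));
    }
}
```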
[jira] [Created] (HIVE-4488) BucketizedHiveInputFormat is pessimistic with SMB split generation
Gopal V created HIVE-4488: - Summary: BucketizedHiveInputFormat is pessimistic with SMB split generation Key: HIVE-4488 URL: https://issues.apache.org/jira/browse/HIVE-4488 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.12.0 Environment: Ubuntu LXC Reporter: Gopal V BucketizedHiveInputFormat generates fewer splits than possible when faced with a table structure where both tables are partitioned. When debugging query82 from the TPC-DS spec, there were 7 partitions in the lhs (store_sales) and 8 partitions in the rhs (inventory), with 1 bucket each. Only 7 splits are generated from the mapper, instead of a potential 56 mappers. {code} 13/05/01 07:08:22 INFO mapred.FileInputFormat: Total input paths to process : 1 13/05/01 07:08:22 INFO io.BucketizedHiveInputFormat: 7 bucketized splits generated from 344 original splits. {code} The loop that generates the splits is as follows {code} InputSplit[] iss = inputFormat.getSplits(newjob, 0); if (iss != null && iss.length > 0) { numOrigSplits += iss.length; result.add(new BucketizedHiveInputSplit(iss, inputFormatClass .getName())); } {code} As is clear from the above, all of the more granular (per-file/per-partition) splits coming off getSplits() are being added to a single bucket split. Logically, in our mapper we get {code} store_sales(2003)/00_1) join MergeQueue( inv(1998-01-01)/00_0 inv(1998-01-08)/00_0 inv(1998-01-15)/00_0 inv(1998-01-22)/00_0 inv(1998-01-29)/00_0 inv(1998-02-05)/00_0 inv(1998-02-12)/00_0 inv(1998-02-19)/00_0 inv(1998-02-26)/00_0 ) {code} Ideally, we could have used a CombineFileInputFormat to get node locality for the merge-queue inputs (viz. BucketizedHiveInputSplit). This would be far better both at generating splits and at getting more out of short-circuit reads.
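The grouping described above caps parallelism at the bucket count. A toy model of the two strategies (illustrative only; the sizes mirror the 7-bucket, 8-files-per-bucket shape of the example, giving the 7-vs-56 gap from the report):

```java
import java.util.*;

public class SplitGroupingDemo {
    // Pessimistic grouping as described: every original split of a bucket
    // collapses into one coarse split, so parallelism = bucket count.
    static int coarseSplitCount(Map<Integer, List<String>> filesByBucket) {
        return filesByBucket.size(); // one BucketizedHiveInputSplit per bucket
    }

    // Finer alternative hinted at in the report: one split per original
    // file, preserving node locality for each merge-queue input.
    static int fineSplitCount(Map<Integer, List<String>> filesByBucket) {
        int n = 0;
        for (List<String> files : filesByBucket.values()) n += files.size();
        return n;
    }

    public static void main(String[] args) {
        // 7 lhs buckets, each fed by 8 rhs partition files
        Map<Integer, List<String>> filesByBucket = new HashMap<>();
        for (int b = 0; b < 7; b++) {
            List<String> files = new ArrayList<>();
            for (int p = 0; p < 8; p++) files.add("part-" + p + "/bucket_" + b);
            filesByBucket.put(b, files);
        }
        System.out.println(coarseSplitCount(filesByBucket) + " vs " + fineSplitCount(filesByBucket));
    }
}
```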
[jira] [Resolved] (HIVE-4479) Child expressions are not being evaluated hierarchically in a few templates.
[ https://issues.apache.org/jira/browse/HIVE-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan resolved HIVE-4479. Resolution: Fixed Fix Version/s: vectorization-branch Committed to branch. Thanks, Jitendra! Child expressions are not being evaluated hierarchically in a few templates. Key: HIVE-4479 URL: https://issues.apache.org/jira/browse/HIVE-4479 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Fix For: vectorization-branch Attachments: HIVE-4479.1.patch FilterColumnCompareColumn.txt, FilterStringColumnCompareScalar.txt and ScalarArithmeticColumn.txt are not evaluating the child expressions.
[jira] [Resolved] (HIVE-4481) Vectorized row batch should be initialized with additional columns to hold intermediate output.
[ https://issues.apache.org/jira/browse/HIVE-4481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan resolved HIVE-4481. Resolution: Fixed Fix Version/s: vectorization-branch Committed to branch. Thanks, Jitendra! Vectorized row batch should be initialized with additional columns to hold intermediate output. --- Key: HIVE-4481 URL: https://issues.apache.org/jira/browse/HIVE-4481 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Fix For: vectorization-branch Attachments: HIVE-4481.1.patch Vectorized row batch should be initialized with additional columns to hold intermediate output.
[jira] [Updated] (HIVE-4477) remove redundant copy of arithmetic filter unit test testColOpScalarNumericFilterNullAndRepeatingLogic
[ https://issues.apache.org/jira/browse/HIVE-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-4477: --- Resolution: Fixed Fix Version/s: vectorization-branch Status: Resolved (was: Patch Available) Committed to branch. Thanks, Eric! remove redundant copy of arithmetic filter unit test testColOpScalarNumericFilterNullAndRepeatingLogic -- Key: HIVE-4477 URL: https://issues.apache.org/jira/browse/HIVE-4477 Project: Hive Issue Type: Sub-task Reporter: Eric Hanson Assignee: Eric Hanson Fix For: vectorization-branch Attachments: HIVE-4477.1.patch same test got ported to 2 different files
[jira] [Updated] (HIVE-4480) Implement partition support for vectorized query execution
[ https://issues.apache.org/jira/browse/HIVE-4480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-4480: --- Resolution: Fixed Fix Version/s: vectorization-branch Status: Resolved (was: Patch Available) Committed to branch. Thanks, Sarvesh! Implement partition support for vectorized query execution -- Key: HIVE-4480 URL: https://issues.apache.org/jira/browse/HIVE-4480 Project: Hive Issue Type: Sub-task Reporter: Sarvesh Sakalanaga Assignee: Sarvesh Sakalanaga Fix For: vectorization-branch Attachments: Hive-4480.1.patch Add support for eager deserialization of row data using serde in the RecordReader layer. Also add support for partitions in this layer so that the vectorized batch is populated correctly.
[jira] [Created] (HIVE-4489) beeline always return the same error message twice
Chaoyu Tang created HIVE-4489: - Summary: beeline always return the same error message twice Key: HIVE-4489 URL: https://issues.apache.org/jira/browse/HIVE-4489 Project: Hive Issue Type: Bug Components: Clients Affects Versions: 0.10.0 Reporter: Chaoyu Tang Priority: Minor Fix For: 0.11.0 Beeline always returns the same error message twice. For example, if I try to create a table a2 which already exists, it prints out two identical messages, which is not quite user friendly. {{{ beeline !connect jdbc:hive2://localhost:1 scott tiger org.apache.hive.jdbc.HiveDriver Connecting to jdbc:hive2://localhost:1 Connected to: Hive (version 0.10.0) Driver: Hive (version 0.10.0-cdh4.2.1) Transaction isolation: TRANSACTION_REPEATABLE_READ 0: jdbc:hive2://localhost:1 create table a2 (value int); Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask (state=08S01,code=1) Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask (state=08S01,code=1) }}}
[jira] [Updated] (HIVE-4489) beeline always return the same error message twice
[ https://issues.apache.org/jira/browse/HIVE-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chaoyu Tang updated HIVE-4489: -- Description: Beeline always returns the same error message twice. for example, if I try to create a table a2 which already exists, it prints out two exact same messages and it is not quite user friendly. {code} beeline !connect jdbc:hive2://localhost:1 scott tiger org.apache.hive.jdbc.HiveDriver Connecting to jdbc:hive2://localhost:1 Connected to: Hive (version 0.10.0) Driver: Hive (version 0.10.0-cdh4.2.1) Transaction isolation: TRANSACTION_REPEATABLE_READ 0: jdbc:hive2://localhost:1 create table a2 (value int); Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask (state=08S01,code=1) Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask (state=08S01,code=1) {code} was: Beeline always returns the same error message twice. for example, if I try to create a table a2 which already exists, it prints out two exact same messages and it is not quite user friendly. 
{{{ beeline !connect jdbc:hive2://localhost:1 scott tiger org.apache.hive.jdbc.HiveDriver Connecting to jdbc:hive2://localhost:1 Connected to: Hive (version 0.10.0) Driver: Hive (version 0.10.0-cdh4.2.1) Transaction isolation: TRANSACTION_REPEATABLE_READ 0: jdbc:hive2://localhost:1 create table a2 (value int); Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask (state=08S01,code=1) Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask (state=08S01,code=1) }}} beeline always return the same error message twice -- Key: HIVE-4489 URL: https://issues.apache.org/jira/browse/HIVE-4489 Project: Hive Issue Type: Bug Components: Clients Affects Versions: 0.10.0 Reporter: Chaoyu Tang Priority: Minor Labels: newbie Fix For: 0.11.0 Original Estimate: 0h Remaining Estimate: 0h Beeline always returns the same error message twice. for example, if I try to create a table a2 which already exists, it prints out two exact same messages and it is not quite user friendly. {code} beeline !connect jdbc:hive2://localhost:1 scott tiger org.apache.hive.jdbc.HiveDriver Connecting to jdbc:hive2://localhost:1 Connected to: Hive (version 0.10.0) Driver: Hive (version 0.10.0-cdh4.2.1) Transaction isolation: TRANSACTION_REPEATABLE_READ 0: jdbc:hive2://localhost:1 create table a2 (value int); Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask (state=08S01,code=1) Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask (state=08S01,code=1) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4489) beeline always return the same error message twice
[ https://issues.apache.org/jira/browse/HIVE-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chaoyu Tang updated HIVE-4489: -- Fix Version/s: (was: 0.11.0) beeline always return the same error message twice -- Key: HIVE-4489 URL: https://issues.apache.org/jira/browse/HIVE-4489 Project: Hive Issue Type: Bug Components: Clients Affects Versions: 0.10.0 Reporter: Chaoyu Tang Priority: Minor Labels: newbie Original Estimate: 0h Remaining Estimate: 0h Beeline always returns the same error message twice. for example, if I try to create a table a2 which already exists, it prints out two exact same messages and it is not quite user friendly. {code} beeline !connect jdbc:hive2://localhost:1 scott tiger org.apache.hive.jdbc.HiveDriver Connecting to jdbc:hive2://localhost:1 Connected to: Hive (version 0.10.0) Driver: Hive (version 0.10.0-cdh4.2.1) Transaction isolation: TRANSACTION_REPEATABLE_READ 0: jdbc:hive2://localhost:1 create table a2 (value int); Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask (state=08S01,code=1) Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask (state=08S01,code=1) {code}
[jira] [Updated] (HIVE-4489) beeline always return the same error message twice
[ https://issues.apache.org/jira/browse/HIVE-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chaoyu Tang updated HIVE-4489: -- Attachment: HIVE-4489.patch Removed the duplicated error logging in the low-level exception catch block so that only the top-level catch block prints the error. beeline always return the same error message twice -- Key: HIVE-4489 URL: https://issues.apache.org/jira/browse/HIVE-4489 Project: Hive Issue Type: Bug Components: Clients Affects Versions: 0.10.0 Reporter: Chaoyu Tang Priority: Minor Labels: newbie Attachments: HIVE-4489.patch Original Estimate: 0h Remaining Estimate: 0h Beeline always returns the same error message twice. for example, if I try to create a table a2 which already exists, it prints out two exact same messages and it is not quite user friendly. {code} beeline !connect jdbc:hive2://localhost:1 scott tiger org.apache.hive.jdbc.HiveDriver Connecting to jdbc:hive2://localhost:1 Connected to: Hive (version 0.10.0) Driver: Hive (version 0.10.0-cdh4.2.1) Transaction isolation: TRANSACTION_REPEATABLE_READ 0: jdbc:hive2://localhost:1 create table a2 (value int); Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask (state=08S01,code=1) Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask (state=08S01,code=1) {code}
Review Request: HIVE-4489: beeline always return the same error message twice
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/10917/ --- Review request for hive. Description --- Beeline always returns the same error message twice because the error is logged both in an inner exception catch block and again in its outer catch block. This addresses bug HIVE-4489. https://issues.apache.org/jira/browse/HIVE-4489 Diffs - beeline/src/java/org/apache/hive/beeline/Commands.java 8e2a52f Diff: https://reviews.apache.org/r/10917/diff/ Testing --- Ran the tests. Thanks, Chaoyu Tang
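The duplicate-message pattern and the proposed fix can be sketched in plain Java (hypothetical method names; this is not the actual BeeLine/Commands code): only the top-level catch block should report the error, while inner blocks propagate the exception without logging.

```java
import java.util.ArrayList;
import java.util.List;

public class ErrorReportingSketch {
    static final List<String> printed = new ArrayList<>();

    static void error(String msg) {
        printed.add("Error: " + msg);
    }

    // Buggy pattern: the inner catch logs and rethrows, then the outer catch
    // logs again, so the same message is printed twice.
    static void executeBuggy() {
        try {
            try {
                runStatement();
            } catch (RuntimeException e) {
                error(e.getMessage()); // duplicate report
                throw e;
            }
        } catch (RuntimeException e) {
            error(e.getMessage());
        }
    }

    // Fixed pattern: the exception propagates untouched; only the top-level
    // catch reports it, so the message appears exactly once.
    static void executeFixed() {
        try {
            runStatement();
        } catch (RuntimeException e) {
            error(e.getMessage());
        }
    }

    static void runStatement() {
        throw new RuntimeException("Execution Error, return code 1 from DDLTask");
    }

    public static void main(String[] args) {
        executeBuggy();
        System.out.println(printed.size()); // 2
        printed.clear();
        executeFixed();
        System.out.println(printed.size()); // 1
    }
}
```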
[jira] [Commented] (HIVE-4474) Column access not tracked properly for partitioned tables
[ https://issues.apache.org/jira/browse/HIVE-4474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13648537#comment-13648537 ] Gang Tim Liu commented on HIVE-4474: Committed. Thanks, Samuel Yuan. Column access not tracked properly for partitioned tables - Key: HIVE-4474 URL: https://issues.apache.org/jira/browse/HIVE-4474 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Samuel Yuan Assignee: Samuel Yuan Attachments: HIVE-4474.1.patch.txt The columns recorded as being accessed are incorrect for partitioned tables. The index of an accessed column is a position in the list of non-partition columns, but a list of all columns is currently being used to do the lookup.
[jira] [Updated] (HIVE-3959) Update Partition Statistics in Metastore Layer
[ https://issues.apache.org/jira/browse/HIVE-3959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Tim Liu updated HIVE-3959: --- Attachment: (was: HIVE-3959.patch.9.txt) Update Partition Statistics in Metastore Layer -- Key: HIVE-3959 URL: https://issues.apache.org/jira/browse/HIVE-3959 Project: Hive Issue Type: Improvement Components: Metastore, Statistics Reporter: Bhushan Mandhani Assignee: Gang Tim Liu Priority: Minor Attachments: HIVE-3959.patch.1, HIVE-3959.patch.11.txt, HIVE-3959.patch.12.txt, HIVE-3959.patch.2 When partitions are created using queries (insert overwrite and insert into) then the StatsTask updates all stats. However, when partitions are added directly through metadata-only partitions (either CLI or direct calls to Thrift Metastore) no stats are populated even if hive.stats.reliable is set to true. This puts us in a situation where we can't decide if stats are truly reliable or not. We propose that the fast stats (numFiles and totalSize) which don't require a scan of the data should always be populated and be completely reliable. For now we are still excluding rowCount and rawDataSize because that will make these operations very expensive. Currently they are quick metadata-only ops. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
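The "fast stats" named above (numFiles and totalSize) require only a directory listing, not a scan of the data. A minimal sketch of that idea, using java.nio.file in place of Hadoop's FileSystem API (the real logic lives in the metastore layer; names here are illustrative):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public class FastStatsSketch {
    // Returns {numFiles, totalSize} for files directly under a partition
    // directory -- cheap metadata-only stats, unlike rowCount/rawDataSize.
    static long[] fastStats(Path partitionDir) throws IOException {
        long numFiles = 0, totalSize = 0;
        try (Stream<Path> entries = Files.list(partitionDir)) {
            for (Path p : (Iterable<Path>) entries::iterator) {
                if (Files.isRegularFile(p)) {
                    numFiles++;
                    totalSize += Files.size(p);
                }
            }
        }
        return new long[] { numFiles, totalSize };
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("partition");
        Files.write(dir.resolve("part-00000"), new byte[100]);
        Files.write(dir.resolve("part-00001"), new byte[50]);
        long[] stats = fastStats(dir);
        System.out.println(stats[0] + " files, " + stats[1] + " bytes"); // 2 files, 150 bytes
    }
}
```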
[jira] [Updated] (HIVE-3959) Update Partition Statistics in Metastore Layer
[ https://issues.apache.org/jira/browse/HIVE-3959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Tim Liu updated HIVE-3959: --- Attachment: HIVE-3959.patch.12.txt Update Partition Statistics in Metastore Layer -- Key: HIVE-3959 URL: https://issues.apache.org/jira/browse/HIVE-3959 Project: Hive Issue Type: Improvement Components: Metastore, Statistics Reporter: Bhushan Mandhani Assignee: Gang Tim Liu Priority: Minor Attachments: HIVE-3959.patch.1, HIVE-3959.patch.11.txt, HIVE-3959.patch.12.txt, HIVE-3959.patch.2 When partitions are created using queries (insert overwrite and insert into) then the StatsTask updates all stats. However, when partitions are added directly through metadata-only partitions (either CLI or direct calls to Thrift Metastore) no stats are populated even if hive.stats.reliable is set to true. This puts us in a situation where we can't decide if stats are truly reliable or not. We propose that the fast stats (numFiles and totalSize) which don't require a scan of the data should always be populated and be completely reliable. For now we are still excluding rowCount and rawDataSize because that will make these operations very expensive. Currently they are quick metadata-only ops. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4251) Indices can't be built on tables whose schema info comes from SerDe
[ https://issues.apache.org/jira/browse/HIVE-4251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13648665#comment-13648665 ] Mark Wagner commented on HIVE-4251: --- Hi Steven, Indexing on the field of a record/struct isn't supported yet. That's also the case for other metadata like cluster, sort, and skew columns. I've been taking a look at that recently, and will open a JIRA to discuss/track it. I tried your second case and hit the same issue as you. It seems to be an unrelated problem that prevents grouping by a struct key. Both issues affect all storage formats, though, so we should discuss them in their own JIRAs. Can you confirm that you're able to create indices on top-level primitive columns of Avro tables with this patch? Thanks, Mark Indices can't be built on tables whose schema info comes from SerDe --- Key: HIVE-4251 URL: https://issues.apache.org/jira/browse/HIVE-4251 Project: Hive Issue Type: Bug Affects Versions: 0.10.0, 0.11.0, 0.10.1 Reporter: Mark Wagner Assignee: Mark Wagner Fix For: 0.11.0, 0.10.1 Attachments: HIVE-4251.1.patch, HIVE-4251.2.patch Building indices on tables that get their schema information from the deserializer (e.g. Avro-backed tables) doesn't work because the wrong API is used when checking that the column exists. {code} hive> describe doctors; OK # col_name data_type comment number int from deserializer first_name string from deserializer last_name string from deserializer Time taken: 0.215 seconds, Fetched: 5 row(s) hive> create index doctors_index on table doctors(number) as 'compact' with deferred rebuild; FAILED: Error in metadata: java.lang.RuntimeException: Check the index columns, they should appear in the table being indexed. FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask {code}
[jira] [Commented] (HIVE-3746) TRowSet resultset structure should be column-oriented
[ https://issues.apache.org/jira/browse/HIVE-3746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13648667#comment-13648667 ] Carl Steinbach commented on HIVE-3746: -- bq. If an application has requested a single row, and the client has requested n rows from the server in an effort to reduce round trips, then n-1 intervening values from the first column must be cached off somewhere before the first value for the second column can be accessed. If the fetch size is n, then the client is going to end up storing n rows in memory regardless of whether the result set is represented in a row-major or column-major format. Put another way, the unit of data transfer between the server and client is a variable sized resultset. The client has the option of setting the result size very low in order to achieve lower latency, or making it very large in order to get higher overall throughput. However, the key limitation is that the client is not able to provide access to any of the rows contained in a resultset until the entire resultset has been transferred from the server to the client. This limitation is a consequence of the fact that we're using a message oriented RPC layer (Thrift) to handle communication and data transfer between the client and server. TRowSet resultset structure should be column-oriented - Key: HIVE-3746 URL: https://issues.apache.org/jira/browse/HIVE-3746 Project: Hive Issue Type: Sub-task Components: Server Infrastructure Reporter: Carl Steinbach Assignee: Carl Steinbach Labels: HiveServer2 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
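The row-major vs. column-major trade-off discussed above can be made concrete. A hedged sketch, with plain Java lists standing in for the Thrift TRowSet structures (names are illustrative, not the actual HiveServer2 types):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RowSetLayoutSketch {
    // Row-major: one list per row, as in the original TRowSet layout.
    static List<List<Object>> toRowMajor(Object[][] data) {
        List<List<Object>> rows = new ArrayList<>();
        for (Object[] row : data) {
            rows.add(Arrays.asList(row));
        }
        return rows;
    }

    // Column-major: one list per column. Values of one column are contiguous,
    // which compresses better and avoids per-value type tagging on the wire.
    static List<List<Object>> toColumnMajor(Object[][] data) {
        int cols = data.length == 0 ? 0 : data[0].length;
        List<List<Object>> columns = new ArrayList<>();
        for (int c = 0; c < cols; c++) {
            List<Object> col = new ArrayList<>();
            for (Object[] row : data) {
                col.add(row[c]);
            }
            columns.add(col);
        }
        return columns;
    }

    public static void main(String[] args) {
        Object[][] data = { { 1, "a" }, { 2, "b" }, { 3, "c" } };
        System.out.println(toRowMajor(data));
        System.out.println(toColumnMajor(data));
    }
}
```

Either way, as the comment notes, the client cannot hand out any row until the whole fetched batch has arrived, because Thrift delivers each response message atomically.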
[jira] [Created] (HIVE-4490) HS2 - 'select null ..' fails with NPE
Thejas M Nair created HIVE-4490: --- Summary: HS2 - 'select null ..' fails with NPE Key: HIVE-4490 URL: https://issues.apache.org/jira/browse/HIVE-4490 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Thejas M Nair Eg, from beeline {code} select null, i from t1 ; Error: Error running query: java.lang.NullPointerException (state=,code=0) Error: Error running query: java.lang.NullPointerException (state=,code=0) {code} In HS2 log org.apache.hive.service.cli.HiveSQLException: Error running query: java.lang.NullPointerException at org.apache.hive.service.cli.operation.SQLOperation.run(SQLOperation.java:113) at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatement(HiveSessionImpl.java:169) at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:62) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1178) at org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:524) at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:57) at $Proxy8.executeStatement(Unknown Source) at org.apache.hive.service.cli.CLIService.executeStatement(CLIService.java:148) at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:203) at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1133) at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1118) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) at 
org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge20S.java:565) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-4491) Grouping by a struct throws an exception
Mark Wagner created HIVE-4491: - Summary: Grouping by a struct throws an exception Key: HIVE-4491 URL: https://issues.apache.org/jira/browse/HIVE-4491 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: Mark Wagner Assignee: Mark Wagner Queries that require a shuffle with a struct as the key result in an exception: {code}Caused by: java.lang.RuntimeException: Hash code on complex types not supported yet. at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.hashCode(ObjectInspectorUtils.java:528) at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:226) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:531) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:859) at org.apache.hadoop.hive.ql.exec.GroupByOperator.forward(GroupByOperator.java:1066) at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1118) ... 13 more {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
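For context, the missing piece behind the "Hash code on complex types not supported yet" error is a hash over a struct's fields. A conventional definition (hypothetical sketch, not what ObjectInspectorUtils.hashCode implements) combines field hashes the same way java.util.Arrays.hashCode does:

```java
public class StructHashSketch {
    // Combine field hash codes with the standard 31-multiplier recurrence,
    // so structs that are field-by-field equal hash equally.
    static int structHashCode(Object[] fields) {
        int result = 1;
        for (Object field : fields) {
            result = 31 * result + (field == null ? 0 : field.hashCode());
        }
        return result;
    }

    public static void main(String[] args) {
        Object[] a = { 1, "x" };
        Object[] b = { 1, "x" };
        System.out.println(structHashCode(a) == structHashCode(b)); // true
    }
}
```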
[jira] [Assigned] (HIVE-4491) Grouping by a struct throws an exception
[ https://issues.apache.org/jira/browse/HIVE-4491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Wagner reassigned HIVE-4491: - Assignee: (was: Mark Wagner) Grouping by a struct throws an exception Key: HIVE-4491 URL: https://issues.apache.org/jira/browse/HIVE-4491 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: Mark Wagner Queries that require a shuffle with a struct as the key result in an exception: {code}Caused by: java.lang.RuntimeException: Hash code on complex types not supported yet. at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.hashCode(ObjectInspectorUtils.java:528) at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:226) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:531) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:859) at org.apache.hadoop.hive.ql.exec.GroupByOperator.forward(GroupByOperator.java:1066) at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1118) ... 13 more {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4491) Grouping by a struct throws an exception
[ https://issues.apache.org/jira/browse/HIVE-4491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Wagner updated HIVE-4491: -- Attachment: demonstration.txt A full demonstration, using the table created in the create_struct_table.q test. Grouping by a struct throws an exception Key: HIVE-4491 URL: https://issues.apache.org/jira/browse/HIVE-4491 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: Mark Wagner Attachments: demonstration.txt Queries that require a shuffle with a struct as the key result in an exception: {code}Caused by: java.lang.RuntimeException: Hash code on complex types not supported yet. at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.hashCode(ObjectInspectorUtils.java:528) at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:226) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:531) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:859) at org.apache.hadoop.hive.ql.exec.GroupByOperator.forward(GroupByOperator.java:1066) at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1118) ... 13 more {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HIVE-4491) Grouping by a struct throws an exception
[ https://issues.apache.org/jira/browse/HIVE-4491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Wagner reassigned HIVE-4491: - Assignee: Mark Wagner Grouping by a struct throws an exception Key: HIVE-4491 URL: https://issues.apache.org/jira/browse/HIVE-4491 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: Mark Wagner Assignee: Mark Wagner Attachments: demonstration.txt Queries that require a shuffle with a struct as the key result in an exception: {code}Caused by: java.lang.RuntimeException: Hash code on complex types not supported yet. at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.hashCode(ObjectInspectorUtils.java:528) at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:226) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:531) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:859) at org.apache.hadoop.hive.ql.exec.GroupByOperator.forward(GroupByOperator.java:1066) at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1118) ... 13 more {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HIVE-4491) Grouping by a struct throws an exception
[ https://issues.apache.org/jira/browse/HIVE-4491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Wagner resolved HIVE-4491. --- Resolution: Duplicate My mistake. This is a duplicate of HIVE-2517 Grouping by a struct throws an exception Key: HIVE-4491 URL: https://issues.apache.org/jira/browse/HIVE-4491 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: Mark Wagner Assignee: Mark Wagner Attachments: demonstration.txt Queries that require a shuffle with a struct as the key result in an exception: {code}Caused by: java.lang.RuntimeException: Hash code on complex types not supported yet. at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.hashCode(ObjectInspectorUtils.java:528) at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:226) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:531) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:859) at org.apache.hadoop.hive.ql.exec.GroupByOperator.forward(GroupByOperator.java:1066) at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1118) ... 13 more {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4490) HS2 - 'select null ..' fails with NPE
[ https://issues.apache.org/jira/browse/HIVE-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13648673#comment-13648673 ] Prasad Mujumdar commented on HIVE-4490: --- Looks like duplicate of HIVE-4172 HS2 - 'select null ..' fails with NPE - Key: HIVE-4490 URL: https://issues.apache.org/jira/browse/HIVE-4490 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Thejas M Nair Eg, from beeline {code} select null, i from t1 ; Error: Error running query: java.lang.NullPointerException (state=,code=0) Error: Error running query: java.lang.NullPointerException (state=,code=0) {code} In HS2 log org.apache.hive.service.cli.HiveSQLException: Error running query: java.lang.NullPointerException at org.apache.hive.service.cli.operation.SQLOperation.run(SQLOperation.java:113) at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatement(HiveSessionImpl.java:169) at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:62) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1178) at org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:524) at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:57) at $Proxy8.executeStatement(Unknown Source) at org.apache.hive.service.cli.CLIService.executeStatement(CLIService.java:148) at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:203) at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1133) at 
org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1118) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge20S.java:565) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-4492) Revert HIVE-4322
Samuel Yuan created HIVE-4492: - Summary: Revert HIVE-4322 Key: HIVE-4492 URL: https://issues.apache.org/jira/browse/HIVE-4492 Project: Hive Issue Type: Bug Components: Metastore, Thrift API Reporter: Samuel Yuan Assignee: Samuel Yuan See HIVE-4432 and HIVE-4433. It's possible to work around these issues but a better solution is probably to roll back the fix and change the API to use a primitive type as the map key (in a backwards-compatible manner). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4433) Fix C++ Thrift bindings broken in HIVE-4322
[ https://issues.apache.org/jira/browse/HIVE-4433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13648683#comment-13648683 ] Samuel Yuan commented on HIVE-4433: --- I'm thinking it's possible to work around this by defining '' since it's present in the auto-generated header file. Given that other language bindings might also have been broken by HIVE-4322 though it's probably better to change the map key to a primitive type instead. I have filed HIVE-4492 to revert the original change. Fix C++ Thrift bindings broken in HIVE-4322 --- Key: HIVE-4433 URL: https://issues.apache.org/jira/browse/HIVE-4433 Project: Hive Issue Type: Bug Components: Metastore, Thrift API Affects Versions: 0.12.0 Reporter: Carl Steinbach Assignee: Samuel Yuan Priority: Blocker Fix For: 0.12.0 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-4493) Implement filter for string column compared to string column
Eric Hanson created HIVE-4493: - Summary: Implement filter for string column compared to string column Key: HIVE-4493 URL: https://issues.apache.org/jira/browse/HIVE-4493 Project: Hive Issue Type: Sub-task Reporter: Eric Hanson Assignee: Eric Hanson -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4493) Implement vectorized filter for string column compared to string column
[ https://issues.apache.org/jira/browse/HIVE-4493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Hanson updated HIVE-4493: -- Summary: Implement vectorized filter for string column compared to string column (was: Implement filter for string column compared to string column) Implement vectorized filter for string column compared to string column --- Key: HIVE-4493 URL: https://issues.apache.org/jira/browse/HIVE-4493 Project: Hive Issue Type: Sub-task Reporter: Eric Hanson Assignee: Eric Hanson -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-4494) ORC map columns get class cast exception in some context
Owen O'Malley created HIVE-4494: --- Summary: ORC map columns get class cast exception in some context Key: HIVE-4494 URL: https://issues.apache.org/jira/browse/HIVE-4494 Project: Hive Issue Type: Bug Reporter: Owen O'Malley Assignee: Owen O'Malley Setting up the test case like: {quote} create table map_text ( name string, m map<string,string> ) row format delimited fields terminated by '|' collection items terminated by ',' map keys terminated by ':'; create table map_orc ( name string, m map<string,string> ) stored as orc; cat map.txt name1|key11:value11,key12:value12,key13:value13 name2|key21:value21,key22:value22,key23:value23 name3|key31:value31,key32:value32,key33:value33 load data local inpath 'map.txt' into table map_text; insert overwrite table map_orc select * from map_text; {quote} Selecting the name column from map_orc will get the following exception: {quote} java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:431) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:371) at org.apache.hadoop.mapred.Child$4.run(Child.java:255) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1195) at org.apache.hadoop.mapred.Child.main(Child.java:249) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at
org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88) ... 9 more Caused by: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117) at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34) ... 14 more Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88) ... 17 more Caused by: java.lang.RuntimeException: Map operator initialization failed at org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:121) ... 22 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.hive.ql.io.orc.OrcStruct$OrcMapObjectInspector cannot be cast to org.apache.hadoop.hive.serde2.objectinspector.SettableMapObjectInspector at org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:522) at org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:90) ... 
22 more Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.ql.io.orc.OrcStruct$OrcMapObjectInspector cannot be cast to org.apache.hadoop.hive.serde2.objectinspector.SettableMapObjectInspector at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.getConverter(ObjectInspectorConverters.java:144) at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$StructConverter.init(ObjectInspectorConverters.java:307) at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.getConverter(ObjectInspectorConverters.java:138) at org.apache.hadoop.hive.ql.exec.MapOperator.initObjectInspector(MapOperator.java:270) at org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:482) ... 23 more {quote} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-4495) Implement vectorized string substr
Timothy Chen created HIVE-4495: -- Summary: Implement vectorized string substr Key: HIVE-4495 URL: https://issues.apache.org/jira/browse/HIVE-4495 Project: Hive Issue Type: Sub-task Reporter: Timothy Chen -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4495) Implement vectorized string substr
[ https://issues.apache.org/jira/browse/HIVE-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Hanson updated HIVE-4495: -- Assignee: Eric Hanson Implement vectorized string substr -- Key: HIVE-4495 URL: https://issues.apache.org/jira/browse/HIVE-4495 Project: Hive Issue Type: Sub-task Reporter: Timothy Chen Assignee: Eric Hanson -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4194) JDBC2: HiveDriver should not throw RuntimeException when passed an invalid URL
[ https://issues.apache.org/jira/browse/HIVE-4194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13648844#comment-13648844 ] Richard Ding commented on HIVE-4194: [~cwsteinbach] Thejas is right about acceptsURL as part of the java.sql.Driver interface. I also prefer the simple change to fix this simple issue, and leave the package visibility changes to another JIRA. What do you think? JDBC2: HiveDriver should not throw RuntimeException when passed an invalid URL -- Key: HIVE-4194 URL: https://issues.apache.org/jira/browse/HIVE-4194 Project: Hive Issue Type: Bug Components: HiveServer2, JDBC Affects Versions: 0.11.0 Reporter: Richard Ding Assignee: Richard Ding Fix For: 0.11.0 Attachments: HIVE-4194.patch As per JDBC 3.0 Spec (section 9.2) If the Driver implementation understands the URL, it will return a Connection object; otherwise it returns null Currently HiveConnection constructor will throw IllegalArgumentException if url string doesn't start with jdbc:hive2. This exception should be caught by HiveDriver.connect and return null. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
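The spec behavior described above can be sketched with standalone methods (hypothetical names, not the actual HiveDriver code): connect() returns null for a URL the driver does not understand, instead of letting an exception escape.

```java
public class DriverUrlSketch {
    static final String URL_PREFIX = "jdbc:hive2://";

    // Mirrors java.sql.Driver.acceptsURL: true only if this driver
    // understands the given URL.
    static boolean acceptsURL(String url) {
        return url != null && url.startsWith(URL_PREFIX);
    }

    // Per JDBC 3.0 section 9.2, an unrecognized URL yields null, not an
    // exception. The string result here stands in for a real Connection.
    static String connect(String url) {
        if (!acceptsURL(url)) {
            return null;
        }
        return "connected:" + url;
    }

    public static void main(String[] args) {
        System.out.println(connect("jdbc:mysql://localhost")); // null
        System.out.println(connect("jdbc:hive2://localhost:10000"));
    }
}
```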
[jira] [Updated] (HIVE-4489) beeline always return the same error message twice
[ https://issues.apache.org/jira/browse/HIVE-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chaoyu Tang updated HIVE-4489: -- Attachment: (was: HIVE-4489.patch) beeline always return the same error message twice -- Key: HIVE-4489 URL: https://issues.apache.org/jira/browse/HIVE-4489 Project: Hive Issue Type: Bug Components: Clients Affects Versions: 0.10.0 Reporter: Chaoyu Tang Priority: Minor Labels: newbie Attachments: HIVE-4489.patch Original Estimate: 0h Remaining Estimate: 0h Beeline always returns the same error message twice. for example, if I try to create a table a2 which already exists, it prints out two exact same messages and it is not quite user friendly. {code} beeline !connect jdbc:hive2://localhost:1 scott tiger org.apache.hive.jdbc.HiveDriver Connecting to jdbc:hive2://localhost:1 Connected to: Hive (version 0.10.0) Driver: Hive (version 0.10.0-cdh4.2.1) Transaction isolation: TRANSACTION_REPEATABLE_READ 0: jdbc:hive2://localhost:1 create table a2 (value int); Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask (state=08S01,code=1) Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask (state=08S01,code=1) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4489) beeline always returns the same error message twice
[ https://issues.apache.org/jira/browse/HIVE-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chaoyu Tang updated HIVE-4489: -- Attachment: HIVE-4489.patch beeline always returns the same error message twice -- Key: HIVE-4489 URL: https://issues.apache.org/jira/browse/HIVE-4489 Project: Hive Issue Type: Bug Components: Clients Affects Versions: 0.10.0 Reporter: Chaoyu Tang Priority: Minor Labels: newbie Attachments: HIVE-4489.patch Original Estimate: 0h Remaining Estimate: 0h Beeline always returns the same error message twice. For example, if I try to create a table a2 that already exists, it prints two identical messages, which is not user friendly. {code} beeline !connect jdbc:hive2://localhost:1 scott tiger org.apache.hive.jdbc.HiveDriver Connecting to jdbc:hive2://localhost:1 Connected to: Hive (version 0.10.0) Driver: Hive (version 0.10.0-cdh4.2.1) Transaction isolation: TRANSACTION_REPEATABLE_READ 0: jdbc:hive2://localhost:1 create table a2 (value int); Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask (state=08S01,code=1) Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask (state=08S01,code=1) {code}
[jira] [Updated] (HIVE-4492) Revert HIVE-4322
[ https://issues.apache.org/jira/browse/HIVE-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Samuel Yuan updated HIVE-4492: -- Attachment: HIVE-4492.1.patch.txt Revert HIVE-4322 Key: HIVE-4492 URL: https://issues.apache.org/jira/browse/HIVE-4492 Project: Hive Issue Type: Bug Components: Metastore, Thrift API Reporter: Samuel Yuan Assignee: Samuel Yuan Attachments: HIVE-4492.1.patch.txt See HIVE-4432 and HIVE-4433. It's possible to work around these issues but a better solution is probably to roll back the fix and change the API to use a primitive type as the map key (in a backwards-compatible manner).
[jira] [Commented] (HIVE-4194) JDBC2: HiveDriver should not throw RuntimeException when passed an invalid URL
[ https://issues.apache.org/jira/browse/HIVE-4194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13648869#comment-13648869 ] Carl Steinbach commented on HIVE-4194: -- Sounds good to me. +1. I don't have access to a build farm, so I'll leave the testing and commit work to someone else. JDBC2: HiveDriver should not throw RuntimeException when passed an invalid URL -- Key: HIVE-4194 URL: https://issues.apache.org/jira/browse/HIVE-4194 Project: Hive Issue Type: Bug Components: HiveServer2, JDBC Affects Versions: 0.11.0 Reporter: Richard Ding Assignee: Richard Ding Fix For: 0.11.0 Attachments: HIVE-4194.patch As per the JDBC 3.0 Spec (section 9.2): "If the Driver implementation understands the URL, it will return a Connection object; otherwise it returns null." Currently the HiveConnection constructor throws IllegalArgumentException if the URL string doesn't start with jdbc:hive2. This exception should be caught by HiveDriver.connect, which should return null.
[jira] [Work started] (HIVE-3959) Update Partition Statistics in Metastore Layer
[ https://issues.apache.org/jira/browse/HIVE-3959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-3959 started by Gang Tim Liu. Update Partition Statistics in Metastore Layer -- Key: HIVE-3959 URL: https://issues.apache.org/jira/browse/HIVE-3959 Project: Hive Issue Type: Improvement Components: Metastore, Statistics Reporter: Bhushan Mandhani Assignee: Gang Tim Liu Priority: Minor Attachments: HIVE-3959.patch.1, HIVE-3959.patch.11.txt, HIVE-3959.patch.12.txt, HIVE-3959.patch.2 When partitions are created using queries (insert overwrite and insert into), the StatsTask updates all stats. However, when partitions are added directly through metadata-only operations (either the CLI or direct calls to the Thrift Metastore), no stats are populated even if hive.stats.reliable is set to true. This puts us in a situation where we can't decide whether stats are truly reliable or not. We propose that the fast stats (numFiles and totalSize), which don't require a scan of the data, should always be populated and be completely reliable. For now we are still excluding rowCount and rawDataSize because that would make these operations very expensive. Currently they are quick metadata-only ops.
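The "fast stats" named above (numFiles, totalSize) can be derived from file metadata alone, without reading any data. An illustrative, stand-alone analogue over the local filesystem (Hive's actual code uses the Hadoop FileSystem API, which this sketch does not):

```java
import java.io.File;

/** Illustrative local-filesystem analogue of the metadata-only "fast stats". */
public class FastStatsSketch {
    /** Returns {numFiles, totalSize} for the files directly under dir. */
    static long[] fastStats(File dir) {
        long numFiles = 0, totalSize = 0;
        File[] entries = dir.listFiles();
        if (entries != null) {
            for (File f : entries) {
                if (f.isFile()) {
                    numFiles++;              // count from the directory listing
                    totalSize += f.length(); // size from metadata, no data scan
                }
            }
        }
        return new long[] { numFiles, totalSize };
    }
}
```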
[jira] [Updated] (HIVE-3959) Update Partition Statistics in Metastore Layer
[ https://issues.apache.org/jira/browse/HIVE-3959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Tim Liu updated HIVE-3959: --- Status: Patch Available (was: In Progress) Update Partition Statistics in Metastore Layer -- Key: HIVE-3959 URL: https://issues.apache.org/jira/browse/HIVE-3959 Project: Hive Issue Type: Improvement Components: Metastore, Statistics Reporter: Bhushan Mandhani Assignee: Gang Tim Liu Priority: Minor Attachments: HIVE-3959.patch.1, HIVE-3959.patch.11.txt, HIVE-3959.patch.12.txt, HIVE-3959.patch.2 When partitions are created using queries (insert overwrite and insert into), the StatsTask updates all stats. However, when partitions are added directly through metadata-only operations (either the CLI or direct calls to the Thrift Metastore), no stats are populated even if hive.stats.reliable is set to true. This puts us in a situation where we can't decide whether stats are truly reliable or not. We propose that the fast stats (numFiles and totalSize), which don't require a scan of the data, should always be populated and be completely reliable. For now we are still excluding rowCount and rawDataSize because that would make these operations very expensive. Currently they are quick metadata-only ops.
[jira] [Created] (HIVE-4496) JDBC2 won't compile with JDK7
Chris Drome created HIVE-4496: - Summary: JDBC2 won't compile with JDK7 Key: HIVE-4496 URL: https://issues.apache.org/jira/browse/HIVE-4496 Project: Hive Issue Type: Bug Components: JDBC Affects Versions: 0.12.0 Reporter: Chris Drome Assignee: Chris Drome The HiveServer2-related JDBC code does not compile with JDK7. Related to HIVE-3384.
[jira] [Commented] (HIVE-3384) HIVE JDBC module won't compile under JDK1.7 as new methods added in JDBC specification
[ https://issues.apache.org/jira/browse/HIVE-3384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13648913#comment-13648913 ] Chris Drome commented on HIVE-3384: --- The error is not related to this patch. Rather, it is associated with new code added in 0.11. Please refer to HIVE-4496. HIVE JDBC module won't compile under JDK1.7 as new methods added in JDBC specification -- Key: HIVE-3384 URL: https://issues.apache.org/jira/browse/HIVE-3384 Project: Hive Issue Type: Bug Components: JDBC Affects Versions: 0.10.0 Reporter: Weidong Bian Assignee: Chris Drome Priority: Minor Fix For: 0.11.0 Attachments: D6873-0.9.1.patch, D6873.1.patch, D6873.2.patch, D6873.3.patch, D6873.4.patch, D6873.5.patch, D6873.6.patch, D6873.7.patch, HIVE-3384-0.10.patch, HIVE-3384-2012-12-02.patch, HIVE-3384-2012-12-04.patch, HIVE-3384.2.patch, HIVE-3384-branch-0.9.patch, HIVE-3384.patch, HIVE-JDK7-JDBC.patch The jdbc module couldn't be compiled with JDK7 because the JDK7 JDBC specification adds some abstract methods. Sample error: error: HiveCallableStatement is not abstract and does not override abstract method <T> getObject(String, Class<T>) in CallableStatement . . .
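The compiler error above refers to the generic getObject method JDK7 added to the JDBC interfaces. A common fix in patches of this kind is to add stubs that throw SQLFeatureNotSupportedException; an illustrative sketch of the method shape (this class does not implement the real CallableStatement interface, and the stub is not taken from the actual patch):

```java
import java.sql.SQLFeatureNotSupportedException;

/** Illustrative stub mirroring the JDK7-added method signature from the
 *  compiler error above. Hypothetical class; not the actual Hive patch. */
public class Jdk7StubSketch {
    public <T> T getObject(String columnLabel, Class<T> type)
            throws SQLFeatureNotSupportedException {
        throw new SQLFeatureNotSupportedException("Method not supported");
    }
}
```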
[jira] [Created] (HIVE-4497) beeline module tests don't get run by default
Thejas M Nair created HIVE-4497: --- Summary: beeline module tests don't get run by default Key: HIVE-4497 URL: https://issues.apache.org/jira/browse/HIVE-4497 Project: Hive Issue Type: Bug Components: CLI, HiveServer2 Reporter: Thejas M Nair Assignee: Thejas M Nair beeline tests are not getting run by default. See https://builds.apache.org/job/Hive-trunk-h0.21/lastCompletedBuild/testReport/
[jira] [Updated] (HIVE-4497) beeline module tests don't get run by default
[ https://issues.apache.org/jira/browse/HIVE-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-4497: Attachment: HIVE-4497.1.patch HIVE-4497.1.patch - adds beeline to iterate.hive.tests in build.properties beeline module tests don't get run by default - Key: HIVE-4497 URL: https://issues.apache.org/jira/browse/HIVE-4497 Project: Hive Issue Type: Bug Components: CLI, HiveServer2 Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-4497.1.patch beeline tests are not getting run by default. See https://builds.apache.org/job/Hive-trunk-h0.21/lastCompletedBuild/testReport/
[jira] [Updated] (HIVE-4496) JDBC2 won't compile with JDK7
[ https://issues.apache.org/jira/browse/HIVE-4496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Drome updated HIVE-4496: -- Attachment: HIVE-4496.patch Attached trunk patch. JDBC2 won't compile with JDK7 - Key: HIVE-4496 URL: https://issues.apache.org/jira/browse/HIVE-4496 Project: Hive Issue Type: Bug Components: JDBC Affects Versions: 0.12.0 Reporter: Chris Drome Assignee: Chris Drome Attachments: HIVE-4496.patch The HiveServer2-related JDBC code does not compile with JDK7. Related to HIVE-3384.
[jira] [Commented] (HIVE-4496) JDBC2 won't compile with JDK7
[ https://issues.apache.org/jira/browse/HIVE-4496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13648953#comment-13648953 ] Chris Drome commented on HIVE-4496: --- Phabricator ticket: https://reviews.facebook.net/D10647 JDBC2 won't compile with JDK7 - Key: HIVE-4496 URL: https://issues.apache.org/jira/browse/HIVE-4496 Project: Hive Issue Type: Bug Components: JDBC Affects Versions: 0.12.0 Reporter: Chris Drome Assignee: Chris Drome Attachments: HIVE-4496.patch The HiveServer2-related JDBC code does not compile with JDK7. Related to HIVE-3384.
[jira] [Updated] (HIVE-4496) JDBC2 won't compile with JDK7
[ https://issues.apache.org/jira/browse/HIVE-4496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Drome updated HIVE-4496: -- Fix Version/s: 0.12.0 Status: Patch Available (was: Open) Ported the HIVE-3384 patch to the HS2 JDBC code. JDBC2 won't compile with JDK7 - Key: HIVE-4496 URL: https://issues.apache.org/jira/browse/HIVE-4496 Project: Hive Issue Type: Bug Components: JDBC Affects Versions: 0.12.0 Reporter: Chris Drome Assignee: Chris Drome Fix For: 0.12.0 Attachments: HIVE-4496.patch The HiveServer2-related JDBC code does not compile with JDK7. Related to HIVE-3384.
[jira] [Created] (HIVE-4498) TestBeeLineWithArgs.testPositiveScriptFile fails
Thejas M Nair created HIVE-4498: --- Summary: TestBeeLineWithArgs.testPositiveScriptFile fails Key: HIVE-4498 URL: https://issues.apache.org/jira/browse/HIVE-4498 Project: Hive Issue Type: Bug Components: CLI, HiveServer2, JDBC Reporter: Thejas M Nair TestBeeLineWithArgs.testPositiveScriptFile fails - {code} [junit] 0: jdbc:hive2://localhost:1 STARTED testBreakOnErrorScriptFile [junit] Output: Connecting to jdbc:hive2://localhost:1 [junit] Connected to: Hive (version 0.12.0-SNAPSHOT) [junit] Driver: Hive (version 0.12.0-SNAPSHOT) [junit] Transaction isolation: TRANSACTION_REPEATABLE_READ [junit] Beeline version 0.12.0-SNAPSHOT by Apache Hive [junit] ++ [junit] | database_name | [junit] ++ [junit] ++ [junit] No rows selected (0.899 seconds) [junit] Closing: org.apache.hive.jdbc.HiveConnection [junit] [junit] FAILED testPositiveScriptFile (ERROR) (2s) {code}
[jira] [Updated] (HIVE-4497) beeline module tests don't get run by default
[ https://issues.apache.org/jira/browse/HIVE-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-4497: Status: Patch Available (was: Open) beeline module tests don't get run by default - Key: HIVE-4497 URL: https://issues.apache.org/jira/browse/HIVE-4497 Project: Hive Issue Type: Bug Components: CLI, HiveServer2 Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-4497.1.patch beeline tests are not getting run by default. See https://builds.apache.org/job/Hive-trunk-h0.21/lastCompletedBuild/testReport/
[jira] [Commented] (HIVE-4497) beeline module tests don't get run by default
[ https://issues.apache.org/jira/browse/HIVE-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13648965#comment-13648965 ] Carl Steinbach commented on HIVE-4497: -- This is a duplicate of HIVE-4357 (which I never got around to testing), but that patch positions beeline after ql, and it makes more sense to run it after jdbc as is done here. +1 (someone else needs to test and commit since I don't have a build farm). beeline module tests don't get run by default - Key: HIVE-4497 URL: https://issues.apache.org/jira/browse/HIVE-4497 Project: Hive Issue Type: Bug Components: CLI, HiveServer2 Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-4497.1.patch beeline tests are not getting run by default. See https://builds.apache.org/job/Hive-trunk-h0.21/lastCompletedBuild/testReport/
[jira] [Updated] (HIVE-4485) beeline prints null as empty strings
[ https://issues.apache.org/jira/browse/HIVE-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-4485: Attachment: HIVE-4485.2.patch Making the null string configurable is probably over-engineering at this point. HIVE-4485.2.patch - Simpler patch that does not make it configurable. beeline prints null as empty strings Key: HIVE-4485 URL: https://issues.apache.org/jira/browse/HIVE-4485 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-4485.1.patch, HIVE-4485.2.patch beeline is printing nulls as empty strings. This is inconsistent with the hive cli and other databases, which print null as the string NULL.
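The behavior change described above amounts to substituting a literal NULL token when a null cell is formatted for display. A minimal sketch of that substitution (hypothetical helper, not the attached patch):

```java
/** Hypothetical formatting helper illustrating the null-substitution idea. */
public class NullDisplaySketch {
    /** Renders a result-set cell for display, printing NULL instead of "". */
    static String displayValue(Object cell) {
        return (cell == null) ? "NULL" : cell.toString();
    }
}
```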
[jira] [Commented] (HIVE-4435) Column stats: Distinct value estimator should use hash functions that are pairwise independent
[ https://issues.apache.org/jira/browse/HIVE-4435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13648987#comment-13648987 ] Shreepadma Venugopalan commented on HIVE-4435: -- Can a committer take a look at this? Column stats: Distinct value estimator should use hash functions that are pairwise independent -- Key: HIVE-4435 URL: https://issues.apache.org/jira/browse/HIVE-4435 Project: Hive Issue Type: Bug Components: Statistics Affects Versions: 0.10.0 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Attachments: chart_1(1).png, HIVE-4435.1.patch The current implementation of the Flajolet-Martin estimator, used to estimate the number of distinct values, doesn't use hash functions that are pairwise independent. This is problematic because the input values don't distribute uniformly. When run on large TPC-H data sets, this leads to a huge discrepancy for primary key columns, which are typically a monotonically increasing sequence.
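For context on the fix direction named in the summary: a standard pairwise-independent family is h(x) = ((a*x + b) mod p) mod m with random a, b and a prime p, which avoids the bias simple hashes show on monotonically increasing keys. A sketch under those assumptions (illustrative only, not Hive's actual estimator code):

```java
import java.util.Random;

/** Sketch of a pairwise-independent hash family h(x) = ((a*x + b) mod p) mod m. */
public class PairwiseHashSketch {
    static final long P = 2147483647L; // prime 2^31 - 1
    final long a;                      // random multiplier in [1, p-1]
    final long b;                      // random offset in [0, p-1]
    final int m;                       // output range

    PairwiseHashSketch(Random rnd, int m) {
        this.a = 1 + (long) (rnd.nextDouble() * (P - 1));
        this.b = (long) (rnd.nextDouble() * P);
        this.m = m;
    }

    int hash(long x) {
        // a and floorMod(x, P) are both below 2^31, so the product fits in a long.
        long v = (a * Math.floorMod(x, P) + b) % P;
        return (int) (v % m);
    }
}
```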
[jira] [Updated] (HIVE-4497) beeline module tests don't get run by default
[ https://issues.apache.org/jira/browse/HIVE-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rob Weltman updated HIVE-4497: -- Resolution: Duplicate Status: Resolved (was: Patch Available) Patch available at https://issues.apache.org/jira/browse/HIVE-4357 beeline module tests don't get run by default - Key: HIVE-4497 URL: https://issues.apache.org/jira/browse/HIVE-4497 Project: Hive Issue Type: Bug Components: CLI, HiveServer2 Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-4497.1.patch beeline tests are not getting run by default. See https://builds.apache.org/job/Hive-trunk-h0.21/lastCompletedBuild/testReport/