[jira] [Created] (HIVE-4475) Switch RCFile default to LazyBinaryColumnarSerDe
Gunther Hagleitner created HIVE-4475: Summary: Switch RCFile default to LazyBinaryColumnarSerDe Key: HIVE-4475 URL: https://issues.apache.org/jira/browse/HIVE-4475 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner For most workloads it seems LazyBinaryColumnarSerDe (binary) will perform better than ColumnarSerDe (text). Not sure why ColumnarSerDe is the default, but my guess is that it's for historical reasons. I suggest switching the default. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4440) SMB Operator spills to disk like it's 1999
[ https://issues.apache.org/jira/browse/HIVE-4440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-4440: - Attachment: HIVE-4440.2.patch SMB Operator spills to disk like it's 1999 -- Key: HIVE-4440 URL: https://issues.apache.org/jira/browse/HIVE-4440 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Attachments: HIVE-4440.1.patch, HIVE-4440.2.patch I was recently looking into a performance issue with a query that used SMB join and was running really slowly. It turns out that the SMB join by default caches only 100 values per key before spilling to disk. That seems overly conservative to me. Changing the parameter resulted in a ~5x speedup - quite significant. The parameter is: hive.mapjoin.bucket.cache.size, which right now is only used by the SMB Operator as far as I can tell. The parameter was introduced originally (3 yrs ago) for the map join operator (looks like pre-SMB) and set to 100 to avoid OOM. That seems to have been in a different context, though, where you had to avoid running out of memory with the cached hash table in the same process, I think. Two things I'd like to propose: a) Rename it to what it does: hive.smbjoin.cache.rows b) Set it to something less restrictive: 1 If you string together a 5 table smb join with a map join and a map-side group by aggregation you might still run out of memory, but the renamed parameter should be easier to find and reduce. For most queries, I would think that 1 is still a reasonable number to cache (on the reduce side we use 25000 for shuffle joins).
[jira] [Commented] (HIVE-4440) SMB Operator spills to disk like it's 1999
[ https://issues.apache.org/jira/browse/HIVE-4440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13647395#comment-13647395 ] Gunther Hagleitner commented on HIVE-4440: -- Thanks :-) Patch .2 honors the old parameter unless it's at the default, in which case it uses the new one. I also put documentation around it. You bring up a good point, but are you sure it's necessary to support both in this case? It's just slightly ugly in the code and means we have to go back in later to remove it. My thinking is this: If you use the old parameter, it's probably because you needed to up it to get better performance - in that case the new default should most likely be OK for you. Do you think there are going to be cases where this falls flat? SMB Operator spills to disk like it's 1999 -- Key: HIVE-4440 URL: https://issues.apache.org/jira/browse/HIVE-4440 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Attachments: HIVE-4440.1.patch, HIVE-4440.2.patch I was recently looking into a performance issue with a query that used SMB join and was running really slowly. It turns out that the SMB join by default caches only 100 values per key before spilling to disk. That seems overly conservative to me. Changing the parameter resulted in a ~5x speedup - quite significant. The parameter is: hive.mapjoin.bucket.cache.size, which right now is only used by the SMB Operator as far as I can tell. The parameter was introduced originally (3 yrs ago) for the map join operator (looks like pre-SMB) and set to 100 to avoid OOM. That seems to have been in a different context, though, where you had to avoid running out of memory with the cached hash table in the same process, I think.
Two things I'd like to propose: a) Rename it to what it does: hive.smbjoin.cache.rows b) Set it to something less restrictive: 1 If you string together a 5 table smb join with a map join and a map-side group by aggregation you might still run out of memory, but the renamed parameter should be easier to find and reduce. For most queries, I would think that 1 is still a reasonable number to cache (on the reduce side we use 25000 for shuffle joins).
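The backward-compatibility behavior described in the comment above (honor the legacy hive.mapjoin.bucket.cache.size only when the user changed it from its default, otherwise use the new hive.smbjoin.cache.rows) can be sketched roughly as follows. The class and method names are illustrative, and the values passed in the usage below are hypothetical, not taken from the actual patch:

```java
public class SmbCacheSizeFallback {
    // Legacy default of hive.mapjoin.bucket.cache.size, per the discussion above.
    static final int OLD_DEFAULT = 100;

    // If the user explicitly changed the legacy parameter, keep honoring it;
    // otherwise fall through to the new hive.smbjoin.cache.rows value.
    static int resolveCacheRows(int oldParamValue, int newParamValue) {
        return (oldParamValue != OLD_DEFAULT) ? oldParamValue : newParamValue;
    }
}
```

A user who had raised the old parameter to, say, 500 keeps that value; a user who never touched it gets whatever the new parameter says.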
[jira] [Commented] (HIVE-335) External Tables should have the option to be marked Read Only
[ https://issues.apache.org/jira/browse/HIVE-335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13647542#comment-13647542 ] Michael Koehnlein commented on HIVE-335: This would be useful for me, too. We have data on HDFS that belongs to a system user account, and our normal users should be able to analyze it as an external table. As it is now, the users would need HDFS write permissions on the data directory if they want to create an external table for that directory themselves, although they really only need read permissions. Of course that's not a big obstacle, since we can just let the system user create the external table. It certainly would be nice to get pure read access via external tables, though. External Tables should have the option to be marked Read Only - Key: HIVE-335 URL: https://issues.apache.org/jira/browse/HIVE-335 Project: Hive Issue Type: Improvement Components: Metastore, Query Processor Reporter: Richard Lee When creating an External Table, it'd be awesome to have the option of NOT allowing writes to it (disallow any INSERTs, or UPDATEs if hive ever allows them). Adding and Dropping Partitions should still be allowed. This will enable hive to play well with external data stores other than hdfs where data should be non-malleable. I'd recommend the following syntax, which applies ONLY to external tables: CREATE EXTERNAL [READONLY] TABLE ...
[jira] [Commented] (HIVE-4471) Build fails with hcatalog checkstyle error
[ https://issues.apache.org/jira/browse/HIVE-4471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13647646#comment-13647646 ] Ashutosh Chauhan commented on HIVE-4471: +1. [~traviscrawford] would you like to take a look? Build fails with hcatalog checkstyle error -- Key: HIVE-4471 URL: https://issues.apache.org/jira/browse/HIVE-4471 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Attachments: HIVE-4471.1.patch, HIVE-4471.2.patch This is the output: checkstyle: [echo] hcatalog [checkstyle] Running Checkstyle 5.5 on 412 files [checkstyle] /home/jenkins/jenkins-slave/workspace/Hive-trunk-h0.21/hive/hcatalog/src/test/.gitignore:1: Missing a header - not enough lines in file. BUILD FAILED /home/jenkins/jenkins-slave/workspace/Hive-trunk-h0.21/hive/build.xml:296: The following error occurred while executing this line: /home/jenkins/jenkins-slave/workspace/Hive-trunk-h0.21/hive/build.xml:298: The following error occurred while executing this line: /home/jenkins/jenkins-slave/workspace/Hive-trunk-h0.21/hive/hcatalog/build.xml:109: The following error occurred while executing this line: /home/jenkins/jenkins-slave/workspace/Hive-trunk-h0.21/hive/hcatalog/build-support/ant/checkstyle.xml:32: Got 1 errors and 0 warnings.
[jira] [Commented] (HIVE-4421) Improve memory usage by ORC dictionaries
[ https://issues.apache.org/jira/browse/HIVE-4421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13647660#comment-13647660 ] Phabricator commented on HIVE-4421: --- ashutoshc has accepted the revision HIVE-4421 [jira] Improve memory usage by ORC dictionaries. +1 will commit if tests pass. REVISION DETAIL https://reviews.facebook.net/D10545 BRANCH h-4421 ARCANIST PROJECT hive To: JIRA, ashutoshc, omalley Improve memory usage by ORC dictionaries Key: HIVE-4421 URL: https://issues.apache.org/jira/browse/HIVE-4421 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Reporter: Owen O'Malley Assignee: Owen O'Malley Fix For: 0.11.0 Attachments: HIVE-4421.D10545.1.patch, HIVE-4421.D10545.2.patch, HIVE-4421.D10545.3.patch, HIVE-4421.D10545.4.patch Currently, for tables with many string columns, it is possible to significantly underestimate the memory used by the ORC dictionaries and cause the query to run out of memory in the task.
[jira] [Updated] (HIVE-4455) HCatalog build directories get included in tar file produced by ant tar
[ https://issues.apache.org/jira/browse/HIVE-4455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-4455: --- Resolution: Fixed Status: Resolved (was: Patch Available) Committed trunk version as well. Thanks, Alan! HCatalog build directories get included in tar file produced by ant tar - Key: HIVE-4455 URL: https://issues.apache.org/jira/browse/HIVE-4455 Project: Hive Issue Type: Bug Components: Build Infrastructure, HCatalog Affects Versions: 0.11.0 Reporter: Alan Gates Assignee: Alan Gates Priority: Blocker Fix For: 0.11.0 Attachments: buildbloat.patch, HIVE-4455.patch, HIVE-4455-trunk.patch The excludes in the tar target aren't properly excluding the build directories in HCatalog
[jira] [Updated] (HIVE-4461) hcatalog jars not getting published to maven repo
[ https://issues.apache.org/jira/browse/HIVE-4461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-4461: --- Resolution: Fixed Fix Version/s: 0.11.0 Status: Resolved (was: Patch Available) Marking this as resolved, as per Alan's comments. hcatalog jars not getting published to maven repo - Key: HIVE-4461 URL: https://issues.apache.org/jira/browse/HIVE-4461 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.11.0 Reporter: Ashutosh Chauhan Assignee: Alan Gates Fix For: 0.11.0 Attachments: HIVE-4461.patch
[jira] [Commented] (HIVE-4392) Illogical InvalidObjectException thrown when using multiple aggregate functions with star columns
[ https://issues.apache.org/jira/browse/HIVE-4392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13647690#comment-13647690 ] Ashutosh Chauhan commented on HIVE-4392: OK. Let's go ahead with this patch then. [~navis] Do you want to update the patch with these tests or shall I go ahead with testing it for commit? Illogical InvalidObjectException thrown when using multiple aggregate functions with star columns -- Key: HIVE-4392 URL: https://issues.apache.org/jira/browse/HIVE-4392 Project: Hive Issue Type: Bug Components: Query Processor Environment: Apache Hadoop 0.20.1 Apache Hive Trunk Reporter: caofangkun Assignee: Navis Priority: Minor Attachments: HIVE-4392.D10431.1.patch, HIVE-4392.D10431.2.patch, HIVE-4392.D10431.3.patch, HIVE-4392.D10431.4.patch For Example: hive (default)> create table liza_1 as select *, sum(key), sum(value) from new_src; Total MapReduce jobs = 1 Launching Job 1 out of 1 Number of reduce tasks determined at compile time: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=number In order to limit the maximum number of reducers: set hive.exec.reducers.max=number In order to set a constant number of reducers: set mapred.reduce.tasks=number Starting Job = job_201304191025_0003, Tracking URL = http://hd17-vm5:51030/jobdetails.jsp?jobid=job_201304191025_0003 Kill Command = /home/zongren/hadoop-current/bin/../bin/hadoop job -kill job_201304191025_0003 Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 1 2013-04-22 11:09:28,017 Stage-1 map = 0%, reduce = 0% 2013-04-22 11:09:34,054 Stage-1 map = 0%, reduce = 100% 2013-04-22 11:09:37,074 Stage-1 map = 100%, reduce = 100% Ended Job = job_201304191025_0003 Moving data to: hdfs://hd17-vm5:9101/user/zongren/hive/liza_1 FAILED: Error in metadata: InvalidObjectException(message:liza_1 is not a valid object name) FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
MapReduce Jobs Launched: Job 0: Reduce: 1 HDFS Read: 0 HDFS Write: 12 SUCCESS Total MapReduce CPU Time Spent: 0 msec hive (default)> create table liza_1 as select *, sum(key), sum(value) from new_src group by key, value; Total MapReduce jobs = 1 Launching Job 1 out of 1 Number of reduce tasks not specified. Estimated from input data size: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=number In order to limit the maximum number of reducers: set hive.exec.reducers.max=number In order to set a constant number of reducers: set mapred.reduce.tasks=number Starting Job = job_201304191025_0004, Tracking URL = http://hd17-vm5:51030/jobdetails.jsp?jobid=job_201304191025_0004 Kill Command = /home/zongren/hadoop-current/bin/../bin/hadoop job -kill job_201304191025_0004 Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 1 2013-04-22 11:11:58,945 Stage-1 map = 0%, reduce = 0% 2013-04-22 11:12:01,964 Stage-1 map = 0%, reduce = 100% 2013-04-22 11:12:04,982 Stage-1 map = 100%, reduce = 100% Ended Job = job_201304191025_0004 Moving data to: hdfs://hd17-vm5:9101/user/zongren/hive/liza_1 FAILED: Error in metadata: InvalidObjectException(message:liza_1 is not a valid object name) FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask MapReduce Jobs Launched: Job 0: Reduce: 1 HDFS Read: 0 HDFS Write: 0 SUCCESS Total MapReduce CPU Time Spent: 0 msec But the following two queries work: hive (default)> create table liza_1 as select * from new_src; Total MapReduce jobs = 3 Launching Job 1 out of 3 Number of reduce tasks is set to 0 since there's no reduce operator Starting Job = job_201304191025_0006, Tracking URL = http://hd17-vm5:51030/jobdetails.jsp?jobid=job_201304191025_0006 Kill Command = /home/zongren/hadoop-current/bin/../bin/hadoop job -kill job_201304191025_0006 Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 0 2013-04-22 11:15:00,681
Stage-1 map = 0%, reduce = 0% 2013-04-22 11:15:03,697 Stage-1 map = 100%, reduce = 100% Ended Job = job_201304191025_0006 Stage-4 is selected by condition resolver. Stage-3 is filtered out by condition resolver. Stage-5 is filtered out by condition resolver. Moving data to: hdfs://hd17-vm5:9101/user/zongren/hive-scratchdir/hive_2013-04-22_11-14-54_632_6709035018023861094/-ext-10001 Moving data to: hdfs://hd17-vm5:9101/user/zongren/hive/liza_1 Table default.liza_1 stats:
[jira] [Resolved] (HIVE-4182) doAS does not work with HiveServer2 in non-kerberos mode with local job
[ https://issues.apache.org/jira/browse/HIVE-4182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan resolved HIVE-4182. Resolution: Fixed Fix Version/s: 0.11.0 Fixed via HIVE-4315 doAS does not work with HiveServer2 in non-kerberos mode with local job --- Key: HIVE-4182 URL: https://issues.apache.org/jira/browse/HIVE-4182 Project: Hive Issue Type: Bug Reporter: Thejas M Nair Labels: HiveServer2 Fix For: 0.11.0 Attachments: HIVE-4182.1.patch When HiveServer2 is configured without kerberos security enabled, and the query gets launched as a local map-reduce job, the job runs as the user hive server is running as, instead of the user who submitted the query.
[jira] [Created] (HIVE-4476) HiveMetaStore caches the creation of a default db in a static way
Brock Noland created HIVE-4476: -- Summary: HiveMetaStore caches the creation of a default db in a static way Key: HIVE-4476 URL: https://issues.apache.org/jira/browse/HIVE-4476 Project: Hive Issue Type: Improvement Affects Versions: 0.10.0, 0.11.0 Reporter: Brock Noland Priority: Minor Currently HiveMetaStore.HMSHandler has a static flag set to true if the JVM has ever created a default db: https://github.com/apache/hive/blob/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L176 However, when testing it's nice to be able to create multiple HiveMetastore instances in a single JVM. Perhaps we should add a flag hive.metastore.always.create.default.db or something similar.
[jira] [Commented] (HIVE-4476) HiveMetaStore caches the creation of a default db in a static way
[ https://issues.apache.org/jira/browse/HIVE-4476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13647744#comment-13647744 ] Brock Noland commented on HIVE-4476: Perhaps the use of checkForDefaultDb in that class just needs to be modified. HiveMetaStore caches the creation of a default db in a static way - Key: HIVE-4476 URL: https://issues.apache.org/jira/browse/HIVE-4476 Project: Hive Issue Type: Improvement Affects Versions: 0.10.0, 0.11.0 Reporter: Brock Noland Priority: Minor Currently HiveMetaStore.HMSHandler has a static flag set to true if the JVM has ever created a default db: https://github.com/apache/hive/blob/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L176 However, when testing it's nice to be able to create multiple HiveMetastore instances in a single JVM. Perhaps we should add a flag hive.metastore.always.create.default.db or something similar.
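The static-flag pattern the report describes, together with the proposed override, might look roughly like this. The names are simplified and the override flag is the hypothetical one suggested in the issue, not actual Hive code:

```java
public class HmsHandlerSketch {
    // Static: once any instance creates the default db, every later instance
    // in the same JVM skips creation -- the behavior the report objects to.
    private static boolean defaultDbCreated = false;

    // Hypothetical hive.metastore.always.create.default.db setting.
    private final boolean alwaysCreateDefaultDb;

    public HmsHandlerSketch(boolean alwaysCreateDefaultDb) {
        this.alwaysCreateDefaultDb = alwaysCreateDefaultDb;
    }

    public boolean shouldCreateDefaultDb() {
        if (alwaysCreateDefaultDb) {
            return true; // proposed: tests can force per-instance creation
        }
        if (defaultDbCreated) {
            return false; // current behavior: cached across all instances
        }
        defaultDbCreated = true;
        return true;
    }
}
```

With the flag off, only the first instance in the JVM creates the default db; with the flag on, each new instance would check again, which is what multi-metastore tests need.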
[jira] [Commented] (HIVE-4474) Column access not tracked properly for partitioned tables
[ https://issues.apache.org/jira/browse/HIVE-4474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13647763#comment-13647763 ] Gang Tim Liu commented on HIVE-4474: Running test. Column access not tracked properly for partitioned tables - Key: HIVE-4474 URL: https://issues.apache.org/jira/browse/HIVE-4474 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Samuel Yuan Assignee: Samuel Yuan Attachments: HIVE-4474.1.patch.txt The columns recorded as being accessed are incorrect for partitioned tables. The index of an accessed column is a position in the list of non-partition columns, but a list of all columns is being used right now to do the lookup.
[jira] [Created] (HIVE-4477) remove redundant copy of arithmetic filter unit test testColOpScalarNumericFilterNullAndRepeatingLogic
Eric Hanson created HIVE-4477: - Summary: remove redundant copy of arithmetic filter unit test testColOpScalarNumericFilterNullAndRepeatingLogic Key: HIVE-4477 URL: https://issues.apache.org/jira/browse/HIVE-4477 Project: Hive Issue Type: Sub-task Reporter: Eric Hanson Assignee: Eric Hanson The same test got ported to two different files.
[jira] [Updated] (HIVE-4448) Fix metastore warehouse incorrect location on Windows in unit tests
[ https://issues.apache.org/jira/browse/HIVE-4448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuaishuai Nie updated HIVE-4448: - Summary: Fix metastore warehouse incorrect location on Windows in unit tests (was: Fix metastore warehouse incorrect path on Windows in unit tests) Fix metastore warehouse incorrect location on Windows in unit tests --- Key: HIVE-4448 URL: https://issues.apache.org/jira/browse/HIVE-4448 Project: Hive Issue Type: Bug Components: Testing Infrastructure Affects Versions: 0.11.0 Environment: Windows Reporter: Shuaishuai Nie Assignee: Shuaishuai Nie Attachments: HIVE-4448.1.patch Unit test cases which do not use QTestUtil pass an incompatible Windows path for METASTOREWAREHOUSE to HiveConf, which results in creating the /test/data/warehouse folder in the wrong location on Windows. This folder will not be deleted at the beginning of the unit test, and its content will cause unit tests to fail if the same test case is run repeatedly. The root cause of this problem is that for a path like pfile://C:\hive\build\ql/test/data/warehouse, the C:\hive\build\ part will be parsed as the authority of the path and removed from the path string. The patch will fix this problem and make the unit test results consistent between Windows and Linux.
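The authority-parsing problem described in the report can be reproduced with plain java.net.URI (using forward slashes here, since raw backslashes are not legal URI characters): the drive prefix after the scheme is consumed as the URI authority and dropped from the path.

```java
import java.net.URI;

public class WindowsPathAuthorityDemo {
    public static void main(String[] args) throws Exception {
        // With a Windows-style drive after the scheme, "C:" is parsed as the
        // URI authority component rather than as part of the path.
        URI uri = new URI("pfile://C:/hive/build/ql/test/data/warehouse");
        System.out.println("authority = " + uri.getAuthority()); // the drive prefix
        System.out.println("path      = " + uri.getPath());      // drive prefix is gone
    }
}
```

Hadoop's Path class does its own URI handling on top of this, but the demo shows the same drive-prefix-as-authority confusion the report attributes the bug to.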
[jira] [Updated] (HIVE-4477) remove redundant copy of arithmetic filter unit test testColOpScalarNumericFilterNullAndRepeatingLogic
[ https://issues.apache.org/jira/browse/HIVE-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Hanson updated HIVE-4477: -- Attachment: HIVE-4477.1.patch remove redundant copy of arithmetic filter unit test testColOpScalarNumericFilterNullAndRepeatingLogic -- Key: HIVE-4477 URL: https://issues.apache.org/jira/browse/HIVE-4477 Project: Hive Issue Type: Sub-task Reporter: Eric Hanson Assignee: Eric Hanson Attachments: HIVE-4477.1.patch The same test got ported to two different files.
[jira] [Updated] (HIVE-3959) Update Partition Statistics in Metastore Layer
[ https://issues.apache.org/jira/browse/HIVE-3959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Tim Liu updated HIVE-3959: --- Attachment: HIVE-3959.patch.11.txt Update Partition Statistics in Metastore Layer -- Key: HIVE-3959 URL: https://issues.apache.org/jira/browse/HIVE-3959 Project: Hive Issue Type: Improvement Components: Metastore, Statistics Reporter: Bhushan Mandhani Assignee: Gang Tim Liu Priority: Minor Attachments: HIVE-3959.patch.1, HIVE-3959.patch.11.txt, HIVE-3959.patch.2, HIVE-3959.patch.9.txt When partitions are created using queries (insert overwrite and insert into), the StatsTask updates all stats. However, when partitions are added directly through metadata-only operations (either the CLI or direct calls to the Thrift Metastore) no stats are populated even if hive.stats.reliable is set to true. This puts us in a situation where we can't decide if stats are truly reliable or not. We propose that the fast stats (numFiles and totalSize) which don't require a scan of the data should always be populated and be completely reliable. For now we are still excluding rowCount and rawDataSize because that will make these operations very expensive. Currently they are quick metadata-only ops.
Review Request: remove redundant copy of arithmetic filter unit test testColOpScalarNumericFilterNullAndRepeatingLogic
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/10906/ --- Review request for hive. Description --- remove redundant copy of arithmetic filter unit test testColOpScalarNumericFilterNullAndRepeatingLogic This addresses bug HIVE-4477. https://issues.apache.org/jira/browse/HIVE-4477 Diffs - ql/src/test/org/apache/hadoop/hive/ql/exec/vector/TestVectorFilterOperator.java 3ad6c7f Diff: https://reviews.apache.org/r/10906/diff/ Testing --- Thanks, Eric Hanson
[jira] [Updated] (HIVE-4477) remove redundant copy of arithmetic filter unit test testColOpScalarNumericFilterNullAndRepeatingLogic
[ https://issues.apache.org/jira/browse/HIVE-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Hanson updated HIVE-4477: -- Status: Patch Available (was: Open) remove redundant copy of arithmetic filter unit test testColOpScalarNumericFilterNullAndRepeatingLogic -- Key: HIVE-4477 URL: https://issues.apache.org/jira/browse/HIVE-4477 Project: Hive Issue Type: Sub-task Reporter: Eric Hanson Assignee: Eric Hanson Attachments: HIVE-4477.1.patch The same test got ported to two different files.
[jira] [Commented] (HIVE-4477) remove redundant copy of arithmetic filter unit test testColOpScalarNumericFilterNullAndRepeatingLogic
[ https://issues.apache.org/jira/browse/HIVE-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13647840#comment-13647840 ] Eric Hanson commented on HIVE-4477: --- Code review available at https://reviews.apache.org/r/10906/ remove redundant copy of arithmetic filter unit test testColOpScalarNumericFilterNullAndRepeatingLogic -- Key: HIVE-4477 URL: https://issues.apache.org/jira/browse/HIVE-4477 Project: Hive Issue Type: Sub-task Reporter: Eric Hanson Assignee: Eric Hanson Attachments: HIVE-4477.1.patch The same test got ported to two different files.
[jira] [Created] (HIVE-4478) In ORC, add boolean noNulls flag to column stripe metadata
Eric Hanson created HIVE-4478: - Summary: In ORC, add boolean noNulls flag to column stripe metadata Key: HIVE-4478 URL: https://issues.apache.org/jira/browse/HIVE-4478 Project: Hive Issue Type: Sub-task Reporter: Eric Hanson Assignee: Owen O'Malley Currently, the stripe metadata for ORC contains the min and max value for each column in the stripe. This will be used for stripe elimination. However, an additional bit of metadata, noNulls (true/false), is needed to help speed up vectorized query execution as much as 30%. The vectorized QE code has a Boolean flag for each column vector called noNulls. If this is true, all the null-checking logic is skipped. For simple filters and arithmetic expressions, this can save on the order of 30% of the time. Once this noNulls stripe metadata is available, the vectorized iterator for ORC can be updated to avoid all expense to load the isNull bitmap, and efficiently set the noNulls flag for each column vector.
[jira] [Updated] (HIVE-4478) In ORC, add boolean noNulls flag to column stripe metadata
[ https://issues.apache.org/jira/browse/HIVE-4478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Hanson updated HIVE-4478: -- Description: Currently, the stripe metadata for ORC contains the min and max value for each column in the stripe. This will be used for stripe elimination. However, an additional bit of metadata for each column for each stripe, noNulls (true/false), is needed to help speed up vectorized query execution as much as 30%. The vectorized QE code has a Boolean flag for each column vector called noNulls. If this is true, all the null-checking logic is skipped for that column for a VectorizedRowBatch when an operation is performed on that column. For simple filters and arithmetic expressions, this can save on the order of 30% of the time. Once this noNulls stripe metadata is available, the vectorized iterator (reader) for ORC can be updated to avoid all expense to load the isNull bitmap, and efficiently set the noNulls flag for each column vector. was: Currently, the stripe metadata for ORC contains the min and max value for each column in the stripe. This will be used for stripe elimination. However, an additional bit of metadata, noNulls (true/false), is needed to help speed up vectorized query execution as much as 30%. The vectorized QE code has a Boolean flag for each column vector called noNulls. If this is true, all the null-checking logic is skipped. For simple filters and arithmetic expressions, this can save on the order of 30% of the time. Once this noNulls stripe metadata is available, the vectorized iterator for ORC can be updated to avoid all expense to load the isNull bitmap, and efficiently set the noNulls flag for each column vector. 
In ORC, add boolean noNulls flag to column stripe metadata -- Key: HIVE-4478 URL: https://issues.apache.org/jira/browse/HIVE-4478 Project: Hive Issue Type: Sub-task Reporter: Eric Hanson Assignee: Owen O'Malley Currently, the stripe metadata for ORC contains the min and max value for each column in the stripe. This will be used for stripe elimination. However, an additional bit of metadata for each column for each stripe, noNulls (true/false), is needed to help speed up vectorized query execution as much as 30%. The vectorized QE code has a Boolean flag for each column vector called noNulls. If this is true, all the null-checking logic is skipped for that column for a VectorizedRowBatch when an operation is performed on that column. For simple filters and arithmetic expressions, this can save on the order of 30% of the time. Once this noNulls stripe metadata is available, the vectorized iterator (reader) for ORC can be updated to avoid all expense to load the isNull bitmap, and efficiently set the noNulls flag for each column vector.
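The saving described above comes from skipping the per-row null check in the inner loop. A simplified sketch of the noNulls fast path, modeled loosely on a long column vector (illustrative code, not actual Hive internals):

```java
public class NoNullsFastPath {
    // Sum a column vector over n rows. When noNulls is true, the per-row
    // null check is skipped entirely, which is where the reported ~30%
    // saving for simple filters and arithmetic comes from.
    static long sum(long[] values, boolean[] isNull, boolean noNulls, int n) {
        long total = 0;
        if (noNulls) {
            for (int i = 0; i < n; i++) {
                total += values[i]; // fast path: no null-checking logic
            }
        } else {
            for (int i = 0; i < n; i++) {
                if (!isNull[i]) {
                    total += values[i]; // slow path: branch per row
                }
            }
        }
        return total;
    }
}
```

A per-stripe noNulls flag in the file metadata would let the reader choose the fast path up front, without first materializing the isNull bitmap.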
[jira] [Commented] (HIVE-4376) Document ORC file format in Hive wiki
[ https://issues.apache.org/jira/browse/HIVE-4376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13647934#comment-13647934 ] Lefty Leverenz commented on HIVE-4376: -- Done. You can find the ORC wikidoc here: [https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ORC]. It's in the [Language Manual|https://cwiki.apache.org/confluence/display/Hive/LanguageManual] under a stub for File Formats. Information about other file formats would also be helpful. Document ORC file format in Hive wiki - Key: HIVE-4376 URL: https://issues.apache.org/jira/browse/HIVE-4376 Project: Hive Issue Type: Bug Components: Documentation, Serializers/Deserializers Affects Versions: 0.11.0 Reporter: Lefty Leverenz Assignee: Lefty Leverenz Labels: wiki Add a wiki documenting the Optimized Row Columnar file format for Hive release 0.11 ([HIVE-3874|https://issues.apache.org/jira/browse/HIVE-3874]).
Need to track docs for future releases
Now that all the Hive docs are in the wiki, we can't commit new documentation to trunk or branch. But we don't want to add docs to the wiki prematurely, so there's an increased likelihood that we'll lose track of some doc requirements for future releases. Does anyone know of a good way to ensure that no doc gets left behind? One possibility is to use labels on JIRAs that need future documentation. When HIVE-# gets committed with a fix version of 0.12 and still needs docs, it would get a label such as doc-needed-v0.12, which can be used to find all the doc requirements at release time. That might be the simplest solution, although I see two problems: if the fix number gets changed, the label has to change too; and sometimes people enter a label that seems right to them but doesn't match exactly. Another possibility is to use JIRAs, either adding a child JIRA for each closed JIRA that still needs docs or using an umbrella JIRA for each upcoming release. An ideal solution would automatically spew out a list of JIRAs that need docs for a given release number, either on request or when the release happens. Is that technically possible? – Lefty
[jira] [Commented] (HIVE-4466) Fix continue.on.failure in unit tests to -well- continue on failure in unit tests
[ https://issues.apache.org/jira/browse/HIVE-4466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13647991#comment-13647991 ] Ashutosh Chauhan commented on HIVE-4466: +1 will commit if tests pass. Fix continue.on.failure in unit tests to -well- continue on failure in unit tests - Key: HIVE-4466 URL: https://issues.apache.org/jira/browse/HIVE-4466 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Attachments: HIVE-4466.1.patch continue.on.failure is no longer hooked up to anything in the build scripts. More importantly, the only choice right now is to continue through a module and then fail. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-4479) Child expressions are not being evaluated hierarchically in a few templates.
Jitendra Nath Pandey created HIVE-4479: -- Summary: Child expressions are not being evaluated hierarchically in a few templates. Key: HIVE-4479 URL: https://issues.apache.org/jira/browse/HIVE-4479 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey FilterColumnCompareColumn.txt, FilterStringColumnCompareScalar.txt and ScalarArithmeticColumn.txt are not evaluating the child expressions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
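The bug described above can be illustrated with a hedged sketch of what "evaluating child expressions hierarchically" means in a vectorized expression tree: a parent must call evaluate() on its children before consuming their output, or the children's work is silently skipped. Class and method names here are illustrative, not Hive's actual API:

```java
// Sketch of hierarchical child-expression evaluation.
public class ChildEvalSketch {
    abstract static class VectorExpression {
        VectorExpression[] children = new VectorExpression[0];

        // Parents call this first so child outputs are populated.
        final void evaluateChildren(long[] batch) {
            for (VectorExpression c : children) {
                c.evaluate(batch);
            }
        }

        abstract void evaluate(long[] batch);
    }

    // Leaf: adds one to every value in the batch.
    static class AddOne extends VectorExpression {
        void evaluate(long[] batch) {
            for (int i = 0; i < batch.length; i++) batch[i] += 1;
        }
    }

    // Parent: doubles every value, but only after its child has run.
    static class TimesTwo extends VectorExpression {
        TimesTwo(VectorExpression child) {
            children = new VectorExpression[] { child };
        }
        void evaluate(long[] batch) {
            evaluateChildren(batch); // the call the broken templates omit
            for (int i = 0; i < batch.length; i++) batch[i] *= 2;
        }
    }
}
```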
[jira] [Updated] (HIVE-4479) Child expressions are not being evaluated hierarchically in a few templates.
[ https://issues.apache.org/jira/browse/HIVE-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HIVE-4479: --- Attachment: HIVE-4479.1.patch Child expressions are not being evaluated hierarchically in a few templates. Key: HIVE-4479 URL: https://issues.apache.org/jira/browse/HIVE-4479 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Attachments: HIVE-4479.1.patch FilterColumnCompareColumn.txt, FilterStringColumnCompareScalar.txt and ScalarArithmeticColumn.txt are not evaluating the child expressions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-4480) Implement partition support for vectorized query execution
Sarvesh Sakalanaga created HIVE-4480: Summary: Implement partition support for vectorized query execution Key: HIVE-4480 URL: https://issues.apache.org/jira/browse/HIVE-4480 Project: Hive Issue Type: Sub-task Reporter: Sarvesh Sakalanaga Assignee: Sarvesh Sakalanaga -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4480) Implement partition support for vectorized query execution
[ https://issues.apache.org/jira/browse/HIVE-4480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sarvesh Sakalanaga updated HIVE-4480: - Description: Add support for eager deserialization of row data using serde in the RecordReader layer. Also add support for partitions in this layer so that the vectorized batch is populated correctly. Implement partition support for vectorized query execution -- Key: HIVE-4480 URL: https://issues.apache.org/jira/browse/HIVE-4480 Project: Hive Issue Type: Sub-task Reporter: Sarvesh Sakalanaga Assignee: Sarvesh Sakalanaga Add support for eager deserialization of row data using serde in the RecordReader layer. Also add support for partitions in this layer so that the vectorized batch is populated correctly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
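As a minimal sketch of why partition support needs special handling in a vectorized reader: partition column values are constant for an entire split, so instead of deserializing them per row, the batch can mark the column as repeating and store a single value. The isRepeating flag mirrors Hive's column vectors; the rest of the names are illustrative assumptions:

```java
// Sketch of populating a partition column as a repeating vector.
public class PartitionBatchSketch {
    static class LongColumnVector {
        long[] vector = new long[1024];
        boolean isRepeating = false;
    }

    // Populate a partition column: one value stands in for every row.
    static void fillPartitionColumn(LongColumnVector col, long partitionValue) {
        col.isRepeating = true;
        col.vector[0] = partitionValue;
    }

    // Reading a value honors the repeating flag.
    static long valueAt(LongColumnVector col, int row) {
        return col.isRepeating ? col.vector[0] : col.vector[row];
    }
}
```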
[jira] [Resolved] (HIVE-4454) Support partitioned tables in vectorized query execution.
[ https://issues.apache.org/jira/browse/HIVE-4454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey resolved HIVE-4454. Resolution: Duplicate Duplicate of HIVE-4480. Support partitioned tables in vectorized query execution. - Key: HIVE-4454 URL: https://issues.apache.org/jira/browse/HIVE-4454 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Partitioned tables are very common use case. Vectorized code path should support that. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-4481) Vectorized row batch should be initialized with additional columns to hold intermediate output.
Jitendra Nath Pandey created HIVE-4481: -- Summary: Vectorized row batch should be initialized with additional columns to hold intermediate output. Key: HIVE-4481 URL: https://issues.apache.org/jira/browse/HIVE-4481 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Vectorized row batch should be initialized with additional columns to hold intermediate output. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
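The idea can be sketched as sizing the row batch with extra scratch columns up front, so expression intermediates have a destination without mid-query allocation. The counts and layout below are illustrative assumptions, not Hive's actual VectorizedRowBatch constructor:

```java
// Sketch: allocate data columns plus scratch columns for intermediate output.
public class ScratchColumnSketch {
    static long[][] createBatchColumns(int dataColumns, int scratchColumns, int batchSize) {
        long[][] cols = new long[dataColumns + scratchColumns][];
        for (int i = 0; i < cols.length; i++) {
            cols[i] = new long[batchSize];
        }
        return cols;
    }
}
```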
[jira] [Updated] (HIVE-4481) Vectorized row batch should be initialized with additional columns to hold intermediate output.
[ https://issues.apache.org/jira/browse/HIVE-4481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HIVE-4481: --- Attachment: HIVE-4481.1.patch Vectorized row batch should be initialized with additional columns to hold intermediate output. --- Key: HIVE-4481 URL: https://issues.apache.org/jira/browse/HIVE-4481 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Attachments: HIVE-4481.1.patch Vectorized row batch should be initialized with additional columns to hold intermediate output. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4479) Child expressions are not being evaluated hierarchically in a few templates.
[ https://issues.apache.org/jira/browse/HIVE-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13648034#comment-13648034 ] Jitendra Nath Pandey commented on HIVE-4479: Review board entry: https://reviews.apache.org/r/10908/ Child expressions are not being evaluated hierarchically in a few templates. Key: HIVE-4479 URL: https://issues.apache.org/jira/browse/HIVE-4479 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Attachments: HIVE-4479.1.patch FilterColumnCompareColumn.txt, FilterStringColumnCompareScalar.txt and ScalarArithmeticColumn.txt are not evaluating the child expressions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-4482) Template file VectorUDAFAvg.txt missing from public branch; CodeGen.java fails
Eric Hanson created HIVE-4482: - Summary: Template file VectorUDAFAvg.txt missing from public branch; CodeGen.java fails Key: HIVE-4482 URL: https://issues.apache.org/jira/browse/HIVE-4482 Project: Hive Issue Type: Sub-task Reporter: Eric Hanson Assignee: Remus Rusanu In vectorization branch, file ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/VectorUDAFAvg.txt is missing. So CodeGen.java doesn't run to completion, because it references that file. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4392) Illogical InvalidObjectException throwed when use mulit aggregate functions with star columns
[ https://issues.apache.org/jira/browse/HIVE-4392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HIVE-4392: -- Attachment: HIVE-4392.D10431.5.patch navis updated the revision HIVE-4392 [jira] Illogical InvalidObjectException throwed when use mulit aggregate functions with star columns. Added tests Reviewers: ashutoshc, JIRA REVISION DETAIL https://reviews.facebook.net/D10431 CHANGE SINCE LAST DIFF https://reviews.facebook.net/D10431?vs=33177id=33285#toc AFFECTED FILES metastore/src/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java ql/src/java/org/apache/hadoop/hive/ql/parse/PTFTranslator.java ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java ql/src/test/queries/clientpositive/ctas_colname.q ql/src/test/results/clientpositive/ctas_colname.q.out To: JIRA, ashutoshc, navis Cc: hbutani Illogical InvalidObjectException throwed when use mulit aggregate functions with star columns -- Key: HIVE-4392 URL: https://issues.apache.org/jira/browse/HIVE-4392 Project: Hive Issue Type: Bug Components: Query Processor Environment: Apache Hadoop 0.20.1 Apache Hive Trunk Reporter: caofangkun Assignee: Navis Priority: Minor Attachments: HIVE-4392.D10431.1.patch, HIVE-4392.D10431.2.patch, HIVE-4392.D10431.3.patch, HIVE-4392.D10431.4.patch, HIVE-4392.D10431.5.patch For Example: hive (default) create table liza_1 as select *, sum(key), sum(value) from new_src; Total MapReduce jobs = 1 Launching Job 1 out of 1 Number of reduce tasks determined at compile time: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=number In order to limit the maximum number of reducers: set hive.exec.reducers.max=number In order to set a constant number of reducers: set mapred.reduce.tasks=number Starting Job = job_201304191025_0003, Tracking URL = 
http://hd17-vm5:51030/jobdetails.jsp?jobid=job_201304191025_0003 Kill Command = /home/zongren/hadoop-current/bin/../bin/hadoop job -kill job_201304191025_0003 Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 1 2013-04-22 11:09:28,017 Stage-1 map = 0%, reduce = 0% 2013-04-22 11:09:34,054 Stage-1 map = 0%, reduce = 100% 2013-04-22 11:09:37,074 Stage-1 map = 100%, reduce = 100% Ended Job = job_201304191025_0003 Moving data to: hdfs://hd17-vm5:9101/user/zongren/hive/liza_1 FAILED: Error in metadata: InvalidObjectException(message:liza_1 is not a valid object name) FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask MapReduce Jobs Launched: Job 0: Reduce: 1 HDFS Read: 0 HDFS Write: 12 SUCCESS Total MapReduce CPU Time Spent: 0 msec hive (default) create table liza_1 as select *, sum(key), sum(value) from new_src group by key, value; Total MapReduce jobs = 1 Launching Job 1 out of 1 Number of reduce tasks not specified. Estimated from input data size: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=number In order to limit the maximum number of reducers: set hive.exec.reducers.max=number In order to set a constant number of reducers: set mapred.reduce.tasks=number Starting Job = job_201304191025_0004, Tracking URL = http://hd17-vm5:51030/jobdetails.jsp?jobid=job_201304191025_0004 Kill Command = /home/zongren/hadoop-current/bin/../bin/hadoop job -kill job_201304191025_0004 Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 1 2013-04-22 11:11:58,945 Stage-1 map = 0%, reduce = 0% 2013-04-22 11:12:01,964 Stage-1 map = 0%, reduce = 100% 2013-04-22 11:12:04,982 Stage-1 map = 100%, reduce = 100% Ended Job = job_201304191025_0004 Moving data to: hdfs://hd17-vm5:9101/user/zongren/hive/liza_1 FAILED: Error in metadata: InvalidObjectException(message:liza_1 is not a valid object name) FAILED: Execution Error, return code 1 from 
org.apache.hadoop.hive.ql.exec.DDLTask MapReduce Jobs Launched: Job 0: Reduce: 1 HDFS Read: 0 HDFS Write: 0 SUCCESS Total MapReduce CPU Time Spent: 0 msec But the following two Queries work: hive (default) create table liza_1 as select * from new_src; Total MapReduce jobs = 3 Launching Job 1 out of 3 Number of reduce tasks is set to 0 since there's no reduce operator Starting Job = job_201304191025_0006, Tracking URL = http://hd17-vm5:51030/jobdetails.jsp?jobid=job_201304191025_0006
[jira] [Commented] (HIVE-4392) Illogical InvalidObjectException throwed when use mulit aggregate functions with star columns
[ https://issues.apache.org/jira/browse/HIVE-4392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13648069#comment-13648069 ] Navis commented on HIVE-4392: - Added tests. Not changed aggregation columns. Illogical InvalidObjectException throwed when use mulit aggregate functions with star columns -- Key: HIVE-4392 URL: https://issues.apache.org/jira/browse/HIVE-4392 Project: Hive Issue Type: Bug Components: Query Processor Environment: Apache Hadoop 0.20.1 Apache Hive Trunk Reporter: caofangkun Assignee: Navis Priority: Minor Attachments: HIVE-4392.D10431.1.patch, HIVE-4392.D10431.2.patch, HIVE-4392.D10431.3.patch, HIVE-4392.D10431.4.patch, HIVE-4392.D10431.5.patch For Example: hive (default) create table liza_1 as select *, sum(key), sum(value) from new_src; Total MapReduce jobs = 1 Launching Job 1 out of 1 Number of reduce tasks determined at compile time: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=number In order to limit the maximum number of reducers: set hive.exec.reducers.max=number In order to set a constant number of reducers: set mapred.reduce.tasks=number Starting Job = job_201304191025_0003, Tracking URL = http://hd17-vm5:51030/jobdetails.jsp?jobid=job_201304191025_0003 Kill Command = /home/zongren/hadoop-current/bin/../bin/hadoop job -kill job_201304191025_0003 Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 1 2013-04-22 11:09:28,017 Stage-1 map = 0%, reduce = 0% 2013-04-22 11:09:34,054 Stage-1 map = 0%, reduce = 100% 2013-04-22 11:09:37,074 Stage-1 map = 100%, reduce = 100% Ended Job = job_201304191025_0003 Moving data to: hdfs://hd17-vm5:9101/user/zongren/hive/liza_1 FAILED: Error in metadata: InvalidObjectException(message:liza_1 is not a valid object name) FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask MapReduce Jobs Launched: Job 0: Reduce: 1 HDFS Read: 0 HDFS Write: 12 SUCCESS Total 
MapReduce CPU Time Spent: 0 msec hive (default) create table liza_1 as select *, sum(key), sum(value) from new_src group by key, value; Total MapReduce jobs = 1 Launching Job 1 out of 1 Number of reduce tasks not specified. Estimated from input data size: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=number In order to limit the maximum number of reducers: set hive.exec.reducers.max=number In order to set a constant number of reducers: set mapred.reduce.tasks=number Starting Job = job_201304191025_0004, Tracking URL = http://hd17-vm5:51030/jobdetails.jsp?jobid=job_201304191025_0004 Kill Command = /home/zongren/hadoop-current/bin/../bin/hadoop job -kill job_201304191025_0004 Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 1 2013-04-22 11:11:58,945 Stage-1 map = 0%, reduce = 0% 2013-04-22 11:12:01,964 Stage-1 map = 0%, reduce = 100% 2013-04-22 11:12:04,982 Stage-1 map = 100%, reduce = 100% Ended Job = job_201304191025_0004 Moving data to: hdfs://hd17-vm5:9101/user/zongren/hive/liza_1 FAILED: Error in metadata: InvalidObjectException(message:liza_1 is not a valid object name) FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask MapReduce Jobs Launched: Job 0: Reduce: 1 HDFS Read: 0 HDFS Write: 0 SUCCESS Total MapReduce CPU Time Spent: 0 msec But the following two Queries work: hive (default) create table liza_1 as select * from new_src; Total MapReduce jobs = 3 Launching Job 1 out of 3 Number of reduce tasks is set to 0 since there's no reduce operator Starting Job = job_201304191025_0006, Tracking URL = http://hd17-vm5:51030/jobdetails.jsp?jobid=job_201304191025_0006 Kill Command = /home/zongren/hadoop-current/bin/../bin/hadoop job -kill job_201304191025_0006 Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 0 2013-04-22 11:15:00,681 Stage-1 map = 0%, reduce = 0% 2013-04-22 11:15:03,697 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201304191025_0006 Stage-4 is selected by condition resolver. Stage-3 is filtered out by condition resolver. Stage-5 is filtered out by condition resolver. Moving data to: hdfs://hd17-vm5:9101/user/zongren/hive-scratchdir/hive_2013-04-22_11-14-54_632_6709035018023861094/-ext-10001 Moving data to: hdfs://hd17-vm5:9101/user/zongren/hive/liza_1 Table default.liza_1 stats: [num_partitions: 0, num_files: 0, num_rows: 0, total_size: 0, raw_data_size: 0] MapReduce Jobs
[jira] [Updated] (HIVE-4462) Finish support for modulo (%) operator for vectorized arithmetic
[ https://issues.apache.org/jira/browse/HIVE-4462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Hanson updated HIVE-4462: -- Attachment: HIVE-4462.1.patch Finish support for modulo (%) operator for vectorized arithmetic Key: HIVE-4462 URL: https://issues.apache.org/jira/browse/HIVE-4462 Project: Hive Issue Type: Sub-task Reporter: Eric Hanson Assignee: Eric Hanson Attachments: HIVE-4462.1.patch Support for vectorized modulo (%) is missing in CodeGen.java for several situations, e.g. most ColArithmeticScalar situations. This is to add modulo operator for all necessary situations. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
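As a simplified sketch of the kind of column-op-scalar kernel CodeGen.java emits from its templates: the real generated classes (e.g. LongColModuloLongScalar) also handle null bitmaps and selection vectors, which are omitted here:

```java
// Sketch of a generated column-modulo-scalar kernel.
public class ModuloSketch {
    static void longColModuloLongScalar(long[] in, long scalar, long[] out, int n) {
        for (int i = 0; i < n; i++) {
            out[i] = in[i] % scalar;
        }
    }
}
```

Note that Java's % follows the sign of the dividend, so the generated kernels inherit that semantics for negative inputs.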
Review Request: finish support for vectorized Modulo (%) operator
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/10911/ --- Review request for hive. Description --- finish support for vectorized Modulo (%) operator This addresses bug HIVE-4462. https://issues.apache.org/jira/browse/HIVE-4462 Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/gen/DoubleColModuloDoubleColumn.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/gen/DoubleColModuloDoubleScalar.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/gen/DoubleColModuloLongColumn.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/gen/DoubleColModuloLongScalar.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/gen/LongColModuloDoubleColumn.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/gen/LongColModuloDoubleScalar.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/templates/CodeGen.java 9279101 ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/TestVectorScalarColArithmetic.java 7c8b9c3 Diff: https://reviews.apache.org/r/10911/diff/ Testing --- Thanks, Eric Hanson
[jira] [Commented] (HIVE-4462) Finish support for modulo (%) operator for vectorized arithmetic
[ https://issues.apache.org/jira/browse/HIVE-4462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13648083#comment-13648083 ] Eric Hanson commented on HIVE-4462: --- Code review available at https://reviews.apache.org/r/10911/ Finish support for modulo (%) operator for vectorized arithmetic Key: HIVE-4462 URL: https://issues.apache.org/jira/browse/HIVE-4462 Project: Hive Issue Type: Sub-task Reporter: Eric Hanson Assignee: Eric Hanson Attachments: HIVE-4462.1.patch Support for vectorized modulo (%) is missing in CodeGen.java for several situations, e.g. most ColArithmeticScalar situations. This is to add modulo operator for all necessary situations. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4462) Finish support for modulo (%) operator for vectorized arithmetic
[ https://issues.apache.org/jira/browse/HIVE-4462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Hanson updated HIVE-4462: -- Status: Patch Available (was: Open) Finish support for modulo (%) operator for vectorized arithmetic Key: HIVE-4462 URL: https://issues.apache.org/jira/browse/HIVE-4462 Project: Hive Issue Type: Sub-task Reporter: Eric Hanson Assignee: Eric Hanson Attachments: HIVE-4462.1.patch Support for vectorized modulo (%) is missing in CodeGen.java for several situations, e.g. most ColArithmeticScalar situations. This is to add modulo operator for all necessary situations. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4480) Implement partition support for vectorized query execution
[ https://issues.apache.org/jira/browse/HIVE-4480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sarvesh Sakalanaga updated HIVE-4480: - Attachment: Hive-4480.1.patch Implement partition support for vectorized query execution -- Key: HIVE-4480 URL: https://issues.apache.org/jira/browse/HIVE-4480 Project: Hive Issue Type: Sub-task Reporter: Sarvesh Sakalanaga Assignee: Sarvesh Sakalanaga Attachments: Hive-4480.1.patch Add support for eager deserialization of row data using serde in the RecordReader layer. Also add support for partitions in this layer so that the vectorized batch is populated correctly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4480) Implement partition support for vectorized query execution
[ https://issues.apache.org/jira/browse/HIVE-4480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13648105#comment-13648105 ] Sarvesh Sakalanaga commented on HIVE-4480: -- Patch uploaded Implement partition support for vectorized query execution -- Key: HIVE-4480 URL: https://issues.apache.org/jira/browse/HIVE-4480 Project: Hive Issue Type: Sub-task Reporter: Sarvesh Sakalanaga Assignee: Sarvesh Sakalanaga Attachments: Hive-4480.1.patch Add support for eager deserialization of row data using serde in the RecordReader layer. Also add support for partitions in this layer so that the vectorized batch is populated correctly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-4483) Input format to read vector data from RC
Sarvesh Sakalanaga created HIVE-4483: Summary: Input format to read vector data from RC Key: HIVE-4483 URL: https://issues.apache.org/jira/browse/HIVE-4483 Project: Hive Issue Type: Sub-task Reporter: Sarvesh Sakalanaga Assignee: Sarvesh Sakalanaga -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-4484) Current hive is slower than previous versions
Navis created HIVE-4484: --- Summary: Current hive is slower than previous versions Key: HIVE-4484 URL: https://issues.apache.org/jira/browse/HIVE-4484 Project: Hive Issue Type: Task Environment: ubuntu 10.10, 4G, i7-8core Reporter: Navis Comparing logs for various patches, I've found that query execution has become slower than before. For example (picked from unchanged tests): {noformat} ppr_pushdown.q 135~140 sec : 2012-03-27 ~ 2012-07-17 140~160 sec : ~ 2012-11-28 160~220 sec : ~ 2013-03-30 220~250 sec : ~ current (HIVE-4392) join_nulls.q 295~310 sec : 2012-03-27 ~ 2012-07-17 310~330 sec : ~ 2012-11-28 330~370 sec : ~ 2013-03-30 400~460 sec : ~ current (HIVE-4392) {noformat} This explains much of the recently prolonged test times. It might be from changes in the test framework, but it still needs investigation before adding more functionality to hive. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4480) Implement partition support for vectorized query execution
[ https://issues.apache.org/jira/browse/HIVE-4480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sarvesh Sakalanaga updated HIVE-4480: - Status: Patch Available (was: Open) Implement partition support for vectorized query execution -- Key: HIVE-4480 URL: https://issues.apache.org/jira/browse/HIVE-4480 Project: Hive Issue Type: Sub-task Reporter: Sarvesh Sakalanaga Assignee: Sarvesh Sakalanaga Attachments: Hive-4480.1.patch Add support for eager deserialization of row data using serde in the RecordReader layer. Also add support for partitions in this layer so that the vectorized batch is populated correctly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-4485) beeline prints null as empty strings
Thejas M Nair created HIVE-4485: --- Summary: beeline prints null as empty strings Key: HIVE-4485 URL: https://issues.apache.org/jira/browse/HIVE-4485 Project: Hive Issue Type: Bug Reporter: Thejas M Nair Assignee: Thejas M Nair beeline is printing nulls as empty strings. This is inconsistent with the hive cli and other databases, which print null as the string NULL. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
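The requested behavior amounts to a small rendering rule, sketched below under a hypothetical method name (not beeline's actual code): render SQL NULL as the literal string "NULL" rather than an empty string, matching the Hive CLI.

```java
// Sketch: null-aware cell rendering for a CLI result table.
public class NullRenderSketch {
    static String render(Object value) {
        return value == null ? "NULL" : value.toString();
    }
}
```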
[jira] [Updated] (HIVE-4485) beeline prints null as empty strings
[ https://issues.apache.org/jira/browse/HIVE-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-4485: Component/s: HiveServer2 beeline prints null as empty strings Key: HIVE-4485 URL: https://issues.apache.org/jira/browse/HIVE-4485 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Thejas M Nair Assignee: Thejas M Nair beeline is printing nulls as empty strings. This is inconsistent with the hive cli and other databases, which print null as the string NULL. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4377) Add more comment to https://reviews.facebook.net/D1209 (HIVE-2340)
[ https://issues.apache.org/jira/browse/HIVE-4377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HIVE-4377: -- Attachment: HIVE-4377.D10377.2.patch navis updated the revision HIVE-4377 [jira] Add more comment to https://reviews.facebook.net/D1209 (HIVE-2340). Added more comments Reviewers: JIRA REVISION DETAIL https://reviews.facebook.net/D10377 CHANGE SINCE LAST DIFF https://reviews.facebook.net/D10377?vs=32445&id=33291#toc AFFECTED FILES ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java ql/src/test/queries/clientpositive/reduce_deduplicate_extended.q ql/src/test/results/clientpositive/reduce_deduplicate_extended.q.out To: JIRA, navis Cc: njain Add more comment to https://reviews.facebook.net/D1209 (HIVE-2340) -- Key: HIVE-4377 URL: https://issues.apache.org/jira/browse/HIVE-4377 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Gang Tim Liu Assignee: Navis Attachments: HIVE-4377.D10377.1.patch, HIVE-4377.D10377.2.patch thanks a lot for addressing optimization in HIVE-2340. Awesome! Since we are developing at a very fast pace, it would be really useful to think about maintainability and testing of the large codebase. Highlights which are applicable for D1209:
1. Javadoc for all public/private functions, except for setters/getters. For any complex function, clear examples (input/output) would really help.
2. Especially for query optimizations, it might be a good idea to have a simple working query at the top and the expected changes, e.g. the operator tree for that query at each step, or a detailed explanation at the top.
3. If possible, the test name (.q file) where the function is being invoked, or the query which would potentially test that scenario, if it is a query processor change.
4. Comments in each test (.q file) that include the jira number, what it is trying to test, and assumptions about each query.
5. Reduce the output for each test: whenever a query outputs more than 10 results, there should be a reason. Otherwise, each query result should be bounded by 10 rows.
thanks a lot -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira