[jira] [Commented] (HIVE-3959) Update Partition Statistics in Metastore Layer
[ https://issues.apache.org/jira/browse/HIVE-3959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13811172#comment-13811172 ]

Hive QA commented on HIVE-3959:
-------------------------------

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12611523/HIVE-3959.6.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 4547 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucketmapjoin7
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_parallel_orderby
{noformat}

Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/104/testReport
Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/104/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests failed with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

Update Partition Statistics in Metastore Layer
----------------------------------------------

Key: HIVE-3959
URL: https://issues.apache.org/jira/browse/HIVE-3959
Project: Hive
Issue Type: Improvement
Components: Metastore, Statistics
Reporter: Bhushan Mandhani
Assignee: Ashutosh Chauhan
Priority: Minor
Attachments: HIVE-3959.1.patch, HIVE-3959.2.patch, HIVE-3959.3.patch, HIVE-3959.3.patch, HIVE-3959.4.patch, HIVE-3959.4.patch, HIVE-3959.5.patch, HIVE-3959.6.patch, HIVE-3959.patch.1, HIVE-3959.patch.11.txt, HIVE-3959.patch.12.txt, HIVE-3959.patch.2

When partitions are created using queries (insert overwrite and insert into), the StatsTask updates all stats. However, when partitions are added directly through metadata-only operations (either the CLI or direct calls to the Thrift Metastore), no stats are populated even if hive.stats.reliable is set to true. This puts us in a situation where we can't decide whether stats are truly reliable or not.

We propose that the fast stats (numFiles and totalSize), which don't require a scan of the data, should always be populated and be completely reliable. For now we are still excluding rowCount and rawDataSize because computing them would make these operations very expensive. Currently they are quick metadata-only ops.

--
This message was sent by Atlassian JIRA (v6.1#6144)
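The distinction the description draws between fast stats and scan-based stats can be illustrated with a minimal sketch. It assumes a local directory stands in for a partition's storage location; `fast_stats` is a hypothetical helper written for illustration and is not taken from Hive or from any of the attached patches. It derives numFiles and totalSize from file metadata alone, without opening any file.

```python
import os


def fast_stats(partition_dir):
    """Compute numFiles and totalSize for one partition directory.

    Only the directory listing and per-file size metadata are used;
    file contents are never read. That is why these two values are
    cheap, whereas rowCount and rawDataSize would need a data scan.
    """
    num_files = 0
    total_size = 0
    for root, _dirs, files in os.walk(partition_dir):
        for name in files:
            num_files += 1
            # getsize reads the size from file metadata, not the data.
            total_size += os.path.getsize(os.path.join(root, name))
    return {"numFiles": num_files, "totalSize": total_size}
```

A rowCount equivalent would have to open and parse every file, which is the cost the proposal avoids for metadata-only operations.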
[jira] [Commented] (HIVE-3959) Update Partition Statistics in Metastore Layer
[ https://issues.apache.org/jira/browse/HIVE-3959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13811189#comment-13811189 ]

Hive QA commented on HIVE-3959:
-------------------------------

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12611523/HIVE-3959.6.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 4547 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucketmapjoin7
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_parallel_orderby
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_smb_mapjoin_8
{noformat}

Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/105/testReport
Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/105/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests failed with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.
[jira] [Commented] (HIVE-3959) Update Partition Statistics in Metastore Layer
[ https://issues.apache.org/jira/browse/HIVE-3959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13810730#comment-13810730 ]

Thejas M Nair commented on HIVE-3959:
-------------------------------------

Can you please rebase the patch (as the HIVE-5610 changes are in)?

Update Partition Statistics in Metastore Layer
----------------------------------------------

Key: HIVE-3959
URL: https://issues.apache.org/jira/browse/HIVE-3959
Project: Hive
Issue Type: Improvement
Components: Metastore, Statistics
Reporter: Bhushan Mandhani
Assignee: Ashutosh Chauhan
Priority: Minor
Attachments: HIVE-3959.1.patch, HIVE-3959.2.patch, HIVE-3959.3.patch, HIVE-3959.3.patch, HIVE-3959.4.patch, HIVE-3959.4.patch, HIVE-3959.patch.1, HIVE-3959.patch.11.txt, HIVE-3959.patch.12.txt, HIVE-3959.patch.2
[jira] [Commented] (HIVE-3959) Update Partition Statistics in Metastore Layer
[ https://issues.apache.org/jira/browse/HIVE-3959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13810886#comment-13810886 ]

Thejas M Nair commented on HIVE-3959:
-------------------------------------

+1
[jira] [Commented] (HIVE-3959) Update Partition Statistics in Metastore Layer
[ https://issues.apache.org/jira/browse/HIVE-3959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13811015#comment-13811015 ]

Hive QA commented on HIVE-3959:
-------------------------------

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12611523/HIVE-3959.6.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 4547 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucket_num_reducers
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucketmapjoin7
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_parallel_orderby
{noformat}

Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/95/testReport
Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/95/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests failed with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.
[jira] [Commented] (HIVE-3959) Update Partition Statistics in Metastore Layer
[ https://issues.apache.org/jira/browse/HIVE-3959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809854#comment-13809854 ]

Thejas M Nair commented on HIVE-3959:
-------------------------------------

updated review board with comments

Update Partition Statistics in Metastore Layer
----------------------------------------------

Key: HIVE-3959
URL: https://issues.apache.org/jira/browse/HIVE-3959
Project: Hive
Issue Type: Improvement
Components: Metastore, Statistics
Reporter: Bhushan Mandhani
Assignee: Ashutosh Chauhan
Priority: Minor
Attachments: HIVE-3959.1.patch, HIVE-3959.2.patch, HIVE-3959.3.patch, HIVE-3959.4.patch, HIVE-3959.patch.1, HIVE-3959.patch.11.txt, HIVE-3959.patch.12.txt, HIVE-3959.patch.2
[jira] [Commented] (HIVE-3959) Update Partition Statistics in Metastore Layer
[ https://issues.apache.org/jira/browse/HIVE-3959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13805775#comment-13805775 ]

Hive QA commented on HIVE-3959:
-------------------------------

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12610142/HIVE-3959.4.patch

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 4481 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_create_like_view
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_star
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_pcr
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_union_view
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_show_create_table_serde
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1237/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1237/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests failed with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

Update Partition Statistics in Metastore Layer
----------------------------------------------

Key: HIVE-3959
URL: https://issues.apache.org/jira/browse/HIVE-3959
Project: Hive
Issue Type: Improvement
Components: Metastore, Statistics
Reporter: Bhushan Mandhani
Assignee: Ashutosh Chauhan
Priority: Minor
Attachments: HIVE-3959.1.patch, HIVE-3959.2.patch, HIVE-3959.3.patch, HIVE-3959.4.patch, HIVE-3959.patch.1, HIVE-3959.patch.11.txt, HIVE-3959.patch.12.txt, HIVE-3959.patch.2
[jira] [Commented] (HIVE-3959) Update Partition Statistics in Metastore Layer
[ https://issues.apache.org/jira/browse/HIVE-3959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13803456#comment-13803456 ]

Ashutosh Chauhan commented on HIVE-3959:
----------------------------------------

[~gangtimliu] Would you like to review the rebased patch?

Update Partition Statistics in Metastore Layer
----------------------------------------------

Key: HIVE-3959
URL: https://issues.apache.org/jira/browse/HIVE-3959
Project: Hive
Issue Type: Improvement
Components: Metastore, Statistics
Reporter: Bhushan Mandhani
Assignee: Ashutosh Chauhan
Priority: Minor
Attachments: HIVE-3959.1.patch, HIVE-3959.patch.1, HIVE-3959.patch.11.txt, HIVE-3959.patch.12.txt, HIVE-3959.patch.2
[jira] [Commented] (HIVE-3959) Update Partition Statistics in Metastore Layer
[ https://issues.apache.org/jira/browse/HIVE-3959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13781438#comment-13781438 ]

Gang Tim Liu commented on HIVE-3959:
------------------------------------

Yes, assign it to Dilip.

Update Partition Statistics in Metastore Layer
----------------------------------------------

Key: HIVE-3959
URL: https://issues.apache.org/jira/browse/HIVE-3959
Project: Hive
Issue Type: Improvement
Components: Metastore, Statistics
Reporter: Bhushan Mandhani
Assignee: Dilip Joseph
Priority: Minor
Attachments: HIVE-3959.patch.1, HIVE-3959.patch.11.txt, HIVE-3959.patch.12.txt, HIVE-3959.patch.2
[jira] [Commented] (HIVE-3959) Update Partition Statistics in Metastore Layer
[ https://issues.apache.org/jira/browse/HIVE-3959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13780918#comment-13780918 ]

Ashutosh Chauhan commented on HIVE-3959:
----------------------------------------

This is useful work. [~bmadhvani] / [~gangtimliu] Do you guys want to refresh this patch?

Update Partition Statistics in Metastore Layer
----------------------------------------------

Key: HIVE-3959
URL: https://issues.apache.org/jira/browse/HIVE-3959
Project: Hive
Issue Type: Improvement
Components: Metastore, Statistics
Reporter: Bhushan Mandhani
Assignee: Gang Tim Liu
Priority: Minor
Attachments: HIVE-3959.patch.1, HIVE-3959.patch.11.txt, HIVE-3959.patch.12.txt, HIVE-3959.patch.2

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3959) Update Partition Statistics in Metastore Layer
[ https://issues.apache.org/jira/browse/HIVE-3959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620364#comment-13620364 ]

Gang Tim Liu commented on HIVE-3959:
------------------------------------

rebase https://reviews.facebook.net/D9885

Update Partition Statistics in Metastore Layer
----------------------------------------------

Key: HIVE-3959
URL: https://issues.apache.org/jira/browse/HIVE-3959
Project: Hive
Issue Type: Improvement
Components: Metastore, Statistics
Reporter: Bhushan Mandhani
Assignee: Gang Tim Liu
Priority: Minor
[jira] [Commented] (HIVE-3959) Update Partition Statistics in Metastore Layer
[ https://issues.apache.org/jira/browse/HIVE-3959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13570910#comment-13570910 ]

Bhushan Mandhani commented on HIVE-3959:
----------------------------------------

Diff out at https://reviews.facebook.net/D8271
Still need some minor updates before I can submit the patch.

Update Partition Statistics in Metastore Layer
----------------------------------------------

Key: HIVE-3959
URL: https://issues.apache.org/jira/browse/HIVE-3959
Project: Hive
Issue Type: Improvement
Components: Metastore, Statistics
Reporter: Bhushan Mandhani
Assignee: Bhushan Mandhani
Priority: Minor