Re: [ANNOUNCE] Apache Hive 4.0.0 Released

2024-04-04 Thread Sungwoo Park
Congratulations and huge thanks to Apache Hive team and contributors for releasing Hive 4. We have been watching the development of Hive 4 since the release of Hive 3.1, and it's truly satisfying to witness the resolution of all the critical issues at last after 5 years. Hive 4 comes with a lot of

Re: Release of Hive 4 and TPC-DS benchmark

2023-11-07 Thread Sungwoo Park
> > Based on HIVE-26654, it looks like we have 3 PR pending review: > 1. HIVE-26986 - Query 71 > 2. HIVE-27006 - Query 2 > 3. HIVE-27269 - Query 97 (is that ready to be reviewed?) > Yes, Seonggon just submitted a pull request for HIVE-27269. It is not the simple fix that I originally proposed - it i

Release of Hive 4 and TPC-DS benchmark

2023-11-03 Thread Sungwoo Park
Hi everyone, I would like to resume the discussion on the release of Hive 4 and the result of the TPC-DS benchmark. Currently there are four unresolved JIRAs marked 'hive-4.0.0-must' which must be resolved before the release of Hive 4 ([1], [2], [3], [4]). The most urgent one is perhaps HIVE

Re: Introduce Uniffle : A stability solution of Hive's shuffle

2023-09-29 Thread Sungwoo Park
In addition to the two main benefits summarized by Rory, I would like to add another benefit of using remote shuffle service: 3. If you run large jobs in public clouds, sometimes the amount of local storage attached to your instances can be a limiting factor. By using remote shuffle service, you c

Re: Move to JDK-11

2023-05-31 Thread Sungwoo Park
Hi, everyone. I have not tested the master branch with Java 11/17 yet, but I would like to share my experience with testing a fork of branch-3.1 with Java 11/17 (as part of developing Hive-MR3), in case it is useful for the discussion. I merged the patches listed in [1] HIVE-22415 and upd

Re: Question on hive.merge.nway.joins

2023-05-26 Thread Sungwoo Park
optimization can miss a chance. Of course, I know > it can also positively work in some cases. > > Note that the version I used is a bit old, my memory could be wrong, and > again I am not sure about the concrete background of HIVE-21189. > > Thanks, > Okumin > > > On

Question on hive.merge.nway.joins

2023-05-25 Thread Sungwoo Park
Hello, In HIVE-21189 [1], the default value for hive.merge.nway.joins is set to false. There is no record of why it was set to false, and I would like to understand the background for the decision. Specifically I wonder if the following situation is relevant to the decision. Example) MapJoinOp_1

Re: [DISCUSS] Nightly snapshot builds

2023-05-22 Thread Sungwoo Park
I think such nightly builds will be useful for testing and debugging in the future. I also wonder if we can somehow create builds even from previous commits (e.g., for the past few years). Such builds from previous commits don't have to be daily builds, and I think weekly builds (or even monthly b

Re: Request to join Hive slack channel

2023-05-18 Thread Sungwoo Park
I am sorry for spamming -- My email address is: glap...@gmail.com Thanks, --- Sungwoo Park On Fri, May 19, 2023 at 3:11 PM Sungwoo Park wrote: > If non-committers can join the slack channel, I would like to join, too. > An invitation will be appreciated very much (glapa...@gma

Re: Request to join Hive slack channel

2023-05-18 Thread Sungwoo Park
If non-committers can join the slack channel, I would like to join, too. An invitation will be appreciated very much (glapa...@gmail.com). Thanks, --- Sungwoo Park On Fri, May 19, 2023 at 2:49 PM Butao Zhang wrote: > Hi, Hive dev > > > I just saw this updated page: > https://c

Re: Can we get someone to review the PR for HIVE-24915?

2023-05-12 Thread Sungwoo Park
Hi, HIVE-25170 fixes the same bug as in your pull request. Thanks, --- Sungwoo On Fri, May 12, 2023 at 4:04 PM Suprith Chandrashekharachar < suprith.chandrashekharac...@treasure-data.com> wrote: > Hi, > > I opened this ticket about 2 years ago hoping to get a review. I didn't > hear any feedba

Re: Introducing a DI framework in Hive?

2023-04-13 Thread Sungwoo Park
I would like to add another question to the list of Laszlo. 4) When a specific DI framework is chosen, what kinds of new dependencies will be introduced? (Would they conflict with existing dependencies of Hive?) Regards, --- Sungwoo Park On Thu, Apr 13, 2023 at 4:43 PM László Bodor wrote

Re: Introducing a DI framework in Hive?

2023-04-13 Thread Sungwoo Park
Hi Stamatis, For the correctness issue, we wanted to solve the problem ourselves and have made a few pull requests in [1] so far. (We would like to kindly request Hive committers to review the pull requests.) For HIVE-27226, we are working on a solution and will create a pull request when a solu

Re: Introducing a DI framework in Hive?

2023-04-12 Thread Sungwoo Park
LLAP are the new execution engines, these tests should be migrated as well. Sungwoo Park [1] https://issues.apache.org/jira/browse/HIVE-26654 [2] https://issues.apache.org/jira/browse/HIVE-27226 On Wed, Apr 12, 2023 at 10:12 PM Stamatis Zampetakis wrote: > Hey Laszlo, > > Dependency

Re: [DISCUSS] Move Jira notification emails out of dev@hive

2023-03-26 Thread Sungwoo Park
I like the proposal very much. (Then, hopefully this mailing list will be useful to outside contributors as well.) --- Sungwoo Park On Sat, 25 Mar 2023, Stamatis Zampetakis wrote: Hi everyone, In the last Hive board report someone mentioned that the volume of Jira notification emails to the

Re: [DISCUSS] HIVE 4.0 GA Release Proposal

2023-03-22 Thread Sungwoo Park
Sungwoo Park wrote: Hello, I would like to expand the list of blockers with HIVE-27138 [1] which fixes NPE on mapjoin_filter_on_outerjoin.q. Currently mapjoin_filter_on_outerjoin.q is tested with MapReduce execution engine and shows no problem. However, it shows a few problems when tested with

Re: [DISCUSS] HIVE 4.0 GA Release Proposal

2023-03-14 Thread Sungwoo Park
Hello, I would like to expand the list of blockers with HIVE-27138 [1] which fixes NPE on mapjoin_filter_on_outerjoin.q. Currently mapjoin_filter_on_outerjoin.q is tested with MapReduce execution engine and shows no problem. However, it shows a few problems when tested with Tez execution eng

[jira] [Created] (HIVE-27134) SharedWorkOptimizer merges TableScan operators that have different DPP parents

2023-03-11 Thread Sungwoo Park (Jira)
Sungwoo Park created HIVE-27134: --- Summary: SharedWorkOptimizer merges TableScan operators that have different DPP parents Key: HIVE-27134 URL: https://issues.apache.org/jira/browse/HIVE-27134 Project

Re: Asking for code review: HIVE-26968, HIVE-26986, HIVE-27006

2023-02-14 Thread Sungwoo Park
ey get reviewed. Best regards, Alessandro On Tue, 14 Feb 2023 at 15:06, Sungwoo Park wrote: Seonggon created three JIRAs a while ago which affect the result of TPC-DS queries, and I wonder if anyone would have time for reviewing the pull requests. HIVE-26968: SharedWorkOptimizer merges TableScan

Asking for code review: HIVE-26968, HIVE-26986, HIVE-27006

2023-02-14 Thread Sungwoo Park
Hive 4.0.0, it does not seem like a good plan to release Hive 4.0.0 that fails on some TPC-DS queries. Thanks! Sungwoo Park

[jira] [Created] (HIVE-27082) AggregateStatsCache.findBestMatch() in Metastore should test the inclusion of default partition name

2023-02-14 Thread Sungwoo Park (Jira)
Sungwoo Park created HIVE-27082: --- Summary: AggregateStatsCache.findBestMatch() in Metastore should test the inclusion of default partition name Key: HIVE-27082 URL: https://issues.apache.org/jira/browse/HIVE-27082

Re: Result of the TPC-DS benchmark using Iceberg

2022-11-28 Thread Sungwoo Park
results for query 64. Because of several bugs in shared work optimization (and parallel edge fixer), it might make sense to set the default value of hive.optimize.shared.work to false in HiveConf.java. --- Sungwoo On Fri, 18 Nov 2022, Sungwoo Park wrote: Hello Stamatis, We use a recent or

Re: Result of the TPC-DS benchmark using Iceberg

2022-11-18 Thread Sungwoo Park
your findings; interesting observations. If you can please also share the project versions that you used for running the experiments. Best, Stamatis On Tue, Nov 15, 2022 at 12:46 PM Sungwoo Park wrote: Hello, I ran the TPC-DS benchmark using Metastore (in the traditional way) and Iceberg, and

Result of the TPC-DS benchmark using Iceberg

2022-11-15 Thread Sungwoo Park
Hello, I ran the TPC-DS benchmark using Metastore (in the traditional way) and Iceberg, and would like to share the result for those interested in Hive using Iceberg. The experiment used 1TB TPC-DS dataset stored as ORC. Here are a few findings. 1. Overall, Hive-Iceberg runs slightly faster

[jira] [Created] (HIVE-26732) Iceberg uses "null" and does not use the configuration key "hive.exec.default.partition.name" for default partitions.

2022-11-13 Thread Sungwoo Park (Jira)
Sungwoo Park created HIVE-26732: --- Summary: Iceberg uses "null" and does not use the configuration key "hive.exec.default.partition.name" for default partitions. Key: HIVE-26732 URL: https://is

[jira] [Created] (HIVE-26668) Upgrade ORC version to 1.6.11

2022-10-25 Thread Sungwoo Park (Jira)
Sungwoo Park created HIVE-26668: --- Summary: Upgrade ORC version to 1.6.11 Key: HIVE-26668 URL: https://issues.apache.org/jira/browse/HIVE-26668 Project: Hive Issue Type: Bug

[jira] [Created] (HIVE-26660) TPC-DS query 71 returns wrong results

2022-10-22 Thread Sungwoo Park (Jira)
Sungwoo Park created HIVE-26660: --- Summary: TPC-DS query 71 returns wrong results Key: HIVE-26660 URL: https://issues.apache.org/jira/browse/HIVE-26660 Project: Hive Issue Type: Bug

[jira] [Created] (HIVE-26659) TPC-DS query 16, 69, 94 return wrong results.

2022-10-22 Thread Sungwoo Park (Jira)
Sungwoo Park created HIVE-26659: --- Summary: TPC-DS query 16, 69, 94 return wrong results. Key: HIVE-26659 URL: https://issues.apache.org/jira/browse/HIVE-26659 Project: Hive Issue Type: Bug

[jira] [Created] (HIVE-26655) TPC-DS query 17 returns wrong results

2022-10-20 Thread Sungwoo Park (Jira)
Sungwoo Park created HIVE-26655: --- Summary: TPC-DS query 17 returns wrong results Key: HIVE-26655 URL: https://issues.apache.org/jira/browse/HIVE-26655 Project: Hive Issue Type: Bug

[jira] [Created] (HIVE-26654) Test with the TPC-DS benchmark

2022-10-20 Thread Sungwoo Park (Jira)
Sungwoo Park created HIVE-26654: --- Summary: Test with the TPC-DS benchmark Key: HIVE-26654 URL: https://issues.apache.org/jira/browse/HIVE-26654 Project: Hive Issue Type: Bug Affects

Re: Start releasing the master branch

2022-03-01 Thread Sungwoo Park
milar results have been reproduced by the Hive team, in order to make sure that we did not make errors in our tests. If it is okay to open a JIRA ticket that only reports failures in the TPC-DS test, we could also perform git bisect to locate the commit that began to generate wrong results. --- Sungwoo

Re: Result of the TPC-DS benchmark on Hive master branch

2020-11-17 Thread Sungwoo Park
> > > 1. With hive.optimize.shared.work.dppunion=true, queries 2 and 59 fail. > Please see the attachment for stack traces. > > Even though the exception seems to be a recurrence of the previous issue - > existing checks + HIVE-24360 should have restricted all incorrect cases. > I built in some debug

Re: Result of the TPC-DS benchmark on Hive master branch

2020-11-13 Thread Sungwoo Park
have automated the entire experiment, so if you would like to see the result of testing a new commit, I would be happy to rerun the experiment and get back to you.) Cheers, --- Sungwoo On Thu, Nov 12, 2020 at 10:49 PM Zoltan Haindrich wrote: > Hey Sungwoo! > > On 11/12/20 10:23 AM, S

Re: Result of the TPC-DS benchmark on Hive master branch

2020-11-12 Thread Sungwoo Park
Hi Zoltan, I used the same hive-site.xml for the previous test (which was okay) and the new test (which failed), so my guess is that it is perhaps due to a commit since the previous test. Let me try later to identify the commit that fails query 14, with the hope that identifying such a commit migh

Re: Result of the TPC-DS benchmark on Hive master branch

2020-11-05 Thread Sungwoo Park
Hi Stamatis, Mustafa, Zoltán, This is the result of a new experiment. These are the changes that I made: 1. Reverted HIVE-24139. (It turns out that HIVE-24139 does not affect the result of the TPC-DS benchmark.) 2. Set hive.optimize.shared.work.dppunion to false in hive-site.xml. 3. Set tez.runt

Result of the TPC-DS benchmark on Hive master branch

2020-11-04 Thread Sungwoo Park
Hello, I have tested a recent commit of the master branch using the TPC-DS benchmark. I used Hive on Tez (not Hive-LLAP). The way I tested is: 1) create a database consisting of external tables from a 100GB TPC-DS text dataset 2) create a database consisting of ORC tables from the previous databa