when enable reducededuplication, count(distinct)+group by very slow

2023-12-19 Thread lisoda
Hi team. I found that when I enable reducededuplication, count(distinct)+GroupBy becomes very slow. Is there a problem with reducededuplication? test query info: | CONFIG | SQL | TIME | | hive.optimize.reducededuplication=true | select count(1) from(select uni_shop_id,partner,count(distinct
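
One way to isolate the effect reported here, as a rough sketch only: run the same query twice in one session, once with each value of the setting, and compare the elapsed times. The property name comes from the message; the query is truncated in the archive, so it is left as a placeholder comment rather than guessed.

```sql
-- Run 1: optimization enabled (the slow case described in the message)
set hive.optimize.reducededuplication=true;
-- ... run the count(distinct) + group by query here ...

-- Run 2: optimization disabled, for comparison
set hive.optimize.reducededuplication=false;
-- ... run the same query again and compare the elapsed time ...
```

Comparing the EXPLAIN output of both runs may also show where the plans differ.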

Re: Help with Docker Apache/Hive metastore using mysql remote database

2023-12-18 Thread Simhadri G
We can modify the Dockerfile to wget the necessary driver and copy it to /opt/hive/lib/ . This should make it work. The diff is attached below: diff --git a/packaging/src/docker/Dockerfile b/packaging/src/docker/Dockerfile --- a/packaging/src/docker/Dockerfile (revision

Re: Help with Docker Apache/Hive metastore using mysql remote database

2023-12-17 Thread Ayush Saxena
I think a similar problem is being chased as part of https://github.com/apache/hive/pull/4948 On Mon, 18 Dec 2023 at 09:48, Sanjay Gupta wrote: > > > > > Issue with Docker container using mysql RDBMS ( Failed to load driver) > > https://hub.docker.com/r/apache/hive > > According to readme > >

Fwd: Help with Docker Apache/Hive metastore using mysql remote database

2023-12-17 Thread Sanjay Gupta
Issue with Docker container using mysql RDBMS ( Failed to load driver) https://hub.docker.com/r/apache/hive According to readme Launch Standalone Metastore With External RDBMS (Postgres/Oracle/MySql/MsSql) I want to use MySQL I tried com.mysql.jdbc.Driver or com.mysql.cj.jdbc.Driver docker

Re: MR3 1.8 released

2023-12-15 Thread Sungwoo Park
For Chinese users, MR3 1.8 is now shipped in HiDataPlus (along with Celeborn). https://mp.weixin.qq.com/s/65bgrnFpXtORlb4FjlPMWA --- Sungwoo On Sat, Dec 9, 2023 at 9:08 PM Sungwoo Park wrote: > MR3 1.8 released > > On behalf of the MR3 team, I am pleased to announce the release of MR3 1.8. >

MR3 1.8 released

2023-12-09 Thread Sungwoo Park
MR3 1.8 released On behalf of the MR3 team, I am pleased to announce the release of MR3 1.8. MR3 is an execution engine similar in spirit to MapReduce and Tez which has been under development since 2015. Its main application is Hive on MR3. You can run Hive on MR3 on Hadoop, on Kubernetes, in

A deadLock problem

2023-12-07 Thread lisoda
Hi Team. [HIVE-27944] When HIVE-LLAP reads the ICEBERG table, a deadlock may occur. - ASF JIRA (apache.org) I submitted this ISSUE, if anyone can help me I would appreciate it. Tks.

Re:Re: hive can not read iceberg-parquet table

2023-11-22 Thread lisoda
Hi. Following your suggestion, I created three issues: [HIVE-27901] Hive's performance for querying the Iceberg table is very poor. - ASF JIRA (apache.org) [HIVE-27900] hive can not read iceberg-parquet table - ASF JIRA (apache.org) [HIVE-27898] HIVE4 can't use ICEBERG table in subqueries - ASF

Re:Re: hive can not read iceberg-parquet table

2023-11-21 Thread lisoda
Sorry, I don't have an account with jira at the moment. I was rejected by the administrator when I applied for an account earlier. He thought that such issues could be discussed in an email. I'll try to apply for an account again.

Re:Re: hive can not read iceberg-parquet table

2023-11-21 Thread lisoda
1. TEZ_VERSION is 0.10.3 SNAPSHOT. 2. The Iceberg table is a COW table; inserting a small amount of data produces the same error. 3. Using ORC-Iceberg works. 4. Disabling vectorization and using Parquet works.

Re: hive can not read iceberg-parquet table

2023-11-21 Thread Butao Zhang
Hi lisoda, Thank you for trying the Hive4-beta and reporting this issue. Based on the information you have provided so far, I cannot reproduce this issue. Could you please give more clues? e.g. 1) Which Tez version are you using? Hive4-beta uses Tez 0.10.2 by default.

hive can not read iceberg-parquet table

2023-11-21 Thread lisoda
Hi team. I am currently testing HIVE-4.0.0-BETA. For better read performance, we use the Iceberg-Parquet table. However, we have found that HIVE is currently unable to handle iceberg-parquet tables correctly. Example: CREATE EXTERNAL TABLE iceberg_dwd.b_qqd_shop_rfm_parquet_snappy STORED BY

Re: [ANNOUNCE] New committer: Butao Zhang (zhangbutao)

2023-11-21 Thread Sai Hemanth Gantasala
Congratulations Butao Zhang, Very well deserved. Your contributions to the Data connector feature are very impressive and much appreciated. Looking forward to much more!! Thanks, Sai. On Tue, Nov 21, 2023 at 1:06 PM Stamatis Zampetakis wrote: > Congratulations Butao, well deserved! Very glad

Re: [ANNOUNCE] New committer: Butao Zhang (zhangbutao)

2023-11-21 Thread Stamatis Zampetakis
Congratulations Butao, well deserved! Very glad to see another Iceberg expert joining the team. Best, Stamatis On Tue, Nov 21, 2023, 4:47 PM Butao Zhang wrote: > Thank you to the Hive community for this honor. I will continue to > contribute to the community with my efforts. > Thanks all! > >

Re: [ANNOUNCE] New committer: Butao Zhang (zhangbutao)

2023-11-21 Thread Butao Zhang
Thank you to the Hive community for this honor. I will continue to contribute to the community with my efforts. Thanks all! Thanks, Butao Zhang Replied Message | From | Ayush Saxena | | Date | 11/21/2023 15:02 | | To | dev , , Butao Zhang | | Subject | [ANNOUNCE] New committer:

[ANNOUNCE] New committer: Butao Zhang (zhangbutao)

2023-11-20 Thread Ayush Saxena
Hi All, Apache Hive's Project Management Committee (PMC) has invited Butao Zhang to become a committer, and we are pleased to announce that he has accepted. Butao Zhang welcome, thank you for your contributions, and we look forward to your further interactions with the community! Ayush Saxena

[no subject]

2023-11-20 Thread Rajbir singh
user-unsubscribe -- Regards, Rajbir

Re: [EXTERNAL] Re: Slow Hive query with a lot of 'get_materialized_views_for_rewriting'

2023-11-17 Thread Krisztian Kasa
Hi Eugene, Hive has a feature called automatic query rewrite [1]. This feature needs up-to-date information about the materialized views available. [2] The feature can be disabled with the setting: hive.materializedview.rewriting [3] Hope this helps. regards, Krisztian [1]
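
As a minimal sketch of what this message describes, assuming the property is set per session (it could equally be set in the site configuration): switching the boolean property off stops the planner from considering materialized views for rewriting.

```sql
-- Disable automatic query rewriting with materialized views for this session.
set hive.materializedview.rewriting=false;
```

Whether this removes the repeated metastore calls reported in the thread depends on the Hive version; the thread itself points to HIVE-21631 for the underlying fix.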

Re: [EXTERNAL] Re: Slow Hive query with a lot of 'get_materialized_views_for_rewriting'

2023-11-16 Thread Butao Zhang
Sorry, I'm not sure of the final released time, but I think it will be soon. :) Maybe some other folks of Hive community know more about the GA release. Thanks, Butao Zhang Replied Message | From | lisoda | | Date | 11/17/2023 12:31 | | To | user | | Subject | Re: [EXTERNAL] Re:

Re: Question on Hive Metastore catalog support

2023-11-16 Thread Butao Zhang
Hi, maybe you can check this ticket https://issues.apache.org/jira/browse/HIVE-26227 Thanks, Butao Zhang Replied Message | From | Flavio Junqueira | | Date | 11/15/2023 17:26 | | To | | | Subject | Question on Hive Metastore catalog support | Hello there, I'm interested in

Re: [EXTERNAL] Re: Slow Hive query with a lot of 'get_materialized_views_for_rewriting'

2023-11-16 Thread lisoda
May I ask when hive4 can be released? Replied Message | From | Butao Zhang | | Date | 11/17/2023 12:24 | | To | user@hive.apache.org | | Cc | | | Subject | Re: [EXTERNAL] Re: Slow Hive query with a lot of 'get_materialized_views_for_rewriting' | Thanks for the info. I checked

Re: [EXTERNAL] Re: Slow Hive query with a lot of 'get_materialized_views_for_rewriting'

2023-11-16 Thread Butao Zhang
Thanks for the info. I checked Hive 3.1.3, and there are performance issues when HS2 invokes the method get_materialized_views_for_rewriting. You can refer to this ticket https://issues.apache.org/jira/browse/HIVE-21631 which was fixed in Hive4. And if you do not need mv ability, here is a

Re: [EXTERNAL] Re: Slow Hive query with a lot of 'get_materialized_views_for_rewriting'

2023-11-16 Thread Eugene Miretsky
Hey! Hive version is 3.1.3 On Wed, Nov 15, 2023 at 9:23 PM Butao Zhang wrote: > Hi, which version of hms are you using now? I have checked the master > branch and beta-1 branch source code, but I can't find the place where > this method *get_materialized_views_for_rewriting* is called by

Re: Slow Hive query with a lot of 'get_materialized_views_for_rewriting'

2023-11-15 Thread Butao Zhang
Hi, which version of hms are you using now? I have checked the master branch and beta-1 branch source code, but I can't find the place where this method get_materialized_views_for_rewriting is called by mistake. Thanks, Butao Zhang Replied Message | From | Eugene Miretsky | |

Slow Hive query with a lot of 'get_materialized_views_for_rewriting'

2023-11-15 Thread Eugene Miretsky
Hey! We have a catalog with quite a lot of databases and tables. When we run a simple query (select * from table limit 5;) on an idle cluster, it takes around 20 seconds, sometimes longer (usually the first run takes 40s+). Looking at the hive-metastore logs during most of the query time the logs

Question on Hive Metastore catalog support

2023-11-15 Thread Flavio Junqueira
Hello there, I'm interested in understanding the Hive Metastore catalog support. I see references in the metastore code to catalogs, for example:

Re: Re: Hive's performance for querying the Iceberg table is very poor.

2023-11-10 Thread Simhadri G
Please ensure hive.stats.autogather is enabled as well. On Fri, Nov 10, 2023, 2:57 PM Denys Kuzmenko wrote: > `hive.iceberg.stats.source` controls where the stats should be sourced > from. When it's set to iceberg (default), we should go directly to iceberg > and bypass HMS. >

Re: Re: Hive's performance for querying the Iceberg table is very poor.

2023-11-10 Thread Denys Kuzmenko
`hive.iceberg.stats.source` controls where the stats should be sourced from. When it's set to iceberg (default), we should go directly to iceberg and bypass HMS.

Re: Re: Hive's performance for querying the Iceberg table is very poor.

2023-11-09 Thread Butao Zhang
Can you please check this property? We need to ensure it is true. set hive.compute.query.using.stats=true; In addition, it looks like the table created by Spark has lots of data. Can you create a new table, insert several values into it using Spark, and then create & count(*) this
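
The three settings mentioned across this thread can be collected into one session preamble. This is only a sketch: the property names and the `iceberg` value come from the messages, and whether count(*) is then answered from statistics still depends on the table (the thread notes it does not apply when delete files are present).

```sql
-- Answer simple aggregates such as count(*) from stored statistics where possible.
set hive.compute.query.using.stats=true;
-- Gather basic statistics automatically when data is written.
set hive.stats.autogather=true;
-- Source statistics from Iceberg directly (described in the thread as the default), bypassing HMS.
set hive.iceberg.stats.source=iceberg;
```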

Re:Re: Re: Hive's performance for querying the Iceberg table is very poor.

2023-11-08 Thread lisoda
STEP1: CREATE TABLE USING SPARK: CREATE TABLE IF NOT EXISTS test.dwd.test_trade_table( `uni_order_id` string, `data_from` bigint,

Re: Re: Hive's performance for querying the Iceberg table is very poor.

2023-11-08 Thread Butao Zhang
Could you please provide detailed steps to reproduce this issue? e.g. how do you create the table? Thanks, Butao Zhang Replied Message | From | lisoda | | Date | 11/9/2023 14:25 | | To | | | Subject | Re:Re: Re: Hive's performance for querying the Iceberg table is very poor. |

Re:Re: Re: Hive's performance for querying the Iceberg table is very poor.

2023-11-08 Thread lisoda
Incidentally, I'm using a COW table, so there is no DELETE_FILE. On 2023-11-09 10:57:35, "Butao Zhang" wrote: Hi lisoda. You can check this ticket https://issues.apache.org/jira/browse/HIVE-27347 which can use iceberg basic stats to optimize count(*) query. Note: it didn't take effect if

Re: Re: Hive's performance for querying the Iceberg table is very poor.

2023-11-08 Thread Butao Zhang
Hi lisoda. You can check this ticket https://issues.apache.org/jira/browse/HIVE-27347 which can use Iceberg basic stats to optimize count(*) queries. Note: it does not take effect if the table has delete files. Thanks, Butao Zhang Replied Message | From | lisoda | | Date | 11/9/2023 10:43 |

Re:Re: Re: Hive's performance for querying the Iceberg table is very poor.

2023-11-08 Thread lisoda
Hi. I am testing with the HIVE-4.0.0-BETA-1 version and I am using location_based_table. So far I have found that HIVE still can't push some queries down to METADATA, e.g. COUNT(*). Is HIVE 4.0.0-BETA-1 still not able to support query push down? On 2023-10-24 17:41:20, "Ayush Saxena" wrote:

Fwd: Release of Hive 4 and TPC-DS benchmark

2023-11-03 Thread Sungwoo Park
Forwarded to user@hive as I think many people are curious about the release of Hive 4. -- Forwarded message - From: Sungwoo Park Date: Sat, Nov 4, 2023 at 12:42 AM Subject: Release of Hive 4 and TPC-DS benchmark To: Hi everyone, I would like to resume the discussion on the

Re: Announce: Hive-MR3 with Celeborn,

2023-11-02 Thread Sungwoo Park
Celeborn and Uniffle can also be seen as a move to separate local storage from compute nodes. 1. In the old days, Hadoop was based on the idea of collocating compute and storage. 2. Later a new paradigm of separating compute and storage emerged and got popularized. 3. Now people want to not just

Re: Announce: Hive-MR3 with Celeborn,

2023-11-02 Thread Keyong Zhou
I think both Celeborn and Uniffle are good alternatives as a general shuffle service. I recommend that you try them : ). For any question about Celeborn, we're very glad to discuss in Celeborn's mail lists[1][2] or slack[3]. [1] u...@celeborn.apache.org [2] d...@celeborn.apache.org [3]

Re: Announce: Hive-MR3 with Celeborn,

2023-11-01 Thread Sungwoo Park
On Thu, Nov 2, 2023 at 1:43 PM Sungwoo Park wrote: > Have you done comparison between uniffle and celeborn..? >> > > We did not compare the performance of Uniffle and Celeborn (because > Hive-MR3-Celeborn has been released but Hive-MR3-Uniffle is not complete > yet). Much of the code in

Re: Announce: Hive-MR3 with Celeborn,

2023-11-01 Thread Sungwoo Park
> > Have you done comparison between uniffle and celeborn..? > We did not compare the performance of Uniffle and Celeborn (because Hive-MR3-Celeborn has been released but Hive-MR3-Uniffle is not complete yet). Much of the code in Hive-MR3-Celeborn is currently reused in Hive-MR3-Uniffle, so we

Re: Announce: Hive-MR3 with Celeborn,

2023-10-31 Thread Battula, Brahma Reddy
Thanks for bringing this up. Good to see that it supports spark and flink. Have you done comparison between uniffle and celeborn..? On 30/10/23, 8:01 AM, "Keyong Zhou" <zho...@apache.org> wrote: Great to hear this! It's encouraging that Celeborn helps MR3. Celeborn is a general

Re: Announce: Hive-MR3 with Celeborn,

2023-10-29 Thread Keyong Zhou
Great to hear this! It's encouraging that Celeborn helps MR3. Celeborn is a general purpose remote shuffle service that stores and serves shuffle data (and other intermediate data in the future) to help compute engines better use disaggregated architecture, as well as become more efficient and

Re: Metastore: How is the unique ID of new databases and tables determined?

2023-10-24 Thread Venu Reddy
Hi Eugene, HMS depends on DataNucleus for the identity value generation for the HMS tables. It is generated by DataNucleus when an object is made persistent. DataNucleus value generator will generate values uniquely across different JVMs. As Zoltan said, DataNucleus tracks with the SEQUENCE_TABLE

Re: Metastore: How is the unique ID of new databases and tables determined?

2023-10-24 Thread Zoltán Rátkai
Hi Eugene, the TBL_ID in the TBLS table is handled by Datanucleus, so AUTO_INCREMENT won't help, since the TBL_ID is not defined as AUTO_INCREMENT. Datanucleus uses SEQUENCE_TABLE to store the actual value for primary keys. In this table these two rows are what you need to modify:
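
Before changing anything, the current state of the sequence table can be inspected with a read-only query against the metastore's backing database. This is a sketch that uses only the table name given in the message; the specific rows to modify are cut off in the archive and are not reproduced here.

```sql
-- Run against the metastore's backing RDBMS, not through Hive.
-- Lists the sequences Datanucleus tracks and the values it has stored for them.
SELECT * FROM SEQUENCE_TABLE;
```

Any modification of these rows would presumably need to happen while no metastore instance is handing out new IDs; that is an assumption, not something stated in the thread.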

Re: Announce: Hive-MR3 with Celeborn,

2023-10-24 Thread lisoda
Thanks. I will try. Replied Message | From | Sungwoo Park | | Date | 10/24/2023 20:08 | | To | user@hive.apache.org | | Cc | | | Subject | Announce: Hive-MR3 with Celeborn, | Hi Hive users, Before the impending release of MR3 1.8, we would like to announce the release of Hive-MR3

Announce: Hive-MR3 with Celeborn,

2023-10-24 Thread Sungwoo Park
Hi Hive users, Before the impending release of MR3 1.8, we would like to announce the release of Hive-MR3 with Celeborn (Hive 3.1.3 on MR3 1.8 with Celeborn 0.3.1). Apache Celeborn [1] is a remote shuffle service, similar to Magnet [2] and Apache Uniffle [3] (which was discussed in this Hive

Re: Re: Hive's performance for querying the Iceberg table is very poor.

2023-10-24 Thread Ayush Saxena
HIVE-27734 is in progress; I see we have a POC attached to the ticket, so we should have it in 2-3 weeks, I believe. > Also, after the release of 4.0.0, will we be able to do all TPCDS queries on ICEBERG except for normal HIVE tables? Yep, I believe most of the TPCDS queries would be supported

Re:Re: Hive's performance for querying the Iceberg table is very poor.

2023-10-24 Thread lisoda
Thanks. I would like to know if Hive currently supports pushing JOIN conditions down to Iceberg table partitions. I see HIVE-27734 is not yet complete; what is its progress so far? Also, after the release of 4.0.0, will we be able to do all TPCDS queries on ICEBERG except for normal HIVE

submitting tasks failed in Spark standalone mode due to missing failureaccess jar file

2023-10-24 Thread eab...@163.com
Hi Team. I use spark 3.5.0 to start Spark cluster with start-master.sh and start-worker.sh, when I use ./bin/spark-shell --master spark://LAPTOP-TC4A0SCV.:7077 and get error logs: ``` 23/10/24 12:00:46 ERROR TaskSchedulerImpl: Lost an executor 1 (already removed): Command exited with code

Re: Hive's performance for querying the Iceberg table is very poor.

2023-10-23 Thread Ayush Saxena
Hi Lisoda, The iceberg jar for hive 3.1.3 doesn't have a lot of changes. We did a bunch of improvements on the 4.x line for Hive-Iceberg. You can give iceberg a try on the 4.0.0-beta-1 release mentioned here [1], we have a bunch of improvements like vectorization and stuff like that. If you wanna

Re: Hive's performance for querying the Iceberg table is very poor.

2023-10-23 Thread Albert Wong
Too bad. Tencent Games used StarRocks with Apache Iceberg to power their analytics. https://medium.com/starrocks-engineering/tencent-games-inside-scoop-the-road-to-cloud-native-with-starrocks-d7dcb2438e25. On Mon, Oct 23, 2023 at 10:55 AM lisoda wrote: > We are not going to use starrocks. >

Re: Hive's performance for querying the Iceberg table is very poor.

2023-10-23 Thread lisoda
We are not going to use starrocks. mpp architecture databases have natural limitations, and starrocks does not necessarily perform better than hive llap. Replied Message | From | Albert Wong | | Date | 10/24/2023 01:39 | | To | user@hive.apache.org | | Cc | | | Subject | Re: Hive's

Re: Hive's performance for querying the Iceberg table is very poor.

2023-10-23 Thread Albert Wong
I would try http://starrocks.io. StarRocks is an MPP OLAP database that can query Apache Iceberg and we can cache the data for faster performance. We also have additional features like building materialized views that span across Apache Iceberg, Apache Hudi and Apache Hive. Here is a video of

Hive's performance for querying the Iceberg table is very poor.

2023-10-23 Thread lisoda
Hi Team. I was recently testing Hive queries against an Iceberg table, and I found that the performance is very poor, almost impossible to use in a production environment. Also, join conditions cannot be pushed down to the Iceberg partition. I'm using the 1.3.1 Hive

Metastore: How is the unique ID of new databases and tables determined?

2023-10-22 Thread Eugene Miretsky
Hey! Looking for a way to control the ids (DB_ID and TABLE_ID) of newly created databases and tables. We have a somewhat complicated use case where we replicate the metastore (and data) from a source Hive cluster to a target cluster. However new tables can be added on both source and target. We

how to create a table in Hive using JDBC code?

2023-10-15 Thread Martin Moore
I want to use the following query to load local data into a Hive table using JDBC client: load data local inpath '/local-directory-path/file.csv' into ;// from local we can then do something like the following if the table is already defined in Hive metastore: String conStr =

Hive-3 IPv6-only networks support

2023-10-13 Thread Marchenko Oleksii
Hello, small question for the dev team: is Hive-3 IPv6 ready? Not the data storage, but Hive services runtime. I mean, if we take Hadoop out of the equation and pretend it magically began supporting running on IPv6-only networks -- will Hive be able to also run its services on IPv6-only

Re:

2023-10-12 Thread luckydog xf
Oh, I forgot to add the email subject; apologies for that. On Thu, Oct 12, 2023 at 5:19 PM luckydog xf wrote: > Hi, list. According to this link > https://cwiki.apache.org/confluence/display/Hive/AdminManual+Metastore+3.0+Administration > Jump to Section Running the Metastore Without Hive > In

[no subject]

2023-10-12 Thread luckydog xf
Hi, list. According to this link https://cwiki.apache.org/confluence/display/Hive/AdminManual+Metastore+3.0+Administration Jump to Section Running the Metastore Without Hive In order to run metastore without hive, set the following. metastore.task.threads.always

Re: hive running udf in metastore can't load configuration with xinclude

2023-10-05 Thread Okumin
Hi Wojtek, I've not checked but I think your hive-site.xml has ``. Does it still happen if you put all parameters directly in hive-site.xml? If that resolves the issue, do you have a reason to use `include`?

Re: [EXTERNAL] Re: [ANNOUNCE] New committer: Sourabh Badhya

2023-10-03 Thread butaozha...@163.com
Congratulations! Sourabh Badhya From: user-return-27928-butaozhang1=163@hive.apache.org on behalf of Ayush Saxena Sent: Wednesday, October 4, 2023 12:10 PM To: d...@hive.apache.org Cc: user@hive.apache.org Subject: Re: [EXTERNAL] Re: [ANNOUNCE] New committer: Sourabh Badhya

Re: [EXTERNAL] Re: [ANNOUNCE] New committer: Sourabh Badhya

2023-10-03 Thread Ayush Saxena
Congratulations Sourabh!!! -Ayush > On 04-Oct-2023, at 9:28 AM, Sankar Hariappan > wrote: > > Congratulations Sourabh! Welcome to the Hive committers club!  > > > > Thanks, > > Sankar > > > > -Original Message- > From: Sourabh Badhya > Sent: Wednesday, October 4, 2023 9:19

RE: [EXTERNAL] Re: [ANNOUNCE] New committer: Sourabh Badhya

2023-10-03 Thread Sankar Hariappan via user
Congratulations Sourabh! Welcome to the Hive committers club!  Thanks, Sankar -Original Message- From: Sourabh Badhya Sent: Wednesday, October 4, 2023 9:19 AM To: d...@hive.apache.org; user@hive.apache.org Subject: [EXTERNAL] Re: [ANNOUNCE] New committer: Sourabh Badhya [You

[ANNOUNCE] New committer: Sourabh Badhya

2023-10-03 Thread Stamatis Zampetakis
Apache Hive's Project Management Committee (PMC) has invited Sourabh Badhya to become a committer, and we are pleased to announce that he has accepted. Sourabh has been doing some great work for the project. He has landed important fixes in critical parts of Hive and made significant

COLUMNS_V2 high RDS load

2023-10-02 Thread Patrick Duin
Hi, We've been investigating some high db load in our HMS server (version 2.3.9 on Mysql 5.7 aurora 2.11.2). This seems to be due to sort indexing being created for queries on the COLUMNS_V2 table. After some digging we think we see the same thing as this ticket/PR tries to solve:

Re: Request write access to the Hive wiki.

2023-09-21 Thread Albert Wong
In https://cwiki.apache.org/confluence/display/Hive/ on "user documentation", I'd like to add "StarRocks Integration". StarRocks is an OLAP database that can query data in Apache Hive ( https://docs.starrocks.io/en-us/latest/data_source/catalog/hive_catalog). On Thu, Sep 21, 2023 at 12:23 PM

Re: Request write access to the Hive wiki.

2023-09-21 Thread Ayush Saxena
Hi Albert, Can you share some more details like which page you want to modify and details around the content -Ayush On 22-Sep-2023, at 12:43 AM, Albert Wong wrote: username is albertatcelerdata.com -- Albert Wong Community, Developer Relations, Technology Partnerships for StarRocks | CelerData 949 689

Request write access to the Hive wiki.

2023-09-21 Thread Albert Wong
username is albertatcelerdata.com -- Albert Wong Community, Developer Relations, Technology Partnerships for StarRocks | CelerData 949 689 6412 albert.w...@celerdata.com

hive running udf in metastore can't load configuration with xinclude

2023-09-21 Thread Wojtek Meler
I've noticed strange behaviour in Hive. When you run a query against a partitioned table like this: select * from mytable where log_date = date_add('2023-09-10',1) limit 3 (mytable is partitioned by the log_date string column) Hive tries to evaluate date_add inside the metastore and throws

Re: Inquiry about Stable Release Timeline for Hive-Serde 4.X

2023-09-21 Thread Ayush Saxena
Hi, The GA release is in planning stage. We have some blockers, once we get them sorted, we will be pushing for a new release. At best it would take a minimum of 3 months, though that ain't a strict timeline... Thanx -Ayush On Thu, 21 Sept 2023 at 17:39, Mergu Ravi wrote: > > I'm currently

fail to run Hive Metastore Service on Postgres hot-standby replication

2023-09-12 Thread Mingmin Xu
Hello, I'm deploying a readonly HMS on a replication DB to offload some traffic on the primary DB, however it fails with error message as below, although the default value for `datanucleus.transactionIsolation` is `read-committed`, we tried to set it as `repeatable-read`, same problem. ``` Caused

Re: Problem encountered when following hive docker quickstart.

2023-09-05 Thread Ayush Saxena
Hi Away Hua, Thanx for the report. I think that is indeed a bug in the docker image. It calls initOrUpgrade schema here [1], whereas initOrUpgrade was introduced in HIVE-20357, which is only there post 4.0.0-alpha-1 [2]. You can raise a hive ticket for this & the fix most probably should be to

Problem encountered when following hive docker quickstart.

2023-09-04 Thread Away Hua
I followed the *QuickStart* section of hive quickstart to start hiveserver2 with version 3.1.3 in docker container. However, I can't start hiveserver2 container successfully. This failed container outputs the following content, + : derby +

Registration open for Community Over Code North America

2023-08-28 Thread Rich Bowen
Hello! Registration is still open for the upcoming Community Over Code NA event in Halifax, NS! We invite you to register for the event https://communityovercode.org/registration/ Apache Committers, note that you have a special discounted rate for the conference at US$250. To take advantage of

Unsubscribe

2023-08-21 Thread sudeep mishra
> >

Re: [ANNOUNCE] Apache Hive 4.0.0-beta-1 Released

2023-08-21 Thread Battula, Brahma Reddy
Nice!! Thanks to all who made this happen. Is there any draft plan for the 4.0.0 GA? (If it's already discussed, please provide the reference.) On 15/08/23, 12:13 PM, "Stamatis Zampetakis" <zabe...@apache.org> wrote: The Apache Hive team is proud to announce the release of Apache Hive

Re: Re: Tez & fetch task conversion

2023-08-20 Thread Okumin
Hi Wojtek, Thanks for explaining the detail. I understand you have a larger amount of data than `hive.fetch.task.conversion.threshold`. Taking a glance, SimpleFetchOptimizer is likely to respect LIMIT if `hive.fetch.task.caching` is disabled and all predicates are for partition pruning. The case

Re: Specifying YARN Node (Label) for LLAP AM

2023-08-19 Thread Mich Talebzadeh
Thank you for the information, Aaron. I explored the MR3 link you provided and found it intriguing. However, the latest email I received from another member seemed to deviate from the technical discussion's focus, potentially leading us off track and hindering objectivity. Therefore, with regret, I

Re: Specifying YARN Node (Label) for LLAP AM

2023-08-19 Thread Sungwoo Park
Hello, For more recent benchmark results, please see [1] where we compare Trino 418, Spark 3.4.0, and Hive 3.1.3 (on MR3 1.7) using TPC-DS 10TB. Spark takes about 19600 seconds to complete all the queries, whereas Trino and Hive take about 7400 seconds only. The experiment does not use Hive-LLAP,

Re: Specifying YARN Node (Label) for LLAP AM

2023-08-19 Thread Aaron Grubb
You might also be interested in knowing that there have been discussions about deprecating Hive on Spark: https://lists.apache.org/thread/sspltkv3ovbsjmoct72p4m1ooqk2g740 On Sat, 2023-08-19 at 10:17 +, Aaron Grubb wrote: Hi Mich, It's not a question of cannot but rather a) is it worth

Re: Specifying YARN Node (Label) for LLAP AM

2023-08-19 Thread Aaron Grubb
Hi Mich, It's not a question of cannot but rather a) is it worth converting our pipelines from Hive to Spark and b) is Spark more performant than LLAP, and in both cases the answer seems to be no. 2016 is a lifetime ago in technological time and since then there's been a major release of Hive

Re: Specifying YARN Node (Label) for LLAP AM

2023-08-18 Thread Mich Talebzadeh
Interesting! In 2016 I gave a presentation in London at Future of Data, organised by Hortonworks, July 20, 2016: Query Engines for Hive: MR, Spark, Tez with LLAP – Considerations! Then I thought Spark as an underlying

Re: Specifying YARN Node (Label) for LLAP AM

2023-08-18 Thread Aaron Grubb
Hi Mich, Yes, that's correct On Fri, 2023-08-18 at 15:24 +0100, Mich Talebzadeh wrote: Hi, Are you using LLAP (Long live and prosper) as a Hive engine? HTH Mich Talebzadeh, Solutions Architect/Engineering Lead London United Kingdom

Re: Specifying YARN Node (Label) for LLAP AM

2023-08-18 Thread Mich Talebzadeh
Hi, Are you using LLAP (Long live and prosper) as a Hive engine? HTH Mich Talebzadeh, Solutions Architect/Engineering Lead London United Kingdom view my Linkedin profile https://en.everybodywiki.com/Mich_Talebzadeh

RE: Specifying YARN Node (Label) for LLAP AM

2023-08-18 Thread Aaron Grubb
For those interested, I managed to define a way to launch the LLAP application master and daemons on separate, targeted machines. It was inspired by an article I found [1] and implemented using YARN Node Labels [2] and Placement Constraints [3] with a modification to the file

Fwd: BUG ? Loss of all table data when deleting partitions performed directly on hdfs (after metastore synchro)

2023-08-17 Thread Jeremy Ferrer
Hello, For a partitioned table (with an int type partition) in the following format: CREATE EXTERNAL TABLE `poc.titi`( `n` string) PARTITIONED BY ( `id` integer) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde' STORED AS INPUTFORMAT

[ANNOUNCE] Apache Hive 4.0.0-beta-1 Released

2023-08-15 Thread Stamatis Zampetakis
The Apache Hive team is proud to announce the release of Apache Hive version 4.0.0-beta-1. The Apache Hive (TM) data warehouse software facilitates querying and managing large datasets residing in distributed storage. Built on top of Apache Hadoop (TM), it provides, among others: * Tools to

Remote Hive Metastore - Error - NoMatchingRule: No rules applied

2023-08-10 Thread Sathish Kumar
HI Team, Could you please help me to fix the below ? Not sure where the blocker is, on SASL or Kerberos Auth_to_local Rules? Thanks *PROBLEM STATEMENT : Connections established from dataproc client Node to MSS(metastore service) Instance are not Successful.* # We were able to build

Odp: Re: Tez & fetch task conversion

2023-08-03 Thread Wojtek Meler
Disabling hive.limit.optimize.enable makes the situation even worse: the TEZ job scans all files in the partition, which is unnecessary. I've run a debugger and discovered that the partition size in my table obtained from the metastore exceeds hive.fetch.task.conversion.threshold. It seems that since hive-1.0.1

Re: Questions Regarding Bucket Map Join in Hive

2023-07-12 Thread smart li
Subject: Successful Implementation of Bucket Map Join Hi, I hope this message finds you well. I wanted to express my gratitude for the detailed instructions you provided on setting up the Bucket Map Join. Your guidance proved to be extremely helpful and, following your steps, I am pleased to

Introduce Uniffle : A stability solution of Hive's shuffle

2023-07-11 Thread roryqi
Dear Apache Hive community, We are delighted to announce the support of Tez on Uniffle. Uniffle now supports Apache Spark, Apache Hadoop MapReduce and Apache Tez. Uniffle is a remote shuffle service. In several situations, Uniffle will provide great help. 1. If you use AWS spot instances

Re: Roadmap information

2023-07-06 Thread JOHN MILLER
You have the wrong email address On Thu, Jul 6, 2023, 3:16 PM Attila Turoczy wrote: > Hi Cristian, > > We are going to release the 4.0 beta soon (hopefully within 1 week) There > are still 2 tickets that need to resolve to be confident to release 4.0 as > a GA. > I think it will happen soon,

Re: Roadmap information

2023-07-06 Thread Attila Turoczy
Hi Cristian, We are going to release the 4.0 beta soon (hopefully within 1 week). There are still 2 tickets that need to be resolved before we can be confident about releasing 4.0 as a GA. I think it will happen soon, and once the community unblocks these issues and releases 4.0, we plan to do it more often and

Re: Roadmap information

2023-07-06 Thread Cristian Astorino
Hi, any update on the release of the stable Hive 4.0.0? Thanks, Cristian On Tue, 29 Nov 2022, 12:06 Stamatis Zampetakis wrote: > Hi Cristian, > > The 4.0.0-alpha-2 was released on 16 November 2022. The next scheduled > release is most likely the stable 4.0.0 [1]. > The usual release

Re: Tez & fetch task conversion

2023-07-06 Thread Okumin
Hi Wojtek, I tried to submit the query with the given configurations on Hive 4.0.0-alpha-2 on Tez on YARN. In my environment, the query is converted to a single fetch task. Could you please give us the precise revision of Hive, your table definition, the amount of data, and so on? Also, I'm

Tez & fetch task conversion

2023-07-05 Thread Wojtek Meler
Hi, after switching to Hive 4.0 and Tez on YARN I've noticed that simple fetch queries run much longer. I have the following configuration: hive.fetch.task.conversion=more hive.fetch.task.conversion.threshold=1073741824 hive.limit.optimize.enable=true hive.limit.optimize.fetch.max=5

Re: Questions Regarding Bucket Map Join in Hive

2023-07-01 Thread Okumin
Hi, I understand you are trying MapReduce! I recommend you use Tez unless you have special reasons. Tez is the recommended engine and I guess more community members use Hive 3 on Tez. It means you are more likely to get answers when you encounter trouble. Quickly, I succeeded in enabling Bucket

Re: Questions Regarding Bucket Map Join in Hive

2023-06-25 Thread smart li
Hello, First of all, I would like to express my gratitude for your responses and assistance. I’m currently encountering a scenario where my Hive is not choosing BucketMapJoin, and I wonder whether this is due to its underlying execution engine, which is MapReduce. In addition, I am operating in

Re: Questions Regarding Bucket Map Join in Hive

2023-06-25 Thread Okumin
Hi smart li, As far as I tried with Hive 3.1.2 on Tez, Bucket Map Join was probably triggered. My configurations could be different from yours, though. # How I tested ## hive-site.xml https://github.com/zookage/zookage/blob/v0.2.3/kubernetes/base/common/config/hive/hive-site.xml ## Prepare

Questions Regarding Bucket Map Join in Hive

2023-06-25 Thread smart li
Hello Hive Users, I’m currently trying to understand how Bucket Map Join works in Hive, but I’m encountering some issues that I need help with. Here’s what I did: Firstly, I created a Hive table using the following statement: create table map_join_tb( id int ) clustered by (id) into 32 buckets;

standalone-metastore authorization with S3

2023-06-21 Thread Dogukan
Hello everyone, I am having trouble with metadata authorization. Is there any way we can enforce metadata authorization using `StorageBasedAuthorizationProvider ` and s3 compatible object storage (minio in my case) ? Referring to the class documentation, it says
