[ANNOUNCE] Welcoming Xixu Wang and Ke Deng as Kudu committers and PMC members

2024-09-17 Thread Andrew Wong
Hi Kudu community, I'm happy to announce that the Kudu PMC has voted to add Xixu Wang and Ke Deng as new committers and PMC members. Both Xixu and Ke have been steadily contributing for the last two years or so, writing features, submitting bug fixes, and reviewing code. Some of Xixu's major cont

[ANNOUNCE] Welcoming Márton Greber as Kudu committer and PMC member

2023-11-14 Thread Andrew Wong
Hi Kudu community, I'm happy to announce that the Kudu PMC has voted to add Márton Greber as a new committer and PMC member. Some of Márton's contributions include: - Getting Kudu to build and run on Apple silicon - Improving feature parity of the Python client with a number of features - Various

Welcoming Yuqi Du as Kudu committer and PMC member

2023-06-06 Thread Andrew Wong
Hi Kudu community, I'm happy to announce that the Kudu PMC has voted to add Yuqi Du as a new committer and PMC member. Just some of Yuqi's contributions include: - Designing and implementing automatic partition leader rebalancing - Adding several bug fixes, performance improvements, and tooling a

[ANNOUNCE] Welcoming Abhishek Chennaka as Kudu committer and PMC member

2023-02-22 Thread Andrew Wong
Hi Kudu community, I'm happy to announce that the Kudu PMC has voted to add Abhishek Chennaka as a new committer and PMC member. Some of Abhishek's contributions include: - Improving the usability of ksck and backup/restore tooling - Introducing bootstrapping metrics and webserver pages - Adding

Re: Performance problems of using spark SQL to read kudu data!

2020-11-29 Thread Andrew Wong
6,the > spark_submit commands is: > spark-submit --master yarn --deploy-mode cluster --name test > --queue bigdata_pro --conf spark.dynamicAllocation.maxExecutors=20 > --executor-cores 1 --executor-memory 8g --driver-memory 8g > --class uc.com.Test hdfs://ns1/user/hue/Test.jar > > Spark and Kudu version: the Spark version is 2.4.0 and Kudu version is > 1.10.0. > > In addition to increasing the number of hash partitions under each range > partition, is there any way to increase the number of tasks for Spark to read > Kudu data through parameters? > > Thanks! > > > > > > > -- Andrew Wong

Re: [proposal] Kudu operating system requirements changes

2020-11-25 Thread Andrew Wong
irements >> - Upgrade dependencies that require C++14 and higher >> - Introduce new dependencies that require or benefit from C++14 and >> higher >> - Potential performance improvements >> >> If you have any concerns about these changes your feedback would be >> appreciated. If you are in support of these changes a response indicating >> your support is encouraged as well. >> >> Thank you, >> Grant >> > -- Andrew Wong

Re: Better handling disk usage / tablet placement in hybrid cluster

2020-05-21 Thread Andrew Wong
the others. Any way to do this, or > other suggestions? > > We're on 1.10.0+cdh6.3.3 > > Thanks in advance! > > -mauricio > > ** I know this is obv not ideal and we're addressing it long term, but > it's HW we got from acquisitions and we're on-prem so not as easy as > changing an instance type :) > > > -- Andrew Wong

Re: real-time pipeline with Kudu

2020-05-12 Thread Andrew Wong
Great post, Boris! We're happy to help. Thanks for sharing, and for being an active member of the community :) On Tue, May 12, 2020 at 12:23 PM Boris Tyukin wrote: > Hi guys, > > there are not a lot of real-life experiences with Kudu and I wanted to > share with you my blog post where I describe

Re: [ANNOUNCE] Welcoming Bankim Bhavsar as Kudu committer and PMC member

2020-04-18 Thread Andrew Wong
Congratulations Bankim! Keep up the great work 🎉 On Sat, Apr 18, 2020 at 3:28 PM Adar Dembo wrote: > Hi Kudu community, > > I'm happy to announce that the Kudu PMC has voted to add Bankim > Bhavsar as a new committer and PMC member. > > Bankim has been actively writing Kudu code for the last six

Re: Tablet Server with almost 1TB of WALs (and large number of open files)

2020-03-31 Thread Andrew Wong
ade), etc, and in many cases these > upgrades also bring significant changes/deprecations in other components, > like Parquet, which means we have to change (and in some cases rewrite) our > code that uses Parquet or Kafka, since these products are rapidly evolving, > and many times

Re: Tablet Server with almost 1TB of WALs (and large number of open files)

2020-03-30 Thread Andrew Wong
ted tablet IDs, do your logs indicate the nature of the > background operations performed for those tablets? > > Some of these questions can also be answered via Kudu metrics. There's > the ops_behind_leader tablet-level metric, which can tell you how far > behind a replica may be. Unfortunately I can't find a metric for > average number of WAL segments retained (or a histogram); I thought we > had that, but maybe not. > -- Andrew Wong

Re: Partitioning Rules of Thumb

2020-03-09 Thread Andrew Wong
>>> >>>> If a table ends up taking 18Gb after replication (so with 3x >>>> replication it is ~9Gb per tablet if I do not partition), should I aim for >>>> 1Gb tablets (6 tablets before replication) or should I aim for 500Mb >>>> tablets if my cluster capacity allows so (12 tablets before replication)? >>>> confused why they say "at least" not "at most" - does it mean I should >>>> design it so a tablet takes 2Gb or 3Gb in this example? >>>> >>>> Assume that I have tons of CPU cores on a cluster... >>>> Based on my quick test, it seems that queries are faster if I have more >>>> tablets/partitions... In this example, 18 tablets gave me the best timing >>>> but tablet size was around 300-400Mb. But the doc says "at least 1Gb". >>>> >>>> Really confused what the doc is saying, please help >>>> >>>> Boris >>>> >>>> -- Andrew Wong
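
The tablet counts in the quoted question can be checked with a few lines of arithmetic. This is only a sketch of the numbers in the message (18 GB after 3x replication, with targets of 1 GB and 500 MB per tablet); the function name is hypothetical, and it is neither a Kudu API nor a sizing recommendation.

```python
import math


def tablets_for_target(total_replicated_gb, replication_factor, target_tablet_gb):
    """Return how many tablets (counted before replication) are needed so that
    each tablet holds roughly target_tablet_gb of data.

    Hypothetical helper for illustration only; not part of any Kudu client.
    """
    # Data size of a single copy of the table, i.e. before replication.
    unreplicated_gb = total_replicated_gb / replication_factor
    # Round up so that no tablet exceeds the target size.
    return math.ceil(unreplicated_gb / target_tablet_gb)


# Numbers taken from the quoted message: 18 GB after 3x replication.
for target_gb in (1.0, 0.5):
    count = tablets_for_target(18, 3, target_gb)
    print(f"target {target_gb} GB per tablet -> {count} tablets")
```

With these inputs the helper gives 6 tablets for the 1 GB target and 12 tablets for the 500 MB target, matching the counts in the question.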

Re: [ANNOUNCE] Welcoming Yifan Zhang as Kudu committer and PMC member

2020-01-06 Thread Andrew Wong
icas from such tablet > servers, which is a valuable building block for tserver > decommissioning. > - Most recently, deduplicating RPCs sent by Kudu masters to tablet servers. > > Please join me in congratulating Yifan! > -- Andrew Wong

Re: [ANNOUNCE] Welcoming Lifu He, Yao Xu, and Yao Zhang as Kudu committers and PMC members

2019-08-26 Thread Andrew Wong
Inc.) where > they also help operate their very large Kudu deployments. All three > have been instrumental in growing Kudu's presence within China as well > as helping new Chinese users come up to speed with Kudu. > > Please join me in congratulating Lifu, Yao, and Yao! > -- Andrew Wong

Re: is this mean the disk read rate was too slow

2019-07-15 Thread Andrew Wong
ad blocks read
>> membership_card_id 381.10k 3.85M 18
>> tbill_code 372.82k 5.99M 27
>> goods_id 426.24k 281.3K 8
>> dates 426.24k 8.5K 8
>> business_id 426.24k 2.7K 8
>> goods_name 426.24k 291.1K 8
>> paid_in_amt 376.93k 1019.6K 24
>> profit 376.93k 1.14M 24
>> total 3.21M 12.55M 125
>> is this mean the disk read rate was too slow ?
>>
>> 2019-07-15
>> --
>> lk_hadoop
>>
>
-- Andrew Wong

Re: [ANNOUNCE] Welcoming Yingchun Lai as a Kudu committer and PMC member

2019-06-06 Thread Andrew Wong
Well done Yingchun and congratulations! Keep up the good work! :) Andrew On Wed, Jun 5, 2019 at 11:25 AM Todd Lipcon wrote: > Hi Kudu community, > > I'm happy to announce that the Kudu PMC has voted to add Yingchun Lai as a > new committer and PMC member. > > Yingchun has been contributing to K

[ANNOUNCE] Apache Kudu 1.9.0 Released

2019-03-12 Thread Andrew Wong
The Apache Kudu team is happy to announce the release of Kudu 1.9.0! Kudu is an open source storage engine for structured data which supports low-latency random access together with efficient analytical access patterns. It is designed within the context of the Apache Hadoop ecosystem and supports

Re: Unable to load consensus metadata (the metadata file is missed and the part of tablets is unavailable)

2018-06-05 Thread Andrew Wong
The root cause of the issue is a bit nuanced: it boils down to the fact that the consensus metadata doesn't always get fsynced, so a hard shutdown can lead to the posted behavior. This comment

Re: subscriber

2017-12-15 Thread Andrew Wong
To subscribe to the user list, send a message on over to user-subscr...@kudu.apache.org! On Thu, Dec 14, 2017 at 10:54 PM, zha...@broadtech.com.cn < zha...@broadtech.com.cn> wrote: > subscriber > > -- > zha...@broadtech.com.cn > -- Andrew Wong

Re: Data inconsistency after restart

2017-12-06 Thread Andrew Wong
rted then I can understand why we ended up with >>>> inconsistent data. But, if I understand you correct, you are saying that >>>> these jobs are not critical for ingestion. In the link you provided I read >>>> "Impala scans are currently performed as READ_LATEST

Re: Data inconsistency after restart

2017-12-05 Thread Andrew Wong
re or somesuch). > Following up on what I posted, take a look at https://kudu.apache.org/docs/transaction_semantics.html#_read_operations_scans. It seems definitely possible that not all of the rows had finished inserting when counting, or that the scans were sent to a stale replica. On Tue, Dec

Re: Data inconsistency after restart

2017-12-05 Thread Andrew Wong
t server at a time or > something like that)? > > The table design uses 50 tablets per day (times 90 days). It is 8 TB of > data after 3xreplication over 5 tablet servers. > > Thanks, > Petter > > > -- Andrew Wong

Re: RPC negotiation error

2017-12-04 Thread Andrew Wong
llo( > Negotiator.java:178) > at org.apache.kudu.client.TabletClient.channelConnected( > TabletClient.java:586) > at org.apache.kudu.client.shaded.org.jboss.netty.channel. > SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler. > java:100) > at org.apache.kudu.client.TabletClient.handleUpstream( > TabletClient.java:595) > > Thanks, > Zhen > -- Andrew Wong

Re: Help me understand Kudu scalability limitations

2017-11-29 Thread Andrew Wong
ted. We can use it > for HDFS of course and Kafka or something else but my concern is why Kudu > cannot use more than 8Tb per node. Is it something that is going to change > in future maybe? > > On Wed, Nov 29, 2017 at 1:06 PM, Andrew Wong wrote: > >> Hi Boris, >> >

Re: co-locating kudu table servers with HDFS data nodes

2017-11-29 Thread Andrew Wong
G before > replication ) . Does KUDU tries to distribute data across tablet servers > for each table i.e. slow performance with too much sparse data. i.e. for > small table what is better fewer disk partitions ( host-partition ) vs > evenly distributed across worker nodes. > > Thanks, > Sunil Parmar > -- Andrew Wong

Re: Need help on starting a new Kudu service

2017-11-29 Thread Andrew Wong
agent > Thu Nov 23 20:21:45 EST 2017: KUDU_HOME: > /opt/cloudera/parcels/KUDU-1.4.0-1.cdh5.12.1.p0.10/lib/kudu > Thu Nov 23 20:21:45 EST 2017: CONF_DIR: > /run/cloudera-scm-agent/process/315-kudu-KUDU_TSERVER > Thu Nov 23 20:21:45 EST 2017: CMD: tserver > Thu Nov 23 20:21:45 EST 2017: Found master(s) on pocnnr1n1.raymond.com > Wrote minidump to > /var/log/kudu/minidumps/kudu-tserver/5537e0e2-0271-b1eb-5242dea7-5a3e5df1.dmp > > > > > ** > *Sincerely yours,* > > > *Raymond* > -- Andrew Wong

Re: Help me understand Kudu scalability limitations

2017-11-29 Thread Andrew Wong
above I am going to have a bunch of tables with lots of rows. > > I do not have an option to pick a different hardware configuration for our > cluster. > > thanks > -- Andrew Wong

Re: Confused where to post user type questions

2017-11-29 Thread Andrew Wong
> We also have official channel for paying CDH customers but there are > benefits to use informal ones :) > > Thanks for such an amazing product and everything you do! > > Boris > -- Andrew Wong

Re: Time-travel reads via SQL query

2017-11-28 Thread Andrew Wong
ts more than 5 mins in the past >> you need to increase the "--tablet_history_max_age_sec" flag so that the >> history won't get garbage collected. >> >> HTH >> -david >> >> On Mon, Nov 27, 2017 at 9:42 PM, Andrew Wong wrote: >> >>

Re: Time-travel reads via SQL query

2017-11-27 Thread Andrew Wong
| Twitter > <https://twitter.com/impactradius> | Facebook > <https://www.facebook.com/pages/Impact-Radius/153376411365183> | LinkedIn > <https://www.linkedin.com/company/impact-radius-inc-> > -- Andrew Wong

Re: How to control network traffic when ts crash.

2017-09-04 Thread Andrew Wong
e >> the configure num_tablets_to_copy_simultaneously from 10 to 1, Can it be >> meet my needs? >> >> King Lee >> >> 2017-09-05 3:35 GMT+08:00 Andrew Wong : >> >>> Hi Li, >>> >>> What errors are you seeing when the network traff

Re: How to control network traffic when ts crash.

2017-09-04 Thread Andrew Wong
ration > between ts. Network traffic will full and can not write or write > normally. it’s unacceptable. Is there any good way to control network > traffic when ts crash, and write or read service is not affect. Thanks! > -- Andrew Wong

Re: [KUDU] Adding tablet server data directories

2017-08-14 Thread Andrew Wong
ential. If you are not the intended recipient, > or the person responsible for delivering the message to the intended > recipient, you are hereby notified that any dissemination, distribution or > copying of this communication is strictly prohibited. All unintended > recipients are obliged to delete this message and destroy any printed > copies. > > > -- Andrew Wong