Re: [DISCUSS] FLIP-329: Add operator attribute to specify support for object-reuse

2024-01-10 Thread Ken Krugler
} public long getValue() { return value; } public void setValue(long value) { this.value = value; } } } > > > > On Fri, Oct 20, 2023 at 7:26 AM Ken Krugler <mailto:kkrugler_li...@transpac.com>> wrot

Re: [DISCUSS] FLIP-413: Enable unaligned checkpoints by default

2024-01-07 Thread Ken Krugler
increase the aligned checkpoints timeout from 0ms to >>> 5s. I think this change is the right one to do for the majority of Flink >>> users. >>> >>> For more rationale please take a look into the short FLIP-413 [1]. >>> >>> What do you all th

Re: [DISCUSS] FLIP-398: Improve Serialization Configuration And Usage In Flink

2024-01-05 Thread Ken Krugler
modification, and we can avoid over-design in > the current FLIP. > > Thanks for your feedback! > > Best, > Fang Yong > > On Fri, Dec 29, 2023 at 12:38 PM Ken Krugler <mailto:kkrugler_li...@transpac.com>> wrote: >> Hi Xintong, >> >> I agree that decoupli

Re: [DISCUSS] FLIP-398: Improve Serialization Configuration And Usage In Flink

2023-12-28 Thread Ken Krugler
scussion. At the moment, I'm not aware of any community consensus on > doing so. And even if in the future we decide to do so, the changes needed > should be the same w/ or w/o this FLIP. So I'd suggest not to block this > FLIP on these issues. > > WDYT? > > Best, > >

Re: [DISCUSS] FLIP-398: Improve Serialization Configuration And Usage In Flink

2023-12-14 Thread Ken Krugler
fluence/display/FLINK/FLIP-398%3A+Improve+Serialization+Configuration+And+Usage+In+Flink > > Best, > Fang Yong -- Ken Krugler http://www.scaleunlimited.com Custom big data solutions Flink & Pinot

Using Fury serializer with Flink

2023-12-11 Thread Ken Krugler
same as Kryo. And it supposedly handles schema evolution, though I haven’t tried that, assuming you configure the builder properly. Currently they’re not guaranteeing binary compatibility across 0.x releases until 1.0 is out (currently at 0.4.1), so might make sense to wait for that. — Ken ------

Re: [DISCUSS] FLIP-329: Add operator attribute to specify support for object-reuse

2023-10-19 Thread Ken Krugler
s, > > Dong and Xuannan > > [1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=255073749 -- Ken Krugler http://www.scaleunlimited.com Custom big data solutions Flink & Pinot

Re: [DISCUSS] FLIP-317: Upgrade Kryo from 2.24.0 to 5.5.0

2023-05-29 Thread Ken Krugler
ed for adoption. > > I'm looking forward to hearing everyone's feedback and suggestions. > > Thank you, > Kurt > > [1] > https://cwiki.apache.org/confluence/display/FLINK/FLIP-317%3A+Upgrade+Kryo+from+2.24.0+to+5.5.0 -- Ken Krugler http://www.scaleunlimited.com Custom big data solutions Flink, Pinot, Solr, Elasticsearch

Re: Failed to commit consumer offsets for checkpoint 108768

2023-03-17 Thread Ken Krugler
committing the > latest consumed offsets. > Caused by: org.apache.kafka.common.errors.CoordinatorNotAvailableException: > The coordinator is not available. > > Regards, > Sucheth Shivakumar > website : https://sucheths.com > mobile : +1(650)-576-8050 > San Mateo, United States -

Re: [Consulting] Should a Flink class loader be a singleton?

2022-12-22 Thread Ken Krugler
lib folder, instead, i >> used sql-client.sh embedded -j /huid-flink-bundle-xxx.jar. >> >> So I'm wondering if the root reason is that they are loaded by two >> classloader instances? And it seems that FlinkUserCodeClassLoaders is a >> singleton, but ParentFirstClassLoader and ChildFirstClassLoader aren't from >> flink source code. >> -- Ken Krugler http://www.scaleunlimited.com Custom big data solutions Flink, Pinot, Solr, Elasticsearch

Re: Stack Overflow Question re compressing broadcast state

2022-11-21 Thread Ken Krugler
[2] https://issues.apache.org/jira/browse/FLINK-30113 > <https://issues.apache.org/jira/browse/FLINK-30113> > > On 18/11/2022 02:13, Ken Krugler wrote: >> Hi Dawid, >> >> Thanks for getting back to me. >> >> And yes, I read "Compression

Re: Stack Overflow Question re compressing broadcast state

2022-11-17 Thread Ken Krugler
it, as KeyedState should be preferred in > majority of cases. > > Best, > > Dawid > > On 16/11/2022 23:27, Ken Krugler wrote: >> Hi all, >> >> Just posted this question on SO: How to enable compression for Flink >> broadcast state checkpoints >>

Stack Overflow Question re compressing broadcast state

2022-11-16 Thread Ken Krugler
eading the code wrong. Hoping someone (like Dawid Wysakowicz) can chime in here, thanks! — Ken ------ Ken Krugler http://www.scaleunlimited.com Custom big data solutions Flink, Pinot, Solr, Elasticsearch

Fwd: Stateful Functions with Flink 1.15 and onwards

2022-10-26 Thread Ken Krugler
gt;> cluster which will soon be upgrading to Flink 1.15. If I need to re-write >> all our code as a native Flink job (rather than a remote stateful function) >> then I need to get started right away. >> >> Many thanks, >> Fil >> -- Ken Krugler http://www.scaleunlimited.com Custom big data solutions Flink, Pinot, Solr, Elasticsearch

Re: More issues with top-level build for Tika 1.25 rc1 - Waited more than 5 minutes for a SAXParser

2020-11-23 Thread Ken Krugler
23, 2020, at 4:34 AM, Arvid Heise wrote: > > Hi Ken, > > just to double check, did you intend to send this mail to the tika dev > list? I actually don't know what to do with your email. > > Best, > > Arvid > > On Sat, Nov 21, 2020 at 11:43 PM Ken Krugler >

More issues with top-level build for Tika 1.25 rc1 - Waited more than 5 minutes for a SAXParser

2020-11-21 Thread Ken Krugler
for a SAXParser. Consider increasing the XMLReaderUtils.POOL_SIZE … and so on… Any suggestions? Thanks! — Ken -- Ken Krugler http://www.scaleunlimited.com custom big data solutions & training Hadoop, Cascading, Cassandra & Solr

Re: Avoiding "too many open files" during leftOuterJoin with Flink 1.11/batch

2020-09-25 Thread Ken Krugler
> Are there possibly other sources for files being created? (anything in the > user-code?) > Could it be something annoying like the buffers rapidly switching between > spilling/reading from memory, creating a new file on each spill, overwhelming > the OS? > > On 9/17/2020

Avoiding "too many open files" during leftOuterJoin with Flink 1.11/batch - now with stack trace

2020-09-17 Thread Ken Krugler
eal source of the problem), but the leftOuterJoin failed first. -- Ken Krugler http://www.scaleunlimited.com custom big data solutions & training Hadoop, Cascading, Cassandra & Solr

Avoiding "too many open files" during leftOuterJoin with Flink 1.11/batch

2020-09-17 Thread Ken Krugler
PS - a CoGroup is happening at the same time, so it’s possible that’s also hurting (or maybe the real source of the problem), but the leftOuterJoin failed first. -- Ken Krugler http://www.scaleunlimited.com custom big data solutions & training Hadoop, Cascading, Cassandra & Solr

Re: HadoopOutputFormat has issues with LocalExecutionEnvironment?

2020-09-03 Thread Ken Krugler
do you mean by "algorithm version 2"? Where can you set this? (Sorry > for the question, I'm not an expert with Hadoop's FileOutputCommitter) > > Note to others: There's a related discussion here: > https://issues.apache.org/jira/browse/FLINK-19069 > > Best,

HadoopOutputFormat has issues with LocalExecutionEnvironment?

2020-08-25 Thread Ken Krugler
if Flink should always be using version 2 of the algorithm, as that’s more performant when there are a lot of results (which is why it was added). Thanks, — Ken -- Ken Krugler http://www.scaleunlimited.com custom big data solutions & training Hadoop, Cascading, Cassa

Re: Inconsistent behavior between different states with ListState.get().iterator().remove()

2020-07-24 Thread Ken Krugler
ers > might have come to rely on it. > > The next best thing would be throwing an exception for RocksDB to at least > not silently ignore ineffective #remove() calls. > > Best, > Aljoscha > > On 23.07.20 20:40, Ken Krugler wrote: >> Hi devs, >> If you use the FsState

Inconsistent behavior between different states with ListState.get().iterator().remove()

2020-07-23 Thread Ken Krugler
ing RocksDB (serde cost for entire list), so I’d be in favor of the remove() call throwing an exception, at least with RocksDB. ------ Ken Krugler http://www.scaleunlimited.com custom big data solutions & training Hadoop, Cascading, Cassandra & Solr

Re: Potential block size issue with S3 binary files

2019-09-03 Thread Ken Krugler
ds reasonable. >> >> I am adding Arvid to the thread - IIRC he authored that tool in his >> Stratosphere days. And my a stroke of luck, he is now working on Flink >> again. >> >> @Arvid - what are your thoughts on Ken's suggestions? >> >> On Fri

Re: Potential block size issue with S3 binary files

2019-08-30 Thread Ken Krugler
y value, and store the POJO as the data value. Then you could also leverage Hadoop support for compression at either record or block level. — Ken > > On Thu, Aug 29, 2019 at 4:49 AM Ken Krugler <mailto:kkrugler_li...@transpac.com>> wrote: > Hi all, > > Wondering if anyone e

Re: Unit tests consuming a lot of disk

2019-06-14 Thread Ken Krugler
s. > > Is this a known issue? Or would this be something worth investigating as an > improvement? > > Thanks, > Tim -- Ken Krugler +1 530-210-6378 http://www.scaleunlimited.com Custom big data solutions & training Flink, Solr, Hadoop, Cascading & Cassandra

Re: Support for controlling slot assignment based on CPU requirements

2019-06-13 Thread Ken Krugler
ph vertex do not goes into the same > slot/TM. > > Thank you~ > > Xintong Song > > > > On Thu, Jun 13, 2019 at 4:58 AM Ken Krugler > wrote: > >> Hi all, >> >> I’m running a complex (batch) workflow that has a step where it trains >> Fa

Support for controlling slot assignment based on CPU requirements

2019-06-12 Thread Ken Krugler
INK/Flink+Improvement+Proposals>, and didn’t see anything that covered this. I image it would need to be something like YARN’s support for per-node vCore capacity and per-task vCore requirements, but on a per-TM/per-operator basis. ------ Ken Krugler +1 530-210-637

Re: [DISCUSS] Support Local Aggregation in Flink

2019-06-03 Thread Ken Krugler
;>>>> >>>>> Local aggregation is a widely-adopted method to reduce the performance >>>>> degraded by data skew. We can decompose the aggregating operations into >>>> two >>>>> phases. In the first phase, we aggregate the el

Re: [DISCUSS] Apache Flink at Season of Docs

2019-04-11 Thread Ken Krugler
t 02:11, Aizhamal Nurmamat kyzy >>>> wrote: >>>> >>>>> Hello everyone, >>>>> >>>>> @Fabian Hueske - SoD setup is a little bit >>> different. >>>>> The ASF determined that each project would be allowed to apply >>>>> i

Re: [DISCUSS] Apache Flink at Season of Docs

2019-04-10 Thread Ken Krugler
, if you, as an organization decide to apply. > > I think it will be great if Flink participates in it too! > > Thanks, > Aizhamal > > [1] https://developers.google.com/season-of-docs/ > [2] https://developers.google.com/season-of-docs/docs/timeline -

Re: Ratelimiting in the Flink Kafka connector

2019-01-31 Thread Ken Krugler
, taking into account more/other factors. >>> >>> Both, adding the limiter and making the consumer code more adoptable >> could >>> be split into separate work also. >>> >>> BTW is there a JIRA for this? >>> >>> Thomas >

Re: [DISCUSS] Detection Flink Backpressure

2019-01-03 Thread Ken Krugler
uffers) instead. >>>> >>>> >>>>Best, >>>> Yun Gao >>>> >>>> >>>>[1] https://issues.apache.org/jira/browse/FLINK-10981 >>>>[2] https://issues.apache.org/jira/browse/FLINK-11082 >>>> >>>> >>>> -- >>>> From:裴立平 >>>> Send Time:2019 Jan. 3 (Thu.) 13:39 >>>> To:dev >>>> Subject:[DISCUSS] Detection Flink Backpressure >>>> >>>> Recently I want to optimize the way to find the positions where the >>>> backpressures occured . >>>> >>>> I read some blogs about flink-backpressure and have a rough idea of it . >>>> >>>> The method which Flink adopted is thread-stack-sample , it's heavy and >>>> no-lasting . >>>> >>>> The positions where backpressures occured are very important to the >>>> developers . >>>> >>>> They should be treated as monitor-metrics . >>>> >>>> Any other choice that we can take to detection the flink backpressures ? >>>> >>> >> >> -- Ken Krugler +1 530-210-6378 http://www.scaleunlimited.com Custom big data solutions & training Flink, Solr, Hadoop, Cascading & Cassandra

Re: Unbalanced processing of KeyedStream

2019-01-02 Thread Ken Krugler
n. But to me it feels a bit hack-ish. > I would like to know if this is only solution with Flink or do I miss > something? > Can there be more API-ish support for such use-case from Flink? Is there a > reason why there is none? Or is there? > > On Wed, Jan 2, 2019 at 5:29 PM Ke

Re: Unbalanced processing of KeyedStream

2019-01-02 Thread Ken Krugler
edStream()` more useable for certain > scenarios. > * any other option I have? > > Many thanks in advance. > > Best, > Jozef -- Ken Krugler +1 530-210-6378 http://www.scaleunlimited.com Custom big data solutions & training Flink, Solr, Hadoop, Cascading & Cassandra

Re: Enrich testing doc with more unit test examples using AbstractStreamOperator

2018-09-25 Thread Ken Krugler
gt; structure the content. > > I would really appreciate any feedback from you. Thanks in advance. > > Best Regards, > Tony Wei > > [1] > https://github.com/apache/flink/compare/master...tony810430:flink-testing-doc -- Ken Krugler +1 530-210-6378 http://www.scaleunlimited.com Custom big data solutions & training Flink, Solr, Hadoop, Cascading & Cassandra

Re: Facing issue in RichSinkFunction

2018-07-05 Thread Ken Krugler
ethod, you create those objects using settings from other (serializable) fields that aren’t transient. — Ken -- Ken Krugler +1 530-210-6378 http://www.scaleunlimited.com Custom big data solutions & training Flink, Solr, Hadoop, Cascading & Cassandra

Re: [WEBSITE] Proposal to rework the Flink website

2018-06-05 Thread Ken Krugler
Along these lines, it would help to add a sitemap (and the robots.txt required to reference it) for flink.apache.org and ci.apache.org (for /projects/flink) You can see what Tomcat did along these lines - http://tomcat.apache.org/robots.txt references http://tomcat.apache.org/sitemap.xml,

Re: Failure restarting Flink 1.5.0 job from checkpoint

2018-05-31 Thread Ken Krugler
t; Hi Ken, > > I think you might have independently discovered this issue: > https://issues.apache.org/jira/browse/FLINK-9458 > <https://issues.apache.org/jira/browse/FLINK-9458> > > Best, > Aljoscha > >> On 31. May 2018, at 01:46, Ken Krugler wrote: >>

Failure restarting Flink 1.5.0 job from checkpoint

2018-05-30 Thread Ken Krugler
Hi devs, I coded up a simple iteration that uses a KeyedProcessFunction, as a way of showing how to use timers to do state iteration. This worked fine, but then I wanted to try out checkpoints. I modified the KeyedProcessFunction to throw an exception after a fixed number of calls. When this

Re: Confusing error message in MemCheckpointStreamFactory with large state size?

2018-05-04 Thread Ken Krugler
/browse/FLINK-9300> — Ken > On Thu, May 3, 2018 at 7:56 PM, Ken Krugler <kkrugler_li...@transpac.com> > wrote: > >> Currently in the MemCheckpointStreamFactory.checkSize() method, it can >> throw an IOException with: >> >>throw n

[jira] [Created] (FLINK-9300) Improve error message when in-memory state is too large

2018-05-04 Thread Ken Krugler (JIRA)
Ken Krugler created FLINK-9300: -- Summary: Improve error message when in-memory state is too large Key: FLINK-9300 URL: https://issues.apache.org/jira/browse/FLINK-9300 Project: Flink Issue Type

[jira] [Created] (FLINK-9299) ProcessWindowFunction documentation Java examples have errors

2018-05-04 Thread Ken Krugler (JIRA)
Ken Krugler created FLINK-9299: -- Summary: ProcessWindowFunction documentation Java examples have errors Key: FLINK-9299 URL: https://issues.apache.org/jira/browse/FLINK-9299 Project: Flink

Confusing error message in MemCheckpointStreamFactory with large state size?

2018-05-03 Thread Ken Krugler
e> So shouldn’t it suggest using the RocksDB state backend instead? — Ken -- Ken Krugler +1 530-210-6378 http://www.scaleunlimited.com Custom big data solutions & training Flink, Solr, Hadoop, Cascading & Cassandra

CoProcessFunction doesn't support timer on keyed stream

2018-04-25 Thread Ken Krugler
Hi devs, I’m using Flink 1.5-SNAPSHOT, and I’ve got a connected stream that I’m using with a CoProcessFunction. One of the streams is keyed, and the other is broadcast. As per the documentation

Validating markdown doc files before creating a PR

2018-04-16 Thread Ken Krugler
Hi all, I’m curious how devs check that their .md file edits are correct. I've tried a few different plugins and command line utilities, but the mix of HTML and markdown has meant none of them render properly. Thanks, — Ken -- Ken Krugler http://www.scaleunlimited.com

Re: Documentation glitch w/AsyncFunction?

2018-04-14 Thread Ken Krugler
intention is clearer. > > bq. resultFuture.complete(Collections.singleton(new > Tuple2<>(str, result))); > > Looking at existing code in unit tests, the complete() call is on the > parameter. > > Cheers > > On Sat, Apr 14, 2018 at 8:34 AM, Ken Krugler

Documentation glitch w/AsyncFunction?

2018-04-14 Thread Ken Krugler
Hi devs, https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/stream/operators/asyncio.html Has this example of asyncInvoke: > @Override > public void asyncInvoke(final String str,

Re: Multi-stream question

2018-04-07 Thread Ken Krugler
stream3) and produces an output stream. Can the > checkpointing and other logic accomodate this if I write sufficient custom > code in the operator? > > Michael > >> On Apr 7, 2018, at 10:42 AM, Ken Krugler <kkrugler_li...@transpac.com> wrote: >> >>

Re: [DISCUSS] Inverted (child-first) class loading

2018-03-12 Thread Ken Krugler
> - forbid duplicate classes > - parent first conflict resolution > - child first conflict resolution > > Having number one as the default and let the error message suggest options > two and three as options would definitely make users aware of the issue... > &g

Re: [DISCUSS] Inverted (child-first) class loading

2018-03-09 Thread Ken Krugler
I can’t believe I’m suggesting this, but perhaps the Elasticsearch “Hammer of Thor” (aka “jar hell”) approach would be appropriate here. Basically they prevent a program from running if there are duplicate classes on the classpath. This causes headaches when you really need a different version

[jira] [Created] (FLINK-8849) Wrong link from concepts/runtime to doc on chaining

2018-03-04 Thread Ken Krugler (JIRA)
Ken Krugler created FLINK-8849: -- Summary: Wrong link from concepts/runtime to doc on chaining Key: FLINK-8849 URL: https://issues.apache.org/jira/browse/FLINK-8849 Project: Flink Issue Type

Re: Terminating streaming test

2018-02-06 Thread Ken Krugler
gt; But probably this is a rather common testing need that's already solved?! > > Thanks, > Thomas -- Ken Krugler http://www.scaleunlimited.com custom big data solutions & training Hadoop, Cascading, Cassandra & Solr

Re: Visualizing topologies

2017-03-01 Thread Ken Krugler
DOT is a common format for graph and flow >> visualizations. >> >> Let's see what others think. >> >> Best, Fabian >> >> >> 2017-02-24 0:15 GMT+01:00 Ken Krugler <kkrugler_li...@transpac.com>: >> >>> Hi Ufuk, >>&g

Re: Visualizing topologies

2017-02-23 Thread Ken Krugler
that’s reasonable, I’ll open an issue and attach the code. — Ken > On Wed, Feb 22, 2017 at 3:01 AM, Pattarawat Chormai > <pat.chor...@gmail.com> wrote: >> Hi Ken, >> >> Maybe you can look into this one : http://flink.apache.org/visualizer/. >> >> - Pa

Re: 1.2-SNAPSHOT documentation

2016-12-07 Thread Ken Krugler
l...@apache.org> wrote: >>> >>> Hi Ken, >>> >>> the docs on the website need to be built manually at the moment. So they >>> might be out of sync. >>> If you want the most recent documentation you can checkout the git >>> repository and

[jira] [Created] (FLINK-3880) Use ConcurrentHashMap for Accumulators

2016-05-06 Thread Ken Krugler (JIRA)
Ken Krugler created FLINK-3880: -- Summary: Use ConcurrentHashMap for Accumulators Key: FLINK-3880 URL: https://issues.apache.org/jira/browse/FLINK-3880 Project: Flink Issue Type: Improvement

[jira] [Created] (FLINK-3839) Support wildcards in classpath parameters

2016-04-27 Thread Ken Krugler (JIRA)
Ken Krugler created FLINK-3839: -- Summary: Support wildcards in classpath parameters Key: FLINK-3839 URL: https://issues.apache.org/jira/browse/FLINK-3839 Project: Flink Issue Type: Improvement

RE: cascading-flink 1.0 results

2016-03-31 Thread Ken Krugler
omparePlatformsTest issue. I'll exclude > that test case. > > Thanks, Fabian > > 2016-03-30 21:46 GMT+02:00 Ken Krugler <kkrugler_li...@transpac.com>: > >> Hi Fabian, >> >> I've been trying out the cascading-flink 1.0 branch (updated to >> cascadin

cascading-flink 1.0 results

2016-03-30 Thread Ken Krugler
in support later. > > Best, Fabian > > [1] https://github.com/dataartisans/cascading-flink/tree/flink-1.0 > > 2016-03-30 6:08 GMT+02:00 Ken Krugler <kkrugler_li...@transpac.com>: > >> Hi Fabian, >> >>> From: Fabian H

RE: Expected duration for cascading-flink tests?

2016-03-29 Thread Ken Krugler
tried updating to Flink 1.0.0 (from 0.10.0), but so far I've run into some compilation errors, e.g. in FlinkFlowStep.java it can't find the JavaPlan class. Thanks again for the help, -- Ken > " > Best, Fabian > > 2016-03-29 20:36 GMT+02:00 Ken Krugler <kkrugler_li...@t

RE: Expected duration for cascading-flink tests?

2016-03-29 Thread Ken Krugler
) … Caused by: java.net.BindException: Address already in use … 2. FlowStepJob.blockOnJob throws a cascading.flow.FlowException All caused by a 100 second timeout Is the above expected? Thanks, -- Ken > From: Ken Krugler > Sent: March 28, 2016 3:39:12pm PDT > To: dev@flink.a

Expected duration for cascading-flink tests?

2016-03-28 Thread Ken Krugler
call() method. Maybe this is a sign that it's time for a new Mac :) Thanks, -- Ken -- Ken Krugler +1 530-210-6378 http://www.scaleunlimited.com custom big data solutions & training Hadoop, Cascading, Cassandra & Solr

RE: Guarantees for object reuse modes and documentation

2016-03-03 Thread Ken Krugler
gt; Thanks, Ken! I was wondering how other systems handle these issues. > > Fortunately, the deep copy - shallow copy problem doesn't arise in > Flink: when we copy an object, it is always a deep copy (at least, I > hope so :)). > > Best, > Gábor > > > >

RE: Guarantees for object reuse modes and documentation

2016-02-19 Thread Ken Krugler
> > > Greg and Gabor, please correct me if I did not get your points right or > missed something. > > What do others think? > > > Fabian > > > > [1] > https://ci.apache.org/projects/flink/flink-docs-master/apis/batch/index.html#object-reuse-behavio

RE: User Feedback

2016-02-09 Thread Ken Krugler
ntation) impacts many open source projects. And everyone does in fact use Google :) Wouldn't adding a sitemap help here? Regards, -- Ken [snip] -- Ken Krugler +1 530-210-6378 http://www.scaleunlimited.com custom big data solutions & training Hadoop, Cascading, Cassandra & Solr