Re: Flink Kubernetes Operator 1.8.0 CRDs

2024-05-09 Thread Márton Balassi
Hi, What do you mean exactly by cannot be applied or replaced? What exactly is the issue? Are you installing fresh or trying to upgrade from a previous version? If the latter please follow this: https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-release-1.8/docs/operations/upgrade/

Re: K8s operator - Stop Job with savepoint on session cluster via Java API

2023-09-03 Thread Márton Balassi
Hi Krzysztof, Please set upgradeMode to savepoint and change state from Running to Suspended on your application. This makes it so that you trigger an upgrade (as at least the job state changes) and for the upgrade we explicitly trigger a savepoint as you choose that for the upgrade mode. Importa

Re: [ANNOUNCE] Apache Flink Kubernetes Operator 1.5.0 released

2023-05-17 Thread Márton Balassi
Thanks, awesome! :-) On Wed, May 17, 2023 at 2:24 PM Gyula Fóra wrote: > The Apache Flink community is very happy to announce the release of Apache > Flink Kubernetes Operator 1.5.0. > > The Flink Kubernetes Operator allows users to manage their Apache Flink > applications and their lifecycle th

Re: Flink Forward Session Question

2023-01-02 Thread Márton Balassi
Hi Rion, Unlike the previous Flink Forwards to the best of my knowledge the latest edition was not uploaded to YouTube. It might make sense to reach out to the authors directly. On Sat, Dec 31, 2022 at 5:35 PM Rion Williams wrote: > Hey Flinkers, > > Firstly, early Happy New Year’s to everyone

Re: Deploy Flink on YARN or Kubernetes.

2022-12-20 Thread Márton Balassi
Hi Ruibin, Given that you are starting fresh I would recommend going with Kubernetes and specifically checking out the Flink Kubernetes Operator. [1] I have worked with Yarn for years before I transitioned to Kubernetes a year ago and I am pleased that we made the jump. To address your point on a v

Re: K8S operator support status

2022-12-08 Thread Márton Balassi
Hi Ruibin, 1. Standalone is indeed supported since 1.2 ( https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/docs/custom-resource/overview/#cluster-deployment-modes), I will correct the Known issues, that is just an oversight that we left it there - thanks for reporting. 2. Reac

Re: Flink 1.15.3 Docker image

2022-11-29 Thread Márton Balassi
Done, please let me know if you see anything unexpected. On Tue, Nov 29, 2022 at 7:07 PM Márton Balassi wrote: > Hi Ben, > > Thanks for reaching out. Since the image repo has been updated [1] I can > pick this up. Will let you know when done. > > [1] > https://github.co

Re: Flink 1.15.3 Docker image

2022-11-29 Thread Márton Balassi
Hi Ben, Thanks for reaching out. Since the image repo has been updated [1] I can pick this up. Will let you know when done. [1] https://github.com/apache/flink-docker/commit/a22c0f04972a1d8539d9213b52fc0728eac8c1fa On Tue, Nov 29, 2022 at 4:28 PM Roberts, Ben (Senior Developer) via user < user@f

Re: [DISCUSS] FLIP-265 Deprecate and remove Scala API support

2022-10-13 Thread Márton Balassi
k but forgot to do so. I'll do some >>> more outreach via other channels as well. >>> >>> @Users of Flink, I've made a proposal to deprecate and remove Scala API >>> support in a future version of Flink. Your feedback on this topic is very >>>

Re: Unable to start job using Flink Operator

2022-07-28 Thread Márton Balassi
Hi Morgan Karl, Could you please post the logs of your operator pod? A possible explanation is that your operator might be stuck in an unhealthy state, hence the webhook can not be accessed. Best, Marton On Thu, Jul 28, 2022 at 5:37 PM Geldenhuys, Morgan Karl < morgan.geldenh...@tu-berlin.de> wr

Re: Reply:DelegationTokenManager

2022-06-20 Thread Márton Balassi
Hi, For your information G (ccd) is actively working on this topic. [1] He is in the best position to answer your questions as far as I know. :-) [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-211%3A+Kerberos+delegation+token+framework On Tue, Jun 21, 2022 at 8:38 AM vtygoss wrote:

Re: Add me to slack

2022-06-07 Thread Márton Balassi
Seems you have been added or found the link since then. :-) https://flink.apache.org/community.html#slack On Tue, Jun 7, 2022 at 12:28 AM Sahil Aulakh wrote: > Hi > > Please add me to the slack group. > > Thanks > Sahil Aulakh >

Re: Resizing kube container sizes dynamically for custom jobs

2022-05-11 Thread Márton Balassi
Hi Morgan, Jobs running in a session cluster share the taskmanagers, so you are not able to configure them on a per job basis. I welcome you to check out the Flink Kubernetes Operator's session job example [1] that highlights this behavior: You specify container resources when you submit the sessi

Re: flink operator sometimes cannot start jobmanager after upgrading

2022-05-02 Thread Márton Balassi
Hi ChangZhuo, Thanks for reporting this, I think I have just run into this myself too. Will try to reproduce it, but I do not fully comprehend it yet. If anyone has a way to reproduce it is more than welcome. :-) On Fri, Apr 29, 2022 at 12:16 PM ChangZhuo Chen (陳昌倬) wrote: > Hi, > > We found th

Re: Flink native k8s integration vs. operator

2022-01-13 Thread Márton Balassi
Hi All, I am pleased to see the level of enthusiasm and technical consideration already emerging in this thread. I wholeheartedly support building an operator and endorsing it via placing it under the Apache Flink umbrella (as a separate repository) as the current lack of it is clearly becoming an

Re: [DISCUSS] Upgrade HBase connector to 2.2.x

2020-08-10 Thread Márton Balassi
>> connector and other connectors. >> >> Best, >> Jark >> >> >> On Fri, 7 Aug 2020 at 17:54, Robert Metzger wrote: >> >>> Hi, >>> >>> Thank you for picking this up so quickly. I have no objections regarding >>> all the pro

Re: [DISCUSS] Upgrade HBase connector to 2.2.x

2020-08-07 Thread Márton Balassi
Hi Robert and Gyula, Thanks for reviving this thread. We have the implementation (currently for 2.2.3) and it is straightforward to contribute it back. Miklos (ccd) has recently written a readme for said version, he would be interested in contributing the upgraded connector back. The latest HBase

Re: [DISCUSS] FLIP-131: Consolidate the user-facing Dataflow SDKs/APIs (and deprecate the DataSet API)

2020-07-30 Thread Márton Balassi
Hi All, Thanks for the write up and starting the discussion. I am in favor of unifying the APIs the way described in the FLIP and deprecating the DataSet API. I am looking forward to the detailed discussion of the changes necessary. Best, Marton On Wed, Jul 29, 2020 at 12:46 PM Aljoscha Krettek

Re: Building with Hadoop 3

2019-12-04 Thread Márton Balassi
Wearing my Cloudera hat I can tell you that we have done this exercise for our distros of the 3.0 and 3.1 Hadoop versions. We have not contributed these back just yet, but we are open to do so. If the community is interested we can contribute those changes back to flink-shaded and suggest the nece

Re: [DISCUSS] Dropping Scala 2.10

2017-09-19 Thread Márton Balassi
Hi Aljoscha, I am in favor of the change. No concerns on my side, just one remark that I have talked to Sean last week (ccd) and he mentioned that he has faced some technical issues while driving the transition from 2.10 to 2.12 for Spark. It had to do with changes in the scope of implicits. You m

Re: FlinkML and DataStream API

2016-12-21 Thread Márton Balassi
Thanks for mentioning it, Theo. Here it is: https://github.com/streamline-eu/ML-Pipelines/tree/stream-ml Look at these examples: https://github.com/streamline-eu/ML-Pipelines/commit/314e3d940f1f1ac7b762ba96067e13d806476f57 On Wed, Dec 21, 2016 at 9:38 PM, wrote: > I'm interested in that code y

Re: [DISCUSS] "Who's hiring on Flink" monthly thread in the mailing lists?

2016-12-06 Thread Márton Balassi
+1. It keeps it both organized and to a reasonable minimum overhead. Would you volunteer for starting the mail thread each month then, Kostas? Best, Marton On Tue, Dec 6, 2016 at 6:42 AM, Kostas Tzoumas wrote: > Hi folks, > > I'd like to see how the community feels about a monthly "Who is hir

Re: how can I name a sink?

2016-10-17 Thread Márton Balassi
No problem, it is great that you have found the solution. On Mon, Oct 17, 2016 at 12:16 PM, 侯林蔚 wrote: > More information. > my code is like this: > [image: inline image 1] > > and I find I can name a sink by changing the code like this: > > [image: inline image 2] > > sorry for my reckless behavior. > > 2016-10-

Re: Regarding

2016-10-08 Thread Márton Balassi
wrote: > Hi, > > Yes , I am getting those errors on Netbeans even for a basic WordCount > implementation.Also, yes, I am on windows. > > Thanks, > Rashmi > > On Sat, Oct 8, 2016 at 11:36 PM, Márton Balassi > wrote: > >> Hey, >> >> So you get t

Re: Regarding

2016-10-08 Thread Márton Balassi
Hi Rashmi, The issue is that although you have a JobManager running (the master component of a Flink cluster scheduling the jobs) there are no TaskManagers running (the components doing the actual work). Hence you got the log line "Resources available to scheduler: Number of instances=0, total num

Re: Flink JDBCOutputFormat logs wrong WARN message

2016-09-22 Thread Márton Balassi
Done. Go ahead, Swapnil. Best, Marton On Thu, Sep 22, 2016 at 1:03 PM, Swapnil Chougule wrote: > Hi Fabian/ Chesnay > Can anybody give me permission to assign JIRA (created for same.)? > > Thanks, > Swapnil > > On Tue, Sep 20, 2016 at 6:18 PM, Swapnil Chougule > wrote: > >> Thanks Chesnay & Fa

Re: Streaming issue help needed

2016-09-14 Thread Márton Balassi
Dear Vaidya, This seems weird, my guess is that somehow the Time and AbstractTime implementations are not from the same Flink version. According to your Maven build you should be using Flink 0.10.2. Since then there have been changes to windowing, are you tied to that version or would it be feas

Re: error for building flink-runtime from source

2016-07-12 Thread Márton Balassi
Hi Radu, Which version of Flink are you building? Looking at the current master builds they are coming in green recently [1]. If you are solely building flink-runtime the issue might be that you are using different versions of flink-core (a dependency of flink-runtime) and flink-runtime. Could yo

Re: Streaming Exception error message Explanation

2016-07-07 Thread Márton Balassi
Hi Subash, Unfortunately you can not reference a DataStream (loop) within a Flink operator. To handle both casual and feedback data I suggest using CoOperators. Have a look at this mockup I did some time ago for a conceptually similar problem. [1] [1] https://github.com/streamline-eu/ML-Pipelines

Re: Flink and Calcite

2016-07-06 Thread Márton Balassi
Hey Radu, It is in master, you find the related module under flink-libraries/flink-table in the directory structure. [1] [1] https://github.com/apache/flink/blob/master/flink-libraries/flink-table/pom.xml#L77-L81 Best, Marton On Wed, Jul 6, 2016 at 3:49 PM, Radu Tudoran wrote: > Hi, > > > >

Re: Flink on YARN - how to resize a running cluster?

2016-06-30 Thread Márton Balassi
Hi Josh, Yes, currently that is a reasonable workaround. Best, Marton On Thu, Jun 30, 2016 at 12:38 PM, Josh wrote: > Hi Till, > > Thanks, that's very helpful! > So I guess in that case, since it isn't possible to increase the job > parallelism later, it might be sensible to use say 10x the p

Re: Why would slot sharing be undesirable?

2016-06-04 Thread Márton Balassi
Hi Leon, Basically you are trading away utilizing all of your resources in the cluster to cut network IO. With putting more and more tasks into the same slot you are pushing a job from being potentially network bound to usually CPU bound, as the parallel instances of tasks in the same slot sharing

Re: Flink Version 1.1

2016-05-18 Thread Márton Balassi
Hey Simon, The general policy is that the community aims to release a major version every 3 months. That would mean the next release coming out in early to mid June. I am not aware of the 1.1.0 schedule yet, but it is about time to start the discussion on that. Are you looking for a specific feat

Re: Using FlinkML algorithms in Streaming

2016-05-11 Thread Márton Balassi
[image: WeboGraffiti] > http://webograffiti.com > > > On Wednesday, 11 May 2016 7:19 PM, Márton Balassi < > balassi.mar...@gmail.com> wrote: > > > Hey Piyush, > > Would you like to train or predict on the streaming data? > > Best, > > Marton > >

Re: Using FlinkML algorithms in Streaming

2016-05-11 Thread Márton Balassi
Hey Piyush, Would you like to train or predict on the streaming data? Best, Marton On Wed, May 11, 2016 at 3:44 PM, Piyush Shrivastava wrote: > Hello all, > > I want to perform linear regression using FlinkML's > MultipleLinearRegression() function on streaming data. > > This function takes a

Re: Thanks everyone

2016-04-22 Thread Márton Balassi
Hi Prez, Thanks for sharing, the community is always glad to welcome new Flink users. Best, Marton On Sat, Apr 23, 2016 at 6:01 AM, Prez Cannady wrote: > We’ve completed our first full sweep on a five node Flink cluster and it > went beautifully. On behalf of my team, thought I’d say thanks

Re: throttled stream

2016-04-16 Thread Márton Balassi
There is a utility in flink-streaming-examples that might be useful, but is generally the same idea that Niels suggests. [1] [1] https://github.com/apache/flink/blob/master/flink-examples/flink-examples-streaming/src/main/java/org/apache/flink/streaming/examples/utils/ThrottledIterator.java On Su
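As a rough illustration of the throttling idea mentioned in this reply, here is a minimal plain-Java sketch of an iterator wrapper that caps how fast elements are handed out. It is not the ThrottledIterator from flink-streaming-examples; the class name ThrottleSketch and the elementsPerSecond parameter are hypothetical and chosen only for this example.

```java
import java.util.Iterator;

// Hypothetical sketch, NOT Flink's ThrottledIterator: wraps any iterator and
// enforces a minimum time gap between two consecutive next() calls.
final class ThrottleSketch<T> implements Iterator<T> {
    private final Iterator<T> source;
    private final long minGapNanos;
    private long lastEmitNanos;
    private boolean emitted;

    ThrottleSketch(Iterator<T> source, long elementsPerSecond) {
        if (elementsPerSecond <= 0) {
            throw new IllegalArgumentException("elementsPerSecond must be positive");
        }
        this.source = source;
        this.minGapNanos = 1_000_000_000L / elementsPerSecond;
    }

    @Override
    public boolean hasNext() {
        return source.hasNext();
    }

    @Override
    public T next() {
        if (emitted) {
            // Sleep for whatever is left of the gap since the previous element.
            long wait = minGapNanos - (System.nanoTime() - lastEmitNanos);
            if (wait > 0) {
                try {
                    Thread.sleep(wait / 1_000_000L, (int) (wait % 1_000_000L));
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }
        T value = source.next();
        lastEmitNanos = System.nanoTime();
        emitted = true;
        return value;
    }
}
```

A source that iterates over such a wrapper emits at most roughly elementsPerSecond records per second; the first element is never delayed.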

Re: SourceFunction Scala

2016-03-06 Thread Márton Balassi
Hey Ankur, Add the following line to your imports, and have a look at the referenced FAQ. [1] import org.apache.flink.streaming.api.scala._ [1] https://flink.apache.org/faq.html#in-scala-api-i-get-an-error-about-implicit-values-and-evidence-parameters Best, Marton On Sun, Mar 6, 2016 at 8:17

Re: Kafka issue

2016-03-03 Thread Márton Balassi
I have inspected mvn dependency:tree in the meantime, the maven build fortunately looks healthy; it seems my IntelliJ is very keen on the freshly acquired dependencies it has gathered recently for scala 2.11. On Thu, Mar 3, 2016 at 1:04 PM, Márton Balassi wrote: > Hey guys, >

Re: Kafka issue

2016-03-03 Thread Márton Balassi
Hey guys, I have run into the same issue when developing against the master. Now after Max's commit supposedly fixing the issue reimporting the project gives me all the dependencies for 2.10, except for scala-compiler and scala-reflect, which come in version 2.11. It seems very weird. Do you have

Re: Mapping two datasets

2016-02-25 Thread Márton Balassi
ent a custom method to load required data for a split > within a map operation, which will be less expensive than a join for my > case. > > Thank you, > Saliya > > On Thu, Feb 25, 2016 at 11:45 AM, Márton Balassi > wrote: > >> Hey Saliya, >> >> I would ad

Re: Mapping two datasets

2016-02-25 Thread Márton Balassi
Hey Saliya, I would add a unique ID to both the DataSets, the variable you referred to as 'i'. Then you can join the two DataSets on the field containing 'i' and do the mapping on the joined result. Hope this helps, Marton On Thu, Feb 25, 2016 at 5:38 PM, Saliya Ekanayake wrote: > Hi, > > I've
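The approach in this reply (attach an ID to both collections, join on it, then map the joined pairs) can be sketched in plain Java as follows. This deliberately avoids the Flink DataSet API; IndexedJoinSketch, withIds and joinAndMap are hypothetical names, and using the list position as the ID is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical plain-Java sketch of "add an ID, join on it, map the result".
final class IndexedJoinSketch {
    // Pairs each element with its position, which plays the role of the ID 'i'.
    static <T> Map<Long, T> withIds(List<T> data) {
        Map<Long, T> indexed = new HashMap<>();
        for (int i = 0; i < data.size(); i++) {
            indexed.put((long) i, data.get(i));
        }
        return indexed;
    }

    // Joins both indexed collections on the shared ID and maps each matched pair.
    static List<String> joinAndMap(Map<Long, String> left, Map<Long, Integer> right) {
        List<String> out = new ArrayList<>();
        for (long id = 0; id < left.size(); id++) {
            Integer r = right.get(id);
            if (r != null) {
                out.add(left.get(id) + ":" + r);
            }
        }
        return out;
    }
}
```

In a distributed setting the join replaces the in-memory lookup, but the shape of the computation stays the same: key both sides by 'i', match, then apply the mapping to each pair.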

Re: [VOTE] Release Apache Flink 1.0.0 (RC1)

2016-02-25 Thread Márton Balassi
Thanks for creating the candidate Robert and for the heads-up, Slim. I would like to get a PR [1] in before 1.0.0 as it breaks hashing behavior of DataStream.keyBy. The PR has the feature implemented and the java tests adopted, there is still a bit of outstanding fix for the scala tests. Gábor Hor

Re: Use jvm to run flink on single-node machine with many cores

2016-02-21 Thread Márton Balassi
Dear Ana, If you are using a single machine with multiple cores, but need convenient access to the configuration I would personally recommend using the local cluster option in the flink distribution. [1] If you want to avoid having a flink distro on the machine, then Robert's solution is the way t

Re: Flink writeAsCsv

2016-02-04 Thread Márton Balassi
Hey Radu, As you are using the streaming api I assume that you call env.execute() in both cases. Is that the case? Do you see any errors appearing? My first call would be if your data type is not a tuple type then writeAsCsv does not work by default. Best, Marton On Thu, Feb 4, 2016 at 11:36 A

Re: DataStreamUtils and Scala

2016-02-02 Thread Márton Balassi
Opened the PR. [1] Will merge the re-add "getJavaStream()" method commit as soon as travis passes if no objections, the second approach can be discussed on github. [1] https://github.com/apache/flink/pull/1574 Best, Marton On Mon, Feb 1, 2016 at 10:56 PM, Márton Balassi wrote: >

Re: DataStreamUtils and Scala

2016-02-01 Thread Márton Balassi
getJavaStream()" method. There are probably >> other cases when people need it, and it does not hurt to expose it. >> >> Also, it allows you to combine programs written partly against the Java >> and Scala API (if you would ever want to do that). >> >> Stephan >

Re: DataStreamUtils and Scala

2016-02-01 Thread Márton Balassi
Hey Cory, Sorry, I did not mean to break your code. One solution that I could suggest is to do it the way we have it for the batch api, namely having a scala version for DataStreamUtils too. It might be placed under flink-contrib for the time being. Would that solution fit your needs? On Mon, Fe

Re: Flink Execution Plan

2016-01-14 Thread Márton Balassi
Hey Alex, Flink has 3 abstractions having a Graph suffix in place currently for streaming jobs: * StreamGraph: Used for representing the logical plan of a streaming job that is under construction in the API. This one is the only streaming specific in this list. * JobGraph: Used for representi

Re: Understanding Kmeans in Flink

2015-12-28 Thread Márton Balassi
Hey Hajira, Basically lines 2) to 5) determine the "mean" (centroid) of the new clusters that we have just defined by assigning the points in line 1). As calculating the mean is a non-associative function we break it down to two associative parts: summation and counting - which is followed by divi
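The decomposition described here (mean = associative summation and counting, followed by a division) can be illustrated with a small plain-Java sketch. It assumes two-dimensional points; CentroidSketch and SumCount are hypothetical names and not taken from the Flink KMeans example.

```java
// Hypothetical sketch: the mean itself cannot be merged pairwise, but
// (coordinate sums, count) can, and the division happens once at the end.
final class CentroidSketch {
    static final class SumCount {
        final double sumX;
        final double sumY;
        final long count;

        SumCount(double sumX, double sumY, long count) {
            this.sumX = sumX;
            this.sumY = sumY;
            this.count = count;
        }

        // Associative merge: partial results can be combined in any grouping.
        SumCount merge(SumCount other) {
            return new SumCount(sumX + other.sumX, sumY + other.sumY, count + other.count);
        }

        // Final division turning the accumulated sums into the centroid.
        double[] mean() {
            return new double[] {sumX / count, sumY / count};
        }
    }

    // A single point contributes its coordinates and a count of one.
    static SumCount of(double x, double y) {
        return new SumCount(x, y, 1);
    }
}
```

Because merge is associative, partial sums can be pre-aggregated on each parallel instance before being combined, which is what makes the two-step formulation attractive.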

Re: Flink application with HBase

2015-12-22 Thread Márton Balassi
Hi Thomas, You can use both of the suggested solutions. The benefit that you might get from HBaseOutputformat is that it is already tested and integrated with Flink as opposed to you having to connect to HBase in a general SinkFunction. Best, Marton On Dec 22, 2015 1:04 PM, "Thomas Lamirault" wro

Re: Problems with using ZipWithIndex

2015-12-12 Thread Márton Balassi
Hey Filip, As you are using the scala API it is easier to use the Scala DataSet utils, which are accessible after the following import: import org.apache.flink.api.scala.utils._ Then you can do the following: val indexed = data.zipWithIndex Best, Marton On Sat, Dec 12, 2015 at 7:48 PM, Fili

Re: Error when using scala api .fromElements() / .fromCollection()

2015-12-07 Thread Márton Balassi
Hey Alex, Try adding the following import: import org.apache.flink.streaming.api.scala._ This adds all the implicit utilities that Flink needs to determine type info. Best, Marton On Mon, Dec 7, 2015 at 10:24 AM, lofifnc wrote: > Hi, > > I'm getting an error when using .fromElements() of the

Re: Question about flink message processing guarantee

2015-12-01 Thread Márton Balassi
Dear Jerry, If you do not enable checkpointing you get the at most once processing guarantee (some might call that no guarantee at all). When you enable checkpointing you can choose between exactly and at least once semantics. The latter provides better latency. Best, Marton On Tue, Dec 1, 2015

Re: key

2015-11-30 Thread Márton Balassi
Hey Radu, To add to Till's comment: do you need the whole Event type to be the key or would you like to group the records based on the value of one of your attributes (the 2 longs, int or string as mentioned)? If the latter is true Flink comes with utilities to use standard types as keys. In the

Re: Does 'DataStream.writeAsCsv' suppose to work like this?

2015-10-26 Thread Márton Balassi
y written to HDFS. > > Best regards, > Max > > On Mon, Oct 26, 2015 at 8:36 AM, Márton Balassi > wrote: > >> The problem persists in the current master, simply a format.flush() is >> needed here [1]. I'll do a quick hotfix, thanks for the report again! >> >

Re: Does 'DataStream.writeAsCsv' suppose to work like this?

2015-10-26 Thread Márton Balassi
a#L99 On Mon, Oct 26, 2015 at 8:23 AM, Márton Balassi wrote: > Hey Rex, > > Writing half-baked records is definitely unwanted, thanks for spotting > this. Most likely it can be solved by adding a flush at the end of every > invoke call, let me check. > > Best, > > Marton

Re: Does 'DataStream.writeAsCsv' suppose to work like this?

2015-10-26 Thread Márton Balassi
Hey Rex, Writing half-baked records is definitely unwanted, thanks for spotting this. Most likely it can be solved by adding a flush at the end of every invoke call, let me check. Best, Marton On Mon, Oct 26, 2015 at 7:56 AM, Rex Ge wrote: > Hi, flinkers! > > I'm new to this whole thing, > an

Re: Flink+avro integration

2015-10-19 Thread Márton Balassi
Hi Andrew, 1a, In general Flink can read and write Avro data through the AvroInputFormat and AvroOutputFormat in both the batch and the streaming API. In general you can write the following: StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); DataStream dataSt

Re: Powered by Flink

2015-10-19 Thread Márton Balassi
Thanks for starting and big +1 for making it more prominent. On Mon, Oct 19, 2015 at 2:53 PM, Fabian Hueske wrote: > Thanks for starting this Kostas. > > I think the list is quite hidden in the wiki. Should we link from > flink.apache.org to that page? > > Cheers, Fabian > > 2015-10-19 14:50 GMT

Re: setSlotSharing NPE: Starting a stream consumer in a thread

2015-10-03 Thread Márton Balassi
Suneel, Flink comes with a built-in AvroOutputFormat. Is that good enough for you? On Sun, Oct 4, 2015 at 6:01 AM, Suneel Marthi wrote: > While on that Marton, would it make sense to have a > dataStream.writeAsJson() method? > > On Sat, Oct 3, 2015 at 11:54 PM, Márton Balassi >

Re: setSlotSharing NPE: Starting a stream consumer in a thread

2015-10-03 Thread Márton Balassi
Hi Jay, As for the NPE: the file monitoring function throws it when the location is empty. Try running the datagenerator first! :) This behaviour is unwanted though, I am adding a JIRA ticket for it. Best, Marton On Sun, Oct 4, 2015 at 5:28 AM, Márton Balassi wrote: > Hi Jay, > > C

Re: setSlotSharing NPE: Starting a stream consumer in a thread

2015-10-03 Thread Márton Balassi
Hi Jay, Creating a batch and a streaming environment in a single Java source file is fine, they just run separately. (If you run it from an IDE locally they might conflict as the second one would try to launch a local executor on a port that is most likely already taken by the first one.) I would

Re: Kinesis Connector

2015-09-17 Thread Márton Balassi
Hi Giancarlo, I have no knowledge of someone working on such a project. However it would be a valuable contribution, if you were to start the effort please keep us notified, I would also suggest to file a JIRA ticket for it. Best, Marton On Thu, Sep 17, 2015 at 12:54 PM, Giancarlo Pagano wrote

Re: Using Flink with Redis question

2015-09-04 Thread Márton Balassi
Hey Jerry, Jay is on the right track, this issue has to do with the Flink operator life cycle. When you hit execute all your user defined classes get serialized, so that they can be shipped to the workers on the cluster. To execute some code before your FlatMapFunction starts processing the data y
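The lifecycle point made here (user classes are serialized and shipped before they run) can be demonstrated with a plain-Java sketch. None of the classes below are Flink classes; the open() hook is a hypothetical stand-in modeled on the idea of an initialization method called on the worker, and the truncated message above may recommend a different mechanism.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical sketch: state that cannot be serialized has to be created
// after the function object arrives on the worker, not in the constructor.
final class LifecycleSketch {
    // Stand-in for a non-serializable resource such as a client connection.
    static final class FakeConnection {
        int sent;
    }

    static final class PrefixFunction implements Serializable {
        private static final long serialVersionUID = 1L;
        private final String prefix;
        // transient: skipped during serialization, so it is null after shipping.
        private transient FakeConnection connection;

        PrefixFunction(String prefix) {
            this.prefix = prefix;
        }

        // Hypothetical initialization hook, to be called before any record is processed.
        void open() {
            connection = new FakeConnection();
        }

        boolean isOpen() {
            return connection != null;
        }

        String apply(String value) {
            connection.sent++;
            return prefix + value;
        }
    }

    // Simulates shipping the function to a worker: serialize, then deserialize.
    static PrefixFunction ship(PrefixFunction function) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(function);
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (PrefixFunction) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Anything set up on the client side in a transient field is gone after ship(), which is why connection setup belongs in the initialization hook that runs on the worker.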

Re: Question on flink and hdfs

2015-09-03 Thread Márton Balassi
Hi Jerry, Yes, you can. Best, Marton On Thu, Sep 3, 2015 at 8:57 PM, Jerry Peng wrote: > Hello, > > Does flink require hdfs to run? I know you can use hdfs to checkpoint and > process files in a distributed fashion. So can flink run standalone > without hdfs? >

Re: nosuchmethoderror

2015-09-02 Thread Márton Balassi
Dear Ferenc, The Kafka consumer implementation was modified from 0.9.0 to 0.9.1, please use the new code. [1] I suspect that your com.nventdata.kafkaflink.sink.FlinkKafkaTopicWriterSink depends on the way the Flink code used to look in 0.9.0, if you take a closer look Robert changed the function

Re: About exactly once question?

2015-08-27 Thread Márton Balassi
Dear Zhangrucong, From your explanation it seems that you have a good general understanding of Flink's checkpointing algorithm. Your concern is valid, by default a sink C emits tuples to the "outside world" potentially multiple times. A neat trick to solve this issue for your user defined si

Re: Custom Class for state checkpointing

2015-08-18 Thread Márton Balassi
Hey Rico, Currently the Checkpointed interface has the limitation that the return type of the snapshotState method (the generic parameter of Checkpointed) has to be java Serializable. I suspect that is the problem here. This is a limitation that we plan to lift soon. Marton On Tue, Aug 18, 2015 at 11:

Re: Forward Partitioning & same Parallelism: 1:1 communication?

2015-08-12 Thread Márton Balassi
nnections underneath or > is there some custom logic for streaming jobs? > > What happens if operator B has 2 times the parallelism of operator A? For > example if there were parallel tasks A1 and A2 and B1-B4: would A1 send to > B1 *and* B2 or just B1? > > – Ufuk > > On 12 Aug 2

Re: Forward Partitioning & same Parallelism: 1:1 communication?

2015-08-12 Thread Márton Balassi
Dear Nica, Yes, forward partitioning means that if subsequent operators share parallelism then the output of an upstream operator is sent to exactly one downstream operator. This makes sense for operators working on individual records, e.g. a typical map-filter pair, because as a consequence Flink

Re: Flink Streaming and flink-staging

2015-08-09 Thread Márton Balassi
Dear Jay, 1 - a) To use Flink Streaming all you need to do is to have it as a dependency as described in the documentation. [1] This would build it against the last snapshot that we have deployed which is updated on every successful continuous integration build (typically multiple times a day). If

Re: Best way to write data to HDFS by Flink

2015-06-28 Thread Márton Balassi
ions, it would be amazing if you could contribute this code to >> Flink. >> >> It seems like a very common use case, so this functionality will be >> useful to other user as well! >> >> Greetings, >> Stephan >> >> >> On Tue, Jun 23, 2

Re: Best way to write data to HDFS by Flink

2015-06-23 Thread Márton Balassi
imilar partition API or configuration > for this. > Thanks. > > > > Best regards > Hawin > > On Wed, Jun 10, 2015 at 10:31 AM, Hawin Jiang > wrote: > >> Thanks Marton >> I will use this code to implement my testing. >> >> >> >> Best

Re: Kafka0.8.2.1 + Flink0.9.0 issue

2015-06-23 Thread Márton Balassi
saw L286 to L296 are not correct information in pom.xml. >> Thanks. >> >> >> >> org.apache.maven.plugins > >maven-assembly-plugin 2.4 > > jar-with-dependencies >> >> >> On Thu, Jun 11, 2015 at 1:43 AM, Márton Balassi > > wrote:

Re: Reading separate files in parallel tasks as input

2015-06-14 Thread Márton Balassi
Hi Dani, The batch API does not expose an addSource-like method, but you can always write your own inputformat and pass that directly to the constructor of the DataSource. DataSource extends DataSet, so you will get all the usual methods in the end. For an example you can have a look e.g. here. [1] [

Re: Kafka0.8.2.1 + Flink0.9.0 issue

2015-06-11 Thread Márton Balassi
r ,and Flink slave2 on Datanode2 > Kafka, Zookeeper and Flink slave3 on Datanode3 > > > > On Thu, Jun 11, 2015 at 1:16 AM, Márton Balassi > wrote: > >> Dear Hawin, >> >> No problem, I am glad that you are giving our Kafka connector a try. :) >> The dependenc

Re: Kafka0.8.2.1 + Flink0.9.0 issue

2015-06-11 Thread Márton Balassi
; > > > org.apache.hadoop > > *hadoop*-*auth* > > 2.6.0 > > > > > > org.apache.hadoop > > *hadoop*-common > > 2.6.0 > > > > > > org.apache.hadoop > > *hadoop*-core > >

Re: Kafka0.8.2.1 + Flink0.9.0 issue

2015-06-11 Thread Márton Balassi
Dear Hawin, This looks like a dependency issue, the java compiler does not find the kafka dependency. How are you trying to run this example? Is it from an IDE or submitting it to a flink cluster with bin/flink run? How do you define your dependencies, do you use maven or sbt for instance? Best,

Re: Best way to write data to HDFS by Flink

2015-06-10 Thread Márton Balassi
Dear Hawin, You can pass a hdfs path to DataStream's and DataSet's writeAsText and writeAsCsv methods. I assume that you are running a Streaming topology, because your source is Kafka, so it would look like the following: StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnv

Re: Building master branch is failed

2015-05-29 Thread Márton Balassi
Thanks, Max. On Fri, May 29, 2015 at 3:04 PM, Maximilian Michels wrote: > Fixed it on the master. > > Problem were some classes belonging to package "org.apache.flink.api.java" > were in the folder "src/main/java/*org.apache.flink*/api/java/" instead > of "src/main/java/org/apache/flink/api/java

Re: Building master branch is failed

2015-05-29 Thread Márton Balassi
mvn clean install -DskipTests -Dmaven.javadoc.skip=true runs for me. The scala shell seems to have some javadoc issue, the test throws an Address already in use for me - but that might be just my problem. On Fri, May 29, 2015 at 12:33 PM, Aljoscha Krettek wrote: > I think it's caused by the Scal

Re: Write Stream to HBase

2015-05-20 Thread Márton Balassi
I'm merging the pull request, it was blocked by the streaming operator rework so it is free to go since yesterday. I do agree that it needs some additional love before it can be on the master, but I am positive that it should be there this week. On May 20, 2015 11:16 AM, "Robert Metzger" wrote:

Re: Sorting in a WindowedDataStream

2015-04-14 Thread Márton Balassi
Dear Niklas, To do that you can use WindowedDataStream.mapWindow(). This gives you an iterator to all the records in the window and you can do whatever you wish with them. One thing to note is that sorting windows of the stream might add considerable latency to your job. Best, Marton On Tue, Apr 14
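What a window function that sorts its input has to do can be sketched in plain Java as below. This is not the mapWindow() signature; WindowSortSketch and sortWindow are hypothetical names, and the Iterable parameter only stands in for the iterator over the window's records.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of the body of a window function that sorts its records.
final class WindowSortSketch {
    static <T extends Comparable<T>> List<T> sortWindow(Iterable<T> window) {
        // The whole window must be buffered before anything can be emitted;
        // this buffering is where the extra latency comes from.
        List<T> buffer = new ArrayList<>();
        for (T element : window) {
            buffer.add(element);
        }
        Collections.sort(buffer);
        return buffer;
    }
}
```

No element can leave the function until the last record of the window has been seen, so the added delay grows with the window size.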

Re: Flink meetup group in Stockholm

2015-04-08 Thread Márton Balassi
Nice, big +1 for the project. On Wed, Apr 8, 2015 at 4:39 PM, Gyula Fóra wrote: > Hey Everyone! > > We are proud to announce the first Apache Flink meetup group in Stockholm. > > Join us at http://www.meetup.com/Apache-Flink-Stockholm/ > > We are looking forward to organise our first event in Ma

Re: GSoC project proposal: Query optimisation layer for Flink Streaming

2015-03-24 Thread Márton Balassi
Thanks for the proposal Wepngong and for the ping Robert. Sorry for my late reply. I like the general concept, and I do think that this topic is really "Flink-ish" in the sense of focusing on the optimization. Let me add some comments: Synopsis: * By reducing the system overhead the throughpu

Re: Flink execution time benchmark

2015-03-23 Thread Márton Balassi
Hi Giacomo, You are currently using the Flink Streaming API. Is that your intention or would you like to measure batch execution? Regarding your code: StreamExecutionEnvironment.readTextStream(filePath) monitors a file/directory and streams the updates to that location [1] - potentially indefinit

Fwd: Flink questions

2015-03-12 Thread Márton Balassi
Dear Emmanuel, I'm Marton, one of the Flink Streaming developers - Robert forwarded your issue to me. Thanks for trying out our project. 1) Debugging: TaskManager logs are currently not forwarded to the UI, but you can find them on the taskmanager machines in the log folder of your Flink distribu

Re: HDFS Clustering

2015-02-24 Thread Márton Balassi
Hey, Just add the right prefix pointing to your hdfs filepath: bin/flink run -v flink-java-examples-*-WordCount.jar hdfs://hostname:port/PATH/TO/INPUT hdfs://hostname:port/PATH/TO/OUTPUT Best, Marton On Tue, Feb 24, 2015 at 11:13 AM, Giacomo Licari wrote: > Hi guys, > I'm Giacomo from It

Re: Community vote for Hadoop Summit result

2015-01-30 Thread Márton Balassi
ongrats guys! >> >> On Thu, Jan 29, 2015 at 4:06 PM, Márton Balassi >> wrote: >> > Hi everyone, >> > >> > Thanks for your support for the Flink talks at the community choice for >> the >> > next Hadoop Summit Europe. The results are out

Community vote for Hadoop Summit result

2015-01-29 Thread Márton Balassi
Hi everyone, Thanks for your support for the Flink talks at the community choice for the next Hadoop Summit Europe. The results are out. [1] Our submission "Real-Time Stream Processing with Apache Flink" [2] has been selected as the winner in the Future Hadoop track by Community Choice voting tha