Re: Handling large state (incremental snapshot?)

2016-04-06 Thread Hironori Ogibayashi
I tried RocksDB, but the result was almost the same. I used the following code and put 2.6million distinct records into Kafka. After processing all records, the state on the HDFS become about 250MB and time needed for the checkpoint was almost 5sec. Processing throughput was FsStateBackend->

Re: Possible use case: Simulating iterative batch processing by rewinding source

2016-04-06 Thread Christophe Salperwyck
Hi, I am interested too. For my part, I was thinking to use HBase as a backend so that my data are stored sorted. Nice to have to generate timeseries in the good order. Cheers, Christophe 2016-04-06 21:22 GMT+02:00 Raul Kripalani : > Hello, > > I'm getting started with Flink

Re: Back Pressure details

2016-04-06 Thread Zach Cox
Yeah I don't think that's the case for my setup either :) I wrote a simple Flink job that just consumes from Kafka and sinks events/sec rate to Graphite. That consumes from Kafka several orders of magnitude higher than the job that also sinks to Elasticsearch. As you said, the downstream back

Re: Back Pressure details

2016-04-06 Thread Ufuk Celebi
Ah sorry, I forgot to mention that in the docs. The way that data is pulled from Kafka is bypassing Flink's task Thread. The topic is consumed in a separate Thread and the task Thread is just waiting. That's why you don't see any back pressure for Kafka sources. I would expect your Kafka source

Re: Checkpoint state stored in backend, and deleting old checkpoint state

2016-04-06 Thread Zach Cox
Hi Stephan - incremental checkpointing sounds really interesting and useful, I look forward to trying it out. Thanks, Zach On Wed, Apr 6, 2016 at 4:39 AM Stephan Ewen wrote: > Hi Zach! > > I am working on incremental checkpointing, hope to have it in the master > in the next

Re: Back Pressure details

2016-04-06 Thread Zach Cox
The new back pressure docs are great, thanks Ufuk! I'm sure those will help others as well. In the Source => A => B => Sink example, if A and B show HIGH back pressure, should Source also show HIGH? In my case it's a Kafka source and Elasticsearch sink. I know currently our Kafka can provide data

Possible use case: Simulating iterative batch processing by rewinding source

2016-04-06 Thread Raul Kripalani
Hello, I'm getting started with Flink for a use case that could leverage the window processing abilities of Flink that Spark does not offer. Basically I have dumps of timeseries data (10y in ticks) which I need to calculate many metrics in an exploratory manner based on event time. NOTE: I don't

Re: Running Flink jobs directly from Eclipse

2016-04-06 Thread Christophe Salperwyck
I exported it in an environment variable before starting Flink: flink-0.10.1/bin/yarn-session.sh -n 64 -s 4 -jm 4096 -tm 4096 2016-04-06 15:36 GMT+02:00 Serhiy Boychenko : > What about YARN(and HDFS) configuration? I put yarn-site.xml directly into > classpath? Or I can

Re: Accessing RDF triples using Flink

2016-04-06 Thread Ritesh Kumar Singh
Hi Flavio, 1. How do you access your rdf dataset via flink? Are you reading it as a normal input file and splitting the records or you have some wrappers in place to convert the rdf data into triples? Can you please share some code samples if possible? 2. I am using Jena TDB

Re: RemoteTransportException when trying to redis in flink code

2016-04-06 Thread Till Rohrmann
Great to hear that you solved your problem :-) On Wed, Apr 6, 2016 at 2:29 PM, Balaji Rajagopalan < balaji.rajagopa...@olacabs.com> wrote: > Till, > Found the issue, it was my bad assumption about GlobalConfiguration, > what I thought was once the configuration is read from the client machine

RE: Running Flink jobs directly from Eclipse

2016-04-06 Thread Serhiy Boychenko
What about YARN(and HDFS) configuration? I put yarn-site.xml directly into classpath? Or I can set the variables in the execution environment? I will give it a try tomorrow morning, will report back and if successful blog about it ofc ☺ From: Christophe Salperwyck

Re: CEP blog post

2016-04-06 Thread Ufuk Celebi
On Wed, Apr 6, 2016 at 2:18 PM, Matthias J. Sax wrote: > "Getting Started" in main page shows "Download 1.0" instead of 1.0.1 We always had it like that, but I agree that it can be confusing. 1.0 indicates the "series" and the download page shows the exact version. We can

Re: CEP blog post

2016-04-06 Thread Till Rohrmann
That is a good point Ufuk. Will add the note. On Wed, Apr 6, 2016 at 2:03 PM, Ufuk Celebi wrote: > The website has been updated for 1.0.1. :-) > > @Till: If you don't mention it in the post, it makes sense to have a > note at the end of the post saying that the code examples

Re: Accessing RDF triples using Flink

2016-04-06 Thread Flavio Pompermaier
Ho Ritesh, I have sone experience with Rdf and Flink. What do you mean for accessing a Jena model? How do you create it? >From my experience reading triples from jena models is evil because it has some problems with garbage collection. On 6 Apr 2016 00:51, "Ritesh Kumar Singh"

Re: RemoteTransportException when trying to redis in flink code

2016-04-06 Thread Balaji Rajagopalan
Till, Found the issue, it was my bad assumption about GlobalConfiguration, what I thought was once the configuration is read from the client machine GlobalConfiguration params will passed on to the task manager nodes, as well, it was not and values from default was getting pickup, which was

Re: CEP blog post

2016-04-06 Thread Matthias J. Sax
"Getting Started" in main page shows "Download 1.0" instead of 1.0.1 -Matthias On 04/06/2016 02:03 PM, Ufuk Celebi wrote: > The website has been updated for 1.0.1. :-) > > @Till: If you don't mention it in the post, it makes sense to have a > note at the end of the post saying that the code

Re: CEP blog post

2016-04-06 Thread Ufuk Celebi
The website has been updated for 1.0.1. :-) @Till: If you don't mention it in the post, it makes sense to have a note at the end of the post saying that the code examples only work with 1.0.1. On Mon, Apr 4, 2016 at 3:35 PM, Till Rohrmann wrote: > Thanks a lot to all for

[ANNOUNCE] Flink 1.0.1 Released

2016-04-06 Thread Ufuk Celebi
The Flink PMC is pleased to announce the availability of Flink 1.0.1. The official release announcement: http://flink.apache.org/news/2016/04/06/release-1.0.1.html Release binaries: http://apache.openmirror.de/flink/flink-1.0.1/ Please update your Maven dependencies to the new 1.0.1 version and

Re: Running Flink jobs directly from Eclipse

2016-04-06 Thread Christophe Salperwyck
For me it was taking the local jar and uploading it into the cluster. 2016-04-06 13:16 GMT+02:00 Shannon Carey : > Thanks for the info! It is a bit difficult to tell based on the > documentation whether or not you need to put your jar onto the Flink master > node and run the

Re: Running Flink jobs directly from Eclipse

2016-04-06 Thread Shannon Carey
Thanks for the info! It is a bit difficult to tell based on the documentation whether or not you need to put your jar onto the Flink master node and run the flink command from there in order to get a job running. The documentation on

Re: Back Pressure details

2016-04-06 Thread Ufuk Celebi
Hey Zach, just added some documentation, which will be available in ~ 30 mins here: https://ci.apache.org/projects/flink/flink-docs-release-1.0/internals/back_pressure_monitoring.html If you think that something is missing there, I would appreciate some feedback. :-) Back pressure is

Re: RemoteTransportException when trying to redis in flink code

2016-04-06 Thread Till Rohrmann
Hmm I'm not a Redis expert, but are you sure that you see a successful ping reply in the logs of the TaskManagers and not only in the client logs? Another thing: Is the redisClient thread safe? Multiple map tasks might be accessing the set and get methods concurrently. Another question: The code

Re: Flink CEP AbstractCEPPatternOperator fail after event detection

2016-04-06 Thread Till Rohrmann
Hi Norman, which version of Flink are you using? We recently fixed some issues with the CEP library which looked similar to your error message. The problem occurred when using the CEP library with processing time. Switching to event or ingestion time, solve the problem. The fixes to make it also

Re: RemoteTransportException when trying to redis in flink code

2016-04-06 Thread Balaji Rajagopalan
Till, I have checked from all the taskmanager nodes I am able to establish a connection by installing a redis-cli on those nodes. The thing is in the constructor I am able to set and get values, also I am getting PONG for the ping. But once object is initialized when I try to call

Re: Checkpoint state stored in backend, and deleting old checkpoint state

2016-04-06 Thread Stephan Ewen
Hi Zach! I am working on incremental checkpointing, hope to have it in the master in the next weeks. The current approach is a to have a full self-contained checkpoint every once in a while, and have incremental checkpoints most of the time. Having a full checkpoint every now and then spares you

Flink CEP AbstractCEPPatternOperator fail after event detection

2016-04-06 Thread norman sp
Hi, I'm trying out the new CEP library but have some problems with event detection. In my case Flink detects the event pattern: A followed by B within 10 seconds. But short time after event detection when the event pattern isn't matched anymore, the program crashes with the error message:

Re: RemoteTransportException when trying to redis in flink code

2016-04-06 Thread Till Rohrmann
Hi Balaji, from the stack trace it looks as if you cannot open a connection redis. Have you checked that you can access redis from all your TaskManager nodes? Cheers, Till On Wed, Apr 6, 2016 at 7:46 AM, Balaji Rajagopalan < balaji.rajagopa...@olacabs.com> wrote: > I am trying to use AWS EMR

Re: Integrate Flink with S3 on EMR cluster

2016-04-06 Thread Ufuk Celebi
Yes, for sure. I added some documentation for AWS here: https://ci.apache.org/projects/flink/flink-docs-release-1.0/setup/aws.html Would be nice to update that page with your pull request. :-) – Ufuk On Wed, Apr 6, 2016 at 4:58 AM, Chiwan Park wrote: > Hi Timur, > >

Re: Running Flink jobs directly from Eclipse

2016-04-06 Thread Christophe Salperwyck
>From my side I was starting the YARN session from the cluster: flink-0.10.1/bin/yarn-session.sh -n 64 -s 4 -jm 4096 -tm 4096 Then getting the IP/port from the WebUI and then from Eclipse: ExecutionEnvironment env = ExecutionEnvironment.createRemoteEnvironment("xx.xx.xx.xx", 40631,

Running Flink jobs directly from Eclipse

2016-04-06 Thread Serhiy Boychenko
Cheerz, I have been working last few month on the comparison of different data processing engines and recently came across Apache Flink. After reading different academic papers on comparison of Flink with other data processing I would definitely give it a shot. The only issue I am currently

Re: Powered by Flink

2016-04-06 Thread Suneel Marthi
I was gonna hold off on that until we get Mahout 0.12.0 out of the door (targeted for this weekend). I would add Apache NiFi to the list. Future : Apache Mahout Apache BigTop Openstack and Kubernetes (skunkworks) On Wed, Apr 6, 2016 at 3:03 AM, Sebastian wrote: > You

Re: Powered by Flink

2016-04-06 Thread Sebastian
You should also add Apache Mahout, whose new Samsara DSL also runs on Flink. -s On 06.04.2016 08:50, Henry Saputra wrote: Thanks, Slim. I have just updated the wiki page with this entries. On Tue, Apr 5, 2016 at 10:20 AM, Slim Baltagi > wrote:

Re: Powered by Flink

2016-04-06 Thread Henry Saputra
Thanks, Slim. I have just updated the wiki page with this entries. On Tue, Apr 5, 2016 at 10:20 AM, Slim Baltagi wrote: > Hi > > The following are missing in the ‘Powered by Flink’ list: > >- *king.com * > >

Re: State in external db (dynamodb)

2016-04-06 Thread Sanne de Roever
FYI Cassandra has a TTL on data: https://docs.datastax.com/en/cql/3.1/cql/cql_using/use_expire_t.html On Wed, Apr 6, 2016 at 7:55 AM, Shannon Carey wrote: > Hi, new Flink user here! > > I found a discussion on user@flink.apache.org about using DynamoDB as a > sink. However,