Flink + Avro GenericRecord - first field value overwrites all other fields

2016-05-11 Thread Tarandeep Singh
Hi, I am using the DataSet API and reading Avro files as a DataSet. I am seeing weird behavior: the record is read correctly from the file (verified by printing all values), but when this record is passed to the Flink chain/DAG (e.g. KeySelector), every field in this record has the same value as the fi

Re: Local Cluster have problem with connect to elasticsearch

2016-05-11 Thread Tzu-Li (Gordon) Tai
Hi Rafal, From your description, it seems like Flink is complaining because it cannot access the Elasticsearch API related dependencies as well. You'd also have to include the following into your Maven build, under : org.elasticsearch elasticsearch 2.3.2 jar false ${proj

Re: Local Cluster have problem with connect to elasticsearch

2016-05-11 Thread rafal green
Thanks a lot for the many answers :) I wrote my last email because I didn't understand the difference between a LOCAL cluster (one node) and IntelliJ IDEA. Now I know :P https://ci.apache.org/projects/flink/flink-docs-master/apis/cluster_execution.html If you read this *"Linking with modul

Re: Local Cluster have problem with connect to elasticsearch

2016-05-11 Thread Martin Neumann
Hi, Are you sure the elastic cluster is running correctly? Open a browser and try 127.0.0.1:9200; that should give you the overview of the cluster. If you don't get it, there is something wrong with the setup. It's also a good way to double check the cluster.name (I got that wrong more than once) I

get start and end time stamp from time window

2016-05-11 Thread Martin Neumann
Hi, I have a windowed stream and I want to run a (generic) fold function on it. The result should have the start and the end timestamp of the window as fields (so I can relate it to the original data). *Is there a simple way to get the timestamps from within the fold function?* I could find the
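The thread preview does not show which answer was given. As a hedged illustration only: if the windows are tumbling, aligned to the epoch, and every element carries its own timestamp, the window bounds can be derived arithmetically from any element. The class `TumblingWindowBounds` and its methods are hypothetical helpers, not part of the Flink API, and both assumptions would need to hold for the job in question.

```java
// Hypothetical helper, not a Flink class: derives the bounds of an
// epoch-aligned tumbling window from the timestamp of one of its elements.
final class TumblingWindowBounds {
    private TumblingWindowBounds() {}

    /** Inclusive start of the window of length sizeMs that contains timestampMs. */
    static long start(long timestampMs, long sizeMs) {
        // floorMod keeps the result correct for negative timestamps as well.
        return timestampMs - Math.floorMod(timestampMs, sizeMs);
    }

    /** Exclusive end of that same window. */
    static long end(long timestampMs, long sizeMs) {
        return start(timestampMs, sizeMs) + sizeMs;
    }
}
```

A fold accumulator could call `TumblingWindowBounds.start(ts, size)` on the first element it sees and store the result as a field. This does not carry over to sliding or session windows, where one element can belong to several windows or the bounds depend on the data.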

Re: reading from latest kafka offset when flink starts

2016-05-11 Thread Aljoscha Krettek
Hi, are you by any chance using Kafka 0.9? Cheers, Aljoscha On Tue, 10 May 2016 at 08:37 Balaji Rajagopalan < balaji.rajagopa...@olacabs.com> wrote: > Robert, > Regarding the event qps 4500 events/sec may not be large no, but I am > seeing some issue in processing the events due to processing pow

Re: Local Cluster have problem with connect to elasticsearch

2016-05-11 Thread Stephan Ewen
Seeing how you put a loopback address into the transport addresses, are you sure that an ElasticSearch node runs on every machine? On Wed, May 11, 2016 at 7:41 PM, Stephan Ewen wrote: > ElasticSearch is basically saying that it cannot connect. > > Is it possible that the configuration of elastic

Re: Local Cluster have problem with connect to elasticsearch

2016-05-11 Thread Stephan Ewen
ElasticSearch is basically saying that it cannot connect. Is it possible that the configuration of elastic may be incorrect, or some of the ports may be blocked? On Mon, May 9, 2016 at 7:05 PM, rafal green wrote: > Dear Sir or Madam, > > Can you tell me why I have a problem with elasticsearch

Re: HBase write problem

2016-05-11 Thread Stephan Ewen
Just to narrow down the problem: The insertion into HBase actually works, but the job does not finish after that? And the same job (same source of data) that writes to a file, or prints, finishes? If that is the case, can you check what status each task is in, via the web dashboard? Are all tasks

Re: HBase write problem

2016-05-11 Thread Flavio Pompermaier
I can't help you with the choice of the db storage; as always, the answer is "it depends" on a lot of factors :) From what I can tell, the problem could be that Flink supports HBase 0.98, so it could be worth updating the Flink connectors to a more recent version (that should be backward compatible hope

synchronizing two streams

2016-05-11 Thread Alexander Gryzlov
Hello, We're implementing a streaming outer join operator based on a TwoInputStreamOperator with an internal buffer. In our use-case only the items whose timestamps are within a several-second interval of each other can join, so we need to synchronize the two input streams to ensure maximal yield.
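To make the join condition described above concrete, here is a minimal plain-Java sketch of the matching semantics only: two items join when their timestamps are at most `maxGapMs` apart, and unmatched items from either side are still emitted (outer join). It works on complete arrays rather than on live streams, so it says nothing about how to synchronize the inputs or when to evict the buffer. The class `IntervalOuterJoin` is a hypothetical name, not the posters' operator.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the join condition, not a Flink operator.
final class IntervalOuterJoin {
    private IntervalOuterJoin() {}

    /**
     * Returns rows of the form {leftTimestamp, rightTimestamp}.
     * A null entry marks the missing side of an unmatched item.
     */
    static List<Long[]> join(long[] left, long[] right, long maxGapMs) {
        List<Long[]> rows = new ArrayList<>();
        boolean[] rightMatched = new boolean[right.length];

        for (long l : left) {
            boolean matched = false;
            for (int j = 0; j < right.length; j++) {
                if (Math.abs(l - right[j]) <= maxGapMs) {
                    rows.add(new Long[] {l, right[j]});
                    rightMatched[j] = true;
                    matched = true;
                }
            }
            if (!matched) {
                rows.add(new Long[] {l, null}); // left item without a partner
            }
        }
        for (int j = 0; j < right.length; j++) {
            if (!rightMatched[j]) {
                rows.add(new Long[] {null, right[j]}); // right item without a partner
            }
        }
        return rows;
    }
}
```

The nested loop is quadratic and chosen for readability. A streaming version would presumably keep only the items still inside the interval in its buffer, which is where the synchronization question from the message comes in.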

Re: HBase write problem

2016-05-11 Thread Palle
Hadoop 2.7.2 HBase 1.2.1 I have this running from a Hadoop job, but just not from Flink. I will look into your suggestions, but would I be better off choosing another DB for storage? I can see that Cassandra gets some attention in this mailing list. I need to store app 2 bio key value pairs consi

Re: HBase write problem

2016-05-11 Thread Flavio Pompermaier
And which version of HBase and Hadoop are you running? Did you try to put the hbase-site.xml in the jar? Moreover, I don't know how reliable the web client UI is at the moment. My experience is that the command line client is much more reliable. You just need to run from the flink dir somethi

Re: Using FlinkML algorithms in Streaming

2016-05-11 Thread Simone Robutti
Actually, model portability and persistence are a serious limitation to the practical use of FlinkML in streaming. If you know what you're doing, you can write a blunt serializer for your model, write it to a file and rebuild the model stream-side from the deserialized information. I tried it for an SVM mo
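A minimal sketch of the "blunt serializer" idea, assuming the trained model can be reduced to a weight vector plus an intercept (as for a linear model). The class `LinearModel` and its binary layout are hypothetical and are not FlinkML types. The batch job would call `writeTo`, and the streaming job would call `readFrom` once at startup and then use `predict` per element.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical stand-in for a trained linear model; not a FlinkML class.
final class LinearModel {
    final double[] weights;
    final double intercept;

    LinearModel(double[] weights, double intercept) {
        this.weights = weights.clone();
        this.intercept = intercept;
    }

    /** Dot product of weights and features, plus the intercept. */
    double predict(double[] features) {
        if (features.length != weights.length) {
            throw new IllegalArgumentException("feature count mismatch");
        }
        double sum = intercept;
        for (int i = 0; i < weights.length; i++) {
            sum += weights[i] * features[i];
        }
        return sum;
    }

    /** Layout: int length, then the weights, then the intercept. */
    void writeTo(Path file) {
        try (DataOutputStream out = new DataOutputStream(Files.newOutputStream(file))) {
            out.writeInt(weights.length);
            for (double w : weights) {
                out.writeDouble(w);
            }
            out.writeDouble(intercept);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static LinearModel readFrom(Path file) {
        try (DataInputStream in = new DataInputStream(Files.newInputStream(file))) {
            double[] w = new double[in.readInt()];
            for (int i = 0; i < w.length; i++) {
                w[i] = in.readDouble();
            }
            return new LinearModel(w, in.readDouble());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes the model to a temporary file and reads it back. */
    static LinearModel roundTrip(LinearModel model) {
        try {
            Path tmp = Files.createTempFile("model", ".bin");
            try {
                model.writeTo(tmp);
                return readFrom(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A fixed binary layout keeps the file independent of Java object serialization, so the reading side only needs this one class. On a cluster the file would have to live somewhere every task can reach, which this sketch does not address.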

Re: HBase write problem

2016-05-11 Thread Palle
I run the job from the cluster. I run it through the web UI. The jar file submitted does not contain the hbase-site.xml file. - Original message - > From: Flavio Pompermaier > To: user > Date: Wed, 11 May 2016 09:36 > Subject: Re: HBase write problem > > Do you run the job from your I

Re: Using FlinkML algorithms in Streaming

2016-05-11 Thread Márton Balassi
Currently I am not aware of streaming learners support, you would need to implement that yourself at this point. As for streaming predictors for batch learners I have some preview code that you might like. [1] [1] https://github.com/streamline-eu/ML-Pipelines/blob/314e3d940f1f1ac7b762ba96067e13d8

Re: Using FlinkML algorithms in Streaming

2016-05-11 Thread Piyush Shrivastava
Hi Márton, I want to train and get the residuals. Thanks and Regards, Piyush Shrivastava http://webograffiti.com On Wednesday, 11 May 2016 7:19 PM, Márton Balassi wrote: Hey Piyush, Would you like to train or predict on the streaming data? Best, Marton On Wed, May 11, 2016 at 3:44 PM,

Re: Using FlinkML algorithms in Streaming

2016-05-11 Thread Márton Balassi
Hey Piyush, Would you like to train or predict on the streaming data? Best, Marton On Wed, May 11, 2016 at 3:44 PM, Piyush Shrivastava wrote: > Hello all, > > I want to perform linear regression using FlinkML's > MultipleLinearRegression() function on streaming data. > > This function takes a

Using FlinkML algorithms in Streaming

2016-05-11 Thread Piyush Shrivastava
Hello all, I want to perform linear regression using FlinkML's MultipleLinearRegression() function on streaming data. This function takes a DataSet as an input and I cannot create a DataSet inside the MapFunction of a DataStream. How can I use this function on my DataStream?  Thanks and Regards,

Zookeeper Session Timeout

2016-05-11 Thread Konstantin Knauf
Hi everyone, I observed the following behavior with Flink 1.0.2 on Hadoop 2.4.1 with a yarn session in HA mode: 2016-05-10 18:39:14,546 INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 52444ms for sessionid 0x2544821cf2f818a, clos

Re: Bug while using Table API

2016-05-11 Thread Vasiliki Kalavri
Hi Simone, Fabian has pushed a fix for the streaming TableSources that removed the Calcite Stream rules [1]. The reported error does not appear anymore with the current master. Could you please also give it a try and verify that it works for you? Thanks, -Vasia. [1]: https://github.com/apache/fl

Re: Key factors for Flink's performance

2016-05-11 Thread Stephan Ewen
Hi Leon! I agree with Aljoscha that the term "microbatches" is confusing in that context. Flink's network layer is "buffer" oriented rather than "record" oriented. Buffering is a best-effort attempt to gather some elements in cases where they arrive fast enough that this would not add much latency anyway

Re: Creating a custom operator

2016-05-11 Thread Fabian Hueske
2016-05-09 14:56 GMT+02:00 Simone Robutti : > >- You wrote you'd like to "instantiate a H2O's node in every task > manager". This reads a bit like you want to start H2O in the TM's JVM , but > I would assume that a H2O node runs as a separate process. So should it be > started inside the TM JVM or

Re: Regarding Broadcast of datasets in streaming context

2016-05-11 Thread Biplob Biswas
Hi Gyula, I tried doing something like the following in the 2 flatmaps, but I am not getting the desired results and am still confused about how the concept you put forward would work: public static final class MyCoFlatmap implements CoFlatMapFunction{ Centroid[] centroids;

Re: HBase write problem

2016-05-11 Thread Flavio Pompermaier
Do you run the job from your IDE or from the cluster? On Wed, May 11, 2016 at 9:22 AM, Palle wrote: > Thanks for the response, but I don't think the problem is the classpath - > hbase-site.xml should be added. This is what it looks like (hbase conf is > added at the end): > > 2016-05-11 09:16:45

Re: Key factors for Flink's performance

2016-05-11 Thread Aljoscha Krettek
Hi, the latencies of Flink and Storm are pretty similar. The only reason I could see for Flink having the slight upper hand there is the fact that Storm tracks the progress of every tuple throughout the topology and requires ACKs that have to go back to the sinks. As for throughput, you are right that F

Re: HBase write problem

2016-05-11 Thread Palle
Thanks for the response, but I don't think the problem is the classpath - hbase-site.xml should be added. This is what it looks like (hbase conf is added at the end): 2016-05-11 09:16:45,831 INFO org.apache.zookeeper.ZooKeeper - Client environment:java.class.path=C:\systems\packages\flink-1.0.2\li