Re: Understanding Sliding Windows

2016-04-26 Thread Piyush Shrivastava
Hello Dominik, Thanks for the information. Since my window is getting triggered every 10 seconds, the results I am getting before 5 minutes would be irrelevant as I need to consider data coming in every 5 minutes. Is there a way I can skip the results that are output before the first 5 minutes?
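The early firings can be filtered out downstream. Below is a conceptual sketch in plain Python (not Flink API; all names are illustrative) of dropping sliding-window results until one full window length has elapsed since the stream started. In Flink, an equivalent check could be done in a window function or a downstream filter by comparing the window's timestamps against the job's start time.

```python
# Conceptual sketch (plain Python, not Flink): drop sliding-window results
# emitted before the first full window length has elapsed.
WINDOW_SIZE = 300   # 5 minutes, in seconds
SLIDE = 10          # the window fires every 10 seconds

def full_windows(stream_start, firing_times):
    """Keep only firings whose window already spans a full WINDOW_SIZE.

    A firing at time t covers (t - WINDOW_SIZE, t]; before
    stream_start + WINDOW_SIZE that range is only partially filled.
    """
    return [t for t in firing_times if t - stream_start >= WINDOW_SIZE]

firings = [SLIDE * i for i in range(1, 40)]  # firings at 10s, 20s, ..., 390s
kept = full_windows(0, firings)
# the first kept firing is at t = 300s, i.e. after 5 full minutes
```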

Wildcards with --classpath parameter in CLI

2016-04-26 Thread Ken Krugler
Hi all, If I want to include all of the jars in a directory, I thought I could do --classpath file:// -- Ken Krugler, http://www.scaleunlimited.com, custom big data solutions & training: Hadoop, Cascading, Cassandra & Solr

Re: Discarding header from CSV file

2016-04-26 Thread nsengupta
Chiwan and other Flinksters, I am stuck with the following. Somehow, I am unable to spot the error, if any! Please help. I have this case class: case class BuildingInformation(buildingID: Int, buildingManager: Int, buildingAge: Int, productID: String, country: String). I intend to read from a CSV

Re: Discarding header from CSV file

2016-04-26 Thread nsengupta
Hello Chiwan, I was just about to post to declare my ignorance, because I searched again and realized that I had failed to spot readCsvFile! :-) You have been faster than me! Yes, I should use readCsvFile so that I get all the facilities built in. Many thanks for pointing it out. -- N

Re: Discarding header from CSV file

2016-04-26 Thread Chiwan Park
Hi Nirmalya, I recommend the readCsvFile() method rather than readTextFile() to read a CSV file. readCsvFile() provides some features for CSV files such as ignoreFirstLine() (what you are looking for), ignoreComments(), etc. If you have to use the readTextFile() method, I think you can ignore column
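For intuition on why a plain drop(1) is unsafe in a distributed read, here is a conceptual sketch in plain Python (not Flink code; names are illustrative): each parallel reader gets a byte range (split) of the file, and only the split starting at the beginning of the file contains the header, which is roughly what ignoreFirstLine() accounts for.

```python
# Conceptual sketch (plain Python, not Flink): why drop(1) per node is wrong
# for a distributed text read. Each parallel reader processes one split;
# only the split that starts at the beginning of the file holds the header.

def read_split(lines, split_start_line, split_len, skip_header=True):
    """Simulate one parallel reader: drop the header only in the first split."""
    chunk = lines[split_start_line:split_start_line + split_len]
    if skip_header and split_start_line == 0:
        chunk = chunk[1:]          # the header lives only in the first split
    return chunk

csv = ["id,age", "1,30", "2,41", "3,27", "4,55"]
part0 = read_split(csv, 0, 3)  # first split: header dropped
part1 = read_split(csv, 3, 2)  # later split: nothing to drop
```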

Discarding header from CSV file

2016-04-26 Thread nsengupta
What is the recommended way of discarding the column header(s) from a CSV file, if I am using the environment.readTextFile() facility? Obviously, we don't know beforehand which of the nodes will read the header(s), so we cannot use the usual tricks like drop(1). I don't recall well: has this

Re: "No more bytes left" at deserialization

2016-04-26 Thread Timur Fayruzov
I built master with scala 2.11 and hadoop 2.7.1, now get a different exception (still serialization-related though): java.lang.Exception: The data preparation for task 'CHAIN CoGroup (CoGroup at com.whitepages.data.flink.FaithResolution$.pipeline(FaithResolution.scala:162)) -> Filter (Filter at

Re: Job hangs

2016-04-26 Thread Timur Fayruzov
Robert, Ufuk, logs, execution plan and a screenshot of the console are in the archive: https://www.dropbox.com/s/68gyl6f3rdzn7o1/debug-stuck.tar.gz?dl=0 Note that when I looked in the backpressure view I saw back pressure 'high' on following paths: Input->code_line:123,124->map->join

Re: [External] Re: Consuming Messages from Kafka

2016-04-26 Thread Robert Metzger
Hi Josh, The JobManager log won't contain this output. Check out these slides I did a while ago, they explain how you can retrieve the logs from the TaskManagers: http://www.slideshare.net/robertmetzger1/apache-flink-hands-on#14 On Tue, Apr 26, 2016 at 9:41 PM, Conlin, Joshua [USA]

Re: [External] Re: Consuming Messages from Kafka

2016-04-26 Thread Conlin, Joshua [USA]
"StringLogSink" just looks like: System.out.println(msg); LOG.info("Logging message: " + msg); And LOG is from slf4j. In the Flink UI that is running on YARN, I see no counts, log statements, or stdout output under JobManager. It seems to make no difference whether I submit the job through yarn

Re: Consuming Messages from Kafka

2016-04-26 Thread Dominik Choma
Hi, You can check whether any messages are going through the dataflow on the Flink web dashboard: https://flink.apache.org/img/blog/new-dashboard-screenshot.png Dominik Choma > Message written by Conlin, Joshua [USA]

Consuming Messages from Kafka

2016-04-26 Thread Conlin, Joshua [USA]
Hello, I am new to Flink and trying to learn this framework. Seems great so far. I am trying to translate my existing storm Topology to a Flink job and I am having trouble consuming data from Kafka. Here's what my Job looks like: public static void main(String[] args) throws Exception {

Re: Initializing global data

2016-04-26 Thread Nirmalya Sengupta
Hello Stefano, Thanks for sharing your views. Now that you make me think, I know that your recommendation works well. I will go ahead, following your suggestions. -- Nirmalya -- Software Technologist http://www.linkedin.com/in/nirmalyasengupta "If you have

Re: classpath issue on yarn

2016-04-26 Thread Robert Metzger
Hi Aris, Did you build the 1.0.2 flink-dist yourself? If not, which exact version did you download? For example this file: http://www.apache.org/dyn/closer.lua/flink/flink-1.0.2/flink-1.0.2-bin-hadoop2-scala_2.11.tgz has a clean flink-dist jar. On Tue, Apr 26, 2016 at 12:28 PM, aris kol

Re: Getting java.lang.Exception when try to fetch data from Kafka

2016-04-26 Thread prateek arora
Hi Robert, I have a Java program to send data into a Kafka topic. Below is the code for this: private Producer producer = null; Serializer keySerializer = new StringSerializer(); Serializer valueSerializer = new ByteArraySerializer(); producer = new KafkaProducer

Re: Return unique counter using groupReduceFunction

2016-04-26 Thread Fabian Hueske
Hi Biplob, Flink is a distributed, data-parallel system, which means that there are several instances of your ReduceFunction running in parallel, each with its own timestamp counter. If you want to have a unique timestamp, you have to set the parallelism of the reduce operator to 1, but then the
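The duplication Fabian describes can be seen in a conceptual sketch in plain Python (not Flink code; names are illustrative): each parallel instance keeps its own counter, so the same counter values reappear across instances, while parallelism 1 yields unique values.

```python
# Conceptual sketch (plain Python, not Flink): why a per-instance counter
# duplicates IDs under parallelism > 1, and why parallelism 1 avoids it.

def run_reduce(groups, parallelism):
    """Each parallel instance keeps its own counter, one entry per instance."""
    counters = [0] * parallelism
    out = []
    for i, group in enumerate(groups):
        inst = i % parallelism          # which instance handles this group
        counters[inst] += 1
        out.append((counters[inst], group))
    return out

groups = ["a", "b", "c", "d"]
parallel = run_reduce(groups, parallelism=2)   # counter values collide
serial = run_reduce(groups, parallelism=1)     # counter values are unique
```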

Re: Flink first() operator

2016-04-26 Thread Fabian Hueske
Actually, memory should not be a problem since the full data set would not be materialized in memory. Flink has a streaming runtime, so most of the data would be filtered out immediately. However, reading the whole file of course causes a lot of unnecessary IO. 2016-04-26 17:09 GMT+02:00 Biplob

Regarding Broadcast of datasets in streaming context

2016-04-26 Thread Biplob Biswas
Hi, I have yet another question, this time about maintaining a global list of centroids. I am trying to implement the CluStream algorithm, and for that purpose I have the initial set of centres in a Flink dataset. Now I need to update the set of centres for every data tuple that comes from the stream.

Re: "No more bytes left" at deserialization

2016-04-26 Thread Till Rohrmann
Then let's keep our fingers crossed that we've found the culprit :-) On Tue, Apr 26, 2016 at 6:02 PM, Timur Fayruzov wrote: > Thank you Till. > > I will try to run with the new binaries today. As I have mentioned, the error > is reproducible only on a full dataset, so coming up

Re: "No more bytes left" at deserialization

2016-04-26 Thread Till Rohrmann
Hi Timur, I’ve got good and not so good news. Let’s start with the not so good news. I couldn’t reproduce your problem but the good news is that I found a bug in the duplication logic of the OptionSerializer. I’ve already committed a patch to the master to fix it. Thus, I wanted to ask you,

Return unique counter using groupReduceFunction

2016-04-26 Thread Biplob Biswas
Hi, I am using a groupReduce function to aggregate the content of the objects, but at the same time I need to return a unique counter from the function. My attempts are failing: the identifiers are somehow very random and getting duplicated. Following is the part of my code which is

Re: Understanding Sliding Windows

2016-04-26 Thread Dominik Choma
Piyush, You created a sliding window which is triggered every 10 seconds. Flink fires up this window every 10 seconds, without waiting for the 5 min buffer to be filled up. It seems to me that the first argument is rather the "maximum data buffer retention" than "the initial threshold". Dominik

Re: Job hangs

2016-04-26 Thread Timur Fayruzov
Hello Robert, I observed progress for 2 hours (meaning the numbers changed on the dashboard), and then I waited for 2 hours more. I'm sure it had to spill at some point, but I figured 2h is enough time. Thanks, Timur On Apr 26, 2016 1:35 AM, "Robert Metzger" wrote: > Hi Timur, > >

Re: Initializing global data

2016-04-26 Thread Stefano Baghino
Hi Nirmalya, I'm not really sure setGlobalJobParameters is what you're looking for. If the ReferableData is more than some simple configuration (and judging from its type it looks like it is), maybe you can try to leverage broadcast variables. You can read more about them here:
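The broadcast-variable pattern Stefano mentions can be illustrated with a conceptual sketch in plain Python (not Flink code; names are illustrative): a small dataset is shipped in full to every parallel task, instead of being partitioned like the main input.

```python
# Conceptual sketch (plain Python, not Flink): the broadcast-variable pattern.
# A small dataset is shipped whole to every parallel task, so each mapper
# can look values up locally while the large dataset stays partitioned.

def broadcast_map(partitions, broadcast_data, fn):
    """Every partition sees the complete broadcast_data, not a slice of it."""
    return [[fn(x, broadcast_data) for x in part] for part in partitions]

rates = {"EUR": 2.0, "GBP": 1.5}                # small, broadcast dataset
partitions = [[("EUR", 100)], [("GBP", 200)]]   # large, partitioned dataset
result = broadcast_map(partitions, rates,
                       lambda rec, r: rec[1] * r[rec[0]])
```

In the Flink DataSet API the analogous mechanism is, to my understanding, .withBroadcastSet(dataSet, "name") on an operator plus getRuntimeContext().getBroadcastVariable("name") inside a rich function.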

Re: Control Trigger behavior based on external datasource

2016-04-26 Thread Hironori Ogibayashi
Till, Thank you for your answer. It's true that there is the case where the window operator has not received all data for the key. I will go with the second idea. Thanks! Hironori 2016-04-26 17:46 GMT+09:00 Till Rohrmann : > Hi Hironori, > > I would go with the second approach,

Understanding Sliding Windows

2016-04-26 Thread Piyush Shrivastava
Hi all, I wanted to know how exactly sliding windows produce results in Flink. Suppose I create a sliding window of 5 minutes which slides every 10 seconds: .timeWindow(Time.minutes(5), Time.seconds(10)) So every 10 seconds we are looking at data from the past 5 minutes. But what
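For intuition, a sliding window of size 5 minutes sliding every 10 seconds assigns each element to size/slide = 30 overlapping windows. A conceptual sketch in plain Python (not Flink code; names are illustrative):

```python
# Conceptual sketch (plain Python, not Flink): which sliding windows an
# element with a given timestamp falls into, for size=300s, slide=10s.

SIZE, SLIDE = 300, 10

def assigned_windows(ts):
    """Return the [start, end) pair of every window containing ts."""
    last_start = ts - ts % SLIDE          # latest window start at/below ts
    starts = range(last_start, last_start - SIZE, -SLIDE)
    return [(s, s + SIZE) for s in starts]

wins = assigned_windows(125)
# each element belongs to SIZE // SLIDE = 30 overlapping windows
```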

Re: Submit Flink Jobs to YARN running on AWS

2016-04-26 Thread Robert Metzger
I've started my own EMR cluster and tried to launch a Flink job on it from my local machine. I have to admit that configuring the EMR-launched Hadoop for external access is quite a hassle. I'm not even able to submit Flink to the YARN cluster because the client cannot connect to the

Re: Gracefully stop long running streaming job

2016-04-26 Thread Maximilian Michels
I have to warn you that the Storm SpoutWrapper and the TwitterSource are currently the only stoppable sources. However, we could make more sources stoppable, e.g. the KafkaConsumer. On Tue, Apr 19, 2016 at 12:38 AM, Robert Schmidtke wrote: > I'm on 0.10.2 which seems to be still

About flink stream table API

2016-04-26 Thread Zhangrucong
Hello: I want to learn the Flink stream API. Is the stream SQL the same as Calcite's? In the following link, the examples of the Table API use DataSets; where can I see a detailed introduction to the streaming Table API?

Re: YARN terminating TaskNode

2016-04-26 Thread Maximilian Michels
Hi Timur, Indeed, if you use JNI libraries then the memory will be off-heap and the -Xmx limit will not be respected. Currently, we don't expect users to use JNI memory allocation. We might want to enforce a stricter direct memory limit in the future. In this case, you would get an

Re: "No more bytes left" at deserialization

2016-04-26 Thread Aljoscha Krettek
Could this be caused by the disabled reference tracking in our Kryo serializer? From the stack trace it looks like it's failing when trying to deserialize the traits that are wrapped in Options. On Tue, 26 Apr 2016 at 10:09 Ufuk Celebi wrote: > Hey Timur, > > I'm sorry about

Re: Submit Flink Jobs to YARN running on AWS

2016-04-26 Thread Robert Metzger
Hi Abhi, I'll try to reproduce the issue and come up with a solution. On Tue, Apr 26, 2016 at 1:13 AM, Bajaj, Abhinav wrote: > Hi Fabian, > > Thanks for your reply and the pointers to documentation. > > In these steps, I think the Flink client is installed on the master

Re: Control Trigger behavior based on external datasource

2016-04-26 Thread Till Rohrmann
Hi Hironori, I would go with the second approach, because it is not guaranteed that all events of a given key have been received by the window operator if the data source says that all events for this key have been read. The events might still be in flight. Furthermore, it integrates more nicely

Re: Job hangs

2016-04-26 Thread Robert Metzger
Hi Timur, thank you for sharing the source code of your job. That is helpful! It's a large pipeline with 7 joins and 2 co-groups. Maybe your job is much more IO-heavy with the larger input data because all the joins start spilling? Our monitoring, in particular for batch jobs, is really not very

Control Trigger behavior based on external datasource

2016-04-26 Thread Hironori Ogibayashi
Hello, I am using GlobalWindow and my custom trigger (similar to ContinuousProcessingTimeTrigger). In my trigger I want to control the TriggerResult based on an external datasource. That datasource has a flag for each key which describes whether the stream for that key has finished (so it can be purged).
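The decision Hironori describes can be sketched conceptually in plain Python (not Flink code; names are illustrative): on each timer firing, consult the external flag store and purge the key's window state only once that key's stream is marked finished, mirroring Flink's FIRE vs. FIRE_AND_PURGE trigger results.

```python
# Conceptual sketch (plain Python, not Flink): a trigger decision that
# consults an external flag store to decide whether to purge a key's
# global-window state after firing.

FIRE, FIRE_AND_PURGE = "FIRE", "FIRE_AND_PURGE"

def on_timer(key, finished_flags):
    """Purge window state only when the external source says the key's
    stream is complete; otherwise fire and keep accumulating."""
    return FIRE_AND_PURGE if finished_flags.get(key) else FIRE

flags = {"sensor-1": True, "sensor-2": False}
# sensor-1 is done, so emit and purge; sensor-2 keeps its state
```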

Re: Job hangs

2016-04-26 Thread Ufuk Celebi
No. If you run on YARN, the YARN logs are the relevant ones for the JobManager and TaskManager. The client log submitting the job should be found in /log. – Ufuk On Tue, Apr 26, 2016 at 10:06 AM, Timur Fayruzov wrote: > I will do it my tomorrow. Logs don't show

Re: "No more bytes left" at deserialization

2016-04-26 Thread Ufuk Celebi
Hey Timur, I'm sorry about this bad experience. From what I can tell, there is nothing unusual with your code. It's probably an issue with Flink. I think we have to wait a little longer to hear what others in the community say about this. @Aljoscha, Till, Robert: any ideas what might cause

Re: Job hangs

2016-04-26 Thread Ufuk Celebi
Hey Timur, is it possible to connect to the VMs and get stack traces of the Flink processes as well? We can first have a look at the logs, but the stack traces will be helpful if we can't figure out what the issue is. – Ufuk On Tue, Apr 26, 2016 at 9:42 AM, Till Rohrmann

Re: Job hangs

2016-04-26 Thread Till Rohrmann
Could you share the logs with us, Timur? That would be very helpful. Cheers, Till On Apr 26, 2016 3:24 AM, "Timur Fayruzov" wrote: > Hello, > > Now I'm at the stage where my job seem to completely hang. Source code is > attached (it won't compile but I think gives a