Re: Weird Kryo exception (Unable to find class: java.ttil.HashSet)

2016-05-23 Thread Flavio Pompermaier
I've slightly modified the program to shorten the length of the entire job, and this time I had this exception: 2016-05-23 09:26:51,438 ERROR org.apache.flink.runtime.io.disk.iomanager.IOManager - IO Thread 'IOManager writer thread #1' terminated due to an exception. Shutting down I/O Mana

Re: Weird Kryo exception (Unable to find class: java.ttil.HashSet)

2016-05-23 Thread Flavio Pompermaier
Changing - taskmanager.memory.fraction, from 0.9 to 0.7 - taskmanager.memory.off-heap, from false to true - decreasing the slots of each tm from 3 to 2 I had this error: 2016-05-23 09:55:42,534 ERROR org.apache.flink.runtime.operators.BatchTask - Error in task code: CH

Re: Flink's WordCount at scale of 1BLN of unique words

2016-05-23 Thread Matthias J. Sax
Are you talking about a streaming or a batch job? You are mentioning a "text stream" but also say you want to stream 100TB -- indicating you have a finite data set using DataSet API. -Matthias On 05/22/2016 09:50 PM, Xtra Coder wrote: > Hello, > > Question from newbie about how Flink's WordCou

Re: Weird Kryo exception (Unable to find class: java.ttil.HashSet)

2016-05-23 Thread Flavio Pompermaier
Changing - taskmanager.memory.fraction, from 0.7 to 0.9 - taskmanager.memory.off-heap, from true to false - decreasing the slots of each tm from 2 to 1 I had this Exception: java.lang.Exception: The data preparation for task 'GroupReduce (GroupReduce at main(AciDataInference.java:331))'

Re: [RichFlattMapfunction] Configuration File

2016-05-23 Thread simon peyer
Hi Aljoscha Thanks for your reply. Regarding question 2, the web dashboard does provide a properties section, besides ( Plan Timeline

Re: Weird Kryo exception (Unable to find class: java.ttil.HashSet)

2016-05-23 Thread Maximilian Michels
Hi Flavio, These error messages are quite odd. Looks like an off by one error in the serializer/deserializer. Must be somehow related to the Kryo serialization stack because it doesn't seem to occur with Flink's serialization system. Does the job run fine if you don't register the custom Kryo ser

Re: Weird Kryo exception (Unable to find class: java.ttil.HashSet)

2016-05-23 Thread Flavio Pompermaier
With these last settings I was able to complete the job the second time I retried running it, without restarting the cluster. If I don't register the serializer for DateTime the job doesn't start at all (from Flink 1.x you have to register it [1]). I can't understand what's wrong :( [1] https://cw

Fwd: HDFS namenode and Flink

2016-05-23 Thread thomas
Hello flinkers, We will activate namenode HDFS high availability in our cluster, and I want to know if there is additional configuration for flink ? We actually use YARN for launching our flink application, and hdfs filesystem to store the state backend Thanks Thomas

Re: HDFS namenode and Flink

2016-05-23 Thread Stefano Baghino
I think the only keys of interest for your needs (highly available with HDFS state backend) are state.backend: filesystem state.backend.fs.checkpointdir: hdfs:///path/to/checkpoints # fill in according to your needs recovery.zookeeper.storageDir: /path/to/znode # again, fill in according to your n
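The keys Stefano lists would go in flink-conf.yaml. A minimal sketch of such a fragment, assuming Flink 1.0-era key names; all paths are placeholders to fill in for your cluster, and the recovery.mode line is an assumed addition needed to enable the recovery settings:

```yaml
# flink-conf.yaml fragment (Flink 1.0-era key names; paths are placeholders)
state.backend: filesystem
state.backend.fs.checkpointdir: hdfs:///path/to/checkpoints  # fill in according to your needs
recovery.mode: zookeeper                                     # assumed: enables the recovery.* settings
recovery.zookeeper.storageDir: /path/to/recovery             # again, fill in according to your needs
```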

Re: HDFS namenode and Flink

2016-05-23 Thread Stefano Baghino
One last quick note: if you're going to run individual jobs on YARN instead of a long running session, make sure you provide each job with a separate set of directories for (surely) ZK storage and (possibly*) state backend, otherwise the state of the jobs will end up entangled and you may experienc

Re: problem of sharing TCP connection when transferring data

2016-05-23 Thread Ufuk Celebi
Yes, that is a correct description of the state of things. A way to improve this is to introduce flow control in the application layer, where consumers only receive buffers when they have buffers available. They could announce on the channel how many buffers they have before they receive anything.

Re: [RichFlattMapfunction] Configuration File

2016-05-23 Thread Maximilian Michels
Hi Simon, As Aljoscha said, the best way is to supply the configuration as class fields. Alternatively, if you overload the open(..) method, it should also show up in the Properties/Configuration tab on the Web interface. Cheers, Max On Mon, May 23, 2016 at 11:43 AM, simon peyer wrote: > Hi Alj

Re: [RichFlattMapfunction] Configuration File

2016-05-23 Thread simon peyer
Hi Max Thanks a lot. I found now this solution: Passing it as a Configuration object to single functions The example below shows how to pass the parameters a

Combining streams with static data and using REST API as a sink

2016-05-23 Thread Josh
Hi all, I am new to Flink and have a couple of questions which I've had trouble finding answers to online. Any advice would be much appreciated! 1. What's a typical way of handling the scenario where you want to join streaming data with a (relatively) static data source? For example, if I

keyBy on a collection of Pojos

2016-05-23 Thread Al-Isawi Rami
Hi, I was trying to test some specific issue, but now I cannot seem to get the very basic case working. It is most likely that I am blind to something, would anyone have quick look at it? https://gist.github.com/rami-alisawi/d6ff33ae2d4d6e7bb1f8b329e3e5fa77 It is just a collection of pojos wher

Re: keyBy on a collection of Pojos

2016-05-23 Thread Flavio Pompermaier
*Conditions* for a class to be treated as a POJO by Flink: - The class must be public - It must have a public constructor without arguments - All fields either have to be public or there must be getters and setters for all non-public fields. If the field name is foo the getter and s
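A minimal sketch of a class meeting these conditions. The name follows the PojoExample class from this thread, but the private field and its accessors are added here purely to illustrate the getter/setter rule for non-public fields:

```java
public class PojoExample {
    public int count;                 // public field: no accessors required
    private String productId;        // non-public field: needs a getter and setter

    public PojoExample() {}          // public no-argument constructor

    public String getProductId() { return productId; }
    public void setProductId(String productId) { this.productId = productId; }

    public static void main(String[] args) {
        PojoExample p = new PojoExample();
        p.count = 2;
        p.setProductId("widget-1");
        System.out.println(p.getProductId() + ":" + p.count); // prints widget-1:2
    }
}
```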

Re: Combining streams with static data and using REST API as a sink

2016-05-23 Thread Al-Isawi Rami
Hi Josh, I am no expert in Flink yet, but here are my thoughts on this: 1. What if you stream an event to Flink every time the DB of items has an update? Then in some background thread you get the new data from the DB, be it through REST (if there are only a few updates a day), then load the res

Re: keyBy on a collection of Pojos

2016-05-23 Thread Al-Isawi Rami
Thanks Flavio, but as you can see in my code I have already declared my pojo to achieve those conditions: public class PojoExample { public int count; public String productId; public PojoExample() { } } So it cannot be that. -Rami On 23 May 2016, at 16:30, Flavi

Import Configuration File in Flink Cluster

2016-05-23 Thread simon peyer
Hi together Currently I'm using flink on a docker cluster on AWS. I would like to use property files with the integrated ParameterTool.fromPropertiesFile function of Flink. Locally this version works absolutely fine: val configuration = ParameterTool.fromPropertiesFile("src/main/resources/confi

Re: Combining streams with static data and using REST API as a sink

2016-05-23 Thread Josh
Hi Rami, Thanks for the fast reply. 1. In your solution, would I need to create a new stream for 'item updates', and add it as a source of my Flink job? Then I would need to ensure item updates get broadcast to all nodes that are running my job and use them to update the in-memory ite

Re: keyBy on a collection of Pojos

2016-05-23 Thread Flavio Pompermaier
You don't have getters and setters for count and productId. Your class should be public class PojoExample { public int count; public String productId; public PojoExample() {} public int getCount() { return count; } public void setCount(int count) { this.count = count; } public String getProdu

Re: Weird Kryo exception (Unable to find class: java.ttil.HashSet)

2016-05-23 Thread Maximilian Michels
What error do you get when you don't register the Kryo serializer? On Mon, May 23, 2016 at 11:57 AM, Flavio Pompermaier wrote: > With this last settings I was able to terminate the job the second time I > retried to run it, without restarting the cluster.. > If I don't register the serializer for

Re: Import Configuration File in Flink Cluster

2016-05-23 Thread Maximilian Michels
Hi Simon, You'll have to write the property file to disk first to load it using the ParameterTool.fromPropertiesFile method. For example: // copy config from Java resource to a file File configOnDisk = new File("/path/to/config.properties"); Files.copy(getClass.getClassLoader.getResourceAsStream
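The copy-to-disk step Max describes can be sketched with plain JDK I/O. This sketch stands in a ByteArrayInputStream for the real getClass().getClassLoader().getResourceAsStream("config.properties") call so it is self-contained, and shows the ParameterTool.fromPropertiesFile call only as a comment since it needs the Flink dependency:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Properties;

public class ConfigFromResource {
    // Copy a classpath resource to a real file on disk, because
    // ParameterTool.fromPropertiesFile expects a filesystem path.
    public static Path materialize(InputStream resource) throws IOException {
        Path onDisk = Files.createTempFile("config", ".properties");
        Files.copy(resource, onDisk, StandardCopyOption.REPLACE_EXISTING);
        return onDisk;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for getResourceAsStream("config.properties")
        InputStream resource =
            new ByteArrayInputStream("input.path=/data/in\n".getBytes("UTF-8"));
        Path path = materialize(resource);
        // In a Flink job you would now call:
        // ParameterTool config = ParameterTool.fromPropertiesFile(path.toString());
        Properties props = new Properties();
        props.load(Files.newInputStream(path));
        System.out.println(props.getProperty("input.path")); // prints /data/in
    }
}
```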

Re: Import Configuration File in Flink Cluster

2016-05-23 Thread Stefano Baghino
Are you using Maven to package your project? I believe the resources plugin[1] can suit your needs. [1]: http://maven.apache.org/plugins/maven-resources-plugin/examples/include-exclude.html On Mon, May 23, 2016 at 3:56 PM, simon peyer wrote: > Hi together > > Currently I'm using flink on a dock

Re: Import Configuration File in Flink Cluster

2016-05-23 Thread simon peyer
Hi @Max So for each file in the src/main/resources folder, I first have to create a new file, copy the file from the resources folder to this new file and then I'm able to parse it? @Stefano I think the files in src/main/resources are integrated automatically right? Or am I missing something

Re: keyBy on a collection of Pojos

2016-05-23 Thread Al-Isawi Rami
Thanks, setters and getters for public fields have no purpose. Also, per the conditions you mentioned: "All fields either have to be public or there must be getters and setters for all non-public fields." Since my fields are declared public, there is no impact from adding getters and setters.

Re: keyBy on a collection of Pojos

2016-05-23 Thread Flavio Pompermaier
Sorry Rami, you're right :) Unfortunately I've never used Flink streaming, so I cannot be of much help there. Maybe it is something related to the default triggering policy of the streaming environment? On Mon, May 23, 2016 at 5:06 PM, Al-Isawi Rami wrote: > Thanks, setters and getters for public fi

Re: keyBy on a collection of Pojos

2016-05-23 Thread Deepak Sharma
Can you try serializing your POJO? Thanks Deepak On Mon, May 23, 2016 at 8:42 PM, Flavio Pompermaier wrote: > Sorry Rami, you're right :) > Unfortunately I've never used Flink streaming, so I cannot be of much help > there. > Maybe it is something related to the default triggering policy of the >

Re: Combining streams with static data and using REST API as a sink

2016-05-23 Thread Al-Isawi Rami
Hi, 1. I have no experience in broadcast variables, I suggest you give it a try. 2. I misunderstood you, I thought you were calling for Flink to serve the results and become REST API provider, where others can call those API. What you are saying now is that you want a sink that does HTTP calls

Re: writeAsCSV with partitionBy

2016-05-23 Thread KirstiLaurila
Are there any plans to implement this kind of feature (the possibility to write data to specified partitions) in the near future? -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/writeAsCSV-with-partitionBy-tp4893p7099.html Sent from the Apache Fli

Re: Import Configuration File in Flink Cluster

2016-05-23 Thread Maximilian Michels
Hi Simon, AFAIK this is the way to go. We could add a method to the ParameterTool which loads from a resource to make it more convenient. Cheers, Max On Mon, May 23, 2016 at 4:42 PM, simon peyer wrote: > Hi > > @Max > So for each file in the src/main/resources folder, I first have to create a >

Re: Import Configuration File in Flink Cluster

2016-05-23 Thread simon peyer
Hi Max Thanks a lot for your helpful answer. It now works on the cluster. It would be great to have a method for loading from resources. -Cheers Simon > On 23 May 2016, at 17:52, Maximilian Michels wrote: > > Hi Simon, > > AFAIK this is the way to go. We could add a method to the > Paramete

Re: Logging Exceptions

2016-05-23 Thread David Kim
Hello! Just wanted to check up on this. :) I grepped around for `log.error` and it *seems* that currently the only events for logging out exceptions are for non-application related errors. Thanks! David On Fri, May 20, 2016 at 12:35 PM David Kim wrote: > Hello! > > Using flink 1.0.2, I notice

Re: Weird Kryo exception (Unable to find class: java.ttil.HashSet)

2016-05-23 Thread Flavio Pompermaier
You can try with this: import org.apache.flink.api.java.ExecutionEnvironment; import org.joda.time.DateTime; import de.javakaffee.kryoserializers.jodatime.JodaDateTimeSerializer; public class DateTimeError { public static void main(String[] args) throws Exception { ExecutionEnvironm
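A hedged reconstruction of the truncated snippet above, assuming the intent is the usual ExecutionEnvironment setup with JodaDateTimeSerializer registered for DateTime; the fromElements data is illustrative, not from the original mail:

```java
import org.apache.flink.api.java.ExecutionEnvironment;
import org.joda.time.DateTime;
import de.javakaffee.kryoserializers.jodatime.JodaDateTimeSerializer;

public class DateTimeError {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        // Register a Kryo serializer for Joda DateTime; without this,
        // Kryo's default handling of DateTime can fail at runtime.
        env.getConfig().registerTypeWithKryoSerializer(
            DateTime.class, JodaDateTimeSerializer.class);
        env.fromElements(new DateTime(), new DateTime())
           .print();
    }
}
```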

Re: problem of sharing TCP connection when transferring data

2016-05-23 Thread wangzhijiang999
Hi Ufuk, Thank you for the detailed explanation! As we confirmed, the task will set autoread to false for the shared channel when no segment buffer is available. Furthermore, when this task has an available buffer again, it will trigger an event to set autoread to true. But in some

Re: Combining streams with static data and using REST API as a sink

2016-05-23 Thread Maximilian Michels
Hi Josh, 1) Use a RichFunction which has an `open()` method to load data (e.g. from a database) at runtime before the processing starts. 2) No that's fine. If you want your Rest API Sink to interplay with checkpointing (for fault-tolerance), this is a bit tricky though depending on the guarantees
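Max's point 1 can be sketched as a RichFunction that loads its lookup data in open(), which runs once per parallel instance before any records are processed. Everything here is illustrative: loadFromDatabase() is a hypothetical placeholder for whatever database or REST lookup you actually use:

```java
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;
import java.util.HashMap;
import java.util.Map;

public class EnrichWithStaticData extends RichMapFunction<String, String> {
    private transient Map<String, String> items;

    @Override
    public void open(Configuration parameters) {
        // Load the (relatively) static data once, before processing starts.
        items = loadFromDatabase();
    }

    @Override
    public String map(String itemId) {
        // Join each streamed record against the in-memory lookup table.
        String meta = items.getOrDefault(itemId, "unknown");
        return itemId + "," + meta;
    }

    // Hypothetical placeholder for a JDBC/REST lookup; returns stand-in data.
    private Map<String, String> loadFromDatabase() {
        Map<String, String> m = new HashMap<>();
        m.put("item-1", "some item metadata");
        return m;
    }
}
```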

Re: Logging Exceptions

2016-05-23 Thread Maximilian Michels
Hi David, I'm afraid Flink logs all exceptions. You'll find the exceptions in the /log directory. Cheers, Max On Mon, May 23, 2016 at 6:18 PM, David Kim wrote: > Hello! > > Just wanted to check up on this. :) > > I grepped around for `log.error` and it *seems* that currently the only > events

Re: problem of sharing TCP connection when transferring data

2016-05-23 Thread Ufuk Celebi
On Mon, May 23, 2016 at 6:55 PM, wangzhijiang999 wrote: >In summary, if one task set autoread as false, and when it notify the > available buffer, there are some messages during this time to be processed > first, if one message belongs to another failed task, the autoread for this > channel wo

Re: [RichFlattMapfunction] Configuration File

2016-05-23 Thread Maximilian Michels
Hi Simon, Great! I think this is only available in the DataSet API. Cheers, Max On Mon, May 23, 2016 at 2:23 PM, simon peyer wrote: > Hi Max > > Thanks a lot. > I found now this solution: > > Passing it as a Configuration object to single functions > > The example below shows how to pass the pa

Re: problem of sharing TCP connection when transferring data

2016-05-23 Thread Deepak Sharma
I am not a Flink master or regular user of Flink, but would like to start contributing to Flink. Would it be possible to get involved in these issues and start contributing to the Flink community? Thanks Deepak On Mon, May 23, 2016 at 10:49 PM, Ufuk Celebi wrote: > On Mon, May 23, 2016 at 6:55 PM, wa

Re: keyBy on a collection of Pojos

2016-05-23 Thread Fabian Hueske
Actually, the program works correctly (according to the DataStream API). Let me explain what happens: 1) You do not initialize the count variable, so it will be 0 (summing 0s results in 0). 2) DataStreams are considered to be unbounded (have an infinite size). KeyBy does not group the records because
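Fabian's two points can be sketched as follows: initialize count, and bound the aggregation with a window so sum() can emit per-key results. The window size and input data are illustrative, and the POJO is the one from this thread with an extra convenience constructor added:

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;

public class KeyByWindowExample {
    // POJO from the thread, with count actually initialized (point 1)
    public static class PojoExample {
        public int count;
        public String productId;
        public PojoExample() {}
        public PojoExample(String productId, int count) {
            this.productId = productId;
            this.count = count;
        }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
            StreamExecutionEnvironment.getExecutionEnvironment();
        env.fromElements(new PojoExample("a", 1),
                         new PojoExample("a", 2),
                         new PojoExample("b", 1))
           // keyBy only partitions an unbounded stream; a window bounds
           // the aggregation so sum() can emit a result per key (point 2)
           .keyBy("productId")
           .timeWindow(Time.seconds(5))
           .sum("count")
           .print();
        env.execute("keyBy + window sum");
    }
}
```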

Re: writeAsCSV with partitionBy

2016-05-23 Thread Fabian Hueske
Hi Kirsti, I'm not aware of anybody working on this issue. Would you like to create a JIRA issue for it? Best, Fabian 2016-05-23 16:56 GMT+02:00 KirstiLaurila : > Is there any plans to implement this kind of feature (possibility to write > to > data specified partitions) in the near future? > >

Re: Combining streams with static data and using REST API as a sink

2016-05-23 Thread Josh
Hi Max, Thanks, that's very helpful re the REST API sink. For now I don't need exactly once guarantees for the sink, so I'll just write a simple HTTP sink implementation. But may need to move to the idempotent version in future! For 1), that sounds like a simple/easy solution, but how would I han

Re: problem of sharing TCP connection when transferring data

2016-05-23 Thread Ufuk Celebi
On Mon, May 23, 2016 at 7:30 PM, Deepak Sharma wrote: > Would it be possible to get involved on this issues and start contributing > to Flink community? Hey Deepak! Nice to see that you are also interested in this. If you are new to Flink I would recommend to start contributing by looking into o

Re: problem of sharing TCP connection when transferring data

2016-05-23 Thread Deepak Sharma
Thanks Ufuk. Sure I would pick some starter issues and get to these low level issues. -Deepak On 23 May 2016 11:21 pm, "Ufuk Celebi" wrote: > On Mon, May 23, 2016 at 7:30 PM, Deepak Sharma > wrote: > > Would it be possible to get involved on this issues and start > contributing > > to Flink com

Re: Logging Exceptions

2016-05-23 Thread David Kim
Hi Max! Unfortunately, that's not the behavior I'm seeing. I verified my log4j.properties is configured properly because I do see messages in the /log directory. However, for this stack trace (grabbed from the web dashboard), I do not see it in my log file: java.lang.RuntimeException: Could not

Re: Import Configuration File in Flink Cluster

2016-05-23 Thread Bajaj, Abhinav
I was gonna post the exact question and noticed this thread. It will be great if we can have a method in parameter tool to load from resources. Thanks Simon :) Abhinav Bajaj Senior Engineer HERE Predictive Analytics Office: +12062092767 Mobile: +17083299516 HERE Seattle 701 Pike Street, #2000,

Re: HDFS namenode and Flink

2016-05-23 Thread thomas
Ok, we have all this configuration set up, so it will be fine :-) Thanks for your response! Thomas

Re: writeAsCSV with partitionBy

2016-05-23 Thread KirstiLaurila
Yeah, created this one https://issues.apache.org/jira/browse/FLINK-3961 -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/writeAsCSV-with-partitionBy-tp4893p7118.html Sent from the Apache F