sorting data into sink

2018-03-13 Thread Telco Phone
Does anyone know if this is a correct assumption: DataStream sorted = stream.keyBy("partition"); Will this automatically route records with the same key to the same sink thread? The behavior I am seeing is that a sink set up with multiple threads is seeing data from the same hour. Any good examples of how to sort data so
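For reference, keyBy does guarantee that all records with the same key reach the same downstream subtask: the key is hashed to a key group, and key groups are split into contiguous ranges, one per subtask. A simplified plain-Java sketch of that routing (the real logic in Flink's KeyGroupRangeAssignment uses a murmur hash; plain hashCode stands in here for illustration):

```java
// Simplified sketch of Flink-style key-to-subtask routing.
// Assumption: hashCode stands in for Flink's murmur hash.
public class KeyRouting {
    // Map a key to one of maxParallelism key groups.
    static int keyGroup(Object key, int maxParallelism) {
        return Math.abs(key.hashCode() % maxParallelism);
    }

    // Key groups are assigned to subtasks as contiguous ranges.
    static int subtaskFor(Object key, int maxParallelism, int parallelism) {
        return keyGroup(key, maxParallelism) * parallelism / maxParallelism;
    }

    public static void main(String[] args) {
        // The same key always routes to the same subtask.
        int first = subtaskFor("partition-7", 128, 4);
        int second = subtaskFor("partition-7", 128, 4);
        System.out.println(first == second);
    }
}
```

The consequence is the one asked about: a given key always lands on the same sink subtask, but distinct keys may share a subtask, so one sink thread can still see several hours' worth of keys.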

count and window question with kafka

2017-10-30 Thread Telco Phone
I have a process that will take 250,000 records from Kafka and produce a file. (Using a CustomFileSync) Currently I just have the following: DataStream stream = env.addSource(new FlinkKafkaConsumer010("topic", schema, properties)).setParallelism(40).flatMap(new SchemaRecordSplit()).setParalle
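A count window (e.g. countWindow on a keyed stream) is the usual Flink way to emit one file per 250,000 records; the underlying batching idea can be sketched in plain Java, independent of Flink. The flush callback below is a hypothetical stand-in for the custom file sink:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Sketch of count-based batching: buffer records and flush every N.
public class CountBatcher<T> {
    private final int batchSize;
    private final Consumer<List<T>> flush; // stand-in for writing one file
    private final List<T> buffer = new ArrayList<>();

    public CountBatcher(int batchSize, Consumer<List<T>> flush) {
        this.batchSize = batchSize;
        this.flush = flush;
    }

    public void add(T record) {
        buffer.add(record);
        if (buffer.size() >= batchSize) {
            flush.accept(new ArrayList<>(buffer)); // one "file" per full batch
            buffer.clear();
        }
    }
}
```

In Flink the same effect comes from the window operator rather than a hand-rolled buffer, and an incomplete trailing batch is a real concern there too (it sits in state until the count is reached).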

Processing files

2017-10-23 Thread Telco Phone
All, I'm looking to process files in a directory as they arrive via file transfer. The files are renamed to a .DONE extension once the transfer is complete. These are binary files and I need to process billions of records per day. What I want to do is process the file and then create a new file called .
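The .DONE rename is a good readiness signal: only pick up files that already carry the suffix, then rename them again after processing so they are not re-read. A minimal plain-Java sketch of that loop (the post-processing suffix .PROCESSED is illustrative, since the target name in the thread is truncated):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

// Sketch: scan for fully-transferred files (*.DONE), then rename
// each one after processing so it is not picked up twice.
public class DoneFileScanner {
    static List<Path> findDoneFiles(Path dir) {
        List<Path> ready = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, "*.DONE")) {
            for (Path p : files) ready.add(p);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return ready;
    }

    static Path markProcessed(Path doneFile) {
        // Illustrative suffix; the thread's actual target name is truncated.
        Path target = doneFile.resolveSibling(doneFile.getFileName() + ".PROCESSED");
        try {
            return Files.move(doneFile, target, StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Helper that builds a demo directory with one ready and one in-flight file.
    static Path setupDemoDir() {
        try {
            Path dir = Files.createTempDirectory("scan");
            Files.createFile(dir.resolve("a.DONE"));
            Files.createFile(dir.resolve("b.partial")); // transfer still running
            return dir;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The rename-based handshake matters because it is atomic on the same filesystem, so a scanner never observes a half-written file.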

Re: Key by Task number

2017-04-18 Thread Telco Phone
number of partitions in your topic. If you have 240 partitions that's fine, but if you have fewer, some subtasks will be idle. Only one subtask can read from a given partition at a time. On Tue, Apr 18, 2017 at 3:38 PM Telco Phone wrote: I am trying to use the task number as a
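The idle-subtask point can be made concrete: the Kafka source spreads partitions across source subtasks, and each partition goes to exactly one subtask, so with fewer partitions than parallelism the leftover subtasks get nothing. A simplified round-robin sketch (the actual FlinkKafkaConsumer assignment may differ in detail):

```java
// Sketch: round-robin assignment of Kafka partitions to source subtasks.
// With fewer partitions than subtasks, leftover subtasks stay idle.
public class PartitionAssignment {
    static int[] partitionsPerSubtask(int numPartitions, int parallelism) {
        int[] counts = new int[parallelism];
        for (int p = 0; p < numPartitions; p++) {
            counts[p % parallelism]++; // each partition read by one subtask
        }
        return counts;
    }

    static int idleSubtasks(int numPartitions, int parallelism) {
        int idle = 0;
        for (int c : partitionsPerSubtask(numPartitions, parallelism)) {
            if (c == 0) idle++;
        }
        return idle;
    }
}
```

With the thread's numbers: 240 partitions and parallelism 240 gives zero idle subtasks, while 100 partitions with parallelism 240 leaves 140 subtasks with nothing to read.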

Key by Task number

2017-04-18 Thread Telco Phone
I am trying to use the task number as a keyBy value to help fan out the workload reading from Kafka. Given: DataStream stream = env.addSource(new FlinkKafkaConsumer010("topicA", schema, properties)).setParallelism(240).flatMap(new SchemaRecordSplit()).se

Odd error

2017-03-22 Thread Telco Phone
Getting this: DataStream stream = env.addSource(new FlinkKafkaConsumer08<>("raw", schema, properties)).setParallelism(30).flatMap(new RecordSplit()).setParallelism(30).name("Raw splitter").keyBy("id","keyByHelper","schema"); Field expressio
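A "Field expression" error on keyBy("id","keyByHelper","schema") usually means Flink cannot resolve those names on the stream's type: field-expression keys require a Flink POJO, i.e. a public class with a public no-arg constructor whose key fields are public (or accessible via getters/setters). A plain-Java sketch of that kind of reflective check (the record's shape here is assumed from the thread, and this is a simplification of Flink's actual type analysis):

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Sketch of the POJO rules behind keyBy("id", "keyByHelper", "schema"):
// the named fields must be resolvable on the record type.
public class PojoKeyCheck {
    public static class SchemaRecord { // hypothetical shape, assumed from the thread
        public String id;
        public int keyByHelper;
        public String schema;
        public SchemaRecord() {} // required public no-arg constructor
    }

    static boolean hasPublicField(Class<?> type, String name) {
        try {
            Field f = type.getField(name); // finds public fields only
            return Modifier.isPublic(f.getModifiers());
        } catch (NoSuchFieldException e) {
            return false; // unresolvable name -> field-expression error in Flink
        }
    }
}
```

If any of the three names is private or misspelled on the actual record class, the keyBy call fails with exactly this kind of error.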

Threading issue

2017-03-21 Thread Telco Phone
I am looking to get readers from Kafka / keyBy and Sink working with all 60 threads. For the most part it is working correctly: DataStream stream = env.addSource(new FlinkKafkaConsumer08<>("kafkatopic", schema, properties)).setParallelism(60).flatMap(new SchemaRecordSplit()).setParallelism(60).n

access to key in sink

2016-12-24 Thread Telco Phone
I am trying to access the keyBy value in the "open" method of a RichSink. Is there a way to access the actual keyBy value in the RichSink? DataStream stream = env.addSource(new FlinkKafkaConsumer08<>("test", schema, properties)).setParallelism(1).keyBy("partition");
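The key cannot be read in open(): open() runs once per sink subtask before any record arrives, while a keyBy key only exists per record. The usual workaround is to re-derive (or carry) the key inside invoke(). A plain-Java sketch of that pattern, using a simplified stand-in for a RichSinkFunction:

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Function;

// Sketch: a keyBy key exists per record, not per subtask, so a sink
// must recover it in invoke() rather than in open().
public class KeyAwareSink<T, K> {
    private final Function<T, K> keySelector; // same extraction used for keyBy
    final Set<K> keysSeen = new HashSet<>();

    public KeyAwareSink(Function<T, K> keySelector) {
        this.keySelector = keySelector;
    }

    public void open() {
        // No key available here: this runs before any record is delivered.
    }

    public void invoke(T record) {
        K key = keySelector.apply(record); // key recovered per record
        keysSeen.add(key);
    }
}
```

One subtask may receive several distinct keys, which is another reason the key cannot be a property of open(): it is a property of each record.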