UTF-16 support for TextInputFormat

2018-08-08 Thread David Dreyfus
Hello - It does not appear that Flink supports a charset encoding of "UTF-16". It particular, it doesn't appear that Flink consumes the Byte Order Mark (BOM) to establish whether a UTF-16 file is UTF-16LE or UTF-16BE. Are there any plans to enhance Flink to handle UTF-16 with BOM? Thank you, Davi

Re: UTF-16 support for TextInputFormat

2018-08-09 Thread David Dreyfus
f = ... > tif.setCharsetName("UTF-16"); > > Best, Fabian > > 2018-08-08 17:45 GMT+02:00 David Dreyfus : > >> Hello - >> >> It does not appear that Flink supports a charset encoding of "UTF-16". It >> particular, it doesn't appear th

Re: UTF-16 support for TextInputFormat

2018-08-13 Thread David Dreyfus
> > [1] > https://github.com/apache/flink/blob/master/flink-java/src/main/java/org/apache/flink/api/java/io/TextInputFormat.java#L95 > > 2018-08-09 14:55 GMT+02:00 David Dreyfus : > >> Hi Fabian, >> >> Thank you for taking my email. >> TextInputFormat.s

Tasks, slots, and partitioned joins

2017-10-25 Thread David Dreyfus
Hello - I have a large number of pairs of files. For purpose of discussion: /source1/{1..1} and /source2/{1..1}. I want to join the files pair-wise: /source1/1 joined to /source2/1, /source1/2 joined to /source2/2, and so on. I then want to union the results of the pair-wise joins and per

Re: Tasks, slots, and partitioned joins

2017-10-26 Thread David Dreyfus
Hi Fabian, Thank you for the great, detailed answers. 1. So, each parallel slice of the DAG is placed into one slot. The key to high utilization is many slices of the source data (or the various methods of repartitioning it). Yes? 2. In batch processing, are slots filled round-robin on task manag

Execute multiple jobs in parallel (threading): java.io.OptionalDataException

2017-10-26 Thread David Dreyfus
Hello, I am trying to submit multiple jobs to flink from within my Java program. I am running into an exception that may be related: java.io.OptionalDataException. Should I be able to create multiple plans/jobs within a single session and execute them concurrently? If so, is there a working examp

Re: Not enough free slots to run the job

2017-10-26 Thread David Dreyfus
Hello, I know this is an older thread, but ... If some slots are left empty it doesn't necessarily mean that machine resources are wasted. Some managed memory might be unavailable, but CPU, heap memory, network, and disk are shared across slots. To the extent there are multiple operators executin

Data sources and slices

2017-10-26 Thread David Dreyfus
Hello, If I am on a cluster with 2 task managers with 64 CPUs each, I can configure 128 slots in accordance with the documentation. If I set parallelism to 128 and read a 64 MB file (one datasource with a single file), will flink really create 500K slices? Or, will it check the default blocksize o