Re: Additional data read inside dataset transformations

2017-09-07 Thread Fabian Hueske
Hi, traditionally you would do a join, but that would mean reading all Parquet files that might contain relevant data, which might be too much. If you want to read data from within a user function (like GroupReduce), you are pretty much on your own. You could create a HadoopInputFormat
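[Editor's note: the reply is cut off in the archive. For context, below is a minimal sketch of what reading side data from inside a user function could look like: a RichGroupReduceFunction that buffers its group, then reads one group-specific Parquet file. It uses AvroParquetReader directly for brevity rather than wiring up a full HadoopInputFormat as the reply suggests. The path layout, field names ("id", "label") and tuple types are illustrative assumptions, not from the original thread.]

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.avro.generic.GenericRecord;
import org.apache.flink.api.common.functions.RichGroupReduceFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetReader;
import org.apache.parquet.hadoop.ParquetReader;

/**
 * Sketch: a group reduce that pulls extra reference data from a Parquet file
 * chosen per group, instead of joining against all Parquet files up front.
 */
public class EnrichingGroupReducer
        extends RichGroupReduceFunction<Tuple2<String, Long>, Tuple2<String, String>> {

    // Hypothetical layout: one side-data Parquet file per group key under this base path.
    private final String sideDataBasePath;

    public EnrichingGroupReducer(String sideDataBasePath) {
        this.sideDataBasePath = sideDataBasePath;
    }

    @Override
    public void reduce(Iterable<Tuple2<String, Long>> values,
                       Collector<Tuple2<String, String>> out) throws Exception {
        // The group iterable can only be traversed once, so buffer it first.
        List<Tuple2<String, Long>> buffered = new ArrayList<>();
        for (Tuple2<String, Long> v : values) {
            buffered.add(v);
        }
        String groupKey = buffered.get(0).f0;

        // Read only the Parquet file relevant for this group (field names are assumptions).
        Map<Long, String> sideData = new HashMap<>();
        Path sideFile = new Path(sideDataBasePath + groupKey + ".parquet");
        try (ParquetReader<GenericRecord> reader =
                 AvroParquetReader.<GenericRecord>builder(sideFile).build()) {
            GenericRecord record;
            while ((record = reader.read()) != null) {
                sideData.put((Long) record.get("id"), record.get("label").toString());
            }
        }

        // Emit the buffered group elements enriched with the side data.
        for (Tuple2<String, Long> v : buffered) {
            out.collect(Tuple2.of(v.f0, sideData.getOrDefault(v.f1, "unknown")));
        }
    }
}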

Additional data read inside dataset transformations

2017-09-07 Thread eSKa
Hello, I will describe my use case briefly, with steps for easier understanding: 1) currently my job loads data from Parquet files using HadoopInputFormat along with AvroParquetInputFormat. The current approach: AvroParquetInputFormat inputFormat = new AvroParquetInputFormat();
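[Editor's note: the message is truncated in the archive after the first line of code. A sketch of how this kind of setup typically looks with Flink's DataSet API and the Hadoop compatibility wrapper follows; the input path, the GenericRecord value type, and the job structure are assumptions for illustration, not the poster's actual code.]

import org.apache.avro.generic.GenericRecord;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.hadoop.mapreduce.HadoopInputFormat;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.parquet.avro.AvroParquetInputFormat;

public class ParquetSourceJob {

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Wrap the mapreduce AvroParquetInputFormat in Flink's Hadoop compatibility layer.
        // Parquet's mapreduce input formats use Void as the key type; the value type
        // (GenericRecord here) depends on the Avro schema / record class actually used.
        Job job = Job.getInstance();
        AvroParquetInputFormat<GenericRecord> avroParquetInputFormat = new AvroParquetInputFormat<>();
        HadoopInputFormat<Void, GenericRecord> hadoopInputFormat =
                new HadoopInputFormat<>(avroParquetInputFormat, Void.class, GenericRecord.class, job);

        // Placeholder path; in the original job this would point at the Parquet files.
        FileInputFormat.addInputPath(job, new Path("hdfs:///data/events"));

        DataSet<Tuple2<Void, GenericRecord>> records = env.createInput(hadoopInputFormat);

        // Trigger execution and show a small sample.
        records.first(10).print();
    }
}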