Why does it have to be a stream?

> On 18.11.2018 at 23:29, Nicolas Paris <nicolas.pa...@riseup.net> wrote:
>
> Hi
>
> I have PDFs to load into Spark in at least <filename, byte_array>
> format. I have considered some options:
>
> - Spark Streaming does not provide a native file stream for binary
>   files of variable size (binaryRecordsStream requires a constant
>   record size), so I would have to write my own receiver.
>
> - Structured Streaming can process Avro/Parquet/ORC files
>   containing PDFs, but this makes things more complicated than
>   monitoring a simple folder containing PDFs.
>
> - Kafka is not designed to handle messages > 100KB, and for this
>   reason it is not a good option to use in the stream pipeline.
>
> Does anybody have a suggestion?
>
> Thanks,
>
> --
> nicolas
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
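
If a periodic batch is acceptable instead of a stream, monitoring the folder can be reduced to a small polling step. The following is only a minimal sketch using the Python standard library; the names new_pdfs, seen, and demo are hypothetical and not part of Spark. It assumes files are fully written before they appear in the folder and are small enough to be read on the driver.

```python
import os
import tempfile


def new_pdfs(folder, seen):
    """Return (filename, bytes) pairs for PDFs in `folder` not yet in `seen`.

    `seen` is a set of filenames that is updated in place, so repeated
    calls return each file only once.
    """
    batch = []
    for name in sorted(os.listdir(folder)):
        # Skip non-PDF files and files handled by an earlier poll.
        if not name.lower().endswith(".pdf") or name in seen:
            continue
        with open(os.path.join(folder, name), "rb") as fh:
            batch.append((name, fh.read()))
        seen.add(name)
    return batch


def demo():
    """Poll a temporary folder twice and return both batches."""
    with tempfile.TemporaryDirectory() as folder:
        seen = set()
        with open(os.path.join(folder, "a.pdf"), "wb") as fh:
            fh.write(b"%PDF-1.4 a")
        first = new_pdfs(folder, seen)
        with open(os.path.join(folder, "b.pdf"), "wb") as fh:
            fh.write(b"%PDF-1.4 b")
        second = new_pdfs(folder, seen)
        return first, second


if __name__ == "__main__":
    print(demo())
```

With an existing SparkContext sc, each batch could then be handed to Spark with sc.parallelize(batch). For a one-off load of the whole folder, sc.binaryFiles(folder) already returns (path, content) pairs without any polling.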