Spark: Using "node-local" files within functions?

2015-04-14 Thread Horsmann, Tobias
Hi, I am trying to use Spark in combination with YARN with 3rd-party code which is unaware of distributed file systems. Providing HDFS file references thus does not work. My idea to resolve this issue was the following: Within a function I take the HDFS file reference I get as a parameter and co

Re: Spark: Using "node-local" files within functions?

2015-04-14 Thread Sandy Ryza
Hi Tobias, It should be possible to get an InputStream from an HDFS file. However, if your libraries only work directly on files, then maybe that wouldn't work? If that's the case and different tasks need different files, your way is probably the best way. If all tasks need the same file, a bett
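
To illustrate the case discussed above, where a library only works on real files but the data arrives as a stream, here is a minimal sketch in plain Python. It is an illustration under assumptions, not the approach either poster actually used: `with_local_copy` and `process_path` are hypothetical names, and the stream is any binary file-like object. In a real Spark job the stream would have to come from an HDFS client (for example, Hadoop's Java API exposes `FileSystem.open`); that wiring is not shown here.

```python
import io
import os
import shutil
import tempfile
from typing import BinaryIO, Callable, TypeVar

T = TypeVar("T")


def with_local_copy(stream: BinaryIO,
                    process_path: Callable[[str], T],
                    suffix: str = "") -> T:
    """Copy `stream` to a node-local temp file, call `process_path` with
    the file's path, and delete the file afterwards.

    `process_path` stands in for third-party code that only accepts a
    local file path (hypothetical placeholder).
    """
    # mkstemp creates the file in the local temp directory of whichever
    # machine runs this function.
    fd, local_path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as out:
            # Stream the bytes in chunks instead of loading them all at once.
            shutil.copyfileobj(stream, out)
        return process_path(local_path)
    finally:
        # Remove the local copy even if the third-party code raises.
        os.remove(local_path)


if __name__ == "__main__":
    # Stand-in for a stream opened from a distributed file system.
    fake_remote = io.BytesIO(b"line one\nline two\n")

    def count_lines(path: str) -> int:
        # Stand-in for a path-only library call.
        with open(path, "rb") as f:
            return sum(1 for _ in f)

    print(count_lines.__name__, with_local_copy(fake_remote, count_lines))
```

Because the temp file is created and removed inside the function, each task cleans up after itself, at the cost of copying the data once per call.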