Please see also:
http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/CentralizedCacheManagement.html

According to Chris Nauroth, an HDFS committer, the feature is extremely
difficult to use correctly.

The feature also adds operational complexity. Because the cache uses
off-heap memory, it is easy to over-commit RAM on the host, leading to
out-of-memory failures that are hard to debug.
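If you do try it, it helps to verify that directive bytes are actually
being cached and to cap the DataNode's locked memory explicitly. A rough
sketch (the pool name matches the one below; the size value is illustrative):

```shell
# Cap how much RAM each DataNode may mlock for caching, in hdfs-site.xml:
#   <property>
#     <name>dfs.datanode.max.locked.memory</name>
#     <value>2147483648</value>   <!-- 2 GB, in bytes -->
#   </property>
# The DataNode user's locked-memory ulimit must be at least that value:
ulimit -l

# Compare BYTES_NEEDED vs. BYTES_CACHED to see whether the data was
# actually pinned in memory:
hdfs cacheadmin -listDirectives -stats -pool hibench
hdfs cacheadmin -listPools -stats
```

If BYTES_CACHED stays at zero, reads are still coming from disk regardless
of what the client does.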

Cheers

On Mon, Jan 25, 2016 at 1:39 PM, Ted Yu <yuzhih...@gmail.com> wrote:

> Have you read this thread ?
>
>
> http://search-hadoop.com/m/uOzYttXZcg1M6oKf2/HDFS+cache&subj=RE+hadoop+hdfs+cache+question+do+client+processes+share+cache+
>
> Cheers
>
> On Mon, Jan 25, 2016 at 1:23 PM, Jia Zou <jacqueline...@gmail.com> wrote:
>
>> I configured HDFS to cache files in HDFS's centralized cache, like the following:
>>
>> hdfs cacheadmin -addPool hibench
>>
>> hdfs cacheadmin -addDirective -path /HiBench/Kmeans/Input -pool hibench
>>
>>
>> But I didn't see much performance impact, no matter how I configured
>> dfs.datanode.max.locked.memory.
>>
>>
>> Is it possible that Spark doesn't know the data is in the HDFS cache
>> and still reads it from disk instead of from the cache?
>>
>>
>> Thanks!
>>
>> Jia
>>
>
>
