I am using FileSystem.get(URI uri, Configuration conf, String user) to create
FileSystem instances (LocalFileSystem in this case). As far as I know,
FileSystem keeps an internal cache of instances keyed on URI and user. So if I
call FileSystem.get(..) multiple times with the same URI and user, only one
LocalFileSystem instance should be created and cached. However, I observed
(with hadoop-core-1.0.0) that each call creates a new LocalFileSystem instance
and puts it in the cache, which leads to memory issues.
Please see the code below and let me know if I am doing something wrong.

Thanks

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class FileSystemCacheIssue {

    // Requests a FileSystem for the local file system on behalf of the given user.
    private static FileSystem getFileSystem(String user) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.default.name", "file:///");
        return FileSystem.get(new URI("file:///"), conf, user);
    }

    public static void main(String[] args) throws Exception {
        for (int i = 0; i < 1000; i++) {
            getFileSystem("himanshg");
        }
        FileSystem fs = getFileSystem("himanshg");
        System.out.println(fs.getClass().getCanonicalName());
        // Put a breakpoint here and look at the heap dump for the number of
        // LocalFileSystem instances. Ideally I expect it to be 1, but there
        // are 1001.
        System.out.println("Keep your debugger here and check.");
    }
}
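
To make the expectation above concrete, here is a minimal, self-contained
sketch of a cache keyed on URI and user. It does not use Hadoop and is not
Hadoop's implementation: CacheKeySketch, ValueKey, IdentityKey and the two
helper methods are hypothetical names invented for illustration. It contrasts
a key compared by value (one cached entry, which is what I expect) with a key
compared by object identity (one entry per call, which matches the 1001
instances observed). Whether the cache key in hadoop-core-1.0.0 really
contains an identity-compared component is an assumption here, not something
verified against the source.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch; none of these names come from Hadoop.
final class CacheKeySketch {

    // Key compared by value: same URI string and user name give equal keys.
    static final class ValueKey {
        final String uri;
        final String user;

        ValueKey(String uri, String user) {
            this.uri = uri;
            this.user = user;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof ValueKey)) {
                return false;
            }
            ValueKey other = (ValueKey) o;
            return uri.equals(other.uri) && user.equals(other.user);
        }

        @Override
        public int hashCode() {
            return Objects.hash(uri, user);
        }
    }

    // Key that inherits Object.equals/hashCode, so it is compared by identity.
    // Assumption: this stands in for a key holding a freshly created per-call object.
    static final class IdentityKey {
        final String uri;
        final String user;

        IdentityKey(String uri, String user) {
            this.uri = uri;
            this.user = user;
        }
    }

    // Simulates repeated lookups with value-based keys; returns the cache size.
    static int entriesWithValueKeys(int calls) {
        Map<ValueKey, Object> cache = new HashMap<>();
        for (int i = 0; i < calls; i++) {
            cache.computeIfAbsent(new ValueKey("file:///", "himanshg"), k -> new Object());
        }
        return cache.size();
    }

    // Simulates repeated lookups with identity-based keys; returns the cache size.
    static int entriesWithIdentityKeys(int calls) {
        Map<IdentityKey, Object> cache = new HashMap<>();
        for (int i = 0; i < calls; i++) {
            cache.computeIfAbsent(new IdentityKey("file:///", "himanshg"), k -> new Object());
        }
        return cache.size();
    }

    public static void main(String[] args) {
        System.out.println("value keys:    " + entriesWithValueKeys(1001));    // prints 1
        System.out.println("identity keys: " + entriesWithIdentityKeys(1001)); // prints 1001
    }
}
```

If the identity-key case is what is happening, every call would miss the cache
even though the URI and user name are identical, and the cache would grow by
one entry per call.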