Jay, I have not tried the bigtop hcfs tests. Any tips on how to get started with those?
Our configuration looks similar, except for the Gluster-specific options and for both *fs.default.name* (and *fs.defaultFS*), since we don't want OrangeFS to be the default fs for this Hadoop cluster. I don't think the problem is a configuration issue, as the tera* suite works. The problem is with how TestDFSIO determines its "fs" instance:

    FileSystem fs = FileSystem.get(config);

This forces the fs to be whatever *fs.defaultFS* names. Shouldn't TestDFSIO be capable of handling a non-default URI set via:

    -Dtest.build.data=ofs://test/user/$USER/TestDFSIO

I think TestDFSIO should instead use:

    public static FileSystem get(URI uri, Configuration conf)

with *uri* taken from the test.build.data property, if specified, or a sensible default built from the defaultFS scheme and authority plus the rest of the desired path. This means test.build.data should always be treated as a *URI* rather than a *String*, so that the default value returned by getBaseDir(), in class TestDFSIO, can be resolved against the defaultFS. Currently, this isn't the case:

    private static String getBaseDir(Configuration conf) {
        return conf.get("test.build.data", "/benchmarks/TestDFSIO");
    }

A rough sketch of the change I have in mind is below.
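Something along these lines, untested; the wrapper class and the getFs() helper below are mine purely for illustration (only getBaseDir() exists in TestDFSIO today):

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Illustrative sketch, not an actual patch against TestDFSIO.
    class TestDFSIOBaseDirSketch {

        // Treat test.build.data as a URI rather than a plain String. A bare
        // path like /benchmarks/TestDFSIO still resolves against fs.defaultFS,
        // so the current default behavior is preserved.
        private static Path getBaseDir(Configuration conf) {
            return new Path(conf.get("test.build.data", "/benchmarks/TestDFSIO"));
        }

        // Path.getFileSystem() calls FileSystem.get(path.toUri(), conf), so
        // -Dtest.build.data=ofs://test/user/denton/TestDFSIO would select
        // OrangeFS even while fs.defaultFS points at hdfs://dsci.
        static FileSystem getFs(Configuration conf) throws IOException {
            return getBaseDir(conf).getFileSystem(conf);
        }
    }

Every place TestDFSIO currently calls FileSystem.get(config) would then ask the base dir for its FileSystem, and the checkPath() test should pass because the paths and the fs would always agree.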
Thoughts?

Thanks,
Jeff

On Thu, Oct 2, 2014 at 4:02 PM, Jay Vyas <jayunit100.apa...@gmail.com> wrote:

> Hi Jeff. "Wrong FS" means that your configuration doesn't know how to
> bind ofs to the OrangeFS file system class.
>
> You can debug the configuration using fs.dumpConfiguration(....), and you
> will likely see references to hdfs in there.
>
> By the way, have you tried our bigtop hcfs tests yet? We now support over
> 100 Hadoop file system compatibility tests...
>
> You can see a good sample of what parameters should be set for an hcfs
> implementation here:
> https://github.com/gluster/glusterfs-hadoop/blob/master/conf/core-site.xml
>
> On Oct 2, 2014, at 12:42 PM, Jeffrey Denton <den...@clemson.edu> wrote:
>
> Hello all,
>
> I'm trying to run TestDFSIO against a file system other than the
> configured defaultFS, and it doesn't work for me:
>
> $ hadoop org.apache.hadoop.fs.TestDFSIO
>     -Dtest.build.data=ofs://test/user/$USER/TestDFSIO -write -nrFiles 1
>     -fileSize 10240
>
> 14/10/02 11:24:19 INFO fs.TestDFSIO: TestDFSIO.1.7
> 14/10/02 11:24:19 INFO fs.TestDFSIO: nrFiles = 1
> 14/10/02 11:24:19 INFO fs.TestDFSIO: nrBytes (MB) = 10240.0
> 14/10/02 11:24:19 INFO fs.TestDFSIO: bufferSize = 1000000
> 14/10/02 11:24:19 INFO fs.TestDFSIO: baseDir = ofs://test/user/denton/TestDFSIO
> 14/10/02 11:24:19 WARN util.NativeCodeLoader: Unable to load native-hadoop
> library for your platform... using builtin-java classes where applicable
> 14/10/02 11:24:20 WARN hdfs.BlockReaderLocal: The short-circuit local
> reads feature cannot be used because libhadoop cannot be loaded.
> 14/10/02 11:24:20 INFO fs.TestDFSIO: creating control file: 10737418240
> bytes, 1 files
>
> *java.lang.IllegalArgumentException: Wrong FS:
> ofs://test/user/denton/TestDFSIO/io_control, expected: hdfs://dsci*
>     at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:643)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:191)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.access$000(DistributedFileSystem.java:102)
>     at org.apache.hadoop.hdfs.DistributedFileSystem$11.doCall(DistributedFileSystem.java:595)
>     at org.apache.hadoop.hdfs.DistributedFileSystem$11.doCall(DistributedFileSystem.java:591)
>     at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.delete(DistributedFileSystem.java:591)
>     at org.apache.hadoop.fs.TestDFSIO.createControlFile(TestDFSIO.java:290)
>     at org.apache.hadoop.fs.TestDFSIO.run(TestDFSIO.java:751)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>     at org.apache.hadoop.fs.TestDFSIO.main(TestDFSIO.java:650)
>
> At Clemson University, we're running HDP-2.1 (Hadoop 2.4.0.2.1) on 16
> data nodes and 3 separate master nodes for the resource manager and two
> namenodes; for this test, however, the data nodes are really being used
> to run the map tasks, with job output being written to 16 separate
> OrangeFS servers.
>
> Ideally, we would like the 16 HDFS data nodes and two namenodes to be the
> defaultFS, but we would also like the capability to run jobs using other
> OrangeFS installations.
>
> The above error does not occur when OrangeFS is configured to be the
> defaultFS. Also, we have no problems running teragen/terasort/teravalidate
> when OrangeFS IS NOT the defaultFS.
>
> So, is it possible to run TestDFSIO using a FS other than the defaultFS?
>
> If you're interested in the OrangeFS classes, they can be found here:
> http://www.orangefs.org/svn/orangefs/branches/denton.hadoop2.trunk/src/client/hadoop/orangefs-hadoop2/src/main/java/org/apache/hadoop/fs/ofs/
>
> I have not yet run any of the FS tests released with 2.5.1, but hope to
> soon:
> http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/filesystem/testing.html
>
> Regards,
>
> Jeff Denton
> OrangeFS Developer
> Clemson University
> den...@clemson.edu
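P.S. For anyone else hitting the same "Wrong FS" error: here is the kind of standalone sanity check Jay's debugging suggestion boils down to. The class is a throwaway sketch of mine; dumpConfiguration() is the static method on Configuration:

    import java.io.OutputStreamWriter;
    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;

    // Throwaway check that a non-default scheme like ofs:// is bound.
    public class CheckOfsBinding {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();

            // Dump the effective configuration; fs.defaultFS and any
            // fs.*.impl bindings will show up in this output.
            Configuration.dumpConfiguration(conf, new OutputStreamWriter(System.out));

            // Resolve the ofs scheme explicitly. This should print the
            // OrangeFS FileSystem class, not DistributedFileSystem.
            FileSystem fs = FileSystem.get(URI.create("ofs://test/"), conf);
            System.out.println(fs.getClass().getName());
        }
    }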