Elek, Marton created HDFS-13108:
-----------------------------------

             Summary: Ozone: OzoneFileSystem: Simplified url schema for Ozone 
File System
                 Key: HDFS-13108
                 URL: https://issues.apache.org/jira/browse/HDFS-13108
             Project: Hadoop HDFS
          Issue Type: Sub-task
          Components: ozone
    Affects Versions: HDFS-7240
            Reporter: Elek, Marton
            Assignee: Elek, Marton


A. Current state
 
1. The datanode host, volume and bucket must be defined in the defaultFS (e.g. 
o3://datanode:9864/test/bucket1)
2. The root of the file system points to the bucket (e.g. 'dfs -ls /' lists all the 
keys from bucket1)

It works very well, but there are some limitations.
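
A minimal sketch of the current wiring (fs.defaultFS is the standard Hadoop property; 
the class name is just an example and it assumes the OzoneFileSystem implementation is 
on the classpath and registered for the o3:// scheme):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class CurrentOzoneFsExample {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Current state: datanode host, volume ("test") and bucket ("bucket1")
        // are all encoded in fs.defaultFS.
        conf.set("fs.defaultFS", "o3://datanode:9864/test/bucket1");

        // The root of the file system is the bucket itself, so listing "/" is the
        // programmatic equivalent of 'dfs -ls /': it lists all keys of bucket1.
        FileSystem fs = FileSystem.get(conf);
        for (FileStatus status : fs.listStatus(new Path("/"))) {
          System.out.println(status.getPath());
        }
      }
    }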

B. Problem one 

The current code doesn't support fully qualified locations. For example, 'dfs 
-ls o3://datanode:9864/test/bucket1/dir1' does not work.

C. Problem two

I tried to fix the previous problem, but it's not trivial. The biggest problem 
is the Path.makeQualified call, which transforms an unqualified URL into a 
qualified URL. This is part of Path.java, so it's common to all Hadoop file 
systems.

In the current implementation it qualifies a URL by keeping the scheme (e.g. 
o3://) and authority (e.g. datanode:9864) from the defaultFS and using the 
relative path as the rest of the qualified URL. For example:

makeQualified(defaultUri=o3://datanode:9864/test/bucket1, path=dir1/file) will 
return o3://datanode:9864/dir1/file, which is obviously wrong (the correct result 
would be o3://datanode:9864/test/bucket1/dir1/file). I tried a workaround using a 
custom makeQualified in the Ozone code and it worked from the command line, but it 
couldn't work with Spark, which uses the Hadoop API and the original 
makeQualified path.
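
A small, self-contained illustration of the behaviour described above (this is not 
Ozone code, only Path.makeQualified(URI, Path) from hadoop-common with example values; 
the working directory "/" is an assumption):

    import java.net.URI;
    import org.apache.hadoop.fs.Path;

    public class MakeQualifiedProblem {
      public static void main(String[] args) {
        URI defaultUri = URI.create("o3://datanode:9864/test/bucket1");
        Path workingDir = new Path("/");

        // makeQualified keeps only the scheme and authority of the default URI;
        // its path component (/test/bucket1) is dropped.
        Path qualified = new Path("dir1/file").makeQualified(defaultUri, workingDir);

        // Prints o3://datanode:9864/dir1/file instead of the expected
        // o3://datanode:9864/test/bucket1/dir1/file.
        System.out.println(qualified);
      }
    }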

D. Solution

We should support makeQualified calls, so we can use any path in the defaultFS.
 
I propose to use a simplified scheme such as o3://bucket.volume/ 

This is similar to the s3a format, where the pattern is s3a://bucket.region/ 

We wouldn't need to set the hostname of the datanode (or the KSM, in case of service 
discovery), but it would be configurable with additional Hadoop configuration 
values such as fs.o3.bucket.bucketname.volumename.address=http://datanode:9864 
(this is how s3a works today, as far as I know).

We also need to define restrictions for volume names (in our case they must not 
contain dots any more).
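
A hypothetical sketch of how the proposed scheme could be resolved (the class and the 
splitting logic are only illustrations, not an existing API; the 
fs.o3.bucket.<bucket>.<volume>.address key pattern comes from the proposal above):

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;

    public class ProposedSchemeSketch {
      public static void main(String[] args) {
        URI uri = URI.create("o3://bucket1.test/dir1/file");

        // Split the authority at the last dot: the suffix is the volume. This only
        // works because volume names would no longer be allowed to contain dots.
        String authority = uri.getAuthority();          // "bucket1.test"
        int dot = authority.lastIndexOf('.');
        if (dot <= 0) {
          throw new IllegalArgumentException("Expected an o3://bucket.volume/ style authority");
        }
        String bucket = authority.substring(0, dot);    // "bucket1"
        String volume = authority.substring(dot + 1);   // "test"

        // Resolve the datanode (or KSM) address from configuration, similarly to
        // per-bucket s3a settings; the default value here is just an example.
        Configuration conf = new Configuration();
        String address = conf.get(
            "fs.o3.bucket." + bucket + "." + volume + ".address",
            "http://datanode:9864");

        System.out.println("volume=" + volume + ", bucket=" + bucket + ", address=" + address);
      }
    }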

PS: some Spark output:

2018-02-03 18:43:04 WARN  Client:66 - Neither spark.yarn.jars nor 
spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
2018-02-03 18:43:05 INFO  Client:54 - Uploading resource 
file:/tmp/spark-03119be0-9c3d-440c-8e9f-48c692412ab5/__spark_libs__2440448967844904444.zip
 -> 
o3://datanode:9864/user/hadoop/.sparkStaging/application_1517611085375_0001/__spark_libs__2440448967844904444.zip

My defaultFS was o3://datanode:9864/test/bucket1, but Spark qualified the name 
of the home directory (note the missing /test/bucket1 in the uploaded path).

 


