Elek, Marton created HDFS-13108:
-----------------------------------

             Summary: Ozone: OzoneFileSystem: Simplified url schema for Ozone 
File System
                 Key: HDFS-13108
                 URL: https://issues.apache.org/jira/browse/HDFS-13108
             Project: Hadoop HDFS
          Issue Type: Sub-task
          Components: ozone
    Affects Versions: HDFS-7240
            Reporter: Elek, Marton
            Assignee: Elek, Marton


A. Current state
 
1. The datanode host, volume and bucket must be defined in the defaultFS (e.g. 
o3://datanode:9864/test/bucket1)
2. The root of the file system points to the bucket (e.g. 'dfs -ls /' lists all the 
keys from bucket1)

It works very well, but there are some limitations.
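
A minimal sketch of the current wiring (fs.defaultFS is the standard Hadoop property; 
the class name is just an example and it assumes the OzoneFileSystem implementation is 
on the classpath and registered for the o3:// scheme):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class CurrentOzoneFsExample {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Current state: datanode host, volume ("test") and bucket ("bucket1")
        // are all encoded in fs.defaultFS.
        conf.set("fs.defaultFS", "o3://datanode:9864/test/bucket1");

        // The root of the file system is the bucket itself, so listing "/" is the
        // programmatic equivalent of 'dfs -ls /': it lists all keys of bucket1.
        FileSystem fs = FileSystem.get(conf);
        for (FileStatus status : fs.listStatus(new Path("/"))) {
          System.out.println(status.getPath());
        }
      }
    }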

B. Problem one 

The current code doesn't support fully qualified locations. For example, 'dfs 
-ls o3://datanode:9864/test/bucket1/dir1' does not work.

C. Problem two

I tried to fix the previous problem, but it's not trivial. The biggest problem 
is the Path.makeQualified call, which transforms an unqualified URL into a 
qualified URL. This is part of Path.java, so it's common to all Hadoop file 
systems.

In the current implementation it qualifies a URL by keeping the scheme (e.g. 
o3://) and authority (e.g. datanode:9864) from the defaultFS and using the 
relative path as the rest of the qualified URL. For example:

makeQualified(defaultUri=o3://datanode:9864/test/bucket1, path=dir1/file) will 
return o3://datanode:9864/dir1/file, which is obviously wrong (the correct result 
would be o3://datanode:9864/test/bucket1/dir1/file). I tried a workaround using a 
custom makeQualified in the Ozone code and it worked from the command line, but it 
couldn't work with Spark, which uses the Hadoop API and the original 
makeQualified path.
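
A small, self-contained illustration of the behaviour described above (this is not 
Ozone code, only Path.makeQualified(URI, Path) from hadoop-common with example values; 
the working directory "/" is an assumption):

    import java.net.URI;
    import org.apache.hadoop.fs.Path;

    public class MakeQualifiedProblem {
      public static void main(String[] args) {
        URI defaultUri = URI.create("o3://datanode:9864/test/bucket1");
        Path workingDir = new Path("/");

        // makeQualified keeps only the scheme and authority of the default URI;
        // its path component (/test/bucket1) is dropped.
        Path qualified = new Path("dir1/file").makeQualified(defaultUri, workingDir);

        // Prints o3://datanode:9864/dir1/file instead of the expected
        // o3://datanode:9864/test/bucket1/dir1/file.
        System.out.println(qualified);
      }
    }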

D. Solution

We should support makeQualified calls, so we can use any path in the defaultFS.
 
I propose to use a simplified scheme such as o3://bucket.volume/ 

This is similar to the s3a format, where the pattern is s3a://bucket.region/ 

We wouldn't need to set the hostname of the datanode (or the KSM, in case of service 
discovery), but it would be configurable with additional Hadoop configuration 
values such as fs.o3.bucket.bucketname.volumename.address=http://datanode:9864 
(this is how s3a works today, as far as I know).

We also need to define restrictions for volume names (in our case they must not 
contain dots any more).
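
A hypothetical sketch of how the proposed scheme could be resolved (the class and the 
splitting logic are only illustrations, not an existing API; the 
fs.o3.bucket.<bucket>.<volume>.address key pattern comes from the proposal above):

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;

    public class ProposedSchemeSketch {
      public static void main(String[] args) {
        URI uri = URI.create("o3://bucket1.test/dir1/file");

        // Split the authority at the last dot: the suffix is the volume. This only
        // works because volume names would no longer be allowed to contain dots.
        String authority = uri.getAuthority();          // "bucket1.test"
        int dot = authority.lastIndexOf('.');
        if (dot <= 0) {
          throw new IllegalArgumentException("Expected an o3://bucket.volume/ style authority");
        }
        String bucket = authority.substring(0, dot);    // "bucket1"
        String volume = authority.substring(dot + 1);   // "test"

        // Resolve the datanode (or KSM) address from configuration, similarly to
        // per-bucket s3a settings; the default value here is just an example.
        Configuration conf = new Configuration();
        String address = conf.get(
            "fs.o3.bucket." + bucket + "." + volume + ".address",
            "http://datanode:9864");

        System.out.println("volume=" + volume + ", bucket=" + bucket + ", address=" + address);
      }
    }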

PS: some Spark output:

2018-02-03 18:43:04 WARN  Client:66 - Neither spark.yarn.jars nor 
spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
2018-02-03 18:43:05 INFO  Client:54 - Uploading resource 
file:/tmp/spark-03119be0-9c3d-440c-8e9f-48c692412ab5/__spark_libs__2440448967844904444.zip
 -> 
o3://datanode:9864/user/hadoop/.sparkStaging/application_1517611085375_0001/__spark_libs__2440448967844904444.zip

My defaultFS was o3://datanode:9864/test/bucket1, but Spark qualified the name 
of the home directory (note the missing /test/bucket1 in the uploaded path).

 


