HADOOP-14190. Add more on S3 regions to the s3a documentation. Contributed by Steve Loughran
Project: http://git-wip-us.apache.org/repos/asf/hadoop/repo
Commit: http://git-wip-us.apache.org/repos/asf/hadoop/commit/ee243e52
Tree: http://git-wip-us.apache.org/repos/asf/hadoop/tree/ee243e52
Diff: http://git-wip-us.apache.org/repos/asf/hadoop/diff/ee243e52

Branch: refs/heads/YARN-3926
Commit: ee243e5289212aa2912d191035802ea023367e19
Parents: fb5ee3f
Author: Steve Loughran <ste...@apache.org>
Authored: Wed Jun 28 10:22:13 2017 +0100
Committer: Steve Loughran <ste...@apache.org>
Committed: Wed Jun 28 10:22:13 2017 +0100

----------------------------------------------------------------------
 .../src/site/markdown/tools/hadoop-aws/index.md | 109 +++++++++++++++----
 .../hadoop-aws/src/test/resources/core-site.xml |  81 ++++++++++++++
 2 files changed, 168 insertions(+), 22 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/hadoop/blob/ee243e52/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md
----------------------------------------------------------------------
diff --git a/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md b/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md
index 8c8df1b..182f060 100644
--- a/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md
+++ b/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md
@@ -29,7 +29,9 @@ HADOOP_OPTIONAL_TOOLS in hadoop-env.sh has 'hadoop-aws' in the list.

 ### Features

-**NOTE: `s3:` has been phased out. Use `s3n:` or `s3a:` instead.**
+**NOTE: `s3:` has been phased out; `s3n:`, while
+distributed should now be considered deprecated.
+Please use `s3a:` as the connector to data hosted in S3.**

 1. The second-generation, `s3n:` filesystem, making it easy to share
 data between hadoop and other applications via the S3 object store.
@@ -892,7 +894,7 @@ from placing its declaration on the command line.
 any call to setReadahead() is made to an open stream.</description>
 </property>

-### Configurations different S3 buckets
+### Configuring different S3 buckets

 Different S3 buckets can be accessed with different S3A client configurations.
 This allows for different endpoints, data read and write strategies, as well
@@ -964,10 +966,11 @@ then declare the path to the appropriate credential file in a
 bucket-specific version of the property `fs.s3a.security.credential.provider.path`.


-### Working with buckets in different regions
+### Using Per-Bucket Configuration to access data round the world

-S3 Buckets are hosted in different regions, the default being US-East.
-The client talks to it by default, under the URL `s3.amazonaws.com`
+S3 Buckets are hosted in different "regions", the default being "US-East".
+The S3A client talks to this region by default, issing HTTP requests
+to the server `s3.amazonaws.com`.

 S3A can work with buckets from any region. Each region has its own
 S3 endpoint, documented [by Amazon](http://docs.aws.amazon.com/general/latest/gr/rande.html#s3_region).
@@ -987,50 +990,112 @@ While it is generally simpler to use the default endpoint, working with
 V4-signing-only regions (Frankfurt, Seoul) requires the endpoint to be identified.
 Expect better performance from direct connections; traceroute will give you some insight.

-Examples:
+If the wrong endpoint is used, the request may fail. This may be reported as a 301/redirect error,
+or as a 400 Bad Request: take these as cues to check the endpoint setting of
+a bucket.

-The default endpoint:
+Here is a list of properties defining all AWS S3 regions, current as of June 2017:

 ```xml
+<!--
+  This is the default endpoint, which can be used to interact
+  with any v2 region.
+  -->
 <property>
-  <name>fs.s3a.endpoint</name>
+  <name>central.endpoint</name>
   <value>s3.amazonaws.com</value>
 </property>
-```

-Frankfurt
+<property>
+  <name>canada.endpoint</name>
+  <value>s3.ca-central-1.amazonaws.com</value>
+</property>

-```xml
 <property>
-  <name>fs.s3a.endpoint</name>
+  <name>frankfurt.endpoint</name>
   <value>s3.eu-central-1.amazonaws.com</value>
 </property>
-```

-Seoul
+<property>
+  <name>ireland.endpoint</name>
+  <value>s3-eu-west-1.amazonaws.com</value>
+</property>

-```xml
 <property>
-  <name>fs.s3a.endpoint</name>
+  <name>london.endpoint</name>
+  <value>s3.eu-west-2.amazonaws.com</value>
+</property>
+
+<property>
+  <name>mumbai.endpoint</name>
+  <value>s3.ap-south-1.amazonaws.com</value>
+</property>
+
+<property>
+  <name>ohio.endpoint</name>
+  <value>s3.us-east-2.amazonaws.com</value>
+</property>
+
+<property>
+  <name>oregon.endpoint</name>
+  <value>s3-us-west-2.amazonaws.com</value>
+</property>
+
+<property>
+  <name>sao-paolo.endpoint</name>
+  <value>s3-sa-east-1.amazonaws.com</value>
+</property>
+
+<property>
+  <name>seoul.endpoint</name>
   <value>s3.ap-northeast-2.amazonaws.com</value>
 </property>
-```

-If the wrong endpoint is used, the request may fail. This may be reported as a 301/redirect error,
-or as a 400 Bad Request.
+<property>
+  <name>singapore.endpoint</name>
+  <value>s3-ap-southeast-1.amazonaws.com</value>
+</property>
+
+<property>
+  <name>sydney.endpoint</name>
+  <value>s3-ap-southeast-2.amazonaws.com</value>
+</property>
+
+<property>
+  <name>tokyo.endpoint</name>
+  <value>s3-ap-northeast-1.amazonaws.com</value>
+</property>
+
+<property>
+  <name>virginia.endpoint</name>
+  <value>${central.endpoint}</value>
+</property>
+```

-If you are trying to mix endpoints for different buckets, use a per-bucket endpoint
-declaration. For example:
+This list can be used to specify the endpoint of individual buckets, for example
+for buckets in the central and EU/Ireland endpoints.

 ```xml
 <property>
   <name>fs.s3a.bucket.landsat-pds.endpoint</name>
-  <value>s3.amazonaws.com</value>
+  <value>${central.endpoint}</value>
   <description>The endpoint for s3a://landsat-pds URLs</description>
 </property>
+
+<property>
+  <name>fs.s3a.bucket.eu-dataset.endpoint</name>
+  <value>${ireland.endpoint}</value>
+  <description>The endpoint for s3a://eu-dataset URLs</description>
+</property>
+
 ```

+Why explicitly declare a bucket bound to the central endpoint? It ensures
+that if the default endpoint is changed to a new region, data store in
+US-east is still reachable.
+
+
 ### <a name="s3a_fast_upload"></a>Stabilizing: S3A Fast Upload


http://git-wip-us.apache.org/repos/asf/hadoop/blob/ee243e52/hadoop-tools/hadoop-aws/src/test/resources/core-site.xml
----------------------------------------------------------------------
diff --git a/hadoop-tools/hadoop-aws/src/test/resources/core-site.xml b/hadoop-tools/hadoop-aws/src/test/resources/core-site.xml
index 7d2046b..d424aa4 100644
--- a/hadoop-tools/hadoop-aws/src/test/resources/core-site.xml
+++ b/hadoop-tools/hadoop-aws/src/test/resources/core-site.xml
@@ -30,6 +30,87 @@
     <final>true</final>
   </property>

+  <property>
+    <name>fs.s3a.bucket.landsat-pds.endpoint</name>
+    <value>${central.endpoint}</value>
+    <description>The endpoint for s3a://landsat-pds URLs</description>
+  </property>
+
+  <!--
+    This is the default endpoint, which can be used to interact
+    with any v2 region.
+  -->
+  <property>
+    <name>central.endpoint</name>
+    <value>s3.amazonaws.com</value>
+  </property>
+
+  <property>
+    <name>canada.endpoint</name>
+    <value>s3.ca-central-1.amazonaws.com</value>
+  </property>
+
+  <property>
+    <name>frankfurt.endpoint</name>
+    <value>s3.eu-central-1.amazonaws.com</value>
+  </property>
+
+  <property>
+    <name>ireland.endpoint</name>
+    <value>s3-eu-west-1.amazonaws.com</value>
+  </property>
+
+  <property>
+    <name>london.endpoint</name>
+    <value>s3.eu-west-2.amazonaws.com</value>
+  </property>
+
+  <property>
+    <name>mumbai.endpoint</name>
+    <value>s3.ap-south-1.amazonaws.com</value>
+  </property>
+
+  <property>
+    <name>ohio.endpoint</name>
+    <value>s3.us-east-2.amazonaws.com</value>
+  </property>
+
+  <property>
+    <name>oregon.endpoint</name>
+    <value>s3-us-west-2.amazonaws.com</value>
+  </property>
+
+  <property>
+    <name>sao-paolo.endpoint</name>
+    <value>s3-sa-east-1.amazonaws.com</value>
+  </property>
+
+  <property>
+    <name>seoul.endpoint</name>
+    <value>s3.ap-northeast-2.amazonaws.com</value>
+  </property>
+
+  <property>
+    <name>singapore.endpoint</name>
+    <value>s3-ap-southeast-1.amazonaws.com</value>
+  </property>
+
+  <property>
+    <name>sydney.endpoint</name>
+    <value>s3-ap-southeast-2.amazonaws.com</value>
+  </property>
+
+  <property>
+    <name>tokyo.endpoint</name>
+    <value>s3-ap-northeast-1.amazonaws.com</value>
+  </property>
+
+  <property>
+    <name>virginia.endpoint</name>
+    <value>${central.endpoint}</value>
+  </property>
+
+
   <!-- Turn security off for tests by default -->
   <property>
     <name>hadoop.security.authentication</name>