Hey,

we want to run a job that accesses S3 from EC2 instances. The job runs on a self-provided Spark cluster (1.3.0) on EC2 instances. In Ireland everything works as expected.
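For context, this is roughly what the job does today (heavily simplified; <bucket> and <path> are placeholders, as in the URL forms below):

    import org.apache.spark.{SparkConf, SparkContext}

    // Simplified version of our job: read one gzipped file from S3 and count lines.
    val sc = new SparkContext(new SparkConf().setAppName("s3-ingestion"))
    sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", sys.env("AWS_ACCESS_KEY_ID"))
    sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", sys.env("AWS_SECRET_ACCESS_KEY"))

    // Works as expected against our bucket in Ireland (eu-west-1):
    val input = sc.textFile("s3n://<bucket>/<path>/HotelImageList.gz")
    println(input.count())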
I just tried to move data from Ireland to Frankfurt. AWS S3 enforces v4 of their API there, meaning access is only possible via AWS4-HMAC-SHA256. That in itself is fine, but I don't get access there. What I have tried already, each approach with all of these URL forms:

A) "s3n://<key>:<secret>@<bucket>/<path>/"
B) "s3://<key>:<secret>@<bucket>/<path>/"
C) "s3n://<bucket>/<path>/"
D) "s3://<bucket>/<path>/"

1a. Setting the environment variables in the operating system.

1b. Found something about setting the access key/secret in the SparkConf like this (I guess this does not have any effect):

    sc.set("AWS_ACCESS_KEY_ID", id)
    sc.set("AWS_SECRET_ACCESS_KEY", secret)

2. Tried to use a "more up to date" JetS3t client (somehow I was not able to get the "new" version running; see the properties sketch after this list).

3. Tried in-URL basic authentication (A + B).

4a. Setting the Hadoop configuration with the block-based S3FileSystem:

    hadoopConfiguration.set("fs.s3n.impl", "org.apache.hadoop.fs.s3.S3FileSystem");
    hadoopConfiguration.set("fs.s3n.awsAccessKeyId", key);
    hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", secret);
    hadoopConfiguration.set("fs.s3.impl", "org.apache.hadoop.fs.s3.S3FileSystem");
    hadoopConfiguration.set("fs.s3.awsAccessKeyId", "myAccessKey");
    hadoopConfiguration.set("fs.s3.awsSecretAccessKey", "myAccessSecret");

--> Caused by: org.jets3t.service.S3ServiceException: S3 GET failed for '/%2FEAN%2F2015-03-09-72640385%2Finput%2FHotelImageList.gz' XML Error Message: <?xml version="1.0" encoding="UTF-8"?><Error><Code>InvalidRequest</Code><Message>The authorization mechanism you have provided is not supported. Please use AWS4-HMAC-SHA256.</Message><RequestId>43F8F02E767DC4A2</RequestId><HostId>wgMeAEYcZZa/2BazQ9TA+PAkUxt5l+ExnT4Emb+1Uk5KhWfJu5C8Xcesm1AXCfJ9nZJMyh4wPX8=</HostId></Error>

4b. Setting the Hadoop configuration with the NativeS3FileSystem:

    hadoopConfiguration.set("fs.s3n.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem");
    hadoopConfiguration.set("fs.s3n.awsAccessKeyId", key);
    hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", secret);
    hadoopConfiguration.set("fs.s3.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem");
    hadoopConfiguration.set("fs.s3.awsAccessKeyId", "myAccessKey");
    hadoopConfiguration.set("fs.s3.awsSecretAccessKey", "myAccessSecret");

--> Caused by: org.jets3t.service.S3ServiceException: S3 HEAD request failed for '/EAN%2F2015-03-09-72640385%2Finput%2FHotelImageList.gz' - ResponseCode=400, ResponseMessage=Bad Request

5. Without any Hadoop config:

Exception in thread "main" java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key must be specified as the username or password (respectively) of a s3 URL, or by setting the fs.s3.awsAccessKeyId or fs.s3.awsSecretAccessKey properties (respectively).

6. Without Hadoop config, but with the credentials passed in the S3 URL:

with A) Exception in thread "main" org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.S3ServiceException: S3 HEAD request failed for '/EAN%2F2015-03-09-72640385%2Finput%2FHotelImageList.gz' - ResponseCode=400, ResponseMessage=Bad Request

with B) Exception in thread "main" java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key must be specified as the username or password (respectively) of a s3 URL, or by setting the fs.s3.awsAccessKeyId or fs.s3.awsSecretAccessKey properties (respectively).
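Regarding attempt 2: from what I can tell, a sufficiently new JetS3t is supposed to do V4 signing, but only if you tell it both the regional endpoint and the signature version, e.g. via a jets3t.properties file on the classpath of driver and executors. This is just a sketch of what I believe it should look like; since I never got the newer JetS3t running, I could not verify it:

    # jets3t.properties (sketch, assumes a JetS3t version with V4 support on the classpath)
    s3service.s3-endpoint=s3.eu-central-1.amazonaws.com
    storage-service.request-signature-version=AWS4-HMAC-SHA256

My understanding is that without the regional endpoint the client signs for the default region, which would also match what I see in the log further down.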
Drilling down into the job, I can see that the RestStorageService recognizes AWS4-HMAC-SHA256, but somehow it still gets a ResponseCode 400 (log below; I replaced the key / encoded secret with XXX_*_XXX):

    15/03/20 11:25:31 WARN RestStorageService: Retrying request with "AWS4-HMAC-SHA256" signing mechanism: GET https://frankfurt.ingestion.batch.s3.amazonaws.com:443/?max-keys=1&prefix=EAN%2F2015-03-09-72640385%2Finput%2FHotelImageList.gz%2F&delimiter=%2F HTTP/1.1

    15/03/20 11:25:31 WARN RestStorageService: Retrying request following error response: GET '/?max-keys=1&prefix=EAN/2015-03-09-72640385/input/HotelImageList.gz/&delimiter=/' -- ResponseCode: 400, ResponseStatus: Bad Request, Request Headers: [Date: Fri, 20 Mar 2015 11:25:31 GMT, Authorization: AWS XXX_MY_KEY_XXX:XXX_I_GUESS_SECRET_XXX], Response Headers: [x-amz-request-id: 7E6F85873D69D14E, x-amz-id-2: rGFW+kRfURzz3DlY/m/M8h054MmHu8bxJAtKVHUmov/VY7pBXvtMvbQTXxA7bffpu4xxf4rGmL4=, x-amz-region: eu-central-1, Content-Type: application/xml, Transfer-Encoding: chunked, Date: Fri, 20 Mar 2015 11:25:31 GMT, Connection: close, Server: AmazonS3]

    15/03/20 11:25:32 WARN RestStorageService: Retrying request after automatic adjustment of Host endpoint from "frankfurt.ingestion.batch.s3.amazonaws.com" to "frankfurt.ingestion.batch.s3-eu-central-1.amazonaws.com" following request signing error using AWS request signing version 4: GET https://frankfurt.ingestion.batch.s3-eu-central-1.amazonaws.com:443/?max-keys=1&prefix=EAN/2015-03-09-72640385/input/HotelImageList.gz/&delimiter=/ HTTP/1.1

    15/03/20 11:25:32 WARN RestStorageService: Retrying request following error response: GET '/?max-keys=1&prefix=EAN/2015-03-09-72640385/input/HotelImageList.gz/&delimiter=/' -- ResponseCode: 400, ResponseStatus: Bad Request, Request Headers: [Date: Fri, 20 Mar 2015 11:25:31 GMT, x-amz-content-sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855, Host: frankfurt.ingestion.batch.s3.amazonaws.com, x-amz-date: 20150320T112531Z, Authorization: AWS4-HMAC-SHA256 Credential=XXX_MY_KEY_XXX/20150320/us-east-1/s3/aws4_request,SignedHeaders=date;host;x-amz-content-sha256;x-amz-date,Signature=2098d3175c4304e44be912b770add7594d1d1b44f545c3025be1748672ec60e4], Response Headers: [x-amz-request-id: 5CABCD0D3046B267, x-amz-id-2: V65tW1lbSybbN3R3RMKBjJFz7xUgJDubSUm/XKXTypg7qfDtkSFRt2I9CMo2Qo2OAA+E44hiazg=, Content-Type: application/xml, Transfer-Encoding: chunked, Date: Fri, 20 Mar 2015 11:25:32 GMT, Connection: close, Server: AmazonS3]

    Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: s3n://frankfurt.ingestion.batch/EAN/2015-03-09-72640385/input/HotelImageList.gz

What strikes me in the last retry: the V4 Authorization header carries Credential=XXX_MY_KEY_XXX/20150320/us-east-1/s3/aws4_request, i.e. the signature scope is us-east-1, even though the first response already reported x-amz-region: eu-central-1.

Do you have any ideas? Has anybody here already managed to access S3 in Frankfurt, and if so, how?

Cheers,
Ralf
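PS: One alternative I am considering, in case somebody can confirm it works: Hadoop 2.6+ ships the newer s3a filesystem, which is backed by the official AWS SDK instead of JetS3t, and the SDK can handle V4 signing once it is pointed at the regional endpoint. This is an untested sketch on my side; it assumes a Spark build on Hadoop 2.6+, since the fs.s3a.* keys don't exist in older Hadoop versions:

    // Untested sketch -- requires Hadoop 2.6+ with s3a on the classpath.
    sc.hadoopConfiguration.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
    sc.hadoopConfiguration.set("fs.s3a.access.key", key)
    sc.hadoopConfiguration.set("fs.s3a.secret.key", secret)
    // Regional endpoint, so requests get signed for eu-central-1 instead of us-east-1:
    sc.hadoopConfiguration.set("fs.s3a.endpoint", "s3.eu-central-1.amazonaws.com")

    val input = sc.textFile("s3a://<bucket>/<path>/HotelImageList.gz")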