1. Make sure your secret key doesn't have a "/" in it. If it does, generate a new key.

2. The jets3t and Hadoop JAR versions need to be in sync; jets3t 0.9.0 was picked up in Hadoop 2.4, and not earlier, AFAIK.

3. Hadoop 2.6 has a new S3 client, "s3a", which is compatible with s3n data. It uses the AWS toolkit rather than JetS3t, and that is where all future development is going. Assuming it is up to date with the AWS toolkit, it will do the auth. It has not knowingly been tested against Frankfurt though; just Ireland, US East, US West & Japan. S3a still has some quirks being worked through; HADOOP-11571 lists the set being fixed.
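One data point worth adding for the s3a route: since Frankfurt only accepts V4 signing, the client has to be pointed at the region's own endpoint rather than the global one (the log further down shows requests being signed for us-east-1, which is exactly what a v4-only region rejects). Below is a minimal, untested sketch of the relevant core-site.xml entries; the bucket and keys are placeholders, and you should verify the property names against the s3a documentation for your exact Hadoop version:

```xml
<!-- core-site.xml fragment: point s3a at the eu-central-1 endpoint so the
     AWS SDK signs requests for the right region. Credentials are placeholders;
     prefer IAM roles over literal keys where possible. -->
<property>
  <name>fs.s3a.access.key</name>
  <value>YOUR_ACCESS_KEY</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>YOUR_SECRET_KEY</value>
</property>
<property>
  <name>fs.s3a.endpoint</name>
  <value>s3.eu-central-1.amazonaws.com</value>
</property>
```

With that in place the job would address the data as s3a://&lt;bucket&gt;/&lt;path&gt;/ instead of s3n://&lt;bucket&gt;/&lt;path&gt;/.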
On 20 Mar 2015, at 15:15, Ralf Heyde <r...@hubrick.com> wrote:

Good idea, will try that. But assuming "only" data is located there, the problem will still occur.

On Fri, Mar 20, 2015 at 3:08 PM, Gourav Sengupta <gourav.sengu...@gmail.com> wrote:

Hi Ralf,

Embedding secret keys and authorization details is a strict no for AWS; it is a major security lapse and should be avoided at any cost. Have you tried starting the clusters using ROLES? They are a wonderful way to start clusters or EC2 nodes, and you do not have to copy and paste any permissions either. Try going through this article in AWS: http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-iam-roles.html (though it is written for Data Pipeline, it shows the correct set of permissions to enable). I start EC2 nodes using roles (as mentioned in the link above) and run the AWS CLI commands without copying any keys or files. Please let me know if the issue is resolved.

Regards,
Gourav

On Fri, Mar 20, 2015 at 1:53 PM, Ralf Heyde <r...@hubrick.com> wrote:

Hey,

we want to run a job that accesses S3 from EC2 instances. The job runs in a self-provided Spark cluster (1.3.0) on EC2 instances. In Ireland everything works as expected; I just tried to move the data from Ireland to Frankfurt. AWS S3 enforces v4 of its API there, which means access is only possible via AWS4-HMAC-SHA256. That should still be OK, but I don't get access there.

I tried each of the approaches below with these URLs:

A) "s3n://<key>:<secret>@<bucket>/<path>/"
B) "s3://<key>:<secret>@<bucket>/<path>/"
C) "s3n://<bucket>/<path>/"
D) "s3://<bucket>/<path>/"

1a. Setting environment variables in the operating system.

1b. Setting the access key/secret in SparkConf like this (I guess this does not have any effect):

    sc.set("AWS_ACCESS_KEY_ID", id)
    sc.set("AWS_SECRET_ACCESS_KEY", secret)

2. Using a "more up to date" jets3t client (somehow I was not able to get the "new" version running).

3. In-URL basic authentication (URLs A and B).

4. Setting the Hadoop configuration, first with S3FileSystem:

    hadoopConfiguration.set("fs.s3n.impl", "org.apache.hadoop.fs.s3.S3FileSystem");
    hadoopConfiguration.set("fs.s3n.awsAccessKeyId", key);
    hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", secret);
    hadoopConfiguration.set("fs.s3.impl", "org.apache.hadoop.fs.s3.S3FileSystem");
    hadoopConfiguration.set("fs.s3.awsAccessKeyId", "myAccessKey");
    hadoopConfiguration.set("fs.s3.awsSecretAccessKey", "myAccessSecret");

--> Caused by: org.jets3t.service.S3ServiceException: S3 GET failed for '/%2FEAN%2F2015-03-09-72640385%2Finput%2FHotelImageList.gz' XML Error Message: <?xml version="1.0" encoding="UTF-8"?><Error><Code>InvalidRequest</Code><Message>The authorization mechanism you have provided is not supported. Please use AWS4-HMAC-SHA256.</Message><RequestId>43F8F02E767DC4A2</RequestId><HostId>wgMeAEYcZZa/2BazQ9TA+PAkUxt5l+ExnT4Emb+1Uk5KhWfJu5C8Xcesm1AXCfJ9nZJMyh4wPX8=</HostId></Error>

Then with NativeS3FileSystem:

    hadoopConfiguration.set("fs.s3n.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem");
    hadoopConfiguration.set("fs.s3n.awsAccessKeyId", key);
    hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", secret);
    hadoopConfiguration.set("fs.s3.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem");
    hadoopConfiguration.set("fs.s3.awsAccessKeyId", "myAccessKey");
    hadoopConfiguration.set("fs.s3.awsSecretAccessKey", "myAccessSecret");

--> Caused by: org.jets3t.service.S3ServiceException: S3 HEAD request failed for '/EAN%2F2015-03-09-72640385%2Finput%2FHotelImageList.gz' - ResponseCode=400, ResponseMessage=Bad Request

5. Without the Hadoop configuration:

    Exception in thread "main" java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key must be specified as the username or password (respectively) of a s3 URL, or by setting the fs.s3.awsAccessKeyId or fs.s3.awsSecretAccessKey properties (respectively).

6. Without the Hadoop configuration, but with the credentials passed in the S3 URL:

with A) Exception in thread "main" org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.S3ServiceException: S3 HEAD request failed for '/EAN%2F2015-03-09-72640385%2Finput%2FHotelImageList.gz' - ResponseCode=400, ResponseMessage=Bad Request

with B) Exception in thread "main" java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key must be specified as the username or password (respectively) of a s3 URL, or by setting the fs.s3.awsAccessKeyId or fs.s3.awsSecretAccessKey properties (respectively).

Drilling down into the job, I can see that the RestStorageService recognizes AWS4-HMAC-SHA256, but somehow it still gets a ResponseCode 400 (log below; I replaced the key / encoded secret with XXX_*_XXX):

15/03/20 11:25:31 WARN RestStorageService: Retrying request with "AWS4-HMAC-SHA256" signing mechanism: GET https://frankfurt.ingestion.batch.s3.amazonaws.com:443/?max-keys=1&prefix=EAN%2F2015-03-09-72640385%2Finput%2FHotelImageList.gz%2F&delimiter=%2F HTTP/1.1

15/03/20 11:25:31 WARN RestStorageService: Retrying request following error response: GET '/?max-keys=1&prefix=EAN/2015-03-09-72640385/input/HotelImageList.gz/&delimiter=/' -- ResponseCode: 400, ResponseStatus: Bad Request, Request Headers: [Date: Fri, 20 Mar 2015 11:25:31 GMT, Authorization: AWS XXX_MY_KEY_XXX:XXX_I_GUESS_SECRET_XXX], Response Headers: [x-amz-request-id: 7E6F85873D69D14E, x-amz-id-2: rGFW+kRfURzz3DlY/m/M8h054MmHu8bxJAtKVHUmov/VY7pBXvtMvbQTXxA7bffpu4xxf4rGmL4=, x-amz-region: eu-central-1, Content-Type: application/xml, Transfer-Encoding: chunked, Date: Fri, 20 Mar 2015 11:25:31 GMT, Connection: close, Server: AmazonS3]

15/03/20 11:25:32 WARN RestStorageService: Retrying request after automatic adjustment of Host endpoint from "frankfurt.ingestion.batch.s3.amazonaws.com" to "frankfurt.ingestion.batch.s3-eu-central-1.amazonaws.com" following request signing error using AWS request signing version 4: GET https://frankfurt.ingestion.batch.s3-eu-central-1.amazonaws.com:443/?max-keys=1&prefix=EAN/2015-03-09-72640385/input/HotelImageList.gz/&delimiter=/ HTTP/1.1

15/03/20 11:25:32 WARN RestStorageService: Retrying request following error response: GET '/?max-keys=1&prefix=EAN/2015-03-09-72640385/input/HotelImageList.gz/&delimiter=/' -- ResponseCode: 400, ResponseStatus: Bad Request, Request Headers: [Date: Fri, 20 Mar 2015 11:25:31 GMT, x-amz-content-sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855, Host: frankfurt.ingestion.batch.s3.amazonaws.com, x-amz-date: 20150320T112531Z, Authorization: AWS4-HMAC-SHA256 Credential=XXX_MY_KEY_XXX/20150320/us-east-1/s3/aws4_request,SignedHeaders=date;host;x-amz-content-sha256;x-amz-date,Signature=2098d3175c4304e44be912b770add7594d1d1b44f545c3025be1748672ec60e4], Response Headers: [x-amz-request-id: 5CABCD0D3046B267, x-amz-id-2: V65tW1lbSybbN3R3RMKBjJFz7xUgJDubSUm/XKXTypg7qfDtkSFRt2I9CMo2Qo2OAA+E44hiazg=, Content-Type: application/xml, Transfer-Encoding: chunked, Date: Fri, 20 Mar 2015 11:25:32 GMT, Connection: close, Server: AmazonS3]

Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: s3n://frankfurt.ingestion.batch/EAN/2015-03-09-72640385/input/HotelImageList.gz

Do you have any ideas? Has somebody here already been able to access S3 in Frankfurt, and if so, how?

Cheers,
Ralf
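For the s3n/JetS3t route discussed above, JetS3t itself can be steered toward V4 signing through a jets3t.properties file on the classpath. The sketch below is untested against Frankfurt; the two property names are taken from JetS3t's configuration reference, and the signature-version switch only appeared in later 0.9.x releases than the 0.9.0 bundled with Hadoop 2.4, so it likely requires the JAR upgrade mentioned earlier in the thread:

```properties
# jets3t.properties fragment (place on the classpath): pin the region endpoint
# and request V4 signing. Verify both property names are supported by the
# JetS3t version actually on your classpath before relying on this.
s3service.s3-endpoint=s3.eu-central-1.amazonaws.com
storage-service.request-signature-version=AWS4-HMAC-SHA256
```

Pinning the endpoint matters here because the log above shows JetS3t signing for us-east-1 and only discovering the eu-central-1 endpoint after a 400-triggered retry.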