Good idea, I will try that. But assuming only the data is located there, the
problem will still occur: the error below is about the request signing
mechanism (AWS4-HMAC-SHA256), not about how the credentials are provided.
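For reference, here is what I plan to try next, in case it helps others. This
is only a sketch taken from the jets3t documentation and not something I have
verified yet; as far as I know it needs jets3t 0.9.4 or later on the
classpath, and both property names below are assumptions from those docs:

    # jets3t.properties on the driver/executor classpath (untested sketch)
    # pin the endpoint to the Frankfurt region and force v4 signing
    s3service.s3-endpoint=s3.eu-central-1.amazonaws.com
    storage-service.request-signature-version=AWS4-HMAC-SHA256

Credentials would then still come from the Hadoop configuration (or, per your
suggestion, from an instance role), only the endpoint and signing change.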
On Fri, Mar 20, 2015 at 3:08 PM, Gourav Sengupta <gourav.sengu...@gmail.com> wrote:
> Hi Ralf,
>
> Using secret keys and authorization details is a strict NO for AWS; they
> are major security lapses and should be avoided at any cost.
>
> Have you tried starting the clusters using ROLES? They are a wonderful
> way to start clusters or EC2 nodes, and you do not have to copy and paste
> any permissions either.
>
> Try going through this article in AWS:
> http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-iam-roles.html
> (it is written for Data Pipeline, but it shows the correct set of
> permissions to enable).
>
> I start EC2 nodes using roles (as mentioned in the link above) and run
> the AWS CLI commands without copying any keys or files.
>
> Please let me know if the issue was resolved.
>
> Regards,
> Gourav
>
> On Fri, Mar 20, 2015 at 1:53 PM, Ralf Heyde <r...@hubrick.com> wrote:
>
>> Hey,
>>
>> We want to run a job that accesses S3 from EC2 instances. The job runs
>> in a self-provided Spark cluster (1.3.0) on EC2 instances. In Ireland
>> everything works as expected.
>>
>> I just tried to move data from Ireland to Frankfurt. AWS S3 enforces v4
>> of their API there, meaning access is only possible via AWS4-HMAC-SHA256.
>>
>> That should still be fine, but I do not get access there. Here is what
>> I tried already, each approach with these URLs:
>> A) "s3n://<key>:<secret>@<bucket>/<path>/"
>> B) "s3://<key>:<secret>@<bucket>/<path>/"
>> C) "s3n://<bucket>/<path>/"
>> D) "s3://<bucket>/<path>/"
>>
>> 1a. Setting environment variables in the operating system.
>> 1b. Setting the access key/secret in SparkConf like this (I guess this
>> does not have any effect):
>> sc.set("AWS_ACCESS_KEY_ID", id)
>> sc.set("AWS_SECRET_ACCESS_KEY", secret)
>> 2. Using a more up-to-date jets3t client (somehow I was not able to get
>> the "new" version running).
>> 3. In-URL basic authentication (A and B).
>> 4. Setting the Hadoop configuration:
>> hadoopConfiguration.set("fs.s3n.impl", "org.apache.hadoop.fs.s3.S3FileSystem");
>> hadoopConfiguration.set("fs.s3n.awsAccessKeyId", key);
>> hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", secret);
>>
>> hadoopConfiguration.set("fs.s3.impl", "org.apache.hadoop.fs.s3.S3FileSystem");
>> hadoopConfiguration.set("fs.s3.awsAccessKeyId", "myAccessKey");
>> hadoopConfiguration.set("fs.s3.awsSecretAccessKey", "myAccessSecret");
>>
>> -->
>> Caused by: org.jets3t.service.S3ServiceException: S3 GET failed for
>> '/%2FEAN%2F2015-03-09-72640385%2Finput%2FHotelImageList.gz' XML Error
>> Message: <?xml version="1.0"
>> encoding="UTF-8"?><Error><Code>InvalidRequest</Code><Message>The
>> authorization mechanism you have provided is not supported. Please use
>> AWS4-HMAC-SHA256.</Message><RequestId>43F8F02E767DC4A2</RequestId><HostId>wgMeAEYcZZa/2BazQ9TA+PAkUxt5l+ExnT4Emb+1Uk5KhWfJu5C8Xcesm1AXCfJ9nZJMyh4wPX8=</HostId></Error>
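(Annotating my own report inline here: 1b almost certainly has no effect,
because the S3 filesystems read the Hadoop configuration, not arbitrary
SparkConf keys. A minimal sketch of what I believe is the intended way,
assuming a Scala job where sc is the usual SparkContext and the two
environment variables are set:

    // make the credentials visible to the s3n filesystem itself,
    // via the Hadoop configuration it actually reads
    sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId",
      sys.env("AWS_ACCESS_KEY_ID"))
    sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey",
      sys.env("AWS_SECRET_ACCESS_KEY"))

This only fixes how credentials are passed; it does not change the signing
mechanism, so the AWS4-HMAC-SHA256 error above remains.)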
>> 4b. Setting the Hadoop configuration with the native S3 filesystem:
>> hadoopConfiguration.set("fs.s3n.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem");
>> hadoopConfiguration.set("fs.s3n.awsAccessKeyId", key);
>> hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", secret);
>>
>> hadoopConfiguration.set("fs.s3.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem");
>> hadoopConfiguration.set("fs.s3.awsAccessKeyId", "myAccessKey");
>> hadoopConfiguration.set("fs.s3.awsSecretAccessKey", "myAccessSecret");
>>
>> -->
>> Caused by: org.jets3t.service.S3ServiceException: S3 HEAD request failed
>> for '/EAN%2F2015-03-09-72640385%2Finput%2FHotelImageList.gz' -
>> ResponseCode=400, ResponseMessage=Bad Request
>>
>> 5. Without Hadoop configuration:
>> Exception in thread "main" java.lang.IllegalArgumentException: AWS Access
>> Key ID and Secret Access Key must be specified as the username or password
>> (respectively) of a s3 URL, or by setting the fs.s3.awsAccessKeyId or
>> fs.s3.awsSecretAccessKey properties (respectively).
>>
>> 6. Without Hadoop configuration, but passed in the S3 URL:
>> With A) Exception in thread "main" org.apache.hadoop.fs.s3.S3Exception:
>> org.jets3t.service.S3ServiceException: S3 HEAD request failed for
>> '/EAN%2F2015-03-09-72640385%2Finput%2FHotelImageList.gz' -
>> ResponseCode=400, ResponseMessage=Bad Request
>> With B) Exception in thread "main" java.lang.IllegalArgumentException:
>> AWS Access Key ID and Secret Access Key must be specified as the username
>> or password (respectively) of a s3 URL, or by setting the
>> fs.s3.awsAccessKeyId or fs.s3.awsSecretAccessKey properties (respectively).
>>
>> Drilling down into the job, I can see that the RestStorageService
>> recognizes AWS4-HMAC-SHA256, but somehow it gets ResponseCode 400 (log
>> below; I replaced the key / encoded secret with XXX_*_XXX):
>>
>> 15/03/20 11:25:31 WARN RestStorageService: Retrying request with
>> "AWS4-HMAC-SHA256" signing mechanism: GET
>> https://frankfurt.ingestion.batch.s3.amazonaws.com:443/?max-keys=1&prefix=EAN%2F2015-03-09-72640385%2Finput%2FHotelImageList.gz%2F&delimiter=%2F
>> HTTP/1.1
>> 15/03/20 11:25:31 WARN RestStorageService: Retrying request following
>> error response: GET
>> '/?max-keys=1&prefix=EAN/2015-03-09-72640385/input/HotelImageList.gz/&delimiter=/'
>> -- ResponseCode: 400, ResponseStatus: Bad Request, Request Headers: [Date:
>> Fri, 20 Mar 2015 11:25:31 GMT, Authorization: AWS
>> XXX_MY_KEY_XXX:XXX_I_GUESS_SECRET_XXX], Response Headers:
>> [x-amz-request-id: 7E6F85873D69D14E, x-amz-id-2:
>> rGFW+kRfURzz3DlY/m/M8h054MmHu8bxJAtKVHUmov/VY7pBXvtMvbQTXxA7bffpu4xxf4rGmL4=,
>> x-amz-region: eu-central-1, Content-Type: application/xml,
>> Transfer-Encoding: chunked, Date: Fri, 20 Mar 2015 11:25:31 GMT,
>> Connection: close, Server: AmazonS3]
>> 15/03/20 11:25:32 WARN RestStorageService: Retrying request after
>> automatic adjustment of Host endpoint from
>> "frankfurt.ingestion.batch.s3.amazonaws.com" to
>> "frankfurt.ingestion.batch.s3-eu-central-1.amazonaws.com" following
>> request signing error using AWS request signing version 4: GET
>> https://frankfurt.ingestion.batch.s3-eu-central-1.amazonaws.com:443/?max-keys=1&prefix=EAN/2015-03-09-72640385/input/HotelImageList.gz/&delimiter=/
>> HTTP/1.1
>> 15/03/20 11:25:32 WARN RestStorageService: Retrying request following
>> error response: GET
>> '/?max-keys=1&prefix=EAN/2015-03-09-72640385/input/HotelImageList.gz/&delimiter=/'
>> -- ResponseCode: 400, ResponseStatus: Bad Request, Request Headers: [Date:
>> Fri, 20 Mar 2015 11:25:31 GMT,
>> x-amz-content-sha256:
>> e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855, Host:
>> frankfurt.ingestion.batch.s3.amazonaws.com, x-amz-date: 20150320T112531Z,
>> Authorization: AWS4-HMAC-SHA256
>> Credential=XXX_MY_KEY_XXX/20150320/us-east-1/s3/aws4_request,SignedHeaders=date;host;x-amz-content-sha256;x-amz-date,Signature=2098d3175c4304e44be912b770add7594d1d1b44f545c3025be1748672ec60e4],
>> Response Headers: [x-amz-request-id: 5CABCD0D3046B267, x-amz-id-2:
>> V65tW1lbSybbN3R3RMKBjJFz7xUgJDubSUm/XKXTypg7qfDtkSFRt2I9CMo2Qo2OAA+E44hiazg=,
>> Content-Type: application/xml, Transfer-Encoding: chunked, Date: Fri, 20
>> Mar 2015 11:25:32 GMT, Connection: close, Server: AmazonS3]
>> Exception in thread "main"
>> org.apache.hadoop.mapred.InvalidInputException: Input path does not exist:
>> s3n://frankfurt.ingestion.batch/EAN/2015-03-09-72640385/input/HotelImageList.gz
>>
>> Do you have any ideas? Has anyone of you already been able to access S3
>> in Frankfurt, and if so, how?
>>
>> Cheers, Ralf
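PS: re-reading the last retry in the log above, the v4 request is signed with
Credential=.../20150320/us-east-1/s3/aws4_request, i.e. the signature scope is
us-east-1 while the bucket lives in eu-central-1, which would explain the 400
even after the automatic endpoint adjustment. An alternative I am considering,
again only an untested sketch, assuming a Spark build with Hadoop 2.6+ and the
hadoop-aws module on the classpath, is to bypass the jets3t-based s3n entirely
and use s3a pinned to the regional endpoint:

    // untested sketch: s3a with an explicit eu-central-1 endpoint
    sc.hadoopConfiguration.set("fs.s3a.impl",
      "org.apache.hadoop.fs.s3a.S3AFileSystem")
    sc.hadoopConfiguration.set("fs.s3a.endpoint",
      "s3.eu-central-1.amazonaws.com")
    // credentials taken from the environment for this sketch
    sc.hadoopConfiguration.set("fs.s3a.access.key",
      sys.env("AWS_ACCESS_KEY_ID"))
    sc.hadoopConfiguration.set("fs.s3a.secret.key",
      sys.env("AWS_SECRET_ACCESS_KEY"))
    val lines = sc.textFile("s3a://<bucket>/<path>/")

No guarantee the bundled AWS SDK signs with v4 out of the box there; if it
does not, the com.amazonaws.services.s3.enableV4 system property is supposed
to force it.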