Hi Daniel,

Can you give it a try in IBM's Analytics for Spark? The fix has been in for a week now.

thanks
Mario

From: Daniel Lopes <dan...@onematch.com.br>
To: Mario Ds Briggs/India/IBM@IBMIN
Cc: Adam Roberts <arobe...@uk.ibm.com>, user <user@spark.apache.org>, Steve Loughran <ste...@hortonworks.com>, Sachin Aggarwal4/India/IBM@IBMIN
Date: 14/09/2016 01:19 am
Subject: Re: Fw: Spark + Parquet + IBM Block Storage at Bluemix

Hi Mario,

Thanks for your help, so I will keep using CSVs.

Best,

Daniel Lopes
Chief Data and Analytics Officer | OneMatch
c: +55 (18) 99764-2733 | https://www.linkedin.com/in/dslopes
www.onematch.com.br

On Mon, Sep 12, 2016 at 3:39 PM, Mario Ds Briggs <mario.bri...@in.ibm.com> wrote:

Daniel,

I believe it is related to https://issues.apache.org/jira/browse/SPARK-13979 and happens only when a task fails in an executor (probably for some other reason you hit the latter in parquet and not csv).

The PR in there should be shortly available in IBM's Analytics for Spark.

thanks
Mario

From: Adam Roberts/UK/IBM
To: Mario Ds Briggs/India/IBM@IBMIN
Date: 12/09/2016 09:37 pm
Subject: Fw: Spark + Parquet + IBM Block Storage at Bluemix

Mario, in case you've not seen this...

Adam Roberts
IBM Spark Team Lead
Runtime Technologies - Hursley

----- Forwarded by Adam Roberts/UK/IBM on 12/09/2016 17:06 -----

From: Daniel Lopes <dan...@onematch.com.br>
To: Steve Loughran <ste...@hortonworks.com>
Cc: user <user@spark.apache.org>
Date: 12/09/2016 13:05
Subject: Re: Spark + Parquet + IBM Block Storage at Bluemix

Thanks Steve,

But this error occurs only with parquet files; CSVs work.

Best,
Daniel Lopes

On Sun, Sep 11, 2016 at 3:28 PM, Steve Loughran <ste...@hortonworks.com> wrote:

On 9 Sep 2016, at 17:56, Daniel Lopes <dan...@onematch.com.br> wrote:

Hi, can someone help? I'm trying to use parquet in IBM Block Storage at Spark, but when I try to load I get this error, using this config:

    credentials = {
        "name": "keystone",
        "auth_url": "https://identity.open.softlayer.com",
        "project": "object_storage_23f274c1_d11XXXXXXXXXXXXXXXe634",
        "projectId": "XXXXXXd9c4aa39b7c7eCCCCCCCCb",
        "region": "dallas",
        "userId": "XXXXX64087180b40XXXXX2b909",
        "username": "admin_XXXX9dd810f8901d48778XXXXXX",
        "password": "chXXXXXXXXXXXXX6_",
        "domainId": "c1ddad17cfcXXXXXXXXX41",
        "domainName": "10XXXXXX",
        "role": "admin"
    }

    def set_hadoop_config(credentials):
        """This function sets the Hadoop configuration with given credentials,
        so it is possible to access data using SparkContext"""
        # sc is the SparkContext that already exists in the notebook
        prefix = "fs.swift.service." + credentials['name']
        hconf = sc._jsc.hadoopConfiguration()
        hconf.set(prefix + ".auth.url", credentials['auth_url'] + '/v3/auth/tokens')
        hconf.set(prefix + ".auth.endpoint.prefix", "endpoints")
        hconf.set(prefix + ".tenant", credentials['projectId'])
        hconf.set(prefix + ".username", credentials['userId'])
        hconf.set(prefix + ".password", credentials['password'])
        hconf.setInt(prefix + ".http.port", 8080)
        hconf.set(prefix + ".region", credentials['region'])
        hconf.setBoolean(prefix + ".public", True)

    set_hadoop_config(credentials)

-------------------------------------------------
Py4JJavaError                             Traceback (most recent call last)
<ipython-input-55-5a14928215eb> in <module>()
----> 1 train.groupby('Acordo').count().show()

Py4JJavaError: An error occurred while calling o406.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 60 in stage 30.0 failed 10 times, most recent failure: Lost task 60.9 in stage 30.0 (TID 2556, yp-spark-dal09-env5-0039): org.apache.hadoop.fs.swift.exceptions.SwiftConfigurationException: Missing mandatory configuration option: fs.swift.service.keystone.auth.url

In my own code, I'd assume that the value of credentials['name'] didn't match that of the URL, assuming you have something like swift://bucket.keystone.

Failing that: the options were set too late. Instead of asking for the hadoop config and editing that, set the option in your spark context, before it is launched, with the prefix "hadoop".

    at org.apache.hadoop.fs.swift.http.RestClientBindings.copy(RestClientBindings.java:223)
    at org.apache.hadoop.fs.swift.http.RestClientBindings.bind(RestClientBindings.java:147)
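
Steve's two suggestions can be sketched roughly as below. This is a minimal, untested-against-Bluemix sketch, not a confirmed fix for the error in this thread. It assumes the prefix Steve refers to is Spark's "spark.hadoop." prefix (Spark copies properties with that prefix into the Hadoop configuration), and that the data is read from a URL of the form swift://container.service/path. The helper names swift_spark_options and swift_service_name, the example URL, and the placeholder credential values are hypothetical and only for illustration.

```python
from urllib.parse import urlparse


def swift_spark_options(credentials):
    """Build Spark options that carry the Swift settings from this thread.

    Assumption: each key uses the "spark.hadoop." prefix so that Spark
    copies it into the Hadoop configuration when the context starts.
    """
    prefix = "spark.hadoop.fs.swift.service." + credentials["name"]
    return {
        prefix + ".auth.url": credentials["auth_url"] + "/v3/auth/tokens",
        prefix + ".auth.endpoint.prefix": "endpoints",
        prefix + ".tenant": credentials["projectId"],
        prefix + ".username": credentials["userId"],
        prefix + ".password": credentials["password"],
        prefix + ".http.port": "8080",
        prefix + ".region": credentials["region"],
        prefix + ".public": "true",
    }


def swift_service_name(url):
    """Return the service part of a swift://container.service/path URL."""
    host = urlparse(url).netloc
    return host.rsplit(".", 1)[-1]


# Placeholder values only; substitute the real credentials.
example_credentials = {
    "name": "keystone",
    "auth_url": "https://identity.open.softlayer.com",
    "projectId": "PROJECT_ID",
    "userId": "USER_ID",
    "password": "PASSWORD",
    "region": "dallas",
}

options = swift_spark_options(example_credentials)

# Hypothetical URL: its service part must equal credentials["name"].
data_url = "swift://bucket.keystone/train.parquet"
names_match = swift_service_name(data_url) == example_credentials["name"]

# Applying the options needs pyspark and must happen before the context starts:
# from pyspark import SparkConf, SparkContext
# conf = SparkConf().setAll(list(options.items()))
# sc = SparkContext(conf=conf)
```

In a hosted notebook where sc is created for you, the options would have to be supplied through whatever mechanism the service offers for setting Spark properties at startup, since a running context cannot be reconfigured this way.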