Chhavi Bansal created SPARK-48423:
-------------------------------------

             Summary: Unable to write MLPipeline to blob storage using .option attribute
                 Key: SPARK-48423
                 URL: https://issues.apache.org/jira/browse/SPARK-48423
             Project: Spark
          Issue Type: Bug
          Components: ML, MLlib, Spark Core
    Affects Versions: 3.4.3
            Reporter: Chhavi Bansal
I am trying to write an MLlib pipeline, with a series of stages set on it, to Azure Blob Storage, passing the relevant write options, but the write still fails because `fs.azure.account.key` is not found in the configuration. Sharing the code:

{code:java}
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.feature.StringIndexer
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("main").master("local[4]").getOrCreate()
import spark.implicits._

val df = spark.createDataFrame(Seq(
  (0L, "a b c d e spark"),
  (1L, "b d")
)).toDF("id", "text")

val si = new StringIndexer().setInputCol("text").setOutputCol("IND_text")
val pipeline = new Pipeline().setStages(Array(si))
val pipelineModel = pipeline.fit(df)

val path = BLOB_STORAGE_PATH
pipelineModel.write
  .option("spark.hadoop.fs.azure.account.key.<account_name>.dfs.core.windows.net", "__")
  .option("fs.azure.account.key.<account_name>.dfs.core.windows.net", "__")
  .option("fs.azure.account.oauth2.client.endpoint.<account_name>.dfs.core.windows.net", "__")
  .option("fs.azure.account.oauth2.client.id.<account_name>.dfs.core.windows.net", "__")
  .option("fs.azure.account.auth.type.<account_name>.dfs.core.windows.net", "__")
  .option("fs.azure.account.oauth2.client.secret.<account_name>.dfs.core.windows.net", "__")
  .option("fs.azure.account.oauth.provider.type.<account_name>.dfs.core.windows.net", "__")
  .save(path)
{code}

The error I get is:

{code:java}
Failure to initialize configuration
Caused by: InvalidConfigurationValueException: Invalid configuration value detected for fs.azure.account.key
	at org.apache.hadoop.fs.azurebfs.services.SimpleKeyProvider.getStorageAccountKey(SimpleKeyProvider.java:51)
	at org.apache.hadoop.fs.azurebfs.AbfsConfiguration.getStorageAccountKey(AbfsConfiguration.java:548)
	at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.initializeClient(AzureBlobFileSystemStore.java:1449)
{code}

This shows that even though the key/value pair
{code:java}
spark.hadoop.fs.azure.account.key.<account_name>.dfs.core.windows.net
{code}
is passed via the option parameter, it is not being applied internally.
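For reference, a minimal sketch of the workaround that does succeed today: setting the credential on the session configuration before calling save(). The key name and value below are placeholders, not real credentials.

{code:java}
// Workaround sketch: the account key is set session-wide via spark.conf.set,
// so the ABFS client can find it. Because the setting applies to the whole
// SparkSession, every tenant sharing this session sees the same credential,
// which is exactly the multi-tenancy concern described below.
spark.conf.set("fs.azure.account.key.<account_name>.dfs.core.windows.net", "<account-key>")
pipelineModel.write.overwrite().save(path)
{code}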
The write works only if I explicitly set the values via
{code:java}
spark.conf.set(key, value)
{code}
which is problematic for a multi-tenant solution in which several tenants may share the same Spark context. My ask is that, just as write.options(Map<key, value>) sets this configuration for DataFrame writes, option(key1, value1) on the ML writer should also work for writing to Azure Blob Storage.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)