Chhavi Bansal created SPARK-48423:
-------------------------------------

             Summary: Unable to write MLPipeline to blob storage using .option 
attribute
                 Key: SPARK-48423
                 URL: https://issues.apache.org/jira/browse/SPARK-48423
             Project: Spark
          Issue Type: Bug
          Components: ML, MLlib, Spark Core
    Affects Versions: 3.4.3
            Reporter: Chhavi Bansal


I am trying to write an MLlib pipeline (with a series of stages set on it) to Azure Blob Storage, passing the relevant write options, but the write still complains that `fs.azure.account.key` is not found in the configuration.

Sharing the code:
{code:java}
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.feature.StringIndexer
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("main").master("local[4]").getOrCreate()

import spark.implicits._

// Small example DataFrame to fit the pipeline on.
val df = spark.createDataFrame(Seq(
  (0L, "a b c d e spark"),
  (1L, "b d")
)).toDF("id", "text")

val si = new StringIndexer().setInputCol("text").setOutputCol("IND_text")
val pipeline = new Pipeline().setStages(Array(si))
val pipelineModel = pipeline.fit(df)
val path = BLOB_STORAGE_PATH

// All ABFS credentials are passed per-write via .option(...), not via spark.conf.set.
pipelineModel.write
  .option("spark.hadoop.fs.azure.account.key.<account_name>.dfs.core.windows.net", "__")
  .option("fs.azure.account.key.<account_name>.dfs.core.windows.net", "__")
  .option("fs.azure.account.oauth2.client.endpoint.<account_name>.dfs.core.windows.net", "__")
  .option("fs.azure.account.oauth2.client.id.<account_name>.dfs.core.windows.net", "__")
  .option("fs.azure.account.auth.type.<account_name>.dfs.core.windows.net", "__")
  .option("fs.azure.account.oauth2.client.secret.<account_name>.dfs.core.windows.net", "__")
  .option("fs.azure.account.oauth.provider.type.<account_name>.dfs.core.windows.net", "__")
  .save(path){code}
 

The error that I get is:
{code:java}
Failure to initialize configuration
Caused by: InvalidConfigurationValueException: Invalid configuration value detected for fs.azure.account.key
    at org.apache.hadoop.fs.azurebfs.services.SimpleKeyProvider.getStorageAccountKey(SimpleKeyProvider.java:51)
    at org.apache.hadoop.fs.azurebfs.AbfsConfiguration.getStorageAccountKey(AbfsConfiguration.java:548)
    at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.initializeClient(AzureBlobFileSystemStore.java:1449){code}
This shows that even though the key/value pair for
{code:java}
spark.hadoop.fs.azure.account.key.<account_name>.dfs.core.windows.net {code}
is passed via the option parameter, it is not applied internally.
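
One way to see this (a sketch, reusing the session from the snippet above; the account name is a placeholder): after the failed write, the key supplied through .option(...) is absent from the Hadoop configuration backing the SparkContext.
{code:java}
// Sketch: the key supplied through .option(...) never lands in the Hadoop
// configuration used by the writer, so the ABFS client cannot find it.
val key = "fs.azure.account.key.<account_name>.dfs.core.windows.net"
println(spark.sparkContext.hadoopConfiguration.get(key)) // prints null
{code}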

 

The write succeeds only if I explicitly set the values with
{code:java}
spark.conf.set(key, value) {code}
which is problematic for a multi-tenant solution where several tenants may share the same Spark context, as shown in the sketch below.
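
For completeness, this is the session-wide workaround I am referring to (a sketch; the account name and key are placeholders), with the obvious drawback that the credentials become visible to every job sharing the session:
{code:java}
// Sketch of the session-wide workaround (placeholder account name / key).
// Setting the key on the shared session makes the write succeed, but the
// credentials then apply to every tenant using the same Spark context.
spark.conf.set(
  "fs.azure.account.key.<account_name>.dfs.core.windows.net",
  "<storage-account-key>")

// Assumed equivalent: set it directly on the shared Hadoop configuration.
spark.sparkContext.hadoopConfiguration.set(
  "fs.azure.account.key.<account_name>.dfs.core.windows.net",
  "<storage-account-key>")

pipelineModel.write.save(path)
{code}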

 

My ask: just as the DataFrame writer's options(Map<key, value>) sets this configuration, option(key1, value1) on the ML writer should also work for writing to Azure Blob Storage.
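
For comparison, a sketch of the per-write options on the DataFrame writer (placeholder values), which do take effect as described above; the expectation is that the ML writer's option(...) behaves the same way:
{code:java}
// DataFrame writer: the per-write option reaches the ABFS client (as described above).
df.write
  .option("fs.azure.account.key.<account_name>.dfs.core.windows.net", "<storage-account-key>")
  .parquet(path)

// Expected: the ML writer honours the same per-write option instead of
// requiring a session-wide spark.conf.set.
pipelineModel.write
  .option("fs.azure.account.key.<account_name>.dfs.core.windows.net", "<storage-account-key>")
  .save(path)
{code}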

 

 


