Hi Chetan,

                  You can create a static parquet file with the expected
schema, and when you create the DataFrame pass the locations of both files
with the option mergeSchema set to true. This will always give you a
DataFrame, even if the original file contains no data.
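A minimal sketch of that idea, assuming an existing SparkSession `spark`, the `consumableCurrentArea` path from later in this thread, and a placeholder file at the hypothetical path /tmp/placeholder.parquet that was written once with the expected schema:

```scala
// The placeholder path below is a hypothetical example: a parquet file
// written once up front with the expected schema. DataFrameReader.parquet
// accepts multiple paths, so passing the placeholder alongside the real
// path with mergeSchema enabled lets the read succeed and infer a schema
// even when the real path holds no data.
val df = spark.read
  .option("mergeSchema", "true")
  .parquet("/tmp/placeholder.parquet", consumableCurrentArea)
```

Note this assumes the real path itself exists (even if empty); a path that is missing entirely will still fail the read.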

Kuchekar, Nilesh


On Sat, May 9, 2020 at 10:46 PM Mich Talebzadeh <mich.talebza...@gmail.com>
wrote:

> Have you tried catching error when you are creating a dataframe?
>
> import scala.util.{Try, Success, Failure}
>
> val df = Try(spark.read.
>   format("com.databricks.spark.xml").
>   option("rootTag", "hierarchy").
>   option("rowTag", "sms_request").
>   load("/tmp/broadcast.xml")) match {
>     case Success(df) => df
>     case Failure(exception) => throw new Exception("foo")
> }
> HTH
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn:
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>
> On Sat, 9 May 2020 at 22:51, Chetan Khatri <chetan.opensou...@gmail.com>
> wrote:
>
>> Hi Spark Users,
>>
>> I have a Spark job that reads parquet paths whose data is generated by
>> other systems, and it is possible that some of those paths contain no
>> data. Is there a way to read the parquet such that, if no data is found,
>> I can create a dummy DataFrame and continue?
>>
>> One way is to check whether the path exists, like:
>>
>> val conf = spark.sparkContext.hadoopConfiguration
>> val fs = org.apache.hadoop.fs.FileSystem.get(conf)
>> val currentAreaExists =
>>   fs.exists(new org.apache.hadoop.fs.Path(consumableCurrentArea))
>>
>> But I don't want to run this check for 300 parquet paths; I would rather
>> just fall back to a dummy parquet / custom DataFrame whenever a path has
>> no data. Currently an empty path fails with:
>>
>> AnalysisException: u'Unable to infer schema for Parquet. It must be
>> specified manually.;'
>>
>> Thanks
>>
>
