Re: question regarding pyspark

Pushkar.Gujar Fri, 21 Apr 2017 17:32:55 -0700

Hi Afshin,

If you need to apply the header information from the second file to the
first one, you can do that by specifying a custom schema. Below is an
example from the spark-csv package. As you can guess, you will have to do
some pre-processing to create customSchema by first reading the second file.


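import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType}

// Column names and types are supplied explicitly instead of being inferred.
val customSchema = StructType(Array(
    StructField("year", IntegerType, true),
    StructField("make", StringType, true),
    StructField("model", StringType, true),
    StructField("comment", StringType, true),
    StructField("blank", StringType, true)))
val df = sqlContext.read
    .format("com.databricks.spark.csv")
    // The sample cars.csv has a header line, which "true" skips.
    // For a file with no header line, set this to "false".
    .option("header", "true")
    .schema(customSchema)
    .load("cars.csv")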
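
Since the question is about PySpark, the same idea can be sketched in Python. This is only a rough sketch under a few assumptions: the header file is a small, single-line, comma-separated file readable on the driver, `spark` is an existing SparkSession (Spark 2.x, where the CSV reader is built in), and the function names read_header_names and load_headerless_csv are made up for illustration. Rather than building a StructType, it reads the data with the header option off and renames the columns positionally with toDF, so every column comes back as a string.

```python
import csv


def read_header_names(header_path, delimiter=","):
    """Return the column names from the first non-empty line of the header file."""
    with open(header_path, newline="") as f:
        for row in csv.reader(f, delimiter=delimiter):
            names = [name.strip() for name in row]
            if any(names):
                return names
    raise ValueError("no header line found in " + header_path)


def load_headerless_csv(spark, data_path, header_path):
    """Read a CSV that has no header row and name its columns from header_path."""
    names = read_header_names(header_path)
    # header=false: the first line of the data file is data, not column names.
    df = spark.read.option("header", "false").csv(data_path)
    # toDF renames the default columns (_c0, _c1, ...) by position; it fails
    # if the number of names does not match the number of columns.
    return df.toDF(*names)
```

You would call it as load_headerless_csv(spark, "data.csv", "headers.csv"), where both paths are placeholders. If you need typed columns, build a StructType from the same list of names and pass it to .schema(...), as in the Scala example.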



Thank you,
*Pushkar Gujar*


On Fri, Apr 21, 2017 at 7:37 PM, Afshin, Bardia <
bardia.afs...@capitalone.com> wrote:

> I’m ingesting a CSV with hundreds of columns, and the original CSV file
> itself doesn’t have any header. I do have a separate file that contains
> just the headers. Is there a way to tell the Spark API this information
> when loading the CSV file, or do I have to do some preprocessing first?
>
>
>
> Thanks,
>
> Bardia Afshin
