Hi Afshin, if you need to associate the header information from the second file with the first one, you can do that by specifying a custom schema. Below is an example from the spark-csv package. As you can guess, you will have to do some preprocessing to build customSchema by first reading the second file.
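A minimal sketch of that preprocessing step, assuming the header file holds a single comma-separated line of unquoted column names and that it is acceptable to load every column as a string and cast later. `HeaderSchema`, `fromHeaderLine` and `fromHeaderFile` are hypothetical names for illustration, not part of Spark or spark-csv; only `StructType`, `StructField` and `StringType` come from Spark.

```scala
import scala.io.Source
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Hypothetical helper (not part of Spark or spark-csv): builds a schema
// from a headers-only file so it can be passed to .schema(...).
object HeaderSchema {
  // Turns one comma-separated header line into a StructType in which every
  // column is a nullable string; individual columns can be cast afterwards.
  def fromHeaderLine(line: String): StructType =
    StructType(
      line.split(",", -1)              // -1 keeps empty trailing names instead of dropping them
        .map(_.trim)                   // tolerate spaces around the commas
        .map(name => StructField(name, StringType, nullable = true)))

  // Reads the first line of a header file on the driver's local filesystem.
  def fromHeaderFile(path: String): StructType = {
    val source = Source.fromFile(path)
    try fromHeaderLine(source.getLines().next())
    finally source.close()
  }
}
```

The resulting schema can then stand in for the hand-written customSchema, e.g. `.schema(HeaderSchema.fromHeaderFile("headers.csv"))`, where "headers.csv" is a placeholder path. If the header file lives on HDFS rather than on the driver, `HeaderSchema.fromHeaderLine(sc.textFile(path).first())` reads the line through Spark instead.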
    import org.apache.spark.sql.types._

    val customSchema = StructType(Array(
      StructField("year", IntegerType, true),
      StructField("make", StringType, true),
      StructField("model", StringType, true),
      StructField("comment", StringType, true),
      StructField("blank", StringType, true)))

    val df = sqlContext.read
      .format("com.databricks.spark.csv")
      // The spark-csv example uses "true" because its cars.csv has a header line.
      // For a data file without a header, use "false" so the first record is not skipped.
      .option("header", "false")
      .schema(customSchema)
      .load("cars.csv")

Thank you,
*Pushkar Gujar*

On Fri, Apr 21, 2017 at 7:37 PM, Afshin, Bardia <bardia.afs...@capitalone.com> wrote:

> I'm ingesting a CSV with hundreds of columns and the original CSV file
> itself doesn't have any header. I do have a separate file that is just the
> headers. Is there a way to tell the Spark API this information when loading the
> CSV file? Or do I have to do some preprocessing before doing so?
>
> Thanks,
>
> Bardia Afshin