>> df = spark.sqlContext.read.csv('out/df_in.csv')
>
> shouldn't this be just
>
>     df = spark.read.csv('out/df_in.csv')
>
> SparkSession itself is an entry point to DataFrames and SQL functionality.

Our bootstrap is a bit messy, so in our case, no. In the general case, yes.
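
For the general case, a minimal sketch of what that looks like through the SparkSession entry point (assuming Spark 2.x with only the built-in CSV source on the classpath; the header/inferSchema options are just illustrative):

    from pyspark.sql import SparkSession

    # SparkSession is the single entry point to DataFrame and SQL functionality in 2.x
    spark = SparkSession.builder.appName("csv-read-sketch").getOrCreate()

    # Built-in CSV reader, no external package needed
    df = spark.read.csv('out/df_in.csv', header=True, inferSchema=True)
    df.show(5)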

On 9 May 2017 at 16:56, Pushkar.Gujar <pushkarvgu...@gmail.com> wrote:

>> df = spark.sqlContext.read.csv('out/df_in.csv')
>
> shouldn't this be just
>
>     df = spark.read.csv('out/df_in.csv')
>
> SparkSession itself is an entry point to DataFrames and SQL functionality.
>
> Thank you,
> *Pushkar Gujar*
>
> On Tue, May 9, 2017 at 6:09 PM, Mark Hamstra <m...@clearstorydata.com> wrote:
>
>> Looks to me like it is a conflict between a Databricks library and Spark
>> 2.1. That's an issue for Databricks to resolve or provide guidance on.
>>
>> On Tue, May 9, 2017 at 2:36 PM, lucas.g...@gmail.com <lucas.g...@gmail.com> wrote:
>>
>>> I'm a bit confused by that answer; I'm assuming it's Spark deciding
>>> which lib to use.
>>>
>>> On 9 May 2017 at 14:30, Mark Hamstra <m...@clearstorydata.com> wrote:
>>>
>>>> This looks more like a matter for Databricks support than spark-user.
>>>>
>>>> On Tue, May 9, 2017 at 2:02 PM, lucas.g...@gmail.com <lucas.g...@gmail.com> wrote:
>>>>
>>>>>> df = spark.sqlContext.read.csv('out/df_in.csv')
>>>>>
>>>>>> 17/05/09 15:51:29 WARN ObjectStore: Version information not found in
>>>>>> metastore. hive.metastore.schema.verification is not enabled so
>>>>>> recording the schema version 1.2.0
>>>>>> 17/05/09 15:51:29 WARN ObjectStore: Failed to get database default,
>>>>>> returning NoSuchObjectException
>>>>>> 17/05/09 15:51:30 WARN ObjectStore: Failed to get database
>>>>>> global_temp, returning NoSuchObjectException
>>>>>
>>>>>> Py4JJavaError: An error occurred while calling o72.csv.
>>>>>> : java.lang.RuntimeException: Multiple sources found for csv
>>>>>> (com.databricks.spark.csv.DefaultSource15,
>>>>>> org.apache.spark.sql.execution.datasources.csv.CSVFileFormat),
>>>>>> please specify the fully qualified class name.
>>>>>>   at scala.sys.package$.error(package.scala:27)
>>>>>>   at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:591)
>>>>>>   at org.apache.spark.sql.execution.datasources.DataSource.providingClass$lzycompute(DataSource.scala:86)
>>>>>>   at org.apache.spark.sql.execution.datasources.DataSource.providingClass(DataSource.scala:86)
>>>>>>   at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:325)
>>>>>>   at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152)
>>>>>>   at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:415)
>>>>>>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>>>>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>>>   at java.lang.reflect.Method.invoke(Method.java:606)
>>>>>>   at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
>>>>>>   at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
>>>>>>   at py4j.Gateway.invoke(Gateway.java:280)
>>>>>>   at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
>>>>>>   at py4j.commands.CallCommand.execute(CallCommand.java:79)
>>>>>>   at py4j.GatewayConnection.run(GatewayConnection.java:214)
>>>>>>   at java.lang.Thread.run(Thread.java:745)
>>>>>
>>>>> When I change our call to:
>>>>>
>>>>>     df = spark.hiveContext.read \
>>>>>         .format('org.apache.spark.sql.execution.datasources.csv.CSVFileFormat') \
>>>>>         .load('df_in.csv')
>>>>>
>>>>> there is no such issue. I was under the impression (obviously wrongly) that
>>>>> Spark would automatically pick the local lib. We have the Databricks
>>>>> library because other jobs still explicitly call it.
>>>>>
>>>>> Is the 'correct answer' to go through and modify our jobs so as to remove the
>>>>> Databricks lib / remove it from our deploy? Or should this just work?
>>>>>
>>>>> One of the things I find less helpful in the Spark docs is when there are
>>>>> multiple ways to do something but no clear guidance on what those methods
>>>>> are intended to accomplish.
>>>>>
>>>>> Thanks!
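
If both CSV sources have to stay on the classpath (because other jobs still call the Databricks package), one way to follow the error message's advice is to name the source fully on both sides. A rough sketch, assuming a SparkSession named spark and that com.databricks:spark-csv is still part of the deploy:

    # Built-in Spark 2.x CSV source, named explicitly so the ambiguous short name
    # 'csv' is never looked up:
    df_builtin = spark.read \
        .format('org.apache.spark.sql.execution.datasources.csv.CSVFileFormat') \
        .load('out/df_in.csv')

    # Legacy Databricks spark-csv package, for jobs that still depend on it:
    df_legacy = spark.read \
        .format('com.databricks.spark.csv') \
        .option('header', 'true') \
        .load('out/df_in.csv')

The other route is the one hinted at in the question: drop the spark-csv dependency from the deploy once nothing calls it, after which the plain spark.read.csv(...) short name is unambiguous again.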