Please see https://github.com/databricks/spark-xml/issues/92
On Fri, Jun 17, 2016 at 5:19 AM, VG <vlin...@gmail.com> wrote:

> I am using spark-xml for loading data and creating a data frame.
>
> If the xml element has sub-elements and values, then it works fine. For
> example, if the xml element is like
>
> <a val="1">
>   <b>test</b>
> </a>
>
> However, if the xml element is bare, with just attributes, then it does
> not work. For example,
>
> <a val="1" />
>
> does not load the data. Any suggestions to fix this?
>
> On Fri, Jun 17, 2016 at 4:28 PM, Siva A <siva9940261...@gmail.com> wrote:
>
>> Use spark-xml version 0.3.3:
>>
>> <dependency>
>>   <groupId>com.databricks</groupId>
>>   <artifactId>spark-xml_2.10</artifactId>
>>   <version>0.3.3</version>
>> </dependency>
>>
>> On Fri, Jun 17, 2016 at 4:25 PM, VG <vlin...@gmail.com> wrote:
>>
>>> Hi Siva,
>>>
>>> This is what I have for jars. Did you manage to run with these or
>>> different versions?
>>>
>>> <dependency>
>>>   <groupId>org.apache.spark</groupId>
>>>   <artifactId>spark-core_2.10</artifactId>
>>>   <version>1.6.1</version>
>>> </dependency>
>>> <dependency>
>>>   <groupId>org.apache.spark</groupId>
>>>   <artifactId>spark-sql_2.10</artifactId>
>>>   <version>1.6.1</version>
>>> </dependency>
>>> <dependency>
>>>   <groupId>com.databricks</groupId>
>>>   <artifactId>spark-xml_2.10</artifactId>
>>>   <version>0.2.0</version>
>>> </dependency>
>>> <dependency>
>>>   <groupId>org.scala-lang</groupId>
>>>   <artifactId>scala-library</artifactId>
>>>   <version>2.10.6</version>
>>> </dependency>
>>>
>>> Thanks
>>> VG
>>>
>>> On Fri, Jun 17, 2016 at 4:16 PM, Siva A <siva9940261...@gmail.com> wrote:
>>>
>>>> Hi Marco,
>>>>
>>>> I did run it in an IDE (IntelliJ) as well. It works fine.
>>>> VG, make sure the right jar is on the classpath.
>>>>
>>>> --Siva
>>>>
>>>> On Fri, Jun 17, 2016 at 4:11 PM, Marco Mistroni <mmistr...@gmail.com> wrote:
>>>>
>>>>> And is your Eclipse path correct?
>>>>> I suggest, as Siva did before, building your jar and running it via
>>>>> spark-submit with the --packages option. It's as simple as running
>>>>> this command:
>>>>>
>>>>> spark-submit --packages com.databricks:spark-xml_<scala version>:<package version> \
>>>>>   --class <name of the class containing main> <path to your jar>
>>>>>
>>>>> Indeed, if you have only these lines to run, why don't you try them in
>>>>> spark-shell?
>>>>>
>>>>> hth
>>>>>
>>>>> On Fri, Jun 17, 2016 at 11:32 AM, VG <vlin...@gmail.com> wrote:
>>>>>
>>>>>> Nope, Eclipse.
>>>>>>
>>>>>> On Fri, Jun 17, 2016 at 3:58 PM, Siva A <siva9940261...@gmail.com> wrote:
>>>>>>
>>>>>>> If you are running from an IDE, are you using IntelliJ?
>>>>>>>
>>>>>>> On Fri, Jun 17, 2016 at 3:20 PM, Siva A <siva9940261...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Can you try to package it as a jar and run it using spark-submit?
>>>>>>>>
>>>>>>>> Siva
>>>>>>>>
>>>>>>>> On Fri, Jun 17, 2016 at 3:17 PM, VG <vlin...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> I am trying to run from the IDE and everything else is working fine.
>>>>>>>>> I added the spark-xml jar and now I end up with this dependency
>>>>>>>>> error:
>>>>>>>>>
>>>>>>>>> 16/06/17 15:15:57 INFO BlockManagerMaster: Registered BlockManager
>>>>>>>>> Exception in thread "main" *java.lang.NoClassDefFoundError:
>>>>>>>>> scala/collection/GenTraversableOnce$class*
>>>>>>>>> at org.apache.spark.sql.execution.datasources.CaseInsensitiveMap.<init>(ddl.scala:150)
>>>>>>>>> at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:154)
>>>>>>>>> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
>>>>>>>>> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:109)
>>>>>>>>> at org.ariba.spark.PostsProcessing.main(PostsProcessing.java:19)
>>>>>>>>> Caused by: *java.lang.ClassNotFoundException:
>>>>>>>>> scala.collection.GenTraversableOnce$class*
>>>>>>>>> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>>>>>>>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>>>>>>>> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>>>>>>>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>>>>>>>> ... 5 more
>>>>>>>>> 16/06/17 15:15:58 INFO SparkContext: Invoking stop() from shutdown hook
>>>>>>>>>
>>>>>>>>> On Fri, Jun 17, 2016 at 2:59 PM, Marco Mistroni <mmistr...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> So are you using spark-submit or spark-shell?
>>>>>>>>>>
>>>>>>>>>> You will need to launch either by passing the --packages option
>>>>>>>>>> (like in the example below for spark-csv). You will need to know:
>>>>>>>>>>
>>>>>>>>>> --packages com.databricks:spark-xml_<scala.version>:<package version>
>>>>>>>>>>
>>>>>>>>>> hth
>>>>>>>>>>
>>>>>>>>>> On Fri, Jun 17, 2016 at 10:20 AM, VG <vlin...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Apologies for that.
>>>>>>>>>>> I am trying to use spark-xml to load data from an xml file.
>>>>>>>>>>>
>>>>>>>>>>> Here is the exception:
>>>>>>>>>>>
>>>>>>>>>>> 16/06/17 14:49:04 INFO BlockManagerMaster: Registered BlockManager
>>>>>>>>>>> Exception in thread "main" java.lang.ClassNotFoundException:
>>>>>>>>>>> Failed to find data source: org.apache.spark.xml. Please find
>>>>>>>>>>> packages at http://spark-packages.org
>>>>>>>>>>> at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.lookupDataSource(ResolvedDataSource.scala:77)
>>>>>>>>>>> at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:102)
>>>>>>>>>>> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
>>>>>>>>>>> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:109)
>>>>>>>>>>> at org.ariba.spark.PostsProcessing.main(PostsProcessing.java:19)
>>>>>>>>>>> Caused by: java.lang.ClassNotFoundException: org.apache.spark.xml.DefaultSource
>>>>>>>>>>> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>>>>>>>>>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>>>>>>>>>> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>>>>>>>>>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>>>>>>>>>> at org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4$$anonfun$apply$1.apply(ResolvedDataSource.scala:62)
>>>>>>>>>>> at org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4$$anonfun$apply$1.apply(ResolvedDataSource.scala:62)
>>>>>>>>>>> at scala.util.Try$.apply(Try.scala:192)
>>>>>>>>>>> at org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4.apply(ResolvedDataSource.scala:62)
>>>>>>>>>>> at org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4.apply(ResolvedDataSource.scala:62)
>>>>>>>>>>> at scala.util.Try.orElse(Try.scala:84)
>>>>>>>>>>> at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.lookupDataSource(ResolvedDataSource.scala:62)
>>>>>>>>>>> ... 4 more
>>>>>>>>>>>
>>>>>>>>>>> Code:
>>>>>>>>>>>
>>>>>>>>>>> SQLContext sqlContext = new SQLContext(sc);
>>>>>>>>>>> DataFrame df = sqlContext.read()
>>>>>>>>>>>     .format("org.apache.spark.xml")
>>>>>>>>>>>     .option("rowTag", "row")
>>>>>>>>>>>     .load("A.xml");
>>>>>>>>>>>
>>>>>>>>>>> Any suggestions please?
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Jun 17, 2016 at 2:42 PM, Marco Mistroni <mmistr...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Too little info.
>>>>>>>>>>>> It'll help if you can post the exception and show your sbt file
>>>>>>>>>>>> (if you are using sbt), and provide minimal details on what you
>>>>>>>>>>>> are doing.
>>>>>>>>>>>>
>>>>>>>>>>>> kr
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Jun 17, 2016 at 10:08 AM, VG <vlin...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Failed to find data source: com.databricks.spark.xml
>>>>>>>>>>>>>
>>>>>>>>>>>>> Any suggestions to resolve this?
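Two fixes cover the errors in this thread. The "Failed to find data source: org.apache.spark.xml" exception comes from a wrong format name: spark-xml registers itself as `com.databricks.spark.xml`, so the snippet above should use `.format("com.databricks.spark.xml")`. The `NoClassDefFoundError: scala/collection/GenTraversableOnce$class` is the usual symptom of mixing Scala 2.10 and 2.11 binaries on the classpath. A sketch of a consistent Maven setup, assuming Spark 1.6.1 on Scala 2.10 and spark-xml 0.3.3 as discussed in the thread:

```xml
<!-- Every Spark-related artifact must share the same Scala binary suffix
     (_2.10 here); mixing _2.10 and _2.11 jars on the classpath is what
     produces NoClassDefFoundError: scala/collection/GenTraversableOnce$class. -->
<dependencies>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.6.1</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.10</artifactId>
    <version>1.6.1</version>
  </dependency>
  <dependency>
    <groupId>com.databricks</groupId>
    <artifactId>spark-xml_2.10</artifactId>
    <version>0.3.3</version>
  </dependency>
  <dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-library</artifactId>
    <version>2.10.6</version>
  </dependency>
</dependencies>
```

When running a packaged jar instead of the IDE, the equivalent is passing `--packages com.databricks:spark-xml_2.10:0.3.3` to spark-submit, as suggested earlier in the thread.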