I believe it's not about the location (i.e., local machine or HDFS); it's about the format of the input file. For example, I am getting the following error while trying to read an input file in libsvm format:

*Exception in thread "main" java.lang.ClassNotFoundException: Failed to find data source: libsvm.*

The application works fine in Eclipse. However, when I run it from the packaged jar file, I get the above error, which is really weird!
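For context, the failing call is just a DataFrameReader read in libsvm format; a minimal sketch of the kind of code involved (class name and file path are made up for illustration):

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class LibSVMReadExample {  // hypothetical class name
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("LibSVM Read Example")
                    .master("local[*]")
                    .getOrCreate();
            // The "libsvm" data source is contributed by spark-mllib through a
            // META-INF/services registration, so spark-mllib must be on the
            // runtime classpath of the jar, not just on Eclipse's build path.
            Dataset<Row> data = spark.read()
                    .format("libsvm")
                    .load("data/sample_libsvm_data.txt");  // made-up path
            data.show(5);
            spark.stop();
        }
    }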
Regards,
_________________________________
*Md. Rezaul Karim* BSc, MSc
PhD Researcher, INSIGHT Centre for Data Analytics
National University of Ireland, Galway
IDA Business Park, Dangan, Galway, Ireland
Web: http://www.reza-analytics.eu/index.html

On 7 December 2016 at 23:39, Iman Mohtashemi <iman.mohtash...@gmail.com> wrote:

> No, but I tried that too and it still didn't work. Where are the files
> being read from: the local machine or HDFS? Do I need to get the files to
> HDFS first? In Eclipse I just point to the location of the directory.
>
> On Wed, Dec 7, 2016 at 3:34 PM Md. Rezaul Karim <
> rezaul.ka...@insight-centre.org> wrote:
>
>> Hi,
>>
>> You should prepare your jar file (from your Spark application written in
>> Java) with all the necessary dependencies. You can create a Maven project
>> in Eclipse by specifying the dependencies in a Maven-friendly pom.xml file.
>>
>> To build the jar with its dependencies and a *main class* (since you are
>> getting the *ClassNotFoundException*), your pom.xml should contain the
>> following in the *build* tag (the example main class is the one in the
>> <mainClass> element below):
>>
>> <build>
>>   <plugins>
>>     <!-- Download source code in Eclipse (best practice) -->
>>     <plugin>
>>       <groupId>org.apache.maven.plugins</groupId>
>>       <artifactId>maven-eclipse-plugin</artifactId>
>>       <version>2.9</version>
>>       <configuration>
>>         <downloadSources>true</downloadSources>
>>         <downloadJavadocs>false</downloadJavadocs>
>>       </configuration>
>>     </plugin>
>>     <!-- Set a compiler level -->
>>     <plugin>
>>       <groupId>org.apache.maven.plugins</groupId>
>>       <artifactId>maven-compiler-plugin</artifactId>
>>       <version>3.5.1</version>
>>       <configuration>
>>         <source>${jdk.version}</source>
>>         <target>${jdk.version}</target>
>>       </configuration>
>>     </plugin>
>>     <plugin>
>>       <groupId>org.apache.maven.plugins</groupId>
>>       <artifactId>maven-shade-plugin</artifactId>
>>       <version>2.4.3</version>
>>       <configuration>
>>         <shadeTestJar>true</shadeTestJar>
>>       </configuration>
>>     </plugin>
>>     <!-- Maven Assembly Plugin -->
>>     <plugin>
>>       <groupId>org.apache.maven.plugins</groupId>
>>       <artifactId>maven-assembly-plugin</artifactId>
>>       <version>2.4.1</version>
>>       <configuration>
>>         <!-- Get all project dependencies -->
>>         <descriptorRefs>
>>           <descriptorRef>jar-with-dependencies</descriptorRef>
>>         </descriptorRefs>
>>         <!-- mainClass in the manifest makes an executable jar -->
>>         <archive>
>>           <manifest>
>>             <mainClass>com.example.RandomForest.SongPrediction</mainClass>
>>           </manifest>
>>         </archive>
>>         <property>
>>           <name>oozie.launcher.mapreduce.job.user.classpath.first</name>
>>           <value>true</value>
>>         </property>
>>       </configuration>
>>       <executions>
>>         <execution>
>>           <id>make-assembly</id>
>>           <!-- Bind to the packaging phase -->
>>           <phase>package</phase>
>>           <goals>
>>             <goal>single</goal>
>>           </goals>
>>         </execution>
>>       </executions>
>>     </plugin>
>>   </plugins>
>> </build>
>>
>> An example pom.xml file has been attached for your reference. Feel free
>> to reuse it.
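>>
>> One thing to watch out for with fat jars: Spark registers its built-in
>> data sources (text, csv, libsvm, ...) through
>> META-INF/services/org.apache.spark.sql.sources.DataSourceRegister, and
>> when several modules are merged into one jar those service files can
>> overwrite each other, which produces exactly this "Failed to find data
>> source" error. If you build with the maven-shade-plugin, a sketch of a
>> configuration that concatenates the service files (plugin version as
>> above; untested here):
>>
>> <plugin>
>>   <groupId>org.apache.maven.plugins</groupId>
>>   <artifactId>maven-shade-plugin</artifactId>
>>   <version>2.4.3</version>
>>   <executions>
>>     <execution>
>>       <phase>package</phase>
>>       <goals>
>>         <goal>shade</goal>
>>       </goals>
>>       <configuration>
>>         <transformers>
>>           <!-- Merge META-INF/services entries instead of letting one
>>                module's file overwrite another's -->
>>           <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
>>         </transformers>
>>       </configuration>
>>     </execution>
>>   </executions>
>> </plugin>
>>
>> You can then check whether the merged jar actually contains the service
>> entries with any zip tool, for example:
>>
>> unzip -p yourapp.jar META-INF/services/org.apache.spark.sql.sources.DataSourceRegister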
>>
>> Regards,
>> _________________________________
>> *Md. Rezaul Karim* BSc, MSc
>> PhD Researcher, INSIGHT Centre for Data Analytics
>> National University of Ireland, Galway
>> IDA Business Park, Dangan, Galway, Ireland
>> Web: http://www.reza-analytics.eu/index.html
>>
>> On 7 December 2016 at 23:18, im281 <iman.mohtash...@gmail.com> wrote:
>>
>> Hello,
>> I have a simple word count example in Java, and I can run it in Eclipse
>> (code at the bottom).
>>
>> I then create a jar file from it and try to run it from the command line:
>>
>> java -jar C:\Users\Owner\Desktop\wordcount.jar Data/testfile.txt
>>
>> But I get the error below. I think the main error is:
>>
>> *Exception in thread "main" java.lang.ClassNotFoundException: Failed to
>> find data source: text*
>>
>> Any advice on how to run this jar file in Spark would be appreciated.
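>>
>> (Side note: the standard launcher for a Spark application jar is
>> spark-submit rather than plain java -jar; assuming the JavaWordCount
>> class below and the same paths, that would be something like:
>>
>> spark-submit --class JavaWordCount --master local[*] C:\Users\Owner\Desktop\wordcount.jar Data/testfile.txt
>>
>> although spark-submit alone would not fix the data-source lookup
>> failure shown in the log below.)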
>>
>> Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
>> 16/12/07 15:16:41 INFO SparkContext: Running Spark version 2.0.2
>> 16/12/07 15:16:42 INFO SecurityManager: Changing view acls to: Owner
>> 16/12/07 15:16:42 INFO SecurityManager: Changing modify acls to: Owner
>> 16/12/07 15:16:42 INFO SecurityManager: Changing view acls groups to:
>> 16/12/07 15:16:42 INFO SecurityManager: Changing modify acls groups to:
>> 16/12/07 15:16:42 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(Owner); groups with view permissions: Set(); users with modify permissions: Set(Owner); groups with modify permissions: Set()
>> 16/12/07 15:16:44 INFO Utils: Successfully started service 'sparkDriver' on port 10211.
>> 16/12/07 15:16:44 INFO SparkEnv: Registering MapOutputTracker
>> 16/12/07 15:16:44 INFO SparkEnv: Registering BlockManagerMaster
>> 16/12/07 15:16:44 INFO DiskBlockManager: Created local directory at C:\Users\Owner\AppData\Local\Temp\blockmgr-b4b1960b-08fc-44fd-a75e-1a0450556873
>> 16/12/07 15:16:44 INFO MemoryStore: MemoryStore started with capacity 1984.5 MB
>> 16/12/07 15:16:45 INFO SparkEnv: Registering OutputCommitCoordinator
>> 16/12/07 15:16:45 INFO Utils: Successfully started service 'SparkUI' on port 4040.
>> 16/12/07 15:16:45 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.19.2:4040
>> 16/12/07 15:16:45 INFO Executor: Starting executor ID driver on host localhost
>> 16/12/07 15:16:45 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 10252.
>> 16/12/07 15:16:45 INFO NettyBlockTransferService: Server created on 192.168.19.2:10252
>> 16/12/07 15:16:45 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.19.2, 10252)
>> 16/12/07 15:16:45 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.19.2:10252 with 1984.5 MB RAM, BlockManagerId(driver, 192.168.19.2, 10252)
>> 16/12/07 15:16:45 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.19.2, 10252)
>> 16/12/07 15:16:46 WARN SparkContext: Use an existing SparkContext, some configuration may not take effect.
>> 16/12/07 15:16:46 INFO SharedState: Warehouse path is 'file:/C:/Users/Owner/spark-warehouse'.
>> Exception in thread "main" java.lang.ClassNotFoundException: Failed to find data source: text. Please find packages at https://cwiki.apache.org/confluence/display/SPARK/Third+Party+Projects
>>     at org.apache.spark.sql.execution.datasources.DataSource.lookupDataSource(DataSource.scala:148)
>>     at org.apache.spark.sql.execution.datasources.DataSource.providingClass$lzycompute(DataSource.scala:79)
>>     at org.apache.spark.sql.execution.datasources.DataSource.providingClass(DataSource.scala:79)
>>     at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:340)
>>     at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:149)
>>     at org.apache.spark.sql.DataFrameReader.text(DataFrameReader.scala:504)
>>     at org.apache.spark.sql.DataFrameReader.textFile(DataFrameReader.scala:540)
>>     at org.apache.spark.sql.DataFrameReader.textFile(DataFrameReader.scala:513)
>>     at JavaWordCount.main(JavaWordCount.java:57)
>> Caused by: java.lang.ClassNotFoundException: text.DefaultSource
>>     at java.net.URLClassLoader.findClass(Unknown Source)
>>     at java.lang.ClassLoader.loadClass(Unknown Source)
>>     at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
>>     at java.lang.ClassLoader.loadClass(Unknown Source)
>>     at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$5$$anonfun$apply$1.apply(DataSource.scala:132)
>>     at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$5$$anonfun$apply$1.apply(DataSource.scala:132)
>>     at scala.util.Try$.apply(Try.scala:192)
>>     at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$5.apply(DataSource.scala:132)
>>     at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$5.apply(DataSource.scala:132)
>>     at scala.util.Try.orElse(Try.scala:84)
>>     at org.apache.spark.sql.execution.datasources.DataSource.lookupDataSource(DataSource.scala:132)
>>     ... 8 more
>> 16/12/07 15:16:46 INFO SparkContext: Invoking stop() from shutdown hook
>> 16/12/07 15:16:46 INFO SparkUI: Stopped Spark web UI at http://192.168.19.2:4040
>> 16/12/07 15:16:46 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
>> 16/12/07 15:16:46 INFO MemoryStore: MemoryStore cleared
>> 16/12/07 15:16:46 INFO BlockManager: BlockManager stopped
>> 16/12/07 15:16:46 INFO BlockManagerMaster: BlockManagerMaster stopped
>> 16/12/07 15:16:46 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
>> 16/12/07 15:16:46 INFO SparkContext: Successfully stopped SparkContext
>> 16/12/07 15:16:46 INFO ShutdownHookManager: Shutdown hook called
>> 16/12/07 15:16:46 INFO ShutdownHookManager: Deleting directory C:\Users\Owner\AppData\Local\Temp\spark-dab2587b-a794-4947-ac13-d40056cf71d8
>>
>> C:\Users\Owner>
>>
>>
>> import java.util.Arrays;
>> import java.util.Iterator;
>> import java.util.List;
>> import java.util.regex.Pattern;
>>
>> import org.apache.spark.SparkConf;
>> import org.apache.spark.api.java.JavaPairRDD;
>> import org.apache.spark.api.java.JavaRDD;
>> import org.apache.spark.api.java.JavaSparkContext;
>> import org.apache.spark.api.java.function.FlatMapFunction;
>> import org.apache.spark.api.java.function.Function2;
>> import org.apache.spark.api.java.function.PairFunction;
>> import org.apache.spark.sql.SparkSession;
>>
>> import scala.Tuple2;
>>
>> public final class JavaWordCount {
>>     private static final Pattern SPACE = Pattern.compile(" ");
>>
>>     public static void main(String[] args) throws Exception {
>>
>>         if (args.length < 1) {
>>             System.err.println("Usage: JavaWordCount <file>");
>>             System.exit(1);
>>         }
>>
>>         // Boilerplate needed to run locally; note that this
>>         // JavaSparkContext is never used below.
>>         SparkConf conf = new SparkConf().setAppName("Word Count Application").setMaster("local[*]");
>>         JavaSparkContext sc = new JavaSparkContext(conf);
>>
>>         SparkSession spark = SparkSession
>>                 .builder()
>>                 .appName("Word Count")
>>                 .getOrCreate()
>>                 .newSession();
>>
>>         // Read the input file line by line (this is the call that fails
>>         // with "Failed to find data source: text" at JavaWordCount.java:57)
>>         JavaRDD<String> lines = spark.read().textFile(args[0]).javaRDD();
>>
>>         // Split each line into words
>>         JavaRDD<String> words = lines.flatMap(new FlatMapFunction<String, String>() {
>>             @Override
>>             public Iterator<String> call(String s) {
>>                 return Arrays.asList(SPACE.split(s)).iterator();
>>             }
>>         });
>>
>>         // Pair each word with a count of 1
>>         JavaPairRDD<String, Integer> ones = words.mapToPair(
>>             new PairFunction<String, String, Integer>() {
>>                 @Override
>>                 public Tuple2<String, Integer> call(String s) {
>>                     return new Tuple2<>(s, 1);
>>                 }
>>             });
>>
>>         // Sum the counts per word
>>         JavaPairRDD<String, Integer> counts = ones.reduceByKey(
>>             new Function2<Integer, Integer, Integer>() {
>>                 @Override
>>                 public Integer call(Integer i1, Integer i2) {
>>                     return i1 + i2;
>>                 }
>>             });
>>
>>         List<Tuple2<String, Integer>> output = counts.collect();
>>         for (Tuple2<?, ?> tuple : output) {
>>             System.out.println(tuple._1() + ": " + tuple._2());
>>         }
>>         spark.stop();
>>     }
>> }
>>
>>
>> --
>> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Running-spark-from-Eclipse-and-then-Jar-tp28182.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org