I think there might still be something messed up with the classpath. The logs complain about duplicate jars (multiple SLF4J bindings) and about deprecated configuration properties.
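One quick way to confirm which of those duplicate bindings actually wins is to ask the JVM where it loaded the class from. This is only a sketch, not from the original run; it can be pasted into spark-shell or any Scala REPL started with the same classpath:

    // Prints the jar that supplied org.slf4j.impl.StaticLoggerBinder to this JVM.
    // (getCodeSource can be null for bootstrap classes, but not for jar-loaded ones.)
    val binder = Class.forName("org.slf4j.impl.StaticLoggerBinder")
    println(binder.getProtectionDomain.getCodeSource.getLocation)

Whichever jar it prints is the binding in effect; the other jars listed in the SLF4J warning below are the ones to prune from the classpath.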
> On 21 Sep 2016, at 22:21, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>
> Well, I am left using Spark for importing data from an RDBMS table to Hadoop.
>
> You may ask why, and it is because Spark does it in one process and with no errors.
>
> With Sqoop I get the error message below, which leaves the RDBMS table data in a file on HDFS but stops there.
>
> 2016-09-21 21:00:15,084 [myid:] - INFO [main:OraOopLog@103] - Data Connector for Oracle and Hadoop is disabled.
> 2016-09-21 21:00:15,095 [myid:] - INFO [main:SqlManager@98] - Using default fetchSize of 1000
> 2016-09-21 21:00:15,095 [myid:] - INFO [main:CodeGenTool@92] - Beginning code generation
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in [jar:file:/data6/hduser/hbase-0.98.21-hadoop2/lib/phoenix-4.8.0-HBase-0.98-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/data6/hduser/hbase-0.98.21-hadoop2/lib/phoenix-4.8.0-HBase-0.98-hive.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/data6/hduser/hbase-0.98.21-hadoop2/lib/phoenix-4.8.0-HBase-0.98-thin-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/data6/hduser/hbase-0.98.21-hadoop2/lib/slf4j-log4j12-1.6.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/home/hduser/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> 2016-09-21 21:00:15,681 [myid:] - INFO [main:OracleManager@417] - Time zone has been set to GMT
> 2016-09-21 21:00:15,717 [myid:] - INFO [main:SqlManager@757] - Executing SQL statement: select * from sh.sales where (1 = 0)
> 2016-09-21 21:00:15,727 [myid:] - INFO [main:SqlManager@757] - Executing SQL statement: select * from sh.sales where (1 = 0)
> 2016-09-21 21:00:15,748 [myid:] - INFO [main:CompilationManager@94] - HADOOP_MAPRED_HOME is /home/hduser/hadoop-2.7.3/share/hadoop/mapreduce
> Note: /tmp/sqoop-hduser/compile/82dcf5975118b5e271b442e547201fdf/QueryResult.java uses or overrides a deprecated API.
> Note: Recompile with -Xlint:deprecation for details.
> 2016-09-21 21:00:17,354 [myid:] - INFO [main:CompilationManager@330] - Writing jar file: /tmp/sqoop-hduser/compile/82dcf5975118b5e271b442e547201fdf/QueryResult.jar
> 2016-09-21 21:00:17,366 [myid:] - INFO [main:ImportJobBase@237] - Beginning query import.
> 2016-09-21 21:00:17,511 [myid:] - WARN [main:NativeCodeLoader@62] - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 2016-09-21 21:00:17,516 [myid:] - INFO [main:Configuration@840] - mapred.jar is deprecated. Instead, use mapreduce.job.jar
> 2016-09-21 21:00:17,993 [myid:] - INFO [main:Configuration@840] - mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
> 2016-09-21 21:00:18,094 [myid:] - INFO [main:RMProxy@56] - Connecting to ResourceManager at rhes564/50.140.197.217:8032
> 2016-09-21 21:00:23,441 [myid:] - INFO [main:DBInputFormat@192] - Using read commited transaction isolation
> 2016-09-21 21:00:23,442 [myid:] - INFO [main:DataDrivenDBInputFormat@147] - BoundingValsQuery: SELECT MIN(prod_id), MAX(prod_id) FROM (select * from sh.sales where (1 = 1) ) t1
> 2016-09-21 21:00:23,540 [myid:] - INFO [main:JobSubmitter@394] - number of splits:4
> 2016-09-21 21:00:23,547 [myid:] - INFO [main:Configuration@840] - mapred.job.name is deprecated. Instead, use mapreduce.job.name
> 2016-09-21 21:00:23,547 [myid:] - INFO [main:Configuration@840] - mapred.cache.files.timestamps is deprecated. Instead, use mapreduce.job.cache.files.timestamps
> 2016-09-21 21:00:23,547 [myid:] - INFO [main:Configuration@840] - mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
> 2016-09-21 21:00:23,547 [myid:] - INFO [main:Configuration@840] - mapreduce.inputformat.class is deprecated. Instead, use mapreduce.job.inputformat.class
> 2016-09-21 21:00:23,547 [myid:] - INFO [main:Configuration@840] - mapreduce.outputformat.class is deprecated. Instead, use mapreduce.job.outputformat.class
> 2016-09-21 21:00:23,548 [myid:] - INFO [main:Configuration@840] - mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
> 2016-09-21 21:00:23,548 [myid:] - INFO [main:Configuration@840] - mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
> 2016-09-21 21:00:23,548 [myid:] - INFO [main:Configuration@840] - mapred.cache.files is deprecated. Instead, use mapreduce.job.cache.files
> 2016-09-21 21:00:23,548 [myid:] - INFO [main:Configuration@840] - mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
> 2016-09-21 21:00:23,548 [myid:] - INFO [main:Configuration@840] - mapred.job.classpath.files is deprecated. Instead, use mapreduce.job.classpath.files
> 2016-09-21 21:00:23,548 [myid:] - INFO [main:Configuration@840] - user.name is deprecated. Instead, use mapreduce.job.user.name
> 2016-09-21 21:00:23,548 [myid:] - INFO [main:Configuration@840] - mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
> 2016-09-21 21:00:23,549 [myid:] - INFO [main:Configuration@840] - mapred.cache.files.filesizes is deprecated. Instead, use mapreduce.job.cache.files.filesizes
> 2016-09-21 21:00:23,549 [myid:] - INFO [main:Configuration@840] - mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
> 2016-09-21 21:00:23,656 [myid:] - INFO [main:JobSubmitter@477] - Submitting tokens for job: job_1474455325627_0045
> 2016-09-21 21:00:23,955 [myid:] - INFO [main:YarnClientImpl@174] - Submitted application application_1474455325627_0045 to ResourceManager at rhes564/50.140.197.217:8032
> 2016-09-21 21:00:23,980 [myid:] - INFO [main:Job@1272] - The url to track the job: http://http://rhes564:8088/proxy/application_1474455325627_0045/
> 2016-09-21 21:00:23,981 [myid:] - INFO [main:Job@1317] - Running job: job_1474455325627_0045
> 2016-09-21 21:00:31,180 [myid:] - INFO [main:Job@1338] - Job job_1474455325627_0045 running in uber mode : false
> 2016-09-21 21:00:31,182 [myid:] - INFO [main:Job@1345] - map 0% reduce 0%
> 2016-09-21 21:00:40,260 [myid:] - INFO [main:Job@1345] - map 25% reduce 0%
> 2016-09-21 21:00:44,283 [myid:] - INFO [main:Job@1345] - map 50% reduce 0%
> 2016-09-21 21:00:48,308 [myid:] - INFO [main:Job@1345] - map 75% reduce 0%
> 2016-09-21 21:00:55,346 [myid:] - INFO [main:Job@1345] - map 100% reduce 0%
> 2016-09-21 21:00:56,359 [myid:] - INFO [main:Job@1356] - Job job_1474455325627_0045 completed successfully
> 2016-09-21 21:00:56,501 [myid:] - ERROR [main:ImportTool@607] - Imported Failed: No enum constant org.apache.hadoop.mapreduce.JobCounter.MB_MILLIS_MAPS
>
> Dr Mich Talebzadeh
>
> LinkedIn https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
> Disclaimer: Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.
>
>> On 21 September 2016 at 20:56, Michael Segel <michael_se...@hotmail.com> wrote:
>> Uhmmm…
>>
>> A bit of a longer-ish answer…
>>
>> Spark may or may not be faster than Sqoop. The standard caveats apply… YMMV.
>>
>> The reason I say this is that you have a couple of limiting factors, the main one being the number of connections allowed by the target RDBMS.
>>
>> Then there’s the data distribution within the partitions / ranges in the database. By this I mean that with any parallel solution you need to run copies of your query in parallel over different ranges within the database. If the data is evenly distributed across those ranges, fine; if not, one thread will run longer than the others. Note that this is a problem both solutions face.
>>
>> Then there’s the cluster itself. Again, YMMV on your Spark job vs a MapReduce job.
>>
>> In terms of launching the job, setup, etc., the Spark job could take longer to set up, but on long-running queries that becomes noise.
>>
>> The real question is what makes the most sense to you: where you have the most experience and what you feel most comfortable using.
>>
>> The other question is what you do with the data (RDDs, Datasets, DataFrames, etc.) once you have read it.
>>
>> HTH
>>
>> -Mike
>>
>> PS. I know that I’m responding to an earlier message in the thread, but this is something that I’ve heard lots of questions about, and it’s not a simple thing to answer. Since this is a batch process, the performance issues are moot.
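To make the range-splitting point concrete: Spark's JDBC reader exposes the same mechanism Sqoop uses in the log above (the BoundingValsQuery on prod_id followed by four splits). A minimal sketch for spark-shell, where `spark` is predefined, assuming an Oracle source; the URL, credentials, bounds, and output path are placeholders, not values from this thread:

    import java.util.Properties

    val props = new Properties()
    props.setProperty("user", "sh")       // placeholder credentials
    props.setProperty("password", "***")
    props.setProperty("driver", "oracle.jdbc.OracleDriver")

    // Spark does what DataDrivenDBInputFormat does in the log above: it slices
    // [lowerBound, upperBound] of the partition column into numPartitions ranges
    // and runs one ranged copy of the query per partition.
    val sales = spark.read.jdbc(
      "jdbc:oracle:thin:@//rhes564:1521/ORCL",  // placeholder URL
      "sh.sales",
      "prod_id",  // partition column, as in the BoundingValsQuery
      1L,         // lowerBound -- in practice SELECT MIN(prod_id); placeholder
      200L,       // upperBound -- in practice SELECT MAX(prod_id); placeholder
      4,          // numPartitions, matching the four splits in the log
      props
    )

    sales.write.mode("overwrite").parquet("/tmp/sh_sales")  // placeholder output path

Each partition opens its own connection and issues its own ranged query, so the connection-count and skew caveats Mike raises apply to Spark exactly as they do to the four Sqoop mappers.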
>>> On Aug 24, 2016, at 5:07 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>
>>> Personally I prefer Spark JDBC.
>>>
>>> Both Sqoop and Spark rely on the same JDBC drivers.
>>>
>>> I think Spark is faster, and if you have many nodes you can partition your incoming data and take advantage of Spark's DAG plus its in-memory offering.
>>>
>>> By default Sqoop uses MapReduce, which is pretty slow.
>>>
>>> Remember that for Spark you will need sufficient memory.
>>>
>>> HTH
>>>
>>> Dr Mich Talebzadeh
>>>
>>> LinkedIn https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>
>>> http://talebzadehmich.wordpress.com
>>>
>>> Disclaimer: Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.
>>>
>>>> On 24 August 2016 at 22:39, Venkata Penikalapati <mail.venkatakart...@gmail.com> wrote:
>>>> Team,
>>>> Please help me choose between Sqoop and Spark JDBC for fetching data from an RDBMS. Sqoop has a lot of optimizations for fetching data; does Spark JDBC have those as well?
>>>>
>>>> I'm running some analytics in Spark on data that resides in an RDBMS.
>>>>
>>>> Please guide me on this.
>>>>
>>>> Thanks
>>>> Venkata Karthik P
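On Mich's partitioning advice and Mike's skew caveat (one thread running longer than the others): Spark's JDBC reader also accepts an explicit list of predicates, one per partition, so uneven key ranges can be balanced by hand instead of being sliced evenly between MIN and MAX. Again only a sketch for spark-shell; the boundaries and connection details are illustrative, not taken from the actual table:

    import java.util.Properties

    val props = new Properties()
    props.setProperty("user", "sh")       // placeholder credentials
    props.setProperty("password", "***")
    props.setProperty("driver", "oracle.jdbc.OracleDriver")

    // One WHERE clause per partition. These uneven ranges are made up for the
    // sketch; in practice they would be sized from the actual key histogram.
    val predicates = Array(
      "prod_id < 20",
      "prod_id >= 20 AND prod_id < 40",
      "prod_id >= 40 AND prod_id < 120",
      "prod_id >= 120"
    )

    val sales = spark.read.jdbc(
      "jdbc:oracle:thin:@//rhes564:1521/ORCL",  // placeholder URL
      "sh.sales",
      predicates,
      props
    )

    println(sales.rdd.getNumPartitions)  // 4: one partition per predicate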