Hi everyone,

I am trying to work with MySQL tables from Spark. I do not want to move the tables out of MySQL, since they can easily get very big; ideally the data stays in the database where it is stored, and Spark is only used to speed up reads and writes (I am more a data analyst than an application developer). For that reason I have not installed Hadoop. People here have helped me a lot, but I still cannot connect Spark to MySQL. Possible culprits include the Java version, the location of the Java files, the location of the connector files, the MySQL version, the environment variables, the choice between JDBC and ODBC, and so on. My questions are:
1. Do we need to install Hadoop and Java before installing Spark?
2. Which versions of these packages are known to install and connect successfully, in anyone's experience? (The solutions I found online may have worked with older versions, but they do not seem to work in my case. I am on a Mac, by the way.)
3. So far, the only approach that has worked for me is to use the sqldf package from within SparkR to query MySQL. Does that mean Spark is actually doing the work (and speeding things up) when I run SQL queries through sqldf in SparkR?

I hope I have described my questions clearly. Thank you very much for the help.

Best regards,
YA
---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
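P.S. For concreteness, what I am trying to get working is roughly the following SparkR JDBC read. All connection details (host, database, table, credentials, jar path) are placeholders, and it assumes the MySQL Connector/J jar is made available to Spark via `spark.jars`:

```r
# Sketch only: assumes SparkR is installed and the MySQL Connector/J
# driver jar exists at the path below (placeholder path and version).
library(SparkR)

sparkR.session(
  sparkConfig = list(
    "spark.jars" = "/path/to/mysql-connector-j.jar"  # placeholder
  )
)

# Read a MySQL table through JDBC; the data stays in MySQL and Spark
# pulls rows only when the DataFrame is actually evaluated.
df <- read.jdbc(
  url = "jdbc:mysql://localhost:3306/mydb",  # placeholder host/database
  tableName = "mytable",                     # placeholder table
  driver = "com.mysql.cj.jdbc.Driver",
  user = "user",                             # placeholder credentials
  password = "password"
)

head(df)
```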