yeah but... is the glass half-full or half-empty? sure this might suck but keep your head high, bro! Lots of it (hive) does work. :)
On Fri, Mar 17, 2017 at 2:25 PM, hernan saab <hernan_javier_s...@yahoo.com> wrote:
> Stephan,
>
> Thanks for the response.
>
> The one thing that I don't appreciate from those who promote and DOCUMENT
> hive on spark is that, seemingly, there is absolutely no evidence that
> says that hive on spark WORKS.
> As a matter of fact, after a lot of pain, I noticed it is not supported by
> just about anybody.
>
> If someone dares to document Hive on Spark (see link
> https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started)
> why can't they have the decency to mention what specific combo of
> Hadoop/Spark/Hive versions they used that works? Have a git repo included
> in a doc with all the right versions and libraries. Why not? We can start
> from there and progressively use newer libraries in case the doc becomes
> stale. I am not really asking much, I just want to know what the
> documenter used to claim that Hive on Spark works, that's it.
>
> Clearly, for most cases, this setup is broken and it misleads people into
> wasting time on a broken setup.
>
> I love this tech. But I do notice that there are some mean-spirited or very
> negligent actions made by the apache development community. Documenting
> hive on spark while knowing it won't work for most cases means apache
> developers don't give a crap about the time wasted by people like us.
>
>
> On Friday, March 17, 2017 1:14 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
>
>
> On Fri, Mar 17, 2017 at 2:56 PM, hernan saab <hernan_javier_s...@yahoo.com> wrote:
>
> I have been in a similar world of pain. Basically, I tried to use an
> external Hive to have user access controls with a spark engine.
> In the end, I realized that it was a better idea to use apache tez instead
> of a spark engine for my particular case.
>
> But the journey is what I want to share with you.
> The big data apache tools and libraries such as Hive, Tez, Spark, Hadoop,
> Parquet etc. etc. are not as interchangeable as we would like to think.
> There are very limited combinations for very specific versions. This is why
> tools like Ambari can be useful. Ambari sets a path of combos of versions
> known to work and the dirty work is done under the UI.
>
> More often than not, when you try a version that few people tried, you
> will get error messages that will derail you and cause you to waste a lot
> of time.
>
> In addition, this group, as well as many other apache big data user
> groups, provides extremely poor support for users. The answers you usually
> get are not even hints to a solution. Their answers usually translate to
> "there is nothing I am willing to do about your problem. If I did, I should
> get paid" in many cryptic ways.
>
> If you ask your question to the Spark group they will take you to the Hive
> group and vice versa (I can almost guarantee it based on previous
> experiences).
>
> But in hindsight, people who work on these kinds of things typically make
> more money than the average developer. If you make more $$s, it makes sense
> that learning this stuff is supposed to be harder.
>
> Conclusion: don't try it. Or try using Tez/Hive instead of Spark/Hive if
> you are querying large files.
>
>
> On Friday, March 17, 2017 11:33 AM, Stephen Sprague <sprag...@gmail.com> wrote:
>
>
> :( gettin' no love on this one. any SME's know if Spark 2.1.0 will work
> with Hive 2.1.0 ? That JavaSparkListener class looks like a deal breaker
> to me, alas.
>
> thanks in advance.
>
> Cheers,
> Stephen.
>
> On Mon, Mar 13, 2017 at 10:32 PM, Stephen Sprague <sprag...@gmail.com> wrote:
>
> hi guys,
> wondering where we stand with Hive On Spark these days?
>
> i'm trying to run Spark 2.1.0 with Hive 2.1.0 (purely coincidental
> versions) and running up against this class not found:
>
> java.lang.NoClassDefFoundError: org/apache/spark/JavaSparkListener
>
>
> searching the Cyber i find this:
> 1. http://stackoverflow.com/questions/41953688/setting-spark-as-default-execution-engine-for-hive
>
> which pretty much describes my situation too and it references this:
>
>
> 2. https://issues.apache.org/jira/browse/SPARK-17563
>
> which indicates a "won't fix" - but does reference this:
>
>
> 3. https://issues.apache.org/jira/browse/HIVE-14029
>
> which looks to be fixed in hive 2.2 - which is not released yet.
>
>
> so if i want to use spark 2.1.0 with hive am i out of luck - until hive
> 2.2?
>
> thanks,
> Stephen.
>
>
> Stephan,
>
> I understand some of your frustration. Remember that many in open source
> are volunteering their time. This is why if you pay a vendor for support of
> some software you might pay 50K a year or $200.00 an hour. If I was your
> vendor/consultant I would have started the clock 10 minutes ago just to
> answer this email :). The only "pay" I ever got from Hive is that I can use
> it as a resume bullet point, and I wrote a book which pays me royalties.
>
> As it relates specifically to your problem, when you see the trends you
> are seeing it probably means you are in a minority of the user base. Either
> you're doing something no one else is doing, you are too cutting edge, or
> no one has an easy solution. Hive is making the move from the classic
> MapReduce; two other execution engines have been made: Tez and HiveOnSpark.
> Because we are open source we allow people to "scratch an itch"; that is
> the Apache way. From time to time it means something that was added stops
> being viable because of lack of support.
>
> I agree with your final assessment which is Tez is the most viable engine
> for Hive.
> This is by no means a put down of the HiveOnSpark work and it
> does not mean it will never be the most viable. By the same token, if the
> versions fall out of sync and all that exists is complaints, the viability
> speaks for itself.
>
> Remember that keeping two fast moving things together is no easy chore. I
> used to run Hive + cassandra. Seems easy: crap, two versions of common CLI,
> shade one version, everything works; crap, new hive release has different
> versions of thrift, shade + patch; crap, now one of the other dependencies
> is incompatible, fork + shade + patch. At some point you have to say to
> yourself: if I cannot get critical mass behind this solution, such that I
> am the only one doing/patching it, then I give up and find some other way
> to do it.