He lives on after all! And thanks for the continued feedback. We need the answers to these questions using HS2:

1. what is the output of "ps -ef | grep -i hiveserver2" on your system? in particular, what is the value of -Xmx?

2. does "select * from table limit 1" work?

Thanks,
Stephen.
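For question 2, beeline straight at HS2 is the quickest test. The host, port, and user below are placeholders for your setup, and "bigtable" is the test table you named earlier in the thread:

    $ beeline -u jdbc:hive2://localhost:10000 -n <user>
    0: jdbc:hive2://localhost:10000> select * from bigtable limit 1;

if that one full-width row comes back, the failure is bound to the row count rather than the column count.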
On Tue, Feb 18, 2014 at 6:32 AM, David Gayou <david.ga...@kxen.com> wrote:

> I'm so sorry, I wrote an answer and forgot to send it,
> and I haven't been able to work on this for a few days.
>
> So far:
>
> I have a 15k-column table with 50k rows.
>
> I do not see any change if I change the storage format.
>
> *Hive 0.12.0*
>
> My test query is "select * from bigtable".
>
> If I use the Hive CLI, it works fine.
> If I use hiveserver1 + ODBC, it works fine.
> If I use hiveserver2 + ODBC or hiveserver2 + beeline, I get this Java
> exception:
>
> 2014-02-18 13:22:22,571 ERROR thrift.ProcessFunction (ProcessFunction.java:process(41)) - Internal error processing FetchResults
> java.lang.OutOfMemoryError: Java heap space
>         at java.util.Arrays.copyOf(Arrays.java:2734)
>         at java.util.ArrayList.ensureCapacity(ArrayList.java:167)
>         at java.util.ArrayList.add(ArrayList.java:351)
>         at org.apache.hive.service.cli.thrift.TRow.addToColVals(TRow.java:160)
>         at org.apache.hive.service.cli.RowBasedSet.addRow(RowBasedSet.java:60)
>         at org.apache.hive.service.cli.RowBasedSet.addRow(RowBasedSet.java:32)
>         at org.apache.hive.service.cli.operation.SQLOperation.prepareFromRow(SQLOperation.java:270)
>         at org.apache.hive.service.cli.operation.SQLOperation.decode(SQLOperation.java:262)
>         at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:246)
>         at org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationManager.java:171)
>         at org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:438)
>         at org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:346)
>         at org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:407)
>         at org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1373)
>
> *From the SVN trunk* (for HIVE-3746):
>
> With the Maven change, most of the documentation and wiki are out of date.
> Compiling from trunk was not that easy and I may have missed some steps,
> but it has the same behavior: it works in the CLI and hiveserver1;
> it fails with hiveserver2.
>
> Regards,
>
> David Gayou
>
> On Thu, Feb 13, 2014 at 3:11 AM, Navis류승우 <navis....@nexr.com> wrote:
>
>> With HIVE-3746, which will be included in hive-0.13, HiveServer2 takes
>> less memory than before.
>>
>> Could you try it with the version in trunk?
>>
>> 2014-02-13 10:49 GMT+09:00 Stephen Sprague <sprag...@gmail.com>:
>>
>>> question to the original poster. closure appreciated!
>>>
>>> On Fri, Jan 31, 2014 at 12:22 PM, Stephen Sprague <sprag...@gmail.com> wrote:
>>>
>>>> Thanks, Ed. And on a separate tack, let's look at HiveServer2.
>>>>
>>>> @OP>
>>>>
>>>> *I've tried to look around on how I can change the thrift heap size but
>>>> haven't found anything.*
>>>>
>>>> Looking at my hiveserver2, I find this:
>>>>
>>>> $ ps -ef | grep -i hiveserver2
>>>> dwr 9824 20479 0 12:11 pts/1 00:00:00 grep -i hiveserver2
>>>> dwr 28410 1 0 00:05 ? 00:01:04 /usr/lib/jvm/java-6-sun/jre/bin/java
>>>> *-Xmx256m* -Dhadoop.log.dir=/usr/lib/hadoop/logs -Dhadoop.log.file=hadoop.log
>>>> -Dhadoop.home.dir=/usr/lib/hadoop -Dhadoop.id.str=
>>>> -Dhadoop.root.logger=INFO,console
>>>> -Djava.library.path=/usr/lib/hadoop/lib/native
>>>> -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true
>>>> -Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.util.RunJar
>>>> /usr/lib/hive/lib/hive-service-0.12.0.jar
>>>> org.apache.hive.service.server.HiveServer2
>>>>
>>>> questions:
>>>>
>>>> 1. what is the output of "ps -ef | grep -i hiveserver2" on your
>>>> system? in particular, what is the value of -Xmx?
>>>>
>>>> 2. can you restart your hiveserver with -Xmx1g? or some value that
>>>> makes sense for your system?
>>>>
>>>> Lots of questions now. we await your answers! :)
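>>>>
>>>> (If question 1 turns up -Xmx256m like mine: with the stock 0.12
>>>> scripts, the HiveServer2 heap is usually inherited from
>>>> HADOOP_HEAPSIZE, which the hive launcher passes down to the hadoop
>>>> script. A minimal sketch; the file location is distro-dependent:
>>>>
>>>>   # conf/hive-env.sh (or export in the shell that starts the service)
>>>>   export HADOOP_HEAPSIZE=1024   # megabytes; shows up as -Xmx1024m
>>>>
>>>>   # restart and verify the new flag
>>>>   hive --service hiveserver2 &
>>>>   ps -ef | grep -i hiveserver2
>>>>
>>>> if the -Xmx value doesn't change after that, something else in your
>>>> env scripts is overriding it.)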
>>>>
>>>> On Fri, Jan 31, 2014 at 11:51 AM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
>>>>
>>>>> Final table compression should not affect the deserialized size of
>>>>> the data over the wire.
>>>>>
>>>>> On Fri, Jan 31, 2014 at 2:49 PM, Stephen Sprague <sprag...@gmail.com> wrote:
>>>>>
>>>>>> Excellent progress, David. So the most important thing we learned
>>>>>> here is that it works (!) when running Hive in local mode, and that
>>>>>> this error is a limitation in HiveServer2. That's important.
>>>>>>
>>>>>> so, textfile storage handler, and having issues converting it to ORC.
>>>>>> hmmm.
>>>>>>
>>>>>> follow-ups:
>>>>>>
>>>>>> 1. what is your query that fails?
>>>>>>
>>>>>> 2. can you add a "limit 1" to the end of your query and tell us if
>>>>>> that works? this'll tell us if it's column- or row-bound.
>>>>>>
>>>>>> 3. bonus points. run these in local mode:
>>>>>>   > set hive.exec.compress.output=true;
>>>>>>   > set mapred.output.compression.type=BLOCK;
>>>>>>   > set mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
>>>>>>   > create table blah stored as ORC as select * from <your table>;  # i'm curious if this'll work.
>>>>>>   > show create table blah;  # send output back if the previous step worked.
>>>>>>
>>>>>> 4. extra bonus. change ORC to SEQUENCEFILE in #3 and see if that works
>>>>>> any differently.
>>>>>>
>>>>>> I'm wondering if compression would have any effect on the size of the
>>>>>> internal ArrayList the thrift server uses.
>>>>>>
>>>>>> On Fri, Jan 31, 2014 at 9:21 AM, David Gayou <david.ga...@kxen.com> wrote:
>>>>>>
>>>>>>> OK, so here is some news:
>>>>>>>
>>>>>>> I tried to boost HADOOP_HEAPSIZE to 8192,
>>>>>>> and I also set mapred.child.java.opts to 512M,
>>>>>>> and it doesn't seem to have any effect.
>>>>>>>
>>>>>>> ------
>>>>>>>
>>>>>>> I tried it using an ODBC driver => fails after a few minutes.
>>>>>>> Using local JDBC (beeline) => runs forever without any error.
>>>>>>>
>>>>>>> Both through hiveserver2.
>>>>>>>
>>>>>>> If I use local mode, it works! (But that's not really what I need,
>>>>>>> as I don't really know how to access it from my software.)
>>>>>>>
>>>>>>> ------
>>>>>>>
>>>>>>> I use a text file as storage.
>>>>>>> I tried to use ORC, but I can't populate it with LOAD DATA (it
>>>>>>> returns a file-format error).
>>>>>>>
>>>>>>> Using an "ALTER TABLE orange_large_train_3 SET FILEFORMAT ORC" after
>>>>>>> populating the table, I get a file-format error on select.
>>>>>>>
>>>>>>> ------
>>>>>>>
>>>>>>> @Edward:
>>>>>>>
>>>>>>> I've tried to look around for how to change the thrift heap size
>>>>>>> but haven't found anything.
>>>>>>> Same thing for my client (haven't found how to change its heap size).
>>>>>>>
>>>>>>> My use case is really to have as many columns as possible.
>>>>>>>
>>>>>>> Thanks a lot for your help.
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>> David
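>>>>>>>
>>>>>>> (A note on the ORC attempts above: LOAD DATA only moves files into
>>>>>>> the table directory, it never rewrites them, so text files behind a
>>>>>>> table declared ORC, whether via LOAD or via ALTER TABLE ... SET
>>>>>>> FILEFORMAT, fail with exactly this kind of file-format error. The
>>>>>>> usual pattern is a text-backed staging table plus a converting
>>>>>>> insert; a sketch with hypothetical names and a cut-down column list:
>>>>>>>
>>>>>>>   CREATE TABLE stage_txt (c1 STRING, c2 STRING)  -- all 15k cols in practice
>>>>>>>     ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
>>>>>>>     STORED AS TEXTFILE;
>>>>>>>   LOAD DATA INPATH '/tmp/data.csv' INTO TABLE stage_txt;
>>>>>>>
>>>>>>>   CREATE TABLE big_orc STORED AS ORC AS          -- CTAS rewrites the data
>>>>>>>     SELECT * FROM stage_txt;
>>>>>>>
>>>>>>> unlike LOAD DATA, the CTAS runs a job that actually writes ORC
>>>>>>> files.)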
>>>>>>>
>>>>>>> On Fri, Jan 31, 2014 at 1:12 AM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
>>>>>>>
>>>>>>>> OK, here are the problem(s): Thrift has frame-size limits, and
>>>>>>>> Thrift has to buffer rows into memory.
>>>>>>>>
>>>>>>>> The Hive Thrift server has a heap size; it needs to be big in this case.
>>>>>>>>
>>>>>>>> Your client needs a big heap size as well.
>>>>>>>>
>>>>>>>> The way to do this query, if it is possible, may be turning the row
>>>>>>>> lateral, potentially by treating it as a list. It will make queries
>>>>>>>> on it awkward.
>>>>>>>>
>>>>>>>> Good luck.
>>>>>>>>
>>>>>>>> On Thursday, January 30, 2014, Stephen Sprague <sprag...@gmail.com> wrote:
>>>>>>>> > oh. thinking some more about this, I forgot to ask some other
>>>>>>>> > basic questions.
>>>>>>>> >
>>>>>>>> > a) what storage format are you using for the table (text,
>>>>>>>> > sequence, rcfile, orc, or custom)? "show create table <table>"
>>>>>>>> > would yield that.
>>>>>>>> >
>>>>>>>> > b) what command is causing the stack trace?
>>>>>>>> >
>>>>>>>> > my thinking here is that rcfile and orc are column-based (I
>>>>>>>> > think), and if you don't select all the columns, that could very
>>>>>>>> > well limit the size of the "row" being returned, and hence the
>>>>>>>> > size of the internal ArrayList. OTOH, if you're using "select *",
>>>>>>>> > um, you have my sympathies. :)
>>>>>>>> >
>>>>>>>> > On Thu, Jan 30, 2014 at 11:33 AM, Stephen Sprague <sprag...@gmail.com> wrote:
>>>>>>>> >
>>>>>>>> > thanks for the information. Up-to-date Hive. Cluster on the
>>>>>>>> > smallish side. And, well, it sure looks like a memory issue :)
>>>>>>>> > rather than an inherent Hive limitation, that is.
>>>>>>>> >
>>>>>>>> > So. I can only speak as a user (i.e. not a Hive developer), but
>>>>>>>> > what I'd be interested in knowing next is: this is via running
>>>>>>>> > Hive in local mode, correct? (i.e. not through hiveserver1/2.)
>>>>>>>> > And it looks like it boinks on array processing, which I assume
>>>>>>>> > to be internal code arrays and not Hive data arrays - your 15K
>>>>>>>> > columns are all scalar/simple types, correct? It's clearly
>>>>>>>> > fetching results and looks to be trying to store them in a Java
>>>>>>>> > array - and not just one row but a *set* of rows (ArrayList).
>>>>>>>> >
>>>>>>>> > three things to try.
>>>>>>>> >
>>>>>>>> > 1. boost the heap size. try 8192. And I don't know if
>>>>>>>> > HADOOP_HEAPSIZE is the controller of that. I woulda hoped it was
>>>>>>>> > called something like "HIVE_HEAPSIZE". :) Anyway, can't hurt to try.
>>>>>>>> >
>>>>>>>> > 2. trim down the number of columns and see where the breaking
>>>>>>>> > point is. is it 10K? is it 5K? The idea is to confirm it's _the
>>>>>>>> > number of columns_ that is causing the memory to blow and not
>>>>>>>> > some other artifact unbeknownst to us. (see the sketch after this
>>>>>>>> > message for one quick way to run that test.)
>>>>>>>> >
>>>>>>>> > 3. Google around the Hive namespace for something that might
>>>>>>>> > limit or otherwise control the number of rows stored at once in
>>>>>>>> > Hive's internal buffer. I'll snoop around too.
>>>>>>>> >
>>>>>>>> > That's all I've got for now, and maybe we'll get lucky and
>>>>>>>> > someone on this list will know something or another about this. :)
>>>>>>>> >
>>>>>>>> > cheers,
>>>>>>>> > Stephen.
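>>>>>>>> >
>>>>>>>> > (sketch for #2, assuming the columns share a prefix like var1 ...
>>>>>>>> > var15000; adjust the pattern to your real names. Before 0.13, a
>>>>>>>> > backtick-quoted column name is treated as a regex, so you can
>>>>>>>> > carve off roughly the first thousand columns without typing them:
>>>>>>>> >
>>>>>>>> >   create table width_test as
>>>>>>>> >     select `var[0-9]{1,3}` from orange_large_train_3;
>>>>>>>> >
>>>>>>>> >   select * from width_test;   -- re-run through HS2: OOM or not?
>>>>>>>> >
>>>>>>>> > widen the pattern to {1,4} for ~10K columns and bisect from there.)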
>>>>>>>> >
>>>>>>>> > On Thu, Jan 30, 2014 at 2:32 AM, David Gayou <david.ga...@kxen.com> wrote:
>>>>>>>> >
>>>>>>>> > We are using Hive 0.12.0, but it doesn't work any better on Hive
>>>>>>>> > 0.11.0 or Hive 0.10.0.
>>>>>>>> > Our Hadoop version is 1.1.2.
>>>>>>>> > Our cluster is 1 master + 4 slaves, each with 1 dual-core Xeon
>>>>>>>> > CPU (with hyperthreading, so 4 cores per machine) + 16GB RAM.
>>>>>>>> >
>>>>>>>> > The error message I get is:
>>>>>>>> >
>>>>>>>> > 2014-01-29 12:41:09,086 ERROR thrift.ProcessFunction (ProcessFunction.java:process(41)) - Internal error processing FetchResults
>>>>>>>> > java.lang.OutOfMemoryError: Java heap space
>>>>>>>> >         at java.util.Arrays.copyOf(Arrays.java:2734)
>>>>>>>> >         at java.util.ArrayList.ensureCapacity(ArrayList.java:167)
>>>>>>>> >         at java.util.ArrayList.add(ArrayList.java:351)
>>>>>>>> >         at org.apache.hive.service.cli.Row.<init>(Row.java:47)
>>>>>>>> >         at org.apache.hive.service.cli.RowSet.addRow(RowSet.java:61)
>>>>>>>> >         at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:235)
>>>>>>>> >         at org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationManager.java:170)
>>>>>>>> >         at org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:417)
>>>>>>>> >         at org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:306)
>>>>>>>> >         at org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:386)
>>>>>>>> >         at org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1373)
>>>>>>>> >         at org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1358)
>>>>>>>> >         at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>>>>>>>> >         at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
>>>>>>>> >         at org.apache.hive.service.auth.TUGIContainingProcessor$1.run(TUGIContainingProcessor.java:58)
>>>>>>>> >         at org.apache.hive.service.auth.TUGIContainingProcessor$1.run(TUGIContainingProcessor.java:55)
>>>>>>>> >         at java.security.AccessCont
>>>>>>>>
>>>>>>>> --
>>>>>>>> Sorry, this was sent from mobile. Will do less grammar and spell
>>>>>>>> check than usual.
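PS - circling back to Ed's "turn the row lateral" idea, since it keeps coming up: one sketch of it (names invented here, and assuming the values can all be treated as strings) is to collapse the wide row into a single array column, so each fetched row carries one value instead of 15k:

    create table bigtable_lateral as
    select array(col1, col2, col3) as vals   -- extend the list to all 15k columns
    from bigtable;

    select vals[0], vals[2] from bigtable_lateral limit 10;

queries against individual fields get awkward, as Ed says, but the per-row object the Thrift server has to buffer shrinks to a single column value.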