With HIVE-3746, which will be included in Hive 0.13, HiveServer2 takes less memory than before.
Could you try it with the version in trunk?
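
If you want a quick way to retest once you have a trunk build, something
like this should do -- table and port here are just placeholders for your
setup, and --incremental=true asks beeline to stream rows back instead of
buffering the whole result set on the client:

  # "wide_table" is a placeholder for your 15K-column table;
  # default HiveServer2 port assumed
  $ beeline -u jdbc:hive2://localhost:10000 --incremental=true \
      -e "select * from wide_table limit 10"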
2014-02-13 10:49 GMT+09:00 Stephen Sprague <sprag...@gmail.com>:

> question to the original poster. closure appreciated!
>
> On Fri, Jan 31, 2014 at 12:22 PM, Stephen Sprague <sprag...@gmail.com> wrote:
>
>> thanks Ed. And on a separate tack, let's look at Hiveserver2.
>>
>> @OP>
>>
>> *I've tried to look around on how i can change the thrift heap size but
>> haven't found anything.*
>>
>> looking at my hiveserver2 i find this:
>>
>> $ ps -ef | grep -i hiveserver2
>> dwr  9824 20479  0 12:11 pts/1  00:00:00 grep -i hiveserver2
>> dwr 28410     1  0 00:05 ?      00:01:04 /usr/lib/jvm/java-6-sun/jre/bin/java
>> *-Xmx256m* -Dhadoop.log.dir=/usr/lib/hadoop/logs -Dhadoop.log.file=hadoop.log
>> -Dhadoop.home.dir=/usr/lib/hadoop -Dhadoop.id.str=
>> -Dhadoop.root.logger=INFO,console
>> -Djava.library.path=/usr/lib/hadoop/lib/native
>> -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true
>> -Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.util.RunJar
>> /usr/lib/hive/lib/hive-service-0.12.0.jar
>> org.apache.hive.service.server.HiveServer2
>>
>> questions:
>>
>> 1. what is the output of "ps -ef | grep -i hiveserver2" on your
>> system? in particular, what is the value of -Xmx?
>>
>> 2. can you restart your hiveserver with -Xmx1g? or some value that
>> makes sense for your system?
>>
>> Lots of questions now. we await your answers! :)
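>>
>> In case that -Xmx256m is the culprit, here's one way to raise it -- a
>> sketch only, assuming your hiveserver2 is launched through bin/hive and
>> picks up conf/hive-env.sh (paths and the right value will differ per
>> install):
>>
>>   # in conf/hive-env.sh -- value is in MB and ends up as the JVM -Xmx
>>   export HADOOP_HEAPSIZE=1024
>>
>>   # restart the server, then confirm the new heap took effect
>>   $ hive --service hiveserver2 &
>>   $ ps -ef | grep -i hiveserver2 | grep -o 'Xmx[^ ]*'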
>>
>> On Fri, Jan 31, 2014 at 11:51 AM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
>>
>>> Final table compression should not affect the deserialized size of the
>>> data over the wire.
>>>
>>> On Fri, Jan 31, 2014 at 2:49 PM, Stephen Sprague <sprag...@gmail.com> wrote:
>>>
>>>> Excellent progress David. So the most important thing we learned here
>>>> is that it works (!) when running hive in local mode, and that this
>>>> error is a limitation in HiveServer2. That's important.
>>>>
>>>> so: textfile storage handler, and issues converting it to ORC. hmmm.
>>>>
>>>> follow-ups.
>>>>
>>>> 1. what is your query that fails?
>>>>
>>>> 2. can you add a "limit 1" to the end of your query and tell us if that
>>>> works? this'll tell us if it's column or row bound.
>>>>
>>>> 3. bonus points. run these in local mode:
>>>> > set hive.exec.compress.output=true;
>>>> > set mapred.output.compression.type=BLOCK;
>>>> > set mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
>>>> > create table blah stored as ORC as select * from <your table>;   # i'm curious if this'll work.
>>>> > show create table blah;   # send output back if the previous step worked.
>>>>
>>>> 4. extra bonus. change ORC to SEQUENCEFILE in #3 and see if that works
>>>> any differently.
>>>>
>>>> I'm wondering if compression would have any effect on the size of the
>>>> internal ArrayList the thrift server uses.
>>>>
>>>> On Fri, Jan 31, 2014 at 9:21 AM, David Gayou <david.ga...@kxen.com> wrote:
>>>>
>>>>> Ok, so here are some news:
>>>>>
>>>>> I tried to boost HADOOP_HEAPSIZE to 8192, and I also set
>>>>> mapred.child.java.opts to 512M. Neither seems to have any effect.
>>>>>
>>>>> ------
>>>>>
>>>>> I tried it using an ODBC driver => fails after a few minutes.
>>>>> Using local JDBC (beeline) => runs forever without any error.
>>>>>
>>>>> Both go through hiveserver2.
>>>>>
>>>>> If I use local mode: it works! (but that's not really what I need,
>>>>> as I don't really know how to access it from my software)
>>>>>
>>>>> ------
>>>>>
>>>>> I use a text file as storage. I tried to use ORC, but I can't
>>>>> populate it with a LOAD DATA (it returns a file format error).
>>>>>
>>>>> Using an "ALTER TABLE orange_large_train_3 SET FILEFORMAT ORC" after
>>>>> populating the table, I get a file format error on select.
>>>>>
>>>>> ------
>>>>>
>>>>> @Edward:
>>>>>
>>>>> I've tried to look around on how I can change the thrift heap size
>>>>> but haven't found anything. Same thing for my client (I haven't found
>>>>> how to change its heap size either).
>>>>>
>>>>> My use case really is to have as many columns as possible.
>>>>>
>>>>> Thanks a lot for your help
>>>>>
>>>>> Regards
>>>>>
>>>>> David
>>>>>
>>>>> On Fri, Jan 31, 2014 at 1:12 AM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
>>>>>
>>>>>> Ok here are the problem(s). Thrift has frame size limits, and thrift
>>>>>> has to buffer rows into memory.
>>>>>>
>>>>>> Hive's thrift server has a heap size; it needs to be big in this case.
>>>>>>
>>>>>> Your client needs a big heap size as well.
>>>>>>
>>>>>> The way to do this query, if it is possible at all, may be turning
>>>>>> the row lateral, potentially by treating it as a list. It will make
>>>>>> queries on it awkward, though.
>>>>>>
>>>>>> Good luck
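>>>>>>
>>>>>> To make "turning the row lateral" concrete, a rough sketch -- the
>>>>>> table and key names are invented for illustration, and this is just
>>>>>> one way to model it, not a drop-in fix:
>>>>>>
>>>>>> hive> -- all 15K values live in one MAP column instead of 15K columns
>>>>>> hive> create table train_lateral (
>>>>>>     >   row_id   string,
>>>>>>     >   features map<string, double>
>>>>>>     > );
>>>>>>
>>>>>> hive> -- what used to be a column reference becomes a key lookup
>>>>>> hive> select features['var_00123'] from train_lateral limit 10;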
>>>>>>
>>>>>> On Thursday, January 30, 2014, Stephen Sprague <sprag...@gmail.com> wrote:
>>>>>> >
>>>>>> > oh. thinking some more about this i forgot to ask some other basic
>>>>>> > questions.
>>>>>> >
>>>>>> > a) what storage format are you using for the table (text, sequence,
>>>>>> > rcfile, orc or custom)? "show create table <table>" would yield that.
>>>>>> >
>>>>>> > b) what command is causing the stack trace?
>>>>>> >
>>>>>> > my thinking here is rcfile and orc are column based (i think) and if
>>>>>> > you don't select all the columns that could very well limit the size
>>>>>> > of the "row" being returned and hence the size of the internal
>>>>>> > ArrayList. OTOH, if you're using "select *", um, you have my
>>>>>> > sympathies. :)
>>>>>> >
>>>>>> > On Thu, Jan 30, 2014 at 11:33 AM, Stephen Sprague <sprag...@gmail.com> wrote:
>>>>>> >
>>>>>> > thanks for the information. Up-to-date hive. Cluster on the smallish
>>>>>> > side. And, well, it sure looks like a memory issue :) rather than an
>>>>>> > inherent hive limitation, that is.
>>>>>> >
>>>>>> > So. I can only speak as a user (ie. not a hive developer) but what
>>>>>> > i'd be interested in knowing next is: is this via running hive in
>>>>>> > local mode, correct? (eg. not through hiveserver1/2). And it looks
>>>>>> > like it boinks on array processing, which i assume to be internal
>>>>>> > code arrays and not hive data arrays - your 15K columns are all
>>>>>> > scalar/simple types, correct? It's clearly fetching results and
>>>>>> > looks to be trying to store them in a java array - and not just one
>>>>>> > row but a *set* of rows (ArrayList).
>>>>>> >
>>>>>> > a few things to try.
>>>>>> >
>>>>>> > 1. boost the heap-size. try 8192. And I don't know if HADOOP_HEAPSIZE
>>>>>> > is the controller of that. I woulda hoped it was called something
>>>>>> > like "HIVE_HEAPSIZE". :) Anyway, can't hurt to try.
>>>>>> >
>>>>>> > 2. trim down the number of columns and see where the breaking point
>>>>>> > is. is it 10K? is it 5K? The idea is to confirm it's _the number of
>>>>>> > columns_ that is causing the memory to blow and not some other
>>>>>> > artifact unbeknownst to us.
>>>>>> >
>>>>>> > 3. Google around the Hive namespace for something that might limit
>>>>>> > or otherwise control the number of rows stored at once in Hive's
>>>>>> > internal buffer. I'll snoop around too.
>>>>>> >
>>>>>> > That's all i got for now, and maybe we'll get lucky and someone on
>>>>>> > this list will know something or another about this. :)
>>>>>> >
>>>>>> > cheers,
>>>>>> > Stephen.
>>>>>> >
>>>>>> > On Thu, Jan 30, 2014 at 2:32 AM, David Gayou <david.ga...@kxen.com> wrote:
>>>>>> >
>>>>>> > We are using Hive 0.12.0, but it doesn't work any better on hive
>>>>>> > 0.11.0 or hive 0.10.0.
>>>>>> > Our hadoop version is 1.1.2.
>>>>>> > Our cluster is 1 master + 4 slaves, each with 1 dual-core xeon CPU
>>>>>> > (with hyperthreading, so 4 cores per machine) + 16Gb RAM.
>>>>>> >
>>>>>> > The error message i get is:
>>>>>> >
>>>>>> > 2014-01-29 12:41:09,086 ERROR thrift.ProcessFunction
>>>>>> > (ProcessFunction.java:process(41)) - Internal error processing FetchResults
>>>>>> > java.lang.OutOfMemoryError: Java heap space
>>>>>> >     at java.util.Arrays.copyOf(Arrays.java:2734)
>>>>>> >     at java.util.ArrayList.ensureCapacity(ArrayList.java:167)
>>>>>> >     at java.util.ArrayList.add(ArrayList.java:351)
>>>>>> >     at org.apache.hive.service.cli.Row.<init>(Row.java:47)
>>>>>> >     at org.apache.hive.service.cli.RowSet.addRow(RowSet.java:61)
>>>>>> >     at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:235)
>>>>>> >     at org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationManager.java:170)
>>>>>> >     at org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:417)
>>>>>> >     at org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:306)
>>>>>> >     at org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:386)
>>>>>> >     at org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1373)
>>>>>> >     at org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1358)
>>>>>> >     at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>>>>>> >     at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
>>>>>> >     at org.apache.hive.service.auth.TUGIContainingProcessor$1.run(TUGIContainingProcessor.java:58)
>>>>>> >     at org.apache.hive.service.auth.TUGIContainingProcessor$1.run(TUGIContainingProcessor.java:55)
>>>>>> >     at java.security.AccessCont
>>>>>>
>>>>>> --
>>>>>> Sorry this was sent from mobile. Will do less grammar and spell check
>>>>>> than usual.