OK, so here is some news:

I tried boosting HADOOP_HEAPSIZE to 8192,
and I also set mapred.child.java.opts to 512M.

And it doesn't seem to have any effect.
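
For reference, here is roughly what I changed (just a sketch of my setup;
the file location and the query file name are placeholders, and I'm not
sure these are even the right knobs):

    # in conf/hive-env.sh (sketch of what I changed)
    export HADOOP_HEAPSIZE=8192

    # map/reduce child heap, passed when I run the query
    # (my_query.sql is a placeholder for my actual script)
    hive --hiveconf mapred.child.java.opts=-Xmx512m -f my_query.sql
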
 ------

I tried it using an ODBC driver => it fails after a few minutes.
Using a local JDBC connection (beeline) => it runs forever without any error.

Both go through HiveServer2.

If I use local mode, it works!   (But that's not really what I need, as
I don't really know how to access it from my software.)
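
In case it helps, this is roughly how I connect with beeline (the host,
port, and user are from my setup, so treat them as an example only):

    # JDBC through HiveServer2 using beeline
    beeline -u jdbc:hive2://localhost:10000 -n hive
    # then at the beeline prompt, something like:
    # select * from orange_large_train_3 limit 10;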

------
I use a text file as storage.
I tried to use ORC, but I can't populate it with LOAD DATA (it returns a
file format error).

Using "ALTER TABLE orange_large_train_3 SET FILEFORMAT ORC" after
populating the table, I get a file format error on SELECT.
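
If I understand correctly, LOAD DATA only moves files into the table's
directory without converting them, and ALTER TABLE ... SET FILEFORMAT only
changes the metadata, so the existing text files still don't match the ORC
format. What I plan to try next is keeping the text table as a staging
table and copying it into a separate ORC table (just a sketch;
orange_large_train_3_orc is a name I made up):

    # copy the existing text table into a new ORC table with CTAS
    hive -e "CREATE TABLE orange_large_train_3_orc STORED AS ORC
             AS SELECT * FROM orange_large_train_3;"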

------

@Edward:

I've tried to look around for how I can change the Thrift heap size, but I
haven't found anything.
Same thing for my client (I haven't found how to change its heap size).
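
What I was planning to try next, though I'm honestly not sure these
environment variables are the right knobs, is exporting a bigger heap
before starting HiveServer2 and before starting beeline:

    # sketch only; I'm not sure these are the correct settings
    # bigger JVM heap for the HiveServer2 process, set before starting it
    export HADOOP_CLIENT_OPTS="-Xmx8g"
    hive --service hiveserver2 &

    # bigger JVM heap for the beeline client, in its own shell
    export HADOOP_CLIENT_OPTS="-Xmx4g"
    beeline -u jdbc:hive2://localhost:10000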

My use case really requires having as many columns as possible.


Thanks a lot for your help


Regards

David





On Fri, Jan 31, 2014 at 1:12 AM, Edward Capriolo <edlinuxg...@gmail.com> wrote:

> OK, here are the problem(s): Thrift has frame size limits, and Thrift has to
> buffer rows into memory.
>
> Hive's Thrift server has a heap size; it needs to be big in this case.
>
> Your client needs a big heap size as well.
>
> The way to do this query, if it is possible at all, may be to turn the row
> lateral, potentially by treating it as a list; that will make queries on it awkward.
>
> Good luck
>
>
> On Thursday, January 30, 2014, Stephen Sprague <sprag...@gmail.com> wrote:
> > Oh, thinking some more about this, I forgot to ask some other basic
> questions.
> >
> > a) what storage format are you using for the table (text, sequence,
> rcfile, orc or custom)?   "show create table <table>" would yield that.
> >
> > b) what command is causing the stack trace?
> >
> > My thinking here is that RCFile and ORC are column-based (I think), and if you
> don't select all the columns, that could very well limit the size of the
> "row" being returned and hence the size of the internal ArrayList.  OTOH,
> if you're using "select *", um, you have my sympathies. :)
> >
> >
> >
> >
> > On Thu, Jan 30, 2014 at 11:33 AM, Stephen Sprague <sprag...@gmail.com>
> wrote:
> >
> > Thanks for the information. Up-to-date Hive. Cluster on the smallish
> side. And, well, it sure looks like a memory issue :) rather than an
> inherent Hive limitation, that is.
> >
> > So.  I can only speak as a user (i.e. not a Hive developer), but what I'd
> be interested in knowing next is: is this from running Hive in local mode,
> correct? (e.g. not through hiveserver1/2).  And it looks like it boinks on
> array processing, which I assume to be internal code arrays and not Hive
> data arrays - your 15K columns are all scalar/simple types, correct?  It's
> clearly fetching results and looks to be trying to store them in a Java array
> - and not just one row but a *set* of rows (ArrayList).
> >
> > A few things to try:
> >
> > 1. Boost the heap size. Try 8192. And I don't know if HADOOP_HEAPSIZE is
> the controller of that. I woulda hoped it was called something like
> "HIVE_HEAPSIZE". :)  Anyway, it can't hurt to try.
> >
> > 2. Trim down the number of columns and see where the breaking point is.
> Is it 10K? Is it 5K?   The idea is to confirm it's _the number of columns_
> that is causing the memory to blow up and not some other artifact unbeknownst
> to us.
> >
> > 3. Google around the Hive namespace for something that might limit or
> otherwise control the number of rows stored at once in Hive's internal
> buffer. I'll snoop around too.
> >
> >
> > That's all I've got for now, and maybe we'll get lucky and someone on this
> list will know something or other about this. :)
> >
> > cheers,
> > Stephen.
> >
> >
> >
> > On Thu, Jan 30, 2014 at 2:32 AM, David Gayou <david.ga...@kxen.com>
> wrote:
> >
> > We are using Hive 0.12.0, but it doesn't work any better on Hive 0.11.0
> or Hive 0.10.0.
> > Our Hadoop version is 1.1.2.
> > Our cluster is 1 master + 4 slaves, each with 1 dual-core Xeon CPU (with
> hyperthreading, so 4 cores per machine) and 16 GB of RAM.
> >
> > The error message I get is:
> >
> > 2014-01-29 12:41:09,086 ERROR thrift.ProcessFunction (ProcessFunction.java:process(41)) - Internal error processing FetchResults
> > java.lang.OutOfMemoryError: Java heap space
> >         at java.util.Arrays.copyOf(Arrays.java:2734)
> >         at java.util.ArrayList.ensureCapacity(ArrayList.java:167)
> >         at java.util.ArrayList.add(ArrayList.java:351)
> >         at org.apache.hive.service.cli.Row.<init>(Row.java:47)
> >         at org.apache.hive.service.cli.RowSet.addRow(RowSet.java:61)
> >         at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:235)
> >         at org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationManager.java:170)
> >         at org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:417)
> >         at org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:306)
> >         at org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:386)
> >         at org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1373)
> >         at org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1358)
> >         at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> >         at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> >         at org.apache.hive.service.auth.TUGIContainingProcessor$1.run(TUGIContainingProcessor.java:58)
> >         at org.apache.hive.service.auth.TUGIContainingProcessor$1.run(TUGIContainingProcessor.java:55)
> >         at java.security.AccessCont
>
> --
> Sorry this was sent from mobile. Will do less grammar and spell check than
> usual.
>
