Thanks Hanifi, do you think the user list or the JIRA will be the best
place to track my questions?  I have quite a few, and I am going to try to
put some "typical" user constraints on how I handle the data.  That is, how
would a user who may not know string manipulation split those lists?  Yes,
I could write a Python or Spark job (I've sketched one after the list
below), but if this is generated data, that could lead to some very
interesting conversations with users.  The reason I say this is that I
don't want to appear to be ignoring advice or asking inane questions;
rather, I am trying to figure out how we can do a few things:

1. Keep troubleshooting in Drill if at all possible.
2. Increase the effectiveness of error messages.
3. Increase the number of use cases for Drill.
4. Make the user experience for Drill outstanding.
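
For context, below is roughly the script I would have to hand a user today
to do the split Hanifi suggests.  It is only a rough sketch: the chunk
size, field name, and file paths are placeholders I made up, and it assumes
newline-delimited JSON shaped like his example.

import json

CHUNK = 1000  # placeholder: pick a size that keeps each column small enough

def split_wide(record, field="wide", chunk=CHUNK):
    """Split one oversized list field into wide0, wide1, ... wideN."""
    values = record.pop(field, [])
    for i in range(0, max(len(values), 1), chunk):
        record["{0}{1}".format(field, i // chunk)] = values[i:i + chunk]
    return record

with open("data.json") as src, open("data_split.json", "w") as dst:
    for line in src:  # assumes one JSON record per line
        dst.write(json.dumps(split_wide(json.loads(line))) + "\n")

On top of running that, the user would then have to repoint every query at
the new wide0..wideN columns, which is exactly the kind of friction I am
worried about.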

Please bear with my questions with those goals in mind.  So, to my first
question: should we keep the discussion on the list here (where many users
may be able to read, see, and learn from it), or should we take the
back-and-forth to the JIRA?
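
In the meantime, here is the kind of rough guesswork I have been doing to
find the offending column and file myself.  Again just a sketch: the
serialized JSON length is only a stand-in for the real DrillBuf size, so
treat the numbers as hints, not answers.

import json
import sys
from collections import defaultdict

MAX_BUF = 2**31 - 1  # Integer.MAX_VALUE, the per-buffer limit Hanifi mentioned

for path in sys.argv[1:]:
    sizes = defaultdict(int)
    with open(path) as f:
        for line in f:  # assumes one JSON record per line
            for field, value in json.loads(line).items():
                sizes[field] += len(json.dumps(value))
    # Print the widest fields first so the likely culprit is at the top.
    for field, size in sorted(sizes.items(), key=lambda kv: -kv[1]):
        flag = "  <-- suspicious" if size >= MAX_BUF else ""
        print("{0}\t{1}\t{2} bytes{3}".format(path, field, size, flag))

That per-field, per-file report is precisely what I would rather the error
message gave users directly (which is what DRILL-4371 should help with).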

Thanks again,

John

On Mon, Feb 8, 2016 at 1:05 PM, Hanifi Gunes <hgu...@maprtech.com> wrote:

> Thanks for the feedback. Yep, my answer was much more dev-focused than
> user-focused.
>
> The error is a manifestation of extremely wide columns in your dataset. I
> would recommend splitting the list if that's an option.
>
> Assuming the problem column is a list of integers as below
>
> {
> "wide": [1,2,.....N]
> }
>
> after splitting it should look like
>
> {
> "wide0": [1,2,.....X],
> "wide1": [Y,.......Z]
> ...
> "wideN": [T,.......N]
> }
>
> Sounds like a good idea to enhance the error reporting with the file and
> column name. Filed [1] to track this.
>
> Thanks.
>
> 1: https://issues.apache.org/jira/browse/DRILL-4371
>
>
> On Fri, Feb 5, 2016 at 6:28 PM, John Omernik <j...@omernik.com> wrote:
>
> > Excuse my basic questions: when you say "we", are you referring to the
> > Drill developers? So, what is Integer.MAX_VALUE bytes? Is that a
> > query-time setting? A drillbit setting? Is it editable? How does that
> > value get interpreted for complex data types (objects and arrays)?
> >
> > Not only would the column be helpful, but the source file as well. (Is
> > this an individual-record issue, or is it a cumulative error where the
> > sum of the lengths of multiple records of a column is at issue?)
> >
> >
> > Thoughts on how as a user I could address this in my dataset?
> >
> > Thanks!
> >
> > On Friday, February 5, 2016, Hanifi Gunes <hgu...@maprtech.com> wrote:
> >
> > > You see this exception because one of the columns in your dataset is
> > > larger than an individual DrillBuf can store. The hard limit is
> > > Integer.MAX_VALUE bytes. Around the time we try to expand one of the
> > > buffers, we notice the allocation request is oversized and fail the
> > > query. It would be nice if the error message contained the column that
> > > raised this issue, though.
> > >
> > > On Fri, Feb 5, 2016 at 1:39 PM, John Omernik <j...@omernik.com> wrote:
> > >
> > > > Any thoughts on how to troubleshoot this? I apparently have some fat
> > > > JSON data going into the buffers. It's not huge data, just
> > > > wide/complex (the total size is 1.4 GB). Are there any settings I can
> > > > use to work through these errors?
> > > >
> > > >
> > > > Thanks!
> > > >
> > > > John
> > > >
> > > > Error: SYSTEM ERROR: OversizedAllocationException: Unable to expand
> > > > the buffer. Max allowed buffer size is reached.
> > > >
> > > > Fragment 1:11
> > > >
> > > > [Error Id: db21dea0-ddd7-4fcf-9fea-b5031e358dad on node1]
> > > >
> > > >   (org.apache.drill.exec.exception.OversizedAllocationException)
> > > > Unable to expand the buffer. Max allowed buffer size is reached.
> > > >     org.apache.drill.exec.vector.UInt1Vector.reAlloc():214
> > > >     org.apache.drill.exec.vector.UInt1Vector$Mutator.setValueCount():469
> > > >     org.apache.drill.exec.vector.complex.ListVector$Mutator.setValueCount():324
> > > >     org.apache.drill.exec.physical.impl.ScanBatch.next():247
> > > >     org.apache.drill.exec.record.AbstractRecordBatch.next():119
> > > >     org.apache.drill.exec.record.AbstractRecordBatch.next():109
> > > >     org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
> > > >     org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():132
> > > >     org.apache.drill.exec.record.AbstractRecordBatch.next():162
> > > >     org.apache.drill.exec.record.AbstractRecordBatch.next():119
> > > >     org.apache.drill.exec.test.generated.StreamingAggregatorGen1931.doWork():172
> > > >     org.apache.drill.exec.physical.impl.aggregate.StreamingAggBatch.innerNext():167
> > > >     org.apache.drill.exec.record.AbstractRecordBatch.next():162
> > > >     org.apache.drill.exec.physical.impl.BaseRootExec.next():104
> > > >     org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext():93
> > > >     org.apache.drill.exec.physical.impl.BaseRootExec.next():94
> > > >     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():256
> > > >     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():250
> > > >     java.security.AccessController.doPrivileged():-2
> > > >     javax.security.auth.Subject.doAs():415
> > > >     org.apache.hadoop.security.UserGroupInformation.doAs():1595
> > > >     org.apache.drill.exec.work.fragment.FragmentExecutor.run():250
> > > >     org.apache.drill.common.SelfCleaningRunnable.run():38
> > > >     java.util.concurrent.ThreadPoolExecutor.runWorker():1145
> > > >     java.util.concurrent.ThreadPoolExecutor$Worker.run():615
> > > >     java.lang.Thread.run():745 (state=,code=0)
> > > >
> > >
> >
> >
> > --
> > Sent from my iThing
> >
>
