Sure, let me try to figure out how to reproduce this; there's really no stack trace. What happens is that the amount of memory in use exceeds 2 GB and YARN kills the container at that point. Could it be that some memory isn't being released properly, so the garbage collector can't get at it?
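For concreteness, something along these lines is what I have in mind to try next. This is just a sketch: the connect string, table, and paths are placeholders, the memory properties are the standard Hadoop ones, and I'm assuming parquet.block.size is the knob that controls the Parquet row group size (I haven't verified that Sqoop's Parquet writer actually honors it):

  # Leave some headroom between the 2 GB heap and the container limit,
  # and shrink the Parquet row group (64 MB here) so less column data
  # is buffered in memory before each flush.
  sqoop import \
    -D mapreduce.map.memory.mb=3072 \
    -D mapreduce.map.java.opts=-Xmx2048m \
    -D parquet.block.size=67108864 \
    --connect jdbc:mysql://dbhost/mydb \
    --table MY_TABLE \
    --as-parquetfile \
    --target-dir /user/ron/my_table_parquet

The thinking is that if the writer really does buffer a whole row group per column before flushing, a smaller block size should cap the memory footprint, at the cost of more and smaller files.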
I guess my other question is how much data you have loaded using this feature, and which configuration values you used to make it work in your performance testing environment. I can use those settings and see what I get...

Thanks,
Ron

Sent from my iPhone

> On Jul 22, 2015, at 3:33 PM, Abraham Elmahrek <[email protected]> wrote:
>
> Hey man,
>
> Is there a stack trace or core dump you could provide? You might be right,
> but there's no way for us to validate that.
>
> The problem of compacting several files is definitely an issue. This is the
> topic of https://issues.apache.org/jira/browse/SQOOP-1094. It's a great idea
> to add Parquet support there as well, I feel.
>
> -Abe
>
>> On Tue, Jul 21, 2015 at 11:19 PM, Ron Gonzalez <[email protected]> wrote:
>> Hi,
>> Quick question on the Parquet support for sqoop import.
>> I am finding that while trying to load a million-row table, I can never
>> get the map-reduce job to complete because the containers keep getting
>> killed. I have already set the container size to 2 GB and also changed
>> the mapreduce java opts to -Xmx2048m.
>> Is there some configuration that I can set to address this?
>> I believe the problem is that for a Parquet file with a lot of rows, we
>> have to keep the column data in memory before we can flush it to the file,
>> so the larger the number of rows, the larger the amount of memory required
>> before we can flush. I'm open to creating smaller Parquet files, but I'm
>> going to end up with a lot of Parquet files in the process.
>> Any suggestions?
>>
>> Thanks,
>> Ron
