Sure, let me try to figure out how to reproduce it. There's really no
stack trace. What happens is that the amount of memory being used exceeds
2 GB, and YARN kills the container at that point. Could it be that some memory
isn't being released properly, so the garbage collector can't get at it?
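
One theory (just a guess at this point) is that it isn't really garbage at all:
as far as I understand it, the Parquet writer buffers an entire row group's
worth of column data in memory before flushing it, so that data is still
referenced and the GC can't reclaim it. If that's the case, shrinking the row
groups should cap the buffer. Something like this is what I'd try, assuming the
writer honors the standard parquet.block.size property (the 64 MB value is just
an illustration):

  sqoop import -Dparquet.block.size=67108864 ...   # ~64 MB row groups

Let me know if that sounds plausible.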

I guess my other question is how much data you guys have loaded using this
feature, and what configuration values you used to make it work in your
performance-testing environment. I can use those settings and see what I get...
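
For reference, here's roughly the shape of the command I've been running; the
connect string, credentials, table, target dir, and mapper count below are
placeholders, and the -D options reflect the 2 GB container and -Xmx2048m I
mentioned, plus the row-group tweak above:

  sqoop import \
    -Dmapreduce.map.memory.mb=2048 \
    -Dmapreduce.map.java.opts=-Xmx2048m \
    -Dparquet.block.size=67108864 \
    --connect jdbc:mysql://dbhost/mydb \
    --username myuser -P \
    --table MY_MILLION_ROW_TABLE \
    --as-parquetfile \
    --target-dir /data/my_table_parquet \
    --num-mappers 4

If you can share the values from your perf runs, I'll swap them in here.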

Thanks,
Ron

Sent from my iPhone

> On Jul 22, 2015, at 3:33 PM, Abraham Elmahrek <[email protected]> wrote:
> 
> Hey man,
> 
> Is there a stack trace or core dump you could provide? You might be right, 
> but there's no way for us to validate that.
> 
> The problem of compacting several files is definitely an issue. This is the 
> topic of https://issues.apache.org/jira/browse/SQOOP-1094. It's a great idea 
> to add Parquet support as well, I feel.
> 
> -Abe
> 
>> On Tue, Jul 21, 2015 at 11:19 PM, Ron Gonzalez <[email protected]> wrote:
>> Hi,
>>   Quick question on the parquet support for sqoop import.
>>   I am finding that while trying to load a million-row table, I can never 
>> get the MapReduce job to complete because the containers keep getting 
>> killed. I have already set the container size to 2 GB and changed the 
>> mapreduce java opts to -Xmx2048m.
>>   Is there some configuration that I can set to address this?
>>   I believe the problem is that, for a Parquet file with a lot of rows, we 
>> have to keep the column data in memory before we can flush it to the file, 
>> so the larger the number of rows, the more memory is required before the 
>> flush can happen. I'm open to creating smaller Parquet files, but I'd end 
>> up with a lot of Parquet files in the process.
>>   Any suggestions?
>> 
>> Thanks,
>> Ron
> 
