Hey Ron,

Thanks for volunteering to help troubleshoot this. I actually lost track
of it, but could you file a bug at issues.apache.org/jira under the Sqoop
project describing what you're seeing? I think you may have found a
legitimate load bug in the new Parquet feature.

-Abe

On Wed, Jul 29, 2015 at 6:06 PM, Ron Gonzalez <[email protected]> wrote:

>  Hi Abe,
>   I was able to run Sqoop from the command line using the LocalJobRunner,
> so I could avoid having to deal with the container memory issues.
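>   In case it helps, this is roughly how the local runner can be forced
> from the command line (just a sketch; the connect string, table and
> target directory below are placeholders):
>
>   sqoop import -D mapreduce.framework.name=local \
>     --connect jdbc:mysql://localhost/mydb --table mytable \
>     --as-parquetfile --target-dir /tmp/mytable_parquet
>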
>   I will try to run it overnight and see if it can complete when it has
> access to my entire memory.
>   Since I'm now running it as a local process, I should be able to provide
> whatever you need in order to understand what's going on.
>   I can also debug it in Eclipse, so I'm going to see if I can figure it
> out myself as well and report back my findings...
>
> Thanks,
> Ron
>
> On 07/22/2015 03:33 PM, Abraham Elmahrek wrote:
>
> Hey man,
>
>  Is there a stack trace or core dump you could provide? You might be
> right, but there's no way for us to validate that.
>
>  The problem of compacting several files is definitely a known issue;
> it's the topic of https://issues.apache.org/jira/browse/SQOOP-1094. I feel
> it would be a great idea to add Parquet support there as well.
>
>  -Abe
>
> On Tue, Jul 21, 2015 at 11:19 PM, Ron Gonzalez <[email protected]>
> wrote:
>
>> Hi,
>>   Quick question on the Parquet support for Sqoop import.
>>   I am finding that while trying to load a million-row table, I can never
>> get the MapReduce job to complete because the containers keep getting
>> killed. I have already set the container size to 2 GB and changed the
>> MapReduce Java opts to -Xmx2048m.
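>>   For reference, the equivalent of those settings expressed as
>> command-line overrides would look roughly like this (the connect string,
>> table and target directory are placeholders):
>>
>>   sqoop import \
>>     -D mapreduce.map.memory.mb=2048 \
>>     -D mapreduce.map.java.opts=-Xmx2048m \
>>     --connect jdbc:mysql://localhost/mydb --table mytable \
>>     --as-parquetfile --target-dir /tmp/mytable_parquet
>>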
>>   Is there some configuration that I can set to address this?
>>   I believe the problem is that for a Parquet file with a lot of rows,
>> the column data has to be kept in memory before it can be flushed to the
>> file, so the larger the number of rows, the more memory is required
>> before a flush. I'm open to creating smaller Parquet files, but I'm going
>> to end up with a lot of Parquet files in the process.
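>>   One thing I may try is shrinking the Parquet row group size so that
>> less column data is buffered before each flush. Assuming the writer
>> honors the standard parquet.block.size property (a size in bytes), that
>> would look roughly like this:
>>
>>   sqoop import -D parquet.block.size=33554432 \
>>     --connect jdbc:mysql://localhost/mydb --table mytable \
>>     --as-parquetfile --target-dir /tmp/mytable_parquet
>>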
>>   Any suggestions?
>>
>> Thanks,
>> Ron
>>
>
>
>
