Hi Abe,
I was able to run Sqoop from the command line using LocalJobRunner, so
I can sidestep the container memory issues entirely.
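For reference, the invocation was roughly along these lines (the connect
string, table, and target dir are just placeholders here, and I'm assuming
-D mapreduce.framework.name=local is the switch that forces the local runner):

  sqoop import \
    -D mapreduce.framework.name=local \
    --connect jdbc:mysql://dbhost/mydb \
    --table big_table \
    --as-parquetfile \
    --target-dir /tmp/big_table_parquet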
I will try running it overnight and see whether it can complete when it
has access to all of my machine's memory.
Since it's now running as a local process, I should be able to gather
whatever diagnostics you need in order to understand what's going on.
I can also debug it in Eclipse, so I'm going to see if I can figure it
out myself as well and report back my findings...
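In case it helps anyone else: since the local runner keeps everything in the
client JVM, attaching Eclipse is just a matter of passing the standard JDWP
agent options before launching (I believe HADOOP_OPTS is picked up by the
sqoop launcher script; port 5005 is arbitrary):

  export HADOOP_OPTS="-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005"
  sqoop import ...   # then connect a remote Java debug session from Eclipse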
Thanks,
Ron
On 07/22/2015 03:33 PM, Abraham Elmahrek wrote:
Hey man,
Is there a stack trace or core dump you could provide? You might be
right, but there's no way for us to validate that.
Compacting many small output files is definitely a known issue; it's the
topic of https://issues.apache.org/jira/browse/SQOOP-1094. Adding Parquet
support there as well is a great idea, I feel.
-Abe
On Tue, Jul 21, 2015 at 11:19 PM, Ron Gonzalez <[email protected]> wrote:
Hi,
Quick question on the Parquet support for Sqoop import.
I am finding that while trying to load a million-row table, I can never
get the MapReduce job to complete because the containers keep getting
killed. I have already set the container size to 2 GB and changed the
MapReduce Java opts to -Xmx2048m.
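Concretely, the settings I have been raising are along these lines (assuming
the usual MRv2 property names are the right ones here):

  sqoop import \
    -D mapreduce.map.memory.mb=2048 \
    -D mapreduce.map.java.opts=-Xmx2048m \
    ...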
Is there some configuration that I can set to address this?
I believe the problem is that for a Parquet file with a lot of rows, the
column data has to be held in memory before it can be flushed to the file,
so the larger the number of rows, the more memory is required before a
flush can happen. I'm open to creating smaller Parquet files, but then I'll
end up with a lot of them.
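For example, one way to get those smaller files would be to spread the import
over more mappers so each writer buffers fewer rows (16 is just a guess, and
with more than one mapper Sqoop needs a primary key or an explicit --split-by
column; "id" below is a placeholder):

  sqoop import ... --as-parquetfile --num-mappers 16 --split-by id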
Any suggestions?
Thanks,
Ron