Hi Abe,
I was able to run Sqoop from the command line using LocalJobRunner, so I can avoid dealing with the container memory issues. I'll run it overnight and see whether it completes when it has access to my machine's full memory. Since it's now running as a local process, I should be able to capture whatever diagnostics you need to understand what's going on. I can also debug it in Eclipse, so I'm going to see if I can figure it out myself as well and report back my findings...
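
For reference, this is roughly the shape of what I'm running to force LocalJobRunner; the connect string, credentials, table name, target directory, and heap size below are all placeholders, and I believe HADOOP_CLIENT_OPTS is what controls the heap of the in-process job when running locally:

  # Give the local, in-process job a bigger heap (size is a placeholder).
  export HADOOP_CLIENT_OPTS="-Xmx8g"

  # Generic Hadoop options (-D) go right after the tool name.
  sqoop import \
    -D mapreduce.framework.name=local \
    --connect jdbc:mysql://dbhost/mydb \
    --username myuser -P \
    --table big_table \
    --as-parquetfile \
    --target-dir /tmp/big_table_parquet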

Thanks,
Ron

On 07/22/2015 03:33 PM, Abraham Elmahrek wrote:
Hey man,

Is there a stack trace or core dump you could provide? You might be right, but there's no way for us to validate that.

Compacting several files is definitely a known issue; it's the topic of https://issues.apache.org/jira/browse/SQOOP-1094. It would be a great idea to add Parquet support there as well, I feel.

-Abe

On Tue, Jul 21, 2015 at 11:19 PM, Ron Gonzalez <[email protected]> wrote:

    Hi,
      Quick question on the parquet support for sqoop import.
      I am finding that while trying to load a million-row table, I
    can never get the MapReduce job to complete because the containers
    keep getting killed. I have already set the container size to
    2 GB and changed the mapreduce java opts to -Xmx2048m.
      Is there some configuration I can set to address this?
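
      For reference, these are the settings I have been adjusting,
    passed as generic Hadoop options right after "sqoop import" (the
    connect string, table, and target dir below are placeholders; the
    property names are the Hadoop 2 ones as I understand them):

        sqoop import \
          -D mapreduce.map.memory.mb=2048 \
          -D mapreduce.map.java.opts=-Xmx2048m \
          --connect jdbc:mysql://dbhost/mydb \
          --table big_table \
          --as-parquetfile \
          --target-dir /tmp/big_table_parquet
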
      I believe the problem is that for a Parquet file with many rows,
    the column data has to be held in memory before it can be flushed
    to the file, so the larger the number of rows, the more memory is
    required before the flush can happen. I'm open to creating smaller
    Parquet files, but I would end up with a lot of Parquet files in
    the process.
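
      To spell out what I mean by smaller files: splitting the import
    across more mappers would give me one smaller Parquet file per
    mapper, along the lines of the following (the split column, mapper
    count, and other values are placeholders):

        sqoop import \
          --connect jdbc:mysql://dbhost/mydb \
          --table big_table \
          --split-by id \
          --num-mappers 16 \
          --as-parquetfile \
          --target-dir /tmp/big_table_parquet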
      Any suggestions?

    Thanks,
    Ron


