Hi Abe,
I was able to run Sqoop from the command line using LocalJobRunner, so
I can sidestep the container memory issues entirely.
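For reference, the invocation was roughly along these lines (the connect
string, table, and target dir are just placeholders here, and I'm assuming
-D mapreduce.framework.name=local is the switch that forces the local runner):

  sqoop import \
    -D mapreduce.framework.name=local \
    --connect jdbc:mysql://dbhost/mydb \
    --table big_table \
    --as-parquetfile \
    --target-dir /tmp/big_table_parquet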
I will try running it overnight and see whether it can complete when it
has access to all of my machine's memory.
Since it's now running as a local process, I should be able to gather
whatever diagnostics you need in order to understand what's going on.
I can also debug it in Eclipse, so I'm going to see if I can figure it
out myself as well and report back my findings...
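In case it helps anyone else: since the local runner keeps everything in the
client JVM, attaching Eclipse is just a matter of passing the standard JDWP
agent options before launching (I believe HADOOP_OPTS is picked up by the
sqoop launcher script; port 5005 is arbitrary):

  export HADOOP_OPTS="-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005"
  sqoop import ...   # then connect a remote Java debug session from Eclipse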
Thanks,
Ron
On 07/22/2015 03:33 PM, Abraham Elmahrek wrote:
Hey man,
Is there a stack trace or core dump you could provide? You might be
right, but there's no way for us to validate that.
Compacting many small output files is definitely a known issue; it's the
topic of https://issues.apache.org/jira/browse/SQOOP-1094. Adding Parquet
support there as well is a great idea, I feel.
-Abe
On Tue, Jul 21, 2015 at 11:19 PM, Ron Gonzalez <[email protected]> wrote:
Hi,
Quick question on the Parquet support for Sqoop import.
I am finding that while trying to load a million-row table, I can never
get the MapReduce job to complete because the containers keep getting
killed. I have already set the container size to 2 GB and changed the
MapReduce Java opts to -Xmx2048m.
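Concretely, the settings I have been raising are along these lines (assuming
the usual MRv2 property names are the right ones here):

  sqoop import \
    -D mapreduce.map.memory.mb=2048 \
    -D mapreduce.map.java.opts=-Xmx2048m \
    ...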
Is there some configuration that I can set to address this?
I believe the problem is that for a Parquet file with a lot of rows, the
column data has to be held in memory before it can be flushed to the file,
so the larger the number of rows, the more memory is required before a
flush can happen. I'm open to creating smaller Parquet files, but then I'll
end up with a lot of them.
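For example, one way to get those smaller files would be to spread the import
over more mappers so each writer buffers fewer rows (16 is just a guess, and
with more than one mapper Sqoop needs a primary key or an explicit --split-by
column; "id" below is a placeholder):

  sqoop import ... --as-parquetfile --num-mappers 16 --split-by id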
Any suggestions?
Thanks,
Ron