Hey man, is there a stack trace or core dump you could provide? You might be right, but there's no way for us to validate that without one.
The problem of compacting several files is definitely an issue. This is the topic of https://issues.apache.org/jira/browse/SQOOP-1094. Adding Parquet support there as well would be a great idea, I feel.

-Abe

On Tue, Jul 21, 2015 at 11:19 PM, Ron Gonzalez <[email protected]> wrote:
> Hi,
> Quick question on the Parquet support for sqoop import.
> I am finding that while trying to load a million-row table, I can never
> get the map-reduce job to complete because the containers keep getting
> killed. I have already set the container size to 2 GB and also changed
> the mapreduce java opts to -Xmx2048m.
> Is there some configuration I can set to address this?
> I believe the problem is that for a Parquet file with a lot of rows, the
> column data has to be kept in memory before it can be flushed to the file,
> so the larger the number of rows, the more memory is required before the
> flush. I'm open to creating smaller Parquet files, but then I'll end up
> with a lot of Parquet files in the process.
> Any suggestions?
>
> Thanks,
> Ron
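
For anyone hitting the same container kills, here is a rough sketch of how the settings Ron mentions might be passed on the sqoop import command line. The host, database, user, and table names are placeholders; mapreduce.map.memory.mb and mapreduce.map.java.opts are standard Hadoop 2 properties, and parquet.block.size is the standard parquet-mr row-group size setting, though whether Sqoop's Parquet writer picks it up in this code path is an assumption, not something confirmed in this thread:

  # Give each map container 2 GB and keep the JVM heap a bit below that limit
  # (heap plus JVM overhead must fit inside the container, or YARN kills it).
  # parquet.block.size shrinks the row group (64 MB here) so less column data
  # is buffered before each flush. Connection details are placeholders.
  sqoop import \
    -D mapreduce.map.memory.mb=2048 \
    -D mapreduce.map.java.opts=-Xmx1638m \
    -D parquet.block.size=67108864 \
    --connect jdbc:mysql://dbhost/mydb \
    --username myuser \
    --table big_table \
    --as-parquetfile \
    --num-mappers 8

One thing worth noting in the sketch: the original mail sets -Xmx equal to the 2 GB container size, which leaves no headroom for off-heap JVM memory, so keeping the heap somewhat smaller than the container is a common adjustment.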
