Sure, let me try to figure out how to reproduce this; there's really no stack trace. What happens is that the amount of memory in use exceeds 2 GB and YARN kills the container at that point. Could it be that some memory isn't being released properly, so the garbage collector can't get at it?
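For concreteness, something along these lines is what I have in mind to try next. This is just a sketch: the connect string, table, and paths are placeholders, the memory properties are the standard Hadoop ones, and I'm assuming parquet.block.size is the knob that controls the Parquet row group size (I haven't verified that Sqoop's Parquet writer actually honors it):

  # Leave some headroom between the 2 GB heap and the container limit,
  # and shrink the Parquet row group (64 MB here) so less column data
  # is buffered in memory before each flush.
  sqoop import \
    -D mapreduce.map.memory.mb=3072 \
    -D mapreduce.map.java.opts=-Xmx2048m \
    -D parquet.block.size=67108864 \
    --connect jdbc:mysql://dbhost/mydb \
    --table MY_TABLE \
    --as-parquetfile \
    --target-dir /user/ron/my_table_parquet

The thinking is that if the writer really does buffer a whole row group per column before flushing, a smaller block size should cap the memory footprint, at the cost of more and smaller files.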
I guess my other question is how much data you have loaded using this feature, and which configuration values you used to make it work in your performance testing environment. I can use those settings and see what I get...

Thanks,
Ron

Sent from my iPhone

> On Jul 22, 2015, at 3:33 PM, Abraham Elmahrek <[email protected]> wrote:
>
> Hey man,
>
> Is there a stack trace or core dump you could provide? You might be right,
> but there's no way for us to validate that.
>
> The problem of compacting several files is definitely an issue. This is the
> topic of https://issues.apache.org/jira/browse/SQOOP-1094. It's a great idea
> to add Parquet support there as well, I feel.
>
> -Abe
>
>> On Tue, Jul 21, 2015 at 11:19 PM, Ron Gonzalez <[email protected]> wrote:
>> Hi,
>> Quick question on the Parquet support for sqoop import.
>> I am finding that while trying to load a million-row table, I can never
>> get the map-reduce job to complete because the containers keep getting
>> killed. I have already set the container size to 2 GB and also changed
>> the mapreduce java opts to -Xmx2048m.
>> Is there some configuration that I can set to address this?
>> I believe the problem is that for a Parquet file with a lot of rows, we
>> have to keep the column data in memory before we can flush it to the file,
>> so the larger the number of rows, the larger the amount of memory required
>> before we can flush. I'm open to creating smaller Parquet files, but I'm
>> going to end up with a lot of Parquet files in the process.
>> Any suggestions?
>>
>> Thanks,
>> Ron
