Hi,
Quick question on the Parquet support in Sqoop import.
I am finding that while trying to load a million-row table, I can
never get the MapReduce job to complete because the containers keep
getting killed. I have already set the container size to 2 GB and
changed the MapReduce Java opts to -Xmx2048m.
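For reference, this is roughly the command I'm running (the host,
database, user, and table names below are placeholders, and I may be
misremembering the exact property names):

  # host, user, table, and target dir are placeholders
  sqoop import \
    -D mapreduce.map.memory.mb=2048 \
    -D mapreduce.map.java.opts=-Xmx2048m \
    --connect jdbc:mysql://dbhost/mydb \
    --username myuser \
    --table my_million_row_table \
    --as-parquetfile \
    --target-dir /user/ron/my_million_row_table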
Is there some configuration that I can set to address this?
I believe the problem is that for a Parquet file with a lot of rows,
the column data has to be held in memory before it can be flushed to
the file, so the larger the number of rows, the more memory is needed
before the flush can happen. I'm open to creating smaller Parquet
files, but then I'll end up with a lot of files per import.
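For example, the only workaround I can think of is bumping the mapper
count so that each task buffers and writes a smaller Parquet file,
something along these lines (20 mappers is just an arbitrary number,
and the names are the same placeholders as above):

  sqoop import \
    --connect jdbc:mysql://dbhost/mydb \
    --username myuser \
    --table my_million_row_table \
    --as-parquetfile \
    --target-dir /user/ron/my_million_row_table \
    --num-mappers 20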
Any suggestions?
Thanks,
Ron