Hi,
  Quick question on the Parquet support for Sqoop import.
I am finding that when I try to load a million-row table, the MapReduce job never completes because the containers keep getting killed. I have already set the container size to 2 GB and set mapreduce.map.java.opts to -Xmx2048m.
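  For reference, this is roughly the command I am running (the connection string, credentials, table name and paths below are placeholders, not the real ones):

    # container size and map JVM heap as described above
    sqoop import \
      -D mapreduce.map.memory.mb=2048 \
      -D mapreduce.map.java.opts=-Xmx2048m \
      --connect jdbc:mysql://dbhost/mydb \
      --username myuser -P \
      --table big_table \
      --target-dir /user/ron/big_table_parquet \
      --as-parquetfile \
      -m 4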
  Is there some configuration that I can set to address this?
I believe the problem is that for a Parquet file with a lot of rows, the column data has to be held in memory until it can be flushed to the file, so the more rows there are, the more memory is needed before a flush can happen. I'm open to creating smaller Parquet files, but then I'll end up with a large number of files.
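  For example, if Sqoop's Parquet writer honours the standard parquet.block.size property (I'm not sure it does, so this is just a guess on my part), something like the following should cap the row group at 32 MB and reduce how much each mapper has to buffer before a flush:

    # hypothetical: shrink the Parquet row-group (block) size to 32 MB,
    # assuming the property is actually picked up by the Parquet writer
    sqoop import -D parquet.block.size=33554432 ... --as-parquetfile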
  Any suggestions?

Thanks,
Ron
