Hi,
Quick question on the Parquet support in Sqoop import.
I am finding that while trying to load a million-row table, I can
never get the MapReduce job to complete because the containers keep
getting killed. I have already set the container size to 2 GB and
changed the MapReduce Java opts to -Xmx2048m.
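For reference, this is roughly the command I'm running (the host,
database, user, and table names below are placeholders, and I may be
misremembering the exact property names):

  # host, user, table, and target dir are placeholders
  sqoop import \
    -D mapreduce.map.memory.mb=2048 \
    -D mapreduce.map.java.opts=-Xmx2048m \
    --connect jdbc:mysql://dbhost/mydb \
    --username myuser \
    --table my_million_row_table \
    --as-parquetfile \
    --target-dir /user/ron/my_million_row_table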
Is there some configuration that I can set to address this?
I believe the problem is that for a Parquet file with a lot of rows,
the column data has to be held in memory before it can be flushed to
the file, so the larger the number of rows, the more memory is needed
before the flush can happen. I'm open to creating smaller Parquet
files, but then I'll end up with a lot of files per import.
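For example, the only workaround I can think of is bumping the mapper
count so that each task buffers and writes a smaller Parquet file,
something along these lines (20 mappers is just an arbitrary number,
and the names are the same placeholders as above):

  sqoop import \
    --connect jdbc:mysql://dbhost/mydb \
    --username myuser \
    --table my_million_row_table \
    --as-parquetfile \
    --target-dir /user/ron/my_million_row_table \
    --num-mappers 20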
Any suggestions?
Thanks,
Ron