Guys,

I've read that increasing the above value (4 KB by default) to, say, 128 KB,
might speed things up.

My input is 40 million serialised records coming from an RDBMS, and I noticed
that with the increased buffer size my job actually runs a tiny bit slower. Is
that possible?
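
For reference, I'm assuming the value in question is io.file.buffer.size
(4096 bytes by default), and I've been overriding it per job like this; the
jar name, driver class and paths are placeholders:

# bump the read/write buffer to 128 KB for this job only
# (only takes effect if the driver goes through ToolRunner /
# GenericOptionsParser; jar, class and paths are made up)
hadoop jar my-job.jar MyDriver \
    -D io.file.buffer.size=131072 \
    /user/ak/input /user/ak/output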

P.S. I've got two questions:
1. During Sqoop import I see that two additional files are generated in the 
HDFS folder, namely
.../_log/history/...conf.xml
.../_log/history/...sqoop_generated_class.jar
Is there a way to redirect these files to a different directory? I cannot find 
an answer.
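
For context, the import itself is roughly the following; the connection
string, table name and target directory are placeholders:

# after the import finishes, the two files above appear under the
# target dir's _log/history subfolder
sqoop import \
    --connect jdbc:mysql://dbhost/mydb \
    --table records \
    --target-dir /user/ak/records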

2. I run multiple reducers and each generates its own output file. If I were
to merge all the output, would either of the commands below be recommended?

hadoop dfs -getmerge output <localdst>
or
hadoop dfs -cat output/* > <localdst>/output_All
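
Either way, I'd sanity-check the merged local file afterwards, assuming one
record per line; <merged file> is whichever local file the command produced:

# should come to roughly 40 million lines
wc -l <merged file>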

Thanks,
AK

