Guys, I've read that increasing the above value (default 4 KB) to, say, 128 KB might speed things up.
My input is 40 million serialised records coming from an RDBMS, and I noticed that with the increased I/O setting my job actually runs a tiny bit slower. Is that possible?

P.S. I have two more questions:

1. During a Sqoop import I see that two additional files are generated in the HDFS folder, namely

.../_log/history/...conf.xml
.../_log/history/...sqoop_generated_class.jar

Is there a way to redirect these files to a different directory? I cannot find an answer.

2. I run multiple reducers and each generates its own output. If I were to merge all the output, would either of the commands below be recommended?

hadoop dfs -getmerge <output/*> <localdst>

or

hadoop dfs -cat output/* > output_All
hadoop dfs -get output_All <localdst>

Thanks,
AK