This happens because many of the “words” are Unicode. See the following blog post:
http://blogs.msdn.com/b/hpctrekker/archive/2013/04/01/make-another-small-step-with-the-javascript-console-pig-in-hdinsight.aspx
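
To illustrate where characters like "�" come from, here is a minimal Python sketch that is not part of the original thread. It assumes the job input contains bytes that are not valid UTF-8 text; the helper names `decode_lossy` and `looks_like_utf8_text` are hypothetical and exist only for this example. Decoding such bytes with replacement yields U+FFFD, which terminals render as "�".

```python
# Illustrative sketch only; helper names are hypothetical, not from Hadoop.

REPLACEMENT = "\ufffd"  # U+FFFD, usually rendered as the "�" glyph


def decode_lossy(data: bytes) -> str:
    """Decode bytes as UTF-8, substituting U+FFFD for every invalid byte."""
    return data.decode("utf-8", errors="replace")


def looks_like_utf8_text(data: bytes) -> bool:
    """Rough check: no NUL bytes and the whole buffer decodes as strict UTF-8."""
    if b"\x00" in data:
        return False
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    plain = "Leonardo da Vinci".encode("utf-8")
    not_text = b"P\xff\xfe\x00EXTH"  # made-up bytes for the demonstration

    print(looks_like_utf8_text(plain))      # plain text passes the check
    print(looks_like_utf8_text(not_text))   # invalid bytes fail the check
    print(REPLACEMENT in decode_lossy(not_text))
```

Running a check like this on a sample of each input file before submitting the word count job is one possible way to spot input that will not tokenize into readable words.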

From: Varsha Raveendran
Sent: Sunday, March 31, 2013 11:43 PM
To: user@hadoop.apache.org
Subject: Word count on cluster configuration

Hello!

I did the setup for a cluster configuration of Hadoop. After running the word count example, the output shown in the part-r-00000 file is as follows:

hduser@MT2012158:/usr/local/hadoop$ head /tmp/gutenberg-output/gutenberg-output
40 2 4 ��� � � � �@�� 2 ��� � � � �@�@�� 1 ���� � � � �@�@�� 1 P�������� j l k m �������� g��������������������EXTH � j 2004-01-01d Leonardo 1 P�������� � � � � �������� ���������������������EXTH � t 1 �P�������� � � � ������������ � � � ���������EXTH � j 2004-01-01d Leonardo 1 �P�������� � � � ������������ � � � � �����EXTH � t 1

Can you please tell me why this is happening?

--
-Varsha