You need to be clearer about how you process the files. I think the important question is what kind of InputFormat and OutputFormat you are using in your case. If you are using the defaults on Linux, I believe TextInputFormat and TextOutputFormat will both convert the byte array to text using UTF-8 encoding. So if your source data is UTF-8, your output should be fine. To help you in this case, you need to figure out the following:

1) What kind of InputFormat/OutputFormat are you using?
2) How do you write the data output? Using Reducer Context.write, or do you write to HDFS directly in your code?
3) What encoding is your source data?

If it turns out the source data is not UTF-8, a sketch of the usual decoding workaround follows below.

Yong
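Here is a minimal sketch of that workaround, not a definitive fix: it assumes the default TextInputFormat with the new mapreduce API, and it assumes the source files are ISO-8859-1 (Latin-1); the class name PassThroughMapper and the charset are placeholders for your actual setup. TextInputFormat hands each line to the mapper as raw file bytes wrapped in a Text, so decoding those bytes with the real source charset and re-wrapping the result in a Text (which stores UTF-8) keeps characters like 'Ç' intact through TextOutputFormat.

import java.io.IOException;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical mapper: decode each input line with the actual source
// charset (assumed ISO-8859-1 here) instead of letting Text.toString()
// decode it as UTF-8, which would turn 'Ç' into '�'.
public class PassThroughMapper
        extends Mapper<LongWritable, Text, Text, Text> {

    private static final Text EMPTY = new Text("");

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Text exposes the raw line bytes; getLength() bounds the valid
        // region because the backing array may be larger than the line.
        String line = new String(value.getBytes(), 0, value.getLength(),
                StandardCharsets.ISO_8859_1);
        // new Text(String) re-encodes as UTF-8, which is what
        // TextOutputFormat ultimately writes.
        context.write(new Text(line), EMPTY);
    }
}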
Subject: Localization feature
Date: Fri, 24 Jan 2014 09:54:15 +0530
From: khale...@suntecgroup.com
To: user@hadoop.apache.org

Hi All,

Does Hadoop/MapReduce have a localization feature? There is a scenario wherein we have to process files containing Dutch and German characters. When we process files containing a character like 'Ç', the character gets replaced by '�' in the output. Is there any possible workaround for this?

Thanks in advance,
Khaleel
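For what it's worth, the symptom described above ('Ç' becoming '�') is the classic sign of single-byte Latin-1 data being decoded as UTF-8. A standalone sketch (plain Java, not Hadoop-specific; the class name is hypothetical) reproduces it:

import java.nio.charset.StandardCharsets;

// 'Ç' is the single byte 0xC7 in ISO-8859-1; 0xC7 on its own is an
// incomplete UTF-8 sequence, so decoding it as UTF-8 yields the
// replacement character U+FFFD, printed as '�'.
public class MojibakeDemo {
    public static void main(String[] args) {
        byte[] latin1 = "Ç".getBytes(StandardCharsets.ISO_8859_1);
        String decodedAsUtf8 = new String(latin1, StandardCharsets.UTF_8);
        System.out.println(decodedAsUtf8); // prints '�'
    }
}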