You need to be clearer about how you process the files.
I think the important question is what kind of InputFormat and OutputFormat you 
are using in your case.
If you are using the defaults on Linux, I believe TextInputFormat and 
TextOutputFormat both convert the byte array to text using UTF-8 encoding. So 
if your source data is UTF-8, then your output should be fine.
To help you in this case, you need to figure out the following:
1) What InputFormat/OutputFormat are you using?
2) How do you write the output data? Using Reducer Context.write, or do you 
write to HDFS directly in your code?
3) What encoding is your source data?
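One common cause of this symptom: the Text class in Hadoop always treats its bytes as UTF-8, so if the source file is actually in another encoding (for example ISO-8859-1/Latin-1), bytes like 0xC7 ('Ç') are invalid UTF-8 and come out as the replacement character. A minimal plain-Java sketch (class name is illustrative, not Hadoop-specific) shows the effect and the fix of decoding with the right charset:

```java
import java.nio.charset.StandardCharsets;

public class CharsetDemo {
    public static void main(String[] args) {
        // 'Ç' in ISO-8859-1 is the single byte 0xC7, which is not a
        // valid UTF-8 sequence on its own.
        byte[] latin1 = "Ç".getBytes(StandardCharsets.ISO_8859_1);

        // Mis-decoding those bytes as UTF-8 yields the replacement
        // character U+FFFD, which renders as '�'.
        String wrong = new String(latin1, StandardCharsets.UTF_8);

        // Decoding with the actual source charset recovers the text.
        String right = new String(latin1, StandardCharsets.ISO_8859_1);

        System.out.println(wrong);  // prints "�" (U+FFFD)
        System.out.println(right);  // prints "Ç"
    }
}
```

In a job, the same idea applies in the Mapper: take the raw bytes from the Text value (getBytes()/getLength()) and decode them with the source file's real charset before processing, then write UTF-8 out.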
Yong

Subject: Localization feature
Date: Fri, 24 Jan 2014 09:54:15 +0530
From: khale...@suntecgroup.com
To: user@hadoop.apache.org

Hi All,
 
Does Hadoop/MapReduce have a localization feature?
 
We have a scenario in which we have to process files containing Dutch and 
German characters.
 
When we process files containing a character like 'Ç', the character gets 
replaced by '�' in the output.
 
Is there any possible workaround for this?
Thanks in advance,
Khaleel
