Hi all,

I want to write files to HDFS over Thrift. If the file is a gzip or tar archive, its size changes after the upload and it can no longer be extracted with tar xvzf/xvf. Plain text files work fine.

[hadoop@HOST s_cripts]$ echo $LANG
en_US.UTF-8
[hadoop@HOST s_cripts]$ jps
25868 TaskTracker
9116 Jps
25928 HadoopThriftServer        # the Thrift server
25749 JobTracker
25655 SecondaryNameNode
25375 NameNode
25495 DataNode
[hadoop@HOST s_cripts]$ pwd
/home/hadoop/hadoop/src/contrib/thriftfs/s_cripts
[hadoop@HOST s_cripts]$ hadoop fs -ls log/ff.tar.gz
ls: Cannot access log/ff.tar.gz: No such file or directory.
[hadoop@HOST s_cripts]$ python hdfs.py
hdfs>> put ./my.tar.gz log/ff.tar.gz
<thrift.protocol.TBinaryProtocol.TBinaryProtocol instance at 0x2348e60>
in writeString :688
upload over:688
hdfs>> quit
[hadoop@HOST s_cripts]$ hadoop fs -ls log/ff.tar.gz
Found 1 items
-rw-r--r--   1 hadoop supergroup   1253 2012-10-25 08:57 /user/hadoop/log/ff.tar.gz   # notice the size here is 1253
[hadoop@HOST s_cripts]$ ls -l my.tar.gz
-rw-rw-r-- 1 hadoop hadoop 688 Oct 24 14:43 my.tar.gz   # notice the size here is 688
[hadoop@HOST s_cripts]$ file my.tar.gz
my.tar.gz: gzip compressed data, from Unix, last modified: Wed Oct 24 14:43:29 2012   # the file format
[hadoop@HOST s_cripts]$ hadoop fs -get log/ff.tar.gz .
[hadoop@HOST s_cripts]$ file ff.tar.gz
ff.tar.gz: data   # the file format
[hadoop@HOST s_cripts]$ tar xvzf ff.tar.gz

gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error is not recoverable: exiting now

[hadoop@HOST s_cripts]$ head -2 my.tar.gz | xxd
0000000: 1f8b 0800 118e 8750 0003 ed99 4d53 db30  .......P....MS.0
0000010: 1086 732d bf42 070e 7040 966c c78e 7da3  ..s-.B..p@.l..}.
0000020: 4006 2ec0 8c69 7be8 7418 c551 1c37 b2e4  @....i{.t..Q.7..
[hadoop@HOST s_cripts]$ head -2 ff.tar.gz | xxd
0000000: 1fef bfbd 0800 11ef bfbd efbf bd50 0003  .............P..
0000010: efbf bd4d 53ef bfbd 3010 efbf bd73 2def  ...MS...0....s-.
0000020: bfbd 4207 0e70 40ef bfbd 6cc7 8e7d efbf  ..B..p@...l..}..
0000030: bd40 062e efbf bdef bfbd 697b efbf bd74  .@........i{...t
0000040: 18ef bfbd 511c 37ef bfbd efbf bd65 0aef  ....Q.7......e..

The Thrift server and the hdfs.py client run on the same box (HOST). If I use the hadoop shell commands to put/get the files, everything works. It looks as if the Thrift client sends the data in binary mode, but the Thrift server re-encodes it in some other charset before writing it to the HDFS file.

Why do the uploaded files change? Thanks a lot!
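If the re-encoding guess above is right, the hexdumps already show the mechanism: ef bf bd is the UTF-8 encoding of the Unicode replacement character U+FFFD. A minimal sketch (an illustration of the suspected behavior, not the actual server code) that decodes raw bytes as UTF-8 with invalid bytes replaced and re-encodes them reproduces the corrupted dump exactly, using the first eight bytes of my.tar.gz from above:

```python
# Suspected corruption path (assumption): the server treats the Thrift
# string payload as UTF-8 text, replacing undecodable bytes with U+FFFD,
# then writes it back out UTF-8 encoded.
original = bytes.fromhex("1f8b0800118e8750")  # first 8 bytes of my.tar.gz

# Each standalone byte >= 0x80 is invalid UTF-8, so it becomes U+FFFD,
# which re-encodes as the 3-byte sequence ef bf bd.
corrupted = original.decode("utf-8", errors="replace").encode("utf-8")

print(corrupted.hex())  # 1fefbfbd080011efbfbdefbfbd50 -- matches ff.tar.gz
```

This would also explain the size growth from 688 to 1253 bytes: every high byte that is not part of an accidentally valid UTF-8 sequence expands from one byte to three.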