MapReduce checksum error!

2020-03-16 Thread Daegyu Han
Hi, I have a problem running the MapReduce grep application. I uploaded a text file, but I got this error message. I googled several times, but I can't solve this problem. What is the main cause of this problem? Error: org.apache.hadoop.fs.ChecksumException: Checksum error: /10GB_1/file99 at 282861568 exp: 1
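For context on what the offset in such an error refers to, here is a minimal sketch in Python, not Hadoop code. It assumes the data is checksummed in fixed 512-byte chunks; both the chunk size and the use of zlib.crc32 are assumptions for illustration, since the checksum type and chunk size on a real cluster depend on its configuration. The helper names chunk_checksums and first_bad_offset are hypothetical.

```python
import zlib

BYTES_PER_CHECKSUM = 512  # assumed chunk size, for illustration only


def chunk_checksums(data, chunk=BYTES_PER_CHECKSUM):
    """Return one CRC32 value per fixed-size chunk of data."""
    return [zlib.crc32(data[i:i + chunk]) for i in range(0, len(data), chunk)]


def first_bad_offset(data, expected, chunk=BYTES_PER_CHECKSUM):
    """Return the byte offset of the first chunk whose checksum differs
    from the stored value, or None if every chunk matches."""
    actual = chunk_checksums(data, chunk)
    for index, (got, want) in enumerate(zip(actual, expected)):
        if got != want:
            return index * chunk
    return None
```

A reader that verifies data this way can only report the start of the failing chunk, not the exact corrupted byte. The offset in the report above, 282861568, is an exact multiple of 512, which is consistent with that kind of chunk-boundary report, though it does not by itself identify the cause.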

Is shortcircuit-read (SCR) really fast?

2019-08-29 Thread Daegyu Han
Hi all, Is short-circuit read (SCR) faster than the legacy read path, which goes through the data nodes? I have evaluated SCR and legacy local reads on both HDD and NVMe SSD. However, I have not seen any results showing that SCR is faster than legacy. Rather, SCR was slower than legacy when using NVMe SSDs because of the op

What is the best way to analyze io latency in hdfs?

2019-08-19 Thread Daegyu Han
Hi all, I'm currently studying HDFS, and I want to analyze HDFS I/O latency. I know that C/C++ programs can use perf and ftrace under Linux to measure user-level and kernel-level latency and overhead. I would like to analyze the read I/O latency in HDFS at the user level (HDFS) and sys

Re: What do you think about HDFS using GFS2 (shared disk file system) or GPFS (parallel filesystem) rather than local file system?

2019-08-18 Thread Daegyu Han
the Hadoop platform are optimized to provide high > performance by distributing work across a cluster that can utilize data > locality and fast local I/O.* > > On Sat, Aug 17, 2019 at 2:12 AM Daegyu Han wrote: > >> Hi all, >> >> As far as I know, HDFS is designed to tar

What do you think about HDFS using GFS2 (shared disk file system) or GPFS (parallel filesystem) rather than local file system?

2019-08-17 Thread Daegyu Han
Hi all, As far as I know, HDFS is designed to target local file systems like ext4 or xfs. Is it a bad approach to use SAN technology as storage for HDFS? Thank you, Daegyu

Why does MapReduce read and process the input split line by line? Can not I upload the entire file to memory and process it?

2019-07-06 Thread Daegyu Han
Hi all, Why does MapReduce process an input split one line at a time? Or would it not be faster to read the entire input split (e.g., 128MB), put the data in memory, and process the data (string) there? If you are processing one row at a time, as in the current approach, is not the Java applic

How to solve Unexpected EOS from the reader?

2019-06-27 Thread Daegyu Han
Hi all, I tried running TestDFSIO -read, but I get an error that I cannot resolve. The environment I experimented with is as follows: four data nodes and one name node, a block size of 512MB, and 128 files written with TestDFSIO. I tried to adjust various parameters, but the pr

Re: Does HDFS read blocks simultaneously in multi-threaded way?

2019-06-26 Thread Daegyu Han
t a multi-threaded block > reader in theory. > > > > On Jun 26, 2019, at 5:05 AM, Daegyu Han wrote: > > > > Hi all, > > > > Assuming HDFS has a 1GB file input.dat and a block size of 128MB. > > > > Can the user read multithreaded when reading the in

Does HDFS read blocks simultaneously in multi-threaded way?

2019-06-26 Thread Daegyu Han
Hi all, Assuming HDFS has a 1GB file input.dat and a block size of 128MB. Can a user read the input.dat file with multiple threads? In other words, instead of reading the blocks sequentially, can multiple blocks be read at the same time? If not, is it difficult to implement a multi-threade
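The access pattern asked about here can be sketched in Python against an ordinary local file. This is not HDFS client code and says nothing about what the HDFS client does internally; it only illustrates the idea of fetching fixed-size blocks concurrently and reassembling them in order. The names read_block, parallel_read and demo are hypothetical, and the tiny BLOCK_SIZE stands in for the 128MB block size in the question.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 4  # bytes; a stand-in for the 128MB block size in the question


def read_block(path, index, block_size=BLOCK_SIZE):
    """Read one block through its own file handle, so threads never
    share a seek position."""
    with open(path, "rb") as f:
        f.seek(index * block_size)
        return f.read(block_size)


def parallel_read(path, block_size=BLOCK_SIZE, workers=4):
    """Fetch all blocks concurrently and return them joined in block order."""
    size = os.path.getsize(path)
    n_blocks = (size + block_size - 1) // block_size  # round up
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in submission order, so blocks stay ordered
        blocks = pool.map(lambda i: read_block(path, i, block_size),
                          range(n_blocks))
        return b"".join(blocks)


def demo():
    """Write a small temporary file and read it back block by block."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"0123456789abcdef!!")
        path = tmp.name
    try:
        return parallel_read(path)
    finally:
        os.remove(path)
```

Each worker opens its own handle and seeks to its block's offset, so no lock is needed around the reads. Whether this is actually faster than one sequential reader depends on the storage and network behind the file.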

Re: NVMe Over fabric performance on HDFS

2019-06-25 Thread Daegyu Han
see what we can do. > > > > Thanks > Anu > > > On Tue, Jun 25, 2019 at 7:20 AM Daegyu Han wrote: >> >> Hi all, >> >> I am using storage disaggregation by mounting nvme ssds on the storage node. >> >> When we connect the compute node and the s

NVMe Over fabric performance on HDFS

2019-06-25 Thread Daegyu Han
Hi all, I am using storage disaggregation by mounting NVMe SSDs on the storage node. When we connect the compute node and the storage node with NVMe over Fabrics (NVMe-oF) and test it, performance is much lower than that of local storage (DAS). In general, we know that applications need to increas

How to run libhdfs (C interface to HDFS) ?

2019-05-31 Thread Daegyu Han
Hi all, I tried to run an HDFS C++ file and used the following command: gcc above_sample.c -I$HADOOP_HDFS_HOME/include -L$HADOOP_HDFS_HOME/lib/native -lhdfs -o above_sample which is from https://hadoop.apache.org/docs/r2.8.0/hadoop-project-dist/hadoop-hdfs/LibHdfs.html But I got the error m

Re: How to run Test class for HDFS?

2019-05-31 Thread Daegyu Han
Thanks, it worked. Best Regards, Daegyu On Fri, May 31, 2019 at 3:44 PM, Ayush Saxena wrote: > > Try giving the name as org.apache.hadoop.hdfs.TestWriteRead > With package name > > > On 31-May-2019, at 11:16 AM, Daegyu Han wrote: > > > > Hi all, > > > > I wa

How to run Test class for HDFS?

2019-05-30 Thread Daegyu Han
dfs-tests.jar ?? Best Regards, Daegyu Han