Re: multinode hadoop cluster on vmware

2013-12-19 Thread Hiran Chaudhuri
If I understand correctly, Hadoop or Big Data applications are highly I/O bound. So for performant processing you would definitely prefer many small physical machines. Apart from that, setting up Hadoop on VMware should raise no questions that would not also occur on physical installations. Depending

Re: Authentication issue on Hadoop 2.2.0

2013-12-19 Thread Silvina Caíno Lores
And that it is caused by this exception, as I've found out 2013-12-19 13:14:28,237 ERROR [pipe-uplink-handler] org.apache.hadoop.mapred.pipes.BinaryProtocol: java.io.EOFException at java.io.DataInputStream.readByte(DataInputStream.java:267) at

Re: Why other process can't see the change after calling hdfsHFlush unless hdfsCloseFile is called?

2013-12-19 Thread Xiaobin She
sorry to reply to my own thread. Does anyone know the answer to this question? If so, can you please tell me if my understanding is right or wrong? thanks. 2013/12/17 Xiaobin She xiaobin...@gmail.com hi, I'm using libhdfs to deal with HDFS in a C++ program. And I have encountered an

Re: Why other process can't see the change after calling hdfsHFlush unless hdfsCloseFile is called?

2013-12-19 Thread Devin Suiter RDX
Hello, In my experience with Flume, watching the HDFS Sink verbose output, I know that even after a file has been flushed, but is still open, it reads as a 0-byte file, even if there is actually data contained in the file. An HDFS file is a meta-location that can accept streaming input for as long as
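
A minimal Java sketch of the behavior described above, assuming a reachable HDFS and a hypothetical path /tmp/flush-demo.txt: after hflush() a freshly opened reader can usually see the bytes, while getFileStatus().getLen() may still report 0 until the writer closes the file.

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FSDataInputStream;
  import org.apache.hadoop.fs.FSDataOutputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class HFlushDemo {
      public static void main(String[] args) throws Exception {
          FileSystem fs = FileSystem.get(new Configuration());
          Path p = new Path("/tmp/flush-demo.txt");      // hypothetical path

          FSDataOutputStream out = fs.create(p, true);
          out.writeBytes("hello hdfs\n");
          out.hflush();                                  // data reaches the datanodes, file stays open

          // The NameNode may still report the pre-flush length (often 0)...
          System.out.println("reported length: " + fs.getFileStatus(p).getLen());

          // ...yet a new reader can typically read the flushed bytes.
          FSDataInputStream in = fs.open(p);
          byte[] buf = new byte[64];
          System.out.println("bytes readable:  " + in.read(buf));
          in.close();

          out.close();                                   // length is updated on the NameNode at close
          System.out.println("length after close: " + fs.getFileStatus(p).getLen());
      }
  }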

RE: HDFS short-circuit reads

2013-12-19 Thread John Lilley
Ah, I see - thanks for clarifying. john From: Chris Nauroth [mailto:cnaur...@hortonworks.com] Sent: Tuesday, December 17, 2013 4:32 PM To: user@hadoop.apache.org Subject: Re: HDFS short-circuit reads Both of these methods return the same underlying data type that you're ultimately interested

Hadoop 2.2 Pseudo-Distributed Mode

2013-12-19 Thread S.L
Hi folks, I am running Hadoop 2.2 in pseudo-distributed mode and I have an i5 processor with 8 GB RAM running on CentOS 6.4. However my Nutch job fails a few minutes into execution with an OOM exception. I have increased the HADOOP_HEAPSIZE from 1000 MB to 4 GB, but I still face the issue.
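
For what it's worth, HADOOP_HEAPSIZE only sizes the Hadoop daemons (NameNode, ResourceManager, etc.), not the task JVMs that actually run the Nutch job. In Hadoop 2.x the task heap comes from mapreduce.map.java.opts / mapreduce.reduce.java.opts plus the matching container sizes. A hedged driver-side sketch: the property names are the standard Hadoop 2.x ones, the values are only illustrative, and a real Nutch install would normally set them in mapred-site.xml instead of code.

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.mapreduce.Job;

  public class HeapConfigSketch {
      public static void main(String[] args) throws Exception {
          Configuration conf = new Configuration();
          // Raise the task JVM heap and keep the YARN container somewhat larger.
          conf.set("mapreduce.map.java.opts", "-Xmx3072m");
          conf.set("mapreduce.reduce.java.opts", "-Xmx3072m");
          conf.setInt("mapreduce.map.memory.mb", 3584);
          conf.setInt("mapreduce.reduce.memory.mb", 3584);
          Job job = Job.getInstance(conf, "heap-config-sketch");
          // ... set mapper/reducer, input/output paths, then job.waitForCompletion(true)
      }
  }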

hadoop streaming troubles

2013-12-19 Thread Pavel Hančar
Hello, I am using hadoop-0.20.2-cdh3u1. First question: can one omit sorting in streaming (e.g. when one only sums numbers)? Second question: why do I have to run my jobs from an empty current working directory? When I run it from my home, I get this: 13/12/19 16:22:40 ERROR
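
On the first question: sorting happens as part of the shuffle to the reducers, so a job with zero reduce tasks is map-only and skips the sort entirely; in streaming that is usually done with -D mapred.reduce.tasks=0 (the pre-YARN property name, which should match 0.20.2-cdh3u1). A hedged sketch of the same thing in the old Java API:

  import org.apache.hadoop.mapred.JobConf;

  public class MapOnlySketch {
      public static void main(String[] args) {
          // Sketch only: a map-only job, equivalent to -D mapred.reduce.tasks=0 in streaming.
          JobConf conf = new JobConf();
          conf.setNumReduceTasks(0);   // no reducers => no shuffle, no sort phase
          // ... set mapper, input/output formats and paths, then JobClient.runJob(conf)
      }
  }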

Re: Why other process can't see the change after calling hdfsHFlush unless hdfsCloseFile is called?

2013-12-19 Thread Xiaobin She
To Devin, thank you very much for your explanation. I did find that I can read the data out of the file even if I did not close the file I'm writing to (the read operation is called on another file handle opened on the same file, but still in the same process), which makes me more confused at that

from relational to bigger data

2013-12-19 Thread Jay Vee
We have a large relational database (~500 GB, hundreds of tables). We have summary tables that we rebuild from scratch each night, which takes about 10 hours. From these summary tables, we have a web interface that accesses the summary tables to build reports. There is a business reason for

Re: from relational to bigger data

2013-12-19 Thread Chris Embree
In big data terms, 500G isn't big. But, moving that much data around every night is not trivial either. I'm going to guess at a lot here, but at a very high level. 1. Sqoop the data required to build the summary tables into Hadoop. 2. Crunch the summaries into new tables (really just files on

Re: from relational to bigger data

2013-12-19 Thread Vinay Bagare
I would also look at the current setup. I agree with Chris that 500 GB is fairly insignificant. Best, Vinay Bagare On Dec 19, 2013, at 12:51 PM, Chris Embree cemb...@gmail.com wrote: In big data terms, 500G isn't big. But, moving that much data around every night is not trivial either. I'm

Re: Why other process can't see the change after calling hdfsHFlush unless hdfsCloseFile is called?

2013-12-19 Thread Peyman Mohajerian
OK, I just read the book section on this (the Definitive Guide to Hadoop), just to be sure: the length of a file is stored in the NameNode, and it's updated only after the client calls the NameNode after the file is closed. At that point, if the NameNode has received all the ACKs from the DataNodes, then it will set the length
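
A hedged Java sketch of that distinction, assuming Hadoop 2.x client libraries (the path is hypothetical, and whether SyncFlag.UPDATE_LENGTH is available depends on the exact release): hflush() alone leaves the NameNode's recorded length unchanged, while hsync() with UPDATE_LENGTH on an HDFS output stream also updates the length without closing the file.

  import java.util.EnumSet;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FSDataOutputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.hdfs.client.HdfsDataOutputStream;
  import org.apache.hadoop.hdfs.client.HdfsDataOutputStream.SyncFlag;

  public class UpdateLengthSketch {
      public static void main(String[] args) throws Exception {
          FileSystem fs = FileSystem.get(new Configuration());
          Path p = new Path("/tmp/length-demo.txt");   // hypothetical path
          FSDataOutputStream out = fs.create(p, true);
          out.writeBytes("some data\n");

          out.hflush();   // bytes reach the datanodes; NameNode length unchanged

          // On a real HDFS stream, hsync(UPDATE_LENGTH) also persists the new
          // length on the NameNode while the file remains open.
          if (out instanceof HdfsDataOutputStream) {
              ((HdfsDataOutputStream) out).hsync(EnumSet.of(SyncFlag.UPDATE_LENGTH));
          }

          System.out.println("length now: " + fs.getFileStatus(p).getLen());
          out.close();
      }
  }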

LOGGING in MapReduce

2013-12-19 Thread unmesha sreeveni
I want to log my System.out.println() output to the console. How do I do that? I tried the code below but it is not displaying anything. I am using the mapred API, the old one. Did I do anything wrong? Code: package tech; import java.io.BufferedWriter;
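
System.out.println() in a mapper or reducer never reaches the console of the machine that submitted the job: it goes to that task attempt's stdout log (visible through the task logs in the web UI or under the userlogs directory on the node), while a commons-logging/log4j logger writes to the task's syslog file. A hedged sketch using the old mapred API; this is not the poster's actual class, just an illustration of where each kind of output ends up.

  import java.io.IOException;
  import org.apache.commons.logging.Log;
  import org.apache.commons.logging.LogFactory;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapred.MapReduceBase;
  import org.apache.hadoop.mapred.Mapper;
  import org.apache.hadoop.mapred.OutputCollector;
  import org.apache.hadoop.mapred.Reporter;

  public class LoggingMapper extends MapReduceBase
          implements Mapper<LongWritable, Text, Text, LongWritable> {

      private static final Log LOG = LogFactory.getLog(LoggingMapper.class);

      @Override
      public void map(LongWritable key, Text value,
                      OutputCollector<Text, LongWritable> output, Reporter reporter)
              throws IOException {
          System.out.println("stdout: saw offset " + key.get());  // task attempt stdout log
          LOG.info("syslog: saw offset " + key.get());            // task attempt syslog
          output.collect(value, new LongWritable(1));
      }
  }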

Re: Why other process can't see the change after calling hdfsHFlush unless hdfsCloseFile is called?

2013-12-19 Thread Xiaobin She
To Peyman, thank you for your reply. So the properties of the file are stored in the NameNode, and they will not be updated until the file is closed. But won't this cause some problems? For example, 1. process A opens the file in write mode, writes 1 MB of data, flushes the data, and holds the file handle