Hi,
I have a dependency on mapper completion time in my reducer's configure()
method, where I am building a dictionary by collecting it in pieces from the
map tasks. I will be using this dictionary in the reduce() method. Can
someone please help me: can I put a constraint on reducer startup time?
Hello,
I'm planning to use HDFS as a DFS in a web application environment.
There are two requirements: fault tolerance, which is ensured by the
replicas, and load balancing. Is load balancing part of HDFS, and how is
it configurable?
Thanks a lot
Jeff,
Thanks for the help. I want to clarify several details:
1. I know this way of importing files into HDFS, but it involves the user
accessing the HDFS nodes directly.
Is there another way to export all data files from the data server side to
remote HDFS nodes without invoking tar?
2. I've set up
If you have a fixed set of nodes in the cluster and load data onto HDFS, it
tries to automatically balance the distribution across nodes by selecting
random nodes to store replicas. This has to be done from a client that is
outside the datanodes to get a random distribution. If you add new nodes to
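For illustration, a minimal sketch of loading data from such an external
client with the standard FileSystem API (the namenode address and paths are
placeholders):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class LoadFromClient {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Placeholder address; use your cluster's fs.default.name.
        conf.set("fs.default.name", "hdfs://namenode.example.com:9000");
        FileSystem fs = FileSystem.get(conf);
        // Because this client is not a datanode, replicas of each block
        // land on randomly chosen nodes, balancing the distribution.
        fs.copyFromLocalFile(new Path("/local/data/file.txt"),
                             new Path("/user/data/file.txt"));
      }
    }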
Hi, all,
I have deployed Hadoop on my Linux computer as a single-node environment, and
everything is running fine. Now I want to follow the execution of a Hadoop
job step by step in Eclipse. Please tell me how I can do that.
Thanks
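One common approach (a sketch, not from this thread): run the job with the
local job runner so the whole job executes in a single JVM that Eclipse can
step through. The paths here are placeholders, and the default identity
mapper/reducer are used; substitute your own classes.

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class DebugLocally {
      public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(DebugLocally.class);
        // Local mode: map and reduce run inside this JVM, so Eclipse
        // breakpoints in your Mapper/Reducer classes are hit directly.
        conf.set("mapred.job.tracker", "local");
        conf.set("fs.default.name", "file:///");
        // Placeholder local paths; set your own mapper/reducer classes.
        FileInputFormat.setInputPaths(conf, new Path("input"));
        FileOutputFormat.setOutputPath(conf, new Path("output"));
        JobClient.runJob(conf);
      }
    }

Launch this class under the Eclipse debugger instead of submitting to a real
cluster.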
When I run the wordcount example I have a problem like this:
Is wordcount the Hadoop example, or your own code?
On 8/16/08, tran thien [EMAIL PROTECTED] wrote:
Hi everyone,
I am using Hadoop 0.17.1.
There are two nodes: one master (also a slave) and one slave.
When I run the wordcount example I have a problem
On 8/27/08 12:54 AM, Mork0075 [EMAIL PROTECTED] wrote:
I'm planning to use HDFS as a DFS in a web application environment.
There are two requirements: fault tolerance, which is ensured by the
replicas, and load balancing.
There is a SPOF in the form of the name node. So depending
This sounds really interesting. And when increasing the replicas for
certain files, does the available throughput for these files increase too?
Allen Wittenauer wrote:
On 8/27/08 12:54 AM, Mork0075 [EMAIL PROTECTED] wrote:
I'm planning to use HDFS as a DFS in a web application environment.
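Presumably yes, since more replicas mean more datanodes that can serve reads
of that file. Per-file replication can be raised with the standard API; a
sketch (the path and factor are placeholders):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class RaiseReplication {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // Raise the replication factor of a hot file to 10 so reads can
        // be spread over more datanodes.
        fs.setReplication(new Path("/data/hot-file.txt"), (short) 10);
      }
    }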
Hi, I need help if possible.
My name is Leandro Alvim and I'm a computer science graduate in Brazil.
I'm using Hadoop in my university project, and I used your tutorials to
learn how to install it and run a simple test with Python and Hadoop. Writing
my application I faced a problem that
Streaming can accept multiple input directories, so that
would enable you to merge two directories
(--is this an assignment? ...)
Miles
2008/8/27 Leandro Alvim [EMAIL PROTECTED]
Hi, I need help if possible.
My name is Leandro Alvim and I'm a graduate in computer
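If I understand Miles's suggestion right, the streaming -input option can
simply be given more than once. For reference, a sketch of the equivalent in
the Java API (paths are placeholders; the job setup is otherwise as usual):

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobConf;

    public class MergeDirs {
      public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(MergeDirs.class);
        // Two input directories feed one job, so their contents are
        // merged in a single map/reduce pass.
        FileInputFormat.addInputPath(conf, new Path("/data/dir1"));
        FileInputFormat.addInputPath(conf, new Path("/data/dir2"));
        FileOutputFormat.setOutputPath(conf, new Path("/data/merged"));
        // ... set mapper/reducer (or streaming) and run as usual ...
      }
    }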
Hi,
Planet-scale data exploration and data mining operations will almost
always need to include some sequential scans. How, then, can we speed
up sequential scans? The BigTable paper shows some answers:
* Column-oriented storage (it reduces I/O)
* Data compression
* PDP (parallel distributed processing)
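To illustrate the first point with a toy sketch (mine, not from the paper):
when the values of one column are stored contiguously, a scan that touches
only that column reads only its bytes, instead of dragging every row's full
record through I/O.

    public class ColumnScan {
      public static void main(String[] args) {
        int n = 1000000;
        // Column-oriented layout: all values of the "age" column sit in
        // one contiguous array. A row-oriented layout would interleave
        // name, address, etc., forcing the scan to read those bytes too.
        int[] ages = new int[n];
        long sum = 0;
        for (int i = 0; i < n; i++) {
          sum += ages[i];   // sequential, cache- and disk-friendly
        }
        System.out.println("sum of ages = " + sum);
      }
    }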
I posted this a while back and have been wondering whether I missed something
and the doc is out of date, or whether this is a bug and I should file a JIRA.
Is there anyone out there who is successfully using IsolationRunner? Please let
me know.
Thanks,
-Yuri
On Friday 08 August 2008 10:09:48
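For anyone else hitting this, the documented recipe, as far as I can tell (a
sketch I have not verified against 0.18): keep the failed task's files, then
rerun the task in place with IsolationRunner. Paths here are placeholders.

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class DebuggableJob {
      public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(DebuggableJob.class);
        // Keep each failed task's job.xml and working files on the
        // tasktracker so the task can be rerun in isolation later.
        conf.setBoolean("keep.failed.task.files", true);
        // Placeholder paths; set your own mapper/reducer classes.
        FileInputFormat.setInputPaths(conf, new Path("input"));
        FileOutputFormat.setOutputPath(conf, new Path("output"));
        JobClient.runJob(conf);
      }
    }

After a failure, the docs say to go to the task's working directory on the
node where it ran and invoke org.apache.hadoop.mapred.IsolationRunner with
../job.xml as its argument.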
Discussion inline.
Your example with the friends makes perfect sense. Can you imagine a
scenario where storing the data column-oriented instead of row-oriented
(a counterexample, if you will) causes such a huge
performance mismatch, like the friends one in the row/column comparison?
On Tue, Aug 26, 2008 at 11:54 PM, Pallavi Palleti [EMAIL PROTECTED]wrote:
where I am building a dictionary by collecting it
in pieces from the map tasks. I will be using this dictionary in the reduce()
method. Can someone please help me: can I put a constraint on reducer
startup time?
Also consider making a patch to fix the behavior and submitting it.
Leandro Alvim wrote:
How can I use only a reduce, without a map?
I don't know of a way to run just a reduce task without a map
stage, but you could get the same effect by having a map stage that just uses
the IdentityMapper class (which passes the data through to the reducers
unchanged), so
The downside of this (which appears to be the only way) is that your
entire input data set has to pass through the identity mapper and then
go through shuffle and sort before it gets to the reducer.
If you have a large input data set, this takes real resources: CPU,
disk, network, and wall clock time.
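A sketch of the IdentityMapper approach described above (the classes under
org.apache.hadoop.mapred.lib are real; the paths are placeholders, and
IdentityReducer stands in for your own reducer):

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.KeyValueTextInputFormat;
    import org.apache.hadoop.mapred.lib.IdentityMapper;
    import org.apache.hadoop.mapred.lib.IdentityReducer;

    public class ReduceOnly {
      public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(ReduceOnly.class);
        conf.setInputFormat(KeyValueTextInputFormat.class); // Text keys/values
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(Text.class);
        // IdentityMapper passes every (key, value) through unchanged,
        // so all real work happens in the reducer.
        conf.setMapperClass(IdentityMapper.class);
        conf.setReducerClass(IdentityReducer.class); // your reducer here
        FileInputFormat.setInputPaths(conf, new Path("/in"));
        FileOutputFormat.setOutputPath(conf, new Path("/out"));
        JobClient.runJob(conf);
      }
    }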
On Wed, Aug 27, 2008 at 9:27 AM, Jason Venner [EMAIL PROTECTED] wrote:
The downside of this (which appears to be the only way) is that your
entire input data set has to pass through the identity mapper and then go
through shuffle and sort before it gets to the reducer.
If you don't need the
In order to do anything other than a tar transfer (which is a kludge, of
course), you'll need to open up the relevant ports between the client and
the Hadoop cluster. I may be missing a few here, but I believe these would
include port 50010 for the datanodes and whatever port the namenode is
listening on.
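With those ports open, a remote client can write straight into HDFS with the
ordinary API; a sketch (the namenode address and path are placeholders):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class RemoteWrite {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // The client talks to the namenode on its RPC port (placeholder
        // address), then streams block data to the datanodes on 50010.
        conf.set("fs.default.name", "hdfs://namenode.example.com:9000");
        FileSystem fs = FileSystem.get(conf);
        FSDataOutputStream out = fs.create(new Path("/imports/data.bin"));
        out.write("hello".getBytes());
        out.close();
      }
    }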
On Tue, Aug 26, 2008 at 7:50 AM, Owen O'Malley [EMAIL PROTECTED] wrote:
On Tue, Aug 26, 2008 at 12:39 AM, charles du [EMAIL PROTECTED] wrote:
I would like to sort a large number of records in a big file based on a
given field (key).
The property you are looking for is a total order and
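The simplest way to get a total order (a sketch of one option, not Owen's
full answer, which is cut off above): use a single reducer and let the
framework's sort do the work. For multiple reducers you would need a
partitioner that range-partitions keys, like the total-order partitioner
used by the terasort example. Paths are placeholders, and the input is
assumed to be tab-separated with the sort field first.

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.KeyValueTextInputFormat;
    import org.apache.hadoop.mapred.lib.IdentityMapper;
    import org.apache.hadoop.mapred.lib.IdentityReducer;

    public class SimpleTotalSort {
      public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(SimpleTotalSort.class);
        conf.setInputFormat(KeyValueTextInputFormat.class); // key = sort field
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(Text.class);
        conf.setMapperClass(IdentityMapper.class);
        conf.setReducerClass(IdentityReducer.class);
        // One reducer = one output file in total key order. Simple, but
        // it serializes the sort on a single node.
        conf.setNumReduceTasks(1);
        FileInputFormat.setInputPaths(conf, new Path("/in"));
        FileOutputFormat.setOutputPath(conf, new Path("/sorted"));
        JobClient.runJob(conf);
      }
    }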
Hi,
Where can I find the detailed description of the next-generation job
tracker, which replaces HOD? Thanks.
--Yiping Han
Optimizations
Right now I have a job whose reduce phase outputs the key-value pairs as
records into a database. Is this the best way to be loading the database?
What are some alternatives?
Hi,
I would like the reducer to output to different files based upon the
value of the key. I understand that both MultipleOutputs and
MultipleOutputFormat can do this. Is that correct? However, I don't
understand the differences between these two classes. Can someone
explain the difference?
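Roughly (my understanding, not from this thread): MultipleOutputFormat
derives the output file name from each record's key/value, while
MultipleOutputs lets you write to several explicitly named outputs from
inside the reducer. A sketch of the former using the concrete
MultipleTextOutputFormat class (the naming scheme is just an example):

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat;

    // Routes each output record to a file path derived from its key,
    // e.g. all records with key "foo" land under a "foo" directory.
    public class KeyBasedOutput
        extends MultipleTextOutputFormat<Text, Text> {
      @Override
      protected String generateFileNameForKeyValue(Text key, Text value,
                                                   String name) {
        return key.toString() + "/" + name;
      }
    }

The job would then use conf.setOutputFormat(KeyBasedOutput.class).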
Most DBMSs these days have a bulk-insert mechanism that is much faster than
one transaction per record. Check it out.
-Edward
On Thu, Aug 28, 2008 at 6:27 AM, Yih Sun Khoo [EMAIL PROTECTED] wrote:
Optimizations
Right now I have a job whose reduce phase outputs the key-value pairs as
records into a database. Is this
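A sketch of what batched loading might look like with JDBC (the URL, table,
and batch size are placeholders; another option is to write plain files and
use the database's own bulk loader):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;

    public class BulkLoad {
      public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection(
            "jdbc:mysql://db.example.com/mydb", "user", "pass");
        conn.setAutoCommit(false);  // one transaction, not one per row
        PreparedStatement ps = conn.prepareStatement(
            "INSERT INTO results (k, v) VALUES (?, ?)");
        for (int i = 0; i < 10000; i++) {  // stand-in for reducer output
          ps.setString(1, "key" + i);
          ps.setString(2, "value" + i);
          ps.addBatch();
          if (i % 1000 == 0) ps.executeBatch();  // flush in chunks
        }
        ps.executeBatch();
        conn.commit();
        ps.close();
        conn.close();
      }
    }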
Hi Owen,
There is no solution on the server side. Instead, I think we can add some
guidance on how to fix this, or an explanation of this phenomenon.
On Thu, Aug 28, 2008 at 1:16 AM, Owen O'Malley [EMAIL PROTECTED] wrote:
Also consider making a patch to fix the behavior and submitting it.
--
Best regards,
There are a number of JIRAs that modify the scheduling piece of the
JobTracker.
- 3412 refactors the scheduler code out of the JT to make schedulers
more pluggable
- 3445 and 3746 are a couple of new schedulers that do more than the
default JT scheduler. Both of these can be good alternatives to
Hi, all,
I am using hadoop-0.18.0-core.jar and nutch-2008-08-18_04-01-55.jar,
and running Hadoop with one namenode and 4 slaves.
Attached is my hadoop-site.xml; I didn't change the file
hadoop-default.xml.
When the data in the segments is large, this kind of error occurs:
java.io.IOException: Could not