Hey,
Can anyone explain to me what the reduce > copy phase in the reducer section is?
The (K, List(V)) is passed to the reducer. Does reduce > copy represent the
copying of (K, List(V)) to the reducer from all mappers?
I am monitoring my jobs on the cluster using the JobTracker URL.
I am seeing for most of my
Dear List,
we're trying to use a central HDFS store that can be accessed from
various other Hadoop distributions.
Do you think this is possible? We're having trouble, but it doesn't seem
related to different RPC versions.
When trying to access a Cloudera CDH3 Update 2 (cdh3u2) HDFS from
BigInsigh
I faced the same issue, but after some time, when I balanced the cluster, the
jobs started running fine.
On Wed, Jan 25, 2012 at 3:34 PM, praveenesh kumar wrote:
> Hey,
>
> Can anyone explain to me what the reduce > copy phase in the reducer section is?
> The (K, List(V)) is passed to the reducer. Is reduce
@hadoophive
Can you explain more about what you mean by "balance the cluster"?
Thanks,
Praveenesh
On Wed, Jan 25, 2012 at 4:29 PM, hadoop hive wrote:
> I faced the same issue, but after some time, when I balanced the cluster, the
> jobs started running fine,
>
> On Wed, Jan 25, 2012 at 3:34 PM, praveenesh kumar wrote:
Understanding the Fair Scheduler better.
Can we create multiple pools in the Fair Scheduler? I guess yes; please
correct me if I'm wrong.
Suppose I have 2 pools in my fair-scheduler.xml:
1. Hadoop-users: Min map: 10, Max map: 50, Min reduce: 10, Max reduce: 50
2. Admin-users: Min map: 20, Max map: 80, Min Re
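For reference, a sketch of what such an allocations file could look like. The
element names (minMaps, maxMaps, minReduces, maxReduces) are taken from the
Fair Scheduler documentation for the Hadoop 1.x line; whether maxMaps /
maxReduces exist depends on your version, so treat this as an assumption to
check:

<?xml version="1.0"?>
<allocations>
  <pool name="hadoop-users">
    <minMaps>10</minMaps>
    <maxMaps>50</maxMaps>
    <minReduces>10</minReduces>
    <maxReduces>50</maxReduces>
  </pool>
  <pool name="admin-users">
    <minMaps>20</minMaps>
    <maxMaps>80</maxMaps>
    <!-- the remaining limits were cut off above; these values are illustrative -->
    <minReduces>20</minReduces>
    <maxReduces>80</maxReduces>
  </pool>
</allocations>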
Thanks Harsh.
I'll look into the TaskTracker logs to find any issues with MapReduce and
update this thread accordingly.
(PS: Sorry for the wide circulation. My mails still don't directly land on
common-user@hadoop.apache.org so I tried posting it
through Nabble and something got broken. I have mai
Hello,
I'm trying to develop an application where the Reducer has to produce
multiple outputs.
In detail, I need the Reducer to produce two types of files, each with
different output.
I found in Hadoop: The Definitive Guide that the new API uses only
MultipleOutputs, but working with Mu
Hello Romeo,
Inline…
On Wed, Jan 25, 2012 at 4:07 PM, Romeo Kienzler wrote:
> Dear List,
>
> we're trying to use a central HDFS storage in order to be accessed from
> various other Hadoop-Distributions.
What 'distribution' is the HDFS you've set up from? You will have
to use that particula
BigInsights? ... Ok, I'll be nice ... :-)
Ok, so if I understand your question, you want a single HDFS file system
to be used by different 'Hadoop' frameworks? (derivatives)
First, it doesn't make sense. I mean it really doesn't make any sense.
Second... I don't think it would be possib
It's not your TaskTracker that's failing; your job itself is running
locally, and not on a JobTracker. This would not work for what you're
trying to run.
Are you sure you have the right mapred-site.xml configuration from
where you launch your job?
On Wed, Jan 25, 2012 at 5:12 PM, Utkarsh Rathore wrote:
BigInsights is an IBM product, based on a fork of Hadoop, I think. Mixing
totally different stacks makes no sense, and will not work, I guess.
- Alex
--
Alexander Lorenz
http://mapredit.blogspot.com
On Jan 25, 2012, at 1:12 PM, Harsh J wrote:
> Hello Romeo,
>
> Inline…
>
> On Wed,
What version/release/distro of Hadoop are you using? Apache releases
got the new (unstable) API MultipleOutputs only in 0.21+, and it was only
very recently backported to branch-1.
That said, the next release in 1.x (1.1.0, out soon) will carry the
new API MultipleOutputs, but presently no release in
Oh and btw, do not fear the @deprecated 'Old' API. We have
undeprecated it in the recent stable releases, and will continue to
support it for a long time. I'd recommend using the older API, as it
is more feature-complete and test-covered in the version you use.
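For the old (mapred) API, a minimal sketch of a reducer writing to two named
outputs with org.apache.hadoop.mapred.lib.MultipleOutputs; the names "typeA" /
"typeB", the Text types, and the routing logic are illustrative assumptions,
not from this thread:

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.mapred.lib.MultipleOutputs;

// Job setup, on a JobConf named conf:
//   MultipleOutputs.addNamedOutput(conf, "typeA", TextOutputFormat.class, Text.class, Text.class);
//   MultipleOutputs.addNamedOutput(conf, "typeB", TextOutputFormat.class, Text.class, Text.class);

public class TwoFileReducer extends MapReduceBase
    implements Reducer<Text, Text, Text, Text> {
  private MultipleOutputs mos;

  public void configure(JobConf conf) {
    mos = new MultipleOutputs(conf);
  }

  @SuppressWarnings("unchecked")
  public void reduce(Text key, Iterator<Text> values,
      OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
    while (values.hasNext()) {
      Text v = values.next();
      // Route each record to one of the two named output files.
      String name = v.toString().startsWith("A") ? "typeA" : "typeB";
      mos.getCollector(name, reporter).collect(key, v);
    }
  }

  public void close() throws IOException {
    mos.close(); // flushes the named output files
  }
}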
On Wed, Jan 25, 2012 at 6:09 PM, H
Did you try using hftp:// instead of hdfs://? This would work across different
RPC versions as long as the code bases are not from significantly different
branches.
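For example, a sketch with distcp (hostnames, ports, and paths are
placeholders; 50070 is the usual NameNode HTTP port that HFTP reads through):

hadoop distcp hftp://cdh-namenode:50070/user/data hdfs://other-namenode:8020/user/data

HFTP is read-only, so this works for pulling data out of the CDH cluster, not
for writing back to it.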
An EOFException might also be related to an RPC version mismatch. If the release of
Hadoop is based off the 0.20.2xx (Hadoop with securit
Praveenesh,
You can try setting "mapred.fairscheduler.pool" to your pool name while
running the job. By default, mapred.fairscheduler.poolnameproperty is set to
user.name (each job run by a user is allocated to his named pool), and you
can also change this property to group.name.
Srinivas --
Also,
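A hedged example of passing that property on the command line via the generic
options (assuming the job uses ToolRunner; the jar, class, and pool names are
hypothetical):

hadoop jar myjob.jar MyJob -Dmapred.fairscheduler.pool=hadoop-users input output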
I'm using 1.0.0 beta; I suppose it was a wrong decision to use a beta version.
So do you recommend using 0.20.203.X and sticking to the Hadoop Definitive
Guide approaches?
Thanks for your reply
On 01/25/2012 01:41 PM, Harsh J wrote:
Oh and btw, do not fear the @deprecated 'Old' API. We have
undeprecated i
I recommend sticking to the older APIs for the 1.x release line (know that
1.x is a micro revision over, and a rename of, the 0.20.20x branches
[0]).
Do not worry about the @deprecated markers; these APIs are still fully
available and supported up to 0.23 and beyond, and should give off no
upgrade worrie
One more question. I just downloaded Hadoop 0.20.203.0, considered to be the
last stable release. What about the JobConf vs. Configuration classes? What
should I use to avoid wrong approaches, because JobConf seems to be
deprecated.
Sorry for bothering you with these questions. I'm just not used to having
Hi,
Just set your code to ignore the deprecation warnings for JobConf, etc.;
it causes no harm to use them.
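For instance, a small sketch (class and job names are hypothetical) that
silences the warning at the use site:

import org.apache.hadoop.mapred.JobConf;

public class MyJob {
  @SuppressWarnings("deprecation")
  public static JobConf buildConf() {
    // The old-API JobConf still works fine; the annotation just quiets javac.
    JobConf conf = new JobConf(MyJob.class);
    conf.setJobName("my-job");
    return conf;
  }
}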
On Wed, Jan 25, 2012 at 6:32 PM, Ondřej Klimpera wrote:
> One more question. I just downloaded Hadoop 0.20.203.0, considered to be the
> last stable release. What about JobConf vs. Configuration c
Alex,
I said I would be nice and hold my tongue when it comes to IBM and their IM
pillar products... :-)
You could write a client that talks to two different Hadoop versions, but then
you would be using HFTP, which is what you have under the hood in distcp...
But that doesn't seem to be what he
I am running Pig jobs; how can I specify which pool they should run in?
Also, do you mean the pool allocation is done job-wise, not user-wise?
On Wed, Jan 25, 2012 at 6:14 PM, Srinivas Surasani wrote:
> Praveenesh,
>
> You can try setting "mapred.fairscheduler.pool" to your pool name while
This problem arose after adding a node, so I started the balancer to
rebalance it.
On Wed, Jan 25, 2012 at 4:38 PM, praveenesh kumar wrote:
> @hadoophive
>
> Can you explain more about what you mean by "balance the cluster"?
>
> Thanks,
> Praveenesh
>
> On Wed, Jan 25, 2012 at 4:29 PM, hadoop hive wrote:
>
> >
The copy phase fetches the map outputs. It may hang for a while if
there are no newly completed map outputs to fetch yet.
You can raise your reducers' slowstart value so they don't spend so
many cycles waiting, but rather start at 80-90% of map completion
instead of the default 5%. This helps your M
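Assuming Harsh means the standard slowstart knob, a sketch for mapred-site.xml
(0.80 means reducers are scheduled once 80% of the maps have completed):

<property>
  <name>mapred.reduce.slowstart.completed.maps</name>
  <value>0.80</value>
</property>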
Set the property in Pig with the 'set' command or other ways:
http://pig.apache.org/docs/r0.9.1/cmds.html#set or
http://pig.apache.org/docs/r0.9.1/start.html#properties
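For example, near the top of the Pig script (the pool name is hypothetical):

set mapred.fairscheduler.pool 'hadoop-users';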
As Srinivas covered earlier, pool allocation can be done per-user if
you set the scheduler poolnameproperty to "user.name". Per g
Yeah, I am doing that; currently it's at 20%, so I guess I have to raise it
more.
The funny thing is, it's still happening after the map is 100% complete.
When the map is completed, it should not wait, right? But I see it still gives
the same message for some time.
Thanks,
Praveenesh
On Wed, Jan 25, 2012 at 7:29 PM,
I am looking for a solution where we can do it permanently, without
specifying these things inside jobs.
I want to keep these things hidden from the end user.
The end user would just write Pig scripts, and all the jobs submitted by a
particular user would get submitted to their respective pools automaticall
Also, with the above-mentioned method, my problem is that I end up with one
pool per user (that's obviously not a good way of configuring schedulers).
How can I allocate multiple users to one pool in the XML properties, so
that I don't have to bother passing any options inside my code?
Thanks,
Praveenesh
On Wed
A solution would be to place your users into groups, and use the
group.name identifier as the poolnameproperty. Would this work for
you instead?
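A sketch of what that would be in mapred-site.xml, using the property name
Srinivas mentioned earlier:

<property>
  <name>mapred.fairscheduler.poolnameproperty</name>
  <value>group.name</value>
</property>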
On Wed, Jan 25, 2012 at 8:00 PM, praveenesh kumar wrote:
> Also, with the above-mentioned method, my problem is that I end up with one
> pool per user (that's obvio
Dear all,
first of all, the reason for this is that we have a lot of data in
Cloudera but want to test BigSheets (from BigInsights) and Datameer
using the same HDFS source (instead of re-importing).
Thanks a lot for your suggestion. I finally got it working; here are the
steps I have done:
Then in that case, will I be using a group name tag in the allocations file,
like this inside each pool?
<group name="ABC">
6
Thanks,
Praveenesh
On Wed, Jan 25, 2012 at 8:08 PM, Harsh J wrote:
> A solution would be to place your users into groups, and use
> group.name identifier to be the
Not exactly. See, with the poolnameproperty being group.name, the
group name is mapped to a pool name. So you need to only use
<pool name="ABC"> for configuring a group "ABC". Does that make sense?
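In other words, a sketch of the allocations file entry (the <minMaps> element
is an assumption about where your "6" would go; other limits work the same
way):

<allocations>
  <pool name="ABC">
    <minMaps>6</minMaps>
  </pool>
</allocations>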
On Wed, Jan 25, 2012 at 8:49 PM, praveenesh kumar wrote:
> Then in that case, will I be using a group name tag in the allocations
Touché!
Raj
>
> From: Robert Evans
>To: "common-user@hadoop.apache.org" ; Raj V
>
>Sent: Wednesday, January 25, 2012 7:36 AM
>Subject: Re: When to use a combiner?
>
>
>Re: When to use a combiner?
>You can use a combiner for average. You just have to write
Okie, got it... same pool name as group name.
On Wed, Jan 25, 2012 at 8:51 PM, Harsh J wrote:
> Not exactly. See, with the poolnameproperty being group.name, the
> group name is mapped to a pool name. So you need to only use
> <pool name="ABC"> for configuring a group "ABC". Does that make sense?
>
> On Wed, Jan 25,
You can use a combiner for average. You just have to write a separate combiner
from your reducer.
class MyCombiner {
  // The values are (sum, count) pairs
  void reduce(Key key, Iterable<Pair> values, Context context) {
    long sum = 0;
    long count = 0;
    for (Pair value : values) {
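The preview cuts off there; a self-contained sketch of the same idea using the
new API (all class names are illustrative, and it assumes the mapper emits
"partialSum,partialCount" Text values so both sum and count survive combining):

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Combiner: merges partial (sum, count) pairs without dividing, so no
// information is lost and it is safe to apply zero or more times.
public class AverageCombiner extends Reducer<Text, Text, Text, Text> {
  @Override
  protected void reduce(Text key, Iterable<Text> values, Context context)
      throws IOException, InterruptedException {
    long sum = 0, count = 0;
    for (Text value : values) {
      String[] parts = value.toString().split(",");
      sum += Long.parseLong(parts[0]);
      count += Long.parseLong(parts[1]);
    }
    context.write(key, new Text(sum + "," + count));
  }
}

// Reducer: only divides at the very end (integer average for brevity).
class AverageReducer extends Reducer<Text, Text, Text, LongWritable> {
  @Override
  protected void reduce(Text key, Iterable<Text> values, Context context)
      throws IOException, InterruptedException {
    long sum = 0, count = 0;
    for (Text value : values) {
      String[] parts = value.toString().split(",");
      sum += Long.parseLong(parts[0]);
      count += Long.parseLong(parts[1]);
    }
    context.write(key, new LongWritable(count == 0 ? 0 : sum / count));
  }
}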
Hi,
I'm trying to set up Hadoop 1.0.0 on my MacBook and noticed a bunch of setup
files in the 'sbin' directory:
-rwxr-xr-x@ 1 root wheel  3392 16 Dec 03:39 hadoop-create-user.sh
-rwxr-xr-x@ 1 root wheel  3636 16 Dec 03:39 hadoop-setup-applications.sh
-rwxr-xr-x@ 1 root wheel 26777 16 Dec 03:39 ha
These scripts aren't for tarball installs. They are for package
installs, which do not apply to Mac OS X. I haven't a clue what they're
even doing in the release tarball. You should file a JIRA issue to
have them removed.
You just need to follow:
http://hadoop.apache.org/common/docs/current/single_