Hi,
To deploy software I suggest Pulp:
https://fedorahosted.org/pulp/wiki/HowTo
For a package-based distro (Debian, Red Hat, CentOS) you can build Apache's
Hadoop, package it and deploy it. Configs, as Cos says, go over Puppet. If you use
Red Hat / CentOS, take a look at Spacewalk.
best,
Alex
On Mon, De
Arun,
> I want to test its behaviour under different sizes of job traces (meaning
> the number of jobs, say 5, 10, 25, 50, 100) under different
> numbers of nodes.
> Till now I was using only the test/data given by mumak, which has 19 jobs and
> a 1529-node topology. I don't have many nodes
> with me to run som
Hi Burak,
> Bejoy Ks, I have a continuous inflow of data but I think I need a near
> real-time system.
Just to add to Bejoy's point,
with Oozie, you can specify the data dependency for running your job.
When a specific amount of data is in, you can configure Oozie to run your job.
I think this will
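(For illustration, a rough sketch of such a data-dependent Oozie coordinator. All names, paths and frequencies below are invented placeholders, not taken from this thread.)

<coordinator-app name="wordcount-coord" frequency="${coord:hours(1)}"
                 start="2011-12-06T00:00Z" end="2012-12-06T00:00Z" timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.1">
  <datasets>
    <!-- new data is expected to land hourly under this directory layout -->
    <dataset name="raw" frequency="${coord:hours(1)}"
             initial-instance="2011-12-06T00:00Z" timezone="UTC">
      <uri-template>hdfs://namenode/data/incoming/${YEAR}/${MONTH}/${DAY}/${HOUR}</uri-template>
    </dataset>
  </datasets>
  <input-events>
    <!-- the workflow only starts once the current hour's data is available -->
    <data-in name="input" dataset="raw">
      <instance>${coord:current(0)}</instance>
    </data-in>
  </input-events>
  <action>
    <workflow>
      <app-path>hdfs://namenode/apps/wordcount-wf</app-path>
    </workflow>
  </action>
</coordinator-app>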
Hi Burak,
The model of Hadoop is very different: it is a job-based model, or in
simpler words a kind of batch model, where a MapReduce job is executed
on a batch of data which is already present.
As per your requirement, the word count example doesn't make sense if the file
has been written co
Athanasios Papaoikonomou, a cron job isn't useful for me, because I want to
execute the MR job with the same algorithm, but different files have different
velocities.
Both Storm and Facebook's Hadoop are designed for that. But I want to use the
Apache distribution.
Bejoy Ks, I have a continuous inflow of da
Thanks Bejoy,
I was looking at DBInputFormat with MultipleInputs. MultipleInputs takes a
Path parameter. Are these paths just ignored here?
On Mon, Dec 5, 2011 at 2:31 PM, Bejoy Ks wrote:
> Hi Justin,
>Just to add on to my response. If you need to fetch data from
> rdbms on your mapp
You might also want to take a look at Storm, as that's what it's designed to
do: https://github.com/nathanmarz/storm/wiki
On Mon, Dec 5, 2011 at 1:34 PM, Mike Spreitzer wrote:
> Burak,
> Before we can really answer your question, you need to give us some more
> information on the processing you want
hadoop dfs -cat /my/path/* > single_file
Russell Jurney
twitter.com/rjurney
russell.jur...@gmail.com
datasyndrome.com
On Dec 5, 2011, at 12:30 PM, Aaron Griffith wrote:
> Using PigStorage() my pig script output gets put into partial files on the
> hadoop
> file system.
>
> When I use the copyTo
Burak,
Before we can really answer your question, you need to give us some more
information on the processing you want to do. Do you want output that is
continuous or batched (if so, how)? How should the output at a given time
be related to the input up to then and the previous outputs?
Regar
Burak
If you have a continuous inflow of data, you can choose Flume to
aggregate the files into larger sequence files or the like if they are small, and
when you have a substantial chunk of data (equal to the HDFS block size) you
can push that data onto HDFS. Based on your SLAs you need to schedule you
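(For a flavour of what that can look like: a sketch of a Flume agent that spools incoming files and writes them to HDFS as sequence files, rolling at roughly block size. This uses the newer Flume NG properties format with made-up names and paths; the Flume release current at the time of this thread used a different configuration syntax.)

# agent/source/channel/sink names and paths are placeholders
agent.sources  = spool
agent.channels = mem
agent.sinks    = tohdfs

# pick up files dropped into a local spool directory
agent.sources.spool.type     = spooldir
agent.sources.spool.spoolDir = /var/incoming
agent.sources.spool.channels = mem

agent.channels.mem.type     = memory
agent.channels.mem.capacity = 10000

# write sequence files to HDFS, rolling at ~128 MB (one block)
agent.sinks.tohdfs.type              = hdfs
agent.sinks.tohdfs.channel           = mem
agent.sinks.tohdfs.hdfs.path         = hdfs://namenode/flume/incoming
agent.sinks.tohdfs.hdfs.fileType     = SequenceFile
agent.sinks.tohdfs.hdfs.rollSize     = 134217728
agent.sinks.tohdfs.hdfs.rollCount    = 0
agent.sinks.tohdfs.hdfs.rollInterval = 0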
Hi Chris
From the stack trace, it looks like a JVM corruption issue. It is
a known issue and has been fixed in CDH3u2; I believe an upgrade would
solve your issues.
https://issues.apache.org/jira/browse/MAPREDUCE-3184
Then regarding your queries, I'd try to help you out a bit. In mapreduce
Hi Chris,
I'd suggest updating to a newer version of your hadoop distro - you're
hitting some bugs that were fixed last summer. In particular, you're
missing the "amendment" patch from MAPREDUCE-2373 as well as some
patches to MR which make the fetch retry behavior more aggressive.
-Todd
On Mon,
Hi everyone,
I want to run an MR job continuously, because I have streaming data and I
try to analyze it all the time in my own way (algorithm). For example, say you
want to solve the wordcount problem. It's the simplest one :) If you have
multiple files and new files keep coming, how do you handle it
Hi,
Using: Version 0.20.2-cdh3u0, r81256ad0f2e4ab2bd34b04f53d25a6c23686dd14,
an 8-node cluster, 64-bit CentOS
We are occasionally seeing MAX_FETCH_RETRIES_PER_MAP errors on reducer
jobs. When we investigate it looks like the TaskTracker on the node being
fetched from is not running. Looking at th
Hi Aaron
Instead of copyToLocal use getmerge. It would do your job. The
syntax for the CLI is
hadoop fs -getmerge <hdfs source dir> /xyz.txt
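For example, with made-up paths:
hadoop fs -getmerge /user/aaron/pig_output /tmp/pig_output.txt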
Hope it helps!...
Regards
Bejoy.K.S
On Tue, Dec 6, 2011 at 1:57 AM, Aaron Griffith
wrote:
> Using PigStorage() my pig script output gets put into partial files on
Hi Justin,
Just to add on to my response. If you need to fetch data from an
rdbms in your mapper using your custom mapreduce code, you can use
DBInputFormat in your mapper class with MultipleInputs. You have to be
careful with the number of mappers for your application, as dbs would
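(To Justin's earlier question about the Path parameter: a rough, untested sketch of the wiring, using the old org.apache.hadoop.mapred API. All class, table and path names here are invented. With DBInputFormat the splits come from the DB configuration, so the Path given to MultipleInputs only serves to route records to the right mapper.)

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.lib.MultipleInputs;
import org.apache.hadoop.mapred.lib.db.DBConfiguration;
import org.apache.hadoop.mapred.lib.db.DBInputFormat;

// Assumes UserRecord implements Writable and DBWritable; the mapper and
// reducer classes are placeholders.
JobConf conf = new JobConf(JoinJob.class);

// JDBC connection details for the DB-backed input
DBConfiguration.configureDB(conf, "com.mysql.jdbc.Driver",
    "jdbc:mysql://dbhost/mydb", "dbuser", "dbpassword");

// Which table/columns DBInputFormat should read; its splits are built
// from this configuration, not from any input Path
DBInputFormat.setInput(conf, UserRecord.class,
    "users", null /* conditions */, "id" /* orderBy */, "id", "name");

// The Path here effectively acts as a tag that routes those splits to
// UserTableMapper; DBInputFormat itself never reads it
MultipleInputs.addInputPath(conf, new Path("/placeholder/db_users"),
    DBInputFormat.class, UserTableMapper.class);

// A normal file-based input can sit alongside it
MultipleInputs.addInputPath(conf, new Path("/data/orders"),
    TextInputFormat.class, OrdersMapper.class);

conf.setReducerClass(JoinReducer.class);
JobClient.runJob(conf);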
Using PigStorage() my pig script output gets put into partial files on the
hadoop file system.
When I use the copyToLocal function from Hadoop it creates a local directory with
all the partial files.
Is there a way to copy the partial files from hadoop into a single local file?
Thanks
Justin
If I get your requirement right, you need to get in data from
multiple rdbms sources and do a join on the same, and maybe some more
custom operations on top of this. For this you don't need to go in for
writing your own custom mapreduce code unless it is really required. You can
achieve t
I turned on profiling in Hadoop, and the MapReduce tutorial at
http://hadoop.apache.org/common/docs/current/mapred_tutorial.html says that
the profile files should go to the user log directory. However, they're
currently going to the working directory where I start the hadoop job
from. I've se
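(For context on what "turning on profiling" involves in the 0.20-era mapred API, a sketch of the relevant knobs; the class name and values below are only placeholder examples.)

// org.apache.hadoop.mapred.JobConf; MyJob is a placeholder class
JobConf conf = new JobConf(MyJob.class);
conf.setProfileEnabled(true);              // mapred.task.profile
conf.setProfileTaskRange(true, "0-1");     // which map task ids to profile
conf.setProfileTaskRange(false, "0-1");    // which reduce task ids to profile
// hprof options; %s is replaced with the per-task output file name
conf.setProfileParams(
    "-agentlib:hprof=cpu=samples,heap=sites,depth=6,force=n,thread=y,verbose=n,file=%s");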
There's that great project called BigTop (in the Apache Incubator) which
provides for building the Hadoop stack.
Part of what it provides is a set of Puppet recipes which will allow you
to do exactly what you're looking for, with perhaps some minor corrections.
Seriously, look at Puppet - otherwise
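(To give a flavour of what such a recipe looks like: a minimal hand-rolled Puppet sketch, not the actual BigTop manifests. Package, service and file names are CDH-style placeholders and vary by distro.)

class hadoop::worker {
  # install the hadoop packages from your distro/vendor repository
  package { ['hadoop-0.20', 'hadoop-0.20-datanode', 'hadoop-0.20-tasktracker']:
    ensure => installed,
  }

  # push a cluster-wide config file from the puppet master
  file { '/etc/hadoop/conf/core-site.xml':
    source  => 'puppet:///modules/hadoop/core-site.xml',
    require => Package['hadoop-0.20'],
  }

  # keep the daemons running and restart them when the config changes
  service { ['hadoop-0.20-datanode', 'hadoop-0.20-tasktracker']:
    ensure    => running,
    enable    => true,
    subscribe => File['/etc/hadoop/conf/core-site.xml'],
  }
}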
I would like to join some db tables, possibly from different databases, in an
MR job.
I would essentially like to use MultipleInputs, but that seems
file-oriented. I need a different mapper for each db table.
Suggestions?
Thanks!
Justin Vincent
I am running the 64-bit version. Have you set up SSH properly?
On Dec 3, 2011, at 2:30 AM, Will L wrote:
>
>
> I am using 64-Bit Eclipse 3.7.1 Cocoa with Hadoop 0.20.205.0. I get the
> following error message:
> An internal error occurred during: "Connecting to DFS localhost".
> org/apache/commons/co
Hi Praveenesh,
I had created VM images with the OS / hadoop nodes pre-configured, which I
would start as per requirement. But if you plan to do it at the hardware level
then Linux provides kickstart-type configuration, which allows OS /
package installations automatically (network conf
Hi all,
Can anyone guide me on how to automate the hadoop installation/configuration
process?
I want to install hadoop on 10-20 nodes, which may even grow to 50-100
nodes.
I know we can use some configuration tools like puppet or shell scripts?
Has anyone done it?
How can we do hadoop installati