Hello everyone,
I am new to Hadoop. There are a few topics about running UIMA-AS in
Hadoop. On the UIMA website there is one article about this, but
it is very general. I would appreciate it very much if anyone who has experience
running UIMA-AS in Hadoop could illustrate it with an example.
Can you tell me how to deploy UIMA on Hadoop?
Thanks in advance,
Jack
In addition to what Aaron mentioned, you can configure the minimum split
size in hadoop-site.xml to have smaller or larger input splits depending on
your application.
-Jim
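[A minimal sketch of setting that knob per job rather than in hadoop-site.xml (old mapred API; the class name is just a placeholder):

    import org.apache.hadoop.mapred.JobConf;

    public class SplitSizeExample {
      public static void main(String[] args) {
        JobConf conf = new JobConf(SplitSizeExample.class);
        // Ask for input splits of at least ~128 MB; the same key can be
        // set cluster-wide in hadoop-site.xml instead of per job.
        conf.setLong("mapred.min.split.size", 128L * 1024 * 1024);
      }
    }
]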
On Mon, Apr 20, 2009 at 12:18 AM, Aaron Kimball aa...@cloudera.com wrote:
Yes, there can be more than one InputSplit per file.
On Sat, 2009-04-18 at 09:57 -0700, jason hadoop wrote:
The traditional approach would be a Mapper class that maintains a member
variable holding the maximum-value record seen so far; in the close method of your
mapper you output a single record containing that value.
Perhaps you can forgive the
The Hadoop Framework requires that a Map Phase be run before the Reduce
Phase.
By doing the initial 'reduce' in the map, a much smaller volume of data has
to flow across the network to the reduce tasks.
But yes, this could simply be done by using an IdentityMapper and then having
all of the work done in the reducer.
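[A minimal sketch of the in-mapper max pattern described above (old mapred API; it assumes one long value per input line, and the class name and type choices are illustrative):

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    public class MaxMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, NullWritable, LongWritable> {

      private long max = Long.MIN_VALUE;
      private OutputCollector<NullWritable, LongWritable> out;

      public void map(LongWritable key, Text value,
                      OutputCollector<NullWritable, LongWritable> output,
                      Reporter reporter) throws IOException {
        out = output;  // keep a handle so close() can emit
        max = Math.max(max, Long.parseLong(value.toString().trim()));
      }

      @Override
      public void close() throws IOException {
        if (out != null) {
          // One record per map task; a single reducer then takes
          // the max of these per-task maxima.
          out.collect(NullWritable.get(), new LongWritable(max));
        }
      }
    }
]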
Mark,
There is a setup price when using Hadoop: for each task a new JVM must
be spawned. On such a small scale, you won't see any benefit from using MR.
J-D
On Mon, Apr 20, 2009 at 12:26 AM, Mark Kerzner markkerz...@gmail.com wrote:
Hi,
I ran a Hadoop MapReduce task in the local mode, reading and
Hi all,
I am not able to subscribe to the Pig mailing list (both dev and user). Here is
the error message that I get when I try to confirm the
subscription.
Your message did not reach some or all of the intended recipients.
Subject:
Jean-Daniel,
I realize that, and my question was: is this the normal setup/teardown time,
about 2 minutes? If it is, then fine. I would expect that on tasks taking
10-15 minutes, 2 minutes would be totally justified, and I think that this
is the guideline: each task should take minutes.
Thank you,
Hi all,
Sorry for posting in the wrong group. When I clicked on Nabble on the Pig
mailing list page (http://hadoop.apache.org/pig/mailing_lists.html), it
redirected me to this mailing list. Unaware of this, I posted to the
redirected group.
Thanks
Pallavi
Mark,
Oh sorry, yes, you should expect that kind of delay. A tip to optimize
that on big jobs with lots of tasks is to use
JobConf.setNumTasksToExecutePerJvm(int numTasks), which sets how many
times a JVM can be reused (instead of spawning new ones).
Happy Hadooping!
J-D
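[A minimal sketch (the class name is a placeholder; -1 is the documented "no limit" value):

    import org.apache.hadoop.mapred.JobConf;

    public class JvmReuseExample {
      public static void main(String[] args) {
        JobConf conf = new JobConf(JvmReuseExample.class);
        // -1 lets a task JVM be reused an unlimited number of times;
        // a positive n allows n tasks per JVM (the default is 1).
        conf.setNumTasksToExecutePerJvm(-1);
      }
    }
]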
On Mon, Apr 20, 2009
I am new to Hadoop and am now beginning to look into the code. I want to know the
difference between RawLocalFileSystem and LocalFileSystem. I know the latter
has the capability to do checksums. Is that all?
Thanks.
Thanks Aaron, that really helps. I probably do need to control the
number of splits. My input 'data' consists of Java objects and their
size (in bytes) doesn't necessarily reflect the amount of time needed
for each map operation. I need to ensure that I have enough map tasks
so that all
Same here, sadly there isn't much call for Lucene user groups in Maine.
It would be nice though ^^
Matt
Amin Mohammed-Coleman wrote:
I would love to come but I'm afraid I'm stuck in rainy old England :(
Amin
On 18 Apr 2009, at 01:08, Bradford Stephens
bradfordsteph...@gmail.com wrote:
On Apr 20, 2009, at 7:49 PM, Xie, Tao wrote:
I am new to Hadoop and am now beginning to look into the code. I want to know the
difference between RawLocalFileSystem and LocalFileSystem. I know the latter
has the capability to do checksums. Is that all?
Pretty much.
Arun
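[A small sketch of the relationship, for anyone reading along (class name hypothetical): LocalFileSystem wraps RawLocalFileSystem with checksumming, reading and writing .crc side files.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.LocalFileSystem;
    import org.apache.hadoop.fs.RawLocalFileSystem;

    public class LocalFsDemo {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Checksummed: reads/writes go through .crc side files.
        LocalFileSystem checked = FileSystem.getLocal(conf);
        // Raw: the same local files, but no checksum verification.
        RawLocalFileSystem raw = new RawLocalFileSystem();
        raw.setConf(conf);
      }
    }
]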
Yes, I considered Shevek's tactic as well, but as Jason pointed out,
emitting the entire data set just to find the maximum value would be
wasteful. You do not want to sort the dataset; you just want to break
it into parts, find the max value of each part, then bring those into
one part and perform
On Apr 20, 2009, at 9:56 AM, Mark Kerzner wrote:
Hi,
I ran a Hadoop MapReduce task in the local mode, reading and writing
from HDFS, and it took 2.5 minutes. Essentially the same operations on
the local file system without MapReduce took 1/2 minute. Is this to be
expected?
Hmm...
Arun, thank you very much for the answer. I will turn off the combiner. I am
debugging intermediate MR steps now, so I am mostly interested in
performance for this; real tuning will come later, in a cluster. I am
running 18.3, but general pointers should be good enough at this stage.
I am
Lest you think silence equals acceptance...
This is not appropriate use of these lists.
-Grant
On Apr 19, 2009, at 11:58 PM, wu fuheng wrote:
welcome to download
http://www.ultraie.com/admin/flist.php
I've written a MR job with multiple outputs. The normal output goes
to files named part-X and my secondary output records go to files
I've chosen to name ExceptionDocuments (and therefore are named
ExceptionDocuments-m-X).
I'd like to pull merged copies of these files to my local file system.
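[One approach: `hadoop fs -getmerge <hdfs-dir> <local-file>` does this from the shell, and FileUtil.copyMerge is the programmatic equivalent. A minimal sketch, with hypothetical paths; note that it concatenates everything under the source directory, so the ExceptionDocuments files may need their own directory:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.FileUtil;
    import org.apache.hadoop.fs.Path;

    public class MergeOutputs {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem hdfs = FileSystem.get(conf);
        FileSystem local = FileSystem.getLocal(conf);
        // Concatenates every file under the source directory into
        // one local file, in listing order.
        FileUtil.copyMerge(hdfs, new Path("/user/me/job-out/exceptions"),
            local, new Path("/tmp/ExceptionDocuments.txt"),
            false /* keep the source files */, conf, null);
      }
    }
]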
Hey Jason,
Wouldn't this be avoided if you used a combiner to also perform the
max() operation? A minimal amount of data would be written over the
network.
I can't remember if the map output gets written to disk first and then
the combine applied, or if the combine is applied and then the data is
written to disk.
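[A sketch of the combiner variant being discussed: one reduce implementation can serve as both combiner and reducer, since max() is associative and commutative (class name and type choices are illustrative, matching the mapper sketch above):

    import java.io.IOException;
    import java.util.Iterator;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;

    public class MaxReducer extends MapReduceBase
        implements Reducer<NullWritable, LongWritable, NullWritable, LongWritable> {

      public void reduce(NullWritable key, Iterator<LongWritable> values,
                         OutputCollector<NullWritable, LongWritable> output,
                         Reporter reporter) throws IOException {
        long max = Long.MIN_VALUE;
        while (values.hasNext()) {
          max = Math.max(max, values.next().get());
        }
        output.collect(key, new LongWritable(max));
      }
    }

Wire it up with conf.setCombinerClass(MaxReducer.class) and conf.setReducerClass(MaxReducer.class); with a single reduce task the job output is the global max. As I recall, the combiner runs when the map-side buffer spills, so it is the combined output that hits disk and the network.]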
Thanks for the responses, everyone. Where shall we host? My company
can offer space in our building in Factoria, but it's not exactly a
'cool' or 'fun' place. I can also reserve a room at a local library. I
can bring some beer and light refreshments.
On Mon, Apr 20, 2009 at 7:22 AM, Matthew Hall
Hi,
I have a single-node hadoop cluster. The hadoop version -
[patn...@ac4-dev-ims-211]~/dev/hadoop/hadoop-0.19.1% hadoop version
Hadoop 0.19.1
Subversion https://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.19 -r
745977
Compiled by ndaley on Fri Feb 20 00:16:34 UTC 2009
Following
I might be in Seattle in the near future (currently in Los Angeles). When
were you thinking of having this?
On Mon, Apr 20, 2009 at 4:28 PM, Bradford Stephens
bradfordsteph...@gmail.com wrote:
Thanks for the responses, everyone. Where shall we host? My company
can offer space in our building
If you guys are interested in space over in Redmond, I can see if MSFT can
host. Let me know...
Lauren
On Mon, Apr 20, 2009 at 4:28 PM, Bradford Stephens
bradfordsteph...@gmail.com wrote:
Thanks for the responses, everyone. Where shall we host? My company
can offer space in our building in
Hi,
in an MR step, I need to extract text from various files (using Tika). I
have put text extraction into reduce(), because I am writing the extracted
text to the output on HDFS. But now it occurs to me that I might as well
have put it into map() and used the default reduce(), which will write every
record through unchanged.
Unless you need the hashing/sorting provided by the reduce phase, I'd
recommend placing your logic in your mapper and, when setting up your
job, calling JobConf#setNumReduceTasks(0) so that the reduce phase
won't be executed. In that case, any records emitted by your mapper
will be written directly to the job output path on HDFS.
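[A minimal sketch of such a map-only job setup (class name is a placeholder):

    import org.apache.hadoop.mapred.JobConf;

    public class MapOnlyJob {
      public static void main(String[] args) {
        JobConf conf = new JobConf(MapOnlyJob.class);
        // With zero reduces, each mapper's output is written straight
        // to the job output path on HDFS, one part file per map task.
        conf.setNumReduceTasks(0);
      }
    }
]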
Our application uses Hadoop to parallelize jobs across an EC2 cluster. HDFS
is used to store output files. How would you ideally copy output files from
HDFS to remote databases?
Thanks,
Parul V. Kudtarkar
What exactly do you need a system outside the Hadoop cluster to access the
namenode or datanode for? If it is simply to write data from the local system
to HDFS and then to copy data back from HDFS to the local system, just use the
Hadoop file system's shell commands.
Hope this helps!
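[A minimal sketch of the programmatic equivalents of `hadoop fs -put` and `hadoop fs -get`, for completeness (all paths hypothetical):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsCopy {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // Equivalent of: hadoop fs -put /tmp/in.txt /user/me/in.txt
        fs.copyFromLocalFile(new Path("/tmp/in.txt"),
                             new Path("/user/me/in.txt"));
        // Equivalent of: hadoop fs -get /user/me/out/part-00000 /tmp/
        fs.copyToLocalFile(new Path("/user/me/out/part-00000"),
                           new Path("/tmp/part-00000"));
      }
    }
]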
deepya wrote: