Exactly - a job is already designed to be properly parallel w.r.t. its
input, and this would just add the extra overhead of job setup and
scheduling. If your per-record processing requires threaded work,
consider using the MultithreadedMapper/Reducer classes instead.
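Outside of Hadoop's own classes, the same idea (threads inside one task rather than extra jobs) can be sketched with a plain ExecutorService; the record list and the toUpperCase "work" below are made-up stand-ins for real per-record processing:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch: run the per-record work on a thread pool inside one task,
// the way MultithreadedMapper does, instead of spawning extra jobs.
public class ThreadedRecords {
    public static List<String> process(List<String> records, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<String>> futures = new ArrayList<>();
        for (String rec : records) {
            // Stand-in for real per-record work (e.g. an IO-bound lookup).
            futures.add(pool.submit(() -> rec.toUpperCase()));
        }
        List<String> out = new ArrayList<>();
        for (Future<String> f : futures) {
            out.add(f.get()); // preserves input order
        }
        pool.shutdown();
        return out;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(process(List.of("a", "b"), 2)); // prints "[A, B]"
    }
}
```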
On Wed, Dec 12, 2012 at
Hello list,
I don't know if this question makes any sense, but I would like to
ask: does it make sense to store 500TB (or more) of data on a single DN? If
yes, then what should the specs of the other parameters be, viz. NN/DN RAM,
N/W etc.? If no, what could be the alternative?
Many thanks.
I am out of the office until 12/17/2012.
For any issues please contact Dispatcher:dispatcherdb...@us.ibm.com
Thanks.
Prabhat Pandey
Note: This is an automated response to your message Hadoop 101 sent on
12/11/2012 17:49:45.
Yes it does make sense, depending on how much compute each byte of data
will require on average. With ordinary Hadoop, it is reasonable to have
half a dozen 2TB drives. With specialized versions of Hadoop considerably
more can be supported.
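A rough back-of-envelope for that, taking 3x replication and the half-dozen 2 TB drives per node mentioned above as assumptions (adjust both for your hardware):

```java
public class ClusterSizing {
    // Rough sizing sketch: nodes needed = ceil(data * replication / per-node capacity).
    // The 3x replication and 6 x 2 TB drives per node are assumptions, not requirements.
    static int nodesNeeded(double rawTb, int replication, double perNodeTb) {
        return (int) Math.ceil(rawTb * replication / perNodeTb);
    }

    public static void main(String[] args) {
        // 500 TB at 3x replication on nodes with half a dozen 2 TB drives
        System.out.println(nodesNeeded(500, 3, 6 * 2) + " nodes"); // prints "125 nodes"
    }
}
```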
From what you say, it sounds like you are suggesting
Thank you so much for the valuable response Ted.
No, there would be dedicated storage for NN as well.
Any tips on RAM N/W?
*Computations are not really frequent.
Thanks again.
Regards,
Mohammad Tariq
On Wed, Dec 12, 2012 at 9:14 PM, Ted Dunning tdunn...@maprtech.com wrote:
Yeah, I found TextInputFormat and KeyValueTextInputFormat, and I know how to
parse text--I'm just too lazy. I was hoping there was a Text equivalent of a
SequenceFile hidden somewhere. As I said, there is no mapper; this is
running outside of Hadoop M/R. So I at least need a line
500 TB?
How many nodes in the cluster? Is this attached storage or is it in an array?
I mean if you have 4 nodes for a total of 2PB, what happens when you lose 1
node?
On Dec 12, 2012, at 9:02 AM, Mohammad Tariq donta...@gmail.com wrote:
Hello list,
I don't know if this
but I have run across some situations where I could benefit from
multi-threading: if your Hadoop mapper is prone to random-access IO (such
as looking up a TFile, or HBase, which ultimately makes a network call and
then looks into a file segment), having multiple threads could keep the
CPU utilized.
Hello there,
I have an Oozie workflow that is failing on a Hive action with the
following error:
FAILED: SemanticException [Error 10001]: Table not found
attempted_calls_import_raw_logs_named_route_name
If I run the query file from the command line (as described in the map task
log), it works
Thank you for the suggestion.
From the log output javax.jdo.option.ConnectionDriverName appears to be set
to com.mysql.jdbc.Driver, with the correct IP in
javax.jdo.option.ConnectionURL. I have copied hive-site.xml from the local
machine into Hadoop and instructed Oozie to use that, which it
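For what it's worth, one common way to hand a Hive action its metastore settings is a `<job-xml>` element pointing at the hive-site.xml uploaded to HDFS; the file names in this fragment are illustrative, not taken from your workflow:

```xml
<hive xmlns="uri:oozie:hive-action:0.2">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <!-- hive-site.xml copied into HDFS alongside the workflow -->
    <job-xml>hive-site.xml</job-xml>
    <script>query.q</script>
</hive>
```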
Nothing that I'm aware of for text files, I'd just use standard unix utils
to process it outside of Hadoop.
As to getting a reader from any of the Input Formats, here's the typical
example you'd follow to get the reader for a sequence file, you could
extrapolate the example to access whichever
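The sort of reader loop being referred to looks roughly like this (old-style Hadoop 1.x SequenceFile API; the path argument is a placeholder, and this needs the Hadoop jars on the classpath):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.util.ReflectionUtils;

// Sketch: read a SequenceFile directly, without a mapper or a running job.
public class SeqFileDump {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path path = new Path(args[0]); // placeholder: path to your sequence file
        SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
        try {
            Writable key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
            Writable value = (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);
            while (reader.next(key, value)) {
                System.out.println(key + "\t" + value);
            }
        } finally {
            reader.close();
        }
    }
}
```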
I'm having exactly this problem, and it's causing my job to fail when I try
to process a larger amount of data (I'm attempting to process 30GB of
compressed CSVs and the entire job fails every time).
This issue is open for it:
https://issues.apache.org/jira/browse/MAPREDUCE-5
Anyone have any
Hello Chris,
Thank you so much for the valuable insights. I was actually using the
same principle. I made a blunder and did the maths for the entire (9*3)PB.
Seems I am higher than you, that too without drinking ;)
Many thanks.
Regards,
Mohammad Tariq
On Thu, Dec 13, 2012 at 10:38
Hi all,
Could you help me understand the difference between branch-1 and branch-1-win?
Regards,
Wenwu,Peng
Hi all,
I downloaded the Hadoop-1.1.1 tarball from one of the mirrors and
configured it in pseudo-distributed mode.
Namenode starts fine but datanode fails to start because of version mismatch.
The value of hadoop.relaxed.worker.version.check property (related to
The versions do not match, as the log indicates:
namenode: 1.1.1
datanode: 1.1.2-SNAPSHOT
hadoop.relaxed.worker.version.check only works when the versions match (it
relaxes just the revision check).
You may try hadoop.skip.worker.version.check instead.
see https://issues.apache.org/jira/browse/HADOOP-8968
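If you do go that route, the property would be set on the datanode side, e.g. (property name taken from the JIRA above; treat this as a workaround, not a fix for the underlying mismatch):

```xml
<!-- hdfs-site.xml on the datanode: skip the exact-version check -->
<property>
    <name>hadoop.skip.worker.version.check</name>
    <value>true</value>
</property>
```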
If your production target is a bit far away, I'd encourage setting up
and using the 2.x based releases for their feature set, which may aid you
in your design. We'll be releasing 2.0.3 soon.
However, if you want the older, stable code, go with the 1.x based releases.
On Wed, Dec 12, 2012 at 6:58 PM,