Yes, my key is the IP, and the value is an object (which inherits from Hadoop's Record class and will be converted to visualized data), e.g.:

key            field1,field2,field3 (these are properties of the object)
12.121.23.121  121,11,/img/dd.jpg
32.121.23.222  221,11,/img/xx.jpg

1. I want to sort by field1.
Hi,
I'm doing some measurements of Hadoop's execution time for my thesis. I noticed step-like jumps in the jobs' execution times when continuously raising the mappers' execution time.
Here is a plot of the execution times with 1, 30 and 60 mappers executing in parallel:
The original posting said the app simply matches every line of input data against every line of persistent data. Hence the key should be replaced by a String from the 10 GB store, or a hash of it, so that we can match it against the hash or String from the persistent store.
Can anyone give me some tips?
Your field1 data can be split over multiple reducers. Is it possible to emit
field1 as the key from the reducer (in case you do not need the ip anymore)?
From: leibnitz se3g2...@gmail.com
To: hadoop-u...@lucene.apache.org
Sent: Mon, 11 April, 2011 12:02:46 PM
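A map-side sketch of that suggestion (field layout assumed from the example above; the class name is hypothetical): emit field1 as the intermediate key so the framework's shuffle sorts on it.

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Parses lines like "12.121.23.121<TAB>121,11,/img/dd.jpg" and emits
    // field1 as the key; the ip is kept in the value.
    class Field1Mapper extends Mapper<LongWritable, Text, IntWritable, Text> {
        protected void map(LongWritable offset, Text line, Context ctx)
                throws IOException, InterruptedException {
            String[] kv = line.toString().split("\t");
            String field1 = kv[1].split(",")[0];
            ctx.write(new IntWritable(Integer.parseInt(field1)),
                      new Text(kv[0] + "," + kv[1]));
        }
    }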
Anyone? Anyone at all?
I figured out the issue with the jobtracker, but I still have the
errors:
* Error register getProtocolVersion
* File (..) could only be replicated to 0 nodes, instead of 1
as explained in my first mail.
The 2nd error can appear without _any_ errors in _any_ of the
Hi,
Some MR tasks on small files have been running on our Hadoop cluster these days, with not such a high workload. But when I checked it tonight, the cluster refused to respond, so I restarted HDFS and MapReduce.
But unexpected exceptions were thrown on startup, and even the "hadoop fs -ls" command could
I found it similar to HADOOP-3027 (https://issues.apache.org/jira/browse/HADOOP-3027), but its error msg is:
org.apache.hadoop.mapred.JobTracker: problem cleaning system
directory: /tmp/hadoop/mapred/system
while mine is:
org.apache.hadoop.mapred.JobTracker: problem cleaning system directory:
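(A hedged diagnostic sketch, not from the thread: the JobTracker cannot clean mapred.system.dir while HDFS itself is unhealthy, so it can help to check the namenode state before restarting MapReduce.)

    # Is the namenode stuck in safe mode?
    hadoop dfsadmin -safemode get
    # Any missing or corrupt blocks?
    hadoop fsck /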
Leibnitz,
I think you are looking for a secondary sort in this case, where the data arrives at the reducer in a defined sort order, as opposed to merely grouped by key. Is that the case?
For a look at secondary sort, I've got a few blog articles:
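A minimal sketch of the pattern, assuming a composite key of (ip, field1); class names are hypothetical, and the wiring at the bottom goes in the job driver:

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.WritableComparable;
    import org.apache.hadoop.io.WritableComparator;
    import org.apache.hadoop.mapreduce.Partitioner;

    // Composite key whose natural order is (ip, then field1).
    class IpField1Key implements WritableComparable<IpField1Key> {
        Text ip = new Text();
        IntWritable field1 = new IntWritable();

        public void write(DataOutput out) throws IOException {
            ip.write(out);
            field1.write(out);
        }

        public void readFields(DataInput in) throws IOException {
            ip.readFields(in);
            field1.readFields(in);
        }

        public int compareTo(IpField1Key o) {
            int c = ip.compareTo(o.ip);
            return c != 0 ? c : field1.compareTo(o.field1);
        }
    }

    // Partition on ip alone, so every record for one ip reaches the
    // same reducer regardless of field1.
    class IpPartitioner extends Partitioner<IpField1Key, Text> {
        public int getPartition(IpField1Key key, Text value, int numPartitions) {
            return (key.ip.hashCode() & Integer.MAX_VALUE) % numPartitions;
        }
    }

    // Group on ip alone, so one reduce() call sees the whole ip group
    // while its values arrive already sorted by field1.
    class IpGroupingComparator extends WritableComparator {
        protected IpGroupingComparator() {
            super(IpField1Key.class, true);
        }

        public int compare(WritableComparable a, WritableComparable b) {
            return ((IpField1Key) a).ip.compareTo(((IpField1Key) b).ip);
        }
    }

    // In the driver:
    // job.setPartitionerClass(IpPartitioner.class);
    // job.setGroupingComparatorClass(IpGroupingComparator.class);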
Thamizh,
For a much older project I wrote a demo tool that computes the Hadoop-style checksum locally:
https://github.com/jpatanooga/IvoryMonkey
The checksum generator is a single-threaded replica of Hadoop's internal distributed hash-checksum mechanism. What it's actually doing is saving the CRC32
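Roughly, the idea (an illustrative sketch, not the IvoryMonkey code): HDFS keeps one CRC32 per 512-byte chunk (the io.bytes.per.checksum default), and the file checksum Hadoop reports (MD5MD5CRC32FileChecksum) is built by MD5-ing those per-chunk CRCs.

    import java.io.FileInputStream;
    import java.util.zip.CRC32;

    // Print the per-chunk CRC32s of a local file, mirroring the values
    // HDFS stores for each 512-byte chunk.
    public class LocalChunkCrc {
        public static void main(String[] args) throws Exception {
            byte[] chunk = new byte[512];
            try (FileInputStream in = new FileInputStream(args[0])) {
                int n;
                while ((n = in.read(chunk)) > 0) {
                    CRC32 crc = new CRC32();
                    crc.update(chunk, 0, n);
                    System.out.printf("%08x%n", crc.getValue());
                }
            }
        }
    }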
Forum moderator: pls mark emails from this user as spam.
2011/4/10 Tiru Murugan veera.tirumurugan...@gmail.com
Hi
I am creating a birthday calendar of all my friends and family. Can you
please click on the link below to enter your birthday for me?
Hi Mark,
I also met your problem, and I finally found my way.
First, your basic idea is right: we need to move these jars into HDFS, because files in HDFS are shared by all the nodes automatically.
So there seem to be two solutions here.
Solution a) After you export your project as a jar, you add a
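For illustration, the HDFS-jar route usually looks like this in the job driver (the path is hypothetical):

    // Uses org.apache.hadoop.filecache.DistributedCache and
    // org.apache.hadoop.fs.Path. Adds a jar already stored in HDFS to
    // the classpath of every task.
    DistributedCache.addFileToClassPath(
        new Path("/libs/dependency.jar"), job.getConfiguration());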
That is how I interpreted it, but if by "simple" some other matching function than the most obvious one is meant, then it is still possible to extend the Text class and override the hashCode and equals methods to accommodate this new sort of equality.
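A hedged sketch of that idea (the class name and the chosen notion of equality, case-insensitivity, are made up for illustration):

    import org.apache.hadoop.io.Text;

    // Text variant whose equality ignores case, so keys differing only
    // in case hash to the same partition and compare as equal.
    public class CaseInsensitiveText extends Text {
        public CaseInsensitiveText() {}
        public CaseInsensitiveText(String s) { super(s); }

        @Override
        public int hashCode() {
            return toString().toLowerCase().hashCode();
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof CaseInsensitiveText
                && toString().equalsIgnoreCase(o.toString());
        }
    }

(The hashCode override is what the default HashPartitioner consults; the shuffle's sort order is governed by the key comparator, so a matching comparator would be needed as well.)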
On Apr 11, 2011, at 1:41 AM, sumit ghosh
It seems like -libjars is for the CLASSPATH only. To affect changes to the LIBPATH on each node, -archives needs to be used, along with a scheme to have each process set its own LIBPATH once the -archives are untarred.
I think the documentation for -libjars could be amended to
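For instance, a hypothetical invocation (jar and archive names are illustrative; the driver must go through Tool/GenericOptionsParser for these options to take effect):

    # -libjars: adds the listed jars to the tasks' CLASSPATH.
    # -archives: ships an archive that is unpacked on every node; the
    #   name after '#' is the symlink tasks can use, e.g. to point
    #   LIBPATH/LD_LIBRARY_PATH at the unpacked contents.
    hadoop jar myjob.jar com.example.MyDriver \
        -libjars deps/util.jar,deps/parser.jar \
        -archives nativelibs.tgz#nativelibs \
        input/ output/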
I understand that part of the rules of MapReduce is that there's no shared
global information; nevertheless I have a problem that requires shared
global information and I'm trying to get a sense of what mechanisms are
available to address it.
I have a bunch of *sets* built on a vocabulary of
Hello,
I have a Hadoop cluster with HBase 0.89 and Hive 0.70 on Ubuntu Lucid 64-bit servers.
I want to use Ganglia. I did not install it via apt-get install, because that gives me Ganglia 3.1 but I need Ganglia 3.0.x (http://wiki.apache.org/hadoop/GangliaMetrics).
This didn't help me out
Christian,
The TaskTrackers send heartbeat messages to the JobTracker. The
default interval for these messages is 3 seconds.
This is one reason why you see the 3 second steps.
Abhishek
On Mon, Apr 11, 2011 at 3:19 AM, Christian Kumpe christ...@kumpe.de wrote:
Hi,
I'm doing some measurement
Depending on the function that you want to use, it sounds like you want to
use a self join to compute transposed cooccurrence.
That is, it sounds like you want to find all the sets that share elements
with X. If you have a binary matrix A that represents your set membership
with one row per set
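To make that concrete (notation mine, not from the original message): if A has one row per set and one 0/1 column per vocabulary element, the self-join product counts shared elements,

    (A A^T)_{ij} = sum_k A_{ik} A_{jk} = |S_i ∩ S_j|

so row i of A A^T lists, for every set j, how many elements it shares with S_i (take S_i = X to answer the original question).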
Hi Abhishek,
thanks for your answer. So this is the reason for the 1 s and 3 s raster across the whole plot.
Do you (or someone else) have any idea what might be causing the few outliers downwards?
The outliers upwards can be caused by latencies in the network or in some of the nodes. No
We have some very large files that we access via memory mapping in
Java. Someone's asked us about how to make this conveniently
deployable in Hadoop. If we tell them to put the files into hdfs, can
we obtain a File for the underlying file on any given node?
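For reference, the plain-Java memory mapping being discussed looks roughly like this (the path is hypothetical; each MappedByteBuffer is capped at 2 GB, so a very large file needs multiple mappings):

    import java.io.RandomAccessFile;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;

    // Map (a region of) a local file into memory for random access
    // without reading it all into the heap.
    public class MmapDemo {
        public static void main(String[] args) throws Exception {
            RandomAccessFile raf = new RandomAccessFile("/data/large.bin", "r");
            FileChannel ch = raf.getChannel();
            long size = Math.min(ch.size(), Integer.MAX_VALUE);
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, size);
            System.out.println("first byte: " + buf.get(0));
            raf.close();
        }
    }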
Yes, you can, but it will require customization of HDFS. Take a look at HDFS-347, specifically the HDFS-347-branch-20-append.txt patch. I have been altering it for use with HBASE-3529. Note that the patch noted is for the -append branch, which is mainly for HBase.
On Mon, Apr 11, 2011 at 3:57
Hi All,
I was trying to run a program using HOD on a cluster. When I allocate 5 nodes it runs fine, but when I allocate 6 nodes, every time I try to run a program I get this error:
11/04/11 19:45:50 WARN conf.Configuration: DEPRECATED: hadoop-site.xml found in the classpath.
On Mon, Apr 11, 2011 at 7:05 PM, Jason Rutherglen
jason.rutherg...@gmail.com wrote:
Yes you can however it will require customization of HDFS. Take a
look at HDFS-347 specifically the HDFS-347-branch-20-append.txt patch.
I have been altering it for use with HBASE-3529. Note that the patch
Also, it only provides access to a local chunk of a file which isn't very
useful.
On Mon, Apr 11, 2011 at 5:32 PM, Edward Capriolo edlinuxg...@gmail.com wrote:
On Mon, Apr 11, 2011 at 7:05 PM, Jason Rutherglen
jason.rutherg...@gmail.com wrote:
Yes you can however it will require customization
What do you mean by local chunk? I think it's providing access to the
underlying file block?
On Mon, Apr 11, 2011 at 6:30 PM, Ted Dunning tdunn...@maprtech.com wrote:
Also, it only provides access to a local chunk of a file which isn't very
useful.
On Mon, Apr 11, 2011 at 5:32 PM, Edward
You may specify the --with-gmetad parameter if you compile Ganglia yourself.
For example:
./configure --sysconfdir=/etc/ganglia --with-gmetad
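(That flag builds the gmetad aggregator in addition to gmond; a typical source build then finishes with:

    make
    sudo make install

Only the host that aggregates the metrics needs gmetad.)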
2011/4/11 malte.eh...@gmx.de
Hello,
I have a hadoop cluster with hbase 0.89 and hive 0.70 on ubuntu lucid 64
bit servers.
I want to use ganglia. I did
Thanks all.
To Josh: I think you are right. I have previously tried a group key of field1+ip at reduce, but it failed (not sorted).
I will check your point :)
Yes, but only one such block. That is what I meant by chunk.
That is fine if you want that chunk, but if you want to mmap the entire file, it isn't really useful.
On Mon, Apr 11, 2011 at 6:48 PM, Jason Rutherglen
jason.rutherg...@gmail.com wrote:
What do you mean by local chunk? I think it's
Hello,
I am trying to configure Hadoop in fully distributed mode on three virtual Fedora machines. During configuration I do not get any errors. Even when I execute the start-dfs.sh script, there aren't any errors.
But in practice the namenode isn't able to connect to the datanodes. These are
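(A hedged pointer, not from the thread: a common cause is that the datanodes cannot resolve or reach the namenode address configured in core-site.xml; the hostname and port below are illustrative.)

    <property>
      <name>fs.default.name</name>
      <value>hdfs://namenode-host:9000</value>
    </property>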