Try this rather small C++ program...it will more than likely be a LOT faster
than anything you could do in Hadoop. Hadoop is not the hammer for every nail.
Too many people think that any "cluster" solution will automagically scale
their problem...tain't true.
I'd appreciate hearing your results.
Another thing...where do you get 2M from?
10,000 x 100 = 1,000,000
So you'd have an absolute max of 1M to do for each 100...2M total for your
example. Gender pref cuts that in half or so...plus other prefs cut it
further...
And 2M calculations is relatively nothing compared to 10,000^2 (100M).
You need to stop looking at this as an all-or-nothing...and look at it more
like real-time.
You only need to do an absolute max of 1*10,000 at a time. And...you actually
only need to do considerably less than that with age preference and other
factors for the users...and doing the computation
I had no idea the kimono comment would be so applicable to your problem...
Everything makes sense except the Bayesian computation.
Your "score" can be computed on subsetsin particular you only need to do it
on "new" and "changed" records. Most of which should be pretty static (age
needs
All you're doing is delaying the inevitable by going to Hadoop. There's no
magic to Hadoop. It doesn't run as fast as individual processes. There's just
the ability to split jobs across a cluster which works for some problems. You
won't even get a linear improvement in speed.
At least I as
What kind of compare do you have to do?
You should be able to compute a checksum or such for each row when you insert
them and only have to look at the subset that matches if you're doing some sort
of substring or such.
Michael D. Black
Senior Scientist
Advanced Analytics Directorate
Northrop Grumman Information Systems
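(A sketch of the checksum idea above, for exact-row compares: group rows by a cheap checksum on insert so only rows in a matching bucket need the full compare. CRC32 is just one choice; a substring-style compare would need a different scheme.)

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.CRC32;

// Bucket rows by checksum at insert time; an exact compare is only needed
// against rows whose checksum matches.
class RowIndex {
    private final Map<Long, List<String>> byChecksum = new HashMap<>();

    private static long checksum(String row) {
        CRC32 crc = new CRC32();
        crc.update(row.getBytes());
        return crc.getValue();
    }

    // Call this when the row is inserted.
    void add(String row) {
        byChecksum.computeIfAbsent(checksum(row), k -> new ArrayList<>()).add(row);
    }

    // Only the matching bucket gets the expensive exact compare.
    boolean contains(String row) {
        List<String> bucket = byChecksum.get(checksum(row));
        return bucket != null && bucket.contains(row);
    }
}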
http://sourceforge.net/projects/gkernel/files/rng-tools
rngd is in there.
Michael D. Black
Senior Scientist
Advanced Analytics Directorate
Northrop Grumman Information Systems
From: Jon Lederman [mailto:jon2...@mac.com]
Sent: Tue 1/4/2011 2:00 PM
To: commo
Did you set your config and format the namenode as per these instructions?
http://hadoop.apache.org/common/docs/current/single_node_setup.html
Michael D. Black
Senior Scientist
Advanced Analytics Directorate
Northrop Grumman Information Systems
Try checking your dfs status
hadoop dfsadmin -safemode get
Probably says "ON"
hadoop dfsadmin -safemode leave
Somebody else can probably say how to make this happen every reboot
Michael D. Black
Senior Scientist
Advanced Analytics Directorate
Northrop Grumman Information Systems
I'm using hadoop-0.20.2 and I see this for my map/reduce class
com/ngc/asoc/recommend/Predict$Counter.class
com/ngc/asoc/recommend/Predict$R.class
com/ngc/asoc/recommend/Predict$M.class
com/ngc/asoc/recommend/Predict.class
I'm a java idiot so I don't know why they appear but perhaps you have sim
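(For reference: javac emits one Outer$Inner.class file per nested class or enum, which is exactly what those Predict$... entries are. A made-up illustration, not the poster's actual code:)

// Compiling this one file produces Predict.class, Predict$Counter.class
// and Predict$M.class -- one class file per nested type, which is why the
// $ names show up in the jar listing.
public class Predict {
    enum Counter { ROWS_SEEN, ROWS_SKIPPED }   // -> Predict$Counter.class

    static class M {                           // -> Predict$M.class
        void map() { /* ... */ }
    }

    public static void main(String[] args) {
        System.out.println(Counter.ROWS_SEEN);
    }
}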
You mean the file is "not trusted". I was using Outlook and my company
automatically puts a digital certificate on all emails. I'm using webmail
right now which doesn't. That certificate is installed by default on all
company computers so it looks trusted to us without having to explicitly trust it.
Using hadoop-0.20
I'm doing custom input splits from a Lucene index.
I want to split the document IDs across N mappers (I'm testing the
scalability of the problem across 4 nodes and 8 cores).
So the key is the document# and they are not sequential.
At this point I'm using splits.add to add each split.
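(A sketch of just the ID-range arithmetic for that kind of split, dividing a list of possibly non-sequential document IDs into one chunk per mapper; a real InputFormat would wrap each chunk in an InputSplit and register it via splits.add(...). The names below are assumptions.)

import java.util.ArrayList;
import java.util.List;

// Divide the document IDs into numMappers chunks of roughly equal size.
class DocIdSplitter {
    static List<List<Integer>> split(List<Integer> docIds, int numMappers) {
        List<List<Integer>> splits = new ArrayList<>();
        int chunk = (docIds.size() + numMappers - 1) / numMappers; // ceiling division
        for (int start = 0; start < docIds.size(); start += chunk) {
            splits.add(new ArrayList<>(
                docIds.subList(start, Math.min(start + chunk, docIds.size()))));
        }
        return splits;
    }
}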
#1 Check CPU fan is working. A hot CPU can give flaky errors...especially
during high CPU load.
#2 Do memtest on the machine. You might have a bad memory stick that is
getting hit (though I
would tend to think it would be a bit more random).
I've used memtest86 before to find such problems.
Using hadoop-0.20.2+737 on Redhat's distribution.
I'm trying to use a dictionary.csv file from a Lucene index inside a map
function plus another comma delimited file.
It's just a simple loop of reading a line, splitting it on commas, and adding
the dictionary entry to a hash map.
It's about an
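(A minimal sketch of the loop described above, assuming the first comma-separated field is the key and the second is the value, with no quoted or embedded commas:)

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Read the file line by line, split each line on commas, and add each
// dictionary entry to a HashMap (first field as key, second as value).
class DictionaryLoader {
    static Map<String, String> load(String path) throws IOException {
        Map<String, String> dict = new HashMap<>();
        try (BufferedReader in = new BufferedReader(new FileReader(path))) {
            String line;
            while ((line = in.readLine()) != null) {
                String[] fields = line.split(",", 2);
                if (fields.length < 2) continue;   // skip malformed lines
                dict.put(fields[0], fields[1]);
            }
        }
        return dict;
    }
}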