Hi Isabel,
First of all, thanks for your reply.
On 03/28/2012 09:10 AM, Isabel Drost wrote:
On 27.03.2012 Dimitri Goldin wrote:
Having tried Mallet's naive Bayes implementation, we achieved ~95%
accuracy without having to balance the training data. Does anybody know
which implementation detail
Nope, it's the sum of the absolute values of the differences in ratings, for
your purposes.
On Thu, Mar 29, 2012 at 7:29 PM, ziad kamel ziad.kame...@gmail.com wrote:
City block distance, or Manhattan distance.
Wikipedia defines it for points at
http://en.wikipedia.org/wiki/Taxicab_geometry
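The definition Sean and ziad are pointing at can be made concrete in a few lines. This is a plain-Java sketch of city-block distance over two users' rating vectors (no Mahout dependency; the class and method names are mine, not Mahout's API):

```java
// City-block / Manhattan distance between two rating vectors:
// the sum of the absolute rating differences, as described above.
public class CityBlock {
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs(a[i] - b[i]);
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] user1 = {4.0, 3.0, 5.0};
        double[] user2 = {5.0, 3.0, 2.0};
        System.out.println(distance(user1, user2)); // |4-5| + |3-3| + |5-2| = 4.0
    }
}
```

This is exactly "the absolute difference between ratings", just summed over every co-rated item, which is why Sean says there is no separate representation to look for.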
So how
Dear Owen,
I tried to look for any other representations like the one you used but
didn't find any. Can you direct me to one if you are aware of it? Why did
you use the distance below and not just the absolute difference between
ratings?
Many thanks !
On Thu, Mar 29, 2012 at 1:30 PM, Sean Owen
What makes me wonder is that
CityBlockSimilarity
gave much higher precision compared with
EuclideanDistanceSimilarity
PearsonCorrelationSimilarity
and others. Is this something usual? Is there a reason behind it?
On Thu, Mar 29, 2012 at 1:49 PM, ziad kamel ziad.kame...@gmail.com wrote:
I have a dataset that is not terribly large (~31 MB on disk in plaintext,
~145,000 records with 26 fields). I am trying to build random forests over
the data, but the process is quite slow. It takes about half an hour to
build 100 trees using the partial implementation. (I didn't realize I
As I think we've said, it depends on your data. I expect that some
similarity metrics will work better than others. Why is hard to say
without knowing anything about your data.
I don't understand your previous question about representation. I just gave
you the definition of city-block distance.
Hadoop is what chooses the number of mappers, and it bases it on input
size. Generally it will not assign less than one worker per chunk and a
chunk is usually 64MB (still, I believe). You can override this directly
(well, at least, register a suggestion to Hadoop). I would tell you the
exact flag
I think that it is NOT using preference values. Also, the algorithm's
documentation mentions that it is using the NUMBER of values, not the values themselves:
* @param pref1 number of non-zero values in the left vector
* @param pref2 number of non-zero values in the right vector
* @param intersection number of
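My reading of that Javadoc fragment is that the similarity is computed purely from counts: treating both vectors as binary, the city-block distance reduces to pref1 + pref2 - 2 * intersection. A plain-Java sketch under that assumption (the 1/(1+d) mapping from distance to a similarity score is also an assumption on my part, not something confirmed in this thread):

```java
// Count-based city-block similarity: with binary vectors, the Manhattan
// distance is the number of items rated by exactly one of the two users,
// i.e. pref1 + pref2 - 2 * intersection. Rating values never enter.
public class CountBasedCityBlock {
    static double similarity(int pref1, int pref2, int intersection) {
        int distance = pref1 + pref2 - 2 * intersection;
        // Assumed mapping: identical vectors (distance 0) get similarity 1.0,
        // and similarity decays as the vectors diverge.
        return 1.0 / (1.0 + distance);
    }

    public static void main(String[] args) {
        // Two users rated 5 and 4 items respectively, 3 of them in common:
        // distance = 5 + 4 - 6 = 3, similarity = 1/(1+3) = 0.25
        System.out.println(similarity(5, 4, 3));
    }
}
```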
What top items? I am not sure what you're referring to here, but, no, I do
not expect things to be identical when changing metrics in general. I've
already answered your other question.
On Thu, Mar 29, 2012 at 10:52 PM, ziad kamel ziad.kame...@gmail.com wrote:
OK, things are becoming clearer.
Hi ,
I want to recommend movies based on user preferences and movie type
(comedy, etc.). I have data of users watching movies over several years.
I don't have direct preferences but was wondering if I can create
some from the years and movie types. Any suggestions?
data format
user - movie -
It is very common that preferences or ratings DECREASE recommendation
performance.
The basic reason is that there is little or no real signal in the ratings
after you account for the fact that the rating exists at all.
In practice, there is the additional reason that if you don't need a
rating,
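Ted's point about the signal being in the rating's existence can be illustrated by stripping a ratings dataset down to boolean user-item associations. A plain-Java sketch (the record layout and names here are hypothetical, not Mahout's data model):

```java
import java.util.*;

// If most of the signal is in the fact that a rating exists at all, the
// data can be reduced to boolean user -> item associations, discarding
// the rating values entirely.
public class BooleanPrefs {
    static Map<String, Set<String>> toBoolean(List<String[]> ratings) {
        Map<String, Set<String>> prefs = new HashMap<>();
        for (String[] r : ratings) {   // r = {user, movie, rating}
            prefs.computeIfAbsent(r[0], k -> new HashSet<>()).add(r[1]);
        }
        return prefs;
    }

    public static void main(String[] args) {
        List<String[]> ratings = Arrays.asList(
            new String[]{"u1", "Alien", "4.0"},
            // even a low rating signals that the user chose to watch it
            new String[]{"u1", "Brazil", "1.0"},
            new String[]{"u2", "Alien", "5.0"});
        System.out.println(toBoolean(ratings));
    }
}
```

This also fits ziad's earlier question: a watch-history dataset with no explicit ratings is already in this boolean form.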
Suggestion, indeed. I passed that option, but still only 2 mappers were
created.
On Thu, Mar 29, 2012 at 5:23 PM, Sean Owen sro...@gmail.com wrote:
Hadoop is what chooses the number of mappers, and it bases it on input
size. Generally it will not assign less than one worker per chunk and a
(If you're using a modern version of Hadoop, the flag is something
different, so make sure you check what the real value is.)
There's another option concerning minimum split size that you could reduce
from its default too.
On Thu, Mar 29, 2012 at 11:05 PM, Jason L Shaw jls...@uw.edu wrote:
Split your training data into lots of little files. Depending on the wind,
that may cause more mappers to be invoked.
On Thu, Mar 29, 2012 at 3:05 PM, Jason L Shaw jls...@uw.edu wrote:
Suggestion, indeed. I passed that option, but still only 2 mappers were
created.
On Thu, Mar 29, 2012 at
I never thought ratings could decrease recommendation quality. Does this
have a name, like under-fitting, in recommender systems?
On Thu, Mar 29, 2012 at 5:04 PM, Ted Dunning ted.dunn...@gmail.com wrote:
It is very common that preferences or ratings DECREASE recommendation
performance.
No. It is more related to the fact that ratings are just very strange
things.
On Thu, Mar 29, 2012 at 3:35 PM, ziad kamel ziad.kame...@gmail.com wrote:
I never thought ratings could decrease recommendation quality. Does this
have a name, like under-fitting, in recommender systems?
Hi,
I am trying the partial decision forest example, and wondering whether
there is any way to inspect the built trees stored in the forest.seq file. I
cannot find any function in DecisionForest that can do that. Thanks!
Regards,
Shawn
-Dmapred.map.tasks=N only gives a suggestion to Hadoop, and in most
cases (especially when the data is small) Hadoop doesn't take it into
consideration. To generate more mappers use -Dmapred.max.split.size=S,
S being the size of each data partition in bytes. So your data is
~31 MB; if you want
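The arithmetic for choosing S is just the input size divided by the number of mappers you want. A back-of-the-envelope sketch (the ~31 MB figure comes from the dataset described earlier in the thread; the 8-mapper target is an arbitrary example of mine):

```java
// Choosing a value for -Dmapred.max.split.size: S = input size / desired
// mapper count, since Hadoop assigns roughly one mapper per split.
public class SplitSize {
    static long splitSize(long inputBytes, int desiredMappers) {
        // round up so we don't accidentally create one extra tiny split
        return (inputBytes + desiredMappers - 1) / desiredMappers;
    }

    public static void main(String[] args) {
        long inputBytes = 31L * 1024 * 1024;          // ~31 MB input
        System.out.println(splitSize(inputBytes, 8)); // 4063232
    }
}
```

Passing that result as -Dmapred.max.split.size should cap each split at roughly 1/8 of the input, giving about 8 mappers instead of 2.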
You can use DecisionForest.load(Configuration conf, Path path)
(in the org.apache.mahout.classifier.df package). You can just pass the output
path that contains the trees, and this function will load them all.
On Fri, Mar 30, 2012 at 3:41 AM, Xiaomeng Wan shawn...@gmail.com wrote:
Hi,
I am trying the