Hello, everybody,

I am trying to make a Naive Bayes classifier available via a servlet running
on Jetty, combining what is available at
http://emmaespina.wordpress.com/2011/04/26/ham-spam-and-elephants-or-how-to-build-a-spam-filter-server-with-mahout/
and at
http://emmaespina.wordpress.com/2011/04/26/ham-spam-and-elephants-or-how-to-build-a-spam-filter-server-with-mahout/

The data setup, training, and testing steps all seem to have gone well, both
with Hadoop 2.4 and Hadoop 1.2.1, and also with MAHOUT_LOCAL set.

The servlet is also up and running; I can access it via localhost:8080 and
my code gets executed.

However, I am having a problem materializing the Bayes model:

NaiveBayesModel model = NaiveBayesModel.materialize(new Path(modelPath),
configuration);

Here is the test POST with curl (assuming ham.txt exists and contains some
text):
curl http://localhost:8080/antispam -H "Content-Type: text/xml" --data-binary @ham.txt

Here is the error I receive when I send the POST:


java.lang.IllegalArgumentException: Unknown flags set: %d [-1110101]
    at com.google.common.base.Preconditions.checkArgument(Preconditions.java:148)
    at org.apache.mahout.math.VectorWritable.readFields(VectorWritable.java:88)
    at org.apache.mahout.math.VectorWritable.readVector(VectorWritable.java:199)
    at org.apache.mahout.classifier.naivebayes.NaiveBayesModel.materialize(NaiveBayesModel.java:112)
    at org.example.SpamClassifier.classify(SpamClassifier.java:49)
    at org.example.SpamClassifierServlet.doPost(SpamClassifierServlet.java:24)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:727)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
    at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:533)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:475)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:514)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:920)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:403)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:184)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:856)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:247)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:151)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:114)
    at org.eclipse.jetty.server.Server.handle(Server.java:352)
    at org.eclipse.jetty.server.HttpConnection.handleRequest(HttpConnection.java:596)
    at org.eclipse.jetty.server.HttpConnection$RequestHandler.content(HttpConnection.java:1066)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:805)
    at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:218)
    at org.eclipse.jetty.server.HttpConnection.handle(HttpConnection.java:426)
    at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:510)
    at org.eclipse.jetty.io.nio.SelectChannelEndPoint.access$000(SelectChannelEndPoint.java:34)
    at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:40)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:450)
    at java.lang.Thread.run(Thread.java:745)
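
For reference, the bracketed value in the exception message can be decoded with a few lines of Java. This is only a sketch, and it rests on an assumption: that `[-1110101]` is the offending flags byte printed in base 2 (the `%d` placeholder in the message was evidently not substituted). The class and method names are made up for illustration.

```java
public class FlagsDecode {

    // Parse the bracketed value as a signed base-2 integer.
    // ASSUMPTION: the message prints the flags in binary.
    static int parseFlags(String bits) {
        return Integer.parseInt(bits, 2);
    }

    // Show the same value as the unsigned byte that was read from the stream.
    static String asHexByte(int flags) {
        return String.format("0x%02x", flags & 0xFF);
    }

    public static void main(String[] args) {
        int flags = parseFlags("-1110101");
        System.out.println(flags + " -> " + asHexByte(flags)); // prints "-117 -> 0x8b"
    }
}
```

Under that assumption the byte read was 0x8b, which has its high bits set. That would be consistent with the reader being positioned on bytes that were never written as a vector header, though it does not say why.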

I have tried several paths and got different exceptions (file not found and
the like), so this does not look like a location problem. I suspect it has
something to do with the binary format of the model file.
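
One way to check the file-format suspicion is to look at the first bytes of the model file. The following is a minimal sketch using only the JDK; the class name, the 16-byte window, and the idea of passing a local copy of the file as the first argument are all assumptions for illustration. It recognizes two well-known magic numbers (a Hadoop SequenceFile begins with the bytes `SEQ`, a gzip stream with `1f 8b`) and otherwise just prints the raw header.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ModelHeaderDump {

    // Render the first `length` bytes as space-separated two-digit hex.
    static String toHex(byte[] bytes, int length) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", bytes[i] & 0xFF));
        }
        return sb.toString();
    }

    // Guess the container format from well-known magic numbers.
    static String guessFormat(byte[] b, int length) {
        if (length >= 3 && b[0] == 'S' && b[1] == 'E' && b[2] == 'Q') {
            return "Hadoop SequenceFile";
        }
        if (length >= 2 && (b[0] & 0xFF) == 0x1f && (b[1] & 0xFF) == 0x8b) {
            return "gzip stream";
        }
        return "unrecognized";
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ModelHeaderDump <local-model-file>");
            return;
        }
        byte[] header = new byte[16];
        int n = 0;
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            // Keep reading until the window is full or the file ends.
            int r;
            while (n < header.length && (r = in.read(header, n, header.length - n)) != -1) {
                n += r;
            }
        }
        System.out.println("header: " + toHex(header, n));
        System.out.println("format: " + guessFormat(header, n));
    }
}
```

Running it against the file the servlet actually opens, and against a model freshly written by the training step, would show whether the two start with the same kind of header.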

Am I missing something? The project is on GitHub (
https://github.com/danielneis/mahout-spamclassifier-servlet ). I have tried
several versions of mahout/hadoop-common/hadoop-core/hadoop-hdfs in the pom
file.

Kind regards,
Daniel

-- 
Daniel Neis Araujo