[google-appengine] Re: faster machine learning on appengine (nlp bayes)

2011-02-06 Thread Norlesh
OK, so I got curious over the weekend and decided to do some testing. The short of it is: don't bother with trees! Without being able to run native code extensions that can create efficient nodes, there is too much overhead involved in creating a new Python object for each node. I ran a test with
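The per-node overhead Norlesh describes is easy to demonstrate. A rough sketch (modern Python; App Engine at the time ran Python 2, where the shape is the same) of what a single pure-Python tree node costs before it stores any useful data:

```python
import sys

class Node(object):
    """One radix-tree node as a plain Python object."""
    def __init__(self):
        self.children = {}        # edge label -> child node
        self.classification = None

node = Node()
# Per-node cost: the instance itself, its attribute dict, and the
# (initially empty) children dict -- well over a hundred bytes in
# CPython before a single word is stored.
overhead = (sys.getsizeof(node)
            + sys.getsizeof(node.__dict__)
            + sys.getsizeof(node.children))
print(overhead)
```

Multiply that by one node per shared prefix and the memory savings a radix tree promises can easily be eaten by object overhead, which is the point being made above.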

[google-appengine] Re: faster machine learning on appengine (nlp bayes)

2011-02-04 Thread Norlesh
Sofia, a radix tree builds a tree of all the prefixes common between words, and attaches a leaf node with the new suffix when a word doesn't fit in the existing structure. The upshot of this is that as your word count increases, the storage cost per word decreases. As for actual performance numbers
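To make the prefix-sharing concrete, here is a minimal sketch of radix-tree insertion and lookup (names and structure are illustrative, not from the thread): inserting a word either descends an edge it shares a prefix with, or splits that edge and hangs the new suffix off a fresh node.

```python
class RadixNode(object):
    def __init__(self, is_word=False):
        self.edges = {}        # edge label (string) -> RadixNode
        self.is_word = is_word

def insert(node, word):
    for label in list(node.edges):
        # length of the common prefix between this edge and the word
        n = 0
        while n < min(len(label), len(word)) and label[n] == word[n]:
            n += 1
        if n == 0:
            continue
        if n == len(label):
            # the word runs past this whole edge: descend (or mark)
            child = node.edges[label]
            if n == len(word):
                child.is_word = True
            else:
                insert(child, word[n:])
            return
        # partial match: split the edge at the common prefix
        child = node.edges.pop(label)
        mid = RadixNode(is_word=(n == len(word)))
        mid.edges[label[n:]] = child
        node.edges[word[:n]] = mid
        if n < len(word):
            mid.edges[word[n:]] = RadixNode(is_word=True)
        return
    # no edge shares a prefix: attach the whole word as a new leaf
    node.edges[word] = RadixNode(is_word=True)

def contains(node, word):
    for label, child in node.edges.items():
        if word == label:
            return child.is_word
        if word.startswith(label):
            return contains(child, word[len(label):])
    return False
```

Because "graca", "gracioso" and "grande" share the stored prefixes "gra" and "grac", each additional word with a common prefix costs only its unshared suffix, which is the decreasing per-word storage cost described above.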

[google-appengine] Re: faster machine learning on appengine (nlp bayes)

2011-02-03 Thread Norlesh
How about building a radix tree with your classification attached as an attribute to each word, and storing the whole tree as a single object? You could load it into memory at the start of each job and merge any newly discovered words at the end of the job, and in between you get lookups without any network penalty.
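The load-once / merge-at-the-end pattern suggested here can be sketched independently of the tree itself. In this hypothetical sketch a plain dict of word -> classification stands in for the stored structure, for brevity:

```python
class WordClassifier(object):
    """Hypothetical in-memory classification table following the
    pattern above: load once per job, merge new words at the end."""

    def __init__(self, stored=None):
        self.known = dict(stored or {})  # loaded at the start of the job
        self.new = {}                    # words discovered during the job

    def classify(self, word, default=None):
        # every lookup is in-memory: no datastore round trip
        return self.known.get(word, self.new.get(word, default))

    def learn(self, word, classification):
        if word not in self.known:
            self.new[word] = classification

    def merge(self):
        # at the end of the job, fold the newly discovered words back
        # in and return them (the part to write back to storage)
        self.known.update(self.new)
        merged, self.new = self.new, {}
        return merged
```

The design trade-off is the one the thread circles around: zero network cost per lookup, paid for by loading the whole structure at job start and writing the delta back at job end.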

[google-appengine] Re: faster machine learning on appengine (nlp bayes)

2011-02-03 Thread sofia
@Nick I'm in Portugal, so for now I don't have access to the Prediction API. Maybe when there's worldwide access I'll switch, but for now it's a no go - any estimates on when it will be opened to the rest of the world? :) @Shane not sure how this could be implemented.. there could be

[google-appengine] Re: faster machine learning on appengine (nlp bayes)

2011-02-02 Thread Calvin
If the key_name for each classified word was the actual word, then you could do a batch get for all the words in an article. Is there a reason it can't be? It also seems like the work of fetching the classifications could be split up Map/Reduce style. Chop an article into chunks that can be
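A sketch of the chunking half of Calvin's suggestion (the batching helper is hypothetical; the batch get itself would be the App Engine db call `WordModel.get_by_key_name(batch)`, assuming a model whose key_name is the word):

```python
def chunks(words, size):
    """Yield an article's words in fixed-size batches, so that each
    batch can be fetched in one datastore round trip, e.g. with
    WordModel.get_by_key_name(batch) on App Engine."""
    for i in range(0, len(words), size):
        yield words[i:i + size]
```

Each batch is then one datastore round trip instead of one per word, and independent batches are exactly the units that could be farmed out Map/Reduce style as suggested.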

[google-appengine] Re: faster machine learning on appengine (nlp bayes)

2011-02-02 Thread sofia
Not sure. The words are in Portuguese, so they have accented chars like graça and évora - could these be keys? The whole process is more or less like this: a cron job fetches an RSS feed and creates a task for each article, inserting the article and marking it as unprocessed. Then there's another
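The cron-to-task pipeline Sofia describes can be sketched with stand-ins (a list in place of the datastore, another in place of App Engine's task queue, where the real call would be `taskqueue.add`); all names here are illustrative:

```python
def on_cron(fetch_rss, store, task_queue):
    """Cron handler sketch: fetch the feed, insert each article
    marked unprocessed, and enqueue one processing task per article."""
    for article in fetch_rss():
        article['processed'] = False
        store.append(article)                          # insert unprocessed
        task_queue.append(('process', article['id']))  # one task each
```

The worker task would then load its article, run the classification, and flip the `processed` flag, which keeps each task small enough to fit App Engine's request deadlines.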

[google-appengine] Re: faster machine learning on appengine (nlp bayes)

2011-02-02 Thread Calvin
I've never tried to use Unicode as a key name. You could possibly use the base64 of the word as the key name? There are some great Google I/O talks on ways to do MapReduce/batch processing. This is the first one I found:
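Calvin's base64 idea is a one-liner each way; a sketch (assuming URL-safe base64 of the word's UTF-8 bytes, which yields an ASCII-only key that round-trips losslessly):

```python
import base64

def word_to_key_name(word):
    """ASCII-safe key_name for an accented word: URL-safe base64
    of its UTF-8 bytes."""
    return base64.urlsafe_b64encode(word.encode('utf-8')).decode('ascii')

def key_name_to_word(key_name):
    """Inverse mapping, so the original word can be recovered
    from the key alone."""
    return base64.urlsafe_b64decode(key_name.encode('ascii')).decode('utf-8')
```

So graça and évora would get plain-ASCII key names, sidestepping the Unicode-key question entirely (though, as Albert notes below, Unicode key_names may simply work).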

[google-appengine] Re: faster machine learning on appengine (nlp bayes)

2011-02-02 Thread sofia
I'll try both suggestions and see how it goes. Thanks for the input :) -- You received this message because you are subscribed to the Google Groups "Google App Engine" group. To post to this group, send email to google-appengine@googlegroups.com. To unsubscribe from this group, send email to

Re: [google-appengine] Re: faster machine learning on appengine (nlp bayes)

2011-02-02 Thread Nick Johnson (Google)
Hi Sofia, Another option would be to use the Google Prediction API, which will do all this for you! -Nick Johnson

[google-appengine] Re: faster machine learning on appengine (nlp bayes)

2011-02-02 Thread Albert
@Calvin I've tried using unicode as key_names. It works fine with mine. @Nick Interesting