OK, so I got curious over the weekend and decided to do some testing.
The short of it is: don't bother with trees! Without being able to run
native code extensions that can create efficient nodes, there is too
much overhead involved in creating a new Python object for each
node.
I ran a test with
Sofia, a radix tree builds a tree of all the prefixes common between
words and then attaches a leaf node with the new suffix when the word
doesn't fit in the existing structure. The upshot is that as your
word count increases, the storage cost per word decreases. As for
actual performance numbers
How about building a radix tree with your classification attached as
an attribute to each word and storing it as an object? You could load
it into memory at the start of each job and then merge in any new words
discovered at the end of the job. In between, you get lookups
without any network penalty.
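
To make the idea concrete, here is a minimal sketch of such a tree in plain Python. The names RadixNode, RadixTree, insert and lookup are made up for illustration, and the "storing it as an object" part is assumed here to mean pickling the whole tree; nothing in this thread confirms how it would actually be stored.

```python
import pickle


class RadixNode(object):
    """One node; edges to children are labelled with substrings."""

    def __init__(self):
        self.children = {}   # edge string -> RadixNode
        self.label = None    # classification, set only where a word ends


def _common_prefix_len(a, b):
    n = min(len(a), len(b))
    i = 0
    while i < n and a[i] == b[i]:
        i += 1
    return i


class RadixTree(object):
    def __init__(self):
        self.root = RadixNode()

    def insert(self, word, label):
        node, rest = self.root, word
        while rest:
            for edge in list(node.children):
                k = _common_prefix_len(edge, rest)
                if k == 0:
                    continue
                child = node.children[edge]
                if k < len(edge):
                    # Only part of the edge matches: split it at the shared prefix.
                    mid = RadixNode()
                    mid.children[edge[k:]] = child
                    del node.children[edge]
                    node.children[edge[:k]] = mid
                    child = mid
                node, rest = child, rest[k:]
                break
            else:
                # No edge shares a prefix: attach the remaining suffix as a leaf.
                leaf = RadixNode()
                leaf.label = label
                node.children[rest] = leaf
                return
        node.label = label

    def lookup(self, word, default=None):
        node, rest = self.root, word
        while rest:
            for edge, child in node.children.items():
                if rest.startswith(edge):
                    node, rest = child, rest[len(edge):]
                    break
            else:
                return default
        return default if node.label is None else node.label


# Build once, then serialise (assumption: pickle is an acceptable storage format).
tree = RadixTree()
for w, c in [("graça", "noun"), ("gratuito", "adjective"), ("évora", "place")]:
    tree.insert(w, c)
blob = pickle.dumps(tree, 2)

# At the start of a job: load, look words up in memory, merge new ones at the end.
loaded = pickle.loads(blob)
loaded.insert("grande", "adjective")
```

Lookups on the loaded tree never touch the network; only the load at the start and the save at the end of the job would.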
@Nick I'm in Portugal, so for now I don't have access to the Prediction API.
Maybe when there's worldwide access I'll switch, but for now it's a no-go.
Any estimates on when it will be opened to the rest of the world? :)
@Shane not sure how this could be implemented.. there could be
If the key_name for each classified word were the actual word, then you could
do a batch get for all the words in an article. Is there a reason it can't
be?
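
A rough sketch of what that could look like, assuming the word itself is the key_name. The function classify_article and the fetch_many parameter are hypothetical; fetch_many stands in for whatever does the batch get (on App Engine it might wrap something like get_by_key_name called with a list on your own model class, but that wiring is an assumption and is not shown here). The in-memory FAKE_STORE only exists so the sketch runs anywhere.

```python
import re


def classify_article(text, fetch_many):
    """Look up every distinct word of an article in a single batch call.

    fetch_many is a hypothetical callable: it takes a list of key names and
    returns a list of the same length, with None for names that don't exist.
    """
    words = sorted(set(re.findall(r"\w+", text.lower(), re.UNICODE)))
    results = fetch_many(words)
    known, unknown = {}, []
    for word, result in zip(words, results):
        if result is None:
            unknown.append(word)
        else:
            known[word] = result
    return known, unknown


# Stand-in for the datastore, for illustration only.
FAKE_STORE = {u"graça": "noun", u"tem": "verb"}


def fake_fetch_many(key_names):
    return [FAKE_STORE.get(name) for name in key_names]


known, unknown = classify_article(u"Évora tem graça, tem graça", fake_fetch_many)
```

The point is one round trip per article instead of one per word; the unknown list is what would still need classifying.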
Also it seems like the work of fetching the classifications could be split
up Map/Reduce style. Chop an article into chunks that can be
Not sure. The words are in Portuguese, so they have accented chars like graça
and évora - could these be keys?
The whole process is more or less like this: a cron job fetches an RSS feed and
creates a task for each article - inserting the article and marking it as
unprocessed. Then there's another
I've never tried to use Unicode as a key name. You could possibly use the
base64 of the word as the key name?
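
In case it helps, a minimal sketch of the base64 idea. The helper names and the "w_" prefix are made up for illustration; the prefix is just one way to make sure a key name never starts with underscores, and the sketch assumes the words are UTF-8 encoded before base64.

```python
import base64

KEY_PREFIX = "w_"  # hypothetical prefix, chosen only for this example


def word_to_key_name(word):
    """Encode a (possibly accented) word into a plain-ASCII key name."""
    encoded = base64.urlsafe_b64encode(word.encode("utf-8")).decode("ascii")
    return KEY_PREFIX + encoded


def key_name_to_word(key_name):
    """Reverse word_to_key_name."""
    encoded = key_name[len(KEY_PREFIX):]
    return base64.urlsafe_b64decode(encoded.encode("ascii")).decode("utf-8")


key = word_to_key_name(u"graça")
word = key_name_to_word(key)
```

The trade-off is that the key names are no longer readable in the datastore viewer, so this is only worth it if the accented words don't work as keys directly.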
There are some great Google IO talks on ways to do map reduce/batch
processing. This is the first one I found:
I'll try both suggestions and see how it goes. Thanks for the input :)
--
You received this message because you are subscribed to the Google Groups
Google App Engine group.
To post to this group, send email to google-appengine@googlegroups.com.
To unsubscribe from this group, send email to
Hi Sofia,
Another option would be to use the Google Prediction API, which will do all
this for you!
-Nick Johnson
On Thu, Feb 3, 2011 at 1:33 PM, sofia sofiacard...@gmail.com wrote:
I'll try both suggestions and see how it goes. Thanks for the input :)
@Calvin I've tried using Unicode strings as key_names. It works fine with
mine.
@Nick Interesting
On Feb 3, 11:53 am, Nick Johnson (Google) nick.john...@google.com
wrote:
Hi Sofia,
Another option would be to use the Google Prediction API, which will do all
this for you!
-Nick Johnson