Re: [sup-talk] Amazon.com messages can't be added to index

William Morgan Sun, 24 Feb 2008 21:11:23 -0800

Reformatted excerpts from Luis Villa's message of 2008-02-22:
> /usr/lib/ruby/gems/1.8/gems/sup-0.4/lib/sup/index.rb:200:in `sync_message': 
> just added message 
> "!~!UENERkVCMDkAAQACAPYAAAAAAAAAOKG7EAXlEBqhuwgAKypWwgAAbXNwc3QuZGxsAAAAAABOSVRB+b+4AQCqADfZbgAAAABDADoAXABEAG8AYwB1AG0AZQBuAHQAcwAgAGEAbgBkACAAUwBlAHQAdABpAG4AZwBzAFwAawBiAGUAbgB0AG8AbgBcAEwAbwBjAGEAbAAgAFMAZQB0AHQAaQBuAGcAcwBcAEEAcABwAGwAaQBjAGEAdABpAG8AbgAgAEQAYQB0AGEAXABNAGkAYwByAG8AcwBvAGYAdABcAE8AdQB0AGwAbwBvAGsAXABPAHUAdABsAG8AbwBrAC4AcABzAHQAAAAYAAAAAAAAALLH/vR9UMVCgMck3LV+0wHCgAAAGAAAAAAAAACyx/70fVDFQoDHJNy1ftMBhLcgAAAAAAAQAAAAqTDfdQ6dIEawbQUxhNxqVz4AAABSRTogQnVnemlsbGE6IEhhcyBhbnlvbmUgc3VjY2Vzc2Z1bGx5IGNyZWF0ZWQgU3ViLUNvbXBvbmVudHM/[EMAIL
>  PROTECTED]" but couldn't find it in a search (RuntimeError)


Sigh. Why would anyone generate a message id like that?

There were two problems causing your error. I've fixed them both in git
next. You can probably apply the attached patches to your 0.4 release if
you don't want to use git just yet.

The first problem was that the message_id field was being tokenized by
Ferret. Marking it as non-tokenized solves all sorts of tokenization
problems, so that's in.

The second problem is a Ferret bug, where apparently TermQuery values of
more than 255 characters never match anything. The current workaround
just lops off anything after the 255th character. That may very well
screw things up if it falsely merges distinct message ids that share
their first 255 characters.
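
To make the risk concrete, here is a minimal sketch of that kind of
truncation. The helper name and constant are made up for illustration
and are not taken from the actual patch; it only shows how two different
ids can end up with the same truncated key.

```ruby
# Hypothetical helper, not Sup's actual code: keep only the first
# 255 characters of a message id before using it as a search term.
MAX_TERM_LENGTH = 255

def truncate_msgid(msgid)
  msgid[0, MAX_TERM_LENGTH]
end

# Two made-up ids that differ only after the 255th character.
prefix = "x" * MAX_TERM_LENGTH
id_a = prefix + "first@example.com"
id_b = prefix + "second@example.com"

# The full ids differ, but the truncated keys are identical.
collision = (id_a != id_b) && (truncate_msgid(id_a) == truncate_msgid(id_b))
```

Ids shorter than the limit pass through unchanged, so only unusually
long ids like the one quoted above are affected.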

The right long-term answer is probably to take the hex SHA1 of every
message id and just use that instead of the original value. Then all of
these issues will be solved. That will require an index rebuild for
everyone, so I'm going to hold off on that for now.
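
A rough sketch of what that could look like, using Ruby's standard
Digest::SHA1. The method name index_key_for is an assumption for
illustration only; nothing like it is in Sup yet.

```ruby
require "digest/sha1"

# Hypothetical helper: map any message id, however long or strange,
# to a fixed-length hex string that can be used as the index key.
def index_key_for(msgid)
  Digest::SHA1.hexdigest(msgid)
end

# Made-up ids: one ordinary, one far longer than 255 characters.
short_key = index_key_for("1234@example.com")
long_key  = index_key_for("!~!" + ("A" * 600) + "@example.com")
```

Every key is 40 lowercase hex characters, so the tokenizer has nothing
to split and the 255-character limit never comes into play. The cost is
that the original id can't be recovered from the key, so it would have
to be stored separately if it's still needed.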

-- 
William <[EMAIL PROTECTED]>

0001-don-t-tokenize-message_id-field-in-index.patch
Description: Binary data

0002-only-use-the-first-255-characters-of-a-message-id-f.patch
Description: Binary data

_______________________________________________
sup-talk mailing list
sup-talk@rubyforge.org
http://rubyforge.org/mailman/listinfo/sup-talk
