Reformatted excerpts from Marcus Williams's message of 2008-02-28:
> The only thing I'm a little wary of is the join() I do of the
> attachment filenames for the index (like labels). This means that
> Ferret doesn't actually know the difference between two files called
> file1 and file2 and a single file called "file1 file2". Not sure it
> matters that much for this usage though.

The answer here is to escape the spaces and to use a Ferret custom
analyzer for this field in the index, one that will split only on
non-escaped spaces.

Something like this (needs testing):

  irb(main):055:0> a = Ferret::Analysis::RegExpAnalyzer.new /([^\s\\]|(\\\s))+/, false
  => #<Ferret::Analysis::RegExpAnalyzer:0xb79740fc>
  irb(main):056:0> t = a.token_stream :potato, "one\\ two three\\ four"
  => #<Ferret::Analysis::TokenStream:0xb79705d8>
  irb(main):057:0> t.next
  => token["one\ two":0:8:1]
  irb(main):058:0> t.next
  => token["three\ four":9:20:1]

Then assign that analyzer to the :attachments field in index.rb circa
line 37, just like I do for :subject and :body.

You'll have to make sure to do the escaping properly both on user input
at query time, and at storage time to the index.
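
As a rough sketch of what "escaping properly" could look like, here is a
pure-Ruby version that does not touch Ferret at all. The helper names
(escape_filename, join_filenames, split_filenames) are made up for
illustration and are not part of sup or Ferret. The regexp is the same
pattern as above, written with non-capturing groups so that String#scan
returns whole tokens. It assumes filenames contain no literal
backslashes, which this pattern would not handle.

```ruby
# Hypothetical helpers, not part of sup or Ferret.
# Same pattern as the analyzer regexp, with non-capturing groups.
TOKEN_RE = /(?:[^\s\\]|\\\s)+/

# Put a backslash in front of every whitespace character.
def escape_filename(name)
  name.gsub(/\s/) { |ws| "\\#{ws}" }
end

# Storage time: escape each name, then join with plain spaces.
def join_filenames(names)
  names.map { |n| escape_filename(n) }.join(" ")
end

# Split on non-escaped whitespace only, then drop the escapes.
def split_filenames(field)
  field.scan(TOKEN_RE).map { |tok| tok.gsub(/\\(\s)/, '\1') }
end
```

The same escape_filename call would be applied to a filename typed by
the user at query time, so that the query term matches the token stored
in the index.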

> Also I don't repopulate the attachments attribute on the message object
> and I couldn't figure out quite how you do it for labels (through the
> initialise?).

Not quite sure what you mean here, but the answer might be: index.rb
line 371 is where we build a Message object from an index entry, and
you'll need to pass in an :attachments attribute (and handle it within
Message#initialize).
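
If that is the question, the shape of the change might be something like
the sketch below. Everything in it is a stand-in: SketchMessage is not
sup's Message class, build_message is not the real code in index.rb, and
the entry is assumed to be a plain hash with :label and :attachments
string fields. It only shows the idea of splitting the stored field and
handing the result to the constructor.

```ruby
# Hypothetical stand-in for sup's Message class; the real constructor
# takes more options than shown here.
class SketchMessage
  attr_reader :labels, :attachments

  def initialize(opts = {})
    @labels = opts[:labels] || []
    @attachments = opts[:attachments] || []
  end
end

# Hypothetical builder: turn an index entry (assumed to be a hash of
# stored string fields) back into a message object.
def build_message(entry)
  # Split on non-escaped whitespace, then drop the escaping backslashes.
  attachments = entry[:attachments].to_s
                                   .scan(/(?:[^\s\\]|\\\s)+/)
                                   .map { |tok| tok.gsub(/\\(\s)/, '\1') }

  SketchMessage.new({ labels: entry[:label].to_s.split,
                      attachments: attachments })
end
```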

-- 
William <[EMAIL PROTECTED]>