So I've been playing with Whoosh a little this morning. Not really
integrating it with Trac, just pulling ticket and wiki information out with
my little http://trac-hacks.org/wiki/TracMergeScript Trac object, which
makes it trivial to do things like iterate over all tickets. There's
definite potential in this approach.
Repository search would be awesome, but we should take care not to tie it
too closely to svn. We're running a multirepos Mercurial forest, which would
require a very different implementation from a single SVN repository. I
definitely used and really liked RepoSearch, though.
whooshtest.py:
import os, os.path
import sys

from whoosh import index
from whoosh.fields import Schema, ID, KEYWORD, TEXT
from whoosh.qparser import MultifieldParser

sys.path.append('tracmerge')
from ptrac import Trac

schema = Schema(id=ID(stored=True, unique=True), type=ID,
                keywords=KEYWORD(scorable=True),
                component=KEYWORD, milestone=TEXT,
                summary=TEXT(stored=True),
                content=TEXT, changes=TEXT)

# If we don't have an index directory, create one and index everything
if os.path.exists('index'):
    ix = index.open_dir('index')
else:
    os.mkdir('index')
    ix = index.create_in('index', schema=schema)
    writer = ix.writer()

    t = Trac('../dev/')
    for tid in t.listTickets():
        print tid
        ticket = t.getTicket(tid)
        # Collect the comment text from the ticket's change history
        chgs = []
        for chg in ticket['ticket_change']:
            if chg['field'] == 'comment':
                chgs.append(chg['newvalue'])
        writer.add_document(id=unicode(tid), type=u'ticket',
                            summary=ticket['summary'],
                            content=ticket['description'],
                            keywords=ticket['keywords'],
                            component=ticket['component'],
                            milestone=ticket['milestone'],
                            changes=u'\n\n'.join(chgs))

    for pageName in t.listWikiPages():
        print pageName
        pageDetails = t.getWikiPageCurrent(pageName)
        writer.add_document(id=pageName, type=u'wiki',
                            summary=pageDetails['comment'],
                            content=pageDetails['text'])

    writer.commit()

# search
searcher = ix.searcher()
parser = MultifieldParser(['content', 'keywords', 'component', 'milestone',
                           'summary', 'changes'], schema=ix.schema)
query = parser.parse(sys.argv[1])
results = searcher.search(query)
print results
for res in results:
    print res
On Thu, Mar 5, 2009 at 9:41 AM, Jeff Hammel <[email protected]> wrote:
>
> On Thu, Mar 05, 2009 at 11:08:33AM +0100, Christian Boos wrote:
> >
> > W. Martin Borgert wrote:
> > > On 2009-03-04 18:06, Chris Mulligan wrote:
> > >
> > >> This is motivated entirely by a local need. As our primary internal
> > >> tracs grow (thousands of tickets and wiki pages) it's becoming harder
> > >> and harder for users to find already existing content. They end up
> > >> making lots of dupes, making the problem even worse the next time.
> > >>
> > >
> > > Yes, the trac search facilities are good, but sometimes not good
> > > enough. Sometimes one would like to search "the whole thing", e.g.
> > > including PDFs in the SVN trunk etc. I'm not sure if whoosh
> > > addresses this problem.
> > >
> >
> > Searching content in the repository is addressed by the RepoSearch
> > plugin on trac-hacks, if I'm right.
> >
> > http://trac-hacks.org/wiki/RepoSearchPlugin
> >
> > Looking for content inside a non-text file like a .pdf would require an
> > additional extraction/analysis step.
>
> Perhaps an infrastructure could be built such that filters are applied by
> mimetype and the search is performed on the output of that filter. For
> example, you could apply pdftotext as a filter for PDFs and search the
> resulting text, or you could apply antiword to (horrible) .doc files.
>
> > Also, I don't know if the plugin allows for searching the path names,
> > useful for locating some source file you have no idea in which
> > subproject or branch it is ;-)
> >
> > -- Christian
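Jeff's per-mimetype filter idea is easy to prototype on top of the stdlib.
A sketch, assuming converters like pdftotext and antiword are on the PATH;
the FILTERS table and the extract_text helper are hypothetical names for
illustration, not part of any existing plugin:

```python
import mimetypes
import subprocess

# Hypothetical mimetype -> extraction command table; '%s' is replaced with
# the file path, and '-' tells pdftotext to write to stdout.
FILTERS = {
    'application/pdf': ['pdftotext', '%s', '-'],
    'application/msword': ['antiword', '%s'],
}

def extract_text(path):
    """Return indexable text for path, running a converter when one is registered."""
    mimetype, _ = mimetypes.guess_type(path)
    cmd = FILTERS.get(mimetype)
    if cmd is None:
        # No filter registered for this mimetype: treat the file as plain text.
        return open(path, 'rb').read().decode('utf-8', 'replace')
    cmd = [arg.replace('%s', path) for arg in cmd]
    return subprocess.check_output(cmd).decode('utf-8', 'replace')
```

An indexer would call extract_text() instead of reading repository files
directly; unknown mimetypes fall through to a plain read, and supporting a
new format is one more table entry rather than new code.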
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "Trac
Development" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/trac-dev?hl=en
-~----------~----~----~----~------~----~------~--~---