On Tue, Jan 27, 2009 at 4:47 AM, Chris Hostetter
<hossman_luc...@fucit.org> wrote:
>
> : I know I was able to imitate that in plain-lucene by crafting a particular
> : analyzer-filter which was given only the URL as content and which then
> : produced the tokens of the stream.
>
> FWIW: while taking advantage of DIH and some of its plugin APIs to deal
> with this is probably a better way to go -- anything you could do in a
> TokenFilter with a homegrown Lucene app can also be done in a TokenFilter
> in Solr -- all you need is a simple TokenFilterFactory to initialize your
> TokenFilter.
>
> From a purist standpoint: the decision about where to hook in a feature
> like this depends on the mental model you have of your index vs the
> different ways you can get data into your index.  If every document should
> have an "extendedText" field, and docs you post via xml or csv will have
> that field verbatim, but documents you index using DIH will get it by
> fetching a URL, then a DIH plugin is the way to go -- if you want every
> client sending you docs to provide a URL and you *always* fetch that URL
> to get the content, then a TokenFilter is the way to go.

Hoss, makes sense.

But we do not have a built-in TokenFilter which does that, and DIH does not
support it yet either. I have opened an issue for DIH
(https://issues.apache.org/jira/browse/SOLR-980).
Is it desirable to have a TokenFilter which offers similar functionality?
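
For the sake of discussion, a rough sketch of what such a filter might look
like, together with the factory needed to reference it from schema.xml, is
below. Everything here is hypothetical: the class names, the naive whitespace
re-tokenization of the fetched page, and the fact that offsets and position
increments are ignored. It is also written against the attribute-based
TokenStream API, so the exact base classes and packages will differ across
Lucene/Solr versions.

import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.Map;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.util.TokenFilterFactory;

// Sketch only: treats every incoming token as a URL, fetches the document
// behind it, and emits the whitespace-separated words of the fetched text
// instead of the URL itself.
public final class UrlContentTokenFilter extends TokenFilter {

  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
  private Iterator<String> pending = Collections.emptyIterator();

  public UrlContentTokenFilter(TokenStream input) {
    super(input);
  }

  @Override
  public boolean incrementToken() throws IOException {
    // First drain any words queued from the previously fetched document.
    if (emitPending()) {
      return true;
    }
    // Otherwise pull the next token from upstream, treat it as a URL,
    // fetch the page behind it, and queue its words.
    while (input.incrementToken()) {
      String body = fetch(termAtt.toString()).trim();
      pending = body.isEmpty()
          ? Collections.<String>emptyIterator()
          : Arrays.asList(body.split("\\s+")).iterator();
      if (emitPending()) {
        return true;
      }
    }
    return false;
  }

  private boolean emitPending() {
    if (!pending.hasNext()) {
      return false;
    }
    clearAttributes();
    termAtt.setEmpty().append(pending.next());
    return true;
  }

  private String fetch(String url) throws IOException {
    try (InputStream in = new URL(url).openStream()) {
      return new String(in.readAllBytes(), StandardCharsets.UTF_8);
    }
  }

  @Override
  public void reset() throws IOException {
    super.reset();
    pending = Collections.emptyIterator();
  }
}

// Separate file: the hookup Hoss mentions -- a factory so the filter can be
// referenced from schema.xml, e.g.
// <filter class="com.example.UrlContentTokenFilterFactory"/> in an analyzer chain.
public class UrlContentTokenFilterFactory extends TokenFilterFactory {

  public UrlContentTokenFilterFactory(Map<String, String> args) {
    super(args);
  }

  @Override
  public TokenStream create(TokenStream input) {
    return new UrlContentTokenFilter(input);
  }
}

On the DIH side (SOLR-980), the equivalent would presumably be a custom
Transformer that fetches the URL for each row; again, the column names and
class name below are only placeholders.

import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Map;

import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.Transformer;

// Sketch only: reads a "url" column from each row and stores the fetched body
// in an "extendedText" column. Referenced from data-config.xml via
// transformer="com.example.UrlFetchTransformer" on the entity.
public class UrlFetchTransformer extends Transformer {

  @Override
  public Object transformRow(Map<String, Object> row, Context context) {
    Object url = row.get("url");
    if (url != null) {
      try (InputStream in = new URL(url.toString()).openStream()) {
        row.put("extendedText", new String(in.readAllBytes(), StandardCharsets.UTF_8));
      } catch (IOException e) {
        // Leave the row unchanged if the URL cannot be fetched.
      }
    }
    return row;
  }
}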


>
> -Hoss
>



-- 
--Noble Paul
