RE: Beginner: Specific indexing

2008-09-09 Thread Steven A Rowe
Hi Raymond, Check out SinkTokenizer/TeeTokenFilter: Look at the unit tests for usage hints:

Re: Beginner: Specific indexing

2008-09-09 Thread Raymond Balmès
Well that is well explained in "Lucene in Action" if you want to search files you have to build a file parser and there is a good example given. So not really my problem. But I thought I could go thru the token stream only once, where I have to go twice 1. for detecting my triplets , 2. for indexi
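
A rough sketch of the "go through the tokens only once" idea, in plain Java: tokens are buffered as they are read, and the triplet check runs on the same pass, so the buffered tokens can be indexed afterwards without re-reading the text. This is not the SinkTokenizer/TeeTokenFilter API. The class name SinglePassScan, the word list, and the whitespace tokenization are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SinglePassScan {
    // Hypothetical word list standing in for the "fixed list of words".
    private static final Set<String> WORDS =
        new HashSet<>(Arrays.asList("alpha", "beta", "gamma"));

    /** Result of one pass: every token seen, plus whether a triplet occurred. */
    public static final class Result {
        public final List<String> tokens = new ArrayList<>();
        public boolean hasTriplet = false;
    }

    // "Small numbers <100" is read here as one or two digits (0..99).
    private static boolean isSmallNumber(String s) {
        return s.matches("\\d{1,2}");
    }

    public static Result scan(String text) {
        Result r = new Result();
        for (String tok : text.trim().split("\\s+")) {
            if (tok.isEmpty()) {
                continue; // empty input yields one empty token; skip it
            }
            r.tokens.add(tok); // buffer the token for later indexing
            int n = r.tokens.size();
            // Check whether the last three tokens form word, number, number.
            if (n >= 3
                    && WORDS.contains(r.tokens.get(n - 3))
                    && isSmallNumber(r.tokens.get(n - 2))
                    && isSmallNumber(tok)) {
                r.hasTriplet = true;
            }
        }
        return r;
    }

    public static void main(String[] args) {
        Result r = scan("memo about beta 12 7 and more");
        if (r.hasTriplet) {
            // Only documents that contain a triplet would be handed to the indexer.
            System.out.println("index " + r.tokens.size() + " buffered tokens");
        }
    }
}
```

The trade-off is memory: the whole token list is held until the decision to index is made, which is presumably acceptable for memo-sized documents.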

Re: Beginner: Specific indexing

2008-09-08 Thread Chris Hostetter
: I think I'm getting you. But the files I'm going to parse have many formats : : PDF, HTML, Word. : they don't have a particular structure, memos if you will. But the ones I'm : interested in will have the triplets I described A... see this is something i completely didn't realize. "Lucen

Re: Beginner: Specific indexing

2008-09-05 Thread Raymond Balmès
I think I'm getting you. But the files I'm going to parse have many formats : PDF, HTML, Word. they don't have a particular structure, memos if you will. But the ones I'm interested in will have the triplets I described Yes building a TokenFilter as you suggest should do the job. I guess my initi

Re: Beginner: Specific indexing

2008-09-05 Thread Chris Hostetter
: Interesting if you are not going to use an analyser... what then ? I'm : thinking of using javacc, because I oversimplified somewhat the 3 field : string structure, so I need a kind of small grammar for that. Well, the specifics of "what else" is in your files is going to be the biggest factor

Re: Beginner: Specific indexing

2008-09-05 Thread Raymond Balmès
I understand your point, I did not say it was a Lucene problem but was rather checking if my intended design was correct... basically not. Since I thought that I would first break my stream into tokens to do my special filter, I thought I could do it in one step... Interesting if you are not going

Re: Beginner: Specific indexing

2008-09-04 Thread Chris Hostetter
Honestly: your problem doesn't sound like a Lucene problem to me at all ... i would write custom code to check your files for the pattern you are looking for. if you find it *then* construct a Document object, and add your 3 fields. I probably wouldn't even use an analyzer. -Hoss
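
A minimal sketch of the kind of pre-check described above, assuming the text has already been extracted from the PDF/HTML/Word file and that the pattern is a word from a fixed list followed by two numbers below 100. The class name TripletScanner and the word list are hypothetical. Only java.util.regex is used, and the Lucene Document construction is left as a comment rather than guessed at.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TripletScanner {
    // Hypothetical word list; the thread only says it is a fixed list of words.
    // Each number is one or two digits, i.e. 0..99.
    private static final Pattern TRIPLET =
        Pattern.compile("\\b(alpha|beta|gamma)\\s+(\\d{1,2})\\s+(\\d{1,2})\\b");

    /** Returns {word, first number, second number} for the first triplet, if any. */
    public static Optional<String[]> findTriplet(String text) {
        Matcher m = TRIPLET.matcher(text);
        if (!m.find()) {
            return Optional.empty(); // no triplet: the file would be skipped
        }
        return Optional.of(new String[] { m.group(1), m.group(2), m.group(3) });
    }

    public static void main(String[] args) {
        String memo = "Meeting notes: see beta 12 7 for the schedule.";
        findTriplet(memo).ifPresent(t ->
            // Only at this point would a Document be built with the three fields.
            System.out.println(t[0] + " / " + t[1] + " / " + t[2]));
    }
}
```

A regular expression is enough for this simplified pattern. If the real structure needs a small grammar, as mentioned elsewhere in the thread, the same "check first, build the Document only on a match" flow would still apply.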

Re: Beginner: Specific indexing

2008-09-02 Thread Raymond Balmès
OK, not clear enough. I have documents in which I'm looking for 3 consecutive elements : <string1> <#1> <#2> (string1 is a predefined list) I want to disregard those without this sequence and reverse index those with these markers... it looks to me that parsing won't do the job since my documents are unst

Re: Beginner: Specific indexing

2008-09-02 Thread Chris Hostetter
I may be misunderstanding your question, but i wouldn't attempt to tackle this with a TokenFilter unless you want both the "tag" and the numbers to appear in the same field. i think what you want to do is first parse whatever file format you are dealing with, then build Documents based on the

Beginner: Specific indexing

2008-08-30 Thread Raymond Balmès
Hi guys, Fairly new to Lucene, and just finished reading Lucene in Action. My problem is the following: I need to index only the documents that contain the following pattern(s) in a mass of documents: <#1> <#2> is a fixed list of words <#x> are small numbers <100 My idea is to simply build a
