New to Lucene - some questions about demo

2009-07-28 Thread Ohaya
Hi, I'm just starting to work with Lucene, and I guess that I learn best by working with code, so I've started with the demos in the Lucene distribution. I got the IndexFiles.java and IndexHTML.java working, and also the luceneweb.war is deployed to Tomcat. I used IndexFiles.java to index

Re: New to Lucene - some questions about demo

2009-07-28 Thread ohaya
t match. A > search for "FooFoo" would, assuming that your search terms are not > being lowercased. > > > > -- > Ian. > > > On Tue, Jul 28, 2009 at 1:56 PM, Ohaya wrote: > > Hi, > > > > I'm just starting to work with Lucene, and I gues

Re: New to Lucene - some questions about demo

2009-07-28 Thread ohaya
> >> searching on the fields other than the "contents" field (recall, I'm > >> pretty sure that all those other fields are in the index, via Luke)? > >> > >> Jim > >> > >> > >> > >> Ian Lea wrote: > >

Re: New to Lucene - some questions about demo

2009-07-28 Thread ohaya
>>>> Also, Matthew, I bounced Tomcat after running IndexFiles, so I don't > >>>> think that's the problem either :(... > >>>> > >>>> I looked at the SearchFiles.java code, and it looks like it's literally > >>>> u

Re: New to Lucene - some questions about demo

2009-07-28 Thread ohaya
Matthew, Ok, thanks for the clarifications. When I have some quiet time, I'll try to re-do the tests I did earlier and post back if any questions. Thanks again, Jim Matthew Hall wrote: > Oh.. no. > > If you specifically include a fieldname: blah in your clause, you don't > need a Mult

How to index IP addresses?

2009-07-30 Thread ohaya
Hi, I am trying to index information in some proprietary-formatted files. In particular, these files contain some IP addresses in dotted notation, e.g., aa.bb.cc.dd. For my initial test, I have a Document implementation, and after I extract what I need into a String named "Info", I do: doc.

How to search "path"?

2009-07-30 Thread ohaya
Hi, I am working with a modified version of the demo IndexFiles. In that code, when it builds the index, it has: doc.add(new Field("path", f.getPath(), Field.Store.YES, Field.Index.NOT_ANALYZED)); In Luke, I can see all the file paths in the "path" field. I am also using the demo lucenewe

RE: How to index IP addresses?

2009-07-30 Thread ohaya
Hi, Oh. Ok, thanks! I'll give that a try. Jim "Armasu wrote: > Keyword: Field.Index.NOT_ANALYZED > > -Original Message- > From: oh...@cox.net [mailto:oh...@cox.net] > Sent: Thursday, July 30, 2009 4:36 PM > To: java-user@lucene.apache.org > Subject: How to index IP addresses?

Re: How to search "path"?

2009-07-30 Thread ohaya
Ian, I'll respond to this msg, re. searching "path". I made the change you suggested, to "Field.Index.ANALYZED", and that fixed the problem I was having with searching for components of the "path" field. Thanks! Jim Ian Lea wrote: > In contrast to your last question and reply, if you u
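To illustrate why switching the "path" field from Field.Index.NOT_ANALYZED to Field.Index.ANALYZED makes path components searchable, here is a minimal plain-Java sketch. It does not use Lucene at all: the class PathTermSketch, its methods, and the sample path are hypothetical, and the splitting rule (lowercase, break on anything that is not a letter or digit) is only a rough stand-in for what an analyzer does, not StandardAnalyzer's actual rules.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only: this is NOT Lucene code and it does not
// reproduce StandardAnalyzer's real tokenization rules.
class PathTermSketch {

    // NOT_ANALYZED-style: the whole field value is kept as one single term,
    // so only a query for the exact, complete value can match it.
    static List<String> asSingleTerm(String value) {
        List<String> terms = new ArrayList<>();
        terms.add(value);
        return terms;
    }

    // ANALYZED-style (rough approximation): lowercase the value and split it
    // on every run of characters that are not letters or digits.
    static List<String> asTokens(String value) {
        List<String> terms = new ArrayList<>();
        for (String part : value.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!part.isEmpty()) {
                terms.add(part);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        String path = "C:\\dir1\\dir2\\report.dat"; // made-up sample path
        System.out.println("single term: " + asSingleTerm(path));
        System.out.println("tokens:      " + asTokens(path));
    }
}
```

With the single-term form, a search for one directory name has nothing to match against; with the token form, each component is its own term. Luke remains the reliable way to see which terms a real index actually contains.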

Re: How to index IP addresses?

2009-07-30 Thread ohaya
Hi Matthew and Narcis, I think that I found the (original) problem. It looks like the reason that I was getting all those other terms, which looked to me like the octets, weren't the octets :)... When I was doing the doc.add(), there were some other numbers (not IP addresses) in the String tha

Re: Term's frequency

2009-07-30 Thread ohaya
prashant ullegaddi wrote: > How to get the number of times a term occurs in the Lucene index? > > Regards, > Prashant. Hi, You didn't mention if you were looking for something programmatic or not, but there's a tool called "Luke", and when you start that up and point it to your index

Is there a list of "special" characters for standard analyzer?

2009-07-30 Thread ohaya
Hi, I was wondering if there is a list of special characters for the standard analyzer? What I mean by "special" is characters that the analyzer considers break characters. For example, if I have something like "foo=something", apparently the analyzer considers this as two terms, "foo" and "so

Re: Is there a list of "special" characters for standard analyzer?

2009-07-30 Thread ohaya
Phil Whelan wrote: > On Thu, Jul 30, 2009 at 7:12 PM, wrote: > > I was wondering if there is a list of special characters for the standard > > analyzer? > > > > What I mean by "special" is characters that the analyzer considers break > > characters. > > For example, if I have something like

Re: Is there a list of "special" characters for standard analyzer?

2009-07-31 Thread ohaya
Hi Ahmet, Thanks for the clarification and information! That was exactly what I was looking for. Jim AHMET ARSLAN wrote: > > > I guess that the obvious question is "Which characters are > > considered 'punctuation characters'?". > > Punctuation = ("_"|"-"|"/"|"."|",") > > > In part

Seeking guidance for updating indexes

2009-07-31 Thread ohaya
Hi, I still am new to Lucene, but I think I have an initial indexer app (based on the demo IndexFiles app) working, and also have a web app, based on the demo luceneweb web app working. I'm still busy tweaking both, but am starting to think ahead, about operational type issues, esp. updating

Re: Seeking guidance for updating indexes

2009-07-31 Thread ohaya
Hi, Phil and Ian, Thanks for the responses and confirmations about this. Assuming that our requirements (as I described earlier) don't change, it looks like this updating/inserting thing should be pretty easy :)! Later, and have a great weekend! Jim Phil Whelan wrote: > Hi Jim, >

Re: ThreadedIndexWriter vs. IndexWriter

2009-07-31 Thread ohaya
Hi, Sorry to jump in, but I've been following this thread with interest :)... Am I misunderstanding your original observation, that ThreadedIndexWriter produced a smaller index? Did the ThreadedIndexWriter also finish faster (I'm assuming that it should)? If the index is smaller, and everyt

Re: ThreadedIndexWriter vs. IndexWriter

2009-07-31 Thread ohaya
Hi, I don't know the answer to your questions, but I'm guessing that the answer to #3 probably follows from the answers to #1 and #2. Did you try to look at the indexes using Luke? That shows the top 50 terms when it starts, so it might be obvious what the differences are, and that might give

java.io.IOException when trying to list terms in index (IndexReader)

2009-08-01 Thread ohaya
Hi, I'm starting to work on an app to list all of the terms in the "path" field. I'm including the beginning of my code below. When I run this, pointing it to a directory named "index" containing the Lucene indexes, I am getting a java.io.IOException. Here's the output when I run: Index in d

Re: java.io.IOException when trying to list terms in index (IndexReader)

2009-08-01 Thread ohaya
Phil, Yes, that exception is not very helpful :)!! I'll try your suggestions and post back. Thanks, Jim Phil Whelan wrote: > Hi Jim, > > I cannot see anything obvious, but both open() and terms() throw > IOException's. You could try putting these in separate try..catch > blocks to see

Re: java.io.IOException when trying to list terms in index (IndexReader)

2009-08-01 Thread ohaya
Phil, I posted in haste. Actually, from the output that I posted, doesn't it look like the .next() itself is throwing the exception? That is what has been puzzling me. It looks like it got through the open() and terms() with no problem, then it blew up when calling the next()? Jim

Re: java.io.IOException when trying to list terms in index (IndexReader)

2009-08-01 Thread ohaya
Hi, BTW, the next() method is an abstract method in the Javadocs. Does that mean that I'm supposed to have my own implementation? Jim oh...@cox.net wrote: > Phil, > > I posted in haste. Actually, from the output that I posted, doesn't it > look like the .next() itself is throwing t

Re: java.io.IOException when trying to list terms in index (IndexReader)

2009-08-01 Thread ohaya
Hi, I changed the beginning of the try to: try { System.out.println("About to call .next()..."); boolean foo = termsEnumerator.next(); System.out.println("Finished calling first .next()");

Re: java.io.IOException when trying to list terms in index (IndexReader)

2009-08-01 Thread ohaya
Hi, I don't know what happened, but all of a sudden, it started working :(... Jim oh...@cox.net wrote: > Hi, > > I changed the beginning of the try to: > > try { > System.out.println("About to call .next()..."); > boolean foo = t

Weird discrepancy with term counts vs. terms (off by 1)

2009-08-02 Thread ohaya
Hi, I've noticed a kind of strange problem with term counts and actual terms. Some background: I wrote an app that creates an index, including a "path" field. I am now working on an app (code was in the previous thread) that, as part of what it does, needs to get a list of all of the "path"

Re: Weird discrepancy with term counts vs. terms (off by 1)

2009-08-02 Thread ohaya
Hi, BTW, my indexer app is basically the same as the demo IndexFiles.java. Here's part of the main: try { IndexWriter writer = new IndexWriter(INDEX_DIR, new StandardAnalyzer(), true, IndexWriter.MaxFieldLength.LIMITED); System.out.println("Indexing to directory '" +INDEX_DIR+

Re: Weird discrepancy with term counts vs. terms (off by 1)

2009-08-02 Thread ohaya
Hi Phil, For the problem with my app, it wasn't what you suggested (about the tokens, etc.). For some later things, my indexer creates both a "path" field that is analyzed (and thus tokenized, etc.) and another field, "fullpath", which is not analyzed (and thus, not tokenized). The problem with my

Re: java.io.IOException when trying to list terms in index (IndexReader)

2009-08-02 Thread ohaya
g2011 wrote: > > hi,as you the error messages you listed below,pls put the 'reader.close()' > block to the bottom of method. > i think,if you invoke it first,the infrastructure stream is closed ,so > exceptions is encountered. > > > ohaya wrote: > > >

Re: java.io.IOException when trying to list terms in index (IndexReader)

2009-08-02 Thread ohaya
I posted, there was a close() in the > > finally? > > > > Or, are you saying that when an IndexReader is opened, that that somehow > > persists in the system, even past my Java app terminating? > > > > FYI, I'm doing this testing on Windows, under Eclipse... >

Slightly Off-topic: How to decide whether or not to add a document?

2009-08-04 Thread ohaya
Hi, I have an app to initially create a Lucene index, and to populate it with documents. I'm now working on that app to insert new documents into that Lucene index. In general, this new app, which is based loosely on the demo apps (e.g., IndexFiles.java), is working, i.e., I can run it with a

Re: Slightly Off-topic: How to decide whether or not to add a document?

2009-08-04 Thread ohaya
Hi Ian, Thanks for the quick response. I forgot to mention, but in our case, the "producers" are part of a commercial package and we don't have a way to get them to change anything, so I think the 1st 3 suggestions are not feasible for us. I have considered something like the 4th suggestion (ch

Re: Slightly Off-topic: How to decide whether or not to add a document?

2009-08-04 Thread ohaya
Ian, One question about the 4th alternative: I was wondering how you implemented the sleep() in Java, esp. in such a way as not to mess up any of the Lucene stuff (in case there's threading)? Right now, my indexer/inserter app doesn't explicitly do any threading stuff. Thanks, Jim oh..

Re: Slightly Off-topic: How to decide whether or not to add a document?

2009-08-04 Thread ohaya
Hi Ian, Ok, thanks for the additional info. I've implemented a check for both file.lastModified() and file.length(), and it seems to work in my dev environment (Windows), so I'll have to test on a "real" system. Thanks again, Jim Ian Lea wrote: > Jim > > > The sleep is simply > >
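The check on file.lastModified() and file.length() mentioned above could look roughly like the following sketch. The message does not show the actual code, so everything here is an assumption: the class FileStabilitySketch, the method looksStable, the wait time, and the idea of comparing the two values before and after a plain Thread.sleep() to guess whether a producer is still writing the file.

```java
import java.io.File;

// Hypothetical sketch, not the code from the thread: guess whether a file is
// still being written by comparing its size and timestamp across a short wait.
class FileStabilitySketch {

    // Returns true if the file exists and neither its length nor its
    // last-modified time changed while we slept for waitMillis.
    static boolean looksStable(File file, long waitMillis) throws InterruptedException {
        long sizeBefore = file.length();
        long modifiedBefore = file.lastModified();
        Thread.sleep(waitMillis); // blocks only the calling thread
        return file.exists()
                && file.length() == sizeBefore
                && file.lastModified() == modifiedBefore;
    }

    public static void main(String[] args) throws Exception {
        // "example.dat" is a placeholder file name.
        File candidate = new File(args.length > 0 ? args[0] : "example.dat");
        if (looksStable(candidate, 2000L)) {
            System.out.println("Unchanged during the wait; would add to the index: " + candidate);
        } else {
            System.out.println("Missing or still changing; would skip for now: " + candidate);
        }
    }
}
```

This is only a heuristic: a producer that pauses for longer than the wait would still be misjudged, and timestamp granularity differs between file systems, which is one reason to re-test on the target machine rather than only on Windows.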

Why does this search succeed with web app, but not Luke?

2009-08-06 Thread ohaya
Hi, In my indexer app (based on the IndexFiles.java demo), I am adding the "path" field: doc.add(new Field("path", f.getPath(), Field.Store.YES, Field.Index.ANALYZED)); Per Luke, the full path (e.g., "c:\\.yyy") gets parsed, and one of the terms (again, per Luke) is "", i.e.,

Re: Why does this search succeed with web app, but not Luke?

2009-08-06 Thread ohaya
Phil, Both my indexer and the webapp are basically from the Lucene demos, the indexer starting with the IndexFiles.java demo code, so I think they're both using the StandardAnalyzer. What appears in Luke, when I select "path" is just the filename part, without the extension, i.e., the "" p

Re: Why does this search succeed with web app, but not Luke?

2009-08-06 Thread ohaya
Phil, I need to be more precise... The files that I have are at, say: C:\dir1\dir2\ so, for example, I have C:\dir1\dir2\file-1-1.dat C:\dir1\dir2\file-1-2.dat C:\dir1\dir2\file-1-3.dat C:\dir1\dir2\file-1-4.dat C:\dir1\dir2\file-1-5.dat After indexing, and, using Luke, I look at the "path" f

Re: Why does this search succeed with web app, but not Luke?

2009-08-06 Thread ohaya
Hi Phil, Well, kind of... but... Then, why, when I do the search in Luke, do I get the results I cited: ==> succeeds .yyy ==> fails (no results) I guess that I've been assuming that the search in Luke is "correct" and I've been using that to "test my understanding", but maybe that'

Re: Why does this search succeed with web app, but not Luke?

2009-08-07 Thread ohaya
Ian, I just re-confirmed that StandardAnalyzer is used in both my indexer app and in the query/search web app. The actual file paths look like: C:\lucene-devel\dat\.dat or C:\lucene-devel\data\testdir\\.dat For field "path", Luke shows: lucene data c devel dat

Re: Why does this search succeed with web app, but not Luke?

2009-08-07 Thread ohaya
Andrzej, Hah! I tried as you suggested using Luke, and I found at least part of my problem. Luke was defaulting to KeywordAnalyzer. I changed that to StandardAnalyzer, and did queries for: path:x and path:xx.dat For the first, the Rewritten was:

Re: Why does this search succeed with web app, but not Luke?

2009-08-07 Thread ohaya
Hi Matt, Good catch! As I just posted, I *just* noticed that (Luke uses KeywordAnalyzer) :)!!! Once I switched Luke to using StandardAnalyzer, the Luke search results matched my web query results. Thanks! Jim Matthew Hall wrote: > Luke defaults to KeywordAnalyzer when you do a sea
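The effect described in this thread, where the same search text matches or fails depending on which analyzer processes the query, can be sketched without Lucene. The code below is a simplified model under stated assumptions: QueryAnalyzerMismatchSketch and its methods are hypothetical names, tokenize() is only a crude stand-in for StandardAnalyzer, keyword() mimics the KeywordAnalyzer idea of keeping the input as one term, and a "match" is reduced to checking that every query term exists among the indexed terms.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical model only: no Lucene classes are used, and the tokenization
// here is far simpler than StandardAnalyzer's.
class QueryAnalyzerMismatchSketch {

    // Crude stand-in for an analyzing tokenizer: lowercase, then split on
    // every run of characters that are not letters or digits.
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String part : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!part.isEmpty()) {
                terms.add(part);
            }
        }
        return terms;
    }

    // Keyword-style handling: the whole query string stays one single term.
    static List<String> keyword(String text) {
        return Collections.singletonList(text);
    }

    // Simplified notion of a hit: every query term must be an indexed term.
    static boolean allTermsIndexed(Set<String> indexTerms, List<String> queryTerms) {
        return indexTerms.containsAll(queryTerms);
    }

    public static void main(String[] args) {
        // Terms produced at index time from a made-up path.
        Set<String> indexTerms = new LinkedHashSet<>(tokenize("C:\\dir1\\dir2\\file.dat"));
        String query = "file.dat";
        System.out.println("keyword-style query terms:  " + keyword(query)
                + " -> hit? " + allTermsIndexed(indexTerms, keyword(query)));
        System.out.println("tokenized query terms:      " + tokenize(query)
                + " -> hit? " + allTermsIndexed(indexTerms, tokenize(query)));
    }
}
```

In this model the keyword-style query looks for the single term "file.dat", which was never indexed, while the tokenized query looks for "file" and "dat", which were. The general point is that the query side should be analyzed the same way as the index side.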

StandardAnalyzer and Windows vs. Linux "path"

2009-08-07 Thread ohaya
Hi, I've been doing development of my indexer app, which uses StandardAnalyzer, on a Windows machine, and today, I deployed an initial version onto a Redhat Linux (RHEL) machine. On my development machine, I have the files that are being indexed in something like: C:\lucene-devel\files\dir1\xxx.d

Possible to invoke same Lucene query on a String?

2009-08-20 Thread ohaya
Hi, This question is going to be a little complicated to explain, but let me try. I have implemented an indexer app based on the demo IndexFiles app, and a web app based on the luceneweb web app for the searching. In my case, the "Documents" that I'm indexing are a proprietary file type, and e

Re: Possible to invoke same Lucene query on a String?

2009-08-20 Thread ohaya
Hi, I guess that, in short, what I'm really trying to find out is: If I construct a Lucene query, can I (somehow) use that to query a String object that I have, rather than querying against a Lucene index? Thanks, Jim oh...@cox.net wrote: > Hi, > > This question is going to be a littl

Re: Possible to invoke same Lucene query on a String?

2009-08-20 Thread ohaya
Paul Cowan wrote: > oh...@cox.net wrote: > > Document1 subdoc1 term1 term2 > > subdoc2 term1a term2a > > subdoc3 term1b term2b > > > > However, I've now been asked to implement the ability to query t

Re: Possible to invoke same Lucene query on a String?

2009-08-20 Thread ohaya
Paul Cowan wrote: > oh...@cox.net wrote: > > - I'd have to create a (very small) index, for each sub-document, where I > > do the Document.add() with just the (for example) two terms, then > > - Run a query against the 1-entry index, which > > - Would either give me a "yes" or "no" (for th

Are there any non-alpha/numeric character that StandardAnalyzer won't treat as break?

2009-08-21 Thread ohaya
Hi, This is a kind of followup to a thread a couple of weeks ago. In my indexer, I want to pre-pend a string to certain terms to make it easier to search. So for example, if I have a string "XXX", I want to add, say, "field1" to it, to get "field1XXX" before I index it. To make it easier to s