(side note: if you are going to try and obfuscate your field names when
sending explain output so we don't know you are using wikipedia data (not
that we care), please at least be consistent about it so the final
explanations actually make sense -- it will save everyone a lot of
confusion and help us.)
: I am getting a "Too Many Open Files" Exception. I've read the FAQ about
: lowering the merge factor (currently set to 25), issuing a ulimit -n
: , etc... but I am still getting the "Too Many Open Files"
: Exception (yes... I'm making sure I close all writers/searchers/readers,
: and I only have one open at a time).
Here's the explain output I currently get for "George Bush", "George W
Bush", "John Kerry", "John Denver" and "John Bush". (There are others in
between, but they follow very much the same pattern: an enormous score
for one of "John" or "Bush" and a very small score for the other beating
two moderately good scores.)
I am getting a "Too Many Open Files" Exception. I've read the FAQ about
lowering the merge factor (currently set to 25), issuing a ulimit -n
, etc... but I am still getting the "Too Many Open Files"
Exception (yes... I'm making sure I close all writers/searchers/readers,
and I only have one open at a time).
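A rough back-of-envelope for why a merge factor of 25 runs into the descriptor limit. Both numbers below are assumptions, not exact Lucene internals: roughly 10 files per segment in the multi-file (non-compound) index format, and about (mergeFactor + 1) segments open at once while merging.

```java
// Back-of-envelope estimate, not Lucene API: how many files can be open
// during a single merge, under the assumptions stated above.
public class OpenFileEstimate {
    public static void main(String[] args) {
        int filesPerSegment = 10;  // assumed: .fnm, .frq, .prx, .tis, etc.
        System.out.println("mergeFactor=25: ~" + (25 + 1) * filesPerSegment + " files");
        System.out.println("mergeFactor=10: ~" + (10 + 1) * filesPerSegment + " files");
    }
}
```

Enabling the compound file format (IndexWriter.setUseCompoundFile(true)) also reduces the per-segment file count considerably.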
: Question: how do I go about manipulating the search results? Is it possible
: to "intercept" the listing of HTML pages returned by the Lucene search
: function and modify the report it sends to the screen.
:
: Can this be as simple as adding a line to the Lucene Java code so that
: instead of r
: "Lucene Download" as a query. I want something that strongly references
: "Lucene" (in the title) and strongly references "Download" but "Download
: Lucene" or "Lucene Project Download" are better than some page that
: happens to contain the exact phrase.
:
: Other examples are "camera review" o
Hi Chris,
That did it! Thanks for the help. I should have read the javadocs for
Field.Index more closely!
Thanks to everyone else for their input too.
--
Joe Attardi
[EMAIL PROTECTED]
http://thinksincode.blogspot.com/
On 7/3/07, Chris Hostetter <[EMAIL PROTECTED]> wrote:
It sounds like your
I've solved the problem, thanks to tips from Mark Miller and Ard
Schrijvers, and am simply recording it so that someone else walking
through the archives might get some benefit.
A while ago I had been working on a case-sensitive version of Lucene,
where with a prefix symbol, it was possible to in
It sounds like your problem is that your id field is analyzed and as a
result contains more than one token per document ... both the
deleteDocument and updateDocument methods that take in a Term only remove
documents that have that exact Term in them.
You need to add your documents with the "id" field un-tokenized (see the
Field.Index javadocs) so the whole id is indexed as a single Term.
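A plain-Java toy of that failure mode (no Lucene here; the split on '.' stands in for how StandardAnalyzer breaks up a dotted id, and "com.mycomp.Widget" is a made-up value):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Toy inverted-index terms for one document's "id" field, showing why
// deleteDocuments(new Term("id", fullId)) finds nothing when the field
// was analyzed: no single indexed term equals the whole id string.
public class AnalyzedIdSketch {
    public static void main(String[] args) {
        String id = "com.mycomp.Widget";   // hypothetical object id

        Set<String> analyzed = new HashSet<>(Arrays.asList(id.split("\\.")));
        Set<String> untokenized = Set.of(id);   // indexed as one exact term

        System.out.println("analyzed terms:      " + analyzed);
        System.out.println("analyzed matches:    " + analyzed.contains(id));    // false
        System.out.println("untokenized matches: " + untokenized.contains(id)); // true
    }
}
```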
Update on my problem.. it looks like I am having the same problem with
deleteDocuments that I am with updateDocument. Not sure why it's not
working, though. I'm using the StandardAnalyzer, as I mentioned. Are there
any other things that I might want to check that would keep this from
working?
Thanks
When you do an explain on these results, what are all the factors
that contribute to the score?
Could you increase the coord() factor in a custom Similarity
implementation, to give a bigger boost to documents that have more
matching terms? The point of coord is to give a little bump to those
documents that match more of the query terms.
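An arithmetic sketch of the idea. The default coord() is overlap / maxOverlap; squaring it below is a hypothetical custom boost (done for real by overriding Similarity.coord()), not stock Lucene behaviour.

```java
// Arithmetic sketch of Lucene's coord() factor and a boosted variant.
public class CoordSketch {
    static float coord(int overlap, int maxOverlap) {
        return (float) overlap / maxOverlap;   // DefaultSimilarity behaviour
    }
    static float boostedCoord(int overlap, int maxOverlap) {
        float c = (float) overlap / maxOverlap;
        return c * c;                          // assumed tweak: square the factor
    }
    public static void main(String[] args) {
        // Two-term query ("John", "Bush"): doc A matches 1 term, doc B both.
        System.out.println("default: A=" + coord(1, 2) + " B=" + coord(2, 2));
        System.out.println("boosted: A=" + boostedCoord(1, 2) + " B=" + boostedCoord(2, 2));
        // default: A=0.5 B=1.0; boosted: A=0.25 B=1.0 -- the gap between a
        // partial and a full match doubles, favouring "John Bush" documents.
    }
}
```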
Sounds like a welcome addition! I don't know of any guidelines other
than the general community ones about behaving nicely, don't spam,
etc. :-)
On Jul 3, 2007, at 2:24 PM, Renaud Waldura wrote:
Regarding the Lucene Wiki, is there an editing policy or should I
feel free
to change stuff as I see fit?
That's true, but it's not clear that I want phrase matches. Consider for
example:
"Lucene Download" as a query. I want something that strongly references
"Lucene" (in the title) and strongly references "Download" but "Download
Lucene" or "Lucene Project Download" are better than some page that
happens to contain the exact phrase.
You're not using any type of phrase search. Try ->
( (title:"John Bush"^4.0) OR (body:"John Bush") ) AND ( (title:John^4.0
body:John) AND (title:Bush^4.0 body:Bush) )
or maybe
( (title:"John Bush"~4^4.0) OR (body:"John Bush"~4) ) AND (
(title:John^4.0 body:John) AND (title:Bush^4.0 body:Bush) )
Hi Erick,
I'm guessing that your problem is what gets indexed. What analyzer
are you using when indexing? One that breaks words apart on, say,
periods?
I am using the StandardAnalyzer. When I do a test query using Luke, it
returns the object I'm looking for. The query I use is:
id:"com.mycomp
I'm guessing that your problem is what gets indexed. What analyzer
are you using when indexing? One that breaks words apart on, say,
periods?
The way to check this would be to get a copy of Luke and examine
your index (or part thereof). Google (lucene luke). It'll help
greatly.
What is your evidence that it isn't working?
Regarding the Lucene Wiki, is there an editing policy or should I feel free
to change stuff as I see fit? E.g. I've added a page LuceneCaveats, and now
I want to edit http://wiki.apache.org/lucene-java/ConceptsAndDefinitions and
add a "Core Classes" section, and refactor that page.
--Renaud
Try out: http://issues.apache.org/jira/browse/LUCENE-850
If this is useful to you, be sure to add a comment to the issue.
-Mike
On 3-Jul-07, at 10:51 AM, Tim Sturge wrote:
I'm following myself up here to ask if anyone has experience or
code with a BooleanQuery that weights the terms it encounters on a
product basis rather than a sum basis.
Hi everybody,
First-time poster here. I've got a search index that I am using to index
live Java objects. I create a Document with the appropriate fields and index
them. No problem. I am indexing objects of different types, so I have an
"id" field in each Document which consists of the object's class name.
Hi,
I was wondering if it's possible to get a token's offset in the original
text based on its position.
My problem is I'm working on my own "Snippet Generator": I'm given a
token index (call it t) as input and need to make a snippet of the
original text. I want the snippet to be some number of tokens around it.
I'm following myself up here to ask if anyone has experience or code
with a BooleanQuery that weights the terms it encounters on a product
basis rather than a sum basis.
This would effectively compute the geometric mean of the term score
(rather than the arithmetic mean) and would give me more balanced
scores across the query terms.
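A sketch of the difference, with made-up scores (not real Lucene output): one document with a huge score on a single term versus one with two moderate scores.

```java
// Sum (BooleanQuery's default combination) versus product / geometric mean.
public class GeometricVsArithmetic {
    public static void main(String[] args) {
        // Doc A: one enormous term score, one tiny one (e.g. rare "Kerry",
        // common "John"). Doc B: two moderate scores ("John" and "Bush").
        double[] docA = {9.0, 0.1};
        double[] docB = {2.0, 2.0};
        System.out.printf("sum: A=%.2f B=%.2f%n", sum(docA), sum(docB));
        System.out.printf("geo: A=%.2f B=%.2f%n", geoMean(docA), geoMean(docB));
        // Sum ranks A first (9.10 vs 4.00); the geometric mean ranks B first
        // (~0.95 vs 2.00), rewarding documents where every term matches well.
    }
    static double sum(double[] s) { double t = 0; for (double v : s) t += v; return t; }
    static double geoMean(double[] s) {
        double p = 1; for (double v : s) p *= v;
        return Math.pow(p, 1.0 / s.length);
    }
}
```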
>>and "n" searches to get the Documents,
???
Where does the "n" come in? searcher.doc(id) is not a search. It is a call to
IndexReader.document() to retrieve a specific document.
Try running it. It shouldn't be slow.
Mark,
Thanks for the code.
Well... I'm doing the same thing you are:
Retrieve some Doc IDs and then use the code
- Document doc=searcher.doc(sd[i].doc) - to get the Document itself.
But in this case, we are doing a search to get the IDs, and "n"
searches to get the Documents, which is not a good approach.
It looks like we may have different cases.
What I do is index my items prior to inserting them into the database. When I
do a search I get the ids that have the best match and then look up the items
in the database. So far it has worked just fine. I have 5000 rows of items and
I think it will still work fine.
>>I get the ids then I do look the items in the database using select item.*
>>from item where item.id in ( ids )
Hmm. That's likely to confuse the already confused :)
The ids referred to so far are Lucene internal document ids and are typically
only meaningful to Lucene during a single IndexReader's lifetime.
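A conceptual toy (plain Java, no Lucene) of that instability: internal ids behave like positions in the current reader's view, so a delete followed by a merge shifts later ids. The item names are invented.

```java
import java.util.ArrayList;
import java.util.List;

// Internal doc ids modelled as list positions: stable only until the
// index changes. A stored application-level id field survives; the
// internal id does not.
public class InternalIdSketch {
    public static void main(String[] args) {
        List<String> docs = new ArrayList<>(List.of("itemA", "itemB", "itemC"));
        System.out.println("before: itemC has id " + docs.indexOf("itemC")); // 2
        docs.remove("itemA");   // delete itemA; a merge compacts the ids
        System.out.println("after:  itemC has id " + docs.indexOf("itemC")); // 1
    }
}
```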
: I have done some profiling , and it seems the response is slow when there
: are long queries(more than 5-6 words per query).
: The way I have implemented is : I pass in the search query and lucene
: returns the total number of hits, along with ids . I then fetch objects
: for only those ids , as
"Patrick Kimber" <[EMAIL PROTECTED]> wrote:
> I have been running the test for over an hour without any problem.
> The index writer log file is getting rather large so I cannot leave
> the test running overnight. I will run the test again tomorrow
> morning and let you know how it goes.
Ahhh, th
I get the ids then I do look the items in the database using select item.* from
item where item.id in ( ids )
-- Original message --
From: "Lee Li Bin" <[EMAIL PROTECTED]>
> Hi,
>
> Thanks Mark!
>
> I do have the same question as Alixandre. How do I get the con
Hi Michael
I have been running the test for over an hour without any problem.
The index writer log file is getting rather large so I cannot leave
the test running overnight. I will run the test again tomorrow
morning and let you know how it goes.
Thanks again...
Patrick
On 03/07/07, Patrick K
Hi Michael
I am setting up the test with the "take2" jar and will let you know
the results as soon as I have them.
Thanks for your help
Patrick
On 03/07/07, Michael McCandless <[EMAIL PROTECTED]> wrote:
OK I opened issue LUCENE-948, and attached a patch & new 2.2.0 JAR.
Please make sure you use the "take2" versions.
OK I opened issue LUCENE-948, and attached a patch & new 2.2.0 JAR.
Please make sure you use the "take2" versions (they have added
instrumentation to help us debug):
https://issues.apache.org/jira/browse/LUCENE-948
Patrick, could you please test the above "take2" JAR? Could you also call
Ind
Hi Michael
I am really pleased we have a potential fix. I will look out for the patch.
Thanks for your help.
Patrick
On 03/07/07, Michael McCandless <[EMAIL PROTECTED]> wrote:
"Patrick Kimber" <[EMAIL PROTECTED]> wrote:
> I am using the NativeFSLockFactory. I was hoping this would have
> stopped these errors.
"Patrick Kimber" <[EMAIL PROTECTED]> wrote:
> I am using the NativeFSLockFactory. I was hoping this would have
> stopped these errors.
I believe this is not a locking issue and NativeFSLockFactory should
be working correctly over NFS.
> Here is the whole of the stack trace:
>
> Caused by: java
I think you should get the "NFS, Lock obtain timed out" exception (that you
mentioned in the subject line) instead of "java.io.FileNotFoundException:",
because if one server is holding a lock on the directory then the other
server will wait until the default lock timeout and will throw a timeout
exception afterwards.
Hi
I am using the NativeFSLockFactory. I was hoping this would have
stopped these errors.
Patrick
On 03/07/07, Neeraj Gupta <[EMAIL PROTECTED]> wrote:
Hi
this is the case where an index created by one server is updated by another
server, resulting in index corruption. This exception occurs while
Hi
this is the case where an index created by one server is updated by another
server, resulting in index corruption. This exception occurs while
creating an instance of IndexWriter because, at the time of IndexWriter
instance creation, it checks whether the index exists or not; if you are not
creating a new
Hi
I have added more logging to my test application. I have two servers
writing to a shared Lucene index on an NFS partition...
Here is the logging from one server...
[10:49:18] [DEBUG] LuceneIndexAccessor closing cached writer
[10:49:18] [DEBUG] ExpirationTimeDeletionPolicy onCommit() delete
Has anyone used LUCENE-794? How stable is it? Is it widely used in
industry?
I have used it extensively and I would say it is extremely stable. As I
said, much of the code from it is literally the same compiled code from
the contrib Highlighter (it is really just a new Scorer class for the