: Subject: Changing the Score of a Document.
http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists
When starting a new discussion on a mailing list, please do not reply to
an existing message; instead, start a fresh email. Even if y
: Is it possible to integrate Nutch into MS Search Server via the OpenSearch API?
you'll probably find someone who can answer this question on the
[EMAIL PROTECTED] mailing list.
-Hoss
We did this in our system, indexing a constant flow of news articles, by doing
as Otis described (reopening the IndexSearcher).
Every third minute we create a new IndexSearcher in the background; after
this searcher has been created, we fire some warm-up queries against it,
and after t
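The pattern described above can be sketched roughly like this (a hypothetical helper class against Lucene 2.x-era APIs; the class name, field layout, and swap logic are illustrative, not from the original post):

```java
import java.io.IOException;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

public class WarmingSearcher {
    private volatile IndexSearcher current;
    private final String indexPath;

    public WarmingSearcher(String indexPath) throws IOException {
        this.indexPath = indexPath;
        this.current = new IndexSearcher(indexPath);
    }

    /** Called from a background timer, e.g. every few minutes. */
    public synchronized void reopen(Query[] warmupQueries) throws IOException {
        IndexSearcher fresh = new IndexSearcher(indexPath); // sees latest commit
        for (Query q : warmupQueries) {
            Hits hits = fresh.search(q); // populate caches before serving traffic
        }
        IndexSearcher old = current;
        current = fresh; // swap in the warmed searcher
        old.close();     // real code must wait for in-flight searches to finish
    }

    public IndexSearcher getSearcher() {
        return current;
    }
}
```

The point of warming before the swap is that live queries never hit a cold searcher; the old one is only closed once the fresh one is ready.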
Try searching this list for Arabic Stemmer. I seem to recall one
under a GPL license. Also try Googling "arabic Lucene analyzer"
-Grant
On Jan 16, 2008, at 1:21 PM, Liaqat Ali wrote:
Hi
Kindly tell me about some open source Arabic Stemmer which can be
used with
Lucene.
Regards,
Liaqat
A non-clustered or clustered index has solved the problem in the database, but
can't Lucene do something similar?
On Jan 16, 2008 11:44 PM, <[EMAIL PROTECTED]> wrote:
> > I can use the clustered index on the table. But you can create only one
> > clustered index in a table. In this table, lots of data ne
The index contains several tens of thousands of documents, with a field
count of about fifty. The index is rebuilt approximately once a
day, but this varies, since the searchable content doesn't change very often.
Now I face the challenge of working more dynamic data into the index,
and even ma
A couple of ideas, I guess...
Rather than use queries (which are so much more difficult), just build an index
that contains documents that are simply lists of keywords (each representing a
profile 'query'). Use the MoreLikeThis class from contrib to search that
index using your source document. The hits you get
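The suggestion above could look roughly like this (a sketch against the contrib/queries MoreLikeThis class from Lucene 2.x; the index path and "keywords" field name are assumptions for illustration):

```java
import java.io.StringReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.similar.MoreLikeThis; // contrib/queries

public class ProfileMatcher {
    public static Hits match(String sourceDocText) throws Exception {
        // Index whose documents are keyword lists, one per stored profile
        IndexReader reader = IndexReader.open("/path/to/profile-index");
        MoreLikeThis mlt = new MoreLikeThis(reader);
        mlt.setFieldNames(new String[] { "keywords" });
        // Turn the source document's text into a similarity query...
        Query query = mlt.like(new StringReader(sourceDocText));
        // ...and the hits are the profiles that match the document.
        return new IndexSearcher(reader).search(query);
    }
}
```

This inverts the usual direction: instead of running many stored queries against one document, the document itself becomes the query against an index of profiles.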
On Jan 16, 2008 2:13 PM, Alexei Dets <[EMAIL PROTECTED]> wrote:
> Hi!
> Yonik Seeley wrote:
> > On Jan 15, 2008 7:15 PM, Alexei Dets <[EMAIL PROTECTED]> wrote:
> >> I'm curious, is there any particular reason why Lucene offers
> >> IndexReader.deleteDocument(int docNum) but not
> >> IndexWriter.del
Hi!
Yonik Seeley wrote:
> On Jan 15, 2008 7:15 PM, Alexei Dets <[EMAIL PROTECTED]> wrote:
>> I'm curious, is there any particular reason why Lucene offers
>> IndexReader.deleteDocument(int docNum) but not
>> IndexWriter.deleteDocument(int docNum)?
>
> Document ids are transient and can change.
I
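For context, the transience of document numbers is why deletion through IndexWriter (available since Lucene 2.1 as deleteDocuments) works by Term rather than by doc number; a minimal sketch, where the index path, analyzer, and "id" field are assumptions:

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;

public class DeleteById {
    public static void main(String[] args) throws Exception {
        // false = open an existing index rather than create a new one
        IndexWriter writer =
            new IndexWriter("/path/to/index", new StandardAnalyzer(), false);
        // Delete by a stable application-level key, not a transient doc number
        writer.deleteDocuments(new Term("id", "42"));
        writer.close();
    }
}
```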
The norms are modified so that each norm value is stored as 4 bytes instead of 1 byte;
this modification uses more memory. But anyway, the hardware we are running on is
2x 8-CPU HP servers with 16 GB of RAM in each of them.
We are scaling the index across date ranges (and the ranking is modified to sort by
date).
Hi
Kindly tell me about some open source Arabic Stemmer which can be used with
Lucene.
Regards,
Liaqat Ali
Don't have any info to add, but out of curiosity, what kind of setup are you
using to host the 300 mil archive? Is the index distributed? Single machine?
Solr?
Thanks,
Mark
On Jan 16, 2008 12:27 PM, Marcus Falk <[EMAIL PROTECTED]> wrote:
> Hi again,
>
>
>
> Today we are hosting a 300 million la
Interesting question. Does zero-padding make primary-key lookups faster or
slower in Lucene?
From my tests it would seem that non-padded keys are quicker to look up than
zero-padded ones (tested by doing random access on indexes of varying sizes, up to
5M unique keys).
However, I imagine there could
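The reason zero-padding comes up at all can be shown without Lucene (the helper below is illustrative, not from the thread): Lucene's term-based range queries compare keys as strings, so numeric order must match lexicographic order.

```java
public class PaddedKeys {
    // Left-pad to 19 digits so string order matches numeric order
    // (19 digits is enough for any non-negative long).
    public static String pad(long id) {
        return String.format("%019d", id);
    }

    public static void main(String[] args) {
        System.out.println("9".compareTo("10") > 0);       // unpadded: wrong order
        System.out.println(pad(9).compareTo(pad(10)) < 0); // padded: correct order
    }
}
```

The padding trades term length (and index size) for correct range behavior, which is consistent with the observation above that lookups on shorter, unpadded terms were faster.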
Hi again,
Today we are hosting a search index of 300 million documents without any
problems in a Lucene environment, with just some customization in the
Lucene API for ranking etc.
So we are really satisfied with Lucene.
We also have the demands to search with documents on profiles we are
David Vazquez Landa wrote:
Uhmm... A simple question:
I have a lucene index (the directory with the segment* files) in HDFS.
This index is created by Nutch (which accesses files in HDFS seamlessly). My
question is whether there is a way of reading this Lucene index without having
to copy it to the local filesystem first...
Thanks
> I can use the clustered index on the table. But you can create only one
> clustered index in a table. In this table, lots of data needs
> to be searched, so I
> chose Lucene to do that.
Why do you need a clustered index in the database?
A non-clustered one would do the job as well.
--
Hello,
When storing fields to serve as IDs - is it better to use
NumberTools.longToString(id) or just store the id as a plain field?
I have noticed that when using NumberTools to store a number as a string,
range queries become easier; however, you end up storing a long string.
Considering millions of id
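For a sense of the trade-off, here is an illustrative fixed-width base-36 encoding of my own (it is not Lucene's NumberTools, whose exact on-disk format differs): it keeps lexicographic order equal to numeric order for non-negative longs while using 13 characters instead of 19 decimal digits.

```java
public class Base36Keys {
    static final int WIDTH = 13; // enough base-36 digits for Long.MAX_VALUE

    // Encode a non-negative long as a fixed-width, sortable base-36 string.
    // (Negative ids would need an offset first; NumberTools handles negatives
    // its own way.)
    public static String encode(long id) {
        String s = Long.toString(id, 36);
        StringBuilder sb = new StringBuilder(WIDTH);
        for (int i = s.length(); i < WIDTH; i++) {
            sb.append('0'); // '0' is the lowest base-36 digit, so order is kept
        }
        return sb.append(s).toString();
    }
}
```

Shorter terms mean a smaller term dictionary, which matters when you are storing millions of ids.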
I can use the clustered index on the table. But you can create only one
clustered index in a table. In this table, lots of data needs to be searched, so I
chose Lucene to do that.
On Jan 16, 2008 6:57 PM, <[EMAIL PROTECTED]> wrote:
> > firstly, I submit the query like "select * from [tablename]".
>
As I read your latest post, it's not *searching* that's taking too long, but
*indexing*.
Well, 100,000,000 rows is a lot. It'll never take just a few minutes. But I also
have to ask whether most of the time is being spent actually indexing or
fetching from the database? You could time this easily by ju
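A trivial way to split that measurement (a hypothetical helper; swap the empty Runnables for your actual JDBC fetch loop and Lucene addDocument loop):

```java
public class PhaseTimer {
    // Run one phase and return its wall-clock duration in milliseconds.
    public static long timeMillis(Runnable phase) {
        long start = System.nanoTime();
        phase.run();
        return (System.nanoTime() - start) / 1000000L;
    }

    public static void main(String[] args) {
        long fetchMs = timeMillis(new Runnable() {
            public void run() { /* fetch rows over JDBC here */ }
        });
        long indexMs = timeMillis(new Runnable() {
            public void run() { /* writer.addDocument(...) loop here */ }
        });
        System.out.println("fetch=" + fetchMs + "ms index=" + indexMs + "ms");
    }
}
```

If the fetch phase dominates, no amount of Lucene tuning will shorten the total.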
As I remember from various threads, toString is more of
a debugging aid, and you cannot completely rely on the
transformation from a parsed query -> toString -> parsed query
being reliable. But this is "something I remember", so take
it with a grain of salt (you might want to search the mail
archive
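The round-trip in question looks like this (a sketch against the Lucene 2.x QueryParser; the "contents" field and the analyzer choice are assumptions), and the caveat above is that the reparsed query is not guaranteed to equal the original:

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;

public class QueryRoundTrip {
    public static Query reparse(Query original) throws Exception {
        // Serialize: toString() is human-readable, not a guaranteed wire format
        String saved = original.toString("contents");
        // Reconstruct: may tokenize or escape differently than the original
        QueryParser parser = new QueryParser("contents", new StandardAnalyzer());
        return parser.parse(saved);
    }
}
```

Adding clauses afterwards is straightforward once you have a Query back: wrap it in a new BooleanQuery and add further clauses alongside it.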
Hi,
Looking into the code of IndexMergeTool I saw this:
IndexWriter writer = new IndexWriter(mergedIndex, new SimpleAnalyzer(),
true);
Then the indexes are added to this new index.
My question is:
How does the Analyzer of this IndexWriter instance affect the merge process?
It seems that it do
> Firstly, I submit a query like "select * from [tablename]".
> And in this
> table, there are around 30 columns and 40,000 rows of data.
> And I use the
> StandardAnalyzer to generate the index.
Why don't you use a database index?
-
Hi
I am sure that this topic has been discussed on this forum before, so sorry
to ask again!
Let's suppose a Document d1 containing the five terms:
a
b
C
D
and a query (a AND b).
The document d1 is relevant and will be retrieved; typically, its score
will be a function of tf*idf relative to the
Hi ,
I want to construct a query from a string. How can I do it? Actually, I
saved a query (a BooleanQuery) as a string (using query.toString()).
Is there a way to reconstruct the query from the string I saved? How can I
add more clauses to the reconstructed query?
Thanks in advance.
Prabin
Firstly, I submit a query like "select * from [tablename]". And in this
table, there are around 30 columns and 40,000 rows of data. And I use the
StandardAnalyzer to generate the index. And in my experience, it costs about
200 MB of disk to store the index.
For example, I will search the "Name" field in t
Hello,
Not exactly - a document represents an edge, having src and dst node ids.
Nodes can be kept in another index or the same one. I can find the number of
edges by running a boolean term query.
Currently I am looking for a way to distribute indexes, but in such a way
that when querying you know whic
On 15 Jan 2008, at 20:31, Michael Prichard wrote:
When I run through and delete a few documents from my index, is it
wise to call .flush() afterwards? Or is it better to close the index?
Close means flush, but also releasing the write lock. Which to
use really depends on how your service is im