Thanks Paul, this is what I was looking for :)
-Anshul Johri
Noble Paul നോബിള് नोब्ळ् wrote:
>
> Did you take a look at DataImportHandler?
>
> On Wed, Jul 23, 2008 at 1:57 AM, Ravish Bhagdev
> <[EMAIL PROTECTED]> wrote:
>> Can't you write triggers for your database/tables you want to index?
Did you take a look at DataImportHandler?
On Wed, Jul 23, 2008 at 1:57 AM, Ravish Bhagdev
<[EMAIL PROTECTED]> wrote:
> Can't you write triggers for your database/tables you want to index?
> That way you can keep track of all kinds of changes and updates and
> not just addition of a new record.
>
>
On 22-Jul-08, at 4:34 PM, Chris Hostetter wrote:
Hey everybody, I'll be giving a talk called "Apache Solr: Beyond the Box"
at ApacheCon this year, which will focus on the how/when/why of
writing Solr Plugins...
http://us.apachecon.com/c/acus2008/sessions/10
I've got several use cases I can refer to for examples, both from my day
j
How about releasing the preliminary results so we can see if a run-off
is in order!
On Tue, Jul 22, 2008 at 6:37 AM, Mark Miller <[EMAIL PROTECTED]> wrote:
> My opinion: if it's already a runaway, we might as well not prolong things.
> If not though, we should probably give some time for any possib
Yes, it is a cache: it stores an array of document IDs sorted by the
"sort field", together with the sorted fields; query results can intersect
with it and be reordered accordingly.
But the memory requirements should be well documented.
It internally uses WeakHashMap, which is not good(!!!) - a lot of
"unde
I am hoping [new StringIndex (retArray, mterms)] is called only once
per sort field and cached somewhere in Lucene;
theoretically you need to multiply the number of documents by the size of
the field (supposing that the field contains unique text); you need not
tokenize this field; you need not store TermVect
I haven't seen the source code before, but I don't know why the sorting
isn't done after the fetch. Wouldn't that make it faster, at least in the
case of field-level sorting? I could be wrong, and the implementation is
probably better. But I don't know why all of the fields have
Ok, after some analysis of FieldCacheImpl:
- it assumes that the (sorted) enumeration of "terms" is smaller than the
total number of documents
(which is why Solr uses a specific field type for sorted searches:
solr.StrField with omitNorms="true")
It creates an int[reader.maxDoc()] array, checks the (sorted) En
Ok, what is confusing me is the implicit assumption that the FieldCache
contains the "field" and that Lucene uses an in-memory sort instead of the
file-system "index"...
Array size: 100MB (25M x 4 bytes), and it is just pointers (4-byte
integers) to documents in the index.
org.apache.lucene.search.FieldCacheI
Thanks for your help Mark. Lemme explore a little more and see if someone
else can help me out too. :)
> Date: Tue, 22 Jul 2008 16:53:47 -0400
> From: [EMAIL PROTECTED]
> To: solr-user@lucene.apache.org
> Subject: Re: Out of memory on Solr sorting
>
> Someone else is going to have to take over
Someone else is going to have to take over Sundar - I am new to solr
myself. I will say this though - 25 million docs is pushing the limits
of a single machine - especially with only 2 gig of RAM, especially with
any sort fields. You are at the edge I believe.
But perhaps you can get by. Have
Hi Mark,
I am still getting an OOM even after increasing the heap to 1024.
The docset I have is
numDocs : 1138976 maxDoc : 1180554
Not sure how much more I would need. Is there any other way out of this. I
noticed another interesting behavior. I have a Solr setup on a personal B
Hmmm...I think it's 32 bits (4 bytes) per integer, with an entry for each doc, so
**25,000,000 x 4 bytes = 95.3674316 megabytes**
Then you have the string array that contains each unique term from your
index...you can guess that based on the number of terms in your index
and an avg length guess.
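Mark's arithmetic can be sketched as a back-of-the-envelope calculation. This is only an estimate of the FieldCache arrays discussed in this thread; the class and method names below are made up for illustration, and real JVM overhead (object headers, String internals) is ignored.

```java
// Rough estimate of Lucene FieldCache memory for sort fields.
// Illustrative names only; not a Lucene API.
public class FieldCacheMemory {
    /** Bytes for the per-document int array (4 bytes per doc). */
    static long intArrayBytes(long maxDoc) {
        return maxDoc * 4L;
    }

    /** Rough bytes for a String sort field: one int per doc plus the unique terms. */
    static long stringFieldBytes(long maxDoc, long uniqueTerms, long avgTermBytes) {
        return maxDoc * 4L + uniqueTerms * avgTermBytes;
    }

    public static void main(String[] args) {
        // 25M docs -> ~95.37 MB just for the per-document int array
        System.out.println(intArrayBytes(25_000_000L) / (1024.0 * 1024.0));
        // 25M docs, every term unique at ~256 bytes -> ~6.5 GB, in line with
        // the ~6.4 GB figure quoted later in the thread
        System.out.println(
            stringFieldBytes(25_000_000L, 25_000_000L, 256L) / (1024.0 * 1024.0 * 1024.0));
    }
}
```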
Thank you very much Mark,
it explains a lot.
I am guessing: for 1,000,000 documents with a [string] field of
average size 1024 bytes I need 1GB for a single IndexSearcher instance;
the field-level cache is used internally by Lucene (can Lucene manage
its size?); we can't have 1G of such
Mark,
Question: how much memory do I need for 25,000,000 docs if I sort by a
field of 256 bytes? 6.4GB?
Quoting Mark Miller <[EMAIL PROTECTED]>:
Because to sort efficiently, Solr loads the term to sort on for each
doc in the index into an array. For ints, longs, etc. it's just an array
the siz
Can't you write triggers for your database/tables you want to index?
That way you can keep track of all kinds of changes and updates and
not just addition of a new record.
Ravish
On Tue, Jul 22, 2008 at 8:15 PM, anshuljohri <[EMAIL PROTECTED]> wrote:
>
> Hi,
>
> In my project i have to index whol
Fuad Efendi wrote:
SEVERE: java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object
size: 100767936, Num elements: 25191979
I just noticed, this is an exact number of documents in index: 25191979
(http://www.tokenizer.org/, you can sort - click headers Id, [Country,
Site, Price] in a tab
Sorry, Not 30, but 300 :)
From: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Subject: RE: Out of memory on Solr sorting
Date: Tue, 22 Jul 2008 20:19:49 +
Thanks for the explanation Mark. The reason I had it as 512 max was because
earlier the data file was just about 30 megs and it increased to this much for of
Thanks for the explanation Mark. The reason I had it as 512 max was because
earlier the data file was just about 30 megs, and it increased to this much
because of the usage of EdgeNGramFactoryFilter for 2 fields. That's great to
know it just happens for the first search. But this exception has been occur
SEVERE: java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object
size: 100767936, Num elements: 25191979
I just noticed, this is an exact number of documents in index: 25191979
(http://www.tokenizer.org/, you can sort - click headers Id, [Country,
Site, Price] in a table; experimental)
I've even seen exceptions (posted here) when "sort"-type queries
caused Lucene to allocate 100Mb arrays, here is what happened to me:
SEVERE: java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object
size: 100767936, Num elements: 25191979
at
org.apache.lucene.search.FieldCacheImpl$1
Because to sort efficiently, Solr loads the term to sort on for each doc
in the index into an array. For ints, longs, etc. it's just an array the
size of the number of docs in your index (deleted or not, I believe). For
a String it's an array to hold each unique string and an array of ints
indexing
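The structure Mark describes can be sketched as follows. This is a simplified illustration of what Lucene 2.x builds for String sort fields (the thread's `new StringIndex(retArray, mterms)` refers to `FieldCache.StringIndex`); the sketch class and its method names are mine, not Lucene's. One int per document points into an array holding each unique term once, so comparing two docs during a sort is two array lookups with no disk access.

```java
// Simplified sketch of a FieldCache.StringIndex-like structure.
public class StringIndexSketch {
    final String[] lookup; // each unique term, stored once, in sorted order
    final int[] order;     // order[docId] = index into lookup

    StringIndexSketch(String[] lookup, int[] order) {
        this.lookup = lookup;
        this.order = order;
    }

    /** Compare two documents by their sort field, as a sort comparator would. */
    int compareDocs(int docA, int docB) {
        // Terms are stored in sorted order, so the ordinals compare directly.
        return Integer.compare(order[docA], order[docB]);
    }

    public static void main(String[] args) {
        // 4 docs sharing 3 unique terms
        String[] terms = {"apple", "banana", "cherry"};
        int[] order = {2, 0, 1, 0}; // doc0 -> cherry, doc1 -> apple, ...
        StringIndexSketch idx = new StringIndexSketch(terms, order);
        System.out.println(idx.compareDocs(1, 0) < 0); // true: apple sorts before cherry
    }
}
```

This also shows why memory scales with maxDoc plus the unique-term count: `order` always has one slot per document, whether or not the doc has a value.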
Thanks Fuad.
But why does just sorting cause an OOM? I executed the
query without the sort clause and it executed perfectly. In fact I even
tried removing maxrows=10 and executing; it came out fine. Queries with
bigger results seem to come out fine too. But why just sort
org.apache.lucene.search.FieldCacheImpl$10.createValue(FieldCacheImpl.java:403)
- this piece of code does not request an Array[100M] (as I have seen with
Lucene); it asks for only a few bytes / KB per field...
Probably 128 - 512 is not enough; it is also advisable to use equal sizes
-Xms1024M -Xmx1024M
(i
Doh! I mistakenly changed the request handler from dismax to standard.
Ignore me...
Jason
On Tue, Jul 22, 2008 at 2:59 PM, Jason Rennie <[EMAIL PROTECTED]> wrote:
> I'm using solrj and all I did was add a pf entry to solrconfig.xml. I
> don't think it could be an ampersand issue...
>
> Here's
Hi,
In my project I have to index a whole database which contains text data only.
So if I follow an incremental indexing approach, my problem is how will
I pick the delta data from the database. Is there any utility in Solr to keep
track of the last indexed record? Or is there any other approach to solve
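Besides triggers, a common approach (and roughly what DataImportHandler's delta-import does with its last_index_time property) is to store the timestamp of the last successful index run and select only rows modified since then. A minimal sketch, assuming a `documents` table with a `last_modified` column; both names are hypothetical, not from the thread:

```java
// Sketch of delta indexing without triggers: remember when the last run
// finished and fetch only rows modified after that point.
public class DeltaQueryBuilder {
    /** Build the delta SQL for a table/timestamp column (names are assumptions). */
    static String deltaQuery(String table, String tsColumn, String lastRunIso) {
        return "SELECT * FROM " + table
             + " WHERE " + tsColumn + " > '" + lastRunIso + "'";
    }

    public static void main(String[] args) {
        // After each successful run, persist the new timestamp for next time.
        System.out.println(deltaQuery("documents", "last_modified", "2008-07-22 00:00:00"));
    }
}
```

The trade-off versus triggers: this catches inserts and updates cheaply, but deletes need either a trigger-maintained log table or a soft-delete flag, since deleted rows no longer appear in the delta query.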
> From: [EMAIL PROTECTED]
> To: solr-user@lucene.apache.org
> Subject: Out of memory on Solr sorting
> Date: Tue, 22 Jul 2008 19:11:02 +
>
>
> Hi,
> Sorry again, fellows. I am not sure what's happening. The day with Solr is bad
> for me I guess. EZMLM didn't let me send any mails this morning
Hi,
Sorry again, fellows. I am not sure what's happening. The day with Solr is bad
for me I guess. EZMLM didn't let me send any mails this morning. It asked me to
confirm my subscription and when I did, it said I was already a member. Now my
mails are all coming out bad. Sorry for troubling y'all this ba
I'm using solrj and all I did was add a pf entry to solrconfig.xml. I don't
think it could be an ampersand issue...
Here's an example query:
wt=xml&rows=10&start=0&q=urban+outfitters&qt=recsKeyword&version=2.2
Here's qt config:
0.06
name^1.5 tags description^0
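The qt config above lost its XML markup in extraction; only the values `0.06` and the boost list survived (the `description` boost itself is truncated). In Solr 1.2 a dismax handler of this shape would typically look like the sketch below; the boost values here are placeholders for illustration, not a reconstruction of Jason's actual config:

```xml
<!-- solrconfig.xml sketch of a dismax handler (Solr 1.2 style);
     boost values are placeholders, not the original config -->
<requestHandler name="recsKeyword" class="solr.DisMaxRequestHandler">
  <lst name="defaults">
    <float name="tie">0.06</float>
    <str name="qf">name^1.5 tags description^0.5</str>
    <str name="pf">name^2.0</str>
    <str name="fl">id,name,score</str>
  </lst>
</requestHandler>
```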
Sorry for that. I didn't realise my mail had finally arrived. Sorry!!!
From: [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Subject: OOM on Solr Sort
Date: Tue, 22 Jul 2008 18:33:43 +
Hi,
We are developing a product in an agile manner and the current
implementation has data of size ju
On 22-Jul-08, at 11:53 AM, Jason Rennie wrote:
Just tried adding a pf field to my request handler. When I did
this, solr
returned all document fields for each doc (no "score") instead of
returning
the fields specified in fl. Bug? Feature? Anyone know what the
reason for
this behavior
Just tried adding a pf field to my request handler. When I did this, solr
returned all document fields for each doc (no "score") instead of returning
the fields specified in fl. Bug? Feature? Anyone know what the reason for
this behavior is? I'm using solr 1.2.
Thanks,
Jason
Hi, We are developing a product in an agile manner and the current
implementation has data of size just about 800 megs in dev. The memory
allocated to Solr on dev (dual-core Linux box) is 128-512. My config=
trueMy Field===
Chris Hostetter wrote:
: http://people.apache.org/~shalin/poll.html
Except the existing Solr logo isn't on that list.
i smell election tampering :)
I had put it in my poll :) I actually considered bringing that up to
Shalin as well, but couldn't bring myself to be so fair I suppose
Serious
: http://people.apache.org/~shalin/poll.html
Except the existing Solr logo isn't on that list.
i smell election tampering :)
Seriously though: I realized a long time ago that there was too much email
to reply too, too many features to work on, too many patches to review,
and too few hours in
I'm somewhat perplexed: under what circumstances would you be able to
send one query to Solr but not two?
-Mike
On 21-Jul-08, at 8:37 PM, Jon Baer wrote:
Well that's my problem ... I can't :-)
When you put fq=doctype:news in there you can't get an explicit
facet.query; it will only let
lookups : how many times the cache is referenced
hits : how many times the cache hits
hitratio : hits/lookups
and for other items, see my previous mail at:
http://www.nabble.com/about-cache-to10192953.html
Koji
Marshall Gunter wrote:
Can someone point me to an in depth explanation of the Solr c
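For the first three statistics Koji lists, the relationship is simply hitratio = hits / lookups; a trivial sketch (the class and method names are mine, not Solr's):

```java
// The relationship between the cache statistics on the Solr admin page:
// hitratio is just hits divided by lookups.
public class CacheStats {
    static double hitratio(long hits, long lookups) {
        // Guard against division by zero before any lookups have happened.
        return lookups == 0 ? 0.0 : (double) hits / lookups;
    }

    public static void main(String[] args) {
        System.out.println(hitratio(75, 100)); // 0.75
    }
}
```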
Indeed - one of my shards had it listed as "text". Doh!
Thanks for the assurance that led me to find my bug.
On Tue, Jul 22, 2008 at 11:43 AM, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> On Tue, Jul 22, 2008 at 11:39 AM, Ian Connor <[EMAIL PROTECTED]> wrote:
>> > omitNorms="true"/>
>
> This will giv
Lucene index corrupted... which hard drive do you use?
Quoting Rohan <[EMAIL PROTECTED]>:
Hi Guys,
This is my first post. We are running Solr with multiple indexes, 20
indexes. I'm facing a problem with the 5th one. I'm not able to run optimize on
that index. I'm getting the following error. Your help is
Lucene has a maxFieldLength (the number of tokens to index for a given
field name).
It can be configured via solrconfig.xml:
1
-Yonik
On Tue, Jul 22, 2008 at 11:38 AM, Tom Lord <[EMAIL PROTECTED]> wrote:
> Hi, we've looked for info about this issue online and in the code and am
> none the wis
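The element value after "solrconfig.xml:" was lost in extraction above. For reference, in Solr 1.x this setting defaults to 10000 tokens per field; a sketch of raising it (the exact limit chosen here is arbitrary):

```xml
<!-- solrconfig.xml: cap on the number of tokens Lucene indexes per field.
     The Solr 1.x default is 10000; raise it to index long documents fully. -->
<mainIndex>
  <maxFieldLength>2147483647</maxFieldLength>
</mainIndex>
```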
On Tue, Jul 22, 2008 at 11:39 AM, Ian Connor <[EMAIL PROTECTED]> wrote:
> omitNorms="true"/>
This will give you an exact match. As I said, if it's not, then you
didn't restart and reindex, or you are querying the wrong field.
-Yonik
At the moment for "string", I have:
is there an example type so that it will do exact matches?
Would "alphaOnlySort" do the trick? It looks like it might.
On Tue, Jul 22, 2008 at 11:20 AM, Yonik Se
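The schema markup in Ian's message was stripped in extraction. For reference, an exact-match setup in schema.xml usually looks like the sketch below: solr.StrField indexes the whole value as a single token, so only an exact value matches. The field name `title_exact` and the copyField are my own illustration, not from the thread:

```xml
<!-- schema.xml sketch: an untokenized string type gives exact matching -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>

<!-- keep the tokenized field for normal search, add an exact-match copy -->
<field name="title"       type="text"   indexed="true" stored="true"/>
<field name="title_exact" type="string" indexed="true" stored="true"/>
<copyField source="title" dest="title_exact"/>
```

With this, title_exact:Nature matches only documents whose title is exactly "Nature", while title:Nature still matches "Nature Cell Biology".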
Hi, we've looked for info about this issue online and in the code and are
none the wiser - help would be much appreciated.
We are indexing the full text of journals using Solr. We currently pass
in the journal text, up to maybe 130 pages, and index it in one go.
We are seeing Solr stop indexing af
Hi Guys,
This is my first post. We are running Solr with multiple indexes, 20
indexes. I'm facing a problem with the 5th one. I'm not able to run optimize
on that index. I'm getting the following error. Your help is really appreciated.
java.io.IOException: read past EOF
at
org.apache.lucene.store.B
On Tue, Jul 22, 2008 at 11:08 AM, Ian Connor <[EMAIL PROTECTED]> wrote:
> How can I require an exact field match in a query. For instance, if a
> title field contains "Nature" or "Nature Cell Biology", when I search
> title:Nature I only want "Nature" and not "Nature Cell Biology". Is
> that someth
On Tue, Jul 22, 2008 at 8:37 PM, Geoffrey Young <[EMAIL PROTECTED]>
wrote:
>
>
> Shalin Shekhar Mangar wrote:
>
>> The problems you described in the spellchecker are noted in
>> https://issues.apache.org/jira/browse/SOLR-622 -- I shall create an issue
>> to
>> synchronize spellcheck.build so that
On Tue, Jul 22, 2008 at 11:07 AM, Geoffrey Young
<[EMAIL PROTECTED]> wrote:
> Shalin Shekhar Mangar wrote:
>>
>> The problems you described in the spellchecker are noted in
>> https://issues.apache.org/jira/browse/SOLR-622 -- I shall create an issue
>> to
>> synchronize spellcheck.build so that the
How can I require an exact field match in a query? For instance, if a
title field contains "Nature" or "Nature Cell Biology", when I search
title:Nature I only want "Nature" and not "Nature Cell Biology". Is
that something I do as a query, or do I need to re-index with the
field defined in a cert
Shalin Shekhar Mangar wrote:
The problems you described in the spellchecker are noted in
https://issues.apache.org/jira/browse/SOLR-622 -- I shall create an issue to
synchronize spellcheck.build so that the index is not corrupted.
I'd like to discuss this a little...
I'm not sure that I want
Can someone point me to an in depth explanation of the Solr cache
statistics? I'm having a hard time finding it online. Specifically, I'm
interested in these fields that are listed on the Solr admin statistics
pages in the cache section:
lookups
hits
hitratio
inserts
evictions
size
cumulative_
It seems that spellchecker works great except all the "7 words you
can't say on TV" resolve to very important people. Is there a way to
exclude certain words so they don't resolve?
Thanks.
- Jon
All facet counts currently returned are _within_ the set of documents
constrained by query (q) and filter query (fq) parameters - just to
clarify what it does. Why? That's the general use case. Returning
back counts from differently constrained sets requires some custom
coding - perhaps
This is *exactly* my issue ... very nicely worded :-)
I would have thought facet.query=*:* would have been the solution, but
it does not seem to work. I'm interested in getting these *total*
counts for UI display.
- Jon
On Jul 22, 2008, at 6:05 AM, Stefan Oestreicher wrote:
Hi,
I have a
My opinion: if it's already a runaway, we might as well not prolong
things. If not though, we should probably give some time for any
possible laggards. The 'admin look' poll received its first 19-20 votes
in the first night / morning, and has only gotten 2 or 3 since then, so
probably no use goi
28 votes so far and counting!
When should we close this poll?
On Tue, Jul 22, 2008 at 1:18 AM, Mark Miller <[EMAIL PROTECTED]> wrote:
> Perfect! Thank you Shalin. Much appreciated, and a dead simple system. My
> vote is in.
>
> - Mark
>
>
> Shalin Shekhar Mangar wrote:
>
>> Will this do? A 1-5 f
Hi,
I have a category field in my index which I'd like to use as a facet.
However my search frontend only allows you to search in one category at a
time for which I'm using a filter query. Unfortunately the filter query
restricts the facets as well.
My query looks like this:
?q=content:foo&fq=cat
hi,
try using faceted search,
http://wiki.apache.org/solr/SimpleFacetParameters
something like facet=true&facet.query=title:("web2.0" OR "ajax")
facet.query - gives the number of matching documents for a query.
You can run the examples in the above link and see how it works.
You can also try u
On Jul 22, 2008, at 5:08 AM, Adrian M Bell wrote:
We have a catalogue of documents that we have a solr index on. We
need to
provide an alphabetical search, so that a user can list all
documents with a
title beginning A, B and so on...
So how do we do this?
Currently we have built up the f
Ok this might be a simple one, or more likely, my understanding of solr is
shot to bits
We have a catalogue of documents that we have a solr index on. We need to
provide an alphabetical search, so that a user can list all documents with a
title beginning A, B and so on...
So how do we do th
Hi All,
I am working on a module using Solr, where I want to get the stats of
each keyword found in each field.
If my search term is: (title:("web2.0" OR "ajax") OR
description:("web2.0" OR "ajax"))
Then I want to know how many times web2.0/ajax were found in title or
description.
Any suggestio