Hi All,
Since there is no way of controlling the size of Lucene's internal
FieldCache, how can we make sure that we are making good use of it? One of
my shards has close to 1.5M documents and the fieldCache only contains about
10 elements.
Is there anything we can do to control the
FieldCache with a binary field?
--
Kind regards,
Mathias
This is probably because you have only 10 unique terms in your indexed field.
BTW, what do you mean by controlling the FieldCache?
Maybe my understanding of the FieldCache is wrong. I thought this
was the main cache for Lucene. Is that right?
Thanks for your feedback
-Original Message-
From: pravesh [mailto:suyalprav...@yahoo.com]
Sent: May-26-11 2:58 AM
To: solr-user@lucene.apache.org
Subject: Re: FieldCache
This is probably because you have only 10 unique terms in your indexed field.
The fieldCache stores one entry for each field that is used for sorting or for
field faceting when you use the fieldCache (fc) method. Before Solr 1.4, the
method for field faceting was the enum method, which executes a filter query
for each unique value of the field and stores it in the filterCache.
Since FieldCache is an expert-level API in Lucene, neither Solr nor Lucene
provides any direct control over its size.
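For what it's worth, you can still choose the faceting mechanism per field at
request time. A hedged example using standard facet parameters (the field name
"country" is only an illustration):

    facet=true&facet.field=country&f.country.facet.method=enum   (filterCache-based)
    facet=true&facet.field=country&f.country.facet.method=fc     (FieldCache-based)

This does not shrink an already-populated FieldCache; it only controls whether
faceting on that field uses it at all.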
T.getTerms returns empty BytesRefs.
>
> Then I found the following post:
> http://www.mail-archive.com/d...@lucene.apache.org/msg05403.html
>
> How can I use the FieldCache with a binary field?
>
> --
> Kind regards,
> Mathias
>
>
... to binary. I still use the EmbeddedSolrServer to
create the indices.
Also, I had to remove the uniqueKey node because binary fields cannot be
indexed, which is a requirement for the unique key.
After reindexing I discovered that non-indexed or binary fields cannot be used
with the FieldCache.
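For reference, a hedged sketch of the kind of schema.xml declarations being
described (field and type names are illustrative; solr.BinaryField is
stored-only, which is why it cannot serve as the uniqueKey):

    <fieldType name="binary" class="solr.BinaryField"/>
    <field name="payload" type="binary" stored="true" indexed="false"/>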
On Mon, Oct 25, 2010 at 3:41 AM, Mathias Walter wrote:
> I indexed about 90 million sentences and the PAS (predicate argument
> structures) they consist of (which are about 500 million). Then
> I try to do NER (named entity recognition) by searching about 5 million
> entities. For each entity I
Hi Mathias,
> [...] I tried to use IndexableBinaryStringTools to re-encode my 11 byte
> array. The size was increased to 7 characters (= 14 bytes),
> which is still a gain of more than 50 percent compared to the UTF-8
> encoding. BTW: I found no sample of how to use the
> IndexableBinaryStringTools class.
Hi Robert,
On 10/25/2010 at 8:20 AM, Robert Muir wrote:
> it is deprecated in trunk, because you can index binary terms (your
> own byte[]) directly if you want. To do this, you need to use a custom
> AttributeFactory.
It's not actually deprecated yet.
> See src/test/org/apache/lucene/index/Test2BTerms.java.
On Mon, Oct 25, 2010 at 9:00 AM, Steven A Rowe wrote:
> It's not actually deprecated yet.
you are right! only in my patch!
> AFAICT, Test2BTerms only deals with the indexing side of this issue, and
> doesn't test searching.
>
> LUCENE-2551 does, however, test searching. Why hasn't this been committed?
Hi,
> On Mon, Oct 25, 2010 at 3:41 AM, Mathias Walter
> wrote:
> > I indexed about 90 million sentences and the PAS (predicate argument
> structures) they consist of (which are about 500 million). Then
> > I try to do NER (named entity recognition) by searching about 5 million
> entities. For eac
On Mon, Oct 25, 2010 at 3:41 PM, Mathias Walter wrote:
> How do I use it with Solr, i. e. how to set up a schema.xml using a custom
> AttributeFactory?
>
at the moment there is no way to specify an AttributeFactory
(AttributeFactoryFactory? heh) in the schema.xml, nor do the
TokenizerFactories
On Mon, 2010-10-25 at 09:41 +0200, Mathias Walter wrote:
> [...] I enabled the field cache for my ID field and another
> single char field (PAS type) to get the benefit of accessing
> the fields with an array. Unfortunately, the IDs are too
> large to fit in memory. I gave 12 GB of RAM to each node
Hi,
> > [...] I tried to use IndexableBinaryStringTools to re-encode my 11 byte
> > array. The size was increased to 7 characters (= 14 bytes)
> > which is still a gain of more than 50 percent compared to the UTF8
> > encoding. BTW: I found no sample how to use the
> > IndexableBinaryStringTools class.
I have an index with several fields, but just one stored: ID (string,
unique).
I need to access that ID field for each of the top N docs in my
results (this is done inside a handler I wrote); the code looks like:
Hits hits = searcher.search(query);
for(int i=0; i<hits.length(); i++) {
    id[i] = hits.doc(i).get("ID");
    score[i] = hits.score(i);
}
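For comparison, a hedged sketch of the FieldCache-based variant discussed
later in the thread (assumes searcher is an IndexSearcher, that "ID" is
indexed and untokenized, and the pre-Lucene-4 getStrings API):

    String[] ids = FieldCache.DEFAULT.getStrings(searcher.getIndexReader(), "ID");
    for (int i = 0; i < hits.length(); i++) {
        id[i] = ids[hits.id(i)];   // array lookup by internal doc id, no stored-field read
        score[i] = hits.score(i);
    }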
Dear list,
after getting an OOM exception after one week of operation with
Solr 3.2, I used MemoryAnalyzer on the heap dump file.
It looks like the fieldCache eats up all the memory.
Objects    Shallow Heap    Retained Heap
Hi,
Can anyone confirm Lucene FieldCache memory requirements? I have 100
million docs with a non-tokenized field "country" (10 different countries); I
expect it to require an array of int (or long) values with a size of
100,000,000, regardless of the actual "country" values
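As a rough back-of-the-envelope (illustrative numbers, assuming the
getStringIndex representation discussed below: one int ordinal per document
plus the unique values):

    100,000,000 docs x 4 bytes (int ordinal)  ~ 400 MB
    + 10 unique country strings               ~ negligible
    = roughly 0.4 GB for this one field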
Hi Mathias,
> > > I assume that the char[] returned from
> > > IndexableBinaryStringTools.encode is encoded in UTF-8 again
> > > and then stored. At some point the information is lost and
> > > cannot be recovered.
> >
> > Can you give an example? This should not happen.
>
> My character array r
On Sat, Nov 13, 2010 at 1:50 PM, Steven A Rowe wrote:
> Looks to me like the returned value is in a Solr-internal form of XML
> character escaping: \u0000 is represented as "#0;" and \u0008 is represented
> as "#8;". (The escaping code is in
> solr/src/java/org/apache/solr/common/util/XML.java.)
Y
On 11/13/2010 at 2:04 PM, Yonik Seeley wrote:
> On Sat, Nov 13, 2010 at 1:50 PM, Steven A Rowe wrote:
> > Looks to me like the returned value is in a Solr-internal form of XML
> > character escaping: \u0000 is represented as "#0;" and \u0008 is
> > represented as "#8;". (The escaping code is in
> >
like:
>
> Hits hits = searcher.search(query);
> for(int i=0; i<hits.length(); i++) {
>     id[i] = hits.doc(i).get("ID");
>     score[i] = hits.score(i);
> }
>
> I noticed that retrieving the doc is slow.
>
> If I use the FieldCache, like:
>     id[i] = FieldCache.DEFAULT.getStrings(...)
About the stored/indexed difference: ID is a string (= solr.StrField), so the
FieldCache gives me what I need.
I'm just wondering: as this cached object could be (theoretically)
pretty big, do I need to be aware of some OOM? I know that FieldCache
uses WeakHashMaps, so I presume the cached array for the
On 9/20/07, Walter Ferrara <[EMAIL PROTECTED]> wrote:
> I'm just wondering, as this cached object could be (theoretically)
> pretty big, do I need to be aware of some OOM? I know that FieldCache
> use weakmaps, so I presume the cached array for the older reader(s) will
> b
: Hits hits = searcher.search(query);
: for(int i=0; i
On 9/20/07, Walter Ferrara <[EMAIL PROTECTED]> wrote:
> I have an index with several fields, but just one stored: ID (string,
> unique).
> I need to access that ID field for each of the top N docs in my
> results (this is done inside a handler I wrote), code looks like:
>
> Hits hits =
bq
parameter).
It seems the bq is the cause of the misery.
Issue SOLR- keeps popping up but it has not been resolved. Is there anyone
who can confirm that one of those patches fixes this issue before I waste hours
of work finding out it doesn't? ;)
Am I correct when I assume that Lucene Fi
Hi,
I can see only the fieldCache (nothing about the filter, query or document
caches) on the stats page. What am I doing wrong? We have two servers with
replication. There are two cores (prod, dev) on each server. Maybe I
have to add something to the solrconfig.xml of the cores?
Best Regards,
Solr Beginner
(see the solr admin page).
Best
Erick
On Wed, Jun 15, 2011 at 8:22 AM, Bernd Fehling
wrote:
> Dear list,
>
> after getting OOM exception after one week of operation with
> solr 3.2 I used MemoryAnalyzer for the heapdumpfile.
> It looks like
97
numDocs:28.940.964
numTerms: 686.813.235
optimized:true
hasDeletions:false
What can you read/calculate from these values?
Is my index too big for Lucene/Solr?
What I don't understand is why the fieldCache is not garbage collected
and therefore reduced in size from time to time.
Regards
Any thoughts regarding the subject? I hope FieldCache doesn't use more than
6 bytes per document-field instance... I am too lazy to research the Lucene
source code; I hope someone can provide an exact answer... Thanks
> Subject: Lucene FieldCache memory requirements
>
> Hi,
>
>
Which FieldCache API are you using? getStrings? or getStringIndex
(which is used, under the hood, if you sort by this field).
Mike
On Mon, Nov 2, 2009 at 2:27 PM, Fuad Efendi wrote:
> Any thoughts regarding the subject? I hope FieldCache doesn't use more than
> 6 bytes per doc
I am not using the Lucene API directly; I am using SOLR, which uses the Lucene
FieldCache for faceting on non-tokenized fields...
I think this cache is lazily loaded; once a user executes a query over all
documents *:* sorted by this field, it will be fully
populated...
> Subj
OK, I think someone who knows how Solr uses the fieldCache for this
type of field will have to pipe up.
For Lucene directly, simple strings would consume a pointer (4 or 8
bytes depending on whether your JRE is 64-bit) per doc, and the string
index would consume an int (4 bytes) per doc. (Each
Thank you very much Mike,
I found it:
org.apache.solr.request.SimpleFacets
...
// TODO: future logic could use filters instead of the fieldcache if
// the number of terms in the field is small enough.
counts = getFieldCacheCounts(searcher, base, field, offset,limit
> I found it:
> org.apache.solr.request.SimpleFacets
> ...
> // TODO: future logic could use filters instead of the fieldcache if
> // the number of terms in the field is small enough.
> counts = getFieldCacheCounts(searcher, base, field, offset,limit,
I hope it is (int) Document ID...
> -Original Message-
> From: Mark Miller [mailto:markrmil...@gmail.com]
> Sent: November-02-09 6:52 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Lucene FieldCache memory requirements
>
> It also briefly requires more
Fuad Efendi wrote:
> Simple field (10 different values: Canada, USA, UK, ...), 64-bit JVM... no
> difference between maxdoc and maxdoc + 1 for such estimate... difference is
> between 0.4Gb and 1.2Gb...
>
>
I'm not sure I understand - but I didn't mean to imply the +1 on maxdoc
meant anything. T
*:*; FieldCache is not used for tokenized fields... how is it sorted?
:)
Fortunately, no OOM at all.
-Fuad
Mark,
I don't understand this:
> so with a ton of docs and a few uniques, you get a temp boost in the RAM
> reqs until it sizes it down.
Sizes down??? Why is it called a Cache then? And how does SOLR use it if it is
not a cache?
And this:
> A pointer for each doc.
Why can't we use (int) DocumentID?
Ok, my "naive" thinking about FieldCache: for each Term we can quickly
retrieve DocSet. What are memory requirements? Theoretically,
[maxdoc]x[4-bytes DocumentID], plus some (small) array to store terms
pointing to (large) arrays of DocumentIDs.
Mike suggested http://issues.apache.org/j
To be correct, I analyzed FieldCache a while ago and I believed it never
"sizes down"...
/**
* Expert: The default cache implementation, storing all values in memory.
* A WeakHashMap is used for storage.
*
* Created: May 19, 2004 4:40:36 PM
*
* @since lucene 1.4
*/
Will it
static final class StringIndexCache extends Cache {
StringIndexCache(FieldCache wrapper) {
super(wrapper);
}
@Override
protected Object createValue(IndexReader reader, Entry entryKey)
throws IOException {
String field = StringHelper.intern(entryKey.field);
To be safe, use this in your basic memory estimates:
[512Mb ~ 1Gb] + [non_tokenized_fields_count] x [maxdoc] x [8 bytes]
-Fuad
> -Original Message-
> From: Fuad Efendi [mailto:f...@efendi.ca]
> Sent: November-02-09 7:37 PM
> To: solr-user@lucene.apache.org
> Subject: RE: Lucene
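Plugging illustrative numbers into that estimate (not figures from the
thread):

    1 GB baseline + 2 sortable fields x 100,000,000 docs x 8 bytes
      = 1 GB + 1.6 GB  ~ 2.6 GB of heap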
Will it size down in a purely
Lucene-based, heavily loaded production system? Especially if this cache is
used for query optimizations.
> -Original Message-
> From: Mark Miller [mailto:markrmil...@gmail.com]
> Sent: November-02-09 8:53 PM
> To: solr-user@lucene.apache.org
> Subject: R
Even in a simplistic scenario, when it is garbage collected, we still
_need_to_be_able_ to allocate enough RAM for the FieldCache on demand... linear
dependency on document count...
>
> Hi Mark,
>
> Yes, I understand it now; however, how will StringIndexCache size down in a
> pr
FieldCache internally uses a WeakHashMap... nothing wrong with that, but no
Garbage Collection tuning will help in case the allocated RAM is not enough
for replacing Weak** with Strong**, especially for SOLR faceting... 10%-15%
of CPU taken by GC has been reported...
-Fuad
That's right.
Except: as Mark said, you'll also need transient memory = pointer (4
or 8 bytes) * (1+maxdoc), while the FieldCache is being loaded. After
it's done being loaded, this sizes down to the number of unique terms.
But, if Lucene did the basic int packing, which really we sh
Sorry Mike, Mark, I am confused again...
Yes, I need some more memory for processing ("while FieldCache is being
loaded"), obviously, but that was not the main subject...
With StringIndexCache, I have 10 arrays (cardinality of this field is 10)
storing (int) Lucene Document ID.
> Ex
35
> optimized: true
> hasDeletions: false
>
> What can you read/calculate from this values?
>
> Is my index to big for Lucene/Solr?
>
> What I don't understand, why fieldCache is not garbage collected
> and therefore reduced in size from time to time.
>
- currently 20g --> about 7 days until OOM
Starting the system takes about 3.5g and goes up to about 4g after a while.
The only dirty workaround so far is to restart the whole system after 5 days.
Not really nice.
The problem seems to be the fieldCache which is under the hood of jetty.
Do you kno
Starting the system takes about 3.5g and goes up to about 4g after a while.
>
> The only dirty workaround so far is to restart the whole system after 5
> days.
> Not really nice.
>
> The problem seems to be the fieldCache which is under the hood of jetty.
> Do you know of any sizing fe
Hi Erik,
as far as I can see with MemoryAnalyzer from the heap:
- the class fieldCache has a HashMap
- one entry of the HashMap is FieldCacheImpl$StringIndex which is "mister big"
- FieldCacheImpl$StringIndex is a WeakHashMap
- WeakHashMap has three entries
-- 63.58 percent of hea
Hello Erick,
I have a 1.7MM-document, 3.6GB index. I also have an unusual number of
dynamic fields that I use for sorting. My FieldCache currently has about
13.000 entries, even though my index only gets 1-3 queries per second. Each
query sorts by two dynamic fields, and facets on 3-4 fields
The current status of my installation is that with some tweaking of
Java I get a runtime of about 2 weeks until OldGen (14GB) is filled
to 100 percent and won't free anything even with a full GC.
The fieldCache's share of a heap dump taken at that time is over 80 percent
of the whole heap (20GB)
Bernd, in our case, optimizing the index seems to flush the FieldCache for
some reason. On the other hand, doing a few commits without optimizing seems
to make the problem worse.
Hope that helps, we would like to give it a try and debug this in Lucene,
but are pressed for time right now. Perhaps
increasing memory in the FieldCache every time the index is updated. Calling
SolrQueryRequest.close() solves the problem; you should see items disappear
from the FieldCache (JMX) as soon as a new searcher is registered.
My corrected code is:
SolrQueryRequest request = buildSolrQueryRequest();
try {
    SolrQue...
} finally {
    request.close();
}
Dear erolagnab,
is this your code in the Solr server?
Which class can I put it in?
@topcat: you need to call the close() method on Solr requests after using them.
In general:
SolrQueryRequest request = new SolrQueryRequest();
try {
    ...
} finally {
    request.close();
}
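Since SolrQueryRequest is actually an interface, a concrete (hedged) sketch of
the same pattern inside an embedded/custom component might look like this,
assuming Solr 3.x-era APIs; class and handler names are illustrative:

    import org.apache.solr.common.params.SolrParams;
    import org.apache.solr.core.SolrCore;
    import org.apache.solr.request.LocalSolrQueryRequest;
    import org.apache.solr.request.SolrQueryRequest;
    import org.apache.solr.response.SolrQueryResponse;

    void runQuery(SolrCore core, SolrParams params) {
        SolrQueryRequest req = new LocalSolrQueryRequest(core, params);
        try {
            SolrQueryResponse rsp = new SolrQueryResponse();
            core.execute(core.getRequestHandler("/select"), req, rsp);
        } finally {
            req.close();   // releases the searcher so stale FieldCache entries can be collected
        }
    }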
Well, it's quite hard to debug because the values listed on the stats page in
the fieldCache section don't make much sense. Reducing precision with
NOW/HOUR, however, does seem to make a difference.
It is hard (or impossible) to reproduce this in a test setup with the same
index b
from
excessive memory consumption:
recip(ms(NOW/,),,1,1)
On Thursday 10 March 2011 15:14:25 Markus Jelsma wrote:
> Well, it's quite hard to debug because the values listed on the stats page
> in the fieldCache section don't make much sense. Reducing precision with
> NOW/HOUR,
you suffer from
: excessive memory consumption:
:
: recip(ms(NOW/,),,1,1)
FWIW: it sounds like your problem wasn't actually related to your
fieldCache, but probably instead it was because of how big your
queryResultCache is
: > > Am i correct when i assume that Lucene FieldCa
Hi,
> FWIW: it sounds like your problem wasn't actually related to your
> fieldCache, but probably instead it was because of how big your
> queryResultCache is
It's the same cluster as in the other thread. I decided a long time ago that
documentCache and queryResultCache
, Solr Beginner wrote:
> Hi,
>
> I can see only fieldCache (nothing about filter, query or document
> cache) on stats page. What I'm doing wrong? We have two servers with
> replication. There are two cores(prod, dev) on each server. Maybe I
> have to add something to sol
Time: Wed Apr 27 11:07:00 CEST 2011
Regarding the caches, I can see only the following information:
CACHE
name: fieldCache
class: org.apache.solr.search.SolrFieldCacheMBean
version: 1.0
description: Provides introspection of the Lucene FieldCache, this
is **NOT** a cache that is managed by Solr.
following informations:
>
> CACHE
>
> name: fieldCache
> class: org.apache.solr.search.SolrFieldCacheMBean
> version: 1.0
> description: Provides introspection of the Lucene FieldCache, this
> is **NOT** a cache that is managed by Solr.
> sourceid: $Id: S
document.
My idea is to have a FieldCache for the myUniqueKey field in SolrIndexSearcher
(or somewhere else?) that would be used in cases where the only field that
needs to be retrieved is myUniqueKey. Is this something that would improve
performance?
In our actual setup, we are using an extended
called for each document. As I understand it,
: this will read each document from the index _on disk_ and retrieve the
: myUniqueKey field value for each document.
:
: My idea is to have a FieldCache for the myUniqueKey field in
: SolrIndexSearcher (or somewhere else?) that would be used in cases
back to the coordinator, SolrIndexSearcher.doc (int i,
> : Set fields) is called for each document. As I understand it,
> : this will read each document from the index _on disk_ and retrieve the
> : myUniqueKey field value for each document.
> :
> : My idea is to have a FieldCache f
>
> Ah, thanks Hoss - I had meant to respond to the original email, but
> then I lost track of it.
>
> Via pseudo-fields, we actually already have the ability to retrieve
> values via FieldCache.
> fl=id:{!func}id
>
> But using CSF would probably be better here - n
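A hedged illustration of what such a request could look like on the URL (host,
core and handler are just placeholders; the pseudo-field syntax is the one
quoted above):

    http://localhost:8983/solr/select?q=*:*&fl=id:{!func}id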
: > Quite probably ... you typically can't assume that a FieldCache can be
: > constructed for *any* field, but it should be a safe assumption for the
: > uniqueKey field, so for that initial request of the mutiphase distributed
: > search it's quite possible it would sp
On Tue, Jul 19, 2011 at 3:20 PM, Chris Hostetter
wrote:
>
> : > Quite probably ... you typically can't assume that a FieldCache can be
> : > constructed for *any* field, but it should be a safe assumption for the
> : > uniqueKey field, so for that initial request of
Hi,
We use the Solr and Lucene fieldcache like this:
static DocTerms myfieldvalues =
    org.apache.lucene.search.FieldCache.DEFAULT.getTerms(reader, "myField");
which is initialized at first use and stays in memory for fast retrieval
of field values based on DocID.
The problem is afte
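For reference, a minimal hedged sketch of reading a per-document value through
that API (assuming the trunk-era DocTerms.getTerm(int, BytesRef) signature;
variable names are illustrative). The cache entry is keyed on the reader, so
fetching it against the current reader after each reopen, rather than holding
a static reference, lets entries for closed readers be collected:

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.search.FieldCache;
    import org.apache.lucene.search.FieldCache.DocTerms;
    import org.apache.lucene.util.BytesRef;

    DocTerms values = FieldCache.DEFAULT.getTerms(reader, "myField");  // cheap after the first call per reader
    BytesRef scratch = new BytesRef();
    String value = values.getTerm(docId, scratch).utf8ToString();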
use solr and lucene fieldcache like this
> static DocTerms myfieldvalues =
> org.apache.lucene.search.FieldCache.DEFAULT.getTerms(reader, "myField");
> which is initialized at first use and will stay in memory for fast retrieval
> of field values based on DocID
>
> The problem is
"string" type (StrField). The facets *seem* to be
generated much faster. Is it expected that FieldCache would be faster than
UnInvertedField for single-token strings like this?
My goal is to make the facet re-generation after a commit as fast as possible. I
would like to continue using Tex
Hey there,
Does the lucene2.9-dev used in the current Solr nightly build (9-6-2009) include
the patch LUCENE-1662 to avoid doubling memory usage in the Lucene FieldCache??
Thanks in advance
Hi Michael,
The FieldCache is a simpler data structure and easier to create, so I
also expect it to be faster. Unfortunately, for TextField,
UnInvertedField
is always used even if you have one token per document. I think
overriding the multiValuedFieldCache method and returning false would
work.
If
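A minimal hedged sketch of that override (assuming the Solr FieldType/TextField
API of that era; the class name is illustrative and would be referenced from
schema.xml via its fully qualified name):

    import org.apache.solr.schema.TextField;

    public class SingleValuedTextField extends TextField {
        @Override
        public boolean multiValuedFieldCache() {
            // false tells SimpleFacets it may take the FieldCache (fc) path
            // instead of UnInvertedField for this field type
            return false;
        }
    }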
here,
> Does the lucene2.9-dev used in the current Solr nightly build (9-6-2009) include
> the patch LUCENE-1662 to avoid doubling memory usage in lucene FieldCache??
> Thanks in advance
; revision was 779277
>
> -Yonik
> http://www.lucidimagination.com
>
>
>
> On Tue, Jun 9, 2009 at 5:32 AM, Marc Sturlese
> wrote:
>>
>> Hey there,
>> Does the lucene2.9-dev used in current Solr nighty-build (9-6-2009)
>> include
>> the patch LU