Re: Doc Transformer to remove document from the response

2012-10-29 Thread eks dev
Thanks Hoss, I probably did not formulate the question properly, but you gave me an answer. I do it already in SearchComponent, just wanted to centralise this control of the depth and width of the response to the single place in code [style={minimal, verbose, full...}]. It just sounds logical t

Doc Transformer to remove document from the response

2012-10-27 Thread eks dev
Transformer is great to augment Documents before shipping to response, but what would be a way to prevent document from being delivered? I have some search components that make some conclusions after search, duplicates removal, clustering and one Augmenter (solr Transformer) to shape the response

Re: Solr 4.0 and production environments

2012-03-07 Thread eks dev
I am here on lucene as a user since the project started, even before solr came to life, many many years. And I was always using trunk version for pretty big customers, and *never* experienced some serious problems. The worst thing that can happen is to notice bug somewhere, and if you have some rea

Re: [SoldCloud] Slow indexing

2012-03-04 Thread eks dev
hmm, looks like you are facing exactly the phenomenon I asked about. See my question here: http://comments.gmane.org/gmane.comp.jakarta.lucene.solr.user/61326 On Sun, Mar 4, 2012 at 9:24 PM, Markus Jelsma wrote: > Hi, > > With auto-committing disabled we can now index many millions of documents in

Re: Solr Cloud, Commits and Master/Slave configuration

2012-03-01 Thread eks dev
that gets > flushed depending on the requests coming through and the buffer size. > > - Mark Miller > lucidimagination.com > > On Feb 28, 2012, at 3:38 AM, eks dev wrote: > >> SolrCloud is going to be great, NRT feature is really huge step >> forward, as well as centra

Re: Solr Cloud, Commits and Master/Slave configuration

2012-02-28 Thread eks dev
SolrCloud is going to be great, NRT feature is really huge step forward, as well as central configuration, elasticity ... The only thing I do not yet understand is treatment of cases that were traditionally covered by Master/Slave setup. Batch update If I get it right (?), updates to replicas are

Re: SnapPull failed :org.apache.solr.common.SolrException: Error opening new searcher

2012-02-23 Thread eks dev
thin. On Thu, Feb 23, 2012 at 8:47 AM, eks dev wrote: > thanks Mark, I will give it a go and report back... > > On Thu, Feb 23, 2012 at 1:31 AM, Mark Miller wrote: >> Looks like an issue around replication IndexWriter reboot, soft commits and >> hard commits. >> >>

Re: SnapPull failed :org.apache.solr.common.SolrException: Error opening new searcher

2012-02-22 Thread eks dev
te our commit point to the right dir >       solrCore.getUpdateHandler().commit(new CommitUpdateCommand(req, false)); > > That should allow the searcher that the following commit command prompts to > see the *new* IndexWriter. > > On Feb 22, 2012, at 10:56 AM, eks dev wrote: > >> W

dih and solr cloud

2012-02-22 Thread eks dev
out of curiosity, trying to see if new cloud features can replace what I use now... how is this (batch) update forwarding solved at cloud level? imagine simple one shard and one replica case, if I fire up DIH update, is this going to be replicated to replica shard? If yes, - is it going to be sen

Re: Unusually long data import time?

2012-02-22 Thread eks dev
Davon, you ought to try to update from many threads, (I do not know if DIH can do it, check it), but lucene does great job if fed from many update threads... depends where your time gets lost, but it is usually a) analysis chain or b) database. If it is a) and your server has spare cpu-cores, you

SnapPull failed :org.apache.solr.common.SolrException: Error opening new searcher

2012-02-22 Thread eks dev
We started observing strange failures from ReplicationHandler when we commit on master (trunk version, 4-5 days old). It works sometimes, and sometimes not; didn't dig deeper yet. Looks like the real culprit hides behind: org.apache.lucene.store.AlreadyClosedException: this IndexWriter is clos

Re: reader/searcher refresh after replication (commit)

2012-02-22 Thread eks dev
with the master. > > What are you expecting a BeforeCommitListener could do for you, if one > would exist? > > Kind regards, > Em > > Am 21.02.2012 21:10, schrieb eks dev: >> Thanks Mark, >> Hmm, I would like to have this information asap, not to wait until the

Re: reader/searcher refresh after replication (commit)

2012-02-21 Thread eks dev
And drinks on me to those who decoupled implicit commit from close... this was tricky trap On Tue, Feb 21, 2012 at 9:10 PM, eks dev wrote: > Thanks Mark, > Hmm, I would like to have this information asap, not to wait until the > first search gets executed (depends on user) . Is solr

Re: reader/searcher refresh after replication (commit)

2012-02-21 Thread eks dev
licates can appear) are there any "IndexWriter" listeners around? Thanks again, eks. On Tue, Feb 21, 2012 at 8:03 PM, Mark Miller wrote: > Post commit calls are made before a new searcher is opened. > > Might be easier to try to hook in with a new searcher listener? >

reader/searcher refresh after replication (commit)

2012-02-21 Thread eks dev
Hi all, I am a bit confused with IndexSearcher refresh lifecycles... In a master slave setup, I override postCommit listener on slave (solr trunk version) to read some user information stored in userCommitData on master -- @Override public final void postCommit() { // This returns "stale"

Re: codec="Pulsing" per field broken?

2011-12-11 Thread eks dev
Thanks Robert, I've missed LUCENE-3490... Awesome! On Sun, Dec 11, 2011 at 6:37 PM, Robert Muir wrote: > On Sun, Dec 11, 2011 at 11:34 AM, eks dev wrote: >> on the latest trunk, my schema.xml with field type declaration >> containing //codec="Pulsing"//

codec="Pulsing" per field broken?

2011-12-11 Thread eks dev
on the latest trunk, my schema.xml with field type declaration containing //codec="Pulsing"// does not work any more (throws exception from FieldType). It used to work with approx. a month old trunk version. I didn't dig deeper, can be that the old schema.xml was broken and worked by accident. --

Re: capacity planning

2011-10-11 Thread eks dev
Re. "I have little experience with VM servers for search." We had huge performance penalty on VMs, CPU was bottleneck. We couldn't freely run measurements to figure out what the problem really was (hosting was contracted by customer...), but it was something pretty scary, kind of 8-10 times slowe

Re: Update ingest rate drops suddenly

2011-09-26 Thread eks dev
Just to bring closure on this one, we were slurping data from the wrong DB (hardly desktop class machine)... Solr did not cough on 41Mio records @34k updates / sec., single threaded. Great! On Sat, Sep 24, 2011 at 9:18 PM, eks dev wrote: > just looking for hints where to look for... >

Re: Update ingest rate drops suddenly

2011-09-25 Thread eks dev
g locally > > Out of curiosity, how big is your ramBufferSizeMB and your -Xmx? > And on that 8-core box you have ~8 indexing threads going? > > Otis > > Sematext is Hiring -- http://sematext.com/about/jobs.html > > > > >> &g

Update ingest rate drops suddenly

2011-09-24 Thread eks dev
just looking for hints where to look for... We were testing single threaded ingest rate on solr, trunk version on atypical collection (a lot of small documents), and we noticed something we are not able to explain. Setup: We use defaults for index settings, windows 64 bit, jdk 7 U2. on SSD, machi

Which Solr / Lucene directory for ramfs?

2011-09-16 Thread eks dev
probably stupid question, Which Directory implementation should be the best suited for index mounted on ramfs/tmpfs? I guess plain old FSDirectory, (or mmap/nio?)

Re: DataImportHandler using new connection on each query

2011-09-02 Thread eks dev
watch out, "running 10 hours" != "idling 10 seconds" and trying again. Those are different cases. It is not dropping *used* connections (good to know it works that good, thanks for reporting!), just not reusing connections more than 10 seconds idle On Fri, Sep 2, 2011 at 10:26 PM, Gora Mohanty

Re: DataImportHandler using new connection on each query

2011-09-02 Thread eks dev
take care, "running 10 hours" != "idling 10 seconds" and trying again. Those are different cases. It is not dropping *used* connections (good to know it works that good, thanks for reporting!), just not reusing connections more than 10 seconds idle On Fri, Sep 2, 2011 at 10:26 PM, Gora Mohanty

Re: DataImportHandler using new connection on each query

2011-09-02 Thread eks dev
I am not sure if current version has this, but DIH used to reload connections after some idle time if (currTime - connLastUsed > CONN_TIME_OUT) { synchronized (this) { Connection tmpConn = factory.call(); clos
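The snippet above is cut off, so the following is only a minimal sketch of the idle-timeout pattern it describes, not the DIH code itself. The class name IdleRefreshingHolder and its fields are hypothetical; the only assumption taken from the post is that a resource idle for longer than a timeout gets replaced by a fresh one from a factory before it is handed out again.

```java
import java.util.concurrent.Callable;

/**
 * Hypothetical helper (not DIH code): caches one resource and replaces it
 * when it has been idle for longer than a configured timeout.
 */
final class IdleRefreshingHolder<T extends AutoCloseable> {
    private final Callable<T> factory;    // creates a fresh resource
    private final long idleTimeoutNanos;  // maximum idle time before replacement
    private T current;
    private long lastUsedNanos;

    IdleRefreshingHolder(Callable<T> factory, long idleTimeoutNanos) throws Exception {
        this.factory = factory;
        this.idleTimeoutNanos = idleTimeoutNanos;
        this.current = factory.call();
        this.lastUsedNanos = System.nanoTime();
    }

    /** Returns the cached resource, recreating it first if it sat idle too long. */
    synchronized T get() throws Exception {
        long now = System.nanoTime();
        if (now - lastUsedNanos > idleTimeoutNanos) {
            T fresh = factory.call();     // open the replacement first
            try {
                current.close();          // then discard the stale resource
            } catch (Exception ignored) {
                // a stale resource may already be broken; nothing more to do
            }
            current = fresh;
        }
        lastUsedNanos = now;
        return current;
    }
}
```

With a JDBC connection as T, a long-running query keeps using the object it already holds; only a later get() after an idle gap triggers the replacement, which matches the "running 10 hours" versus "idling 10 seconds" distinction made elsewhere in this thread.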

NRT in Master- Slave setup, crazy?

2011-08-11 Thread eks dev
Thinking aloud and grateful for sparing .. I need to support high commit rate (low update latency) in a master slave setup and I have a bad feeling about it, even with disabling warmup and stripping everything down that slows down refresh. I will try it anyway, but I started thinking about "back

Re: DIH on sequence (or any type that supports ordering) possible?

2011-08-06 Thread eks dev
ne to do, but I really do not know simple and fast way... cheers, eks On Sat, Aug 6, 2011 at 8:32 PM, Shawn Heisey wrote: > On 8/6/2011 8:49 AM, eks dev wrote: >> >> I would appreciate some clarifications about DIH >> >> I do not have reliable timestamp, but I do have ato

DIH on sequence (or any type that supports ordering) possible?

2011-08-06 Thread eks dev
I would appreciate some clarifications about DIH I do not have reliable timestamp, but I do have atomic sequence that only grows on inserts/changes. You can understand it as a timestamp on some funky timezone not related to wall clock time, it is integer type. Is DIH keeping track of the MAX(comm

Re: Matching queries on a per-element basis against a multivalued field

2011-08-02 Thread eks dev
hey have yet to > bump that to trunk/4.x; it was only recently updated to 3.2. > > On Aug 2, 2011, at 5:26 PM, eks dev wrote: > >> Well, Lucid released "LucidWorks Enterprise" >> with  " Complete Apache Solr 4.x Release Integrated and tested with >> po

Re: Matching queries on a per-element basis against a multivalued field

2011-08-02 Thread eks dev
Well, Lucid released "LucidWorks Enterprise" with " Complete Apache Solr 4.x Release Integrated and tested with powerful enhancements" Whatever it means for solr 4.0 On Tue, Aug 2, 2011 at 11:10 PM, David Smiley (@MITRE.org) wrote: > My best guess (and it is just a guess) is between December

Re: conditionally update document on unique id

2011-06-29 Thread eks dev
On Wed, Jun 29, 2011 at 4:32 PM, eks dev wrote: >> req.getSearcher().getFirstMatch(t) != -1; > > Yep, this is currently the fastest option we have. > > -Yonik > http://www.lucidimagination.com >

Re: conditionally update document on unique id

2011-06-29 Thread eks dev
t 2:01 AM, eks dev wrote: > >> Quick question, >> Is there a way with solr to conditionally update document on unique >> id? Meaning, default, add behavior if id is not already in index and >> *not to touch index* if already there. >> >> Deletes are no

Re: Using RAMDirectoryFactory in Master/Slave setup

2011-06-29 Thread eks dev
011-06-29 at 09:35 +0200, eks dev wrote: >> In MMAP, you need to have really smart warm up (MMAP) to beat IO >> quirks, for RAMDir  you need to tune gc(), choose your poison :) > > Other alternatives are operating system RAM disks (avoids the GC > problem) and using SSDs (nearly

Re: Using RAMDirectoryFactory in Master/Slave setup

2011-06-29 Thread eks dev
Wed, 2011-06-29 at 09:35 +0200, eks dev wrote: >> In MMAP, you need to have really smart warm up (MMAP) to beat IO >> quirks, for RAMDir  you need to tune gc(), choose your poison :) > > Other alternatives are operating system RAM disks (avoids the GC > problem) and using SSDs (nea

Re: Using RAMDirectoryFactory in Master/Slave setup

2011-06-29 Thread eks dev
...Using RAMDirectory really does not help performance... I kind of agree, but in my experience with lucene, there are cases where RAMDirectory helps a lot, with all its drawbacks (huge heap and gc() tuning). We had very good experience with MMAP on average, but moving to RAMDirectory with prop

conditionally update document on unique id

2011-06-28 Thread eks dev
Quick question, Is there a way with solr to conditionally update document on unique id? Meaning, default, add behavior if id is not already in index and *not to touch index* if already there. Deletes are not important (no sync issues). I am asking because I noticed with deduplication turned on, i

overwrite if not already in index?

2011-06-28 Thread eks dev
Quick question, Is there a way with solr to conditionally update document on unique id? Meaning, default, add behavior if id is not already in index and *not to touch index* if already there. Deletes are not important (no sync issues). I am asking because I noticed with deduplication turned on, i

Re: Using RAMDirectoryFactory in Master/Slave setup

2011-06-27 Thread eks dev
Your best bet is MMapDirectoryFactory, you can come very close to the performance of the RAMDirectory. Unfortunately, a Master_on_disk->Slaves_in_ram type of setup is not possible using solr. We are moving our architecture to solr at the moment, and this is one of "missings" we have

Re: Question about http://wiki.apache.org/solr/Deduplication

2011-04-04 Thread eks dev
Thanks Hoss, Externalizing this part is exactly the path we are exploring now, not only for this reason. We already started testing Hadoop SequenceFile for write ahead log for updates/deletes. SequenceFile supports append now (simply great!). It was a pain to have to add hadoop into mix for "

Deduplication questions

2011-03-25 Thread eks dev
Q1. Is it possible to pass *analyzed* content to the public abstract class Signature { public void init(SolrParams nl) { } public abstract String calculate(String content); } Q2. Method calculate() is using concatenated fields from name,features,cat Is there any mechanism I could build "fi

Question about http://wiki.apache.org/solr/Deduplication

2011-03-24 Thread eks dev
Hi, Use case I am trying to figure out is about preserving IDs without re-indexing on duplicate, rather adding this new ID under list of document id "aliases". Example: Input collection: "id":1, "text":"dummy text 1", "signature":"A" "id":2, "text":"dummy text 1", "signature":"A" I add the first

Re: filter query from external list of Solr unique IDs

2010-10-16 Thread eks dev
if your index is read-only in production, can you add mapping unique_id-Lucene docId in your kv store and build filters externally? That would make unique Key obsolete in your production index, as you would work at lucene doc id level. That way, you offline the problem to update/optimize phase