Indexing multiple documents in Solr/SolrCell

2009-11-16 Thread Kerwin
Hi, I am new to this forum and would like to know if the function described below has been developed or exists in Solr. If it does not exist, is it a good idea, and can I contribute? We need to index multiple documents with different formats, so we use Solr with Tika (Solr Cell). Question: Can

Re: Tika trouble

2009-11-16 Thread Markus Jelsma - Buyways B.V.
Does anyone have a clue? List, I somehow fail to index certain PDF files using the ExtractingRequestHandler in Solr 1.4 with the default solrconfig.xml but a modified schema. I have a very simple schema for this case using only an ID field, a timestamp field and two dynamic fields; ignored_* and

Re: Tika trouble

2009-11-16 Thread Antonio Calò
What I can say is that if you want to index a PDF, you should use a PDF extractor. A PDF extractor is able to extract the text content and the metadata of the files. I suppose you have just opened and indexed the PDF as is, so you stored binary data and nothing more. For my application I've

Re: Indexing multiple documents in Solr/SolrCell

2009-11-16 Thread Sascha Szott
Hi, the problem you've described -- an integration of DataImportHandler (to traverse the XML file and get the document urls) and Solr Cell (to extract content afterwards) -- is already addressed in issue SOLR-1358 (https://issues.apache.org/jira/browse/SOLR-1358). Best, Sascha Kerwin

Re: javabin in .NET?

2009-11-16 Thread Mauricio Scheffer
Yep, I think I mostly nailed the unmarshalling. Need more tests though. And then integrate it to SolrNet. Is there any way (or are there any plans) to have an update handler that accepts javabin? 2009/11/16 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com start with a JavabinDecoder only so

Re: Tika trouble

2009-11-16 Thread Markus Jelsma - Buyways B.V.
Thank you for your reply. I had assumed Tika could also extract text content from various document types instead of only metadata. I'll use the CLI tools from http://www.foolabs.com/xpdf/ to extract text manually. - Markus Jelsma Buyways B.V. Technisch Architect

EmbeddedSolrServer: java.lang.NoClassDefFoundError: javax/servlet/ServletRequest

2009-11-16 Thread Leonardo Souza
Hi, I'm a newbie using Solr and I'd like to run some tests against our data set. I have successfully tested Solr + Cell using the standard HTTP Solr server, and now we need to test the embedded solution. When I try to start the embedded server I get this exception: INFO: registering core:

Experiences from migrating from FAST to Solr

2009-11-16 Thread Morten Tvenning
We'd like to share with the solr users a recent news item from http://sesat.no Sesam has spent some three months migrating all its indexes from FAST to Solr+Lucene. It was a joyful experience and allowed us to implement a number of improvements we never could under FAST. We've written a

RE: solr stops running periodically

2009-11-16 Thread Fuad Efendi
By that I mean that the java/tomcat process just disappears. I had a similar problem when I started Tomcat via SSH, and then I improperly closed SSH without the exit command. In some cases (OutOfMemory) memory is not enough to generate log (or CPU can be overloaded by Garbage Collector to such

Solr - Load Increasing.

2009-11-16 Thread kalidoss
Hi All. The CPU utilization on my Solr server box is rising to between 60 and 90%, and sometimes Solr goes down and we restart it manually. Number of documents in Solr: 30 laks. Number of add/update requests to Solr: 30 thousand per day, averaging around 500 writes every 30 minutes. No of search

Re: DataImportHandler Questions-Load data in parallel and temp tables

2009-11-16 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Mon, Nov 16, 2009 at 6:25 PM, amitj am...@ieee.org wrote: Is there also a way we can include some kind of annotation on the schema field and send the data retrieved for that field to an external application. We have a requirement where we require some data fields (out of the fields for an

Re: javabin in .NET?

2009-11-16 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Mon, Nov 16, 2009 at 5:55 PM, Mauricio Scheffer mauricioschef...@gmail.com wrote: Yep, I think I mostly nailed the unmarshalling. Need more tests though. And then integrate it to SolrNet. Is there any way (or are there any plans) to have an update handler that accepts javabin? There is

Re: Solr 1.3 query and index perf tank during optimize

2009-11-16 Thread Jerome L Quinn
Otis Gospodnetic otis_gospodne...@yahoo.com wrote on 11/13/2009 11:15:43 PM: Let's take a step back. Why do you need to optimize? You said: As long as I'm not optimizing, search and indexing times are satisfactory. :) You don't need to optimize just because you are continuously adding

Re: Stop solr without losing documents

2009-11-16 Thread Michael
On Fri, Nov 13, 2009 at 4:09 PM, Chris Hostetter hossman_luc...@fucit.org wrote: please don't kill -9 ... it's grossly overkill, and doesn't give your [ ... snip ... ] Alternately, you could take advantage of the enabled feature from your client (just have it test the enabled url every N updates

ext3 vs ext4 vs xfs for solr....recommendations needed...

2009-11-16 Thread William Pierce
Folks: For those of you who are experienced Linux/Solr hands, I am seeking recommendations for which file system you think would work best with Solr. We are currently running Ubuntu 9.04 on an Amazon EC2 instance. The default file system, I think, is ext3. I am, of course, seeking

Re: Stop solr without losing documents

2009-11-16 Thread Michael
On Fri, Nov 13, 2009 at 11:02 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: So I think the question is really: If I stop the servlet container, does Solr issue a commit in the shutdown hook in order to ensure all buffered docs are persisted to disk before the JVM exits. Exactly

Re: Stop solr without losing documents

2009-11-16 Thread Michael
On Fri, Nov 13, 2009 at 11:45 PM, Lance Norskog goks...@gmail.com wrote: I would go with polling Solr to find what is not yet there. In production, it is better to assume that things will break, and have backstop janitors that fix them. And then test those janitors regularly. Good idea,

Re: ext3 vs ext4 vs xfs for solr....recommendations needed...

2009-11-16 Thread Mark Miller
William Pierce wrote: Folks: For those of you who are experienced Linux/Solr hands, I am seeking recommendations for which file system you think would work best with Solr. We are currently running Ubuntu 9.04 on an Amazon EC2 instance. The default file system, I think, is ext3. I am of

Index time boosting troubles

2009-11-16 Thread Jón Helgi Jónsson
Hi, I had working index-time boosting on documents like so: <doc boost="10.0"> Everything was great until I made some changes that I thought were not related to the doc boost, but after that my doc boosting appears to be missing. I'm having a tough time debugging this and didn't have the sense to

Re: Some guide about setting up local/geo search at solr

2009-11-16 Thread Bertie Shen
Localsolr is not in contrib yet. I am interested in knowing whether currently there is a better solution for setting up a local search. Cheers. On Sun, Nov 15, 2009 at 9:25 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Nota bene: My understanding is the external versions of Local

$DeleteDocbyQuery in solr 1.4 is not working

2009-11-16 Thread Mark Ellul
Hi, I have added a deleted field in my database, and am using the DataImportHandler to add rows to the index... I am using Solr 1.4. I have added the deleted field to the query and the RegexTransformer... and the field definition below field column=$deleteDocByQuery regex=^true$

Config Relationship between MaxWarmingSearchers and StreamingUpdateSolrServer

2009-11-16 Thread Erik Earle
My application updates the master index frequently, sometimes very frequently. Is there a good rule of thumb for configuring: 1) maxWarmingSearchers in the master 2) the SUSS thread pool size (and perhaps queue length) to match the server settings?

Re: SolrJ looping until I get all the results

2009-11-16 Thread Mck
On Mon, 2009-11-02 at 19:49 -0500, Paul Tomblin wrote: Here's what I'm thinking final static int MAX_ROWS = 100; int start = 0; query.setRows(MAX_ROWS); while (true) { QueryResponse resp = solrChunkServer.query(query); SolrDocumentList docs = resp.getResults(); if (docs.size()
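
The quoted loop is cut off in this listing, so the following is only a minimal sketch of the usual "page until a short page comes back" pattern, not Paul's actual code. `PagingSketch`, `fetchAll` and `fakePage` are hypothetical names, and the page fetcher stands in for a SolrJ call (setting the start offset and row count on the query, then reading the results); no Solr server is involved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

// Hypothetical sketch: PagingSketch is not part of SolrJ.
class PagingSketch {

    // Page size, named after the constant in the quoted snippet.
    static final int MAX_ROWS = 100;

    // pageFetcher takes (start, rows) and returns one page of results.
    // With SolrJ this would be assumed to wrap the actual query call.
    static <T> List<T> fetchAll(BiFunction<Integer, Integer, List<T>> pageFetcher) {
        List<T> all = new ArrayList<>();
        int start = 0;
        while (true) {
            List<T> page = pageFetcher.apply(start, MAX_ROWS);
            all.addAll(page);
            // A short or empty page means there is nothing left to fetch.
            if (page.size() < MAX_ROWS) {
                break;
            }
            start += page.size();
        }
        return all;
    }

    // In-memory stand-in for an index holding `total` documents (ids 0..total-1).
    static List<Integer> fakePage(int total, int start, int rows) {
        List<Integer> page = new ArrayList<>();
        for (int i = start; i < Math.min(total, start + rows); i++) {
            page.add(i);
        }
        return page;
    }

    public static void main(String[] args) {
        List<Integer> docs = fetchAll((start, rows) -> fakePage(250, start, rows));
        System.out.println("fetched " + docs.size() + " docs");
    }
}
```

Stopping on a short page costs one extra empty request when the total is an exact multiple of the page size; comparing against the total hit count reported by the server would avoid that, at the price of trusting a count that can change while the loop runs.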

Re: Wildcards at the Beginning of a Search.

2009-11-16 Thread Jay Hill
There is a text_rev field type in the example schema.xml file in the official release of 1.4. It uses the ReversedWildcardFilterFactory to reverse a field. You can do a copyField from the field you want to use for leading wildcard searches to a field using the text_rev field, and then do a regular
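
To illustrate why reversing helps, here is a rough sketch of the idea only, not of how ReversedWildcardFilterFactory is implemented: if terms are stored reversed, a leading wildcard such as `*ing` turns into an ordinary prefix lookup for `gni`. The class and method names are hypothetical, and a sorted in-memory set stands in for the term index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical sketch: a TreeSet plays the role of a sorted term dictionary.
class ReversedWildcardSketch {

    static String reverse(String s) {
        return new StringBuilder(s).reverse().toString();
    }

    // "Index time": store each term reversed, in sorted order.
    static TreeSet<String> buildReversedIndex(List<String> terms) {
        TreeSet<String> index = new TreeSet<>();
        for (String term : terms) {
            index.add(reverse(term));
        }
        return index;
    }

    // "Query time": a leading wildcard "*suffix" becomes a prefix scan
    // over the reversed terms, starting at reverse(suffix).
    static List<String> leadingWildcard(TreeSet<String> reversedIndex, String suffix) {
        String prefix = reverse(suffix);
        List<String> hits = new ArrayList<>();
        for (String rev : reversedIndex.tailSet(prefix)) {
            if (!rev.startsWith(prefix)) {
                break; // sorted order: no later term can match
            }
            hits.add(reverse(rev));
        }
        return hits;
    }

    public static void main(String[] args) {
        TreeSet<String> index =
            buildReversedIndex(List.of("indexing", "searching", "solr", "ring"));
        System.out.println(leadingWildcard(index, "ing"));
    }
}
```

The point of the sorted scan is that matching terms sit next to each other, so the lookup does not have to walk the whole term dictionary the way an unassisted leading wildcard would.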

PhP, Solr and Delta Imports

2009-11-16 Thread Pablo Ferrari
Hello, I have an already working Solr service based on full imports connected via PHP to a Zend Framework MVC (I connect it directly to the Controller). I use the SolrClient class for PHP which is great: http://www.php.net/manual/en/class.solrclient.php From now on, every time I want to edit a

Re: PhP, Solr and Delta Imports

2009-11-16 Thread Israel Ekpo
On Mon, Nov 16, 2009 at 2:49 PM, Pablo Ferrari pabs.ferr...@gmail.com wrote: Hello, I have an already working Solr service based on full imports connected via PHP to a Zend Framework MVC (I connect it directly to the Controller). I use the SolrClient class for PHP which is great:

Re: Config Relationship between MaxWarmingSearchers and StreamingUpdateSolrServer

2009-11-16 Thread Otis Gospodnetic
Hi Erik, I didn't look at the source code, and I think the javadoc for SUSS doesn't mention it, but I am under the impression that the number of threads to use should roughly match the number of CPU cores on the master. The maxWarmingSearchers should only be relevant to slaves, not masters,

Re: Solr 1.3 query and index perf tank during optimize

2009-11-16 Thread Otis Gospodnetic
I'd have to verify this to be sure, but I *believe* deleted docs data is expunged during index segment merges. See https://issues.apache.org/jira/browse/SOLR-1275 Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR

Re: Solr - Load Increasing.

2009-11-16 Thread Otis Gospodnetic
Hi, Your autoCommit settings are very aggressive. I'm guessing that's what's causing the CPU load. btw. what is laks? Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message From:

Re: Solr - Load Increasing.

2009-11-16 Thread Walter Underwood
Probably lakh: 100,000. So, 900k qpd and 3M docs. http://en.wikipedia.org/wiki/Lakh wunder On Nov 16, 2009, at 2:17 PM, Otis Gospodnetic wrote: Hi, Your autoCommit settings are very aggressive. I'm guessing that's what's causing the CPU load. btw. what is laks? Otis -- Sematext

RE: Solr - Load Increasing.

2009-11-16 Thread Sudarsan, Sithu D.
Hi, Lakh or Lac - 100,000 Crore - 100,00,000 (ten million) Commonly used in India Sincerely, Sithu D Sudarsan -Original Message- From: Walter Underwood [mailto:wun...@wunderwood.org] Sent: Monday, November 16, 2009 5:22 PM To: solr-user@lucene.apache.org Subject: Re: Solr -

Re: Solr - Load Increasing.

2009-11-16 Thread Israel Ekpo
On Mon, Nov 16, 2009 at 5:22 PM, Walter Underwood wun...@wunderwood.org wrote: Probably lakh: 100,000. So, 900k qpd and 3M docs. http://en.wikipedia.org/wiki/Lakh wunder On Nov 16, 2009, at 2:17 PM, Otis Gospodnetic wrote: Hi, Your autoCommit settings are very aggressive. I'm

Re: Solr - Load Increasing.

2009-11-16 Thread Shashi Kant
I think it would be useful for members of this list to realize that not everyone uses the same metrology and terms. It is very easy for Americans to use the imperial system and presume everyone does the same; Europeans to use the metric system etc. Hopefully members on this list would be

Re: Solr - Load Increasing.

2009-11-16 Thread Tom Alt
Nice to learn a new word for the day! But to answer your question, or at least part of it, I don't really think you want a configuration like <autoCommit> <maxDocs>1</maxDocs> <maxTime>10</maxTime> </autoCommit> Committing every doc, and every 10 milliseconds? That's just asking for

Re: exclude some fields from copying dynamic fields | schema.xml

2009-11-16 Thread Lance Norskog
Oh well. There is no direct feature for controlling what is copied. If you use the DataImportHandler, you can include Java plugins or Javascript/JRuby/Groovy code to do the copying. On Sun, Nov 15, 2009 at 9:37 PM, Vicky_Dev vikrantv_shirbh...@yahoo.co.in wrote: Thanks for response Defining

Re: Newbie Solr questions

2009-11-16 Thread yz5od2
Thanks, so there is no way to create custom documents/fields via the SolrJ client API at runtime? On Nov 16, 2009, at 4:49 PM, Lance Norskog wrote: There is no way to create custom documents/fields via the SolrJ client @ runtime.

Re: Newbie Solr questions

2009-11-16 Thread Lance Norskog
Sorry, I did not answer the question. Yes, that's right. SolrJ can only change the documents in the index. It has no power over the metadata. On Mon, Nov 16, 2009 at 4:00 PM, yz5od2 woods5242-outdo...@yahoo.com wrote: thanks, so there is no way to create custom documents/field via the SolrJ

core size

2009-11-16 Thread Phil Hagelberg
I'm planning out a system with large indexes and wondering what kind of performance boost I'd see if I split out documents into many cores rather than using a single core and splitting by a field. I've got about 500GB worth of indexes ranging from 100MB to 50GB each. I'm assuming if we split

Replication admin page auto-reload

2009-11-16 Thread Jay Hill
The replication admin page on slaves used to have an auto-reload set to reload every few seconds. In the official 1.4 release this doesn't seem to be working, but it does in a nightly build from early June. Was this changed on purpose or is this a bug? I looked through CHANGES.txt to see if

Re: Some guide about setting up local/geo search at solr

2009-11-16 Thread Otis Gospodnetic
Not that I know. It's not in contrib, but if you apply that patch from http://wiki.apache.org/solr/SpatialSearch I am guessing it puts things in contrib/spatial. Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR

Re: core size

2009-11-16 Thread Otis Gospodnetic
If an index fits in memory, I am guessing you'll see the speed change roughly proportionally to the size of the index. If an index does not fit into memory (i.e. disk head has to run around the disk to look for info), then the improvement will be even greater. I haven't explicitly tested this

Re: Replication admin page auto-reload

2009-11-16 Thread Erik Hatcher
On Nov 17, 2009, at 2:48 AM, Jay Hill wrote: The replication admin page on slaves used to have an auto-reload set to reload every few seconds. In the official 1.4 release this doesn't seem to be working, but it does in a nightly build from early June. Was this changed on purpose or is