Hi,
I am new to this forum and would like to know if the functionality described
below has been developed or already exists in Solr. If it does not exist, is it a
good idea, and can I contribute it?
We need to index multiple documents with different formats. So we use Solr
with Tika (Solr Cell).
Question:
Can
Does anyone have a clue?
List,
I somehow fail to index certain pdf files using the
ExtractingRequestHandler in Solr 1.4 with default solrconfig.xml but
modified schema. I have a very simple schema for this case using only
an ID field, a timestamp field, and two dynamic fields: ignored_* and
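For anyone comparing setups, here is a minimal SolrJ sketch of posting a PDF to the ExtractingRequestHandler. The server URL, file name, and literal.id value are only placeholders, and the uprefix parameter assumes an ignored_* dynamic field like the one in the schema above. If extraction fails for a particular PDF, the stack trace in the Solr log (usually a Tika/PDFBox exception) is the first place to look.

import java.io.File;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class ExtractPdf {
  public static void main(String[] args) throws Exception {
    SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
    ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
    req.addFile(new File("sample.pdf"));        // placeholder file
    req.setParam("literal.id", "sample.pdf");   // placeholder unique key value
    req.setParam("uprefix", "ignored_");        // route unmapped Tika fields to ignored_*
    server.request(req);
    server.commit();
  }
}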
What I would suggest is that if you want to index a PDF, then you should
use a PDF extractor. A PDF extractor is able to extract the text content and
the metadata of the files. I suppose you have just opened and indexed the
PDF as is, so you stored binary data and stopped there. For my application I've
Hi,
the problem you've described -- an integration of DataImportHandler (to
traverse the XML file and get the document urls) and Solr Cell (to
extract content afterwards) -- is already addressed in issue SOLR-1358
(https://issues.apache.org/jira/browse/SOLR-1358).
Best,
Sascha
Kerwin
Yep, I think I mostly nailed the unmarshalling. Need more tests though. And
then integrate it to SolrNet.
Is there any way (or are there any plans) to have an update handler that
accepts javabin?
2009/11/16 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com
start with a JavabinDecoder only so
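In case it helps anyone experimenting with the format, here is a small Java sketch that decodes a javabin stream with Solr's own JavaBinCodec. The file name is a placeholder; the bytes would typically come from a response requested with wt=javabin.

import java.io.FileInputStream;
import java.io.InputStream;
import org.apache.solr.common.util.JavaBinCodec;

public class JavabinDump {
  public static void main(String[] args) throws Exception {
    // response.javabin: a response body saved from a request with wt=javabin (placeholder name)
    InputStream in = new FileInputStream("response.javabin");
    try {
      Object decoded = new JavaBinCodec().unmarshal(in);
      System.out.println(decoded); // usually a NamedList tree containing a SolrDocumentList
    } finally {
      in.close();
    }
  }
}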
Thank you for your reply.
I was under the impression Tika could also extract text content from various
document types instead of only metadata. I'll use the CLI tools from
http://www.foolabs.com/xpdf/ to extract text manually.
-
Markus Jelsma Buyways B.V.
Technisch Architect
Hi,
I'm a newbie using Solr and I'd like to run some tests against our data set. I
have successfully tested Solr + Cell using the standard HTTP Solr server,
and now we need to test the embedded solution. When I try to start the
embedded server I get this exception:
INFO: registering core:
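For comparison, a minimal sketch of starting an embedded server with SolrJ 1.4. The solr home path is a placeholder, and the core name passed to EmbeddedSolrServer has to match what is declared in solr.xml (an empty string is a common choice for a default single-core setup).

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.core.CoreContainer;

public class EmbeddedStart {
  public static void main(String[] args) throws Exception {
    // Point solr.solr.home at the directory containing solr.xml / conf (placeholder path)
    System.setProperty("solr.solr.home", "/path/to/solr/home");
    CoreContainer.Initializer initializer = new CoreContainer.Initializer();
    CoreContainer cores = initializer.initialize();
    SolrServer server = new EmbeddedSolrServer(cores, ""); // "" = default core (placeholder)
    // ... index and query as with the HTTP server ...
    cores.shutdown();
  }
}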
We'd like to share with Solr users a recent news item from http://sesat.no
Sesam has spent some three months migrating all its indexes from FAST to
Solr+Lucene.
It was a joyful experience and allowed us to implement a number of improvements
we never could under FAST.
We've written a
By that I mean that the java/tomcat
process just disappears.
I had a similar problem when I started Tomcat via SSH and then improperly
closed the SSH session without the exit command.
In some cases (OutOfMemory) there is not enough memory to generate a log (or the CPU can
be overloaded by the Garbage Collector to such
Hi All.
My Solr server box's CPU utilization is increasing to between 60 and 90%, and
sometimes Solr goes down and we have to restart it manually.
No. of documents in Solr: 30 laks.
No. of add/update requests to Solr: 30 thousand/day. On average, every 30
minutes there are around 500 writes.
No. of search
On Mon, Nov 16, 2009 at 6:25 PM, amitj am...@ieee.org wrote:
Is there also a way we can include some kind of annotation on a schema
field and send the data retrieved for that field to an external application?
We have a requirement where we need some data fields (out of the fields
for an
On Mon, Nov 16, 2009 at 5:55 PM, Mauricio Scheffer
mauricioschef...@gmail.com wrote:
Yep, I think I mostly nailed the unmarshalling. Need more tests though. And
then integrate it to SolrNet.
Is there any way (or are there any plans) to have an update handler that
accepts javabin?
There is
Otis Gospodnetic otis_gospodne...@yahoo.com wrote on 11/13/2009 11:15:43
PM:
Let's take a step back. Why do you need to optimize? You said: As
long as I'm not optimizing, search and indexing times are
satisfactory. :)
You don't need to optimize just because you are continuously adding
On Fri, Nov 13, 2009 at 4:09 PM, Chris Hostetter
hossman_luc...@fucit.org wrote:
please don't kill -9 ... it's grossly overkill, and doesn't give your
[ ... snip ... ]
Alternatively, you could take advantage of the enabled feature from your
client (just have it test the enabled URL every N updates
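A rough sketch of what that client-side check could look like with SolrJ, assuming the stock ping/healthcheck handler at admin/ping; the URL is a placeholder.

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.SolrPingResponse;

public class EnabledCheck {
  public static void main(String[] args) throws Exception {
    SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
    try {
      SolrPingResponse ping = server.ping(); // goes through the ping/healthcheck handler
      System.out.println("enabled, status=" + ping.getStatus());
    } catch (Exception e) {
      // a disabled healthcheck or an unreachable server both end up here
      System.out.println("disabled or down: " + e.getMessage());
    }
  }
}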
Folks:
For those of you experienced linux-solr hands, I am seeking recommendations
for which file system you think would work best with Solr. We are currently
running with Ubuntu 9.04 on an Amazon EC2 instance. The default file system, I
think, is ext3.
I am seeking, of course,
On Fri, Nov 13, 2009 at 11:02 PM, Otis Gospodnetic
otis_gospodne...@yahoo.com wrote:
So I think the question is really:
If I stop the servlet container, does Solr issue a commit in the shutdown
hook in order to ensure all buffered docs are persisted to disk before the
JVM exits?
Exactly
On Fri, Nov 13, 2009 at 11:45 PM, Lance Norskog goks...@gmail.com wrote:
I would go with polling Solr to find what is not yet there. In
production, it is better to assume that things will break, and have
backstop janitors that fix them. And then test those janitors
regularly.
Good idea,
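A sketch of such a backstop janitor in SolrJ: walk the IDs the source system expects, query Solr for each, and re-submit the missing ones. The field name, helper methods, and URL below are all hypothetical.

import java.util.List;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class IndexJanitor {
  public static void main(String[] args) throws Exception {
    SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");
    List<String> expectedIds = loadIdsFromSourceSystem(); // hypothetical helper
    for (String id : expectedIds) {
      SolrQuery q = new SolrQuery("id:" + id); // "id" is a placeholder for the unique key field
      q.setRows(0); // only the count is needed
      long found = solr.query(q).getResults().getNumFound();
      if (found == 0) {
        reindex(id); // hypothetical helper that pushes the missing record back into Solr
      }
    }
  }

  private static List<String> loadIdsFromSourceSystem() {
    return java.util.Collections.emptyList();
  }

  private static void reindex(String id) {
    // re-submit the record for this id
  }
}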
William Pierce wrote:
Folks:
For those of you experienced linux-solr hands, I am seeking recommendations
for which file system you think would work best with Solr. We are currently
running with Ubuntu 9.04 on an Amazon EC2 instance. The default file system
I think is ext3.
I am of
Hi,
I had working index-time boosting on documents, like so: <doc boost="10.0">
Everything was great until I made some changes that I thought were not
related to the doc boost, but after that my doc boosting appears to be
missing.
I'm having a tough time debugging this and didn't have the sense to
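For what it's worth, the same boost can be set from SolrJ; a small sketch with placeholder field names follows. One common reason a doc boost "disappears" is that the fields it would be folded into have omitNorms=true, so that is worth checking after schema changes.

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BoostedAdd {
  public static void main(String[] args) throws Exception {
    SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "doc-1");          // placeholder unique key
    doc.addField("title", "some title");  // placeholder field
    doc.setDocumentBoost(10.0f);          // equivalent of <doc boost="10.0"> in the XML update format
    server.add(doc);
    server.commit();
  }
}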
Localsolr is not in contrib yet. I am interested in knowing whether
currently there is a better solution for setting up a local search.
Cheers.
On Sun, Nov 15, 2009 at 9:25 PM, Otis Gospodnetic
otis_gospodne...@yahoo.com wrote:
Nota bene:
My understanding is the external versions of Local
Hi,
I have added a deleted field in my database, and am using the
DataImportHandler to add rows to the index...
I am using Solr 1.4.
I have added the deleted field to the query and the RegexTransformer...
and the field definition below:
<field column="$deleteDocByQuery"
regex="^true$"
My application updates the master index frequently, sometimes very frequently.
Is there a good rule of thumb for configuring:
1) maxWarmingSearchers in the master
2) the SUSS thread pool size (and perhaps queue length) to match the server
settings?
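For reference, the two client-side knobs in question are constructor arguments of StreamingUpdateSolrServer; the URL and the values below are only placeholders, not a recommendation.

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer;

public class MasterIndexer {
  public static void main(String[] args) throws Exception {
    SolrServer server = new StreamingUpdateSolrServer(
        "http://master:8983/solr", // placeholder master URL
        20,                        // queue size: docs buffered per runner
        4);                        // thread count: background threads posting to the master
    // ... server.add(doc) / server.commit() as usual ...
  }
}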
On Mon, 2009-11-02 at 19:49 -0500, Paul Tomblin wrote:
Here's what I'm thinking
final static int MAX_ROWS = 100;
int start = 0;
query.setRows(MAX_ROWS);
while (true)
{
QueryResponse resp = solrChunkServer.query(query);
SolrDocumentList docs = resp.getResults();
if (docs.size()
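The quoted snippet is cut off above; a self-contained version of that kind of paging loop could look like the sketch below (the query string and URL are placeholders).

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;

public class PageThroughResults {
  static final int MAX_ROWS = 100;

  public static void main(String[] args) throws Exception {
    SolrServer solrChunkServer = new CommonsHttpSolrServer("http://localhost:8983/solr");
    SolrQuery query = new SolrQuery("*:*"); // placeholder query
    query.setRows(MAX_ROWS);
    int start = 0;
    while (true) {
      query.setStart(start);
      SolrDocumentList docs = solrChunkServer.query(query).getResults();
      for (SolrDocument doc : docs) {
        // process each document
      }
      if (docs.size() < MAX_ROWS) {
        break; // last (partial or empty) page
      }
      start += MAX_ROWS;
    }
  }
}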
There is a text_rev field type in the example schema.xml file in the
official release of 1.4. It uses the ReversedWildcardFilterFactory to reverse
the tokens in a field. You can do a copyField from the field you want to use for leading
wildcard searches to a field using the text_rev type, and then do a regular
Hello,
I have an already working Solr service based on full imports, connected via
PHP to a Zend Framework MVC (I connect it directly to the Controller).
I use the SolrClient class for PHP, which is great:
http://www.php.net/manual/en/class.solrclient.php
For now, every time I want to edit a
On Mon, Nov 16, 2009 at 2:49 PM, Pablo Ferrari pabs.ferr...@gmail.comwrote:
Hello,
I have an already working Solr service based on full imports, connected via
PHP to a Zend Framework MVC (I connect it directly to the Controller).
I use the SolrClient class for PHP, which is great:
Hi Erik,
I didn't look at the source code, and I think the javadoc for SUSS doesn't
mention it, but I am under the impression that the number of threads to use
should roughly match the number of CPU cores on the master. The
maxWarmingSearchers should only be relevant to slaves, not masters,
I'd have to verify this to be sure, but I *believe* the data for deleted docs is
expunged during index segment merges.
See
https://issues.apache.org/jira/browse/SOLR-1275
Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
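If waiting for natural merges is not an option, one way to force the issue from SolrJ is an explicit optimize, which rewrites the segments and drops deleted documents along the way; a small sketch with a placeholder URL.

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class ReclaimDeletes {
  public static void main(String[] args) throws Exception {
    SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
    // Merging all segments also removes documents that were flagged as deleted
    server.optimize(true, true); // waitFlush, waitSearcher
  }
}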
Hi,
Your autoCommit settings are very aggressive. I'm guessing that's what's
causing the CPU load.
btw. what is laks?
Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
- Original Message
From:
Probably lakh: 100,000.
So, 900k qpd and 3M docs.
http://en.wikipedia.org/wiki/Lakh
wunder
On Nov 16, 2009, at 2:17 PM, Otis Gospodnetic wrote:
Hi,
Your autoCommit settings are very aggressive. I'm guessing that's what's
causing the CPU load.
btw. what is laks?
Otis
--
Sematext
Hi,
Lakh or Lac - 100,000
Crore - 1,00,00,000 (ten million)
Commonly used in India
Sincerely,
Sithu D Sudarsan
-Original Message-
From: Walter Underwood [mailto:wun...@wunderwood.org]
Sent: Monday, November 16, 2009 5:22 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr -
On Mon, Nov 16, 2009 at 5:22 PM, Walter Underwood wun...@wunderwood.orgwrote:
Probably lakh: 100,000.
So, 900k qpd and 3M docs.
http://en.wikipedia.org/wiki/Lakh
wunder
On Nov 16, 2009, at 2:17 PM, Otis Gospodnetic wrote:
Hi,
Your autoCommit settings are very aggressive. I'm
I think it would be useful for members of this list to realize that not
everyone uses the same metrology and terms.
It is very easy for Americans to use the imperial system and presume
everyone does the same; Europeans to use the metric system etc. Hopefully
members on this list would be
Nice to learn a new word for the day!
But to answer your question, or at least part of it, I don't really think
you want a configuration like
<autoCommit>
  <maxDocs>1</maxDocs>
  <maxTime>10</maxTime>
</autoCommit>
Committing every doc, and every 10 milliseconds? That's just asking for
Oh well. There is no direct feature for controlling what is copied.
If you use the DataImportHandler, you can include Java plugins or
Javascript/JRuby/Groovy code to do the copying.
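As a rough illustration of such a Java plugin, here is a custom DIH Transformer that copies one column to another; the class and column names are hypothetical, and it would be referenced from the entity's transformer attribute in data-config.xml.

import java.util.Map;
import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.Transformer;

public class CopyColumnTransformer extends Transformer {
  @Override
  public Object transformRow(Map<String, Object> row, Context context) {
    Object source = row.get("title");   // hypothetical source column
    if (source != null) {
      row.put("title_copy", source);    // hypothetical target column
    }
    return row;
  }
}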
On Sun, Nov 15, 2009 at 9:37 PM, Vicky_Dev
vikrantv_shirbh...@yahoo.co.in wrote:
Thanks for response
Defining
Thanks, so there is no way to create custom documents/fields via the
SolrJ client API at runtime?
On Nov 16, 2009, at 4:49 PM, Lance Norskog wrote:
there is no way to create custom documents/fields
via the SolrJ client at runtime.
Sorry, I did not answer the question. Yes, that's right. SolrJ can
only change the documents in the index. It has no power over the
metadata.
On Mon, Nov 16, 2009 at 4:00 PM, yz5od2 woods5242-outdo...@yahoo.com wrote:
Thanks, so there is no way to create custom documents/fields via the SolrJ
I am planning out a system with large indexes and wondering what kind
of performance boost I'd see if I split out documents into many cores
rather than using a single core and splitting by a field. I've got about
500GB worth of indexes ranging from 100MB to 50GB each.
I'm assuming if we split
The replication admin page on slaves used to have an auto-reload set to
reload every few seconds. In the official 1.4 release this doesn't seem to
be working, but it does in a nightly build from early June. Was this changed
on purpose or is this a bug? I looked through CHANGES.txt to see if
Not that I know of. It's not in contrib, but if you apply that patch from
http://wiki.apache.org/solr/SpatialSearch I am guessing it puts things in
contrib/spatial.
Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
If an index fits in memory, I am guessing you'll see the speed change roughly
proportionally to the size of the index. If an index does not fit into memory
(i.e., the disk head has to run around the disk to look for info), then the
improvement will be even greater. I haven't explicitly tested this
On Nov 17, 2009, at 2:48 AM, Jay Hill wrote:
The replication admin page on slaves used to have an auto-reload set to
reload every few seconds. In the official 1.4 release this doesn't seem to
be working, but it does in a nightly build from early June. Was this changed
on purpose or is