On Wed, Jan 12, 2011 at 11:50 AM, Dinesh mdineshkuma...@karunya.edu.in wrote:
I have installed it, tested the sample XML file, and tried indexing.
Everything went successfully, but when I tried with log files I got an error.
Please provide details of what you are doing, and of the error
On Wed, Jan 12, 2011 at 12:10 PM, Dinesh mdineshkuma...@karunya.edu.in wrote:
If I convert it to CSV or XML it will be time consuming, because the
indexing and getting data out of it should be real time. Is there any other way
I can do this? If not, what are the ways I can convert them?
kmf,
after a first read, I would say that sounds a bit like
http://wiki.apache.org/solr/FieldCollapsing ? But that depends mainly on
your current schema; take a look and let us know if it helps :)
Regards
Stefan
On Tue, Jan 11, 2011 at 8:06 PM, kmf kfole...@gmail.com wrote:
I currently
I got an idea: creating a DIH and then working with that. Thanks everyone
for the help. I hope I'll create a regex DIH; I guess that's right.
--
View this message in context:
http://lucene.472066.n3.nabble.com/Input-raw-log-file-tp2210043p2239947.html
Sent from the Solr - User mailing list archive at Nabble.com.
Good morning Solr users,
It is morning here in Germany, hence this greeting.
I have a small problem with Solr.
I have an index, created by another program; it is a Lucene
index and can be read by Luke without any problems.
However, I would now like to search it with Solr
satya,
nice to hear that it works :)
On your question about similar words: I would say no - suggestions are only
generated based on available records, and AFAIK only if the given
word/phrase is misspelled. Perhaps MoreLikeThis could help you, but I'm not sure
on this - especially because you're
Can anyone explain to me how to create a regex DataImportHandler?
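For what it's worth, a minimal data-config.xml sketch of a regex-based DIH for log files (the log path, line format, and field names here are hypothetical; LineEntityProcessor reads each line into a rawLine column, and RegexTransformer extracts fields from it via capture groups):

```xml
<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8"/>
  <document>
    <entity name="logline"
            processor="LineEntityProcessor"
            url="/var/log/app/access.log"
            rootEntity="true"
            transformer="RegexTransformer">
      <!-- Assumed line format: "ip [timestamp] message" -->
      <field column="ip"        regex="^(\S+)\s+\[[^\]]+\]\s+.*$"  sourceColName="rawLine"/>
      <field column="timestamp" regex="^\S+\s+\[([^\]]+)\]\s+.*$"  sourceColName="rawLine"/>
      <field column="message"   regex="^\S+\s+\[[^\]]+\]\s+(.*)$"  sourceColName="rawLine"/>
    </entity>
  </document>
</dataConfig>
```

The regexes would of course need adapting to the actual log format.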
Has anyone had any success using the DataImportHandler on WebSphere 6.1?
I am getting the following exception from Websphere when viewing the
DataImport Development Console in the browser. The ajax call to retrieve the
dataconfig.xml fails. The thing is that if you do an import the import
On Wed, Jan 12, 2011 at 3:07 PM, Dinesh mdineshkuma...@karunya.edu.in wrote:
Can anyone explain to me how to create a regex DataImportHandler?
[...]
Dear Dinesh,
No offence, but please do some basic leg work on your own
first, and then ask more specific questions.
Did you read the Hathi trust
Hi Stefan,
I need the words from the index record itself. If 'java' is given,
then the relevant, similar, or nearby words in the index should be shown,
even when the given keyword is spelled correctly... can that be possible???
ex:-
Yeah, I did. I'm trying it; still, I asked in case there is a better solution...
Corruption should only happen if 1) we have a bug in Lucene (but we
work hard to fix such bugs, though, LUCENE-2593, fixed in 2.9.4, is a
recent case) or 2) there are hardware problems on the machine.
Mike
On Tue, Jan 11, 2011 at 10:02 AM, Stéphane Delprat
stephane.delp...@blogspirit.com wrote:
Hello,
I'm indexing some content (articles) whose text I cannot store in its original
form for copyright reason. So I can index the content, but cannot store it.
However, I need snippets and search term highlighting.
Any way to accomplish this elegantly? Or even not so elegantly?
Here
Has anyone had any success using the DataImportHandler on WebSphere 6.1?
Below are the logs for a call to reload-config.
I have turned on debug and stepped through the code and the
dataImportHandler correctly reloads the config and the response gets written
out to the http response without any
Dinesh,
it will stay 'real time' even if you convert it. Converting should be
done in the millisecond range, if measurable at all (e.g. if you apply
streaming).
Beware: to use the real-time features you'll need the latest trunk of Solr, IMHO.
I've done similar log-feeding stuff here (with code!):
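Not Peter's actual code, but a small Python sketch of the kind of regex field extraction a log-feeding pipeline (or a regex DIH transformer) performs before the documents reach Solr; the log format and field names here are invented:

```python
import re

# Hypothetical log format: '127.0.0.1 [2011-01-12 10:33:01] GET /index.html 200'
LOG_RE = re.compile(
    r'^(?P<ip>\S+)\s+'
    r'\[(?P<timestamp>[^\]]+)\]\s+'
    r'(?P<method>\S+)\s+(?P<path>\S+)\s+(?P<status>\d+)$'
)

def parse_line(line):
    """Turn one raw log line into a Solr-ready field dict, or None if it doesn't match."""
    m = LOG_RE.match(line.strip())
    return m.groupdict() if m else None

doc = parse_line('127.0.0.1 [2011-01-12 10:33:01] GET /index.html 200')
# doc == {'ip': '127.0.0.1', 'timestamp': '2011-01-12 10:33:01',
#         'method': 'GET', 'path': '/index.html', 'status': '200'}
```

Each resulting dict would then be posted to Solr as one document.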
Otis,
just interested: storing the full text is not allowed, but splitting it up
into separate sentences is okay?
While you think about using the sentences only as a secondary/additional
source, maybe it would help to search in the sentences themselves, or would that
give misleading results in your
Hi Dennis, thanks a lot for pointing out the problem. It works.
On Tue, Jan 11, 2011 at 11:50 PM, Dennis Gearon gear...@sbcglobal.netwrote:
You didn't happen to notice that you have one field named
RestaurantLocation and
another named RestaurantName, did you?
You must be submitting
Hi Stefan,
Yes, splitting in separate sentences (and storing them) is OK because with a
bunch of sentences you can't really reconstruct the original article unless you
know which order to put them in.
Searching against the sentence won't work for queries like foo AND bar because
this should
Have you made any progress? Since the AnalyzingQueryParser doesn't inherit
from QParserPlugin, Solr doesn't want to use it, but I guess we could implement a
similar parser that does inherit from QParserPlugin?
Switching parsers seems to be what is needed. Has really no one solved this
before?
Nevermind, I found it. You can add xml children to your plugin declaration
in solrconfig.xml and then retrieve them by casting the namedList arguments
received by your plugin at initialization to SolrParams.
On Tue, Jan 11, 2011 at 10:28 AM, dante stroe dante.st...@gmail.com wrote:
Hi,
Is
Hi,
I'm trying to find the source code for class: JaspellTernarySearchTrie. It's
supposed to be used for spelling suggestions.
It's referenced in the javadoc:
http://lucene.apache.org/solr/api/org/apache/solr/spelling/suggest/jaspell/JaspellTernarySearchTrie.html
I realize this is a dumb
Hi Otis,
I think you can get what you want by doing the first stage retrieval, and then
in the second stage, add required constraint(s) to the query for the matching
docid(s), and change the AND operators in the original query to OR.
Coordination will cause the best snippet(s) to rise to the
Hi,
These two links helped me to solve the problem.
https://issues.apache.org/jira/browse/SOLR-1154
http://wiki.apache.org/solr/SolrReplication#enable.2BAC8-disable_master.2BAC8-slave_in_a_node
Thanks,
SRD
I'm attempting to calculate term frequency across multiple documents
in Solr. I've been able to use TermVectorComponent to get this data on
a per-document basis but have been unable to find a way to do it for
multiple documents -- that is, get a list of terms appearing in the
documents and how
Otis Gospodnetic wrote:
Are people using Solr trunk in serious production environments? I suspect
the
answer is yes, just want to see if there are any gotchas/warnings.
Yes, since it seemed the best way to get edismax with this patch[1]; and to get
the more update-friendly MergePolicy[2].
Hi Gora,
Unfortunately reorganizing the data is not an option for me.
Multiple databases exist and a third party is taking care of
populating them. Once a database reaches a certain size, a switch
occurs and a new database is created with the same table structure.
Gora Mohanty-3 wrote:
I
What's the syntax for spatial for that version of Solr?
Dennis Gearon
Signature Warning
It is always a good idea to learn from your own mistakes. It is usually a
better
idea to learn from others’ mistakes, so you do not have to make them yourself.
from
I got another corruption.
It sure looks like it's the same type of error. (on a different field)
It's also not linked to a merge, since the segment size did not change.
*** good segment :
1 of 9: name=_ncc docCount=1841685
compound=false
hasProx=true
numFiles=9
size
I tried this:
http://host:port/solr/select?q=YOUR_QUERY&stats=on&stats.field=amount&f.amount.stats.facet=currency&rows=0
and this:
http://host:port/solr/select?q=amount_usd:*+OR+amount_eur:*[+OR+amount_...:*]&stats=on&stats.field=amount_usd&stats.field=amount_eur[&stats.field=amount_...]&rows=0
of
Curious... is it always a docFreq=1 != num docs seen 0 + num docs deleted 0?
It looks like new deletions were flushed against the segment (del file
changed from _ncc_22s.del to _ncc_24f.del).
Are you hitting any exceptions during indexing?
Mike
On Wed, Jan 12, 2011 at 10:33 AM, Stéphane
My field type is double. Maybe sint is better? But I need double... =(
Hi all,
I'm getting started with a master/slave configuration for two solr
instances. To distinguish between 'master' and 'slave', I've set the system
properties (e.g. -Dmaster.enabled) and am using the same 'solrconfig.xml'.
I can see via the system properties admin UI that the jvm (and thus
Hi Steve,
- Original Message
From: Steven A Rowe sar...@syr.edu
Subject: RE: Not storing, but highlighting from document sentences
I think you can get what you want by doing the first stage retrieval, and
then
in the second stage, add required constraint(s) to the query for
That's correct. Only 1 instance should be writing. You should be able to
point
multiple Solr read-only instances to the same physical read-only index. I
don't
recall trying this recently, though.
Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search ::
I have found a workaround for this.
1. change the entry in solrconfig.xml for the DataImportHandler by removing
the slash from the name, like this: <requestHandler name="dataimport" ...>
2. when making the request to the SolrJ server, don't use a slash in the qt
parameter, i.e.
solrParameters.set(qt,
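For reference, the workaround in step 1 amounts to a handler declaration of roughly this shape in solrconfig.xml (no leading slash in the name; the config filename here is the usual DIH example one):

```xml
<requestHandler name="dataimport"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
  </lst>
</requestHandler>
```

On the SolrJ side, the qt parameter is then just "dataimport".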
Sebastian,
If I remember my regular expressions, that - and / are really just that. The
stuff inside square brackets means any of the characters between [ and ]. -
and / are just two of those characters, along with newline, space, comma, etc.
Otis
Sematext :: http://sematext.com/ :: Solr
Hi Will,
I don't think we have a clean master or slave label anywhere in the Admin
UI.
Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
- Original Message
From: Will Milspec will.mils...@gmail.com
To:
Well, slaves do show different things on the replication.jsp page.
Master http://10cc:8080/solr/replication
Poll Interval 00:00:10
Local Index Index Version: 1294666552434, Generation: 2515
Location: /var/lib/solr/data/index
Size: 4.65 GB
Times Replicated Since
Dennis,
Join #solr on Freenode.
But it's not necessarily any livelier than this ML. It depends on who's
actively on.
Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
- Original Message
From: Dennis Gearon
It isn't exactly what you want, but did you try with the onlyMorePopular
parameter?
http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.onlyMorePopular
Regards,
Juan Grande
On Wed, Jan 12, 2011 at 7:29 AM, satya swaroop satya.yada...@gmail.comwrote:
Hi stefan,
I need the
What's the use-case you're trying to solve? Because if you're
still showing results to the user, you're taking information away
from them. Where are you expecting to get the list? If you try
to return the entire list, you're going to pay the penalty
of creating the entire list and transmitting it
Hi Steven,
if I understand correctly, you are suggesting query execution in two
phases: first execute query on whole article index core (where whole
articles are indexed, but not stored) to get article IDs (for articles
which match original query). Then for each match in article core:
change the
Maybe there is a better solution, but I think that you can solve this
problem using facets. You will get the number of documents where each term
appears. Also, you can filter a specific set of terms by entering a query
like +field:term1 OR +field:term2 OR ..., or using the facet.query
parameter.
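As a sketch, the facet request described above would look roughly like this (hypothetical host, core, and field names):

```
http://localhost:8983/solr/select?q=*:*&rows=0&facet=true&facet.field=content&facet.limit=-1&facet.mincount=1
```

Each facet value then comes back with the number of documents containing that term, and facet.query=content:term1 restricts the counts to specific terms.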
Sometimes I've _considered_ trying to do this (but generally decided it
wasn't worth it) was when I didn't want those documents below the
threshold to show up in the facet values. In my application the facet
counts are sometimes very pertinent information, that are sometimes not
quite as
I think you can get what you want by doing the first stage retrieval,
and then in the second stage, add required constraint(s) to the query
for the matching docid(s), and change the AND operators in the
original query to OR. Coordination will cause the best snippet(s) to
rise to the
I have found where a root entity has completed processing and added the
logic to clear the entity's cache at that point (didn't change any of the
logic for clearing all entity caches once the import has completed). I have
also created an enhancement request found at
Hi Tomislav,
if I understand correctly, you are suggesting query execution in two
phases: first execute query on whole article index core (where whole
articles are indexed, but not stored) to get article IDs (for articles
which match original query). Then for each match in article core:
Hi all,
Thanks for the feedback. I've checked the code with a few different inputs
and believe I have found a bug.
Could someone comment as to whether I'm missing something? I will go
ahead and file it if someone can attest that it looks like a bug.
Bug Summary:
==
- Admin UI
On Wed, Jan 12, 2011 at 8:49 PM, alexei achugu...@gmail.com wrote:
[...]
Unfortunately reorganizing the data is not an option for me.
Multiple databases exist and a third party is taking care of
populating them. Once a database reaches a certain size, a switch
occurs and a new database is
Hello,
I know you can explicitly specify list of fields returned via
fl=field1,field2,field3
Is there a way to specify return all fields but field1 and field2?
Thanks,
Dmitriy
On Thu, Jan 13, 2011 at 1:11 AM, Dmitriy Shvadskiy dshvads...@gmail.com wrote:
Hello,
I know you can explicitly specify list of fields returned via
fl=field1,field2,field3
Is there a way to specify return all fields but field1 and field2?
Not that I know of, but below is an earlier
Thanks Gora
The workaround of loading fields via LukeRequestHandler and building fl from
it will work for what we need. However it takes 15 seconds per core and we
have 15 cores.
The query I'm running is /admin/luke?show=schema
Is there a way to limit query to return just fields?
Thanks,
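Since fl has no exclusion syntax, one client-side sketch (purely illustrative, field names invented): cache the field list from the Luke response once per core and build fl from it yourself:

```python
def build_fl(all_fields, excluded):
    """Build a Solr fl parameter: every known field except the excluded ones."""
    excluded = set(excluded)
    return ','.join(f for f in all_fields if f not in excluded)

# all_fields would come from /admin/luke?show=schema, parsed once and cached per core
print(build_fl(['id', 'title', 'body', 'field1', 'field2'], ['field1', 'field2']))
```

Caching the Luke response would also avoid paying the 15-second cost on every query.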
On Jan 12, 2011, at 12:53 , Dmitriy Shvadskiy wrote:
Thanks Gora
The workaround of loading fields via LukeRequestHandler and building fl from
it will work for what we need. However it takes 15 seconds per core and we
have 15 cores.
The query I'm running is /admin/luke?show=schema
Is
We've created an index from a number of different documents that are
supplied by third parties. We want the index to only contain UTF-8
encoded characters. I have a couple questions about this:
1) Is there any way to be sure during indexing (by setting something
in the solr configuration?) that
I'm running into a problem with StopFilterFactory in conjunction with (e)dismax
queries that have a mix of fields, only some of which use StopFilterFactory.
It seems that if even 1 field on the qf parameter does not use
StopFilterFactory, then stop words are not removed when searching any
I haven't used edismax, but I can imagine it's a feature. This is because
inconsistent use of stopwords in the analyzers of the fields specified in qf can
yield really unexpected results because of the mm parameter.
In dismax, if one analyzer removes stopwords and the other doesn't, the mm
parameter
This is supposed to be dealt with outside the index. All input must be UTF-8
encoded. Failing to do so will give unexpected results.
We've created an index from a number of different documents that are
supplied by third parties. We want the index to only contain UTF-8
encoded characters. I
Have used edismax and stopword filters as well. But usually use the fq
parameter, e.g. fq=title:the life, and never had any issues.
Can you turn on debugQuery and check what query is formed for all the
combinations you mentioned?
Regards,
Jayendra
On Wed, Jan 12, 2011 at 5:19 PM, Dyer,
Have used edismax and stopword filters as well. But usually use the fq
parameter, e.g. fq=title:the life, and never had any issues.
That is because filter queries are not relevant for the mm parameter which is
being used for the main query.
Can you turn on the debugQuery and check what's the
Web page returns the following message:
Fatal error: Uncaught exception 'Exception' with message '0 Status:
Communication Error'
This happens in a dev environment, everything on one machine: Windows 7, WAMP,
CakePHP, Tomcat, Solr, and SolrPHPClient. Error message also references line
334 of
On 12.01.2011, at 23:50, Eric wrote:
Web page returns the following message:
Fatal error: Uncaught exception 'Exception' with message '0 Status:
Communication Error'
This happens in a dev environment, everything on one machine: Windows 7,
WAMP, CakePHP, Tomcat, Solr, and SolrPHPClient.
Had the same issues with international characters and wildcard searches.
One workaround we implemented was to index the field with and without the
ASCIIFoldingFilterFactory.
You would have an original field and one with the English equivalents to be used
during searching.
Wildcard searches with
Checkout and build the code from -
https://svn.apache.org/repos/asf/lucene/dev/trunk/
Class -
https://svn.apache.org/repos/asf/lucene/dev/trunk/solr/src/java/org/apache/solr/spelling/suggest/jaspell/JaspellTernarySearchTrie.java
Here is what debug says each of these queries parse to:
1. q=life&defType=edismax&qf=Title ... returns 277,635 results
2. q=the life&defType=edismax&qf=Title ... returns 277,635 results
3. q=life&defType=edismax&qf=Title Contributor ... returns 277,635
4. q=the life&defType=edismax&qf=Title Contributor
converting on the fly is not supported by Solr but should be relatively
easy in Java.
Also scanning is relatively simple (accept only a range). Detection too:
http://www.mozilla.org/projects/intl/chardet.html
We've created an index from a number of different documents that are
supplied by third
Hi all!
Would you mind writing about your Solr project if it has an uncommon
approach or if it is somehow exciting?
I would like to extend my list for a new blog post.
Examples I have in mind at the moment are:
loggly (real time + big index),
solandra (nice solr + cassandra combination),
haiti
I was unable to get it to compile. From the author, I got one reply about the
benefits of the compiled version. After submitting my errors to him, I have not
yet received a reply.
##Weird thing 'on the way to the forum' today.##
I remember reading an article a couple of days ago which said the
Here's another thread on the subject:
http://lucene.472066.n3.nabble.com/Dismax-Minimum-Match-Stopwords-Bug-td493483.html
And slightly off topic: you might also want to look at using common grams;
they are really useful for phrase queries that contain stopwords.
Resolved! In a rare flash of clarity, I removed the @ preceding the
file_get_contents call. Doing so made it apparent that my app was passing an
incorrect Solr service port number to the SolrPHPClient code. Correcting the
port number fixed the issue.
The lesson is... suppressed errors are
OK, this could be very easy to do, but I was not able to do it.
I need to enable location search, i.e. if someone searches for the location 'New
York', show results for New York and results within 50 miles of New York.
We do have latitude/longitude stored in database for each record but not
sure how to
I believe this is what you are looking for. I renamed the field called
store to coords in the schema.xml file. The tricky part is building out
the query. I am using SolrNet to do this though and have not yet cracked the
problem.
Adam,
thanks. Yes that helps
but how does the coords field get populated? All I have is
<field name="lat" type="tdouble" indexed="true" stored="true"/>
<field name="lng" type="tdouble" indexed="true" stored="true"/>
<field name="coord" type="location" indexed="true" stored="true"/>
fields 'lat' and 'lng' get populated
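One way the coord field commonly gets populated (a sketch, assuming indexing goes through DIH; table and column names invented): concatenate lat and lng into the "lat,lng" string that the location (LatLonType) field expects, e.g. with TemplateTransformer:

```xml
<entity name="place" query="SELECT id, lat, lng FROM places"
        transformer="TemplateTransformer">
  <!-- coord receives "lat,lng", the format the location field type expects -->
  <field column="coord" template="${place.lat},${place.lng}"/>
</entity>
```

If the data doesn't come through DIH, the same concatenation can be done in whatever client builds the documents.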
Hi all,
Has anyone seen or used the Apache Portable Runtime (APR) in conjunction with Solr
and Tomcat? Has anyone seen (or better, measured) performance improvements
when using APR?
APR is a library that implements some functionality using Native C (see
http://apr.apache.org/ and
Actually, looking at the results from the geofilt filter, it would
appear that it's not giving me the results I'm looking for. Or maybe it
is... I need to convert my results to KML to see if it is actually performing
a proper radius query.
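For sanity-checking the radius behaviour against a known point, a geofilt request of roughly this shape can help (hypothetical coordinates; d is in kilometres, so 50 miles is about 80 km):

```
http://localhost:8983/solr/select?q=*:*&fq={!geofilt}&sfield=coords&pt=40.7143,-74.006&d=80
```

Comparing the returned points against a manual great-circle distance calculation shows whether the filter behaves as expected.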
In my case, I am getting data from a database and am able to concatenate the
lat/long as a coordinate pair to store in my coords field. To test this, I
randomized the lat/long values and generated about 6000 documents.
Adam
On Wed, Jan 12, 2011 at 8:29 PM, caman
Hi all,
I've been stuck on exact keyword matching for several days. I hope you guys could
help me. Here is the scenario:
1. It needs to match a multi-word keyword, case insensitively
2. Partial-word or single-word matching against this field is not allowed
I want to know the field type
When I have it running with a permission system (through both API and front
end), I will share it with everyone. It's beginning to happen.
The search is fairly primitive for now. But we hope to learn or hire skills to
better match it to the business model as we grow/get funding.
Dennis Gearon
Hi Juan,
yeah... I tried onlyMorePopular and got some results, but they are
not similar or nearby words to the word I gave in the query.
Here is the output...
We are just starting with Solr and have a multi-core implementation and need to
delete all the rows in the index to clean things up.
When running an update via a url we are using something like the following
which works fine:
On Thu, Jan 13, 2011 at 6:08 AM, Wilson, Robert
rwil...@constantcontact.com wrote:
We are just starting with Solr and have a multi-core implementation and need
to delete all the rows in the index to clean things up.
When running an update via a url we are using something like the following
Hi Robert,
You can find an example of something similar to this in the examples
that are part of the solr distribution. The tutorial (
http://lucene.apache.org/solr/tutorial.html) describes how to post data
to the solr server via the post.jar
user:~/solr/example/exampledocs$ java -jar
OK, getting ready to be more interactive with my index (she likes me).
These are pretty much boolean-answered questions to help my understanding.
I think having these in the mailing list records might help others too.
A/ Is there a query that updates all the fields automatically on a record that
I'm a little busy right now, but I'm going to try to find a suitable
parser or if none is found then I think the only solution is to write
a new one.
2011/1/13 Jayendra Patil jayendra.patil@gmail.com:
Had the same issues with international characters and wildcard searches.
One workaround we
Use this type of URL to delete all data, followed by a commit:
http://localhost:8983/solr/update/?stream.body=<delete><query>*:*</query></delete>&commit=true
-
Grijesh