On Sat, Jan 24, 2009 at 6:56 AM, Cam Bazz wrote:
> Hello;
>
> I have a multiValued field named tagList which may contain multiple tags. I am
> making a query like:
>
> tagList:a AND tagList:b AND tagList:c
>
> and I am also getting a tagList facet returning me some values.
>
> What I would like is Solr t
On Sat, Jan 24, 2009 at 5:56 AM, Nathan Adams wrote:
> Is there a way to use Data Import Handler to index non-XML (i.e. simple
> text) files (either via HTTP or FileSystem)? I need to put the entire
> contents of a text file into a single field of a document and the other
> fields are being pulle
There's another option: using DIH with SolrJ. Take a look at:
https://issues.apache.org/jira/browse/SOLR-853
There's a patch there, but it hasn't been updated to trunk. A contribution
would be most welcome.
On Sat, Jan 24, 2009 at 3:11 AM, Gunaranjan Chandraraju <
chandrar...@apple.com> wrote:
>
These might be obvious, but:
* I assume you did a Solr commit command after indexing, right?
* If you are using the fieldtype definitions from the default
schema.xml, then your "string" fields are not being analyzed, which
means you should expect search results only if you enter the entire,
exact
I've indexed my XML using the below in the schema:
Message-ID
However, searching via the Message-ID or Content fields returns 0 results. Using Luke,
I can still see that these fields are stored.
Out of interest
Hello;
I have a multiValued field named tagList which may contain multiple tags. I am
making a query like:
tagList:a AND tagList:b AND tagList:c
and I am also getting a tagList facet returning me some values.
What I would like is for Solr to return facets as if the query were:
tagList:a AND tagList:b
is
Is there a way to use Data Import Handler to index non-XML (i.e. simple
text) files (either via HTTP or FileSystem)? I need to put the entire
contents of a text file into a single field of a document and the other
fields are being pulled out of Oracle...
-Nathan
Chris, Ahmet - thanks for the responses.
Ahmet - yes, I want to see "run" as a top term + the original words that
formed that term.
The reason is that, due to mis-stemming, the terms could become non-English,
e.g. "permanent" would stem to "perm" and "archive" would become "archiv".
I need to extract
I don't understand exactly what you want.
If a document has run(10), running(20), runner(2), runners(8)
(assuming the stemmer reduces all those words to run),
with non-stemmed terms you will see:
running(20)
run(10)
runners(8)
runner(2)
with stemmed terms you will see:
run(40)
You want to see run as a top te
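The aggregation discussed here (stemmed counts summed into one term, with the original words kept alongside it) can be sketched in plain Java. This is not Solr code: `toyStem` is a hypothetical suffix stripper standing in for the Porter stemmer, and the counts are the example numbers from the message above.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class StemGroups {
    // Hypothetical toy stemmer (NOT Porter): strips a few suffixes so that
    // run, running, runner and runners all reduce to "run".
    static String toyStem(String word) {
        String w = word.toLowerCase();
        for (String suffix : new String[] {"ners", "ner", "ning", "s"}) {
            if (w.endsWith(suffix) && w.length() > suffix.length() + 2) {
                return w.substring(0, w.length() - suffix.length());
            }
        }
        return w;
    }

    // Sums the per-word counts under each stem.
    static Map<String, Integer> totals(Map<String, Integer> counts) {
        Map<String, Integer> out = new TreeMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            out.merge(toyStem(e.getKey()), e.getValue(), Integer::sum);
        }
        return out;
    }

    // Remembers which original words were folded into each stem.
    static Map<String, Set<String>> originals(Map<String, Integer> counts) {
        Map<String, Set<String>> out = new TreeMap<>();
        for (String word : counts.keySet()) {
            out.computeIfAbsent(toyStem(word), k -> new TreeSet<>()).add(word);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new TreeMap<>();
        counts.put("run", 10);
        counts.put("running", 20);
        counts.put("runner", 2);
        counts.put("runners", 8);
        System.out.println(totals(counts));    // {run=40}
        System.out.println(originals(counts)); // {run=[run, runner, runners, running]}
    }
}
```

In Solr terms, the copyField approach suggested later in the thread gives you both views (stemmed and non-stemmed) from the index; the mapping between them would still have to be built on the client side, roughly as above.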
It seems like what's desired is not so much a stemmer as what you might call
a "canonicalizer", which would translate each source word not into its
"stem" but into its "most canonical form". Critically, the latter, by
definition, is always a legitimate word, e.g. "run". What's more, it's
always the
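A "canonicalizer" in this sense could be approximated with a lookup table from inflected forms to dictionary words, falling back to the word itself, so the output is always a real word. This is a minimal sketch under that assumption: the table contents are hypothetical, and a real system would need a proper dictionary or lemmatizer rather than a hand-made map.

```java
import java.util.Map;
import java.util.TreeMap;

class Canonicalizer {
    // Hypothetical hand-made table: inflected form -> canonical word.
    // "runner" is deliberately absent: it is a different word, not a form of "run".
    private static final Map<String, String> CANONICAL = Map.of(
            "running", "run",
            "runs", "run",
            "ran", "run",
            "archives", "archive",
            "archived", "archive");

    // Unknown words are kept as-is, so "permanent" never becomes "perm".
    static String canonicalize(String word) {
        String w = word.toLowerCase();
        return CANONICAL.getOrDefault(w, w);
    }

    // Sums counts under each canonical word.
    static Map<String, Integer> countCanonical(Map<String, Integer> counts) {
        Map<String, Integer> out = new TreeMap<>();
        counts.forEach((w, n) -> out.merge(canonicalize(w), n, Integer::sum));
        return out;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = Map.of("run", 10, "running", 20, "archived", 3, "permanent", 5);
        System.out.println(countCanonical(counts)); // {archive=3, permanent=5, run=30}
    }
}
```

The trade-off compared with a stemmer is coverage: anything missing from the table is simply not merged, but nothing is ever turned into a non-word.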
Hi
I had earlier described my requirement of needing to 'post XMLs as-is'
to Solr and have them handled just as the DIH would do on import using
the mapping in data-config.xml. I got multiple answers for the 'post
approach', the top two being:
- Use Solr Cell
- Use SolrJ
In general I would
The best way to find out what was wrong with the request is to check the
web server logs. They should contain an exception, which usually complains
about missing or incorrect fields.
As for committing, Solr has an autoCommit option that will fire after a
designated amount of changes have been ente
I keep getting the error "FATAL: Solr returned an error: Bad Request"
Solr is running on a different port (8080) so I changed the command line
request to "java -Durl=http://localhost:8080/solr/update -jar post.jar
*.xml"
which seems to at least initiate.
"WARNING: Make sure your XML documents a
Hi Ahmet,
Thanks. When I look at the non_stemmed_text field to get the top terms, I
will not get the useful feature of aggregating many related words
into one (which is done by stemming).
For example, if a document has run(10), running(20), runner(2), runners(8), I
would like to see a "top t
I think the best way to get non-stemmed top terms is to index the field using a
fieldType that does not employ any stem filter. For example:
By using copyField you can store two (or more) versions of a field: stemmed and
non-stemmed.
Just a new field:
And a copy field:
Schema Brow
Wicked...you fixed it!
Thanks very much.
Pretty simple in the end I guess...but I thought it might be.
Cheers.
Jeff Newburn wrote:
>
> The important info you are looking for is "undefined field sku at". It
> looks like there may be a copyfield in the schema looking for a field
> named
> s
Well here are the first 10/15 lines:
HTTP Status 500 - Severe errors in solr configuration. Check your log files
for more detailed information on what may be wrong. If you want solr to
continue after configuration errors, change:
false in null
-
The important info you are looking for is "undefined field sku at". It
looks like there may be a copyfield in the schema looking for a field named
sku which does not exist. Just search "sku" in the file and see what comes
up.
On 1/23/09 11:15 AM, "Johnny X" wrote:
>
> Well here are the first
The first 10-15 lines of the jargon might help. Additionally, the full
exceptions will be in the web server logs (i.e. Tomcat or Jetty logs).
On 1/23/09 10:40 AM, "Johnny X" wrote:
>
> Ah, gotcha.
>
> Where do I go to find the log messages? Obviously it prints a lot of jargon
> on the admin pag
Hello,
Is it possible to retrieve the original words once Solr (Porter algorithm)
stems them?
I need to index a bunch of data, store it in Solr, and get back a list of
the most frequent terms. And I want to see the non-stemmed version
of this data.
So basically, I want to enhance this:
ht
Ah, gotcha.
Where do I go to find the log messages? Obviously it prints a lot of jargon
on the admin page reporting the error, but is that what you want?
Jeff Newburn wrote:
>
> Are there any error log messages?
>
> The difference between a string and text is that string is basically
> store
On Fri, Jan 23, 2009 at 10:54 PM, Otis Gospodnetic <
otis_gospodne...@yahoo.com> wrote:
> Hello,
>
> Those two numbers won't necessarily give you the number of duplicates, as
> they reflect the number of deletes in the index, and those deletes were not
> necessarily caused by Solr detecting a dupl
Are there any error log messages?
The difference between a string and text is that string is basically stored
with no modification (it is the solr.StrField). The text type is actually
defined in the fieldtype section and usually contains a tokenizer and some
filters (usually stemming, lowercasi
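The string/text difference explains the "returns 0" symptom earlier in the thread: a string field indexes the whole value as one unmodified term, so only the exact full value matches. The sketch below is a simplified model, not Lucene's analyzer API; `textTerms` only splits and lowercases, whereas a real text fieldType usually adds stop-word and stemming filters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class FieldMatchDemo {
    // "string"-style field: the entire value is a single, unmodified term.
    static List<String> stringTerms(String value) {
        return List.of(value);
    }

    // Simplified "text"-style analysis: split on non-letters/digits, then lowercase.
    static List<String> textTerms(String value) {
        List<String> terms = new ArrayList<>();
        for (String token : value.split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                terms.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    // A term query matches only if the exact term was indexed.
    static boolean matches(List<String> indexedTerms, String queryTerm) {
        return indexedTerms.contains(queryTerm);
    }

    public static void main(String[] args) {
        String value = "Meeting notes for Friday";
        System.out.println(matches(stringTerms(value), "friday")); // false
        System.out.println(matches(textTerms(value), "friday"));   // true
    }
}
```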
Hi there,
I just configured my Solr schema file to support the data types I wish to
submit for indexing. However, as soon as I try to start the Solr server, I get
an error trying to reach the admin page.
I know this has something to do with my definitions in the schema,
because when I tried
Hello,
Those two numbers won't necessarily give you the number of duplicates, as they
reflect the number of deletes in the index, and those deletes were not
necessarily caused by Solr detecting a duplicate insert.
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Origi
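To see why maxDoc minus numDocs overcounts duplicates, here is a deliberately simplified, hypothetical model of an index (not Lucene's implementation): re-adding a document with an existing unique key leaves the old copy behind as a deleted document, and so does an ordinary delete. Segment merges, which purge deleted documents, are ignored here, so in a real index the difference can also shrink over time (for example after an optimize).

```java
import java.util.HashSet;
import java.util.Set;

class DeleteCountModel {
    private final Set<String> liveIds = new HashSet<>();
    private int maxDoc = 0; // every document ever added, live or deleted

    // Adding always appends a new document; if the id was already live,
    // the previous copy becomes a deleted document (overwrite by unique key).
    void add(String id) {
        liveIds.add(id);
        maxDoc++;
    }

    // An explicit delete also leaves a deleted document behind.
    void delete(String id) {
        liveIds.remove(id);
    }

    int maxDoc() { return maxDoc; }

    int numDocs() { return liveIds.size(); }

    public static void main(String[] args) {
        DeleteCountModel index = new DeleteCountModel();
        index.add("a");
        index.add("b");
        index.add("a");    // duplicate insert: overwrites "a"
        index.delete("b"); // ordinary delete, not a duplicate
        // Only one duplicate was posted, yet the difference is 2.
        System.out.println(index.maxDoc() - index.numDocs()); // 2
    }
}
```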
The type of garbage collector definitely affects performance, but there are
other settings as well. There's a related thread currently discussing this:
http://www.nabble.com/Performance-%22dead-zone%22-due-to-garbage-collection-td21588427.html
hbi dev wrote:
>
> Hi wojtekpia,
>
> That's inter
The easiest way is to run maybe 100,000 or more queries and take an
average. A single microsecond value for a query would be incredibly
inaccurate.
-Todd Feak
-Original Message-
From: AHMET ARSLAN [mailto:iori...@yahoo.com]
Sent: Friday, January 23, 2009 1:33 AM
To: solr-user@lucene.apa
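The averaging approach Todd describes can be sketched with `System.nanoTime()`: warm up, run the query many times, and divide. The `Runnable` is a hypothetical placeholder for whatever actually issues the request. Note that client-side timing includes network and response handling, so it measures something different from Solr's QTime.

```java
class QueryTimer {
    // Returns the mean wall-clock time per call, in microseconds.
    // `query` is a placeholder (hypothetical) for the code that sends the request.
    static double averageMicros(Runnable query, int warmup, int runs) {
        for (int i = 0; i < warmup; i++) {
            query.run(); // let JIT compilation and caches settle first
        }
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            query.run();
        }
        long elapsed = System.nanoTime() - start;
        return elapsed / 1000.0 / runs;
    }

    public static void main(String[] args) {
        // Dummy workload standing in for a real query.
        Runnable fakeQuery = () -> Math.sqrt(Math.random());
        System.out.printf("%.3f us per query%n", averageMicros(fakeQuery, 1_000, 100_000));
    }
}
```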
Can you share your experience with the IBM JDK once you've evaluated it?
You are working with a heavy load, I think many would benefit from the
feedback.
-Todd Feak
-Original Message-
From: wojtekpia [mailto:wojte...@hotmail.com]
Sent: Thursday, January 22, 2009 3:46 PM
To: solr-user@luc
Hi,
I'm confused about the method Map
toMultiMap(NamedList params) in the SolrParams class.
When one of your parameter values is an instance of String[], it's converted to a
String using the toString() method, which seems
to me to be wrong. It probably assumes that the values in NamedList are
all
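The concern is easy to reproduce: calling `toString()` on a `String[]` yields the array's type and hash code, not its contents. Below is a hypothetical helper (not Solr's `SolrParams.toMultiMap`) that expands array values element by element. `NamedList` is modeled here as a list of name/value entries, which is an assumption made only to keep the sketch self-contained.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MultiMapDemo {
    static Map.Entry<String, Object> param(String name, Object value) {
        return new AbstractMap.SimpleEntry<>(name, value);
    }

    // Collects all values per name; String[] values are expanded instead of
    // being flattened with toString().
    static Map<String, String[]> toMultiMap(List<Map.Entry<String, Object>> params) {
        Map<String, List<String>> acc = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : params) {
            List<String> values = acc.computeIfAbsent(e.getKey(), k -> new ArrayList<>());
            Object v = e.getValue();
            if (v instanceof String[]) {
                values.addAll(Arrays.asList((String[]) v));
            } else {
                values.add(String.valueOf(v));
            }
        }
        Map<String, String[]> out = new LinkedHashMap<>();
        acc.forEach((k, v) -> out.put(k, v.toArray(new String[0])));
        return out;
    }

    public static void main(String[] args) {
        String[] facets = {"cat", "inStock"};
        // Prints something like [Ljava.lang.String;@ followed by a hash code.
        System.out.println(facets.toString());
        List<Map.Entry<String, Object>> params = List.of(
                param("q", "solr"),
                param("facet.field", facets));
        System.out.println(Arrays.toString(toMultiMap(params).get("facet.field"))); // [cat, inStock]
    }
}
```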
Begin forwarded message:
From: Tony Stevenson
Date: January 23, 2009 8:28:19 AM EST
To: travel-assista...@apache.org
Subject: [Travel Assistance] Applications for ApacheCon EU 2009 -
Now Open
The Travel Assistance Committee is now accepting applications for
those
wanting to attend Apa
Julian Davchev wrote on 01/20/2009 10:07:48 AM:
> Julian Davchev
> 01/20/2009 10:07 AM
>
> I get SEVERE: Lock obtain timed out
>
> Hi,
> Any documents or something I can read on how locks work and how I can
> control them? When do they occur, etc.?
> Cause only way I got out of this mess was rest
I think you could use dismax and restrict the results with a filter query.
Supposing you're using the dismax query parser, it should look like:
http://localhost:8080/solr/select?q=whatever&fq=category:3
I think this would solve your case.
surfer10 wrote:
>
> definitely disMax does the thing by searching one
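When building such a request from code, each parameter value should be URL-encoded. A minimal sketch using only the JDK (`URLEncoder.encode(String, Charset)`, Java 10+); the base URL and values are the illustrative ones from the example above. A filter query narrows the result set without changing relevance scores.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class FilterQueryUrl {
    // Builds a /select URL with a main query (q) and a filter query (fq).
    static String selectUrl(String base, String q, String fq) {
        return base + "/select?q=" + URLEncoder.encode(q, StandardCharsets.UTF_8)
                + "&fq=" + URLEncoder.encode(fq, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(selectUrl("http://localhost:8080/solr", "whatever", "category:3"));
        // http://localhost:8080/solr/select?q=whatever&fq=category%3A3
    }
}
```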
Thanks for the response. Let me clarify things a bit.
Regarding the Slaves:
Our project is a web application. We want to embed Solr into the
web application. The web applications are configured with a local embedded
Solr instance configured as a slave, and a remote Solr instance confi
Try:
http://wiki.apache.org/solr/SolrConfigXml?highlight=(maxfieldlength)
Best
Erick
On Fri, Jan 23, 2009 at 7:29 AM, Gargate, Siddharth wrote:
> Hi,
> I am trying to index a 25 MB word document. I am not able to search all
> the keywords. Looks like only certain number of initial words are
> ge
Hi,
I am trying to use Solr for an autocomplete thingy.
Let's assume I query for 'foo',
and that matching is case-insensitive.
I would like to have records returned in a specific order. First all that
have an exact match, then all that start with Foo in alphabetical order,
then all that con
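One way to get this ordering is to re-rank candidates on the client after Solr returns them. The sketch below assumes the truncated third tier means "contains the query elsewhere" and that it is also sorted alphabetically; both are assumptions, and everything else falls into a last tier.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

class AutocompleteOrder {
    // Tier: 0 = exact match, 1 = starts with the query,
    // 2 = contains the query elsewhere, 3 = anything else. Case-insensitive.
    static int rank(String candidate, String query) {
        String c = candidate.toLowerCase(Locale.ROOT);
        String q = query.toLowerCase(Locale.ROOT);
        if (c.equals(q)) return 0;
        if (c.startsWith(q)) return 1;
        if (c.contains(q)) return 2;
        return 3;
    }

    // Sorts by tier, then alphabetically (case-insensitive) within a tier.
    static List<String> order(List<String> candidates, String query) {
        List<String> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.comparingInt((String s) -> rank(s, query))
                .thenComparing(String.CASE_INSENSITIVE_ORDER));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> hits = List.of("Seafood", "foobar", "Foo", "Food", "barfoo");
        System.out.println(order(hits, "foo")); // [Foo, foobar, Food, barfoo, Seafood]
    }
}
```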
Hi,
I have tested this as well, looking fine! Both issues are indeed fixed, and
the index directory of the slaves gets cleaned up nicely. I will apply the
changes to all systems I've got running and report back in this thread in
case any issues are found.
Thanks for the very fast help! I usually
Hi,
I am trying to index a 25 MB Word document. I am not able to search all
the keywords. It looks like only a certain number of initial words are
getting indexed.
Is there any limit to the size of a document being indexed? Or is there
any word count limit per field?
Thanks,
Siddharth
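The symptom matches the per-field token limit (maxFieldLength in solrconfig.xml) that Erick's wiki link covers: tokens past the limit are silently dropped. The example configuration of this era typically sets it to 10000, but check your own solrconfig.xml. Below is a simplified model of that behavior, splitting on whitespace only, not Lucene's actual indexing code.

```java
import java.util.ArrayList;
import java.util.List;

class MaxFieldLengthDemo {
    // Keeps only the first maxFieldLength tokens of a field; the rest are ignored.
    static List<String> indexedTokens(String text, int maxFieldLength) {
        List<String> kept = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (kept.size() >= maxFieldLength) {
                break;
            }
            if (!token.isEmpty()) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        StringBuilder doc = new StringBuilder();
        for (int i = 0; i < 25_000; i++) {
            doc.append("word").append(i).append(' ');
        }
        List<String> kept = indexedTokens(doc.toString(), 10_000);
        System.out.println(kept.size());                // 10000
        System.out.println(kept.contains("word9999"));  // true
        System.out.println(kept.contains("word10000")); // false: silently dropped
    }
}
```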
I have opened an issue to track this
https://issues.apache.org/jira/browse/SOLR-978
On Fri, Jan 23, 2009 at 5:22 PM, Noble Paul നോബിള് नोब्ळ्
wrote:
> I tested with the patch
> it has solved both the issues
>
> On Fri, Jan 23, 2009 at 5:00 PM, Shalin Shekhar Mangar
> wrote:
>>
>>
>> On Fri, Ja
I tested with the patch;
it has solved both issues.
On Fri, Jan 23, 2009 at 5:00 PM, Shalin Shekhar Mangar
wrote:
>
>
> On Fri, Jan 23, 2009 at 2:12 PM, Jaco wrote:
>>
>> Hi,
>>
>> I applied the patch and did some more tests - also adding some LOG.info()
>> calls in delTree to see if it actual
On Fri, Jan 23, 2009 at 2:12 PM, Jaco wrote:
> Hi,
>
> I applied the patch and did some more tests - also adding some LOG.info()
> calls in delTree to see if it actually gets invoked (LOG.info("START:
> delTree: "+dir.getName()); at the start of that method). I don't see any
> entries of this sho
On Fri, Jan 23, 2009 at 2:55 PM, Paul Libbrecht wrote:
>
> On Jan 23, 2009, at 10:10, Noble Paul നോബിള് नोब्ळ् wrote:
>>
>> if the response is not XML, then there is no EntityProcessor that can
>> consume this. We may need to add one.
>
> well, even binary data such as word documents (base64-enc
Ian,
A new field is indeed needed and warranted for this case. Facets only
work off indexed terms, not stored.
Erik
On Jan 22, 2009, at 11:48 PM, Ian Connor wrote:
The facet prefix method to get suggestions for search terms really
helps.
However, it seems to show the indexed rat
Hi wojtekpia,
That's interesting. I'll be looking into this over the weekend, so I'll
look at the GC too. I was briefly reading about GC last night; am I right
in thinking it could be affected by which version of the JVM I'm using
(1.5.0.8), and also what type of Collector is set? What collec
Hey there, I would like to understand why distributed search doesn't support
facet dates. As I understand it, there would be problems because if the clocks of
the servers are not synchronized, the results would not be exact. But if
I don't mind the results not being completely exact... would it be possible
Hi, thanks for your reply.
Sorry for the limited information I gave in my first post; I just didn't
know what to share.
Yes, the Java process is still running, but search on the site does not
work and I cannot see any HTTP requests in the logs at this time. I have
not tested the admin page, this is s
Is there a way to get QTime in microseconds from Solr?
I have a small collection and my response time (QTime) is 0 or 1
milliseconds. I am running benchmark tests and I need more precise
timings for comparison.
Thanks for your help.
On Jan 23, 2009, at 10:10, Noble Paul നോബിള് नोब्ळ् wrote:
If the response is not XML, then there is no EntityProcessor that can
consume this. We may need to add one.
Well, even binary data such as Word documents (base64-encoded, for
example) run the risk of appearing here. They sure need
Seems to work fine on this morning's 23-Jan-2009 nightly.
Thanks very much.
>On Wed, Jan 21, 2009 at 6:05 PM, Fergus McMenemie wrote:
>
>>
>> After looking at http://issues.apache.org/jira/browse/SOLR-964,
>> where
>> it seems this issue has been addressed, I had another go at indexing
>
On Fri, Jan 23, 2009 at 2:28 PM, Paul Libbrecht wrote:
> Well,
>
> the idea is that the solr engine indexes the contents of a web platform.
>
> Each document is a user-side-URL out of which several fields would be
> fetched through various URL-get-documents (e.g. the full-text-view, e.g. the
> fut
Well,
the idea is that the solr engine indexes the contents of a web platform.
Each document is a user-side-URL out of which several fields would be
fetched through various URL-get-documents (e.g. the full-text-view,
e.g. the future openmath representation, e.g. the topics (URIs in an
onto
Hi,
I applied the patch and did some more tests - also adding some LOG.info()
calls in delTree to see if it actually gets invoked (LOG.info("START:
delTree: "+dir.getName()); at the start of that method). I don't see any
entries of this showing up in the log file at all, so it looks like delTree
d
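For reference, a recursive directory delete with the same kind of entry log line might look like the following. This is a hypothetical sketch using `java.util.logging`, not the actual delTree in Solr's replication code; the log line at the top is what lets you confirm from the log whether the method is invoked at all.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.logging.Logger;

class DelTreeSketch {
    private static final Logger LOG = Logger.getLogger(DelTreeSketch.class.getName());

    // Recursively deletes dir; returns true only if everything was removed.
    // Note: File.isDirectory() follows symbolic links.
    static boolean delTree(File dir) {
        LOG.info("START: delTree: " + dir.getName());
        boolean ok = true;
        File[] children = dir.listFiles(); // null if dir is not a directory
        if (children != null) {
            for (File f : children) {
                ok &= f.isDirectory() ? delTree(f) : f.delete();
            }
        }
        return dir.delete() && ok;
    }

    public static void main(String[] args) throws IOException {
        File root = Files.createTempDirectory("deltree-demo").toFile();
        File sub = new File(root, "index");
        sub.mkdir();
        new File(sub, "segments_1").createNewFile();
        System.out.println(delTree(root) + " " + root.exists()); // true false
    }
}
```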
Hi all,
I am new to Solr. I have posted nearly 10 lakh (1 million) XML docs over the last few
months.
Now I want to find out the total number of duplicate posts until now.
Are stats.jsp's numDocs and maxDocs the appropriate values for finding
the total number of duplicate posts (maxDocs - numDocs) so far?
please
Yes, Solr does. But the DataImportHandler in the 1.3 release does not support
it.
However, you can use the trunk data import handler jar with Solr 1.3 if you
do not feel comfortable using Solr 1.4 trunk.
On Fri, Jan 23, 2009 at 1:36 PM, Gunaranjan Chandraraju <
chandrar...@apple.com> wrote:
>
> I t
I thought 1.3 supported dynamic fields in schema.xml?
Guna
On Jan 22, 2009, at 11:54 PM, Shalin Shekhar Mangar wrote:
Oops, one more gotcha. The dynamic field support is only in 1.4 trunk.
On Fri, Jan 23, 2009 at 1:24 PM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:
On Fri, Jan 2