Re: solr result....

2010-10-27 Thread Lance Norskog
I'm not quite sure what Tika exceptions mean in this context. You can give the 'fl=field1,field2' option to only return some fields in a query. You can get google-like results using highlighting and 'snippetizing'. These are documented on the wiki. satya swaroop wrote: Hi , Can the resu
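A minimal sketch of the two options mentioned in this reply: restricting the returned fields with `fl` and requesting highlighted snippets. The base URL and the field names `id`, `title`, and `content` are assumptions made up for the example; `hl`, `hl.fl`, `hl.snippets`, and `hl.fragsize` are the highlighting parameters documented on the wiki.

```python
from urllib.parse import urlencode

def build_snippet_query(base_url, text, fields, highlight_field, snippets=2, fragsize=100):
    """Return a select URL that limits returned fields and requests highlighted snippets."""
    params = [
        ("q", text),
        ("fl", ",".join(fields)),        # only return these stored fields
        ("hl", "true"),                  # turn highlighting on
        ("hl.fl", highlight_field),      # field to build snippets from (assumed name)
        ("hl.snippets", str(snippets)),  # maximum snippets per document
        ("hl.fragsize", str(fragsize)),  # approximate snippet length in characters
    ]
    return base_url.rstrip("/") + "/select?" + urlencode(params)

# Hypothetical host and field names, for illustration only.
url = build_snippet_query("http://localhost:8983/solr", "tika", ["id", "title"], "content")
print(url)
```

The highlighted fragments come back in a separate highlighting section of the response, keyed by document id, so the client still has to join them with the documents it displays.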

solr result....

2010-10-27 Thread satya swaroop
Hi, can the Solr result show only a part of the content of each document returned? For example, if I send a query to search for tika, then the result should be as follows: - 0 79 - - text/html 1html - - Apache Tomcat/6.0.26 - Error reportHTT

Re: How does DIH multithreading work?

2010-10-27 Thread markwaddle
Anyone know how it works? -- View this message in context: http://lucene.472066.n3.nabble.com/How-does-DIH-multithreading-work-tp1776111p1784419.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Stored or indexed?

2010-10-27 Thread kenf_nc
Interesting wiki link, I hadn't seen that table before. And to answer your specific question about indexed=true, stored=false, this is most often done when you are using analyzers/tokenizers on your field. This field is for search only; you would never retrieve its contents for display. It may i

Re: solr 1.3 suggester component

2010-10-27 Thread abhayd
hi erick, I was able to implement this using link you posted. I am using SOLR 1.3 I wanted to add spellcheck component to it so did this explicit spellcheck but for some reason it does not return suggestion for misspelled words. For instance

replication not working between 1.4.1 and 3.1-dev

2010-10-27 Thread Shawn Heisey
I started to upgrade my slave servers from 1.4.1 to 3.1-dev checked out this morning. Because of SOLR-2034 (new javabin version) the replication fails. Asking about it in comments on SOLR-2034 brought up the suggestion of switching to XML instead of javabin, but so far I have not been able to

RE: Inconsistent slave performance after optimize

2010-10-27 Thread Jonathan Rochkind
Seriously, at least try JVM argument -XX:+UseConcMarkSweepGC . That argument took care of very similar symptoms I was having. I never did figure out exactly what was causing them, but at some point I tried that JVM argument, and they went away never to come back (which I guess is a clue about

Re: documentCache clarification

2010-10-27 Thread Chris Hostetter
: schema.) My evidence for this is the documentCache stats reported by : solr/admin. If I request "rows=10&fl=id" followed by : "rows=10&fl=id,title" I would expect to see the 2nd request result in : a 2nd insert to the cache, but instead I see that the 2nd request hits : the cache from the 1st re

Re: ClassCastException Issue

2010-10-27 Thread Alex Matviychuk
I found it! Ran this against the webapps folder: find . -name '*.jar' | sed 's/^.*\/\(.*\)$/\1/' | sort ... lucene-analyzers-2.9.3.jar lucene-core-2.9.1.jar lucene-highlighter-2.9.3.jar lucene-memory-2.9.3.jar lucene-misc-2.9.3.jar lucene-queries-2.9.1.jar lucene-queries-2.9.3.jar lucene-snowbal
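The same check can be automated. The sketch below is only an illustration, not the poster's method: it takes a list of jar file names (a subset of the listing above) and reports any artifact that is present in more than one version. The helper name `conflicting_jars` and the file-name pattern are assumptions for the example.

```python
import re
from collections import defaultdict

# Splits "lucene-core-2.9.1.jar" into name "lucene-core" and version "2.9.1".
JAR_PATTERN = re.compile(r"^(?P<name>.+?)-(?P<version>\d[\w.\-]*)\.jar$")

def conflicting_jars(jar_names):
    """Map artifact name -> sorted versions, for artifacts found in more than one version."""
    versions = defaultdict(set)
    for jar in jar_names:
        match = JAR_PATTERN.match(jar)
        if match:
            versions[match.group("name")].add(match.group("version"))
    return {name: sorted(found) for name, found in versions.items() if len(found) > 1}

jars = [
    "lucene-analyzers-2.9.3.jar",
    "lucene-core-2.9.1.jar",
    "lucene-queries-2.9.1.jar",
    "lucene-queries-2.9.3.jar",
]
print(conflicting_jars(jars))
```

Note that this only flags artifacts duplicated under the same name; a mix such as lucene-core at 2.9.1 next to other Lucene jars at 2.9.3 still has to be spotted by comparing versions across artifacts.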

Use SolrCloud (SOLR-1873) on trunk, or with 1.4.1?

2010-10-27 Thread Jeremy Hinegardner
Hi all, I see that as of r1022188 Solr Cloud has been committed to trunk. I was wondering about the stability of Solr Cloud on trunk. We are planning to do a major reindexing soon (within 30 days), several billion docs, and would like to switch to a Solr Cloud based infrastructure. We are wond

Re: Inconsistent slave performance after optimize

2010-10-27 Thread Mason Hale
On Wed, Oct 27, 2010 at 7:18 PM, Ken Krugler wrote: > Normally I'd say it sounds like you were getting into swap hell, but based on your > settings you only have 5GB of JVM space being used, on a 16GB box. > > Just to confirm, nothing else is using lots of memory, right? And the "top" > command isn't showin

Re: Inconsistent slave performance after optimize

2010-10-27 Thread Ken Krugler
Normally I'd say it sounds like you were getting into swap hell, but based on your settings you only have 5GB of JVM space being used, on a 16GB box. Just to confirm, nothing else is using lots of memory, right? And the "top" command isn't showing any swap usage, right? When you encounter very slow s

Searching for terms on specific fields

2010-10-27 Thread Imran
Hi All, we need to be able to perform a search based on two search terms (from the user) against specific fields and a location. For example, assume our index (for a collection of books) has fields such as title, description, authors (multi-valued), categories (multi-valued), location (of course lng and la

RE: Inconsistent slave performance after optimize

2010-10-27 Thread Jonathan Rochkind
I'm guessing the slaves you restarted were running low on RAM, and possibly engaged in out-of-control GC. I have had good luck using the JVM option "-XX:+UseConcMarkSweepGC", which seems to result in GC happening in another thread and not interfering with the servicing of requests. If that'

Re: documentCache clarification

2010-10-27 Thread Koji Sekiguchi
(10/10/28 6:32), Jonathan Rochkind wrote: Woah, I hadn't known about that. queryResultMaxDocsCached is actually a part of Solr 1.4? Is it documented anywhere at all? I guess it is included in the example solrconfig.xml, but is not in my own personal solrconfig.xml. The feature was added since

Re: Inconsistent slave performance after optimize

2010-10-27 Thread Mason Hale
Hi Lance -- Thanks for the reply. > Did you restart all of these slave servers? That would help. We discovered independently that restarting the slave nodes resulted in dramatically improved performance (e.g. from 2.0 sec average response to 0.25 sec average). Can you please explain why this is

If I want to move a core from one physical machine to another....

2010-10-27 Thread Ron Mayer
If I want to move a core from one physical machine to another, is it as simple as just scp -r core5 otherserver:/path/on/other/server/ and then adding on that other server's solr.xml file and restarting the server there? PS: Should I have been able to figure out the answer to that by

RE: how well does multicore scale?

2010-10-27 Thread Toke Eskildsen
mike anderson [saidthero...@gmail.com] wrote: > That's a great point. If SSDs are sufficient, then what does the "Index size > vs Response time" curve look like? Since that would dictate the number > of machines needed. I took a look at > http://wiki.apache.org/solr/SolrPerformanceData but only on

Re: documentCache clarification

2010-10-27 Thread Jonathan Rochkind
Woah, I hadn't known about that. queryResultMaxDocsCached is actually a part of Solr 1.4? Is it documented anywhere at all? I guess it is included in the example solrconfig.xml, but is not in my own personal solrconfig.xml. Anyone know if it has a default size if left unspecified? Shawn He

Re: Stored or indexed?

2010-10-27 Thread Markus Jelsma
http://wiki.apache.org/solr/FieldOptionsByUseCase > Hi all- > > I've read through the documentation, but I'm still a little confused about > the tag, in terms of the indexed and stored attributes. If I have > something marked as indexed="true", why would I ever want stored="false"? > Are there

Re: documentCache clarification

2010-10-27 Thread Shawn Heisey
On 10/27/2010 12:17 PM, Jay Luker wrote: A 2nd question: while watching these stats I noticed something else weird with the queryResultCache. It seems that inserts to the queryResultCache depend on the number of rows requested. For example, an initial request (solr restarted, clean cache, etc) wi

Stored or indexed?

2010-10-27 Thread Olson, Ron
Hi all- I've read through the documentation, but I'm still a little confused about the tag, in terms of the indexed and stored attributes. If I have something marked as indexed="true", why would I ever want stored="false"? Are there any good tips-n-tricks anywhere about how to properly set the

Searching with wrong keyboard layout or using translit

2010-10-27 Thread Pavel Minchenkov
Hi, When I'm trying to search Google with wrong keyboard layout -- it corrects my query, example: http://www.google.ru/search?q=vjcrdf (I typed word "Moscow" in Russian but in English keyboard layout). Also, when I'm searching using translit, It does the same:

Michigan Information Retrieval Enthusiasts Group Quarterly Meetup - November 13, 2010

2010-10-27 Thread Provalov, Ivan
Cengage Learning is organizing a second quarterly meetup in Michigan (web-conference and dial-in are available) for the IR Enthusiasts. Please RSVP at http://www.meetup.com/Michigan-Information-Retrieval-Enthusiasts-Group Presentations: 1. Search Assist Dictionary Based on Corpus Terms Colloca

Re: Solr sorting problem

2010-10-27 Thread Ron Mayer
Savvas-Andreas Moysidis wrote: > In my understanding sorting on a field for which analysis has yielded > multiple terms just doesn't make sense.. > If you have document#1 with a field A which has the terms Epsilon, Alpha, > and document#2 with field A which has the terms Beta, Delta and request > a
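The ambiguity described in the quoted example can be shown with a few lines of Python. This is only a sketch using the two documents from the quote; the rule for picking a representative term (smallest or largest) is an assumption, and that is exactly the point: the order depends on a choice the index does not make for you.

```python
docs = {
    "document#1": ["Epsilon", "Alpha"],
    "document#2": ["Beta", "Delta"],
}

def sort_by(docs, pick):
    """Order document ids by one representative term chosen by `pick`."""
    return sorted(docs, key=lambda doc_id: pick(docs[doc_id]))

by_smallest = sort_by(docs, min)  # compares "Alpha" with "Beta"
by_largest = sort_by(docs, max)   # compares "Epsilon" with "Delta"
print(by_smallest, by_largest)
```

The two calls return the documents in opposite orders, so an ascending sort on field A has no single correct answer.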

Re: how well does multicore scale?

2010-10-27 Thread mike anderson
That's a great point. If SSDs are sufficient, then what does the "Index size vs Response time" curve look like? Since that would dictate the number of machines needed. I took a look at http://wiki.apache.org/solr/SolrPerformanceData but only one use case seemed comparable. We currently have about 2

Re: documentCache clarification

2010-10-27 Thread Jay Luker
(btw, I'm running 1.4.1) It looks like my assumption was wrong. Regardless of the fields selected using the "fl" parameter and the enableLazyFieldLoading setting, solr apparently fetches from disk and caches all the fields in the document (or maybe just those that are stored="true" in my schema.)

Re: newSearcher vs. firstSearcher

2010-10-27 Thread Chris Hostetter
: But thinking about warming queries, which is my use of new/firstSearcher (and : probably the most common use?), I can't think of any case but ones where I'd : want newSearcher and firstSearcher warming queries to be identical. a firstSearcher event is one in which there is no previous searcher,

RE: Solr sorting problem

2010-10-27 Thread Toke Eskildsen
Jonathan Rochkind [rochk...@jhu.edu] wrote: > I too sometimes have similar use cases, and my best ideas about how to > solve them involve using faceting --- you can facet on a multi-valued > field, and you can sort facets--but you can only sort facets by "index > order", a strict byte-by-byte sort.

Re: documentCache clarification

2010-10-27 Thread Markus Jelsma
I've been wondering about this too some time ago. I've found more information in SOLR-52 and some correspondence on this one but it didn't give me a definitive answer.. [1]: https://issues.apache.org/jira/browse/SOLR-52 [2]: http://www.mail-archive.com/solr-...@lucene.apache.org/msg01185.html O

documentCache clarification

2010-10-27 Thread Jay Luker
Hi all, The solr wiki says this about the documentCache: "The more fields you store in your documents, the higher the memory usage of this cache will be." OK, but if i have enableLazyFieldLoading set to true and in my request parameters specify "fl=id", then the number of fields per document shou

Re: Implementing Search Suggestion on Solr

2010-10-27 Thread Israel Ekpo
I think you may want to configure the field type used for the spell check to use the synonyms file/database. That way synonyms are also processed during index time. This could help. On Wed, Oct 27, 2010 at 6:47 AM, Antonio Calo' wrote: > Hi > > If I understood, you will build a kind of diction

Re: after the slave node pull index from master, when will solr del the tmp index dir

2010-10-27 Thread Jayendra Patil
We faced the same issue. If you are executing a complete clean build, the Slave copies the complete index and just switches the pointer in the index.properties to point to the new index directory, leaving behind the old copies. And it does not clean it up. Had logged a JIRA and patch to Snap

Re: How do I this in Solr?

2010-10-27 Thread Varun Gupta
Toke, the search query will contain 4-5 words on an average (excluding the stopwords). Mike, I don't care about the result count. Excluding the terms at the client side may be a good idea. Is there any way to alter scoring such that the docs containing only the searched-for terms are shown first?
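A minimal sketch of the client-side exclusion mentioned above, assuming plain whitespace tokenization and case-insensitive matching (a real client would need to mirror the index-time analysis, including stopword removal). The function name `only_query_terms` and the sample result strings are made up for the example.

```python
def only_query_terms(query, documents):
    """Keep documents whose every token also appears in the query."""
    allowed = set(query.lower().split())
    return [doc for doc in documents if set(doc.lower().split()) <= allowed]

# Hypothetical result texts returned by an ordinary OR query.
results = ["samsung", "samsung android", "samsung android PDQ", "GPS"]
print(only_query_terms("samsung android GPS", results))
```

Because the filter runs after the search, the result count reported by Solr no longer matches what is shown, which is acceptable here since the result count is not needed.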

Re: Multiple Word Facets

2010-10-27 Thread Ken Krugler
On Oct 27, 2010, at 6:29am, Adam Estrada wrote: Ahhh...I see! I am doing my testing crawling a couple websites using Nutch and in doing so I am assigning my facets to the title field which is type=text. Are you saying that I will need to manually generate the content for my facet field? I can s

Re: how well does multicore scale?

2010-10-27 Thread Toke Eskildsen
On Wed, 2010-10-27 at 14:20 +0200, mike anderson wrote: > [...] By my simple math, this would mean that if we want each shard's > index to be able to fit in memory, [...] Might I ask why you're planning on using memory-based sharding? The performance gap between memory and SSDs is not very big so

Re: How do I this in Solr?

2010-10-27 Thread Mike Sokolov
Yes I missed that requirement (as Steven also pointed out in a private e-mail). I now agree that the combinatorics are required. Another possibility to consider (if the queries are large, which actually seems unlikely) is to use the default behavior where all terms are optional, sort by relev

Re: How do I this in Solr?

2010-10-27 Thread Toke Eskildsen
That does not work either as it requires that all the terms in the query are present in the document. The original poster did not state this requirement. On the contrary, his examples were mostly single-word matches, implying an OR-search at the core. The query-explosion still seems like the only

Re: Multiple Word Facets

2010-10-27 Thread Adam Estrada
Ahhh...I see! I am doing my testing crawling a couple websites using Nutch and in doing so I am assigning my facets to the title field which is type=text. Are you saying that I will need to manually generate the content for my facet field? I can see the reason and need for doing it that way but I r

Re: Multiple Word Facets

2010-10-27 Thread Jayendra Patil
The Shingle Filter breaks the words in a sentence into combinations of 2 or 3 words. For a faceting field, the type of the field should be *string* so that it is not tokenised at all. On Wed, Oct 27, 2010 at 9:12 AM, Adam Estrada wrote: > Thanks guys, the solr.ShingleFilterFactory

Re: Multiple Word Facets

2010-10-27 Thread Adam Estrada
Thanks guys, the solr.ShingleFilterFactory did work to get me multiple terms per facet but now I am seeing some redundancy in the facets numbers. See below... Highway (62) Highway System (59) National (59) National Highway (59) National Highway System (59) System (59) See what's going on here? Ho
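One plausible explanation for the redundant counts, sketched in Python: if a title contains the phrase "National Highway System" and the filter emits single words as well as 2- and 3-word shingles, every one of the six facet values above is produced from that one phrase. The function below only imitates the idea of shingling; it is not the Lucene implementation, and the assumption that unigrams are emitted is mine.

```python
def shingles(tokens, max_size=3):
    """Return every run of 1..max_size consecutive tokens, joined by spaces."""
    out = []
    for start in range(len(tokens)):
        for size in range(1, max_size + 1):
            if start + size <= len(tokens):
                out.append(" ".join(tokens[start:start + size]))
    return out

facets = sorted(shingles(["National", "Highway", "System"]))
print(facets)
```

Each document containing the phrase is therefore counted once under every sub-phrase, which would explain why five of the six facet values share the count 59.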

Re: How do I this in Solr?

2010-10-27 Thread Mike Sokolov
Right - my point was to combine this with the previous approaches to form a query like: samsung AND android AND GPS AND word_count:3 in order to exclude documents containing additional words. This would avoid the combinatoric explosion problem others had alluded to earlier. Of course this wou

RE: How do I this in Solr?

2010-10-27 Thread Steven A Rowe
I'm pretty sure the word-count strategy won't work. > If I search with the text "samsung android GPS", search results > should only contain "samsung", "GPS", "android" and "samsung android". Using the word-count strategy, a document containing "samsung android PDQ" would be a hit, but Varun doesn

Re: how well does multicore scale?

2010-10-27 Thread Tharindu Mathew
Hi mike, I think I wasn't clear, Each document will only be tagged with one user_id, or to be specific one tenant_id. Users of the same tenant can't upload the same document to the same path. So I use this to make the key unique for each tenant. So I can index, delete without a problem. On Wed,

Re: how well does multicore scale?

2010-10-27 Thread mike anderson
Tagging every document with a few hundred thousand 6 character user-ids would increase the document size by two orders of magnitude. I can't imagine why this wouldn't mean the index would increase by just as much (though I really don't know much about that file structure). By my simple math, this

Feeding Solr with its own Logs

2010-10-27 Thread Peter Karich
In case someone is interested: http://karussell.wordpress.com/2010/10/27/feeding-solr-with-its-own-logs/ a lot of TODOs but: it is working. I could also imagine that this kind of example would be suited for an intro-tutorial, because it covers dynamic fields, rapid solr prototyping, filter and

RE: How do I this in Solr?

2010-10-27 Thread Michael Sokolov
You might try adding a field containing the word count and making sure that matches the query's word count? This would require you to tokenize the query and document yourself, perhaps. -Mike > -Original Message- > From: Varun Gupta [mailto:varun.vgu...@gmail.com] > Sent: Tuesday, Octob

Re: a bug of solr distributed search

2010-10-27 Thread Toke Eskildsen
On Tue, 2010-10-26 at 15:48 +0200, Ron Mayer wrote: > And a third potential reason - it's arguably a feature instead of a bug > for some applications. Depending on how I organize my shards, "give me > the most relevant document from each shard for this search" seems like > it could be useful. You

Re: Implementing Search Suggestion on Solr

2010-10-27 Thread Antonio Calo'
Hi, if I understood correctly, you will build a kind of dictionary, ontology, or thesaurus and use it if Solr query results are few. At query time (before or after) you will perform a query on this dictionary in order to retrieve the suggested word. If you need to do this, you can try to cvre

Re: Implementing Search Suggestion on Solr

2010-10-27 Thread Pablo Recio
Thanks, it's not what I'm looking for. Actually I need something like search "Ubuntu" and it will prompt "Maybe you will like 'Debian' too" or something like that. I'm not trying to do it automatically, manually will be ok. Anyway, it's a good article you shared, maybe I will implement it, thanks! 2

Re: ClassCastException Issue

2010-10-27 Thread Alex Matviychuk
On Wed, Oct 27, 2010 at 03:57, Chris Hostetter wrote: > This almost certainly indicates a classloader issue - i suspect you have > multiple solr related jars in various places, and the FieldType class > instance found when StrField is loaded comes from a different > (incompatible) jar. Thanks for

Re: Implementing Search Suggestion on Solr

2010-10-27 Thread Jakub Godawa
I am a real rookie at solr, but try this: http://solr.pl/2010/10/18/solr-and-autocomplete-part-1/?lang=en 2010/10/27 Pablo Recio > Hi, > > I don't want to be annoying, but I'm looking for a way to do that. > > I repeat the question: is there a way to implement Search Suggestion > manually? > > T

Re: Implementing Search Suggestion on Solr

2010-10-27 Thread Pablo Recio
Hi, I don't want to be annoying, but I'm looking for a way to do that. I repeat the question: is there a way to implement Search Suggestion manually? Thanks in advance. Regards, 2010/10/18 Pablo Recio Quijano > Hi! > > I'm trying to implement some kind of Search Suggestion on a search engine

Re: Strange search

2010-10-27 Thread Gora Mohanty
On Wed, Oct 27, 2010 at 1:23 PM, ramzesua wrote: > > Can anyone give me working schema.xml and solrconfig from own project? [...] Solr comes with an example configuration in example/solr/conf/ . Please see http://lucene.apache.org/solr/tutorial.html for an example of how to get started with that.

Re: Strange search

2010-10-27 Thread ramzesua
Can anyone give me working schema.xml and solrconfig from own project? -- View this message in context: http://lucene.472066.n3.nabble.com/Strange-search-tp998961p1778760.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Step by step tutorial for multi-language indexing and search

2010-10-27 Thread Lance Norskog
Yes, you can declare each field with the Spanish, French, etc. types. The _t and other types are "dynamic" and don't have to be declared. This feature is generally used when you have hundreds or thousands of fields. It is more clear to declare your fields. You're right- that error should not b

Re: FieldCollapsing and Stats or Sum ?!

2010-10-27 Thread stockiii
okay. i want one number per group. yes, it's similar to the "group by" command. is there another way to get this ? -- View this message in context: http://lucene.472066.n3.nabble.com/FieldCollapsing-and-Stats-or-Sum-tp1773842p1778731.html Sent from the Solr - User mailing list archive at Nabbl

Re: xpath processing

2010-10-27 Thread Lance Norskog
The XPathEntityProcessor does not do full XPath. It is a very limited set intended to be very fast. You can add code in any scripting language, but that is not really performant. Is it possible to use the RegexTransformer to find your records with regular expressions? Ken Stanley wrote: On Fr

Re: command line to check if Solr is up running

2010-10-27 Thread Pradeep Singh
How about - Please do not respond to 20 emails at one time? On Wed, Oct 27, 2010 at 12:33 AM, Lance Norskog wrote: > Please start new threads for new topics. > > > Xin Li wrote: > >> As we know we can use browser to check if Solr is running by going to >> http://$hostName:$portNumber/$masterName

Re: Failing to successfully import international characters via DIH

2010-10-27 Thread Lance Norskog
CLOB is probably better for what you want. Also, make sure the table is declared UTF-8 (or Unicode or whatever mysql calls it.) virtas wrote: As it turns out issue was somewhere in mysql. Not sure exactly where, but something to do to with BLOB. Now, I changed text field from BLOB to varchar

Re: command line to check if Solr is up running

2010-10-27 Thread Lance Norskog
Please start new threads for new topics. Xin Li wrote: As we know we can use browser to check if Solr is running by going to http://$hostName:$portNumber/$masterName/admin, say http://localhost:8080/solr1/admin. My question is: are there any ways to check it using command line? I used "curl
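For the command-line check asked about in the quoted question, a small script is one option. The sketch below assumes the ping handler is enabled at `/admin/ping` under the core URL, as in the example solrconfig.xml; if it is not, any URL that returns HTTP 200 when Solr is healthy could be substituted. The function name `solr_is_up` and the injectable `opener` argument are choices made for this example.

```python
import urllib.error
import urllib.request

def solr_is_up(base_url, opener=urllib.request.urlopen, timeout=5):
    """Return True if the ping URL answers with HTTP 200, False on any connection or HTTP error."""
    try:
        # Assumed endpoint: <base_url>/admin/ping
        with opener(base_url.rstrip("/") + "/admin/ping", timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False
```

Called as `solr_is_up("http://localhost:8080/solr1")` from a wrapper script, the boolean can be turned into an exit code so that cron jobs or monitoring tools can act on it.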

Re: Documents are deleted when Solr is restarted

2010-10-27 Thread Lance Norskog
These directories are shown at the top of the admin/index.jsp page. Check out all of the pages off of admin/index.jsp- there is a lot of information there about what solr is doing. Israel Ekpo wrote: The Solr home is the -Dsolr.solr.home Java System property Also make sure that -Dsolr.data.di

Re: Inconsistent slave performance after optimize

2010-10-27 Thread Lance Norskog
Did you restart all of these slave servers? That would help. What garbage collection options do you use? Which release of Solr? How many Searchers are there in admin/stats.jsp? Searchers hold open all kinds of memory. They are supposed to cycle out. These are standard questions, but- what you are

Re: Jars required in classpath to run embedded solr server?

2010-10-27 Thread Lance Norskog
It requires all of the jars that are packed into solr.war. It is a full and complete implementation of indexing and searching. Tharindu Mathew wrote: Hi everyone, Do we need all lucene jars in the class path for this? Seems that the solr-solrj and solr-core jars are not enough (http://wiki.apa

Re: Solr sorting problem

2010-10-27 Thread Lance Norskog
You may not sort on a tokenized field. You may not sort on a multiValued field. You can only have one term in a field. If there are more search terms than documents, A) sorting doesn't mean anything and B) Lucene will throw an exception. Erick Erickson wrote: In general, the behavior when so

Re: How do I this in Solr?

2010-10-27 Thread Lance Norskog
There is also a feature called a 'filter'. If you use certain words a lot, you can make filter queries with just those words. Look for 'filter' and 'fq=' on the wiki. But really you can have hundreds of words in a query and not have a performance problem. Solr/Lucene is very fast. In benchmar