Re: Data loading from DB - data sizes and obstacles

2009-08-07 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Fri, Aug 7, 2009 at 11:15 AM, Amit Nithian anith...@gmail.com wrote: All, An off and on project of mine has been to work on refactoring the way we load data from MySQL into Solr. Our current approach is fairly hard coded and not as configurable as I would like. I was curious of people who have

Re: Data loading from DB - data sizes and obstacles

2009-08-07 Thread Avlesh Singh
I have been a satisfied DIH user for a long time. The project I use Solr for runs on MySQL 5.1. There are 6 solr-cores in total with a combined index size of 12G. The database design is as relational as it can get, and writing SQL queries to fetch the data has always been a

Documentation for Master-Slave Replication missing for Solr1.3. No mirror site for Solr 1.4 distribution.

2009-08-07 Thread Ninad Raut
Hi, I want to know how to set up a master-slave configuration for Solr 1.3. I can't get documentation on the net. I found one for 1.4 but not for 1.3. ReplicationHandler is not present in 1.3. Also, I would like to know where I can get the Solr 1.4 distribution. The Solr site lists mirrors only

Re: Documentation for Master-Slave Replication missing for Solr1.3. No mirror site for Solr 1.4 distribution.

2009-08-07 Thread Noble Paul നോബിള്‍ नोब्ळ्
1.4 is not released yet. You can grab a nightly from here: http://people.apache.org/builds/lucene/solr/nightly/ On Fri, Aug 7, 2009 at 12:47 PM, Ninad Raut hbase.user.ni...@gmail.com wrote: Hi, I want to know how to set up a master-slave configuration for Solr 1.3. I can't get documentation on

Re: Documentation for Master-Slave Replication missing for Solr1.3. No mirror site for Solr 1.4 distribution.

2009-08-07 Thread Shalin Shekhar Mangar
On Fri, Aug 7, 2009 at 12:47 PM, Ninad Raut hbase.user.ni...@gmail.com wrote: Hi, I want to know how to set up a master-slave configuration for Solr 1.3. I can't get documentation on the net. I found one for 1.4 but not for 1.3. ReplicationHandler is not present in 1.3. Also, I would like to

Re: Documentation for Master-Slave Replication missing for Solr1.3. No mirror site for Solr 1.4 distribution.

2009-08-07 Thread Ninad Raut
Hi Noble, can these builds be used in a production environment? Are they stable? We are not going live now, but in a few months we will. As such, when will 1.4 be officially released? 2009/8/7 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com 1.4 is not released yet. You can grab a nightly from

CorruptIndexException: Unknown format version

2009-08-07 Thread Maximilian Hütter
Hi, how can that happen? It is a new index, and it is already corrupt. Has anybody else seen something like this? WARN - 2009-08-07 10:44:54,925 | Solr index directory 'data/solr/index' doesn't exist. Creating new index... WARN - 2009-08-07 10:44:56,583 | solrconfig.xml uses deprecated

Re: mergeFactor / indexing speed

2009-08-07 Thread Chantal Ackermann
Juhu, great news, guys. I merged my child entity into the root entity, and changed the custom entityprocessor to handle the additional columns correctly. And - indexing 160k documents now takes 5min instead of 1.5h! (Now I can go relaxed on vacation. :-D ) Conclusion: In my case performance

Re: Language Detection for Analysis?

2009-08-07 Thread Andrzej Bialecki
Otis Gospodnetic wrote: Bradford, If I may: Have a look at http://www.sematext.com/products/language-identifier/index.html And/or http://www.sematext.com/products/multilingual-indexer/index.html .. and a Nutch plugin with similar functionality:

Re: Language Detection for Analysis?

2009-08-07 Thread Jukka Zitting
Hi, On Fri, Aug 7, 2009 at 12:31 PM, Andrzej Bialecki a...@getopt.org wrote: .. and a Nutch plugin with similar functionality: http://lucene.apache.org/nutch/apidocs-1.0/org/apache/nutch/analysis/lang/LanguageIdentifier.html See also TIKA-209 [1] where I'm currently integrating the Nutch code

Help creating schema for indexable document

2009-08-07 Thread rossputin
Hi Guys. I am struggling to create a schema with a deterministic content model for a set of documents I want to index. My indexable documents will look something like: <add> <doc> <field name="id">1</field> <field name="code">code1</field> <field name="code">code2</field> <field

Re: mergeFactor / indexing speed

2009-08-07 Thread Shalin Shekhar Mangar
On Fri, Aug 7, 2009 at 3:58 PM, Chantal Ackermann chantal.ackerm...@btelligent.de wrote: Juhu, great news, guys. I merged my child entity into the root entity, and changed the custom entityprocessor to handle the additional columns correctly. And - indexing 160k documents now takes 5min

Re: mergeFactor / indexing speed

2009-08-07 Thread Chantal Ackermann
Thanks for the tip, Shalin. I'm happy with 6 indexes running in parallel and completing in less than 10min, right now, but I'll have a look anyway. Shalin Shekhar Mangar wrote: On Fri, Aug 7, 2009 at 3:58 PM, Chantal Ackermann chantal.ackerm...@btelligent.de wrote: Juhu, great news, guys.

Solr 1.4 in Production Environment-- Is it stable?

2009-08-07 Thread Ninad Raut
Hi, Has anyone used Solr 1.4 in production? There are some really nice features in it like - Directly adding POJOs to Solr - ReplicationHandler etc. Is 1.4 stable enough to be used in production?

Re: solr v1.4 in production?

2009-08-07 Thread Shalin Shekhar Mangar
On Wed, Jul 1, 2009 at 6:17 PM, Ed Summers e...@pobox.com wrote: Here at the Library of Congress we've got several production Solr instances running v1.3. We've been itching to get at what will be v1.4 and were wondering if anyone else happens to be using it in production yet. Any information

Re: Solr 1.4 in Production Environment-- Is it stable?

2009-08-07 Thread Otis Gospodnetic
I know a number of large companies using 1.4-dev. But you could also wait another month or so and get the real 1.4. Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message From: Ninad

Re: Language Detection for Analysis?

2009-08-07 Thread Grant Ingersoll
There are several free Language Detection libraries out there, as well as a few commercial ones. I think Karl Wettin has even written one as a plugin for Lucene. Nutch also has one, AIUI. I would just Google language detection. Also see

Re: Item Facet

2009-08-07 Thread David Lojudice Sobrinho
Thanks Avlesh. But I didn't get it. How would a dynamic field aggregate values at query time? On Thu, Aug 6, 2009 at 11:14 PM, Avlesh Singh avl...@gmail.com wrote: Dynamic fields might be an answer. If you had a field called product_* and these were populated with the corresponding values

Re: Solr 1.4 in Production Environment-- Is it stable?

2009-08-07 Thread Jeff Newburn
We also use 1.4, which has gotten hit with load tests of up to 2000 queries/sec. The biggest thing is to make sure you are using the slaves for that kind of load. Other than that 1.4 is pretty impressive. -- Jeff Newburn Software Engineer, Zappos.com jnewb...@zappos.com - 702-943-7562 From: Otis

Re: Item Facet

2009-08-07 Thread Yao Ge
Are your product_name* fields numeric fields (integer or float)? Dals wrote: Hi... Is there any way to group values like shopping.yahoo.com or shopper.cnet.com do? For instance, I have documents like: doc1 - product_name1 - value1 doc2 - product_name1 - value2 doc3 -

Re: CorruptIndexException: Unknown format version

2009-08-07 Thread Yonik Seeley
Wow, that is an interesting one... I bet there is more than one Lucene version kicking around the classpath somehow. Try removing all of the servlet container's working directories. -Yonik http://www.lucidimagination.com On Fri, Aug 7, 2009 at 4:41 AM, Maximilian

Re: Item Facet

2009-08-07 Thread David Lojudice Sobrinho
The behavior I'm expecting is something similar to a GROUP BY in a relational database. SELECT product_name, model, min(price), max(price), count(*) FROM t GROUP BY product_name, model The current schema: product_name (type: text) model (type: text) price (type: sfloat) On Fri, Aug 7, 2009 at
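The aggregation asked for above can be sketched outside Solr as follows. This is only an illustration of the expected result shape, computed client-side over documents that have already been fetched; it is not a Solr feature or the answer given in the thread. The sample documents and the helper name group_stats are assumptions; only the field names (product_name, model, price) come from the post.

```python
from collections import defaultdict

# Hypothetical sample documents; only the field names follow the schema in the post.
docs = [
    {"product_name": "camera", "model": "x100", "price": 299.0},
    {"product_name": "camera", "model": "x100", "price": 349.0},
    {"product_name": "camera", "model": "z5", "price": 499.0},
    {"product_name": "phone", "model": "p1", "price": 199.0},
]


def group_stats(documents):
    """Return {(product_name, model): (min_price, max_price, count)}."""
    prices_by_group = defaultdict(list)
    for doc in documents:
        # Group on the same pair of columns as the GROUP BY clause.
        prices_by_group[(doc["product_name"], doc["model"])].append(doc["price"])
    return {
        key: (min(prices), max(prices), len(prices))
        for key, prices in prices_by_group.items()
    }


stats = group_stats(docs)
for (name, model), (low, high, count) in sorted(stats.items()):
    print(name, model, low, high, count)
```

Each key of stats corresponds to one row of the SQL result: the grouping columns, then min(price), max(price) and count(*).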

Is kill -9 safe or not?

2009-08-07 Thread Michael _
I've seen several threads that are one or two years old saying that performing kill -9 on the java process running Solr either CAN, or CAN NOT corrupt your index. The more recent ones seem to say that it CAN NOT, but before I bake a kill -9 into my control script (which first tries a normal kill,
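A control script of the kind described (a normal kill first, kill -9 only as a fallback) might look roughly like the sketch below. It is not the poster's script: the function name stop_process, the grace period, and the sleeping child process that stands in for the Solr JVM are all assumptions made for illustration. On POSIX systems, Popen.terminate() sends SIGTERM and Popen.kill() sends SIGKILL.

```python
import subprocess
import sys


def stop_process(proc, grace_seconds=10.0):
    """Ask proc to exit; force-kill it if it is still alive after grace_seconds."""
    proc.terminate()  # normal kill (SIGTERM on POSIX)
    try:
        proc.wait(timeout=grace_seconds)
        return "terminated"
    except subprocess.TimeoutExpired:
        proc.kill()  # equivalent of kill -9 (SIGKILL on POSIX)
        proc.wait()
        return "killed"


# Stand-in for the Solr JVM: a child process that would otherwise sleep for a minute.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
outcome = stop_process(child, grace_seconds=5.0)
print(outcome)
```

The return value lets the caller log whether the forced kill was needed.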

Re: Preserving C++ and other weird tokens

2009-08-07 Thread Michael _
On Thu, Aug 6, 2009 at 11:38 AM, Michael _ solrco...@gmail.com wrote: Hi everyone, I'm indexing several documents that contain words that the StandardTokenizer cannot detect as tokens. These are words like C# .NET C++ which are important for users to be able to search for, but get

Re: Is kill -9 safe or not?

2009-08-07 Thread Yonik Seeley
Kill -9 will not corrupt your index, but you would lose any uncommitted documents. -Yonik http://www.lucidimagination.com On Fri, Aug 7, 2009 at 11:07 AM, Michael _ solrco...@gmail.com wrote: I've seen several threads that are one or two years old saying that performing kill -9 on the java

Re: Preserving C++ and other weird tokens

2009-08-07 Thread Yonik Seeley
http://search.lucidimagination.com/search/document/2d325f6178afc00a/how_to_search_for_c -Yonik http://www.lucidimagination.com On Thu, Aug 6, 2009 at 11:38 AM, Michael _ solrco...@gmail.com wrote: Hi everyone, I'm indexing several documents that contain words that the StandardTokenizer

Re: Attempt to query for max id failing with exception

2009-08-07 Thread Yonik Seeley
I just tried this sample code... it worked fine for me on trunk. -Yonik http://www.lucidimagination.com On Thu, Aug 6, 2009 at 8:28 PM, Reuben Firmin reub...@benetech.org wrote: I'm using SolrJ. When I attempt to set up a query to retrieve the maximum id in the index, I'm getting an exception.

Re: Is kill -9 safe or not?

2009-08-07 Thread Otis Gospodnetic
Yonik, Uncommitted (as in Solr uncommitted) or unflushed? Thanks, Otis - Original Message From: Yonik Seeley yo...@lucidimagination.com To: solr-user@lucene.apache.org Sent: Friday, August 7, 2009 11:10:49 AM Subject: Re: Is kill -9 safe or not? Kill -9 will not corrupt your

Re: Attempt to query for max id failing with exception

2009-08-07 Thread Reuben Firmin
Yep, thanks - this turned out to be a systems configuration error. Our sysadmin hadn't opened up the http port on the server's internal network interface; I could browse to it from outside (i.e. firefox on my machine), but the apache landing page was being returned when CommonsHttpSolrServer tried

Re: Is kill -9 safe or not?

2009-08-07 Thread Yonik Seeley
On Fri, Aug 7, 2009 at 12:04 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Yonik, Uncommitted (as in Solr uncommitted) or unflushed? Solr uncommitted. Even if the docs hit the disk via a segment flush, they aren't part of the index until the index descriptor (segments_n) is written

Solr CMS Integration

2009-08-07 Thread wojtekpia
I've been asked to suggest a framework for managing a website's content and making all that content searchable. I'm comfortable using Solr for search, but I don't know where to start with the content management system. Is anyone using a CMS (open source or commercial) that you've integrated with

Re: Preserving C++ and other weird tokens

2009-08-07 Thread solrcoder
Ach, sorry I didn't find this before posting! - Michael Yonik Seeley-2 wrote: http://search.lucidimagination.com/search/document/2d325f6178afc00a/how_to_search_for_c -Yonik http://www.lucidimagination.com -- View this message in context:

Question regarding merging Solr indexes

2009-08-07 Thread ahammad
Hello, I have a MultiCore setup with 3 cores. I am trying to merge the indexes of core1 and core2 into core3. I looked at the wiki but I'm somewhat unclear on what needs to happen. This is what I used:

Re: Solr CMS Integration

2009-08-07 Thread Andre Hagenbruch
wojtekpia wrote: Hi Wojtek, I've been asked to suggest a framework for managing a website's content and making all that content searchable. I'm comfortable using Solr for search, but I don't know where to start with the content management

Re: Solr CMS Integration

2009-08-07 Thread Grant Ingersoll
lucidimagination.com is powered by Drupal and we index it using Solr (but not the Drupal plugin, as we have non-CMS data as well). It has blogs, articles, white papers, mail archives, JIRA tickets, wikis, etc. On Aug 7, 2009, at 1:01 PM, wojtekpia wrote: I've been asked to suggest

localSolr install

2009-08-07 Thread Brian Klippel
Is there any sort of guide to installing and configuring localSolr into an existing Solr implementation? I'm not extremely well versed in Java applications, but I've managed to cobble together Jetty and Solr multicore fairly reliably. I've downloaded localLucene 2.0 and localSolr 6.1, and this

Re: localSolr install

2009-08-07 Thread Bhargava Sriram
Hi All, I also need the same information. I am planning to set up Solr. I have around 20 to 30 million records of data, in CSV format. Your help would be highly appreciated. Regards, Bhargava S Akula. 2009/8/7 Brian Klippel br...@theport.com Is there any sort of guide to installing and

Re: Is kill -9 safe or not?

2009-08-07 Thread solrcoder
Thanks for the confirmation and reassurance! - Michael Yonik Seeley-2 wrote: On Fri, Aug 7, 2009 at 12:04 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Yonik, Uncommitted (as in Solr uncommitted) or unflushed? Solr uncommitted. Even if the docs hit the disk via a segment

Re: Solr CMS Integration

2009-08-07 Thread Tim Archambault
I would second that and add that you may want to consider acquia.com as they provide a solid infrastructure to support the Solr instance. On Fri, Aug 7, 2009 at 11:20 AM, Andre Hagenbruch andre.hagenbr...@rub.de wrote: wojtekpia wrote: Hi

Re: Solr CMS Integration

2009-08-07 Thread wojtekpia
Thanks for the responses. I'll give Drupal a shot. It sounds like it'll do the trick, and if it doesn't then at least I'll know what I'm looking for. Wojtek -- View this message in context: http://www.nabble.com/Solr-CMS-Integration-tp24868462p24870218.html Sent from the Solr - User mailing

PhoneticFilterFactory related questions

2009-08-07 Thread Reuben Firmin
Hi, I have a schema with three (relevant to this question) fields: title, author, book_content. I found that if PhoneticFilterFactory is used as a filter on book_content, it was bringing back all kinds of unrelated results, so I have it applied only against title and author. Questions -- 1) I

Solr Security

2009-08-07 Thread Francis Yakin
Has anyone had experience setting up Solr security? http://wiki.apache.org/solr/SolrSecurity I would like to implement it using HTTP Authentication or Path Based Authentication. So, in webdefault.xml I set the following: <security-constraint> <web-resource-collection>

Re: Solr CMS Integration

2009-08-07 Thread Olivier Dobberkau
On 07.08.2009 at 19:01, wojtekpia wrote: I've been asked to suggest a framework for managing a website's content and making all that content searchable. I'm comfortable using Solr for search, but I don't know where to start with the content management system. Is anyone using a CMS (open

Re: Solr CMS Integration

2009-08-07 Thread Paul Libbrecht
Hello Wojtek, I don't want to discourage all the famous CMSs around nor Solr uptake, but xwiki is quite a powerful CMS and has a search that is Lucene based. paul On 07-Aug-09 at 22:42, Olivier Dobberkau wrote: I've been asked to suggest a framework for managing a website's content

spellcheck component in 1.4 distributed

2009-08-07 Thread mike anderson
I am e-mailing to inquire about the status of the spellchecking component in 1.4 (distributed). I saw SOLR-785, but it is unreleased and for 1.5. Any help would be much appreciated. Thanks in advance, Mike

Re: solr v1.4 in production?

2009-08-07 Thread Ian Connor
Pubget has been using 1.4 for a while now to make the replication easier. http://pubget.com We compiled a while back and are thinking of updating to the latest build to start playing with distributed spell checking. On Fri, Aug 7, 2009 at 7:42 AM, Shalin Shekhar Mangar shalinman...@gmail.com

Can multiple Solr webapps access the same lucene index files?

2009-08-07 Thread Mark Diggory
Hello, I have a question I can't find an answer to in the list. Can multiple Solr webapps (for instance in separate cluster nodes) share the same Lucene index files stored within a shared filesystem? We do this with a custom Lucene search application right now, I'm trying to switch to using

MoreLikeThis: How to get quality terms from html from content stream?

2009-08-07 Thread Jay Hill
I'm using the MoreLikeThisHandler with a content stream to get documents from my index that match content from an html page like this: http://localhost:8080/solr/mlt?stream.url=http://www.sfgate.com/cgi-bin/article.cgi?f=/c/a/2009/08/06/SP5R194Q13.DTL&mlt.fl=body&rows=4&debugQuery=true But, not
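Because the page URL passed as stream.url itself contains "?" and "=", it is safer to URL-encode each parameter when building a request like the one above. The following is a minimal sketch that only assembles the URL and sends nothing; the handler address and parameter values are taken from the post, while the variable names are assumptions. Whether the original request was encoded this way is not stated in the message.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Handler URL and parameter values are taken from the post; nothing is sent over the network.
MLT_HANDLER = "http://localhost:8080/solr/mlt"
params = {
    "stream.url": "http://www.sfgate.com/cgi-bin/article.cgi?f=/c/a/2009/08/06/SP5R194Q13.DTL",
    "mlt.fl": "body",
    "rows": 4,
    "debugQuery": "true",
}

# urlencode percent-escapes reserved characters inside each value,
# so the inner "?" and "=" cannot be mistaken for parameter separators.
request_url = MLT_HANDLER + "?" + urlencode(params)
print(request_url)

# Round-trip check: decoding the query string gives back the original values.
decoded = parse_qs(urlsplit(request_url).query)
```

After encoding, the only literal "?" left in request_url is the one that starts the query string.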

How to use key with facet.prefix?

2009-08-07 Thread Jón Helgi Jónsson
I'm trying to facet multiple times on the same field using key. This works fine except when I use prefixes for these facets. What I got so far (and not functional): .. facet=true&facet.field=category&f.category.facet.prefix=01&facet.field={!key=subcat}category&f.subcat.facet.prefix=00 This will give

Re: Can multiple Solr webapps access the same lucene index files?

2009-08-07 Thread Otis Gospodnetic
Yes, they could all point to an index that lives on a NAS or SAN, for example. You'd still have to make sure only one server is writing to the index at a time. Zookeeper can help with coordination of that. Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr,

Re: Question regarding merging Solr indexes

2009-08-07 Thread Shalin Shekhar Mangar
On Fri, Aug 7, 2009 at 10:45 PM, ahammad ahmed.ham...@gmail.com wrote: Hello, I have a MultiCore setup with 3 cores. I am trying to merge the indexes of core1 and core2 into core3. I looked at the wiki but I'm somewhat unclear on what needs to happen. This is what I used: