Any standard/specification for Search ??

2009-03-04 Thread souravm
Hi Guys, Are you aware of any standard/specification (like JSR 168/286 for portals, CMIS for CMS) for Search engines ? Is there any such specification people are working on currently ? Regards, Sourav

RE: Limitations of Distributed Search ....

2008-12-09 Thread souravm
not, then you're looking more at a query-type solution, where Solr would be less interesting. -- Ken -Original Message- From: souravm Sent: Saturday, December 06, 2008 9:41 PM To: solr-user@lucene.apache.org Subject: Limitations of Distributed Search Hi, We are planning to use Solr

RE: Limitations of Distributed Search ....

2008-12-08 Thread souravm
Hi, Any inputs on this would be really helpful. Looking for suggestions/viewpoints from you guys. Regards, Sourav -Original Message- From: souravm Sent: Saturday, December 06, 2008 9:41 PM To: solr-user@lucene.apache.org Subject: Limitations of Distributed Search Hi, We

Limitations of Distributed Search ....

2008-12-06 Thread souravm
Hi, We are planning to use Solr for processing a large volume of application log files (around 10 billion documents, 5-6 TB in total). One of the approaches we are considering is to use Distributed Search extensively. What we have in mind is distributing the log files in multiple

Query performance insight ...

2008-12-03 Thread souravm
Hi All, Through my testing I found that query performance, when the query is not served from cache, depends largely on the number of hits and the number of concurrent queries. In both cases the query is essentially CPU bound. Just wondering whether we can update this somewhere in Wiki as this

What are the scenarios when a new Searcher is created ?

2008-11-30 Thread souravm
Hi All, Say I have started a new Solr server instance using start.jar in the java command. For this Solr server instance, when would a new Searcher be created ? I am aware of the following scenarios - 1. When the instance is started, a new Searcher is created for autowarming. But not sure

Solr with Network File Server

2008-11-30 Thread souravm
Hi, I have huge index files to query. On a first-cut calculation it looks like I would need around 3 boxes (each box holding not more than 125 M records of size 12.5GB) for each of around 25 apps - so altogether 75 boxes. However the number of concurrent users would be lower - probably not more than 20 at

Using Solr with Hadoop ....

2008-11-28 Thread souravm
Hi All, I have a huge number of documents to index (say per hr) and within an hr I cannot complete it using a single machine. Having them distributed in multiple boxes and indexing them in parallel is not an option as my target doc size per hr itself can be very huge (3-6M). So I am considering

RE: Using Solr with Hadoop ....

2008-11-28 Thread souravm
in parallel. If you're not doing any link inversion for web search, it doesn't seem like hadoop is needed for parallelism. If you are doing web crawling, perhaps look to nutch, not hadoop. -Yonik On Fri, Nov 28, 2008 at 1:31 PM, souravm [EMAIL PROTECTED] wrote: Hi All, I have huge number

RE: Using Solr with Hadoop ....

2008-11-28 Thread souravm
leave the indexes on multiple boxes and use Solr's distributed search to search across them (assuming you didn't really need everything on a single box). -Yonik On Fri, Nov 28, 2008 at 7:01 PM, souravm [EMAIL PROTECTED] wrote: Hi Yonik, Let me explain why I thought using hadoop will help

RE: Using Solr with Hadoop ....

2008-11-28 Thread souravm
Ah sorry, I had misread your original post. 3-6M docs per hour can be challenging. Using the CSV loader, I've indexed 4000 docs per second (14M per hour) on a 2.6GHz Athlon, but they were relatively simple and small docs. On Fri, Nov 28, 2008 at 9:54 PM, souravm [EMAIL PROTECTED] wrote
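
The throughput figures quoted in this reply can be checked with simple arithmetic. The sketch below converts between per-second and per-hour rates; the function names are illustrative only and are not part of Solr or of the original message.

```python
SECONDS_PER_HOUR = 3600


def docs_per_hour(docs_per_second: float) -> float:
    """Convert a sustained per-second indexing rate to docs per hour."""
    return docs_per_second * SECONDS_PER_HOUR


def required_docs_per_second(docs_per_hour_target: float) -> float:
    """Sustained per-second rate needed to index a given hourly volume."""
    return docs_per_hour_target / SECONDS_PER_HOUR


if __name__ == "__main__":
    # 4000 docs/s, the CSV loader rate quoted in the reply
    print(docs_per_hour(4000))
    # 3M and 6M docs/hour, the target range mentioned in the thread
    print(required_docs_per_second(3_000_000))
    print(required_docs_per_second(6_000_000))
```

At 4000 docs per second the hourly figure is 14.4 million, consistent with the "14M per hour" quoted above. Whether that rate carries over to larger or more complex documents is not something the arithmetic can answer.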

RE: Sorting and JVM heap size ....

2008-11-25 Thread souravm
:40 AM To: solr-user@lucene.apache.org Cc: souravm Subject: Re: Sorting and JVM heap size On Tue, Nov 25, 2008 at 7:49 AM, souravm [EMAIL PROTECTED] wrote: 3. Another case is - if there are 2 search requests concurrently hitting the server, each with sorting

RE: Query for Distributed search -

2008-11-24 Thread souravm
). Have a look at http://wiki.apache.org/solr/DistributedSearch for more info. You could also take a look at Hadoop. (http://hadoop.apache.org/) regards, Aleks On Mon, 24 Nov 2008 06:24:51 +0100, souravm [EMAIL PROTECTED] wrote: Hi, Looking for some insight on distributed search. Say I have

Sorting and JVM heap size ....

2008-11-24 Thread souravm
Hi, I have indexed data of size around 20GB. My JVM memory is 1.5GB. For this data, if I do a query with the sort flag on (for a single field), I always get a Java out-of-memory exception even if the number of hits is 0. With no sorting (or default sorting by score) the query works perfectly fine.

RE: Sorting and JVM heap size ....

2008-11-24 Thread souravm
: Sorting and JVM heap size On Mon, Nov 24, 2008 at 6:26 PM, souravm [EMAIL PROTECTED] wrote: I have indexed data of size around 20GB. My JVM memory is 1.5GB. For this data if I do a query with sort flag on (for a single field) I always get java out of memory exception even if the number

RE: Sorting and JVM heap size ....

2008-11-24 Thread souravm
is correct. Regards, Sourav -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik Seeley Sent: Monday, November 24, 2008 6:03 PM To: solr-user@lucene.apache.org Subject: Re: Sorting and JVM heap size On Mon, Nov 24, 2008 at 8:48 PM, souravm [EMAIL

RE: Sorting and JVM heap size ....

2008-11-24 Thread souravm
, souravm [EMAIL PROTECTED] wrote: Hi Yonik, Thanks again for the detailed input. Let me try to re-confirm my understanding - 1. What you say is - if sorting is asked for on a field, that field from all indexed documents would be put in memory in an un-inverted form. So given
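
The understanding restated above (a sort field is held in memory for every indexed document, not just for the hits) can be turned into a rough back-of-envelope estimate. This is a sketch under stated assumptions, not a description of Solr internals: it assumes one fixed-size entry per document (4 bytes by default) plus storage for the distinct values, and the document and term counts are hypothetical because the excerpt does not give them.

```python
def estimate_sort_cache_bytes(num_docs, bytes_per_doc_entry=4,
                              unique_values=0, avg_value_bytes=0):
    """Rough size of an un-inverted sort field held in memory.

    Assumption: one fixed-size entry per indexed document, plus the
    distinct field values themselves (relevant for string fields).
    """
    per_doc = num_docs * bytes_per_doc_entry
    values = unique_values * avg_value_bytes
    return per_doc + values


def fits_in_heap(cache_bytes, heap_bytes, sort_fields=1):
    """True if caches for `sort_fields` distinct fields fit in the heap.

    Ignores everything else the JVM needs, so it is an optimistic bound.
    """
    return cache_bytes * sort_fields <= heap_bytes


if __name__ == "__main__":
    heap = int(1.5 * 1024 ** 3)  # the 1.5GB heap mentioned in the thread
    # Hypothetical index: 100M docs, 1M distinct values of ~20 bytes each
    cache = estimate_sort_cache_bytes(100_000_000, 4, 1_000_000, 20)
    print(cache, fits_in_heap(cache, heap), fits_in_heap(cache, heap, 4))
```

Under this model the cost depends on the total number of indexed documents rather than on the number of hits, which would be consistent with the out-of-memory error reported even for queries returning 0 hits.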

Query for Distributed search -

2008-11-23 Thread souravm
Hi, Looking for some insight on distributed search. Say I have an index distributed across 3 boxes and the index contains time and text data (typical log file). Each box holds the index for a different time period - say Box 1 for Jan to April, Box 2 for May to August and Box 3 for Sep to Dec. Now if I
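
The question is cut off in this listing, so the following only illustrates the layout described: with an index partitioned by month range, a client could work out which boxes hold data for a given period before querying. The box names and the helper function are hypothetical and not part of Solr.

```python
# Month ranges per box, as described in the post (1 = Jan ... 12 = Dec).
SHARD_MONTHS = {
    "box1": range(1, 5),   # Jan to April
    "box2": range(5, 9),   # May to August
    "box3": range(9, 13),  # Sep to Dec
}


def shards_for_months(first_month, last_month):
    """Return the boxes whose month range overlaps [first_month, last_month]."""
    wanted = set(range(first_month, last_month + 1))
    return [name for name, months in SHARD_MONTHS.items()
            if wanted & set(months)]


if __name__ == "__main__":
    print(shards_for_months(3, 6))    # a query spanning March to June
    print(shards_for_months(10, 12))  # a query within Oct to Dec
```

A query confined to one period would then only need to be sent to the box or boxes returned, rather than to all three.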

specifying Sort criteria through Solr admin ui ...

2008-11-15 Thread souravm
Hi, Is there a way to specify sort criteria through the Solr admin UI? I tried doing it through the query statement box but it did not work. Regards, Sourav

RE: STATS functions ....

2008-11-14 Thread souravm
. We probably should be creating all these sorts of goodies and independent modules of code that aren't core, but that gets fuzzy to say what's core and what isn't too. Erik On Nov 13, 2008, at 8:26 PM, souravm wrote: Hi, As I understand the STATS functions (Min, Max, Average

STATS functions ....

2008-11-13 Thread souravm
Hi, As I understand the STATS functions (Min, Max, Average, Standard Deviation etc.) would be available in Solr 1.4. Just wondering if they are already there in the latest trunk. Else can anyone suggest any other tool which can be used with Solr 1.3 to achieve this requirement ? Regards,

RE: Distributed Search ...

2008-11-07 Thread souravm
requests to other Solr instances you specified and will merge the results. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: souravm [EMAIL PROTECTED] To: solr-user@lucene.apache.org solr-user@lucene.apache.org Sent: Thursday, November 6
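
The reply above describes one Solr instance forwarding the request to the other instances specified and merging the results. In Solr's distributed search that list is passed in the `shards` request parameter, as described on the DistributedSearch wiki page. A minimal sketch of building such a request URL follows; the host names, port, and query are hypothetical.

```python
from urllib.parse import urlencode


def build_distributed_query(base_url, shards, query, rows=10):
    """Build a Solr select URL that fans out to the given shards.

    `shards` is a list of host:port/path entries; Solr expects them
    comma-separated in a single `shards` parameter.
    """
    params = {"q": query, "rows": rows, "shards": ",".join(shards)}
    # Keep ':', '/' and ',' unescaped so the URL stays readable.
    return f"{base_url}/select?{urlencode(params, safe=':/,')}"


if __name__ == "__main__":
    url = build_distributed_query(
        "http://box1:8983/solr",
        ["box1:8983/solr", "box2:8983/solr"],
        "level:ERROR",
    )
    print(url)
```

The instance named in the base URL acts as the coordinator; it does not have to be one of the shards, though in this sketch it is.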

RE: Solr Multicore ...

2008-11-07 Thread souravm
Thanks Noble for your answer. Regards, Sourav -Original Message- From: Noble Paul നോബിള്‍ नोब्ळ् [mailto:[EMAIL PROTECTED] Sent: Thursday, November 06, 2008 7:41 PM To: solr-user@lucene.apache.org Subject: Re: Solr Multicore ... On Fri, Nov 7, 2008 at 3:28 AM, souravm [EMAIL PROTECTED

RE: Solr Multicore ...

2008-11-07 Thread souravm
Hi Guys, I'm struggling to decide whether Solr would be a fitting solution for me. Highly appreciate you The key requirements can be summarized as below - 1. Need to process a very high volume of data online from log files of various applications - around hundreds of millions of total

Solr for large volume data processing with minimal full-text search

2008-11-07 Thread souravm
http://wiki.apache.org/hadoop/Chukwa http://incubator.apache.org/pig/ On Fri, Nov 7, 2008 at 9:03 PM, souravm [EMAIL PROTECTED] wrote: Hi Guys, Here I'm struggling with to decide whether Solr would be a fitting solution for me. Highly appreciate you The key requirements can be summarized

Solr Multicore ...

2008-11-06 Thread souravm
Hi, Can I use the multicore feature to have multiple indexes (that is, each core would take care of one type of index) within a single Solr instance ? Will there be any performance impact due to this type of setup ? Regards, Sourav

Re: Large Data Set Suggestions

2008-11-05 Thread souravm
Hi Fergus, Do the 6.6M docs reside on a single box (node) or multiple boxes ? Do you use distributed search ? Regards, Sourav - Original Message - From: Fergus McMenemie [EMAIL PROTECTED] To: solr-user@lucene.apache.org solr-user@lucene.apache.org Sent: Wed Nov 05 08:21:45 2008

Query on distributed search ...

2008-11-03 Thread souravm
Hi, I'm new to Solr. Here is a query on distributed search. I have a huge volume of log files which I would like to search. Apart from generic text search I would also like to get statistics - say each record has a field telling request processing time and I would like to get average of