Hi Guys,
Are you aware of any standard/specification (like JSR 168/286 for portals, or CMIS
for CMS) for search engines?
Is anyone currently working on such a specification?
Regards,
Sourav
CAUTION - Disclaimer *
This e-mail contains PRIVILEGED AND
not, then
you're looking more at a query-type solution, where Solr would be
less interesting.
-- Ken
-Original Message-
From: souravm
Sent: Saturday, December 06, 2008 9:41 PM
To: solr-user@lucene.apache.org
Subject: Limitations of Distributed Search
Hi,
We are planning to use Solr
Hi,
Any inputs on this would be really helpful. Looking for suggestions/viewpoints
from you guys.
Regards,
Sourav
-Original Message-
From: souravm
Sent: Saturday, December 06, 2008 9:41 PM
To: solr-user@lucene.apache.org
Subject: Limitations of Distributed Search
Hi,
We
Hi,
We are planning to use Solr for processing a large volume of application log
files (around 10 billion documents, 5-6 TB in total).
One of the approaches we are considering is to use Distributed
Search extensively.
What we have in mind is distributing the log files in multiple
Hi All,
Through my testing I found that query performance, when it is not served from
the cache, largely depends on the number of hits and the number of concurrent queries.
In both cases the query is essentially CPU bound.
Just wondering whether we can note this somewhere in the Wiki as this
Hi All,
Say I have started a new Solr server instance using start.jar via the java
command. For this Solr server instance, when would a new Searcher be
created?
I am aware of the following scenarios -
1. When the instance is started, a new Searcher is created for autowarming. But
not sure
Hi,
I have huge index files to query. On a first-cut calculation it looks like I
would need around 3 boxes (each box holding at most 125M records, about 12.5GB)
for around 25 apps - so 75 boxes all together.
However the number of concurrent users would be smaller - probably not more than
20 at
Hi All,
I have a huge number of documents to index per hour, and within an hour I cannot
complete the job using a single machine. Having the documents distributed across
multiple boxes and indexing them in parallel is not an option, as my target doc
count per hour itself can be very large (3-6M). So I am considering
in parallel. If you're not doing any link inversion
for web search, it doesn't seem like hadoop is needed for parallelism.
If you are doing web crawling, perhaps look to nutch, not hadoop.
-Yonik
On Fri, Nov 28, 2008 at 1:31 PM, souravm [EMAIL PROTECTED] wrote:
Hi All,
I have huge number
leave the indexes on multiple
boxes and use Solr's distributed search to search across them
(assuming you didn't really need everything on a single box).
-Yonik
On Fri, Nov 28, 2008 at 7:01 PM, souravm [EMAIL PROTECTED] wrote:
Hi Yonik,
Let me explain why I thought using hadoop will help
Ah sorry, I had misread your original post. 3-6M docs per hour can be
challenging.
Using the CSV loader, I've indexed 4000 docs per second (14M per hour)
on a 2.6GHz Athlon, but they were relatively simple and small docs.
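For anyone trying to reproduce that kind of throughput: the CSV loader takes plain CSV over HTTP. A small sketch of batching log records as CSV text (the helper name and field names are mine, not a Solr API; the resulting batch would be POSTed to the /update/csv handler):

```python
import csv
import io

def to_csv_batch(docs, fields):
    """Render a batch of documents as CSV text for Solr's CSV loader."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(fields)                      # header row names the fields
    for doc in docs:
        writer.writerow([doc.get(f, "") for f in fields])
    return buf.getvalue()

batch = to_csv_batch(
    [{"id": "1", "msg": "request start"}, {"id": "2", "msg": "request end"}],
    ["id", "msg"],
)
# POST `batch` with Content-Type text/csv to http://host:8983/solr/update/csv
```

Batching many rows per POST is what makes the CSV path fast compared with one XML add per document.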
On Fri, Nov 28, 2008 at 9:54 PM, souravm [EMAIL PROTECTED] wrote
:40 AM
To: solr-user@lucene.apache.org
Cc: souravm
Subject: Re: Sorting and JVM heap size
On Tue, Nov 25, 2008 at 7:49 AM, souravm [EMAIL PROTECTED] wrote:
3. Another case is - if there are 2 search requests concurrently hitting the
server, each with sorting
).
Have a look at http://wiki.apache.org/solr/DistributedSearch for more info.
You could also take a look at Hadoop. (http://hadoop.apache.org/)
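For reference, distributed search is driven by the shards request parameter: the node receiving the query fans it out to the listed shards and merges the results. A minimal sketch of building such a request (host names are hypothetical, e.g. one shard per time range of log data):

```python
from urllib.parse import urlencode

# Hypothetical shard hosts, one per time slice of the log index.
shards = ",".join([
    "box1:8983/solr",  # Jan-Apr
    "box2:8983/solr",  # May-Aug
    "box3:8983/solr",  # Sep-Dec
])

# The node receiving this request queries every shard and merges results.
params = urlencode({"q": "loglevel:ERROR", "shards": shards, "rows": 10})
url = "http://box1:8983/solr/select?" + params
```

Any of the shard hosts can receive the request; it only coordinates the fan-out and merge.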
regards,
Aleks
On Mon, 24 Nov 2008 06:24:51 +0100, souravm [EMAIL PROTECTED] wrote:
Hi,
Looking for some insight on distributed search.
Say I have
Hi,
I have indexed data of around 20GB in size. My JVM heap is 1.5GB.
For this data, if I do a query with the sort flag on (for a single field) I always
get a Java out-of-memory exception even if the number of hits is 0. With no
sorting (or the default sorting by score) the query works perfectly fine.
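A hedged explanation of why this happens: Lucene's FieldCache un-inverts the sort field for every document in the index the first time a sorted query runs, regardless of how many documents match. A back-of-envelope sketch with assumed numbers (illustrative, not measurements from this thread) shows why a 1.5GB heap can be exhausted even with zero hits:

```python
# All numbers are illustrative assumptions, not measurements.
num_docs = 50_000_000        # assumed doc count for a ~20GB index
bytes_per_value = 32         # assumed average cost per cached field value

# FieldCache materializes the sort field for *every* document,
# independent of how many documents actually match the query.
field_cache_gb = num_docs * bytes_per_value / 1024**3
print(round(field_cache_gb, 2))  # ~1.49 GB, i.e. nearly the whole 1.5GB heap
```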
: Sorting and JVM heap size
On Mon, Nov 24, 2008 at 6:26 PM, souravm [EMAIL PROTECTED] wrote:
I have indexed data of size around 20GB. My JVM memory is 1.5GB.
For this data if I do a query with sort flag on (for a single field) I always
get java out of memory exception even if the number
is correct.
Regards,
Sourav
-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik Seeley
Sent: Monday, November 24, 2008 6:03 PM
To: solr-user@lucene.apache.org
Subject: Re: Sorting and JVM heap size
On Mon, Nov 24, 2008 at 8:48 PM, souravm [EMAIL
, souravm [EMAIL PROTECTED] wrote:
Hi Yonik,
Thanks again for the detail input.
Let me try to re-confirm my understanding -
1. What you are saying is - if sorting is requested on a field, that field's values
from all indexed documents would be put in memory in an un-inverted
form. So given
Hi,
Looking for some insight on distributed search.
Say I have an index distributed in 3 boxes and the index contains time and text
data (typical log file). Each box has index for different timeline - say Box 1
for all Jan to April, Box 2 for May to August and Box 3 for Sep to Dec.
Now if I
Hi,
Is there a way to specify sort criteria through the Solr admin UI? I tried doing it
through the query statement box but it did not work.
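For what it's worth, sorting is normally requested via the sort parameter on /select rather than inside the query text itself; a hedged sketch (the field name is assumed):

```python
from urllib.parse import urlencode

# "timestamp" is an assumed sortable field; sort syntax is "<field> <asc|desc>".
params = urlencode({"q": "*:*", "sort": "timestamp desc", "rows": 10})
url = "http://localhost:8983/solr/select?" + params
```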
Regards,
Sourav
We probably should be creating all these sorts of goodies
as independent modules of code that aren't core, but it gets
fuzzy saying what's core and what isn't.
Erik
On Nov 13, 2008, at 8:26 PM, souravm wrote:
Hi,
As I understand the STATS functions (Min, Max, Average
Hi,
As I understand it, the STATS functions (Min, Max, Average, Standard Deviation,
etc.) will be available in Solr 1.4.
Just wondering if they are already there in the latest trunk. If not, can anyone
suggest another tool which can be used with Solr 1.3 to achieve this
requirement?
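If the trunk StatsComponent is usable, the request shape (as I understand it for Solr 1.4) is a stats=true flag plus one stats.field per numeric field; a sketch with an assumed field name:

```python
from urllib.parse import urlencode

# "duration" is an assumed numeric field holding request processing time.
params = urlencode({
    "q": "*:*",
    "stats": "true",
    "stats.field": "duration",
    "rows": 0,              # only the aggregates (min/max/mean/...) are needed
})
url = "http://localhost:8983/solr/select?" + params
```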
Regards,
requests to other
Solr instances you specified and will merge the results.
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
From: souravm [EMAIL PROTECTED]
To: solr-user@lucene.apache.org solr-user@lucene.apache.org
Sent: Thursday, November 6
Thanks Noble for your answer.
Regards,
Sourav
-Original Message-
From: Noble Paul നോബിള് नोब्ळ् [mailto:[EMAIL PROTECTED]
Sent: Thursday, November 06, 2008 7:41 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr Multicore ...
On Fri, Nov 7, 2008 at 3:28 AM, souravm [EMAIL PROTECTED
Hi Guys,
Here I'm struggling to decide whether Solr would be a fitting solution for
me. Highly appreciate you
The key requirements can be summarized as below -
1. Need to process a very high volume of data online from log files of various
applications - around 100s of millions of total
http://wiki.apache.org/hadoop/Chukwa
http://incubator.apache.org/pig/
On Fri, Nov 7, 2008 at 9:03 PM, souravm [EMAIL PROTECTED] wrote:
Hi Guys,
Here I'm struggling to decide whether Solr would be a fitting solution
for me. Highly appreciate you
The key requirements can be summarized
Hi,
Can I use the multicore feature to have multiple indexes (that is, each core would
take care of one type of index) within a single Solr instance?
Will there be any performance impact from this type of setup?
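For context, Solr 1.3's multicore setup is configured through a solr.xml file in the Solr home directory; a minimal sketch (core names and instance directories are assumptions for illustration):

```xml
<solr persistent="false">
  <cores adminPath="/admin/cores">
    <!-- one core (and thus one independent index) per log type -->
    <core name="applogs" instanceDir="applogs" />
    <core name="weblogs" instanceDir="weblogs" />
  </cores>
</solr>
```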
Regards,
Sourav
Hi Fergus,
Do the 6.6M docs reside on a single box (node) or on multiple boxes? Do you use
distributed search?
Regards,
Sourav
- Original Message -
From: Fergus McMenemie [EMAIL PROTECTED]
To: solr-user@lucene.apache.org solr-user@lucene.apache.org
Sent: Wed Nov 05 08:21:45 2008
Hi,
I'm new to Solr. Here is a query on distributed search.
I have a huge volume of log files which I would like to search. Apart from
generic text search I would also like to get statistics - say each record has a
field giving the request processing time and I would like to get the average of