I am trying to add a document to a Solr index via:
$> curl "http://localhost:8983/solr/update/csv?commit=true&fieldnames=id,title_s&separator=%09" --data "Doc1\tTitle1" -H 'Content-type:text/plain; charset=utf-8'
Solr doesn't seem to recognize the \t in the content, and is failing
with the following
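For what it's worth, the likely culprit here is the shell rather than Solr: inside ordinary double quotes the shell passes the two characters \ and t through literally, so curl never sends an actual tab byte to match separator=%09. In a bash-style shell (an assumption about the environment), ANSI-C quoting produces a real tab; note also that the CSV handler treats the first input line as a header by default, so header=false may be needed as well (an assumption worth verifying against your Solr version):

$> curl "http://localhost:8983/solr/update/csv?commit=true&header=false&fieldnames=id,title_s&separator=%09" --data-binary $'Doc1\tTitle1' -H 'Content-type:text/plain; charset=utf-8'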
>>>> replacement for
>>>> an RDBMS. It is a *text search engine*.
>>>> Whenever you start asking "how do I implement
>>>> a SQL statement in Solr", you have to stop
>>>> and reconsider *why* you are trying to do that.
>>>> Then recast the question
s are pretty high, so you'll need
> some experimentation to size your site
> correctly.
>
> Best
> Erick
>
> On Wed, Jan 4, 2012 at 12:17 AM, prasenjit mukherjee
> wrote:
>> I have a requirement where reads and writes are quite high ( @ 100-500
>> per-sec ). A
I have a requirement where reads and writes are quite high (@ 100-500
per sec). A document has the following fields: timestamp,
unique-docid, content-text, keyword. Average content-text length is
~20 bytes, and there is only one keyword for a given docid.
At runtime, given a query-term (which could
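A minimal SolrJ sketch of that document shape, assuming a 3.x-era SolrJ client (the concrete field names and the class name are made up for illustration; the client class varies by version, e.g. CommonsHttpSolrServer before 3.6):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class IndexOneDoc {
  public static void main(String[] args) throws Exception {
    // Field names mirror the requirement above; values are placeholders.
    SolrServer server = new HttpSolrServer("http://localhost:8983/solr");
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "doc-42");                        // unique-docid
    doc.addField("timestamp", System.currentTimeMillis());
    doc.addField("content_text", "short content text");  // ~20 bytes average
    doc.addField("keyword", "k1");                       // exactly one per doc
    server.add(doc);
    server.commit();
  }
}

At 100-500 writes per second, batching adds and committing on an interval (or relying on autoCommit) matters more than the client API details.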
njit
>
>
> Felipe Hummel
>
>
> On Sun, Oct 23, 2011 at 9:33 PM, prasenjit mukherjee
> wrote:
>
>> Any pointers/suggestions on my approach ?
>>
>>
>> On 10/22/11, prasenjit mukherjee wrote:
>> > My use case is the following :
>> > Given
ryparsersyntax.html#Boosting%20a%20Term
>
> On 25.10.2011 11:19, prasenjit mukherjee wrote:
>>
>> At search time I get the following input (for one field only):
>> "solr:3 rocks:2 apache:1". From this I have to create the Lucene query
>> in the
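A sketch of one way to build that query against the Lucene 3.x API (the field name "content" and the class name are assumptions): each term becomes a TermQuery whose boost is the supplied weight, OR'ed together.

import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class WeightedTermsQuery {
  /** Turns input like "solr:3 rocks:2 apache:1" into a boosted query. */
  public static Query parse(String field, String input) {
    BooleanQuery bq = new BooleanQuery();
    for (String pair : input.split("\\s+")) {
      int colon = pair.lastIndexOf(':');
      TermQuery tq = new TermQuery(new Term(field, pair.substring(0, colon)));
      tq.setBoost(Float.parseFloat(pair.substring(colon + 1)));
      bq.add(tq, BooleanClause.Occur.SHOULD);
    }
    return bq;
  }
}

Usage: Query q = WeightedTermsQuery.parse("content", "solr:3 rocks:2 apache:1");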
On Tue, Oct 25, 2011 at 1:17 PM, Simon Willnauer
wrote:
> On Tue, Oct 25, 2011 at 5:08 AM, prasenjit mukherjee
> wrote:
>> That's exactly what I was trying to avoid :(
>>
>> I can afford to do that during indexing time, but it will be
>> time-consuming to do that a
e this directly? I think the easiest way is to write a
> simple TokenFilter that emits the term X times, where X is the term
> frequency. There is no easy way to pass these tuples to Lucene
> directly.
>
> simon
>
> On Mon, Oct 24, 2011 at 3:28 AM, prasenjit mukherjee
> wrote:
>
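A minimal sketch of such a TokenFilter against the Lucene 3.x analysis API; the "term:freq" input convention and the class name are assumptions:

import java.io.IOException;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;

// Splits tokens of the form "term:freq" and re-emits the bare term freq
// times, so its in-index term frequency matches the precomputed count.
public final class RepeatByFreqFilter extends TokenFilter {
  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
  private final PositionIncrementAttribute posIncAtt =
      addAttribute(PositionIncrementAttribute.class);
  private String pendingTerm;
  private int pendingCount;

  public RepeatByFreqFilter(TokenStream input) { super(input); }

  @Override
  public boolean incrementToken() throws IOException {
    if (pendingCount > 0) {               // re-emit the previous term
      termAtt.setEmpty().append(pendingTerm);
      posIncAtt.setPositionIncrement(0);  // stack repeats at one position
      pendingCount--;
      return true;
    }
    if (!input.incrementToken()) return false;
    String raw = termAtt.toString();      // expected form: "term:freq"
    int colon = raw.lastIndexOf(':');
    if (colon > 0) {
      pendingTerm = raw.substring(0, colon);
      pendingCount = Integer.parseInt(raw.substring(colon + 1)) - 1;
      termAtt.setEmpty().append(pendingTerm);
    }
    return true;
  }

  @Override
  public void reset() throws IOException {
    super.reset();
    pendingTerm = null;
    pendingCount = 0;
  }
}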
Any pointers/suggestions on my approach ?
On 10/22/11, prasenjit mukherjee wrote:
> My use case is the following :
> Given an n-dimensional vector ( only +ve quadrants/points ) find its
> closest neighbours. I would like to try it with Lucene's default
> ranking. Here is how a
my point of view, it's meaningless, since the analysis process has
> to be performed anyway to collect things such as positions (prox), offsets, synonyms, payloads, and so on.
>
> On Sun, Oct 23, 2011 at 11:22 PM, prasenjit mukherjee
> wrote:
>
>> I already have the term-frequency-count for all the term
I already have the term-frequency counts for all the terms in a
document. Is there a way I can re-use that info while indexing? I
would like to use Solr for this.
My use case is the following:
Given an n-dimensional vector (only +ve quadrants/points), find its
closest neighbours. I would like to try this with Lucene's default
ranking. Here is what a typical document will look like
(or something to the same effect):
doc1 = 1245:15 3490:20 8856:20 etc.
As reflected in th
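With a filter like the RepeatByFreqFilter sketched earlier, indexing such a vector could look like the following; WhitespaceTokenizer hands each "dim:weight" pair to the filter as one token. The class and field names are made up, and this assumes the Lucene 3.x Analyzer API:

import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;

// Feeds "dim:weight" tokens through the repeating filter, so the
// in-index term frequency of "1245" becomes 15, of "3490" 20, etc.
public class VectorAnalyzer extends Analyzer {
  @Override
  public TokenStream tokenStream(String fieldName, Reader reader) {
    return new RepeatByFreqFilter(new WhitespaceTokenizer(reader));
  }
}

A query vector can then be expressed with the boosted TermQuery approach shown above; Lucene's default scoring gives a dot-product-like ranking, though DefaultSimilarity damps tf with a square root and mixes in idf and length norms, so it is only an approximation.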
interesting alternate lucene query language.
>
> could this work?
>
>
> prasenjit mukherjee wrote:
>> This looks like a use case more suited for Pig (over Hadoop).
>>
>> It could be difficult for Lucene to do sort and sum simultaneously, as
>> sorting itself dep
>
>
> prasenjit mukherjee wrote:
>>
>> This looks like a use case more suited for Pig (over Hadoop).
>>
>> It could be
ll types (affiliates,
> sales), so looping tons of records each time isn't possible.
>
> - Mike
> aka...@gmail.com
>
>
> On Thu, Apr 1, 2010 at 2:11 PM, prasenjit mukherjee
> wrote:
>
>= '2010-03-06'
> GROUP BY Affiliate
> ORDER BY TotalSales DESC;
>
> - Mike
> aka...@gmail.com
>
>
> On Thu, Apr 1, 2010 at 8:11 AM, prasenjit mukherjee
> wrote:
>
>> Not sure what you mean by "joining" in Lucene, since conceptually
>
d excel in something it wasn't designed for ;-)
>
> -D
>
>
> -----Original Message-----
> From: prasenjit mukherjee [mailto:prasen@gmail.com]
> Sent: Thursday, April 01, 2010 8:11 AM
> To: java-user@lucene.apache.org
> Subject: Re: Lucene Challenge - sum, count,
our other data and sorting?
>
> - Mike
> aka...@gmail.com
>
>
> On Wed, Mar 31, 2010 at 9:23 PM, prasenjit mukherjee
> wrote:
>
>> I too am trying to achieve something.
>>
>> I am thinking of storing the integer values in payloads and then
>> using spanque
I too am trying to achieve something.
I am thinking of storing the integer values in payloads and then
using SpanQuery classes to compute the respective SUMs.
-Prasen
On Thu, Apr 1, 2010 at 6:47 AM, Michel Nadeau wrote:
> Hi,
>
> We're currently in the process of switching many of our screens f
I am trying to implement Oracle-style SQL aggregation (e.g.
SUM(col3) WHERE col1='foo' AND col2='bar') using Lucene's payload
feature.
I can add the integer value (of col3) as a payload to my searchable
fields (col1 and col2). I can probably extend
DefaultSimilarity's scorePayload()
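One low-level way to compute such a sum without going through scoring at all, sketched against the Lucene 3.x API (the field/term names come from the example above, and the 4-byte integer payload encoding via PayloadHelper is an assumption about how the values were written):

import org.apache.lucene.analysis.payloads.PayloadHelper;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermPositions;

public class PayloadSum {
  /** Sums the integer payloads over every occurrence of col1:foo. */
  public static long sum(IndexReader reader) throws Exception {
    long total = 0;
    TermPositions tp = reader.termPositions(new Term("col1", "foo"));
    while (tp.next()) {                   // one iteration per matching doc
      for (int i = 0; i < tp.freq(); i++) {
        tp.nextPosition();                // must advance before reading payload
        if (tp.isPayloadAvailable()) {
          byte[] payload = tp.getPayload(new byte[4], 0);
          total += PayloadHelper.decodeInt(payload, 0);
        }
      }
    }
    tp.close();
    return total;
  }
}

The additional col2='bar' restriction would need intersecting this enumeration with a second TermDocs (or a Filter), and the payloads must have been attached at index time, e.g. by a TokenFilter setting PayloadAttribute.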
Did someone delete the shared doc?
[EMAIL PROTECTED] wrote:
Hello,
I have got a lot of personal emails asking for the "Lucene Investigation"
document. It is not possible to reply to each of the emails, so I am putting
this document inside my briefcase. Anyone interested, please go to the following
sit
I have a requirement (to use the highlighter) to store the doc content
somewhere, and I am not allowed to use an RDBMS. I am thinking of using
Lucene's Field with (Field.Store.YES and Field.Index.NO) to store the
doc content. Will it have any negative effect on my search performance?
I think I hav
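A sketch of that field setup in the classic Lucene API (the field names are made up, and Field.Index.ANALYZED assumes a 2.4+ constant; earlier versions spelled it TOKENIZED). The stored-only field just feeds the highlighter; a separate indexed copy is still needed for matching:

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

public class StoredContentDoc {
  public static Document build(String text) {
    Document doc = new Document();
    // Searchable copy: indexed and analyzed, but not stored.
    doc.add(new Field("content", text, Field.Store.NO, Field.Index.ANALYZED));
    // Highlighter copy: stored verbatim, never searched.
    doc.add(new Field("content_stored", text, Field.Store.YES, Field.Index.NO));
    return doc;
  }
}

Large stored fields mostly cost disk space and document-retrieval time, not query time, since stored fields live apart from the inverted index (.fdt/.fdx rather than the postings files).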
I want to do some document clustering on a corpus of ~100,000
documents, with average doc size being ~7k. I have looked into Carrot2,
but it seems to work only for relatively short documents and has some
scaling issues for a large corpus. Certainly for corpora of this
size, one cannot us
I want to enforce the concept of a unique primary key in a Lucene index by
having a field whose value has to be unique across all Lucene documents.
One way is to do a search just before indexing, but that seems to
consume a lot of time, as you have to create a new IndexSearcher every time
you want to
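For what it's worth, later Lucene versions (2.1+) handle this directly: IndexWriter.updateDocument atomically deletes any existing document containing the given id term and adds the new one, with no per-add IndexSearcher. A sketch with made-up field names (the Field.Index constants assume 2.4+; earlier versions used UN_TOKENIZED/TOKENIZED):

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;

public class UpsertById {
  /** Adds doc, replacing any previous doc whose "id" field equals id. */
  public static void upsert(IndexWriter writer, String id, String body)
      throws Exception {
    Document doc = new Document();
    doc.add(new Field("id", id, Field.Store.YES, Field.Index.NOT_ANALYZED));
    doc.add(new Field("body", body, Field.Store.NO, Field.Index.ANALYZED));
    writer.updateDocument(new Term("id", id), doc);  // delete-then-add
  }
}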
Given a term "myterm", what kind of search algorithm does Lucene use to
get to the postings list (i.e. the term-frequency location in the .frq file)?
From what I understood by looking into the Lucene file format, it
keeps the whole .tii file in memory and does a skipped linear
search o
It seems that the performance of any indexing/searching
algorithm is very much dependent upon disk-access technologies.
Just curious: does anybody know of any company (mostly storage
companies) working on improving their storage/disk-access technology to make
indexing/searching effici
Agreed, an inverted index cannot be efficiently maintained in a
B-tree (and hence in an RDBMS). But I think we could (or should) have the option of
B-tree-based storage for unindexed fields, whereas for indexed fields
we can use Lucene's existing architecture.
prasen
[EMAIL PROTECTED] wrote:
Dmi
Dmitry Goldenberg wrote:
Ideally, I'd love to see an article explaining both in detail: the index
structure as well as the merge algorithm...
________
From: Prasenjit Mukherjee [mailto:[EMAIL PROTECTED]
Sent: Tue 3/28/2006 11:57 PM
To: java-user@lucene.apache.org
S
I have already gone through the file formats. What I was looking for is
the underlying theory behind the chosen file formats. I am sure those
file formats were decided based on some theoretical axioms.
--prasen
[EMAIL PROTECTED] wrote:
On Mar 28, 2006, at 11:57 PM, Prasenjit Mukherjee wrote
It seems to me that Lucene doesn't use a B-tree for its index storage.
Is there any paper/article which explains the theory behind the data structure of
a single index (segment)? I am not referring to the merge algorithm; I am
curious to know the storage structure of a single optimized Lucene index.
Any po
I think Nutch has a distributed Lucene implementation. I could have used
Nutch straight away, but I have a different crawler, and also don't want
to use NDFS (which is used by Nutch). What I have proposed earlier is
basically based on the MapReduce paradigm, which is used by Nutch as well.
It would
I already have an implementation of a distributed crawler farm, where
crawler instances are running on different boxes. I want to come up with
a distributed indexing scheme using Lucene that takes advantage of
my crawlers' distributed nature. Here is what I am
thinking.