CL Solr API. This to some extent circles
back to the documentation argument, but of course goes a bit further. It's just
more convenient to "explore" and learn the API via tab completion, which is of
course not so easy to offer via a C extension API :-/
regards,
Lukas Kahwe Smith
m...@pooteeweet.org
[1] http://pooteeweet.org/blog/1796
even if this one is based on PHP 5.3 (namespaces etc.). BTW, there is
already another PHP 5.3-based API, though it also tries to unify other Lucene-based
APIs as much as possible:
https://github.com/dstendardi/Ariadne
regards,
Lukas Kahwe Smith
m...@pooteeweet.org
in our release announcements or via the Lucene
> website)
>
> [] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)
>
> [X] I/we build them from source via an SVN/Git checkout.
>
> [] Other (someone in your company mirrors them internally or via a downstream
> project)
regards,
Lukas Kahwe Smith
m...@pooteeweet.org
s not uncommon so the
> answer might be helpful to others as well.
I ran into this issue compiling PHP with --with-curl-wrappers.
regards,
Lukas Kahwe Smith
m...@pooteeweet.org
ng/rewriting.
regards,
Lukas Kahwe Smith
m...@pooteeweet.org
On 19.12.2010, at 23:30, Alexey Serba wrote:
>
> Also Ephraim proposed a really neat solution with GROUP_CONCAT, but
> I'm not sure that all RDBMS-es support that.
That's MySQL-only syntax.
But if you google, you can find similar solutions for other RDBMSes.
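As one example of such a non-MySQL equivalent, SQLite ships an aggregate `group_concat(X, separator)`. The sketch below uses Python's built-in `sqlite3` module; the `resolution_tag` table and its rows are made up for illustration. Note that SQLite does not guarantee the order of the concatenated values within a group.

```python
import sqlite3

# Hypothetical schema: one row per (resolution, tag) pair.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE resolution_tag (resolution_id INTEGER, tag TEXT);
    INSERT INTO resolution_tag VALUES (1, 'africa'), (1, 'health'), (2, 'security');
""")

# group_concat() collapses all rows of a group into one delimited string,
# similar in spirit to MySQL's GROUP_CONCAT().
rows = conn.execute("""
    SELECT resolution_id, group_concat(tag, ',')
    FROM resolution_tag
    GROUP BY resolution_id
    ORDER BY resolution_id
""").fetchall()

for resolution_id, tags in rows:
    print(resolution_id, tags)
```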
regards,
Lukas
On 13.11.2010, at 10:30, Yonik Seeley wrote:
> On Wed, Nov 10, 2010 at 9:12 AM, Lukas Kahwe Smith
> wrote:
>> The above wiki page seems to be out of date. Reading the comments in
>> https://issues.apache.org/jira/browse/SOLR-236 it seems like "group" should
stuff like "united states of america". It would then generate a shingle with
"united states america", which in turn wouldn't generate a proper phrase search
string.
One option, of course, would be to restrict the shingles to 2 words; then
using the stop word filter would work a
a way I
can sensibly bring in a stop word filter here? Actually in theory the stop
words could appear as the first or second word as well.
So I guess when producing shingles, I want to prevent any stop word from being
part of any shingle.
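A minimal sketch of that behaviour, written as plain Python rather than as a Solr filter (the function name and the `max_size` parameter are hypothetical): the token stream is cut into runs at every stop word and shingles are only built inside a run, so a stop word never ends up inside a shingle and "united states america" is never produced.

```python
def shingles_without_stopwords(tokens, stopwords, max_size=3):
    """Return word shingles of 2..max_size words that contain no stop word."""
    # Split the token stream into runs of consecutive non-stop words.
    runs, current = [], []
    for token in tokens:
        if token.lower() in stopwords:
            if current:
                runs.append(current)
            current = []
        else:
            current.append(token)
    if current:
        runs.append(current)

    # Build shingles only within a run, never across a stop word.
    result = []
    for run in runs:
        for size in range(2, max_size + 1):
            for start in range(len(run) - size + 1):
                result.append(" ".join(run[start:start + size]))
    return result


print(shingles_without_stopwords("united states of america".split(), {"of"}))
```

For "united states of america" this yields only "united states", since "america" is left alone in its run.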
regards,
Lukas Kahwe Smith
m...@pooteeweet.org
On 07.11.2010, at 20:13, Lukas Kahwe Smith wrote:
> Hi,
>
> I am pondering making use of field collapsing. I am currently indexing
> clauses (sections) inside UN documents:
> http://resolutionfinder.org/search/unifiedResults?q=africa&=&t[22]=medication&dc=&st=cla
get the facet filters to display the right counts. So I am wondering if
field collapsing in its current form supports faceting, since it's not mentioned
on the wiki page:
http://wiki.apache.org/solr/FieldCollapsing
regards,
Lukas Kahwe Smith
m...@pooteeweet.org
on is X; some think, however, it's Y. But no information means users are
essentially left guessing about the future.
regards,
Lukas Kahwe Smith
m...@pooteeweet.org
uld provided should probably be on
http://lucene.apache.org/solr/#news
regards,
Lukas Kahwe Smith
m...@pooteeweet.org
On 14.10.2010, at 19:50, Jonathan Rochkind wrote:
> I'm kind of confused about Solr development plans in general, highlighted by
> this thread.
>
> I think 1.4.1 is the latest officially stable release, yes?
>
> Why is there both a 1.5 and a 3.x, anyway? Not to mention a 4.x? Which of
> the
the method signatures.
Also, I do think that there should be methods for escaping and also tokenizing
Lucene queries to enable "validation" of the syntax used etc.
see here for a use case and a user land implementation:
http://pooteeweet.org/blog/1796
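For the escaping part, a minimal userland sketch (in Python rather than PHP, and not taken from the blog post) could mirror the character list that Lucene's own `QueryParser.escape()` backslash-escapes; the `/` entry only matters for the regex syntax added in Lucene 4.0. The function name is hypothetical.

```python
# Characters escaped by Lucene's QueryParser.escape(); "/" is only
# special in Lucene 4.0+ (regex queries) but escaping it is harmless.
LUCENE_SPECIAL = set('\\+-!():^[]"{}~*?|&/')


def escape_lucene(text):
    """Backslash-escape every Lucene query syntax character in text."""
    return "".join("\\" + ch if ch in LUCENE_SPECIAL else ch for ch in text)


print(escape_lucene("foo-bar (baz)"))  # foo\-bar \(baz\)
```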
regards,
Lukas Kahwe Smith
m...@pooteeweet.org
On 20.09.2010, at 08:32, Lukas Kahwe Smith wrote:
> Hi,
>
> OK, since it didn't seem like there was interest in documenting this approach on
> the wiki, I have simply documented it on my blog:
> http://pooteeweet.org/blog/1827
Sorry for the spam. Lance (and Erik) did think it would
Hi,
OK, since it didn't seem like there was interest in documenting this approach on the
wiki, I have simply documented it on my blog:
http://pooteeweet.org/blog/1827
regards,
Lukas Kahwe Smith
m...@pooteeweet.org
new last update timestamp should be available.
>
> Lukas Kahwe Smith wrote:
>> Hi,
>>
>> I think I have mentioned this approach before on this list, but I really
>> think that the deltaQuery approach which is currently explained as the "way
>> to do upda
ny downside to this approach? Should this be added to the wiki?
regards,
Lukas Kahwe Smith
m...@pooteeweet.org
can have multiple
tags and so maybe you can just split up your queries like that?
regards,
Lukas Kahwe Smith
m...@pooteeweet.org
ovements)
are for the next release, and a ballpark of when to expect them, would go a long
way.
regards,
Lukas Kahwe Smith
m...@pooteeweet.org
what's going to happen in the near future would make it
all the easier for us users to bet our futures on Solr :)
regards,
Lukas Kahwe Smith
m...@pooteeweet.org
et in
production but in our tests it worked fine.
regards,
Lukas Kahwe Smith
m...@pooteeweet.org
On 17.07.2010, at 15:39, Lukas Kahwe Smith wrote:
> Hi,
>
> I am following:
> http://wiki.apache.org/solr/LoggingInDefaultJettySetup
>
> All works fine except defining the logging properties files from jetty.xml
> Does this approach work for anyone else?
problem s
Hi,
I am following:
http://wiki.apache.org/solr/LoggingInDefaultJettySetup
All works fine except defining the logging properties files from jetty.xml
Does this approach work for anyone else?
regards,
Lukas Kahwe Smith
m...@pooteeweet.org
lhost:8983/solr/select?debugQuery=on&q=foo_s:""
The raw query parser would also work (it skips analysis):
http://localhost:8983/solr/select?debugQuery=on&q={!raw f=foo_s}
regards,
Lukas Kahwe Smith
m...@pooteeweet.org
re sense to leave all of that in
the q part.
> Also, you may want to apply patch SOLR-1553 and start using the eDisMax
> handler which allows fielded search and boolean operators, if you need more
> advanced user-facing query syntax.
Yeah, I am keeping an eye on that already.
thx!
regards,
Lukas Kahwe Smith
m...@pooteeweet.org
On 29.06.2010, at 13:38, Lukas Kahwe Smith wrote:
>
> On 29.06.2010, at 13:24, Jan Høydahl / Cominvent wrote:
>
>> Hi,
>>
>> In DisMax the "mm" parameter controls whether terms are required or
>> optional. The default is 100% which means all term
lt it behaves like "optional"
in my tests. I will test with mm=0 whether it behaves like "prohibited".
regards,
Lukas Kahwe Smith
m...@pooteeweet.org
dler
Quotes can be used to group phrases, and +/- can be used to denote mandatory
and optional clauses ... but all other Lucene query parser special characters
are escaped to simplify the user experience.
regards,
Lukas Kahwe Smith
m...@pooteeweet.org
On 26.06.2010, at 16:30, Lukas Kahwe Smith wrote:
>
> On 26.06.2010, at 16:22, Koji Sekiguchi wrote:
>
>> (10/06/26 22:19), Lukas Kahwe Smith wrote:
>>> Hi,
>>>
>>> From googling and looking at jira tickets it seems like phrase highlighting
>>
On 26.06.2010, at 16:22, Koji Sekiguchi wrote:
> (10/06/26 22:19), Lukas Kahwe Smith wrote:
>> Hi,
>>
>> From googling and looking at jira tickets it seems like phrase highlighting
>> should work out of the box, but even enabling it manually didn't get me the
y%3Dorig_tag_ids}tag_ids&hl.usePhraseHighlighter=true&qq="security+council"}
hits=0 status=0 QTime=31
but as you can see in the above website "security" and "council" are still
highlighted separately.
regards,
Lukas Kahwe Smith
m...@pooteeweet.org
40M,
> but then I upgraded the test machine to the latest 4.0 from trunk, and ran
> into the timeout issue you described, so I am going back to the 5.1.12
> connector. I just saw the message on the list about branch_3x in SVN, which
> looks like a better option than trunk.
Any news on this topic?
regards,
Lukas Kahwe Smith
m...@pooteeweet.org
he second thing I looked at. It doesn't really contain the
information required, plus it's obviously quite slow too.
regards,
Lukas Kahwe Smith
m...@pooteeweet.org
ing to set the given fields to
stored?
regards,
Lukas Kahwe Smith
m...@pooteeweet.org
On 13.06.2010, at 16:46, Lukas Kahwe Smith wrote:
> Hi,
>
> what could cause this issue?
> I cannot reproduce it on my dev machine, but I am pretty sure it's not an
> access control issue in either the file system or the database.
>
> INFO: Creating a connection for ent
are doing searches.
What I might end up doing, though, is to leave dashes unescaped only in specific cases:
foo-bar (escape)
foo - bar (escape)
foo -bar (do not escape, i.e. prohibit bar)
This should enable power users and should rarely hit non power users.
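The three cases above reduce to one rule: a dash stays an operator only when it starts a word (beginning of the string or preceded by whitespace) and is immediately followed by a term. A minimal Python sketch of that rule, with a hypothetical function name:

```python
def escape_dashes(query):
    """Escape '-' unless it is a leading prohibit operator as in 'foo -bar'."""
    out = []
    for i, ch in enumerate(query):
        if ch == "-":
            starts_word = i == 0 or query[i - 1].isspace()
            has_term = i + 1 < len(query) and not query[i + 1].isspace()
            if not (starts_word and has_term):
                out.append("\\-")  # inside a word or standing alone: escape
                continue
        out.append(ch)
    return "".join(out)


for q in ("foo-bar", "foo - bar", "foo -bar"):
    print(q, "=>", escape_dashes(q))
```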
regards,
Lukas Kahwe Smith
m...@pooteeweet.org
)
... 12 more
Jun 13, 2010 4:13:47 PM org.apache.solr.update.DirectUpdateHandler2 rollback
INFO: start rollback
Jun 13, 2010 4:13:47 PM org.apache.solr.update.DirectUpdateHandler2 rollback
INFO: end_rollback
regards,
Lukas Kahwe Smith
m...@pooteeweet.org
cordingly, but I guess in that case I'd rather just remove support for
prohibiting words.
regards,
Lukas Kahwe Smith
m...@pooteeweet.org
s who are really
> smart about the distributed searching stuff have to say.
OK, I have created it:
https://issues.apache.org/jira/browse/SOLR-1937
regards,
Lukas Kahwe Smith
m...@pooteeweet.org
erized the DSN information. Plus I have one query
defined for the deletes and another one for both the full import and updates.
If clear is set to anything but false, the WHERE condition evaluates to true and
the updated_at check would be ignored in pretty much any decent RDBMS. If it's false,
then the updated_at is checked as per usual.
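The WHERE trick can be demonstrated outside of DIH with Python's `sqlite3`. The `items` table and its timestamps are hypothetical; in a DIH config the two placeholders would presumably be filled from variables such as `${dataimporter.request.clear}` and `${dataimporter.last_index_time}` (an assumption about the actual config, which is not shown in the mail).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (id INTEGER PRIMARY KEY, updated_at TEXT);
    INSERT INTO items VALUES (1, '2010-05-01 10:00:00'), (2, '2010-06-01 10:00:00');
""")

# One query serves both full import and delta update: when `clear` is
# anything but 'false' the first condition is true for every row, so the
# updated_at comparison no longer restricts the result.
QUERY = "SELECT id FROM items WHERE ? != 'false' OR updated_at > ? ORDER BY id"


def fetch_ids(clear, last_index_time):
    return [row[0] for row in conn.execute(QUERY, (clear, last_index_time))]


print(fetch_ids("true", "2010-05-15 00:00:00"))   # full import: [1, 2]
print(fetch_ids("false", "2010-05-15 00:00:00"))  # delta: [2]
```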
regards,
Lukas Kahwe Smith
m...@pooteeweet.org
pt.
>>
I guess the easiest approach is to compute the intervals at index time, though that's obviously less
flexible.
regards,
Lukas Kahwe Smith
m...@pooteeweet.org
;=&tm=any&s=Search
The field itself is just an untokenized string. Of course I could just turn an
empty string into "none" at index time, but I am wondering how to do it in
general :)
I tried using just "" or ["" TO ""] to match for empty strings,
along the xaxis to decide how to count the docs for the y axis.
>
> http://en.wikipedia.org/wiki/Sparkline
> http://www.edwardtufte.com/bboard/q-and-a-fetch-msg?msg_id=0001OR
kayak.com uses a double slider to handle the flight departure range:
http://screencast.com/t/ZjExMTE5
regards,
Lukas Kahwe Smith
m...@pooteeweet.org
ect any of the checkboxes, it updates the counts. However, I display
both the count without and with those additional checkbox filters applied
(actually I only display two numbers if they are not the same):
http://screencast.com/t/MWUzYWZkY2Yt
regards,
Lukas Kahwe Smith
m...@pooteeweet.org
(*) i
ature
request or should stuff like this rather be done in userland (I have noticed
for example that Solr prefers to have users normalize the scores in userland
too)?
regards,
Lukas Kahwe Smith
m...@pooteeweet.org
On 25.05.2010, at 08:55, Lukas Kahwe Smith wrote:
> Now when I deselect one of the checkboxes I add an fq parameter:
> facet=true&fl=*,score&sort=score+desc&start=0&q=(tag_ids:("23"))&facet.field={!ex%3Ddt}organisation_id&facet.field={!ex%3Ddt}tag
x%3Ddt}organisation_id&facet.field={!ex%3Ddt}tag_ids&facet.field={!ex%3Ddt}addressee_ids&facet.field={!ex%3Ddt}operative_phrase_id&facet.field={!ex%3Ddt}documenttype_id&facet.field={!ex%3Ddt}information_type_id&facet.field={!ex%3Ddt}legal_value&json.nl=map&wt=json&fq={!tag%3Ddt}organisation_id:(-"9")+AND+{!tag%3Ddt}information_type_id:(-"1")&rows=21}
Can someone give me a hint?
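For comparison, the tag/exclude pattern documented on the Solr wiki attaches `{!tag=...}` at the very start of an `fq` value, and local params are only recognised there, so a second `{!tag...}` inside the same `fq` (as in the URL above) is not parsed as intended. The sketch below builds one tagged `fq` per field instead; the helper name is hypothetical, and it assumes Solr accepts a top-level purely negative filter such as `-organisation_id:("9")`.

```python
from urllib.parse import urlencode


def multi_select_params(q, facet_fields, deselected):
    """Build select parameters for multi-select faceting.

    deselected maps a facet field to the values the user unchecked.
    """
    params = [("q", q), ("facet", "true"), ("wt", "json")]
    for field in facet_fields:
        # When counting this field, ignore the filter tagged with its name.
        params.append(("facet.field", "{!ex=%s}%s" % (field, field)))
    for field, values in deselected.items():
        # One fq per field, each starting with its own {!tag=...}.
        excluded = " OR ".join('"%s"' % value for value in values)
        params.append(("fq", "{!tag=%s}-%s:(%s)" % (field, field, excluded)))
    return params


params = multi_select_params(
    'tag_ids:("23")',
    ["organisation_id", "information_type_id"],
    {"organisation_id": ["9"], "information_type_id": ["1"]},
)
print("select?" + urlencode(params))
```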
regards,
Lukas Kahwe Smith
m...@pooteeweet.org
> generate new fixed range values after updating Solr. If you think something
> like what I've made is useful to you, I'll be happy to answer any questions
> about how I implemented this.
Yeah, I was thinking of this as a fallback approach. It seems easy enough to
implement and might be less confusing for the users.
regards,
Lukas Kahwe Smith
m...@pooteeweet.org
On 16.05.2010, at 21:01, Ahmet Arslan wrote:
> http://wiki.apache.org/solr/StatsComponent can give you
> min and max values.
> Sorry, my bad. I just tested StatsComponent with a tdate field, and it
> is not working for date-typed fields. The wiki says it is for numeric
> fields.
OK, thanks for checking. I
d end points for the slider. The user can then
move the sliders to further filter the result set.
How can I best go about fetching just those min and max values, ideally without
having to add a separate query just for this?
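One way to avoid a separate query is to piggyback StatsComponent onto the normal select request with `stats=true&stats.field=...`; the response should then carry the values under `stats.stats_fields.<field>.min` and `.max`. Given the limitation noted above, this assumes a numeric field (for dates, perhaps a hypothetical numeric copy such as a year field filled at index time). The field name `price` is made up.

```python
from urllib.parse import urlencode


def with_min_max(params, field):
    """Return a copy of the select parameters that also requests
    StatsComponent min/max for `field` in the same round trip."""
    return list(params) + [("stats", "true"), ("stats.field", field)]


params = with_min_max([("q", "africa"), ("rows", "10")], "price")
print("select?" + urlencode(params))
# select?q=africa&rows=10&stats=true&stats.field=price
```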
regards,
Lukas Kahwe Smith
m...@pooteeweet.org
[1]
http://wiki.apache.org
ize is 4.6GB with about 16M entities.
regards,
Lukas Kahwe Smith
m...@pooteeweet.org
Hi,
Just FYI, I am using mysql-connector-java-5.1.10-bin.jar and my full import
takes about 3 hours, and I am not experiencing crashes.
regards,
Lukas
On 26.04.2010, at 12:48, Lukas Kahwe Smith wrote:
> Hi,
>
> I am currently putting together a search for a DB where I have resolutions
> along with their metadata as well as chapters, their text and metadata. Most of
> the searching will actually be done on the metadata. The
s, so I could just as well use one core.
grouping:
How do I best group the scores for the (a) type search? Should I just do two
searches and combine the results? Then again, this will make paging tricky.
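A small sketch shows where the paging difficulty comes from when combining two searches client-side: to cut out page `start..start+rows` of the merged list, each search has to return its top `start + rows` hits. The hit dictionaries are hypothetical, and scores from two different queries are not strictly comparable, so a merge by raw score is only an approximation.

```python
import heapq
from itertools import islice


def merged_page(results_a, results_b, start, rows):
    """Merge two score-sorted (descending) hit lists and return one page.

    Both inputs must already contain at least start + rows hits, i.e. both
    searches must be run with rows=start+rows, which gets costly for deep pages.
    """
    merged = heapq.merge(results_a, results_b,
                         key=lambda hit: hit["score"], reverse=True)
    return list(islice(merged, start, start + rows))


a = [{"id": "a1", "score": 3.0}, {"id": "a2", "score": 1.0}]
b = [{"id": "b1", "score": 2.0}]
print([hit["id"] for hit in merged_page(a, b, 0, 2)])
```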
regards,
Lukas Kahwe Smith
m...@pooteeweet.org
On 07.04.2010, at 14:24, Lukas Kahwe Smith wrote:
> For Solr the idea is also just copy the index files into a new directory and
> then use http://wiki.apache.org/solr/CoreAdmin#RELOAD after updating the
> config file (I assume it's not possible to hot swap like with MySQL).
Since
. Plus, if we run into any issues, we can also easily roll back by just
swapping the data around again.
I would appreciate any comments you guys might have on this concept.
regards,
Lukas Kahwe Smith
m...@pooteeweet.org
I have found an answer to the above question, so
it's not some crazy use case ..
regards,
Lukas Kahwe Smith
m...@pooteeweet.org
st parameters in
your DIH xml:
http://wiki.apache.org/solr/DataImportHandler#Accessing_request_parameters
However, you can also define defaults for these parameters inside your
solrconfig.xml request handler configuration.
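On the calling side this just means adding extra parameters to the DIH request; per the wiki section linked above they become readable in the data config as `${dataimporter.request.<name>}` (for example `${dataimporter.request.password}`), while defaults can go into the handler's `defaults` list in solrconfig.xml. The base URL below is an assumption about the deployment.

```python
from urllib.parse import urlencode

# Hypothetical handler URL; adjust host, core and path to the real setup.
DIH_URL = "http://localhost:8983/solr/dataimport"


def dih_url(command, **request_params):
    """Build a DataImportHandler URL; every extra keyword becomes a
    request parameter visible as ${dataimporter.request.<name>}."""
    return DIH_URL + "?" + urlencode({"command": command, **request_params})


print(dih_url("full-import", password="secret"))
# http://localhost:8983/solr/dataimport?command=full-import&password=secret
```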
regards,
Lukas Kahwe Smith
m...@pooteeweet.org
On 17.03.2010, at 11:36, Lukas Kahwe Smith wrote:
>
> On 16.03.2010, at 15:42, Lukas Kahwe Smith wrote:
>
>> Hi,
>>
>> I am trying to use $deleteDocById to delete rows based on an SQL query in my
>> db-data-config.xml. The following tag is a top level tag i
On 16.03.2010, at 15:42, Lukas Kahwe Smith wrote:
> Hi,
>
> I am trying to use $deleteDocById to delete rows based on an SQL query in my
> db-data-config.xml. The following tag is a top level tag in the
> tag.
>
>
That's obviously a typo from trying to simplify
Hi,
I am trying to use $deleteDocById to delete rows based on an SQL query in my
db-data-config.xml. The following tag is a top level tag in the tag.
However, it seems like it's only fetching the rows; it's not actually issuing any
index deletes.
regards,
Lukas Kahwe Smith
m
(for
example to pass in the password)?
Furthermore is there some way to define default values for these request
parameters in case no value is passed in?
regards,
Lukas Kahwe Smith
m...@pooteeweet.org
ple values it would need to do the following (which
would need to be index supported in order to perform decently):
for each i: x_coord[i] > x_upper_left_coord AND x_coord[i] < x_lower_right_coord AND
y_coord[i] > y_upper_left_coord AND y_coord[i] < y_lower_right_coord
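Written out as a brute-force per-document check, the condition pairs the i-th x value with the i-th y value; matching x and y independently would give false hits. This Python sketch (hypothetical names) is a linear scan, not the index-supported version the mail asks for, and it follows the comparison in the mail, where y grows downwards so the upper-left corner has the smaller y.

```python
def any_point_in_box(x_coords, y_coords, upper_left, lower_right):
    """True if any (x_coords[i], y_coords[i]) pair lies strictly inside the box.

    y grows downwards, so upper_left holds the smaller y value.
    """
    x_min, y_min = upper_left
    x_max, y_max = lower_right
    return any(
        x_min < x < x_max and y_min < y < y_max
        for x, y in zip(x_coords, y_coords)
    )


print(any_point_in_box([1, 5], [1, 5], (4, 4), (6, 6)))  # (5, 5) is inside
print(any_point_in_box([1, 5], [5, 1], (4, 4), (6, 6)))  # no pair is inside
```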
regards,
Lukas Kahwe Smith
m...@pooteeweet.org
items">
>
>
Nice approach. In MySQL you also have a handy function that might come into
play in your use case:
http://dev.mysql.com/doc/refman/5.0/en/group-by-functions.html#function_group-concat
This lets you do a group by and then concatenate all the values i
tself.
Anyway, how can I get "st.gallen" split into two terms at query time?
...
It seems I should probably use the solr.StandardTokenizerFactory anyway, but
for this case it wouldn't help
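The desired split is what a word-delimiter style filter (e.g. Solr's WordDelimiterFilterFactory) does with intra-word punctuation. Purely to illustrate the expected tokens, here is a plain Python sketch; it is not Solr code and the function name is hypothetical.

```python
import re


def split_on_punctuation(text):
    """Split on runs of non-alphanumeric characters (underscore included)."""
    return [part for part in re.split(r"[\W_]+", text) if part]


print(split_on_punctuation("st.gallen"))   # ['st', 'gallen']
print(split_on_punctuation("St. Gallen"))  # ['St', 'Gallen']
```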
))~2) ()
+(((telefon:fc^0.5 | firstname:fc | email:fc^0.5 | street:fc^0.6 | city:fc^0.6
| name:fc) (telefon:"st gallen"^0.5 | firstname:"st gallen" | email:"st
gallen"^0.5 | street:"st gallen"^0.6 | city:"st gallen"^0.6 | name:"st
gallen"))~2) ()
What's going on there?
regards,
Lukas Kahwe Smith
m...@pooteeweet.org
On 10.02.2010, at 16:41, Lukas Kahwe Smith wrote:
> There is a solution to update via DIH, but is there also a way to define a
> query that fetches IDs for documents that should be removed?
Or to phrase the question a bit more openly: I have a file with IDs of documents
to
Hi,
There is a solution to update via DIH, but is there also a way to define a
query that fetches IDs for documents that should be removed?
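For the file-of-IDs variant, one option outside DIH is to turn the file into a Solr XML update message and POST it to the `/update` handler, followed by a commit. Recent Solr versions accept several `<id>` elements in one `<delete>`; the helper names and the one-ID-per-line file format are assumptions.

```python
import xml.etree.ElementTree as ET


def ids_from_file(path):
    """Read one document ID per line, skipping blank lines."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def delete_message(ids):
    """Build a Solr XML update message that deletes documents by unique key."""
    root = ET.Element("delete")
    for doc_id in ids:
        ET.SubElement(root, "id").text = doc_id
    return ET.tostring(root, encoding="unicode")


print(delete_message(["05991", "06000"]))
# <delete><id>05991</id><id>06000</id></delete>
```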
regards,
Lukas Kahwe Smith
m...@pooteeweet.org
Hi Ahmet,
Well after some more testing I am now convinced that you rock :)
I like the solution because it's obviously way less hacky and, more importantly, I
expect this to be a lot faster and less memory intensive, since instead of a
facet prefix or terms search, I am doing an "equality" compariso
h my hack. And since there can also be
matches inside words that have no real meaning, I am also not sure if this
really gets me better quality on this level either.
I will play around with this some more, though.
regards,
Lukas Kahwe Smith
m...@pooteeweet.org
turn original string. In this method you are not using faceting
> anymore. You are just querying and requesting a field.
>
> q=suggest_field:di&fl=suggest_field
Yeah, I just realized that while I was trying it out. :-)
Still testing ..
regards,
Lukas Kahwe Smith
m...@pooteeweet.org
bar ding dong" ?
Obviously I have to decide how important it is for me to get the original mixed-case
string for auto suggest, but it does matter a bit more over here in Europe
than in the US, for example.
If I were to index both the original mixed-case and the lowercase version and
remove the solr.LowerCaseFilterFactory in both analyzer sections, then it
should work, as long as terms that contain uppercase letters usually start
with an uppercase letter.
Let me try this out ..
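A Python sketch of the idea, outside of Solr and with hypothetical helper names: each term is indexed in its lowercased form plus, if different, its original form, and a case-sensitive prefix match then returns the mixed-case variant for "Kreu" and the lowercased one for "kreu".

```python
def index_variants(term):
    """Index the lowercased form and, if different, the original form."""
    lower = term.lower()
    return [lower] if lower == term else [lower, term]


def suggest(prefix, terms):
    """Case-sensitive prefix match over all indexed variants."""
    variants = sorted({v for term in terms for v in index_variants(term)})
    return [v for v in variants if v.startswith(prefix)]


terms = ["Kreuzstrasse", "UBS"]
print(suggest("Kreu", terms))  # ['Kreuzstrasse']
print(suggest("kreu", terms))  # ['kreuzstrasse']
```

The weak spot is a term like "fooBar": the prefix "foo" matches both "fooBar" and "foobar".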
regards,
Lukas Kahwe Smith
m...@pooteeweet.org
On 03.02.2010, at 13:54, Lukas Kahwe Smith wrote:
> The issue is that I have multiple fields of data (names, address etc) that
> should all be relevant for the auto suggest. Furthermore a "phrase" entered
> can either match on one field or any combination of fields. Phrase
fields of data (names, address etc) that
should all be relevant for the auto suggest. Furthermore a "phrase" entered can
either match on one field or any combination of fields. Phrase in this context
means separated by spaces or dashes. For this I found the above approach the only
feasible solution.
regards,
Lukas Kahwe Smith
m...@pooteeweet.org
o" or "bar", and in both
cases I can show the user "Foo Bar" with a bit of frontend logic to split off
the "payload", aka the original data.
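A minimal sketch of that frontend logic, assuming entries of the form `term|original` (the `|` separator mirrors the "ubs|UBS" notation used elsewhere in this thread; the helper names are hypothetical): each word of the original, split on spaces or dashes, gets its own entry, and after a prefix match on the term part the payload after the separator is what gets displayed.

```python
import re


def suggest_entries(original):
    """One 'term|original' entry per word; words are split on spaces or dashes."""
    words = [w for w in re.split(r"[\s-]+", original) if w]
    return ["%s|%s" % (w.lower(), original) for w in words]


def matches(prefix, entries):
    """Prefix-match the term part and return the distinct payloads."""
    found = []
    for entry in entries:
        term, _, payload = entry.partition("|")
        if term.startswith(prefix.lower()) and payload not in found:
            found.append(payload)
    return found


entries = suggest_entries("Foo Bar") + suggest_entries("Hans-Peter Muster")
print(matches("ba", entries))  # ['Foo Bar']
print(matches("pe", entries))  # ['Hans-Peter Muster']
```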
regards,
Lukas Kahwe Smith
m...@pooteeweet.org
uld have this logic inside my DIH script, but then I would need
to read the stopword.txt file in the script, which I would like to avoid; then
again, it would probably be the more efficient approach.
regards,
Lukas Kahwe Smith
m...@pooteeweet.org
y with
> limit and offset (limit 100 offset 0), it is indexing the records
> properly.
>
> The batchSize is not really fetching 25000 records from the DB, it is still
> try to get all the 12 million rows. I am using MySQL server version 5.0.77.
Try batchSize="-1" (with MySQL this makes the driver stream the rows instead of loading the whole result set).
regards,
Lukas Kahwe Smith
m...@pooteeweet.org
On 01.02.2010, at 13:27, Lukas Kahwe Smith wrote:
>
> On 29.01.2010, at 15:40, Lukas Kahwe Smith wrote:
>
>> I am still a bit unsure how to handle both the lowercased and the case
>> preserved version:
>>
>> So here are some examples:
>> UBS =>
On 29.01.2010, at 15:40, Lukas Kahwe Smith wrote:
> I am still a bit unsure how to handle both the lowercased and the case
> preserved version:
>
> So here are some examples:
> UBS => ubs|UBS
> Kreuzstrasse => kreuzstrasse|Kreuzstrasse
>
> So when I type "
f "Kreuzstrasse" and with
"kreu" I would get "kreuzstrasse".
Since I do not expect any words to start with a lowercase letter and still
contain some uppercase letter, we should be fine with this approach.
That is, I doubt there would be terms like "fooBar", which would lead to suggesting
both "foobar" and "fooBar".
How can I achieve this?
regards,
Lukas Kahwe Smith
m...@pooteeweet.org
Sure. I guess Lucene doesn't support two-phase commits yet?
regards,
Lukas Kahwe Smith
m...@pooteeweet.org
the companies dismax handler and I find
no (or just very few) results, then I want to also include a field that has a
DoubleMetaphone analyzer on the name. So I just want to append that field to
the qf setting of the request handler defaults.
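A client-side sketch of the fallback: run the query, and if it finds fewer than some minimum, retry with the phonetic field appended to `qf`. Because a `qf` sent with the request replaces the handler's default `qf` rather than extending it, the default fields have to be repeated. The `search` callable and the field names `name` and `name_phonetic` are hypothetical.

```python
def search_with_phonetic_fallback(search, query, qf, phonetic_field, min_hits=1):
    """Query with qf; if fewer than min_hits documents are found, retry with
    the phonetic field appended. `search` takes a dict of Solr parameters and
    returns numFound. A request qf overrides the handler default, so `qf` must
    list all the normal fields."""
    params = {"q": query, "qf": qf}
    num_found = search(params)
    if num_found >= min_hits:
        return params, num_found
    fallback = dict(params, qf=qf + " " + phonetic_field)
    return fallback, search(fallback)


def fake_search(params):
    # Stand-in for a real Solr call: only the phonetic field finds anything.
    return 3 if "name_phonetic" in params["qf"].split() else 0


print(search_with_phonetic_fallback(fake_search, "Mayer", "name", "name_phonetic"))
```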
regards,
Lukas Kahwe Smith
m...@pooteeweet.org
On 20.01.2010, at 15:50, Lukas Kahwe Smith wrote:
>
> On 19.01.2010, at 22:52, Lukas Kahwe Smith wrote:
>
>>>> I also want to match multiple fields at once.
>>>
>>> Can you give an example?
>>
>>
>> I enter "Kreuz" but
On 19.01.2010, at 22:52, Lukas Kahwe Smith wrote:
>>> I also want to match multiple fields at once.
>>
>> Can you give an example?
>
>
> I enter "Kreuz" but this could either be part of a persons name or of a
> street name, which are sepa
On 19.01.2010, at 21:55, Otis Gospodnetic wrote:
> Hi Lukas,
>
>
> - Original Message
>
>> From: Lukas Kahwe Smith
>
>> I want to use TermsComponent for both auto complete suggestions but also
>> showing
>
> Is TermsComponent really th
splitter etc.). I can of
course also do an OR query. But it would be nice to be able to do:
q=*:foo
and that would simply search all fields against the query "foo".
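Until something like `q=*:foo` exists, the OR query can be generated client-side from a field list; alternatively a dismax handler with those fields in `qf` searches them all. The sketch below assumes the term is already escaped, and the field names are hypothetical.

```python
def all_fields_query(term, fields):
    """Expand one (already escaped) term into an OR query over every field,
    as a client-side stand-in for the unsupported q=*:foo."""
    return " OR ".join("%s:%s" % (field, term) for field in fields)


print(all_fields_query("foo", ["name", "street", "city"]))
# name:foo OR street:foo OR city:foo
```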
regards,
Lukas Kahwe Smith
m...@pooteeweet.org
be able to give me this
number more efficiently.
regards,
Lukas Kahwe Smith
m...@pooteeweet.org