Mike,
I've had my solr.py in production use internally for about a month now.
So, as you can imagine, I've worked through a few oddball bugs that
occasionally pop up. It seems pretty stable for me.
I'm planning to upload a new file attachment to this issue containing my
changes, plus fixing the bug reports that were filed against my open
ticket. But first, a few quick questions....
I would prefer to have a complete directory structure (i.e., setup.py,
unit tests, samples, etc) instead of just the solr.py file. Would
anyone see a problem with this?
Also, on some of your comments:
- list comprehensions solely to perform looped execution are harder to parse
and slower than explicitly writing a for loop
List comprehensions seem to be a matter of contention for some.
However, it's a battle I'm not interested in fighting, so changed it to
a for loop.
- shadowing builtins is generally a bad idea
Any shadowing of builtins was unintentional. Did you see specific
examples? I run the code through pychecker and pylint to try to avoid
such cases.
- SolrConnection is an old-style class, but Response is new-style
This was a holdover from the old SolrConnection class I copied from. I'm
fixing this.
functionality:
- why are 'status'/'QTime' returned as floats?
This was just a misunderstanding on my part of what QTime was actually
returning. Fixing.
- all NamedList's appearing in the output are converted to dicts--this loses
information (in particular, it will be unnecessarily hard for the user to use
highlighting/debug data). Using the python/json response format would prevent
this. Not returning highlight/debug data in the standard response format (and
yet providing said parameters in the query() method) seems odd. Am I missing
something? Oh, they are set as dynamic attributes of Response, I see.
Definitely needs documentation.
Yes, this needs to be documented. (Please c.f. to my question about
allowing a complete directory structure.)
- passing fields='' to query() will return all fields, when the desired return
is likely no fields
I've changed the default for fields= to be '*', instead of None or "".
This way, passing in 'fields=""' will result in 'fl=' being passed to
the backend. However, I still don't see the point, as passing both
'fl=' and 'fl=*' return the exact same set of fields (i.e., "all") on my
test setup.
- it might be better to settle on an api that permits doc/field boosts. How
about using a tuple as the field name in the field dict?
conn.add_many([{'id': 1, ('field2', 2.33): u"some text"}])
doc boosts could be handled by optionally providing the fielddict as a
(<fielddict>, boost) tuple.
I agree. I was not aware of field boosts at the time. I'll code this change.
- for 2.5+, a cool addition might be:
if sys.version > 2.5
import contextlib
def batched(solrconn):
solrconn.begin_batch()
yield solrconn
solrconn.end_batch()
batched = contextlib.contextmanager(batched)
Use as:
with batched(solrconn):
solrconn.add(...)
solrconn.add(...)
solrconn.add(...)
Adding...