[reposting to solr-dev as JIRA destroyed my quoting...]

On 29-May-07, at 12:41 PM, Jason Cater wrote:
I've had my solr.py in production use internally for about a month now. So, as you can imagine, I've worked through a few oddball bugs that occasionally pop up. It seems pretty stable for me.

Yes, I agree that it is looking good. Since we would be replacing the existing implementation completely, I think that it is worth taking extra care and examining the api choices carefully so we won't have to replace it or deprecate things in the near future.

I would prefer to have a complete directory structure (i.e., setup.py, unit tests, samples, etc) instead of just the solr.py file. Would anyone see a problem with this?

+1. This would be great--a unittest that could be run against the solr example would be spectacular!

Also, on some of your comments:

- list comprehensions solely to perform looped execution are harder to parse and slower than explicitly writing a for loop


List comprehensions seem to be a matter of contention for some. However, it's a battle I'm not interested in fighting, so I changed it to a for loop.

It is not a matter of contention for me when the purpose is creating a list, but ISTM it is less clear and less efficient when the purpose is _solely_ to perform a loop:

$ python -m timeit '[i+i for i in xrange(10000)]'
100 loops, best of 3: 1.95 msec per loop

$ python -m timeit 'for i in xrange(10000): i+i'
100 loops, best of 3: 1.38 msec per loop
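
To make the distinction concrete, here is a minimal sketch (`docs` and `process` are made-up names, not from the patch): a comprehension is the natural choice when you want the resulting list, but when the body runs only for its side effects the plain loop says what it means and skips building a throwaway list.

# building a list: the comprehension is the right tool
squares = [i * i for i in xrange(10)]

# calling a function purely for its side effect: a plain for loop is clearer
for doc in docs:
    process(doc)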

 - shadowing builtins is generally a bad idea
Any shadowing of builtins was unintentional. Did you see specific examples? I run the code through pychecker and pylint to try to avoid such cases.

`id` is shadowed in a few places.
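
For example (a hypothetical signature, not a quote from the patch), this is the kind of thing I mean:

def delete(id):                    # `id` shadows the builtin id()
    return '<delete><id>%s</id></delete>' % id

def delete_by_id(doc_id):          # a distinct name keeps the builtin usable
    return '<delete><id>%s</id></delete>' % doc_id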


- all NamedLists appearing in the output are converted to dicts -- this loses information (in particular, it will be unnecessarily hard for the user to use highlighting/debug data). Using the python/json response format would prevent this. Not returning highlight/debug data in the standard response format (and yet providing said parameters in the query() method) seems odd. Am I missing something? Oh, they are set as dynamic attributes of Response, I see. Definitely needs documentation.


Yes, this needs to be documented. (Please c.f. to my question about allowing a complete directory structure.)
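
Something like the following is what the docs would need to spell out (the attribute and parameter names here are my guesses for illustration, not necessarily what the patch uses):

response = conn.query('solr', highlight=['name'], fields='id,name')
for doc in response.results:
    # highlighting data is attached to the Response itself rather than to
    # the standard result docs, keyed by unique id in this sketch
    print doc['id'], response.highlighting.get(doc['id'], {})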

- passing fields='' to query() will return all fields, when the desired return is likely no fields


I've changed the default for fields= to be '*', instead of None or "". This way, passing in fields="" will result in 'fl=' being passed to the backend. However, I still don't see the point, as 'fl=' and 'fl=*' return the exact same set of fields (i.e., "all") on my test setup.

Hmm, what if you pass fields='', score=True? Ideally that would pass fl=score to the backend, bypassing all stored fields.
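
In other words, the fl handling I'd expect is roughly this (a sketch only; build_fl is a made-up helper name, not something in the patch):

def build_fl(fields, score):
    parts = []
    if fields:
        parts = [f for f in fields.split(',') if f]
    if score:
        parts.append('score')
    return ','.join(parts)

build_fl('*', False)       # -> '*'              all stored fields
build_fl('', True)         # -> 'score'          score only, no stored fields
build_fl('id,name', True)  # -> 'id,name,score'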

- it might be better to settle on an api that permits doc/field boosts. How about using a tuple as the field name in the field dict?

conn.add_many([{'id': 1, ('field2', 2.33): u"some text"}])

doc boosts could be handled by optionally providing the fielddict as a (<fielddict>, boost) tuple.


I agree. I was not aware of field boosts at the time. I'll code this change.

Unfortunately, it is still somewhat awkward. In my python client I end up passing (<name>, <value>, <field boost or None>) everywhere, but that clutters up the api considerably.

It might be worth taking a look at the ruby client to see what Eric's done for the api.
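
Putting the two together, the calling convention being proposed would look roughly like this (a sketch of the proposal, not code that exists yet):

conn.add_many([
    {'id': 1, ('field2', 2.33): u"some text"},    # field2 gets a 2.33 boost
    ({'id': 2, 'field2': u"more text"}, 1.5),     # whole document boosted by 1.5
])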

- for 2.5+, a cool addition might be:

import sys

if sys.version_info >= (2, 5):
    import contextlib

    # wraps begin_batch()/end_batch() around a block of adds
    def batched(solrconn):
        solrconn.begin_batch()
        yield solrconn
        solrconn.end_batch()
    batched = contextlib.contextmanager(batched)

Use as:

with batched(solrconn):
    solrconn.add(...)
    solrconn.add(...)
    solrconn.add(...)


Adding...

Unfortunately, it does push the required python version to 2.4. Personally, I think that requiring 2.4 is not unreasonable, but I'm somewhat of a bleeding edge guy...

-Mike
