[jira] Commented: (SOLR-216) Improvements to solr.py

Mike Klaas (JIRA) Tue, 29 May 2007 13:37:40 -0700

    [ 
https://issues.apache.org/jira/browse/SOLR-216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12499913
 ]

Mike Klaas commented on SOLR-216:
---------------------------------

On 29-May-07, at 12:41 PM, Jason Cater wrote:
I've had my solr.py in production use internally for about a month now. So, as 
you can imagine, I've worked through a few oddball bugs that occasionally pop 
up.  It seems pretty stable for me.

Yes, I agree that it is looking good.  Since we would be replacing the existing 
implementation completely, I think that it is worth taking extra care and 
examining the api choices carefully so we won't have to replace it or deprecate 
things in the near future.

I would prefer to have a complete directory structure (i.e., setup.py, unit 
tests, samples, etc) instead of just the solr.py file.  Would anyone see a 
problem with this?

+1.  This would be great--a unittest that could be run against the solr example 
would be spectacular!

Also, on some of your comments:

 - list comprehensions solely to perform looped execution are harder to parse 
and slower than explicitly writing a for loop

List comprehensions seem to be a matter of contention for some.  However, it's 
a battle I'm not interested in fighting, so changed it to a for loop.

It is not a matter of contention for me for use in creating a list, but ISTM 
less clear and less efficient if the purpose is _solely_ to perform a loop:

$ python -m timeit '[i+i for i in xrange(10000)]'
100 loops, best of 3: 1.95 msec per loop

$ python -m timeit 'for i in xrange(10000): i+i'
100 loops, best of 3: 1.38 msec per loop

 - shadowing builtins is generally a bad idea
Any shadowing of builtins was unintentional.  Did you see specific examples?  I 
run the code through pychecker and pylint to try to avoid such cases.

`id` is shadowed in a few places.  

 - all NamedList's appearing in the output are converted to dicts--this loses 
information (in particular, it will be unnecessarily hard for the user to use 
highlighting/debug data).  Using the python/json response format would prevent 
this.  Not returning highlight/debug data in the standard response format (and 
yet providing said parameters in the query() method) seems odd.  Am I missing 
something?  Oh, they are set as dynamic attributes of Response, I see.  
Definitely needs documentation.

Yes, this needs to be documented.  (Please c.f. to my question about allowing a 
complete directory structure.)

 - passing fields='' to query() will return all fields, when the desired return 
is likely no fields

I've changed the default for fields= to be '*', instead of None or "".  This 
way, passing in 'fields=""' will result in 'fl=' being passed to the backend.  
However, I still don't see the point, as passing both 'fl=' and 'fl=*' return 
the exact same set of fields (i.e., "all") on my test setup.

Hmm, what if you pass fields='', score=True?  Ideally tha would pass fl=score 
to the backend, bypassing all stored fields.

 - it might be better to settle on an api that permits doc/field boosts.  How 
about using a tuple as the field name in the field dict?

conn.add_many([{'id': 1, ('field2', 2.33): u"some text"}])

doc boosts could be handled by optionally providing the fielddict as a 
(<fielddict>, boost) tuple.

I agree. I was not aware of field boosts at the time. I'll code this change.

Unfortunately, it is still somewhat awkward.  In my python client I end up 
passing (<name>, <value>, <field boost or None>) everywhere, but that clutters 
up the api considerably.

It might be worth taking a look at the ruby client to see what Eric's done for 
the api.

- for 2.5+, a cool addition might be:

if sys.version > 2.5
   import contextlib      def batched(solrconn):
          solrconn.begin_batch()
        yield solrconn
        solrconn.end_batch()
  batched = contextlib.contextmanager(batched)

Use as:

with batched(solrconn):
       solrconn.add(...)
       solrconn.add(...)
       solrconn.add(...)

Adding...

Unfortunately, it does push the required python version to 2.4.  Personally, I 
think that requiring 2.4 is not unreasonable, but I'm somewhat of a bleeding 
edge guy...

[incidently, it would be best to keep comments in JIRA, for posterity]

> Improvements to solr.py
> -----------------------
>
>                 Key: SOLR-216
>                 URL: https://issues.apache.org/jira/browse/SOLR-216
>             Project: Solr
>          Issue Type: Improvement
>          Components: clients - python
>    Affects Versions: 1.2
>            Reporter: Jason Cater
>            Assignee: Mike Klaas
>            Priority: Trivial
>         Attachments: solr.py
>
>
> I've taken the original solr.py code and extended it to include higher-level 
> functions.
>   * Requires python 2.3+
>   * Supports SSL (https://) schema
>   * Conforms (mostly) to PEP 8 -- the Python Style Guide
>   * Provides a high-level results object with implicit data type conversion
>   * Supports batching of update commands

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (SOLR-216) Improvements to solr.py

Reply via email to