Hoss, I apologize for spamming the issue comments. I just thought they
would be the appropriate place to record all critical discussions and
decisions related to the issue, so that people who read it later can
see all the arguments that were made.

In general, emitting the cache headers based on whether the index has
been modified or not can cause problems in these cases:
1. Solr Statistics page (the statistics change even though the index doesn't)
2. Distributed Search (results on another shard may change while this
index remains the same)
3. Partial results in case of timeouts (the response is incomplete even
though the index is unchanged)
4. DataImportHandler - the status page keeps changing even though the
IndexReader remains the same (until committed)
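All four cases share one failure mode. The minimal Java sketch below assumes a purely index-based validator (Last-Modified taken from the time the current IndexReader was opened, as proposed in the issue description quoted below) and a hypothetical DataImportHandler-style status counter; every class, field and method name here is illustrative, not Solr's API. A client that revalidates with If-Modified-Since gets a 304 even though the status body has changed.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class IndexBasedValidatorDemo {
    // Hypothetical: time (ms) at which the current IndexReader was opened.
    // It only changes on commit, so it stays fixed in this demo.
    static final long INDEX_OPENED_AT = 1_205_800_000_000L;

    // Hypothetical status counter that changes while an import runs
    // without a commit (case 4 in the list above).
    static final AtomicInteger docsProcessed = new AtomicInteger();

    // Conditional-GET rule: HTTP dates have one-second resolution,
    // so compare at second granularity.
    static boolean notModified(long ifModifiedSince, long lastModified) {
        return ifModifiedSince >= (lastModified / 1000) * 1000;
    }

    // The validator looks only at the index, never at the response body.
    static int respond(long ifModifiedSince) {
        return notModified(ifModifiedSince, INDEX_OPENED_AT) ? 304 : 200;
    }

    static String statusBody() {
        return "docsProcessed=" + docsProcessed.get();
    }

    public static void main(String[] args) {
        // First request carries no validator (-1), so it gets a full response.
        String first = statusBody();
        int firstStatus = respond(-1L);
        // The import makes progress; nothing has been committed yet.
        docsProcessed.addAndGet(500);
        // Revalidation using the Last-Modified value from the first response.
        int secondStatus = respond(INDEX_OPENED_AT);
        System.out.println(firstStatus + " " + first);
        System.out.println(secondStatus + " (body is now " + statusBody() + ")");
    }
}
```

Running it prints a 200 for the first request and a 304 for the revalidation, while the actual status has moved on; a cache in front of Solr would keep serving the stale page.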

I think we both agree that this should be configurable on a
per-handler basis instead of on a global basis.

To your point #2, it is true that Solr should be emitting the cache
headers to conform to the HTTP spec, but the strategy used to compute
those headers should vary between handlers. If we take a leaf out of
the Servlet API's book, it is the servlets, not Tomcat, that decide the
Last-Modified and Expires headers. Similarly, here the handler, which
is the one actually writing out the response, should be the one
deciding on the strategy.
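For reference, the Servlet API splits the work like this: a servlet overrides HttpServlet.getLastModified(HttpServletRequest), which returns -1 ("time not known") by default, and the shared HttpServlet.service logic uses that value to set Last-Modified and to answer If-Modified-Since with a 304. The sketch below mirrors that split for Solr under stated assumptions: the CachePolicy interface, the two handler factories and dispatch() are hypothetical names, not existing Solr classes. Each handler supplies its own Last-Modified strategy; the dispatcher only applies the conditional-GET rule.

```java
import java.util.function.LongSupplier;

public class PerHandlerCacheSketch {
    // Hypothetical: mirrors HttpServlet.getLastModified, where -1 means
    // "unknown", so no validator is emitted and no 304 is ever sent.
    interface CachePolicy {
        long lastModified();
    }

    // A search-style handler: its output depends only on the index,
    // so the index open time is a sound validator.
    static CachePolicy searchHandler(LongSupplier indexOpenedAt) {
        return indexOpenedAt::getAsLong;
    }

    // A statistics/status-style handler: its output changes without the
    // index changing, so it opts out of validation entirely.
    static CachePolicy statusHandler() {
        return () -> -1L;
    }

    // Shared dispatch logic, analogous to HttpServlet.service for GET:
    // returns the HTTP status the response would get.
    static int dispatch(CachePolicy policy, long ifModifiedSince) {
        long lm = policy.lastModified();
        if (lm == -1L) {
            return 200; // no validator: always send a full response
        }
        // HTTP dates have one-second resolution.
        if (ifModifiedSince >= (lm / 1000) * 1000) {
            return 304;
        }
        return 200; // a real implementation would also set Last-Modified here
    }

    public static void main(String[] args) {
        long opened = 1_205_800_000_000L;
        CachePolicy search = searchHandler(() -> opened);
        CachePolicy status = statusHandler();
        System.out.println("search, revalidated: " + dispatch(search, opened)); // 304
        System.out.println("status, revalidated: " + dispatch(status, opened)); // 200
    }
}
```

The design point is that the dispatcher never needs to know why a handler's output changes; it only trusts whatever validator the handler reports, exactly as a servlet container trusts getLastModified.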

IMHO, the approach taken by SOLR-505 and SOLR-506 is good enough.
Using SOLR-505, handlers can decide to emit or suppress cache headers
on a per-response basis (e.g. partial results, errors etc.). Using
SOLR-506, the end user can enable or disable cache headers per
handler. To begin with, we can emit cache headers by default for
SearchHandler, SpellCheckerHandler and MoreLikeThisHandler.
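Put together, headers would be emitted only when the handler is enabled in the configuration (the SOLR-506 side) and the handler has not suppressed them for this particular response (the SOLR-505 side). A minimal sketch, assuming hypothetical names throughout (Response.setHttpCaching, emitCacheHeaders, a Map of per-handler overrides) rather than whatever API those patches actually define; the default set is the three handlers suggested above.

```java
import java.util.Map;
import java.util.Set;

public class CacheHeaderGate {
    // Hypothetical per-handler defaults (SOLR-506 side): handlers that
    // emit cache headers unless the user configures otherwise.
    static final Set<String> DEFAULT_ENABLED =
        Set.of("SearchHandler", "SpellCheckerHandler", "MoreLikeThisHandler");

    // Hypothetical per-response flag (SOLR-505 side).
    static final class Response {
        private boolean httpCaching = true;
        boolean partialResults;

        void setHttpCaching(boolean enabled) { this.httpCaching = enabled; }
        boolean isHttpCaching() { return httpCaching; }
    }

    // userConfig holds explicit per-handler overrides from the config file;
    // handlers not listed there fall back to DEFAULT_ENABLED.
    static boolean emitCacheHeaders(String handler,
                                    Map<String, Boolean> userConfig,
                                    Response rsp) {
        boolean enabledForHandler =
            userConfig.getOrDefault(handler, DEFAULT_ENABLED.contains(handler));
        return enabledForHandler && rsp.isHttpCaching();
    }

    // Example handler logic: suppress headers when a timeout produced
    // partial results (case 3 in the list above).
    static Response handleSearch(boolean timedOut) {
        Response rsp = new Response();
        rsp.partialResults = timedOut;
        if (timedOut) {
            rsp.setHttpCaching(false);
        }
        return rsp;
    }

    public static void main(String[] args) {
        Map<String, Boolean> config = Map.of("MoreLikeThisHandler", false);
        System.out.println(emitCacheHeaders("SearchHandler", config, handleSearch(false)));       // true
        System.out.println(emitCacheHeaders("SearchHandler", config, handleSearch(true)));        // false
        System.out.println(emitCacheHeaders("MoreLikeThisHandler", config, handleSearch(false))); // false
        System.out.println(emitCacheHeaders("DataImportHandler", config, handleSearch(false)));   // false
    }
}
```

Keeping the two switches independent means the user can turn caching off for a whole handler without code changes, while the handler still has the final word on individual responses it knows are unsafe to cache.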

What do you think?

On Tue, Mar 18, 2008 at 11:25 PM, Hoss Man (JIRA) <[EMAIL PROTECTED]> wrote:
>
>     [ https://issues.apache.org/jira/browse/SOLR-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12579953#action_12579953 ]
>
>
>  Hoss Man commented on SOLR-127:
>  -------------------------------
>
>  For the record: most of this discussion should have happened on the solr-dev 
> list, not in the issue comments ... but I would like to address some points, 
> so I'll do it here since this is where the discussion is.
>
>  1) It's true, there is no way to configure caching on a per request handler 
> basis -- if you look at the history of the issue we looked into that but 
> because of the necessary API changes we scaled back the scope of the patch -- 
> it can be done, it just needs more thought into how to do it and people 
> interested in working on it.
>
>  2) there is no doubt in my mind that having the cache awareness code on by 
> default is the right approach moving forward.  These options don't cause Solr 
> to do any caching, or force any external caches to cache the pages -- they 
> only result in Solr behaving correctly according to the HTTP spec sections 
> relating to cache headers:
>    * *if* a request is made to Solr via an HTTP cache, that cache will receive 
> headers it can use to decide if/how-long to cache the response
>    * *if* Solr receives a request with cache validation information then it 
> responds with a 304
>  if you don't want that behavior then either don't access Solr via a cache, 
> or explicitly set the <httpCaching never304="true"> option; but the default 
> behavior for people who are upgrading from 1.2 should be for Solr to emit 
> correct headers and to respect validation requests.  Requiring Solr users to 
> explicitly turn on an option to get Solr to emit correct caching headers 
> would be like requiring them to explicitly set an option to get well formed 
> XML instead of invalid XML -- the default should be the one that behaves the 
> most correctly.
>
>  I admit however: this is a notable enough change that it should be mentioned 
> in the "Upgrading from 1.2" section of CHANGES.txt -- I will add that.
>
>  3) if other pending patches attached to other issues have poor behavior as a 
> result of the caching code, the appropriate place to discuss that is in those 
> issues -- the solution may be to mark those issues dependent on a new issue to 
> add the API hooks for request handlers to suppress caching (that's a good 
> idea in general) but it's also possible that there are 
> better/safer/more-logical solutions specific to those patches ... if the 
> DataImportHandler is having problems because of the caching code, I'm guessing 
> it's because people use it to trigger updates using an HTTP GET -- that 
> violates the semantics of GET, and making workarounds in the HttpCaching 
> code to allow for that is a bad idea.
>
>  4) saying only the "/select" handler should get its responses cached is 
> misleading -- under Solr 1.3 there won't be anything special about /select 
> ... any handler name can be used for queries, and any handler name can be 
> used for updates ... if you are issuing a request that modifies the index, 
> you should be sending a POST and no caching headers (or validation) will be 
> done by Solr regardless of configuration.
>
>  As I said, discussion about the general topic of HTTP Caching, Solr, and 
> what the defaults should be should really happen on the solr-dev list ... if 
> there are any further comments let's please conduct them there and then 
> open/update whatever issues we need to once a consensus has been reached.
>
>
>
>
>  > Make Solr more friendly to external HTTP caches
>  > -----------------------------------------------
>  >
>  >                 Key: SOLR-127
>  >                 URL: https://issues.apache.org/jira/browse/SOLR-127
>  >             Project: Solr
>  >          Issue Type: Wish
>  >            Reporter: Hoss Man
>  >            Assignee: Hoss Man
>  >             Fix For: 1.3
>  >
>  >         Attachments: CacheUnitTest.patch, CacheUnitTest.patch, 
> HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, 
> HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, 
> HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, 
> HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, 
> HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, 
> HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, 
> HTTPCaching.patch
>  >
>  >
>  > An offhand comment I saw recently reminded me of something that really 
> bugged me about the search solution I used *before* Solr -- it didn't play 
> nicely with HTTP caches that might be sitting in front of it.
>  > At the moment, Solr doesn't put particularly useful info in the HTTP 
> response headers to aid in caching (i.e. Last-Modified), responds to all HEAD 
> requests with a 400, and doesn't do anything special with If-Modified-Since.
>  > At the very least, we can set a Last-Modified header based on when the current 
> IndexReader was opened (if not the date on the IndexReader) and use the same 
> info to determine how to respond to If-Modified-Since requests.
>  > (For the record, I think the reason this hasn't occurred to me in the 2+ 
> years I've been using Solr is that, with the internal caching, I've yet to 
> need to put a proxy cache in front of Solr)
>



-- 
Regards,
Shalin Shekhar Mangar.
