On Sat, 20 Jan 2007, Ryan McKinley wrote:

: Date: Sat, 20 Jan 2007 19:17:16 -0800
: From: Ryan McKinley <[EMAIL PROTECTED]>
: Reply-To: solr-dev@lucene.apache.org
: To: solr-dev@lucene.apache.org
: Subject: Re: Update Plugins (was Re: Handling disparate data sources in
:     Solr)
:
: >
: > ...what if we bring that idea back, and let people configure it in the
: > solrconfig.xml, using path like names...
: >
: >   <requestParser name="/raw" class="solr.RawPostRequestParser" />
: >   <requestParser name="/multi" class="solr.MultiPartRequestParser" />
: >   <requestParser name="/nostream" class="solr.SimpleRequestParser" />
: >   <requestParser name="/guess" class="solr.UseContentTypeRequestParser" />
: >
: > ...but don't make it a *public* interface ... make it package protected,
: > or maybe even a private static interface of the Dispatch Filter .. either
: > way, don't instantiate instances of it using the plugin-lib ClassLoader,
: > make sure it comes from the WAR so it only uses the ones provided out of the
: > box.


: I'm on board as long as the URL structure is:
:   ${path/from/solr/config}?stream.type=raw

actually the URL i was suggesting was...

    ${parser/path/from/solr/config}${handler/path/from/solr/config}?param=val

...i was trying to keep the parser name out of the query string,
so we don't have to do any hack parsing of
HttpServletRequest.getQueryString() to get it.

basically if you have this...

  <requestParser name="/raw" class="solr.RawPostRequestParser" />
  <requestParser name="/multi" class="solr.MultiPartRequestParser" />
  <requestParser name="/nostream" class="solr.SimpleRequestParser" />

  <requestHandler name="/update/commit" class="solr.CommitRequestHandler"/>
  <requestHandler name="/update" class="solr.UpdateRequestHandler" />
  <requestHandler name="/xml" class="solr.XmlQueryRequestHandler" />

...then these urls are all valid...

   http://localhost:9999/solr/raw/update?param=val
      ..uses raw post body for update
   http://localhost:9999/solr/multi/update?param=val
      ..uses multipart mime for update
   http://localhost:9999/solr/update?param=val
      ..no requestParser matched path prefix, so default is chosen and
        Content-Type is used to decide where streams come from.

but if instead my config looks like this...

  <requestParser name="" class="solr.MultiPartRequestParser" />
  <requestParser name="/raw" class="solr.RawPostRequestParser" />

  <requestHandler name="/update/commit" class="solr.CommitRequestHandler"/>
  <requestHandler name="/update" class="solr.UpdateRequestHandler" />
  <requestHandler name="/xml" class="solr.XmlQueryRequestHandler" />

...then these URLs would fail...

   http://localhost:9999/solr/raw/update?param=val
   http://localhost:9999/solr/multi/update?param=val

...because the empty string would match as a parser, but "/raw/update"
and "/multi/update" wouldn't match as requestHandlers (the registration of
"/raw" as a parser would be useless)

this URL would work however...

   http://localhost:9999/solr/update?param=val
      ..treat all requests as if they have multi-part mime streams

...i use this only as an example of what i'm describing ... not as an
example of something we should recommend.

The key to all of this being that we'd check parser names against the URL
prefix in order from shortest to longest, then check the rest of the path
as a requestHandler ... if either of those fails, then the filter would
skip the request.
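
To make the matching order concrete, here is a minimal Java sketch of the
lookup described above. It is not actual dispatch filter code: the class
name ParserPathResolver, the Match type and the "(default)" marker are all
made up for illustration. It also assumes that when no parser prefix
matches, the default parser is used and the whole path is treated as the
handler path (as in the first example above).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.Set;

/** Hypothetical sketch of the prefix lookup; not part of Solr. */
final class ParserPathResolver {

    /** Marker used when no registered parser prefix matches (an assumption). */
    static final String DEFAULT_PARSER = "(default)";

    /** Which parser prefix and which handler path a request resolved to. */
    static final class Match {
        final String parserName;
        final String handlerName;

        Match(String parserName, String handlerName) {
            this.parserName = parserName;
            this.handlerName = handlerName;
        }
    }

    private final List<String> parserNames;  // kept sorted, shortest first
    private final Set<String> handlerNames;

    ParserPathResolver(List<String> parserNames, Set<String> handlerNames) {
        this.parserNames = new ArrayList<>(parserNames);
        this.parserNames.sort(Comparator.comparingInt(String::length));
        this.handlerNames = Set.copyOf(handlerNames);
    }

    /**
     * Resolves the path after the webapp context, e.g. "/raw/update".
     * An empty result means the filter would skip the request.
     */
    Optional<Match> resolve(String path) {
        for (String parser : parserNames) {
            if (path.startsWith(parser)) {
                // The first (shortest) matching prefix wins; the remainder
                // of the path must then be a registered requestHandler.
                String rest = path.substring(parser.length());
                return handlerNames.contains(rest)
                        ? Optional.of(new Match(parser, rest))
                        : Optional.empty();
            }
        }
        // No parser prefix matched: fall back to the default parser and
        // look up the whole path as a requestHandler.
        return handlerNames.contains(path)
                ? Optional.of(new Match(DEFAULT_PARSER, path))
                : Optional.empty();
    }
}
```

With the second config above (parsers "" and "/raw"), resolve("/raw/update")
comes back empty in this sketch, because "" is the shortest prefix and
matches first, leaving "/raw/update" to be looked up as a handler.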

What we would probably recommend is that people map the "guess" request
parser to "/" so that they could put in all of the options they want on
buffer sizes and such, then map their requestHandlers without a "/"
prefix, and use content types correctly.

if they really had a reason to want to force one type of parsing, they
could register it with a different prefix.

  * default URLs stay clean
  * no need for an extra "stream.type" param
  * urls only get ugly if people want them to get ugly because they don't
    want to make their clients set the mime type correctly.




-Hoss
