[jira] Commented: (SOLR-1131) Allow a single field type to index multiple fields

Grant Ingersoll (JIRA) Fri, 11 Dec 2009 12:54:44 -0800

    [ 
https://issues.apache.org/jira/browse/SOLR-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789520#action_12789520
 ]


Grant Ingersoll commented on SOLR-1131:
---------------------------------------

bq. I'm still -1 on the way this patch deals with the "optimization" issue. I'd 
like to see evidence that it makes sense to not use split and trim.

My tests show it to be at least 7 times faster, and that should be evident 
from reading the code, too.  First, String.split() uses a regex, which makes a 
pass through the underlying character array.  Then trim() has to go back 
through the char array of each resulting String, not to mention the extra 
String creations.  The optimized version here makes one pass, works solely at 
the char array level, and only has to do the substring, which I think can be 
optimized by the JVM to be a copy on write.

{code}

  public void testDistPerf() throws Exception {
    String [] input = new String[1000000];
    Random random = new Random();
    for (int i = 0; i < input.length; i++){
      input[i] = random.nextInt() + ", " + random.nextInt();
    }
    String [] out = new String[2];
    long time = 0;
    long start = System.currentTimeMillis();
    for (int j = 0; j < 50; j++) {
      for (int i = 0; i < input.length; i++){
        split(input[i], out, 2);
      }
    }
    time = (System.currentTimeMillis() - start);
    System.out.println("split/trim time (ms): " + time);
    time = 0;
    start = System.currentTimeMillis();
    for (int j = 0; j < 50; j++) {
      for (int i = 0; i < input.length; i++){
        DistanceUtils.parsePoint(out, input[i], 2);
      }
    }
    time = (System.currentTimeMillis() - start);
    System.out.println("parsePoint time (ms): " + time);
  }

  private String[] split(String externalVal, String[] out, int dimension) {
    out = externalVal.split(",");
    if (out.length != dimension) {
      throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
              "incompatible dimension (" + dimension +
              ") and values (" + externalVal + ").  Only " + out.length +
              " values specified");
    }
    for (int j = 0; j < out.length; j++) {
      out[j] = out[j].trim();
    }
    return out;
  }
{code}
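
For readers without the patch at hand, here is a minimal sketch of the kind of 
single-pass parse described above. It is not the DistanceUtils.parsePoint from 
the patch: the class name SinglePassPointParser, the use of 
IllegalArgumentException instead of SolrException, and the handling of empty 
values are all assumptions made so the example compiles on its own.

```java
/** Hypothetical illustration only; not the code from the SOLR-1131 patch. */
final class SinglePassPointParser {

  /**
   * Splits externalVal on commas and strips surrounding whitespace from each
   * value in one pass over the characters, writing the results into out.
   */
  static String[] parsePoint(String[] out, String externalVal, int dimension) {
    if (out == null || out.length != dimension) {
      out = new String[dimension];
    }
    int count = 0;   // number of values written so far
    int start = -1;  // index of the first non-whitespace char of the current value
    int end = -1;    // index just past the last non-whitespace char of the current value
    final int len = externalVal.length();
    for (int i = 0; i <= len; i++) {
      // Treat the end of the input as a final delimiter.
      char c = (i < len) ? externalVal.charAt(i) : ',';
      if (c == ',') {
        if (count == dimension) {
          throw new IllegalArgumentException("incompatible dimension (" + dimension
              + ") and values (" + externalVal + "): too many values specified");
        }
        // One substring per value; an all-whitespace value becomes "".
        out[count++] = (start < 0) ? "" : externalVal.substring(start, end);
        start = -1;
        end = -1;
      } else if (!Character.isWhitespace(c)) {
        if (start < 0) {
          start = i;
        }
        end = i + 1;
      }
    }
    if (count != dimension) {
      throw new IllegalArgumentException("incompatible dimension (" + dimension
          + ") and values (" + externalVal + ").  Only " + count + " values specified");
    }
    return out;
  }
}
```

Calling SinglePassPointParser.parsePoint(new String[2], " 12 , -34 ", 2) fills 
the array with "12" and "-34". Unlike split(","), this sketch keeps trailing 
empty values, and Character.isWhitespace is not identical to the character 
test used by trim(), so the two approaches can differ on unusual input.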

> Allow a single field type to index multiple fields
> --------------------------------------------------
>
>                 Key: SOLR-1131
>                 URL: https://issues.apache.org/jira/browse/SOLR-1131
>             Project: Solr
>          Issue Type: New Feature
>          Components: Schema and Analysis
>            Reporter: Ryan McKinley
>            Assignee: Grant Ingersoll
>             Fix For: 1.5
>
>         Attachments: SOLR-1131-IndexMultipleFields.patch, 
> SOLR-1131.Mattmann.121009.patch.txt, SOLR-1131.Mattmann.121109.patch.txt, 
> SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch, 
> SOLR-1131.patch, SOLR-1131.patch, SOLR-1131.patch
>
>
> In a few special cases, it makes sense for a single "field" (the concept) to 
> be indexed as a set of Fields (lucene Field).  Consider SOLR-773.  The 
> concept "point" may be best indexed in a variety of ways:
>  * geohash (single lucene field)
>  * lat field, lon field (two double fields)
>  * cartesian tiers (a series of fields with tokens to say if it exists within 
> that region)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
