[ 
https://issues.apache.org/jira/browse/SOLR-1000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12670102#action_12670102
 ] 

Shalin Shekhar Mangar commented on SOLR-1000:
---------------------------------------------

Thanks Fergus.

One minor thing:
{code}
while (true) {
      Map<String, Object> r = getNext();
      if (r != null) r = applyTransformer(r);
        return r;
    }
{code}

In the new code the loop is not used at all. The difference is important 
because Transformers can skip documents by calling 
map.put("$skipDoc", true) on this map. If a document is skipped, 
applyTransformer returns null and we'd like to request a new row from the 
data source (the entity processor in this case). With this change, null is 
returned instead, which signals that the DataSource/EntityProcessor has run 
out of data even though it has not.
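
To make the intended behaviour concrete, here is a minimal, self-contained 
sketch of a loop that keeps requesting rows until one survives the 
transformers or the source is really exhausted. It is not the DIH code: the 
class SkipAwareRowReader, the helper row(), and the in-memory list are 
hypothetical stand-ins, and only the names getNext, applyTransformer and 
"$skipDoc" come from the discussion above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an entity processor; not the actual DIH class.
public class SkipAwareRowReader {
    private final Iterator<Map<String, Object>> source;

    public SkipAwareRowReader(List<Map<String, Object>> rows) {
        this.source = rows.iterator();
    }

    // Stand-in for getNext(): null means the source has no more rows.
    Map<String, Object> getNext() {
        return source.hasNext() ? source.next() : null;
    }

    // Stand-in for applyTransformer(): null means the row was skipped.
    // Assumption: a transformer has already set "$skipDoc" on the map.
    Map<String, Object> applyTransformer(Map<String, Object> row) {
        return Boolean.TRUE.equals(row.get("$skipDoc")) ? null : row;
    }

    public Map<String, Object> nextRow() {
        while (true) {
            Map<String, Object> r = getNext();
            if (r == null) return null;   // the source really is out of data
            r = applyTransformer(r);
            if (r != null) return r;      // the row survived the transformers
            // The row was skipped: loop and request another one.
        }
    }

    // Hypothetical helper that builds a row, optionally flagged as skipped.
    static Map<String, Object> row(String file, boolean skip) {
        Map<String, Object> m = new HashMap<>();
        m.put("file", file);
        if (skip) m.put("$skipDoc", true);
        return m;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> rows = new ArrayList<>();
        rows.add(row("a.xml", false));
        rows.add(row("b.xml", true));   // skipped by the transformer
        rows.add(row("c.xml", false));

        SkipAwareRowReader reader = new SkipAwareRowReader(rows);
        for (Map<String, Object> r = reader.nextRow(); r != null; r = reader.nextRow()) {
            System.out.println(r.get("file"));   // prints a.xml, then c.xml
        }
    }
}
```

With an unconditional return inside the loop, the skipped row b.xml would 
come back as null and the caller would stop before ever reaching c.xml.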

Except for this, the patch looks great! I'll commit this shortly.

> DIH FileListEntityProcessor fileName filters directory names and stops 
> recursion 
> ---------------------------------------------------------------------------------
>
>                 Key: SOLR-1000
>                 URL: https://issues.apache.org/jira/browse/SOLR-1000
>             Project: Solr
>          Issue Type: Improvement
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.3
>            Reporter: Fergus McMenemie
>            Assignee: Shalin Shekhar Mangar
>         Attachments: SOLR-1000.patch, SOLR-1000.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> I have been trying to find out why DIH in FileListEntityProcessor mode did 
> not appear to be recursing into subdirectories. Going through 
> FileListEntityProcessor.java I eventually tumbled to the fact that my 
> filename filter setting from data-config.xml also applied to directory names.
> Now, I feel that the fileName filter should be applied to files fed into the 
> parser; it should not be applied to the directory names we are recursing 
> through. I bodged the code to adjust the behavior so that the "fileName" and 
> "excludes" attributes of "entity" only apply to filenames and not directory 
> names. It now recurses through my directory tree only indexing the appropriate 
> files! I think the new behavior is more standard.
> I will submit a patch once I have constructed one!

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
