[
https://issues.apache.org/jira/browse/SOLR-1000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12670102#action_12670102
]
Shalin Shekhar Mangar commented on SOLR-1000:
---------------------------------------------
Thanks Fergus.
One minor thing:
{code}
while (true) {
  Map<String, Object> r = getNext();
  if (r != null) r = applyTransformer(r);
  return r;
}
{code}
In the new code the loop is not used at all. The difference is important
because Transformers can skip a document by calling
map.put("$skipDoc", true) on the row map. When a document is skipped,
applyTransformer returns null, and we'd like to request a new row from the
data source (the entity processor in this case). With this change, null is
returned instead, which signals that the DataSource/EntityProcessor has run
out of data even though it has not.
Except for this, the patch looks great! I'll commit this shortly.
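
To make the intended behaviour concrete, here is a minimal, self-contained
sketch of a loop that keeps asking for rows until one survives the
transformers or the source is really exhausted. It is not the actual DIH
code: the class SkipLoopSketch, the row() helper and the simplified
applyTransformer (which only checks the "$skipDoc" flag) are assumptions made
for illustration.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical stand-in for an entity processor; not the real DIH class.
class SkipLoopSketch {
    private final Queue<Map<String, Object>> rows = new ArrayDeque<>();

    SkipLoopSketch(Iterable<Map<String, Object>> source) {
        for (Map<String, Object> row : source) {
            rows.add(row);
        }
    }

    // Returns the next raw row, or null once the source has no more rows.
    Map<String, Object> getNext() {
        return rows.poll();
    }

    // Assumed transformer behaviour: a row flagged with $skipDoc becomes null.
    Map<String, Object> applyTransformer(Map<String, Object> row) {
        return Boolean.TRUE.equals(row.get("$skipDoc")) ? null : row;
    }

    // Null is returned only when the source is exhausted, never for a skip.
    Map<String, Object> nextRow() {
        while (true) {
            Map<String, Object> r = getNext();
            if (r == null) {
                return null;          // the source really has run out of data
            }
            r = applyTransformer(r);
            if (r != null) {
                return r;             // the row survived the transformers
            }
            // The row was skipped: go round again and fetch another one.
        }
    }

    // Helper for building illustrative rows.
    static Map<String, Object> row(String name, boolean skip) {
        Map<String, Object> m = new HashMap<>();
        m.put("fileName", name);
        if (skip) {
            m.put("$skipDoc", true);
        }
        return m;
    }

    public static void main(String[] args) {
        SkipLoopSketch p = new SkipLoopSketch(Arrays.asList(
                row("a.xml", false), row("b.tmp", true), row("c.xml", false)));
        for (Map<String, Object> r = p.nextRow(); r != null; r = p.nextRow()) {
            System.out.println(r.get("fileName"));
        }
    }
}
```

The point of the sketch is that a skipped row and an exhausted source are
kept apart: only the second one ends the iteration.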
> DIH FileListEntityProcessor fileName filters directory names and stops
> recursion
> ---------------------------------------------------------------------------------
>
> Key: SOLR-1000
> URL: https://issues.apache.org/jira/browse/SOLR-1000
> Project: Solr
> Issue Type: Improvement
> Components: contrib - DataImportHandler
> Affects Versions: 1.3
> Reporter: Fergus McMenemie
> Assignee: Shalin Shekhar Mangar
> Attachments: SOLR-1000.patch, SOLR-1000.patch
>
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> I have been trying to find out why DIH in FileListEntityProcessor mode did
> not appear to be recursing into subdirectories. Going through
> FileListEntityProcessor.java I eventually tumbled to the fact that my
> filename filter setting from data-config.xml also applied to directory names.
> Now, I feel that the fileName filter should be applied to files fed into the
> parser; it should not be applied to the directory names we are recursing
> through. I bodged the code to adjust the behavior so that the "fileName" and
> "excludes" attributes of "entity" only apply to filenames and not directory
> names. It now recurses through my directory tree, indexing only the
> appropriate files! I think the new behavior is more standard.
> I will submit a patch once I have constructed one!