[ https://issues.apache.org/jira/browse/SOLR-1000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12670102#action_12670102 ]
Shalin Shekhar Mangar commented on SOLR-1000:
---------------------------------------------

Thanks Fergus. One minor thing:
{code}
while (true) {
  Map<String, Object> r = getNext();
  if (r != null) r = applyTransformer(r);
  return r;
}
{code}
In the new code the loop is not used at all, because it always returns on the first pass. The difference is important because Transformers have the ability to skip documents by doing map.put("$skipDoc", true) on this map. If a document is skipped, applyTransformer will return null and we'd like to request a new row from the data source (entity processor in this case). With this change, null will be returned, which signals that the DataSource/EntityProcessor has run out of data even though it has not.

Except for this, the patch looks great! I'll commit this shortly.

> DIH FileListEntityProcessor fileName filters directory names and stops recursion
> ---------------------------------------------------------------------------------
>
>                 Key: SOLR-1000
>                 URL: https://issues.apache.org/jira/browse/SOLR-1000
>             Project: Solr
>          Issue Type: Improvement
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.3
>            Reporter: Fergus McMenemie
>            Assignee: Shalin Shekhar Mangar
>         Attachments: SOLR-1000.patch, SOLR-1000.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> I have been trying to find out why DIH in FileListEntityProcessor mode did
> not appear to be recursing into subdirectories. Going through
> FileListEntityProcessor.java I eventually tumbled to the fact that my
> filename filter setting from data-config.xml also applied to directory names.
> Now, I feel that the fileName filter should be applied to files fed into the
> parser; it should not be applied to the directory names we are recursing
> through. I bodged the code to adjust the behavior so that the "FileName" and
> "excludes" attributes of "entity" only apply to filenames and not directory
> names. It now recurses through my directory tree only indexing the appropriate
> files!
> I think the new behavior is more standard.
> I will submit a patch once I have constructed one!
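
For reference, the skip-and-retry behaviour described in the comment above could be sketched roughly as follows. This is only an illustration, not the committed patch: the class SkipAwareRowSource, the row() helper and the in-memory row list are hypothetical stand-ins, and applyTransformer here merely checks a "$skipDoc" flag that a real Transformer would have set. The point is that the loop returns null only when getNext() itself returns null, and otherwise keeps asking for rows until one survives the transformers.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an entity processor; not the real DIH class.
class SkipAwareRowSource {
    private final Iterator<Map<String, Object>> rows;

    SkipAwareRowSource(List<Map<String, Object>> rows) {
        this.rows = rows.iterator();
    }

    // Returns the next raw row, or null once the source is exhausted.
    private Map<String, Object> getNext() {
        return rows.hasNext() ? rows.next() : null;
    }

    // Simplified assumption: a row already flagged with "$skipDoc" is dropped
    // (null is returned), which mimics a Transformer skipping the document.
    private Map<String, Object> applyTransformer(Map<String, Object> row) {
        return Boolean.TRUE.equals(row.get("$skipDoc")) ? null : row;
    }

    // Keeps requesting rows until one survives the transformers
    // or the source genuinely runs out of data.
    Map<String, Object> nextRow() {
        while (true) {
            Map<String, Object> r = getNext();
            if (r == null) return null;   // really out of data
            r = applyTransformer(r);
            if (r != null) return r;      // row was not skipped
            // row was skipped: go round again and fetch another one
        }
    }

    // Helper for building example rows.
    static Map<String, Object> row(String file, boolean skip) {
        Map<String, Object> m = new HashMap<>();
        m.put("file", file);
        if (skip) m.put("$skipDoc", true);
        return m;
    }

    public static void main(String[] args) {
        SkipAwareRowSource src = new SkipAwareRowSource(Arrays.asList(
                row("a.xml", false), row("b.xml", true), row("c.xml", false)));
        // Prints a.xml and c.xml; b.xml is skipped without ending the import.
        for (Map<String, Object> r = src.nextRow(); r != null; r = src.nextRow()) {
            System.out.println(r.get("file"));
        }
    }
}
```

With the single-pass version quoted in the comment, the same input would stop after a.xml, because the null produced for the skipped b.xml would be indistinguishable from the end of the data.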