Fergus,

Implementing wildcard (//tagname) is definitely possible. I would love to see it working. If you wish to take a stab at it, I shall do whatever I can to help.
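Here is a rough illustration of what `//tagname` support involves. If the reader tracks the stack of currently open elements, it can treat `//` as "skip zero or more elements" when comparing a field's xpath against that stack. This is a minimal sketch under that assumption. `WildcardPathMatcher` and `matches` are hypothetical names, not part of XPathRecordReader. The recursive backtracking is fine for short paths but is not tuned for patterns with many `//` steps.

```java
import java.util.List;

/**
 * Hypothetical helper (not part of XPathRecordReader): checks whether the
 * stack of currently open elements, root first, matches a field xpath that
 * may contain "//" ("any number of intermediate elements", as in XPath).
 * Patterns are assumed to be absolute, i.e. to start with "/".
 */
final class WildcardPathMatcher {

    static boolean matches(String pattern, List<String> openElements) {
        // "/record//para".split("/", -1) -> ["", "record", "", "para"];
        // the leading "" comes from the initial "/", any later "" marks a "//".
        String[] steps = pattern.split("/", -1);
        return match(steps, 1, openElements, 0);
    }

    private static boolean match(String[] steps, int step, List<String> elems, int pos) {
        if (step == steps.length) {
            return pos == elems.size(); // pattern exhausted: path must be too
        }
        if (steps[step].isEmpty()) {
            // "//": let the rest of the pattern start at any later depth.
            for (int next = pos; next <= elems.size(); next++) {
                if (match(steps, step + 1, elems, next)) {
                    return true;
                }
            }
            return false;
        }
        // Ordinary step: the element at this depth must have the same name.
        return pos < elems.size()
                && steps[step].equals(elems.get(pos))
                && match(steps, step + 1, elems, pos + 1);
    }

    public static void main(String[] args) {
        List<String> open = List.of("record", "sect1", "sect2", "para");
        System.out.println(matches("//para", open));             // true
        System.out.println(matches("/record//para", open));      // true
        System.out.println(matches("/record/sect1/para", open)); // false
    }
}
```

Under this scheme, the long list of `/record/.../para` field declarations in the quoted data-config.xml could in principle be covered by a single field with xpath `//para`. That follows XPath's descendant semantics, so `/record//para` also matches `/record/para`.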
> What is the use case that makes flow through so useful?

We do not know which forEach xpath a given field is associated with. Currently you can clean up the fields using a transformer: there is an implicit field '$forEach' which tells you the forEach xpath for each record that is emitted.

> The recently added comments in XPathRecordReader are a great help and I
> was planning to add more. Might this be an issue?

I would love to have them. Give a patch and I shall commit it. XPathRecordReader is a black box and AFAIK I am the only one who knows it. I would love to have more eyes on it.

> I would like to open a JIRA for improving XPathRecordReader.

Please go ahead. You can paste the contents of this mail on the list. There may be others with similar ideas.

Noble.

-----Original Message-----
>>> Noble
>>>>
>>>> /document/category/item | /document/category
>>>>
>>>> means there are two paths which trigger a new doc (it is possible to
>>>> have more). Whenever it encounters the closing tag of that xpath, it
>>>> emits all the fields it collected since the opening of the same tag.
>>>> After that it clears all the fields it collected since the opening of
>>>> the tag.
>>>>
>>>> If there are fields it collected before the opening of the same tag, it
>>>> retains them.
>>>
>>>
>>> Nice and clear, but that is not what I see.
>>>
>>> With my test case with forEach="/record | /record/mediaBlock"
>>> I see that each /record/mediaBlock "document" indexed contains
>>> all fields from the parent "/record" document as well. A search over
>>> mediaBlocks returns lots of extra fields from the parent which did
>>> not have the commonField attribute. I will try and produce a test case.
>>
>> Yes, it does. /record/mediaBlock will have all the fields collected
>> from /record as well. *****It is by design*****
>
> Oh!
>
> I had always considered it a bug or at least a limitation.
> After all, if we have the "commonField" attribute, why do we need an automatic
> flow through of all collected fields from parent nodes? This feature is, as
> far as I can see, undocumented and at the same time unintuitive.
> It also, in my case, causes tons more information to be indexed than is
> needed.
>
> I have spent a while thinking through possible use cases. My use case
> involves having documents we want to search as a whole and behave as
> normal. At the same time these documents contain inner sections we wish
> to treat as sub-documents; in my case I have pictures with associated
> captions which I wish to search separately. Having indexed the documents
> with forEach="/record | /record/mediaBlock", my picture search works
> nicely, but I have a nasty side effect when performing searches over the
> rest of the document. Because fields from the parent node are also
> present in the children, when I search for any text the same document
> gets returned many times: once due to the text in the parent node and
> again for each picture placed in the document. I have a workaround for
> this issue but have always considered it a bug.
>
> What is the use case that makes flow through so useful?
>
> I had just started playing with the code to see how easy this would be
> to change. The recently added comments in XPathRecordReader are a great
> help and I was planning to add more. Might this be an issue?
>
> I have noticed, while lurking on the Solr mailing lists, that requests for
> this type of functionality keep coming up: to be able to restrict
> searches to a subsection of a document. I have really needed this sort
> of thing many times with the type of material I work with.
>
> My other planned activity was to see how easy xpaths such as //tagname
> would be to implement, since my latest data-config.xml looks like this:
>
> <field column="para32" name="text" xpath="/record/address/para" flatten="true" />
> <field column="para40" name="text" xpath="/record/authoredBy/para" flatten="true" />
> <field column="para43" name="text" xpath="/record/dataGroup/address/para" flatten="true" />
> <field column="para47" name="text" xpath="/record/dataGroup/keyPersonnel/doubleList/first/para" flatten="true" />
> <field column="para49" name="text" xpath="/record/dataGroup/keyPersonnel/doubleList/second/para" flatten="true" />
> <field column="para50" name="text" xpath="/record/dataGroup/keyPersonnel/para" flatten="true" />
> <field column="para51" name="text" xpath="/record/dataGroup/para" flatten="true" />
> <field column="para57" name="text" xpath="/record/doubleList/first/para" flatten="true" />
> <field column="para59" name="text" xpath="/record/doubleList/second/para" flatten="true" />
> <field column="para63" name="text" xpath="/record/keyPersonnel/doubleList/first/para" flatten="true" />
> <field column="para65" name="text" xpath="/record/keyPersonnel/doubleList/second/para" flatten="true" />
> <field column="para68" name="text" xpath="/record/list/listItem/para" flatten="true" />
> <field column="para75" name="text" xpath="/record/mediaBlock/doubleList/first/para" flatten="true" />
> <field column="para77" name="text" xpath="/record/mediaBlock/doubleList/second/para" flatten="true" />
> <field column="para172" name="text" xpath="/record/noteGroup/note/para" flatten="true" />
> <field column="para174" name="text" xpath="/record/para" flatten="true" />
> <field column="para179" name="text" xpath="/record/relatedInfo/list/listItem/relatedArticle/para" flatten="true" />
> <field column="para184" name="text" xpath="/record/sect1/address/dataGroup/para" flatten="true" />
> <field column="para185" name="text" xpath="/record/sect1/address/para" flatten="true" />
> <field column="para195" name="text" xpath="/record/sect1/dataGroup/address/para" flatten="true" />
> <field column="para199" name="text" xpath="/record/sect1/dataGroup/keyPersonnel/doubleList/first/para" flatten="true" />
> <field column="para201" name="text" xpath="/record/sect1/dataGroup/keyPersonnel/doubleList/second/para" flatten="true" />
> <field column="para202" name="text" xpath="/record/sect1/dataGroup/keyPersonnel/para" flatten="true" />
> <field column="para203" name="text" xpath="/record/sect1/dataGroup/para" flatten="true" />
> <field column="para208" name="text" xpath="/record/sect1/doubleList/first/para" flatten="true" />
> <field column="para212" name="text" xpath="/record/sect1/doubleList/second/list/listItem/para" flatten="true" />
> <field column="para213" name="text" xpath="/record/sect1/doubleList/second/para" flatten="true" />
> <field column="para217" name="text" xpath="/record/sect1/keyPersonnel/doubleList/first/para" flatten="true" />
> <field column="para219" name="text" xpath="/record/sect1/keyPersonnel/doubleList/second/para" flatten="true" />
> <field column="para220" name="text" xpath="/record/sect1/keyPersonnel/para" flatten="true" />
> <field column="para225" name="text" xpath="/record/sect1/list/listItem/list/listItem/para" flatten="true" />
> <field column="para226" name="text" xpath="/record/sect1/list/listItem/para" flatten="true" />
> <field column="para240" name="text" xpath="/record/sect1/para" flatten="true" />
> <field column="para244" name="text" xpath="/record/sect1/sect2/doubleList/first/para" flatten="true" />
> <field column="para246" name="text" xpath="/record/sect1/sect2/doubleList/second/para" flatten="true" />
> <field column="para251" name="text" xpath="/record/sect1/sect2/list/listItem/list/listItem/para" flatten="true" />
> <field column="para252" name="text" xpath="/record/sect1/sect2/list/listItem/para" flatten="true" />
> <field column="para258" name="text" xpath="/record/sect1/sect2/noteGroup/note/para" flatten="true" />
> <field column="para259" name="text" xpath="/record/sect1/sect2/para" flatten="true" />
> <field column="para265" name="text" xpath="/record/sect1/sect2/sect3/list/listItem/list/listItem/para" flatten="true" />
> <field column="para266" name="text" xpath="/record/sect1/sect2/sect3/list/listItem/para" flatten="true" />
> <field column="para271" name="text" xpath="/record/sect1/sect2/sect3/para" flatten="true" />
> <field column="para275" name="text" xpath="/record/sect1/sect2/sect3/sect4/list/listItem/para" flatten="true" />
> <field column="para279" name="text" xpath="/record/sect1/sect2/sect3/sect4/para" flatten="true" />
> <field column="para284" name="text" xpath="/record/sect1/sect2/sect3/sect4/sect5/para" flatten="true" />
> <field column="para295" name="text" xpath="/record/sect1/sect2/sect3/table/tgroup/tbody/row/entry/noteGroup/note/para" flatten="true" />
> <field column="para297" name="text" xpath="/record/sect1/sect2/sect3/table/tgroup/tbody/row/entry/para" flatten="true" />
> <field column="para301" name="text" xpath="/record/sect1/sect2/sect3/table/tgroup/thead/row/entry/para" flatten="true" />
> <field column="para312" name="text" xpath="/record/sect1/sect2/table/tgroup/tbody/row/entry/list/listItem/para" flatten="true" />
> <field column="para315" name="text" xpath="/record/sect1/sect2/table/tgroup/tbody/row/entry/noteGroup/note/para" flatten="true" />
> <field column="para316" name="text" xpath="/record/sect1/sect2/table/tgroup/tbody/row/entry/noteGroup/para" flatten="true" />
> <field column="para318" name="text" xpath="/record/sect1/sect2/table/tgroup/tbody/row/entry/para" flatten="true" />
> <field column="para322" name="text" xpath="/record/sect1/sect2/table/tgroup/thead/row/entry/para" flatten="true" />
> <field column="para341" name="text" xpath="/record/sect1/table/tgroup/tbody/row/entry/noteGroup/note/para" flatten="true" />
> <field column="para342" name="text" xpath="/record/sect1/table/tgroup/tbody/row/entry/noteGroup/para" flatten="true" />
> <field column="para344" name="text" xpath="/record/sect1/table/tgroup/tbody/row/entry/para" flatten="true" />
> <field column="para348" name="text" xpath="/record/sect1/table/tgroup/thead/row/entry/para" flatten="true" />
> <field column="para371" name="text" xpath="/record/table/tgroup/tbody/row/entry/noteGroup/note/para" flatten="true" />
> <field column="para373" name="text" xpath="/record/table/tgroup/tbody/row/entry/para" flatten="true" />
> <field column="para377" name="text" xpath="/record/table/tgroup/thead/row/entry/para" flatten="true" />
>
> Which is nuts!
>
> I would like to open a JIRA for improving XPathRecordReader.
>
> Regds,
> Fergus.
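The forEach behaviour described in the thread can be sketched as a small simulation. When a forEach tag closes, the reader emits everything collected so far and then drops only the fields collected since that tag opened. This is a hypothetical model, not XPathRecordReader's code. The class `ForEachSimulator`, its `start`/`text`/`end` event methods and the `keepOnly` helper are all invented for illustration, and repeated values for a column simply overwrite each other. `keepOnly` stands in for the transformer-based cleanup Noble mentions: it keys on the implicit `$forEach` field to strip inherited parent fields from mediaBlock records.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical model of the forEach behaviour described in this thread; it is
 * not XPathRecordReader's implementation. Parse events arrive as
 * start(tag) / text(value) / end() calls, like a simplified SAX stream.
 */
final class ForEachSimulator {
    private final Set<String> forEachPaths;            // e.g. "/record", "/record/mediaBlock"
    private final Map<String, String> columnByXpath;   // field xpath -> column name
    private final Deque<String> openPaths = new ArrayDeque<>();
    private final Deque<Map<String, String>> saved = new ArrayDeque<>();
    private Map<String, String> fields = new LinkedHashMap<>();
    final List<Map<String, String>> emitted = new ArrayList<>();

    ForEachSimulator(Set<String> forEachPaths, Map<String, String> columnByXpath) {
        this.forEachPaths = forEachPaths;
        this.columnByXpath = columnByXpath;
    }

    void start(String tag) {
        String path = (openPaths.isEmpty() ? "" : openPaths.peek()) + "/" + tag;
        openPaths.push(path);
        if (forEachPaths.contains(path)) {
            // Remember what was collected before this tag opened; it is retained.
            saved.push(new LinkedHashMap<>(fields));
        }
    }

    void text(String value) {
        if (openPaths.isEmpty()) {
            return;
        }
        String column = columnByXpath.get(openPaths.peek());
        if (column != null) {
            fields.put(column, value); // simplification: last value wins
        }
    }

    void end() {
        String path = openPaths.pop();
        if (forEachPaths.contains(path)) {
            // Emit everything collected so far, including fields from ancestors
            // (the "flow through"), tagged with the implicit $forEach field ...
            Map<String, String> row = new LinkedHashMap<>(fields);
            row.put("$forEach", path);
            emitted.add(row);
            // ... then drop only the fields collected since this tag opened.
            fields = saved.pop();
        }
    }

    /** Transformer-style cleanup: on rows from one forEach path, keep only some columns. */
    static Map<String, String> keepOnly(Map<String, String> row, String forEach, Set<String> keep) {
        if (!forEach.equals(row.get("$forEach"))) {
            return row;
        }
        Map<String, String> out = new LinkedHashMap<>();
        row.forEach((k, v) -> {
            if (keep.contains(k)) {
                out.put(k, v);
            }
        });
        return out;
    }

    public static void main(String[] args) {
        ForEachSimulator sim = new ForEachSimulator(
                Set.of("/record", "/record/mediaBlock"),
                Map.of("/record/title", "title", "/record/mediaBlock/caption", "caption"));
        // Events for <record><title>T</title><mediaBlock><caption>C</caption></mediaBlock></record>
        sim.start("record");
        sim.start("title"); sim.text("T"); sim.end();
        sim.start("mediaBlock");
        sim.start("caption"); sim.text("C"); sim.end();
        sim.end(); // closes mediaBlock
        sim.end(); // closes record
        for (Map<String, String> row : sim.emitted) {
            System.out.println(row + " -> " + keepOnly(row, "/record/mediaBlock", Set.of("caption")));
        }
        // {title=T, caption=C, $forEach=/record/mediaBlock} -> {caption=C}
        // {title=T, $forEach=/record} -> {title=T, $forEach=/record}
    }
}
```

The first emitted row shows the parent's `title` flowing into the mediaBlock record, which is exactly the duplication Fergus reports. The cleanup step then removes it for that forEach path only. In DataImportHandler itself, this filtering would live in a custom transformer rather than a static helper.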