[ https://issues.apache.org/jira/browse/SOLR-1020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12673570#action_12673570 ]
Shalin Shekhar Mangar commented on SOLR-1020:
---------------------------------------------

Karl, would it make sense to use the NamedList format instead of a custom XML one? That way, you can use most of the existing parsing code. Thoughts?

> PreAnalyzed field analyzer
> --------------------------
>
>                 Key: SOLR-1020
>                 URL: https://issues.apache.org/jira/browse/SOLR-1020
>             Project: Solr
>          Issue Type: New Feature
>          Components: Analysis
>    Affects Versions: 1.3
>            Reporter: Karl Wettin
>            Priority: Minor
>         Attachments: SOLR-1020.txt
>
>
> An Analyzer that produces a TokenStream from XML input containing a
> marshalled TokenStream. It also contains a static TokenStream XML marshaller.
> I kind of pulled this out of my pocket without testing it in a real
> environment, in order to get some comments on the solution before I add it
> to my project. So consider it a beta patch.
> It uses the JSR 173 XMLStream (StAX) API, which is available in Java 1.6,
> compatible with Java 1.5, and downloadable from https://sjsxp.dev.java.net/
> XSD:
> {code:xml}
> <?xml version="1.0" encoding="UTF-8"?>
> <xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified"
>            xmlns:xs="http://www.w3.org/2001/XMLSchema">
>   <xs:element name="tokens" type="tokensType"/>
>   <xs:complexType name="tokensType">
>     <xs:sequence>
>       <xs:element type="tokenType" name="token" minOccurs="0" maxOccurs="unbounded"/>
>     </xs:sequence>
>   </xs:complexType>
>   <xs:complexType name="tokenType">
>     <xs:sequence>
>       <xs:element type="xs:int" name="positionIncrement" minOccurs="0" maxOccurs="1"/>
>       <xs:element type="xs:string" name="term" minOccurs="1" maxOccurs="1"/>
>       <xs:element type="xs:string" name="type" minOccurs="0" maxOccurs="1"/>
>       <xs:element type="xs:int" name="startOffset" minOccurs="0" maxOccurs="1"/>
>       <xs:element type="xs:int" name="endOffset" minOccurs="0" maxOccurs="1"/>
>       <xs:element type="xs:int" name="flags" minOccurs="0" maxOccurs="1"/>
>       <xs:element type="payloadType" name="payload" minOccurs="0" maxOccurs="1"/>
>     </xs:sequence>
>   </xs:complexType>
>   <xs:complexType name="payloadType">
>     <xs:choice maxOccurs="1" minOccurs="1">
>       <xs:element type="bytesType" name="bytes"/>
>       <xs:element type="xs:string" name="hex"/>
>       <xs:element type="xs:string" name="base64"/>
>     </xs:choice>
>   </xs:complexType>
>   <xs:complexType name="bytesType">
>     <xs:sequence>
>       <xs:element type="xs:byte" name="byte" maxOccurs="unbounded" minOccurs="1"/>
>     </xs:sequence>
>   </xs:complexType>
> </xs:schema>
> {code}
> Even though the XSD defines a couple of variants for encoding a Payload,
> only <hex> is supported.
> Example XML:
> {code:xml}
> <tokens>
>   <token>
>     <positionIncrement>1</positionIncrement>
>     <term>term</term>
>     <type>type</type>
>     <startOffset>0</startOffset>
>     <endOffset>3</endOffset>
>     <flags>65535</flags>
>     <payload><hex>fffefd</hex></payload>
>   </token>
> </tokens>
> {code}

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
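The following sketch shows how the `<tokens>` format could be read back with the JSR 173 StAX API (`javax.xml.stream`). It is not the code in SOLR-1020.txt. The class `TokensXmlReader` and its `ParsedToken` holder are hypothetical stand-ins for Lucene's `Token`. The defaults for absent optional elements (position increment 1, type `word`) are assumed to mirror Lucene's. As the issue says for the patch, only the `<hex>` payload encoding is handled.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical StAX (JSR 173) reader for the <tokens> format; not the patch's code. */
final class TokensXmlReader {

  /** Stand-in for a Lucene Token; defaults apply when an optional element is absent. */
  static final class ParsedToken {
    int positionIncrement = 1; // assumed default, as in Lucene
    String term;
    String type = "word";      // assumed default type, as in Lucene
    int startOffset;
    int endOffset;
    int flags;
    byte[] payload;            // null when no <payload> element is present
  }

  static List<ParsedToken> parse(String xml) throws XMLStreamException {
    XMLInputFactory factory = XMLInputFactory.newInstance();
    factory.setProperty(XMLInputFactory.SUPPORT_DTD, false); // no DTDs or external entities
    XMLStreamReader r = factory.createXMLStreamReader(new StringReader(xml));
    List<ParsedToken> tokens = new ArrayList<>();
    ParsedToken current = null;
    try {
      while (r.hasNext()) {
        // Only start tags matter; leaf text is pulled with getElementText().
        if (r.next() != XMLStreamConstants.START_ELEMENT) {
          continue;
        }
        String name = r.getLocalName();
        switch (name) {
          case "tokens":
          case "payload":
            break; // containers: their children carry the data
          case "token":
            current = new ParsedToken();
            tokens.add(current);
            break;
          default:
            if (current == null) {
              throw new XMLStreamException("<" + name + "> outside <token>", r.getLocation());
            }
            readLeaf(r, name, current);
        }
      }
    } finally {
      r.close();
    }
    return tokens;
  }

  /** Reads one text-only element; getElementText() leaves the reader on its end tag. */
  private static void readLeaf(XMLStreamReader r, String name, ParsedToken t)
      throws XMLStreamException {
    switch (name) {
      case "positionIncrement": t.positionIncrement = parseInt(r); break;
      case "term":              t.term = r.getElementText(); break;
      case "type":              t.type = r.getElementText(); break;
      case "startOffset":       t.startOffset = parseInt(r); break;
      case "endOffset":         t.endOffset = parseInt(r); break;
      case "flags":             t.flags = parseInt(r); break;
      case "hex":               t.payload = hexToBytes(r.getElementText().trim()); break;
      default: // <bytes> and <base64> are in the XSD but unsupported, as in the issue
        throw new XMLStreamException("unsupported element <" + name + ">", r.getLocation());
    }
  }

  private static int parseInt(XMLStreamReader r) throws XMLStreamException {
    return Integer.parseInt(r.getElementText().trim());
  }

  static byte[] hexToBytes(String hex) {
    if (hex.length() % 2 != 0) {
      throw new IllegalArgumentException("odd-length hex payload: " + hex);
    }
    byte[] out = new byte[hex.length() / 2];
    for (int i = 0; i < out.length; i++) {
      // Each pair of hex digits is one byte, e.g. "ff" -> (byte) 0xff == -1.
      out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
    }
    return out;
  }
}
```

Calling `TokensXmlReader.parse(...)` on the example XML yields one token whose payload is the three bytes `0xff 0xfe 0xfd`. A TokenStream would then replay that list, setting its attributes from each field. Shalin's NamedList suggestion would replace this hand-written parsing with Solr's existing parsing code.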