Hi Steve,
> you never set exhausted to false, and when the filter got reused, it
> incorrectly carried state from the previous document.

Thanks for replying, but I am not able to understand this.

With Regards
Aman Tandon

On Fri, Jun 19, 2015 at 10:25 AM, Steve Rowe <sar...@gmail.com> wrote:

> Hi Aman,
>
> The admin UI screenshot you linked to is from an older version of Solr -
> what version are you using?
>
> Lots of extraneous angle brackets and asterisks got into your email and
> made for a bunch of cleanup work before I could read or edit it. In the
> future, please put your code somewhere people can easily read it and
> copy/paste it into an editor: into a github gist or on a paste service, etc.
>
> Looks to me like your use of "exhausted" is unnecessary, and is likely the
> cause of the problem you saw (only one document getting processed): you
> never set exhausted to false, and when the filter got reused, it
> incorrectly carried state from the previous document.
>
> Here's a simpler version that's hopefully more correct and more efficient
> (2 fewer copies from the StringBuilder to the final token). Note: I didn't
> test it:
>
> https://gist.github.com/sarowe/9b9a52b683869ced3a17
>
> Steve
> www.lucidworks.com
>
> > On Jun 18, 2015, at 11:33 AM, Aman Tandon <amantandon...@gmail.com> wrote:
> >
> > Please help: what am I doing wrong here? Please guide me.
> >
> > With Regards
> > Aman Tandon
> >
> > On Thu, Jun 18, 2015 at 4:51 PM, Aman Tandon <amantandon...@gmail.com> wrote:
> >
> >> Hi,
> >>
> >> I created a token concat filter to concatenate all the tokens from the
> >> token stream. It creates the concatenated token as expected.
> >>
> >> But when I am posting an XML file containing more than 30,000 documents,
> >> only the first document has the data for that field.
> >>
> >> Schema:
> >>
> >> <field name="titlex" type="text" indexed="true" stored="false"
> >>   required="false" omitNorms="false" multiValued="false" />
> >>
> >> <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
> >>   <analyzer type="index">
> >>     <charFilter class="solr.HTMLStripCharFilterFactory"/>
> >>     <tokenizer class="solr.StandardTokenizerFactory"/>
> >>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
> >>       generateNumberParts="1" catenateWords="0" catenateNumbers="1"
> >>       catenateAll="0" splitOnCaseChange="1"/>
> >>     <filter class="solr.LowerCaseFilterFactory"/>
> >>     <filter class="solr.ShingleFilterFactory" maxShingleSize="3"
> >>       outputUnigrams="true" tokenSeparator=""/>
> >>     <filter class="solr.SnowballPorterFilterFactory" language="English"
> >>       protected="protwords.txt"/>
> >>     <filter class="com.xyz.analysis.concat.ConcatenateWordsFilterFactory"/>
> >>     <filter class="solr.SynonymFilterFactory"
> >>       synonyms="stemmed_synonyms_text_prime_ex_index.txt"
> >>       ignoreCase="true" expand="true"/>
> >>   </analyzer>
> >>   <analyzer type="query">
> >>     <tokenizer class="solr.StandardTokenizerFactory"/>
> >>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> >>       ignoreCase="true" expand="true"/>
> >>     <filter class="solr.StopFilterFactory" ignoreCase="true"
> >>       words="stopwords_text_prime_search.txt"
> >>       enablePositionIncrements="true"/>
> >>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
> >>       generateNumberParts="1" catenateWords="0" catenateNumbers="0"
> >>       catenateAll="0" splitOnCaseChange="1"/>
> >>     <filter class="solr.LowerCaseFilterFactory"/>
> >>     <filter class="solr.SnowballPorterFilterFactory" language="English"
> >>       protected="protwords.txt"/>
> >>     <filter class="com.xyz.analysis.concat.ConcatenateWordsFilterFactory"/>
> >>   </analyzer>
> >> </fieldType>
> >>
> >> Please help me. The code for the filter is as follows; please take a look.
> >>
> >> Here is a picture of what the filter is doing:
> >> <http://i.imgur.com/THCsYtG.png?1>
> >>
> >> The code of the concat filter is:
> >>
> >> package com.xyz.analysis.concat;
> >>
> >> import java.io.IOException;
> >>
> >> import org.apache.lucene.analysis.TokenFilter;
> >> import org.apache.lucene.analysis.TokenStream;
> >> import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
> >> import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
> >> import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
> >> import org.apache.lucene.analysis.tokenattributes.TypeAttribute;
> >>
> >> public class ConcatenateWordsFilter extends TokenFilter {
> >>
> >>   private CharTermAttribute charTermAttribute =
> >>       addAttribute(CharTermAttribute.class);
> >>   private OffsetAttribute offsetAttribute =
> >>       addAttribute(OffsetAttribute.class);
> >>   PositionIncrementAttribute posIncr =
> >>       addAttribute(PositionIncrementAttribute.class);
> >>   TypeAttribute typeAtrr = addAttribute(TypeAttribute.class);
> >>
> >>   private StringBuilder stringBuilder = new StringBuilder();
> >>   private boolean exhausted = false;
> >>
> >>   /**
> >>    * Creates a new ConcatenateWordsFilter
> >>    * @param input TokenStream that will be filtered
> >>    */
> >>   public ConcatenateWordsFilter(TokenStream input) {
> >>     super(input);
> >>   }
> >>
> >>   /**
> >>    * {@inheritDoc}
> >>    */
> >>   @Override
> >>   public final boolean incrementToken() throws IOException {
> >>     while (!exhausted && input.incrementToken()) {
> >>       char terms[] = charTermAttribute.buffer();
> >>       int termLength = charTermAttribute.length();
> >>       if (typeAtrr.type().equals("<ALPHANUM>")) {
> >>         stringBuilder.append(terms, 0, termLength);
> >>       }
> >>       charTermAttribute.copyBuffer(terms, 0, termLength);
> >>       return true;
> >>     }
> >>
> >>     if (!exhausted) {
> >>       exhausted = true;
> >>       String sb = stringBuilder.toString();
> >>       System.err.println("The Data got is " + sb);
> >>       int sbLength = sb.length();
> >>       //posIncr.setPositionIncrement(0);
> >>       charTermAttribute.copyBuffer(sb.toCharArray(), 0, sbLength);
> >>       offsetAttribute.setOffset(offsetAttribute.startOffset(),
> >>           offsetAttribute.startOffset() + sbLength);
> >>       stringBuilder.setLength(0);
> >>       //typeAtrr.setType("CONCATENATED");
> >>       return true;
> >>     }
> >>     return false;
> >>   }
> >> }
> >>
> >> With Regards
> >> Aman Tandon
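[Editor's note: for readers puzzled by the same point Aman asks about, here is a
minimal, stdlib-only Java sketch of the reuse bug Steve describes. It is NOT
Steve's gist and not real Lucene code; ConcatFilter, reset(), next(), and
ReuseDemo are all invented for illustration. Solr reuses one filter instance
across documents and calls reset() between them, so any per-document state
(here, `exhausted` and the StringBuilder) must be cleared in reset(); the
original filter never cleared `exhausted`, so every document after the first
produced no output.]

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class ConcatFilter {
    private final StringBuilder sb = new StringBuilder();
    private boolean exhausted = false;
    private Iterator<String> input;

    // Analogous to Lucene's TokenStream.reset(): called before each reuse.
    // Without the two state-clearing lines below, the second document would
    // see exhausted == true left over from the first and emit nothing.
    void reset(List<String> tokens) {
        input = tokens.iterator();
        exhausted = false;   // the reset the original filter was missing
        sb.setLength(0);
    }

    // Returns the next token; emits the concatenation last; null when done.
    String next() {
        if (!exhausted && input.hasNext()) {
            String t = input.next();
            sb.append(t);    // collect the token for the final concat
            return t;
        }
        if (!exhausted) {
            exhausted = true;
            return sb.toString();
        }
        return null;
    }
}

public class ReuseDemo {
    // Drains the filter for one "document" and returns the last (concatenated) token.
    static String lastToken(ConcatFilter f, List<String> doc) {
        f.reset(doc);
        String t, last = null;
        while ((t = f.next()) != null) last = t;
        return last;
    }

    public static void main(String[] args) {
        ConcatFilter f = new ConcatFilter();  // one instance, reused per document
        System.out.println(lastToken(f, Arrays.asList("hello", "world")));  // helloworld
        System.out.println(lastToken(f, Arrays.asList("foo", "bar")));      // foobar
    }
}
```

Because reset() clears both pieces of per-document state, the second document
concatenates correctly; delete the `exhausted = false` line and lastToken()
returns null for every document after the first, which is exactly the symptom
reported in this thread.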