Hi Erick,

Below is part of the managed-schema. There are 1k section* fields. For the second experiment, I removed the copyField, dropped the collection and re-indexed everything.

To measure the index size, I looked at the Cloud section of the Solr admin UI: 40 GB per shard. I also looked at the folder size on disk. I ran some tests, and the _text_ field is indexed.
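
The folder-size check can be broken down by file extension, which may help show where the bytes actually go. This is only a minimal sketch in Python, assuming the index directory of one core is readable from the local filesystem; the function name index_size_by_extension and the example path are hypothetical and not part of Solr.

```python
import os
from collections import defaultdict


def index_size_by_extension(index_dir):
    """Sum file sizes (in bytes) under index_dir, grouped by file extension.

    Lucene writes one kind of data per extension (for example .fdt for
    stored fields, .tim for the term dictionary, .pos for positions), so
    the grouping gives a rough view of what dominates the index.
    """
    sizes = defaultdict(int)
    for root, _dirs, files in os.walk(index_dir):
        for name in files:
            # Files such as "segments_1" have no extension.
            ext = os.path.splitext(name)[1] or "(none)"
            sizes[ext] += os.path.getsize(os.path.join(root, name))
    return dict(sizes)


# Example call; the path is an assumption, replace it with the real
# data/index directory of one shard replica:
# sizes = index_size_by_extension("/var/solr/data/mycoll_shard1_replica_n1/data/index")
# for ext, size in sorted(sizes.items(), key=lambda kv: -kv[1]):
#     print(f"{ext:>8} {size / 1e9:8.3f} GB")
```

Running this on the same shard with and without the copyField and comparing the per-extension totals would be one way to cross-check the 40 GB figure.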

<field name="_text_" type="text_fr" indexed="true" stored="false" multiValued="true"/>
<dynamicField name="section*" type="text_fr" indexed="true" stored="true" multiValued="true"/>
<copyField source="section*" dest="_text_"/>

<fieldType name="text_fr" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="\p{Punct}" replacement=" " replace="all"/>
    <filter class="solr.ICUFoldingFilterFactory"/>
    <!-- removes l', etc -->
    <filter class="solr.ElisionFilterFactory" ignoreCase="true" articles="lang/contractions_fr.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_fr.txt" format="snowball" />
    <filter class="solr.FrenchLightStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms-fr.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="\p{Punct}" replacement=" " replace="all"/>
    <filter class="solr.ICUFoldingFilterFactory"/>
    <!-- removes l', etc -->
    <filter class="solr.ElisionFilterFactory" ignoreCase="true" articles="lang/contractions_fr.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_fr.txt" format="snowball" />
    <filter class="solr.FrenchLightStemFilterFactory"/>
  </analyzer>
</fieldType>

On Thu, Dec 26, 2019 at 02:16:32PM -0500, Erick Erickson wrote:
> This simply cannot be true unless the destination copyField is indexed=false,
> docValues=false stored=false. I.e. “some circumstances” means there’s really
> no use in using the copyField in the first place. I suppose that if you don’t
> store any term vectors, no position information nothing except, say, the
> terms then maybe you’ll have extremely minimal size.
> But even in that case,
> I’d use the original field in an “fq” clause which doesn’t use any scoring in
> place of using the copyField.
>
> Each field is stored in a separate part of the relevant files (.tim, .pos,
> etc). Term frequencies are kept on a _per field_ basis for instance.
>
> So this pretty much has to be small sample size or other measurement error.
>
> Best,
> Erick
>
> > On Dec 26, 2019, at 9:27 AM, Nicolas Paris <nicolas.pa...@riseup.net> wrote:
> >
> > Anyway, that's good news: copy field does not increase index size in
> > some circumstances:
> > - the copied fields and the target field share the same datatype
> > - the target field is not stored
> >
> > this is tested on text fields
> >
> >
> > On Wed, Dec 25, 2019 at 11:42:23AM +0100, Nicolas Paris wrote:
> >>
> >> On Wed, Dec 25, 2019 at 05:30:03AM -0500, Dave wrote:
> >>> #2 you initially said you were talking about 1k documents.
> >>
> >> Hi Dave. Again, sorry for the confusion. This is 1k fields
> >> (general_text), over 50M large documents copied into one _text_ field.
> >> 4 shards, 40GB per shard in both cases, with/without the _text_ field
> >>
> >>>
> >>>> On Dec 25, 2019, at 3:07 AM, Nicolas Paris <nicolas.pa...@riseup.net>
> >>>> wrote:
> >>>>
> >>>>
> >>>>>
> >>>>> If you are redoing the indexing after changing the schema and
> >>>>> reloading/restarting, then you can ignore me.
> >>>>
> >>>> I am sorry to say that I have to ignore you. Indeed, my tests include
> >>>> recreating the collection from scratch - with and without the copy
> >>>> fields.
> >>>> In both cases the index size is the same! (while the _text_ field is
> >>>> working correctly)
> >>>>
> >>>>> On Tue, Dec 24, 2019 at 05:32:09PM -0700, Shawn Heisey wrote:
> >>>>>> On 12/24/2019 5:11 PM, Nicolas Paris wrote:
> >>>>>> Do you mean "copy fields" is only an action of changing the schema?
> >>>>>> I was thinking it was adding a new field and eventually a new index to
> >>>>>> the collection
> >>>>>
> >>>>> The copy that copyField does happens at index time. Reindexing is
> >>>>> required after changing the schema, or nothing happens.
> >>>>>
> >>>>> If you are redoing the indexing after changing the schema and
> >>>>> reloading/restarting, then you can ignore me.
> >>>>>
> >>>>> Thanks,
> >>>>> Shawn
> >>>>>
> >>>>
> >>>> --
> >>>> nicolas
> >>>
> >>
> >> --
> >> nicolas
> >>
> >
> > --
> > nicolas
>

--
nicolas