Re: Injecting synonyms into Solr

Zheng Lin Edwin Yeo Sun, 03 May 2015 23:08:06 -0700

I would like to check: will this method of splitting the synonyms into
multiple files use up a lot of memory?


I'm trying it with about 10 files, and the collection cannot be loaded
due to insufficient memory.

My machine currently has only 4GB of memory, but I only have 500,000
records indexed, so I'm not sure whether there will be a significant impact
in the future (even with more memory) when my index grows and other things
like faceting, highlighting, and carrot tools are implemented.
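
For reference, the splitting discussed in this thread can be sketched in Python roughly as follows. This is only a minimal sketch, not something taken from Solr itself: the function names, the "syn" file prefix, and the 900,000-byte threshold (chosen to stay a little under the 1MB per-file limit mentioned below) are my own assumptions. It assumes the default Solr synonym format with one rule per line, so splitting at line boundaries should be safe.

```python
from pathlib import Path

# Assumed threshold: stay a little under the ~1MB per-file limit.
MAX_BYTES = 900_000


def split_synonyms(lines, max_bytes=MAX_BYTES):
    """Group synonym lines into chunks whose UTF-8 size stays within max_bytes.

    A single line longer than max_bytes still ends up in a chunk of its own.
    """
    chunks, current, size = [], [], 0
    for line in lines:
        line = line.rstrip("\n") + "\n"  # normalise the line ending
        n = len(line.encode("utf-8"))
        # Start a new chunk when adding this line would exceed the limit.
        if current and size + n > max_bytes:
            chunks.append(current)
            current, size = [], 0
        current.append(line)
        size += n
    if current:
        chunks.append(current)
    return chunks


def write_chunks(source, out_dir, prefix="syn", max_bytes=MAX_BYTES):
    """Write the chunks as <prefix>1.txt, <prefix>2.txt, ... and return the names."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with open(source, encoding="utf-8") as f:
        chunks = split_synonyms(f, max_bytes)
    names = []
    for i, chunk in enumerate(chunks, start=1):
        name = f"{prefix}{i}.txt"
        (out_dir / name).write_text("".join(chunk), encoding="utf-8")
        names.append(name)
    return names
```

Calling write_chunks("synonyms.txt", "conf") and joining the returned names with commas would give the value for the synonyms attribute of the SynonymFilterFactory (the "conf" directory here is just a placeholder). Note that this only works around the upload limit; it does not reduce the total amount of synonym data that has to be loaded into memory.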

Regards,
Edwin



On 1 May 2015 at 11:08, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote:

> Thank you for the info. Yup, this works. I found out that we can't load
> files larger than 1MB into ZooKeeper; this applies to any file over 1MB
> in size, not just the synonym files.
> But I'm not sure if there will be an impact on the system, as the number
> of synonym text files could potentially grow to more than 20, since my
> sample synonym file is larger than 20MB.
>
> Currently I have fewer than 500,000 records indexed in Solr, so I'm not
> sure if there will be a significant impact compared to an index that has
> millions of records.
> I will try to get more records indexed and will update here again.
>
> Regards,
> Edwin
>
>
> On 1 May 2015 at 08:17, Philippe Soares <soa...@genomequest.com> wrote:
>
>> Split your synonyms into multiple files and configure the
>> SynonymFilterFactory with a comma-separated list of files, e.g.:
>> synonyms="syn1.txt,syn2.txt,syn3.txt"
>>
>> On Thu, Apr 30, 2015 at 8:07 PM, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote:
>>
>> > Just to populate it with general synonym words. I've managed to
>> > populate it from some online sources, but is there a limit to what it
>> > can contain?
>> >
>> > I can't load the configuration into zookeeper if the synonyms.txt file
>> > contains more than 2100 lines.
>> >
>> > Regards,
>> > Edwin
>> > On 1 May 2015 05:44, "Chris Hostetter" <hossman_luc...@fucit.org> wrote:
>> >
>> > >
>> > > : There is a possible solution here:
>> > > : https://issues.apache.org/jira/browse/LUCENE-2347 (Dump WordNet to
>> > > : SOLR Synonym format).
>> > >
>> > > If you have WordNet synonyms you don't need any special code/tools to
>> > > convert them -- the current solr.SynonymFilterFactory supports wordnet
>> > > files (just specify format="wordnet")
>> > >
>> > >
>> > > : > > Does anyone know any faster method of populating the
>> > > : > > synonyms.txt file instead of manually typing the words into the
>> > > : > > file, given that there could be thousands of synonyms around?
>> > >
>> > > populate from what?  what is the source of your data?
>> > >
>> > > the default solr synonym file format is about as simple as it could
>> > > possibly be -- pretty trivial to generate it from scripts -- the hard
>> > > part is usually selecting the synonym data you want to use and parsing
>> > > whatever format it is already in.
>> > >
>> > >
>> > >
>> > > -Hoss
>> > > http://www.lucidworks.com/
>> > >
>> >
>>
>
>
