Thanks Nick for your reply and for taking time on this. One quick question before you get lost in the email below: release 0.6.1 has the fix for the bug below, right?
> BasicFlexGroup0_io_workload/pm_io/.lucyindex/1 : input 47 too high
> S_fibonacci at core/Lucy/Index/IndexManager.c line 129

Thanks,
Rajiv g

-----Original Message-----
From: Nick Wellnhofer [mailto:[email protected]]
Sent: Saturday, December 17, 2016 2:52 AM
To: [email protected]
Subject: Re: [lucy-user] LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119

On 13/12/2016 18:05, Gupta, Rajiv wrote:
> After I create directory by myself I'm getting this error:

Which directory do you try to create? I wouldn't try to make manual changes inside Lucy's index directory. This will only make things worse.

    $indexer = Lucy::Index::Indexer->new(
        index    => $saveindexlocation,
        schema   => $schema,
        manager  => Lucy::Index::IndexManager->new(host=>$self->{_hostname}),
        create   => $dir_create_flag,
        truncate => 0,
    );

The "create" flag is initially set to 1 so that $saveindexlocation can get created. After I got the error, I made sure the directory was created and set the create flag to always be 0.

> Can't open
> '/u/smoke/presub/logs/cit-fg-adr-neg-rtp.rajivg.1481473130.41339_cmode_1of1/.lucyindex/1/seg_fd/lexicon-7.ixix':
> Invalid argument
> 20161211 182109 [] * LUCY_FSFolder_Local_Open_FileHandle_IMP at
> core/Lucy/Store/FSFolder.c line 118
> 20161211 182109 [] * LUCY_Folder_Local_Open_In_IMP at
> core/Lucy/Store/Folder.c line 101
> 20161211 182109 [] * LUCY_Folder_Open_In_IMP at core/Lucy/Store/Folder.c
> line 75
>
> There are two more failures; they also failed due to similar reasons
>
> rename from
> '/u/smoke/presub/logs/cit-fg-adr-ndo-rtp.rajivg.1481473039.49384_cmode
> _1of1/.lucyindex/1/schema.temp' to
> '/u/smoke/presub/logs/cit-fg-adr-ndo-rtp.rajivg.1481473039.49384_cmode
> _1of1/.lucyindex/1/schema_e4.json' failed: No such file or directory
>
> Can't delete 'lexicon-3.ix'
>
> I believe all three are related to race condition while doing parallel
> indexing and should go away with retries.
> However my retries started failing
> with different error which is strange to me as if directory already exists
> shouldn't it skip from create attempt.
>
> 20161211 182109 [] * FAIL: [FAILED]: Retrying to add doc at path:
> /u/smoke/presub/logs/cit-fg-adr-neg-rtp.rajivg.1481473130.41339_cmode_1of1/.lucyindex/1
> : Couldn't create directory
> '/u/smoke/presub/logs/cit-fg-adr-neg-rtp.rajivg.1481473130.41339_cmode_1of1/.lucyindex/1':
> No such file or directory
> 20161211 182109 [] * LUCY_FSFolder_Initialize_IMP at
> core/Lucy/Store/FSFolder.c line 102
>
> So all my retry attempts also failed.

These errors still look like multiple processes are modifying the index at the same time. Are you really sure that every indexer is created with an IndexManager and that every IndexManager is created with a `host` argument that is unique to each machine?

Rajiv>>> All parallel processes are child processes of one process and run on the same host. Do you think making the host name unique with some random number would help for multiple processes?

All these errors mean that there's something fundamentally wrong with your code or that you hit a bug in Lucy. The only type of error where it makes sense to retry is LockErr. All other errors are mostly fatal and could result in index corruption. Retrying will only mask an underlying problem in this case.

Unfortunately, I'm unable to help unless you provide some kind of self-contained, reproducible test case. I'm aware that this isn't easy, especially with multiple clients writing to a shared volume.

As I already hinted at, you might want to reconsider your architecture and use some kind of search server that uses an index on a local filesystem. There are ready-made platforms on top of Lucy like Dezi, but it isn't too hard to roll your own solution. This should result in better performance and make the behavior of your code more predictable.

Rajiv>>> Going to a local file system is not possible in my case.
This is a test framework that generates a lot of logs, and I'm indexing per test run; all these logs need to be on a shared volume for other triaging purposes. The next thing I'm going to try is to create one watcher per directory and index all files under that directory serially. Currently I'm creating watchers on all the files, and sometimes multiple files in the same directory may get indexed at the same time. As you stated, this might be the issue. I'm not sure how it will perform within the current time limits. Creating the IndexManager adds overhead to the search process.

Nick
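
The per-directory idea described above (one watcher per directory, files indexed serially) could be sketched roughly as follows. This is only an illustration under stated assumptions: the file paths are made up, and `group_by_directory`, `index_serially`, and the callback are hypothetical helpers, not part of Lucy. In real code the callback would presumably open a single Indexer for the directory, add every file, and commit once.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);

# Group file paths by their parent directory so that each directory's
# files can be handed to one worker and indexed one after another.
sub group_by_directory {
    my @paths = @_;
    my %by_dir;
    push @{ $by_dir{ dirname($_) } }, $_ for @paths;
    return \%by_dir;
}

# Walk the directories in a stable order and pass each directory's
# sorted file list to a caller-supplied callback (hypothetical: this is
# where one Indexer per directory would be created and committed).
sub index_serially {
    my ($by_dir, $index_files) = @_;
    for my $dir (sort keys %$by_dir) {
        $index_files->($dir, [ sort @{ $by_dir->{$dir} } ]);
    }
}

# Example with made-up paths; the callback only prints what it would index.
my $groups = group_by_directory(
    '/logs/run1/a.log', '/logs/run1/b.log', '/logs/run2/c.log',
);
index_serially($groups, sub {
    my ($dir, $files) = @_;
    print "$dir: @$files\n";
});
```

With this shape, two files in the same directory are never indexed concurrently, because they always arrive in the same callback invocation.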
