Re: Solr Replication is not Possible on RAMDirectory?
Erik Hatcher-4 wrote: There's an open issue (with a patch!) that enables this, it seems: <https://issues.apache.org/jira/browse/SOLR-3911> Erik

Well, the patch doesn't seem to do that... I tried it and am still getting some error lines about the directory types.

- Zeki ("smart, but he doesn't work... if he worked, he'd do it...")

--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-Replication-is-not-Possible-on-RAMDirectory-tp4017766p4018670.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr Replication is not Possible on RAMDirectory?
There's an open issue (with a patch!) that enables this, it seems: https://issues.apache.org/jira/browse/SOLR-3911

Erik

On Nov 5, 2012, at 07:41, deniz wrote:

Michael Della Bitta-2 wrote: No, RAMDirectory doesn't work for replication. Use MMapDirectory... it ends up storing the index in RAM, and more efficiently so, plus it's backed by disk. Just be sure not to set a big heap, because MMapDirectory works outside of the heap.

For my tests, I don't think the index ended up in RAM with mmap... I gave 4 GB of heap while using mmap and got a mapping error while indexing... While the index should be around 2 GB, RAM consumption was around 300 MB...

Can anyone explain why RAMDirectory can't be used for replication? I can't see why the master would be set to use RAMDirectory while the replica uses MMap or some other directory. As far as I understand, SolrCloud is some kind of push from master to replica/slave... so why is it not possible to push from RAM to HDD? If my logic is wrong, can someone please explain all this?

- Zeki ("smart, but he doesn't work... if he worked, he'd do it...")
Re: Solr Replication is not Possible on RAMDirectory?
Erik Hatcher-4 wrote: There's an open issue (with a patch!) that enables this, it seems: <https://issues.apache.org/jira/browse/SOLR-3911>

I will check it for sure, thank you Erik :)

Shawn Heisey-4 wrote: ... transparently mapping the files on disk to a virtual memory space and using excess RAM to cache that data and make it fast. If you have enough extra memory (disk cache) to fit the entire index, the OS will never have to read any part of the index from disk more than once.

So, for disk cache: are there any disks with 1 GB or more of cache? If I'm not wrong, most disks around have 16 or 32 MB of cache (or am I checking the wrong thing?). If so, that amount is definitely too small...

- Zeki ("smart, but he doesn't work... if he worked, he'd do it...")
Re: Solr Replication is not Possible on RAMDirectory?
Shawn Heisey-4 wrote: ... transparently mapping the files on disk to a virtual memory space and using excess RAM to cache that data and make it fast. If you have enough extra memory (disk cache) to fit the entire index, the OS will never have to read any part of the index from disk more than once.

So, for disk cache: are there any disks with 1 GB or more of cache? If I'm not wrong, most disks around have 16 or 32 MB of cache (or am I checking the wrong thing?). If so, that amount is definitely too small...

I am not talking about the cache on the actual disk drive, or even cache on your hard drive controller. I am talking about the operating system using RAM, specifically RAM not being used by programs, to cache data on your hard drive. All modern operating systems do it, even the one made in Redmond that people love to hate.

If you have 16 GB of RAM and all your programs use up 4.5 GB, you can count on the OS using at least another half GB, so you have about 11 GB left. The OS is going to put data that it reads and writes to/from your disk in this space. If you start up another program that wants 2 GB, the OS will simply throw away 2 GB of data in its cache (it's still on the disk, after all) and give that RAM to the new program. Solr counts on this OS capability for good performance.

Thanks, Shawn
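Shawn's back-of-the-envelope arithmetic can be sketched in code. The numbers below are the illustrative ones from his example, not measurements from any real system:

```java
// Rough model of Shawn's example: RAM available for the OS page cache
// is whatever total RAM is left after programs and the OS take their share.
public class PageCacheEstimate {
    static double availableCacheGb(double totalGb, double programsGb, double osGb) {
        return totalGb - programsGb - osGb;
    }

    public static void main(String[] args) {
        // 16 GB total, 4.5 GB used by programs, ~0.5 GB by the OS itself
        System.out.println(availableCacheGb(16.0, 4.5, 0.5) + " GB available as disk cache");
    }
}
```

The point of the model is only that the cache is elastic: starting a 2 GB program just shrinks the cache term, since cached pages can always be dropped and re-read from disk.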
Re: Solr Replication is not Possible on RAMDirectory?
Here's some reading: http://en.wikipedia.org/wiki/Page_cache

Michael Della Bitta
Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271
www.appinions.com
Where Influence Isn’t a Game

On Mon, Nov 5, 2012 at 8:02 PM, deniz denizdurmu...@gmail.com wrote:

Erik Hatcher-4 wrote: There's an open issue (with a patch!) that enables this, it seems: <https://issues.apache.org/jira/browse/SOLR-3911>

I will check it for sure, thank you Erik :)

Shawn Heisey-4 wrote: ... transparently mapping the files on disk to a virtual memory space and using excess RAM to cache that data and make it fast. If you have enough extra memory (disk cache) to fit the entire index, the OS will never have to read any part of the index from disk more than once.

So, for disk cache: are there any disks with 1 GB or more of cache? If I'm not wrong, most disks around have 16 or 32 MB of cache (or am I checking the wrong thing?). If so, that amount is definitely too small...

- Zeki ("smart, but he doesn't work... if he worked, he'd do it...")
Re: Solr Replication is not Possible on RAMDirectory?
Michael Della Bitta-2 wrote: No, RAMDirectory doesn't work for replication. Use MMapDirectory... it ends up storing the index in RAM, and more efficiently so, plus it's backed by disk. Just be sure not to set a big heap, because MMapDirectory works outside of the heap.

For my tests, I don't think the index ended up in RAM with mmap... I gave 4 GB of heap while using mmap and got a mapping error while indexing... While the index should be around 2 GB, RAM consumption was around 300 MB...

Can anyone explain why RAMDirectory can't be used for replication? I can't see why the master would be set to use RAMDirectory while the replica uses MMap or some other directory. As far as I understand, SolrCloud is some kind of push from master to replica/slave... so why is it not possible to push from RAM to HDD? If my logic is wrong, can someone please explain all this?

- Zeki ("smart, but he doesn't work... if he worked, he'd do it...")
Re: Solr Replication is not Possible on RAMDirectory?
On 11/4/2012 11:41 PM, deniz wrote:

Michael Della Bitta-2 wrote: No, RAMDirectory doesn't work for replication. Use MMapDirectory... it ends up storing the index in RAM, and more efficiently so, plus it's backed by disk. Just be sure not to set a big heap, because MMapDirectory works outside of the heap.

For my tests, I don't think the index ended up in RAM with mmap... I gave 4 GB of heap while using mmap and got a mapping error while indexing... While the index should be around 2 GB, RAM consumption was around 300 MB...

With mmap, the RAM is not actually consumed by your application, which in this case is Java. The operating system is handling it -- transparently mapping the files on disk to a virtual memory space and using excess RAM to cache that data and make it fast. If you have enough extra memory (disk cache) to fit the entire index, the OS will never have to read any part of the index from disk more than once. With RAMDirectory, the index has to go into the Java heap, which is much less efficient at memory management than the native operating system.

Can anyone explain why RAMDirectory can't be used for replication? I can't see why the master would be set to use RAMDirectory while the replica uses MMap or some other directory. As far as I understand, SolrCloud is some kind of push from master to replica/slave... so why is it not possible to push from RAM to HDD? If my logic is wrong, can someone please explain all this?

With RAMDirectory, there are no files to copy. Replication does not copy Solr (Lucene) documents, it copies files.

Thanks, Shawn
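Shawn's point about mmap living outside the heap can be illustrated with plain java.nio. This is a minimal sketch of memory mapping in general, not Solr's actual MMapDirectory code; the file name is a throwaway temp file:

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MmapSketch {
    // Write `content` to a temp file, memory-map it, and read the first
    // byte through the mapping. The mapped region lives outside the Java
    // heap; the OS faults pages in on demand (demand paging) and keeps
    // them in its page cache -- which is why the JVM's own RAM usage
    // stays small even when the mapped index is large.
    static char firstCharOf(String content) throws IOException {
        Path tmp = Files.createTempFile("segment", ".bin");
        Files.write(tmp, content.getBytes(StandardCharsets.UTF_8));
        char first;
        try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            first = (char) buf.get(0); // touching the buffer pages the data in
        }
        Files.delete(tmp);
        return first;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(firstCharOf("lucene")); // prints: l
    }
}
```

This also matches deniz's observation above: a mostly-mapped 2 GB index shows up as page-cache usage in the OS, not as heap consumption in the Java process.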
Re: Solr Replication is not Possible on RAMDirectory?
so it is not possible to use RAMdirectory for replication?

No, RAMDirectory doesn't work for replication. Use MMapDirectory... it ends up storing the index in RAM, and more efficiently so, plus it's backed by disk. Just be sure not to set a big heap, because MMapDirectory works outside of the heap.

Michael Della Bitta
Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271
www.appinions.com
Where Influence Isn’t a Game

On Fri, Nov 2, 2012 at 4:44 AM, deniz denizdurmu...@gmail.com wrote: so it is not possible to use RAMdirectory for replication?
Solr Replication is not Possible on RAMDirectory?
Hi all, I am trying to set up a master/slave system by following this page: http://wiki.apache.org/solr/SolrReplication

I was able to set it up and did some experiments, but when I set the index to use RAMDirectory, I get indexing errors. While master and slave both use a non-RAM directory, everything is okay... but when I try to use RAMDirectory on both, I get the error below:

16:40:31.626 [qtp28208563-24] ERROR org.apache.solr.core.SolrCore - org.apache.lucene.index.IndexNotFoundException: no segments* file found in org.apache.lucene.store.RAMDirectory@7e693f lockFactory=org.apache.lucene.store.NativeFSLockFactory@92c787: files: []
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:741)
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:630)
    at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:343)
    at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:639)
    at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:75)
    at org.apache.solr.update.SolrIndexWriter.create(SolrIndexWriter.java:62)
    at org.apache.solr.update.DefaultSolrCoreState.createMainIndexWriter(DefaultSolrCoreState.java:191)
    at org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:77)
    at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:511)
    at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:87)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:1016)
    at org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:157)
    at org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1699)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:455)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:276)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:250)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:149)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111)
    at org.eclipse.jetty.server.Server.handle(Server.java:351)
    at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:454)
    at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:47)
    at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:890)
    at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:944)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:634)
    at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:230)
    at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:66)
    at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:254)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:599)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:534)
    at java.lang.Thread.run(Unknown Source)
16:40:31.627 [qtp28208563-24] ERROR o.a.solr.servlet.SolrDispatchFilter - null:org.apache.lucene.index.IndexNotFoundException: no segments* file found in org.apache.lucene.store.RAMDirectory@7e693f
Re: MMapDirectory, demand paging, lazy evaluation, ramfs and the much maligned RAMDirectory (oh my!)
You may well have already seen this, but in case not: http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

FWIW, Erick

On Wed, Oct 24, 2012 at 9:51 PM, Shawn Heisey s...@elyograg.org wrote:

On 10/24/2012 6:29 PM, Aaron Daubman wrote: Let me be clear that I am not interested in RAMDirectory. However, I would like to better understand the oft-recommended and currently-default MMapDirectory, and what the tradeoffs would be, when using a 64-bit Linux server dedicated to this single Solr instance, with plenty (more than 2x index size) of RAM, of storing the index files on SSDs versus on a ramfs mount. I understand that using the default MMapDirectory will allow caching of the index in-memory, however, my understanding is that mmapped files are demand-paged (lazily evaluated), meaning that only after a block is read from disk will it be paged into memory - is this correct? Is it actually block-by-block (page size by page size?) - any pointers to decent documentation on this, regardless of the effectiveness of the approach, would be appreciated...

You are correct that the data must have just been accessed to be in the disk cache. This does, however, include writes -- so any data that gets indexed will be in the cache because it has just been written. I do believe that it is read in one page block at a time, and I believe that the blocks are 4k in size.

My concern with using MMapDirectory for an index stored on disk (even SSDs), if my understanding is correct, is that there is still a large startup cost to MMapDirectory, as it may take many queries before even most of a 20G index has been loaded into memory, and there may yet still be dark corners that only come up in edge-case queries that cause QTime spikes should these queries ever occur. I would like to ensure that, at startup, no query will incur disk-seek/read penalties.
Is the right way to achieve this to copy the index to a ramfs (NOT ramdisk) mount and then continue to use MMapDirectory in Solr to read the index? I am under the impression that when using ramfs (rather than ramdisk, for which this would not work) a file mmapped on a ramfs mount will actually share the same address space, and so would not incur the typical double-RAM overhead of mmapping a file in memory just to have yet another copy of the file created in a second memory location. Is this correct? If not, would you please point me to documentation stating otherwise (I haven't found much documentation either way).

I am not familiar with any double-RAM overhead from using mmap. It should be extraordinarily efficient, so much so that even when your index won't fit in RAM, performance is typically still excellent. Using an SSD instead of a spinning disk will increase performance across the board, until enough of the index is cached in RAM, after which it won't make a lot of difference.

My parting thoughts, with a general note to the masses: Do not try this if you are not absolutely sure your index will fit in memory! It will tend to cause WAY more problems than it will solve for most people with large indexes.

If you actually do have considerably more RAM than your index size, and you know that the index will never grow to where it might not fit, you can use a simple trick to get it all cached, even before running queries. Just read the entire contents of the index, discarding everything you read. There are two main OS variants to consider here, and both can be scripted, as noted below. Run the command twice to see the difference that caching makes for the second run. Note that an SSD would speed the first run of these commands up considerably:

*NIX (may work on a Mac too): cat /path/to/index/files/* > /dev/null

Windows: type C:\Path\To\Index\Files\* > NUL

Thanks, Shawn
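The cat-to-/dev/null trick above has a straightforward Java equivalent, sketched here: read every file under the index directory and throw the bytes away; the useful side effect is that the OS page cache now holds the data. The directory path is whatever your index lives in.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class CacheWarmer {
    // Read every regular file under `dir` and discard the contents --
    // the same effect as `cat /path/to/index/files/* > /dev/null`.
    // Returns the number of bytes read so the caller can sanity-check.
    static long warm(Path dir) throws IOException {
        List<Path> files;
        try (Stream<Path> s = Files.walk(dir)) {
            files = s.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        long total = 0;
        byte[] buf = new byte[8192];
        for (Path p : files) {
            try (InputStream in = Files.newInputStream(p)) {
                int n;
                while ((n = in.read(buf)) != -1) {
                    total += n; // discard the data; the OS page cache keeps it
                }
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println(warm(dir) + " bytes read into the page cache");
    }
}
```

As with the shell version, running it twice shows the effect: the second pass reads from the page cache rather than the disk, and Shawn's caveat applies unchanged -- only do this when the whole index fits in RAM.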
Re: MMapDirectory, demand paging, lazy evaluation, ramfs and the much maligned RAMDirectory (oh my!)
Aaron,

The best way to make sure the index is cached by the OS is to just cat it on startup:

cat `find /path/to/solr/index` > /dev/null

Just make sure your index is smaller than RAM, otherwise data will be rotated out. Memory mapping is built on the virtual memory system, and I suspect that ramfs is too, so I doubt very much that copying your index to ramfs will help at all.

Sidebar - a while ago I did a bunch of testing copying indices to shared memory (/dev/shm in this case), and there was no advantage compared to just accessing indices on disc when using memory mapping, once the system got to a steady state.

There has been a lot written about this topic on the list. Basically it comes down to using MMapDirectory (which is the default), making sure your index is smaller than your RAM, and allocating just enough memory to the Java VM. That last part requires some benchmarking because it is so workload dependent.

Best regards, François

On Oct 24, 2012, at 8:29 PM, Aaron Daubman daub...@gmail.com wrote:

Greetings, Most times I've seen the topic of storing one's index in memory, it seems the asker was referring (or understood to be referring) to the (in)famous "not intended to work with huge indexes" Solr RAMDirectory. Let me be clear that I am not interested in RAMDirectory. However, I would like to better understand the oft-recommended and currently-default MMapDirectory, and what the tradeoffs would be, when using a 64-bit Linux server dedicated to this single Solr instance, with plenty (more than 2x index size) of RAM, of storing the index files on SSDs versus on a ramfs mount. I understand that using the default MMapDirectory will allow caching of the index in-memory, however, my understanding is that mmapped files are demand-paged (lazily evaluated), meaning that only after a block is read from disk will it be paged into memory - is this correct? Is it actually block-by-block (page size by page size?)
- any pointers to decent documentation on this, regardless of the effectiveness of the approach, would be appreciated...

My concern with using MMapDirectory for an index stored on disk (even SSDs), if my understanding is correct, is that there is still a large startup cost to MMapDirectory, as it may take many queries before even most of a 20G index has been loaded into memory, and there may yet still be dark corners that only come up in edge-case queries that cause QTime spikes should these queries ever occur. I would like to ensure that, at startup, no query will incur disk-seek/read penalties.

Is the right way to achieve this to copy the index to a ramfs (NOT ramdisk) mount and then continue to use MMapDirectory in Solr to read the index? I am under the impression that when using ramfs (rather than ramdisk, for which this would not work) a file mmapped on a ramfs mount will actually share the same address space, and so would not incur the typical double-RAM overhead of mmapping a file in memory just to have yet another copy of the file created in a second memory location. Is this correct? If not, would you please point me to documentation stating otherwise (I haven't found much documentation either way).

Finally, given the desire to be quick at startup with a large index that will still easily fit within a system's memory, am I thinking about this wrong or are there other better approaches?

Thanks, as always, Aaron
Re: MMapDirectory, demand paging, lazy evaluation, ramfs and the much maligned RAMDirectory (oh my!)
Was going to say the same thing. It's also usually a good idea to reduce paging (e.g., 0 swappiness in Linux).

- Mark

On Oct 24, 2012, at 9:36 PM, François Schiettecatte fschietteca...@gmail.com wrote:

Aaron,

The best way to make sure the index is cached by the OS is to just cat it on startup:

cat `find /path/to/solr/index` > /dev/null

Just make sure your index is smaller than RAM, otherwise data will be rotated out. Memory mapping is built on the virtual memory system, and I suspect that ramfs is too, so I doubt very much that copying your index to ramfs will help at all.

Sidebar - a while ago I did a bunch of testing copying indices to shared memory (/dev/shm in this case), and there was no advantage compared to just accessing indices on disc when using memory mapping, once the system got to a steady state.

There has been a lot written about this topic on the list. Basically it comes down to using MMapDirectory (which is the default), making sure your index is smaller than your RAM, and allocating just enough memory to the Java VM. That last part requires some benchmarking because it is so workload dependent.

Best regards, François

On Oct 24, 2012, at 8:29 PM, Aaron Daubman daub...@gmail.com wrote:

Greetings, Most times I've seen the topic of storing one's index in memory, it seems the asker was referring (or understood to be referring) to the (in)famous "not intended to work with huge indexes" Solr RAMDirectory. Let me be clear that I am not interested in RAMDirectory. However, I would like to better understand the oft-recommended and currently-default MMapDirectory, and what the tradeoffs would be, when using a 64-bit Linux server dedicated to this single Solr instance, with plenty (more than 2x index size) of RAM, of storing the index files on SSDs versus on a ramfs mount.
I understand that using the default MMapDirectory will allow caching of the index in-memory, however, my understanding is that mmapped files are demand-paged (lazily evaluated), meaning that only after a block is read from disk will it be paged into memory - is this correct? Is it actually block-by-block (page size by page size?) - any pointers to decent documentation on this, regardless of the effectiveness of the approach, would be appreciated...

My concern with using MMapDirectory for an index stored on disk (even SSDs), if my understanding is correct, is that there is still a large startup cost to MMapDirectory, as it may take many queries before even most of a 20G index has been loaded into memory, and there may yet still be dark corners that only come up in edge-case queries that cause QTime spikes should these queries ever occur. I would like to ensure that, at startup, no query will incur disk-seek/read penalties.

Is the right way to achieve this to copy the index to a ramfs (NOT ramdisk) mount and then continue to use MMapDirectory in Solr to read the index? I am under the impression that when using ramfs (rather than ramdisk, for which this would not work) a file mmapped on a ramfs mount will actually share the same address space, and so would not incur the typical double-RAM overhead of mmapping a file in memory just to have yet another copy of the file created in a second memory location. Is this correct? If not, would you please point me to documentation stating otherwise (I haven't found much documentation either way).

Finally, given the desire to be quick at startup with a large index that will still easily fit within a system's memory, am I thinking about this wrong or are there other better approaches?

Thanks, as always, Aaron
Re: MMapDirectory, demand paging, lazy evaluation, ramfs and the much maligned RAMDirectory (oh my!)
On 10/24/2012 6:29 PM, Aaron Daubman wrote: Let me be clear that I am not interested in RAMDirectory. However, I would like to better understand the oft-recommended and currently-default MMapDirectory, and what the tradeoffs would be, when using a 64-bit Linux server dedicated to this single Solr instance, with plenty (more than 2x index size) of RAM, of storing the index files on SSDs versus on a ramfs mount. I understand that using the default MMapDirectory will allow caching of the index in-memory, however, my understanding is that mmapped files are demand-paged (lazily evaluated), meaning that only after a block is read from disk will it be paged into memory - is this correct? Is it actually block-by-block (page size by page size?) - any pointers to decent documentation on this, regardless of the effectiveness of the approach, would be appreciated...

You are correct that the data must have just been accessed to be in the disk cache. This does, however, include writes -- so any data that gets indexed will be in the cache because it has just been written. I do believe that it is read in one page block at a time, and I believe that the blocks are 4k in size.

My concern with using MMapDirectory for an index stored on disk (even SSDs), if my understanding is correct, is that there is still a large startup cost to MMapDirectory, as it may take many queries before even most of a 20G index has been loaded into memory, and there may yet still be dark corners that only come up in edge-case queries that cause QTime spikes should these queries ever occur. I would like to ensure that, at startup, no query will incur disk-seek/read penalties. Is the right way to achieve this to copy the index to a ramfs (NOT ramdisk) mount and then continue to use MMapDirectory in Solr to read the index?
I am under the impression that when using ramfs (rather than ramdisk, for which this would not work) a file mmapped on a ramfs mount will actually share the same address space, and so would not incur the typical double-RAM overhead of mmapping a file in memory just to have yet another copy of the file created in a second memory location. Is this correct? If not, would you please point me to documentation stating otherwise (I haven't found much documentation either way).

I am not familiar with any double-RAM overhead from using mmap. It should be extraordinarily efficient, so much so that even when your index won't fit in RAM, performance is typically still excellent. Using an SSD instead of a spinning disk will increase performance across the board, until enough of the index is cached in RAM, after which it won't make a lot of difference.

My parting thoughts, with a general note to the masses: Do not try this if you are not absolutely sure your index will fit in memory! It will tend to cause WAY more problems than it will solve for most people with large indexes.

If you actually do have considerably more RAM than your index size, and you know that the index will never grow to where it might not fit, you can use a simple trick to get it all cached, even before running queries. Just read the entire contents of the index, discarding everything you read. There are two main OS variants to consider here, and both can be scripted, as noted below. Run the command twice to see the difference that caching makes for the second run. Note that an SSD would speed the first run of these commands up considerably:

*NIX (may work on a Mac too): cat /path/to/index/files/* > /dev/null

Windows: type C:\Path\To\Index\Files\* > NUL

Thanks, Shawn
RAMDirectory - still stores some docs on disk?
Hello, I am using RAMDirectory for running some experiments and came across a weird (well, for me) situation. Basically, after indexing in RAM, I killed the JVM and then restarted it after some time. I can still see some documents as indexed and searchable. I had indexed more than 2M docs before shutting down, and after restart there were around 15K docs still in the index. How could this happen? Is there some caching mechanism that backs up some amount(?) of the total index, or is it directly written to disk? (If so, how?)

- Zeki ("smart, but he doesn't work... if he worked, he'd do it...")
Solr 3.1 core with RAMDirectory isn't reloaded
Hello, We just tried core reloading on a freshly installed Solr 3.1.0 with RAMDirectoryFactory. It doesn't seem to happen. With the FSDirectoryFactory everything works fine. It looks like the RAMDirectoryFactory implementation caches the directory, and if it's available it doesn't really reopen it, thus not loading the updated index into memory. Can anyone comment on this? Should we implement our own RAMDirectoryFactory? Here is the code snippet from Solr 3.1.0. It looks a bit confusing.

public Directory open(String path) throws IOException {
  synchronized (RAMDirectoryFactory.class) {
    RefCntRamDirectory directory = directories.get(path);
    if (directory == null || !directory.isOpen()) {
      directory = (RefCntRamDirectory) openNew(path);
      directories.put(path, directory);
    } else {
      directory.incRef();
    }
    return directory;
  }
}

Shouldn't the directory reload the data whenever it gets an opening request (because incRef doesn't really do much except increment the reference count)? We expected it to reload the data (or at least check whether the data on disk has been updated) even if the path to the filesystem directory is the same.

Regards, Dmitry
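Dmitry's reading of the snippet can be demonstrated with a stripped-down model in plain Java (this is not Solr's code; the path is a hypothetical example): a factory that caches open instances by path hands back the same object on every subsequent open, so nothing is re-read from disk on a core reload.

```java
import java.util.HashMap;
import java.util.Map;

public class CachingFactorySketch {
    // Stand-in for the `directories` map in the quoted factory.
    static final Map<String, Object> directories = new HashMap<>();

    // Mirrors the shape of the quoted open(): reuse the cached instance
    // when one exists for the path, so a second open never reloads anything.
    static Object open(String path) {
        return directories.computeIfAbsent(path, p -> new Object());
    }

    public static void main(String[] args) {
        Object first = open("/var/solr/data/index");  // initial core load
        Object second = open("/var/solr/data/index"); // the "reload"
        System.out.println(first == second); // prints: true -- same cached instance
    }
}
```

The identity check is the whole point: as long as the cached instance is returned, whatever has changed on disk under that path is invisible to the reloaded core.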
Re: Ramdirectory
I have used this without issue. In the example solrconfig.xml, replace this line:

<directoryFactory name="DirectoryFactory" class="${solr.directoryFactory:solr.StandardDirectoryFactory}"/>

with this one:

<directoryFactory name="DirectoryFactory" class="solr.RAMDirectoryFactory"/>

Thanks, Matt Weber

On Thu, Feb 24, 2011 at 7:47 PM, Bill Bell billnb...@gmail.com wrote:

Thanks - yeah, that is why I asked how to use it. But I still don't know how to use it.

https://hudson.apache.org/hudson/job/Solr-3.x/javadoc/org/apache/solr/core/RAMDirectoryFactory.html
https://issues.apache.org/jira/browse/SOLR-465

<directoryProvider class="org.apache.lucene.store.RAMDirectory">
  <!-- Parameters as required by the implementation -->
</directoryProvider>

Is that right? Examples? Options? Where do I put that in solrconfig.xml? Do I put it in mainIndex/directoryProvider? I know that SOLR-465 is more generic, but https://issues.apache.org/jira/browse/SOLR-480 seems easier to use. Thanks.

On 2/24/11 6:21 PM, Chris Hostetter hossman_luc...@fucit.org wrote:

: I could not figure out how to set up the ramdirectory option in solrconfig.xml. Does anyone have an example for 1.4?

It wasn't an option in 1.4. As Koji had already mentioned in the other thread where you chimed in and asked about this, it was added in the 3.x branch... http://lucene.472066.n3.nabble.com/Question-Solr-Index-main-in-RAM-td2567166.html

-Hoss

--
Thanks, Matt Weber
Ramdirectory
I could not figure out how to set up the ramdirectory option in solrconfig.xml. Does anyone have an example for 1.4? Bill Bell Sent from mobile
Re: Ramdirectory
: I could not figure out how to set up the ramdirectory option in solrconfig.xml. Does anyone have an example for 1.4?

It wasn't an option in 1.4. As Koji had already mentioned in the other thread where you chimed in and asked about this, it was added in the 3.x branch... http://lucene.472066.n3.nabble.com/Question-Solr-Index-main-in-RAM-td2567166.html

-Hoss
Re: Ramdirectory
Thanks - yeah that is why I asked how to use it. But I still don't know how to use it. https://hudson.apache.org/hudson/job/Solr-3.x/javadoc/org/apache/solr/core/RAMDirectoryFactory.html https://issues.apache.org/jira/browse/SOLR-465

<directoryProvider class="org.apache.lucene.store.RAMDirectory">
  <!-- Parameters as required by the implementation -->
</directoryProvider>

Is that right? Examples? Options? Where do I put that in solrconfig.xml? Do I put it in mainIndex/directoryProvider? I know that SOLR-465 is more generic, but https://issues.apache.org/jira/browse/SOLR-480 seems easier to use. Thanks. On 2/24/11 6:21 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : I could not figure out how to setup the ramdirectory option in solrconfig.XML. Does anyone have an example for 1.4? it wasn't an option in 1.4. As Koji had already mentioned in the other thread where you chimed in and asked about this, it was added in the 3x branch... http://lucene.472066.n3.nabble.com/Question-Solr-Index-main-in-RAM-td2567166.html -Hoss
Re: Problems with RAMDirectory in Solr
: Hello. We just recently started using a RAMDirectory with Solr and found a : problem. When we restart Solr, the RAMDirectory is refreshed as : expected. However, when we use the snapinstaller script to update the index, : the RAMDirectory is not updated. Is there any way to update the RAMDirectory : after every commit? Thanks. The snapinstaller scripts are guaranteed to never work with a RAMDirectory based Solr setup -- all those scripts do is manage some hardlinks on the filesystem and trigger a commit -- nothing about them will copy from an FSDirectory to a RAMDirectory. Using the newer Java based replication *might* work, but i'm not sure about that -- i think it probably only works with the StandardDirectoryProvider, but it could possibly be made to work (if it doesn't already) by writing code to bulk copy the files from an FSDirectory opened on disk to the RAMDirectory. Honestly though: this seems like it defeats the point of using a RAMDirectory (totally transient memory only indexes). If your goal is to have an index read from disk that is then kept entirely in memory, why not just leave some memory available for the OS's file system cache and let it do its job? (add a few warming queries to ensure the index pages are read into RAM, and you should be in business) -Hoss -- http://lucenerevolution.org/ ... October 7-8, Boston http://bit.ly/stump-hoss ... Stump The Chump!
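For the warming-query approach mentioned here, one way it is typically wired up is a QuerySenderListener in solrconfig.xml; the query values below are placeholders and should be replaced with queries that touch your common fields and sorts:

```xml
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <!-- placeholder warming query -->
    <lst><str name="q">*:*</str><str name="start">0</str><str name="rows">10</str></lst>
  </arr>
</listener>
```

A matching listener on the "newSearcher" event keeps the cache warm after commits as well.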
Re: Configuring Solr to use RAMDirectory
On Thu, Dec 31, 2009 at 3:36 PM, dipti khullar dipti.khul...@gmail.com wrote: Hi Can somebody let me know if it's possible to configure RAMDirectory from solrconfig.xml. Although it's clearly mentioned in https://issues.apache.org/jira/browse/SOLR-465 by Mark that he has worked on it, I still couldn't find any such property in the config file in the latest Solr 1.4 download. Maybe I am overlooking some simple property. Any help would be appreciated. Note that there are things like replication which will not work if you are using a RAMDirectory. -- Regards, Shalin Shekhar Mangar.
Re: Configuring Solr to use RAMDirectory
Hi Dipti, Just out of curiosity, are you trying to use RAMDirectory for improvement in speed? I tried doing that and did not see any significant improvement. Would be nice to know what your experiment shows. - Raghu On Thu, Dec 31, 2009 at 4:17 PM, Erik Hatcher erik.hatc...@gmail.com wrote: It's possible, but requires a custom DirectoryFactory implementation. There isn't a built-in factory to construct a RAMDirectory. You wire it into solrconfig.xml this way:

<directoryFactory name="DirectoryFactory" class="[fully.qualified.classname]">
  <!-- Parameters as required by the implementation -->
</directoryFactory>

On Dec 31, 2009, at 5:06 AM, dipti khullar wrote: Hi Can somebody let me know if its possible to configure RAMDirectory from solrconfig.xml. Although its clearly mentioned in https://issues.apache.org/jira/browse/SOLR-465 by Mark that he has worked upon it, but still I couldn't find any such property in config file in Solr 1.4 latest download. May be I am overlooking some simple property. Any help would be appreciated. Thanks Dipti On Fri, Nov 20, 2009 at 2:27 PM, Andrey Klochkov akloch...@griddynamics.com wrote: I thought that SOLR-465 just does what is asked, i.e. one can use any Directory implementation including RAMDirectory. Thomas, take a look at it. On Thu, Nov 12, 2009 at 7:55 AM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: I think not out of the box, but look at SOLR-243 issue in JIRA. You could also put your index on ram disk (tmpfs), but it would be useless for writing to it. Note that when people ask about loading the whole index in memory explicitly, it's often a premature optimization attempt. 
Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message From: Thomas Nguyen thngu...@ign.com To: solr-user@lucene.apache.org Sent: Wed, November 11, 2009 8:46:11 PM Subject: Configuring Solr to use RAMDirectory Is it possible to configure Solr to fully load indexes in memory? I wasn't able to find any documentation about this on either their site or in the Solr 1.4 Enterprise Search Server book. -- Andrew Klochkov Senior Software Engineer, Grid Dynamics
Re: Configuring Solr to use RAMDirectory
Hi, Can somebody let me know if it's possible to configure RAMDirectory from solrconfig.xml? Although it's clearly mentioned in https://issues.apache.org/jira/browse/SOLR-465 by Mark that he has worked on it, I still couldn't find any such property in the config file in the latest Solr 1.4 download. Maybe I am overlooking some simple property. Any help would be appreciated. Thanks Dipti On Fri, Nov 20, 2009 at 2:27 PM, Andrey Klochkov akloch...@griddynamics.com wrote: I thought that SOLR-465 just does what is asked, i.e. one can use any Directory implementation including RAMDirectory. Thomas, take a look at it. On Thu, Nov 12, 2009 at 7:55 AM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: I think not out of the box, but look at SOLR-243 issue in JIRA. You could also put your index on ram disk (tmpfs), but it would be useless for writing to it. Note that when people ask about loading the whole index in memory explicitly, it's often a premature optimization attempt. Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message From: Thomas Nguyen thngu...@ign.com To: solr-user@lucene.apache.org Sent: Wed, November 11, 2009 8:46:11 PM Subject: Configuring Solr to use RAMDirectory Is it possible to configure Solr to fully load indexes in memory? I wasn't able to find any documentation about this on either their site or in the Solr 1.4 Enterprise Search Server book. -- Andrew Klochkov Senior Software Engineer, Grid Dynamics
Re: Configuring Solr to use RAMDirectory
It's possible, but requires a custom DirectoryFactory implementation. There isn't a built-in factory to construct a RAMDirectory. You wire it into solrconfig.xml this way:

<directoryFactory name="DirectoryFactory" class="[fully.qualified.classname]">
  <!-- Parameters as required by the implementation -->
</directoryFactory>

On Dec 31, 2009, at 5:06 AM, dipti khullar wrote: Hi Can somebody let me know if its possible to configure RAMDirectory from solrconfig.xml. Although its clearly mentioned in https://issues.apache.org/jira/browse/SOLR-465 by Mark that he has worked upon it, but still I couldn't find any such property in config file in Solr 1.4 latest download. May be I am overlooking some simple property. Any help would be appreciated. Thanks Dipti On Fri, Nov 20, 2009 at 2:27 PM, Andrey Klochkov akloch...@griddynamics.com wrote: I thought that SOLR-465 just does what is asked, i.e. one can use any Directory implementation including RAMDirectory. Thomas, take a look at it. On Thu, Nov 12, 2009 at 7:55 AM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: I think not out of the box, but look at SOLR-243 issue in JIRA. You could also put your index on ram disk (tmpfs), but it would be useless for writing to it. Note that when people ask about loading the whole index in memory explicitly, it's often a premature optimization attempt. Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message From: Thomas Nguyen thngu...@ign.com To: solr-user@lucene.apache.org Sent: Wed, November 11, 2009 8:46:11 PM Subject: Configuring Solr to use RAMDirectory Is it possible to configure Solr to fully load indexes in memory? I wasn't able to find any documentation about this on either their site or in the Solr 1.4 Enterprise Search Server book. -- Andrew Klochkov Senior Software Engineer, Grid Dynamics
Configuring Solr to use RAMDirectory
Is it possible to configure Solr to fully load indexes in memory? I wasn't able to find any documentation about this on either their site or in the Solr 1.4 Enterprise Search Server book.
Re: Configuring Solr to use RAMDirectory
I think not out of the box, but look at SOLR-243 issue in JIRA. You could also put your index on ram disk (tmpfs), but it would be useless for writing to it. Note that when people ask about loading the whole index in memory explicitly, it's often a premature optimization attempt. Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message From: Thomas Nguyen thngu...@ign.com To: solr-user@lucene.apache.org Sent: Wed, November 11, 2009 8:46:11 PM Subject: Configuring Solr to use RAMDirectory Is it possible to configure Solr to fully load indexes in memory? I wasn't able to find any documentation about this on either their site or in the Solr 1.4 Enterprise Search Server book.
Re: Does SOLR support RAMDirectory ?
What are you looking to do? Lucene inherently uses RAMDirectory under the covers during indexing, but not sure if that is your interest. -Grant On Jun 1, 2008, at 5:09 PM, s d wrote: Can i use RAMDirectory in SOLR? Thanks, S -- Grant Ingersoll http://www.lucidimagination.com Lucene Helpful Hints: http://wiki.apache.org/lucene-java/BasicsOfPerformance http://wiki.apache.org/lucene-java/LuceneFAQ
Does SOLR support RAMDirectory ?
Can I use RAMDirectory in SOLR? Thanks, S
Re: RAMDirectory
Hi, If you have enough RAM to load the whole index into RAM using RAMDirectory, then you could also just use tmpfs to load your index in RAM... tmpfs exists under Linux, Solaris, and BSD, I believe. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: s d [EMAIL PROTECTED] To: solr-user@lucene.apache.org Sent: Thursday, December 27, 2007 8:18:14 PM Subject: RAMDirectory Is there a way to use RAMDirectory with SOLR? If you can point me to documentation that would be great. Thanks, S
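To make the tmpfs suggestion concrete, a minimal sketch for Linux follows; the mount point, size, and index path are hypothetical, and mounting requires root:

```shell
# Create a RAM-backed filesystem (example size; must exceed the index size)
sudo mount -t tmpfs -o size=4g tmpfs /mnt/solr-ram

# Copy the existing index there (hypothetical source path)
cp -r /var/solr/data /mnt/solr-ram/data

# Then point Solr at it, e.g. via <dataDir>/mnt/solr-ram/data</dataDir>
# in solrconfig.xml, and restart Solr.
```

Note that tmpfs contents are lost on reboot, so the on-disk copy remains the source of truth, which matches the "useless for writing" caveat above.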
RAMDirectory
Is there a way to use RAMDirectory with SOLR? If you can point me to documentation that would be great. Thanks, S
RAMDirectory
Hi, Does anyone know how to use a RAM disk for the index? Thanks, Jae Joo
RE: RAMDirectory
Not yet implemented, hopefully soon: http://jira.terracotta.org/jira/browse/CDV-399 Jeryl Cook /^\ Pharaoh /^\ http://pharaohofkush.blogspot.com/ ..Act your age, and not your shoe size.. -Prince(1986) Date: Sat, 22 Sep 2007 15:33:58 -0400 From: [EMAIL PROTECTED] To: solr-user@lucene.apache.org Subject: RAMDirectory HI, Does any know how to use RAM disk for index? Thanks, Jae Joo