Re: Solr Replication is not Possible on RAMDirectory?

2012-11-06 Thread deniz
Erik Hatcher-4 wrote
 There's an open issue (with a patch!) that enables this, it seems:
 <https://issues.apache.org/jira/browse/SOLR-3911>
 
   Erik

well, the patch doesn't seem to do that... I have tried it and I'm still getting
some error lines about the directory types




-
Zeki ama calismiyor... Calissa yapar...
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Replication-is-not-Possible-on-RAMDirectory-tp4017766p4018670.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr Replication is not Possible on RAMDirectory?

2012-11-05 Thread Erik Hatcher
There's an open issue (with a patch!) that enables this, it seems: 
https://issues.apache.org/jira/browse/SOLR-3911

Erik

On Nov 5, 2012, at 07:41 , deniz wrote:

 Michael Della Bitta-2 wrote
 No, RAMDirectory doesn't work for replication. Use MMapDirectory... it
 ends up storing the index in RAM and more efficiently so, plus it's
 backed by disk.
 
 Just be sure to not set a big heap because MMapDirectory works outside of
 heap.
 
 for my tests, I don't think the index ended up in RAM with mmap... I gave
 4 GB of heap while using mmap and got a mapping error while indexing...
 while the index should be around 2 GB, RAM consumption was around
 300 MB... 
 
 Can anyone explain why RAMDirectory can't be used for replication? I can't see
 why the master would be set to use RAMDirectory while the replica uses MMap or
 some other directory. As far as I understand, SolrCloud does some kind of push
 from master to replica/slave... so why is it not possible to push from RAM to
 disk? If my logic is wrong, can someone please explain all this? 
 
 
 



Re: Solr Replication is not Possible on RAMDirectory?

2012-11-05 Thread deniz
Erik Hatcher-4 wrote
 There's an open issue (with a patch!) that enables this, it seems:
 <https://issues.apache.org/jira/browse/SOLR-3911>

 I will check it for sure, thank you Erik :) 


Shawn Heisey-4 wrote
 ... transparently mapping the files on disk to a virtual memory space and
 using excess RAM to cache that data and make it fast.  If you have
 enough extra memory (disk cache) to fit the entire index, the OS will
 never have to read any part of the index from disk more than once

so for the disk cache, are there any disks with 1 GB or more of cache? If I'm
not wrong, the disks around are mostly 16 or 32 MB cache ones (or am I checking
the wrong thing?). If so, that amount is definitely too small... 







Re: Solr Replication is not Possible on RAMDirectory?

2012-11-05 Thread Shawn Heisey

 Shawn Heisey-4 wrote
 ... transparently mapping the files on disk to a virtual memory space
 and
 using excess RAM to cache that data and make it fast.  If you have
 enough extra memory (disk cache) to fit the entire index, the OS will
 never have to read any part of the index from disk more than once

 so for the disk cache, are there any disks with 1 GB or more of cache? If I'm
 not wrong, the disks around are mostly 16 or 32 MB cache ones (or am I
 checking the wrong thing?). If so, that amount is definitely too small...


I am not talking about the cache on the actual disk drive, or even cache
on your hard drive controller. I am talking about the operating system
using RAM, specifically RAM not being used by programs, to cache data on
your hard drive. All modern operating systems do it, even the one made in
Redmond that people love to hate.

If you have 16 GB of RAM and all your programs use up 4.5 GB, you can
count on the OS using at least another half GB, so you have about 11 GB
left. The OS is going to put data that it reads and writes to/from your
disk in this space. If you start up another program that wants 2GB, the OS
will simply throw away 2 GB of data in its cache (it's still on the disk,
after all) and give that RAM to the new program.


Solr counts on this OS capability for good performance.

Thanks,
Shawn





Re: Solr Replication is not Possible on RAMDirectory?

2012-11-05 Thread Michael Della Bitta
Here's some reading:

http://en.wikipedia.org/wiki/Page_cache

Michael Della Bitta


Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271

www.appinions.com

Where Influence Isn’t a Game


On Mon, Nov 5, 2012 at 8:02 PM, deniz denizdurmu...@gmail.com wrote:
 Erik Hatcher-4 wrote
 There's an open issue (with a patch!) that enables this, it seems:
 <https://issues.apache.org/jira/browse/SOLR-3911>

  I will check it for sure, thank you Erik :)


 Shawn Heisey-4 wrote
 ... transparently mapping the files on disk to a virtual memory space and
 using excess RAM to cache that data and make it fast.  If you have
 enough extra memory (disk cache) to fit the entire index, the OS will
 never have to read any part of the index from disk more than once

 so for the disk cache, are there any disks with 1 GB or more of cache? If I'm
 not wrong, the disks around are mostly 16 or 32 MB cache ones (or am I checking
 the wrong thing?). If so, that amount is definitely too small...







Re: Solr Replication is not Possible on RAMDirectory?

2012-11-04 Thread deniz
Michael Della Bitta-2 wrote
 No, RAMDirectory doesn't work for replication. Use MMapDirectory... it
 ends up storing the index in RAM and more efficiently so, plus it's
 backed by disk.
 
 Just be sure to not set a big heap because MMapDirectory works outside of
 heap.

for my tests, I don't think the index ended up in RAM with mmap... I gave
4 GB of heap while using mmap and got a mapping error while indexing...
while the index should be around 2 GB, RAM consumption was around
300 MB... 

Can anyone explain why RAMDirectory can't be used for replication? I can't see
why the master would be set to use RAMDirectory while the replica uses MMap or
some other directory. As far as I understand, SolrCloud does some kind of push
from master to replica/slave... so why is it not possible to push from RAM to
disk? If my logic is wrong, can someone please explain all this? 





Re: Solr Replication is not Possible on RAMDirectory?

2012-11-04 Thread Shawn Heisey

On 11/4/2012 11:41 PM, deniz wrote:

Michael Della Bitta-2 wrote

No, RAMDirectory doesn't work for replication. Use MMapDirectory... it
ends up storing the index in RAM and more efficiently so, plus it's
backed by disk.

Just be sure to not set a big heap because MMapDirectory works outside of
heap.

for my tests, I don't think the index ended up in RAM with mmap... I gave
4 GB of heap while using mmap and got a mapping error while indexing...
while the index should be around 2 GB, RAM consumption was around
300 MB...


With mmap, the ram is not actually consumed by your application, which 
in this case is Java. The operating system is handling it -- 
transparently mapping the files on disk to a virtual memory space and 
using excess RAM to cache that data and make it fast.  If you have 
enough extra memory (disk cache) to fit the entire index, the OS will 
never have to read any part of the index from disk more than once.  With 
RAMDirectory, the index has to go into the Java heap, which is much less 
efficient at memory management than the native operating system.
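The mechanism described above can be seen with plain `java.nio`, which is what MMapDirectory uses under the hood. The following is a minimal, hypothetical sketch (a scratch temp file stands in for a Lucene segment file; none of this is Solr code): `map()` only reserves virtual address space, and the bytes are faulted in by the OS as they are touched, landing in the OS page cache rather than on the JVM heap.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

public class MmapDemo {
    public static void main(String[] args) throws IOException {
        // Scratch file standing in for an index segment file (hypothetical).
        Path file = Files.createTempFile("segment", ".dat");
        byte[] data = new byte[1 << 20];   // 1 MiB
        Arrays.fill(data, (byte) 1);
        Files.write(file, data);

        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // map() just reserves address space; no pages are read yet.
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            // Touching one byte per 4 KiB page faults the pages in; the OS
            // keeps them in its page cache, outside the JVM heap.
            long sum = 0;
            for (int i = 0; i < buf.limit(); i += 4096) {
                sum += buf.get(i);
            }
            System.out.println("bytes mapped: " + buf.limit() + ", pages touched: " + sum);
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

Since every sampled byte is 1, the second number also counts the 4 KiB pages touched (256 for a 1 MiB file); none of that memory is subject to Java garbage collection.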



Can anyone explain why RAMDirectory can't be used for replication? I can't see
why the master would be set to use RAMDirectory while the replica uses MMap or
some other directory. As far as I understand, SolrCloud does some kind of push
from master to replica/slave... so why is it not possible to push from RAM to
disk? If my logic is wrong, can someone please explain all this?


With RAMDirectory, there are no files to copy.  Replication does not 
copy Solr (Lucene) documents, it copies files.


Thanks,
Shawn



Re: Solr Replication is not Possible on RAMDirectory?

2012-11-02 Thread Michael Della Bitta
 so it is not possible to use RAMdirectory for replication?

No, RAMDirectory doesn't work for replication. Use MMapDirectory... it
ends up storing the index in RAM and more efficiently so, plus it's
backed by disk.

Just be sure to not set a big heap because MMapDirectory works outside of heap.

Michael Della Bitta




On Fri, Nov 2, 2012 at 4:44 AM, deniz denizdurmu...@gmail.com wrote:
 so it is not possible to use RAMdirectory for replication?


Solr Replication is not Possible on RAMDirectory?

2012-11-02 Thread deniz
Hi all, I am trying to set up a master/slave system by following this page:
http://wiki.apache.org/solr/SolrReplication

I was able to set it up and did some experiments with it, but when I try to
set the index to RAMDirectory, I get errors during indexing.

While master and slave both use a non-RAM directory, everything is
okay... but when I try to use RAMDirectory on both, I get the error below: 

16:40:31.626 [qtp28208563-24] ERROR org.apache.solr.core.SolrCore -
org.apache.lucene.index.IndexNotFoundException: no segments* file found in
org.apache.lucene.store.RAMDirectory@7e693f
lockFactory=org.apache.lucene.store.NativeFSLockFactory@92c787: files: []
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:741)
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:630)
    at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:343)
    at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:639)
    at org.apache.solr.update.SolrIndexWriter.init(SolrIndexWriter.java:75)
    at org.apache.solr.update.SolrIndexWriter.create(SolrIndexWriter.java:62)
    at org.apache.solr.update.DefaultSolrCoreState.createMainIndexWriter(DefaultSolrCoreState.java:191)
    at org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:77)
    at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:511)
    at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:87)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:1016)
    at org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:157)
    at org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1699)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:455)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:276)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:250)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:149)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111)
    at org.eclipse.jetty.server.Server.handle(Server.java:351)
    at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:454)
    at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:47)
    at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:890)
    at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:944)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:634)
    at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:230)
    at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:66)
    at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:254)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:599)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:534)
    at java.lang.Thread.run(Unknown Source)

16:40:31.627 [qtp28208563-24] ERROR o.a.solr.servlet.SolrDispatchFilter -
null:org.apache.lucene.index.IndexNotFoundException: no segments* file found
in org.apache.lucene.store.RAMDirectory@7e693f

Re: MMapDirectory, demand paging, lazy evaluation, ramfs and the much maligned RAMDirectory (oh my!)

2012-10-25 Thread Erick Erickson
You may well have already seen this, but in case not:
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

FWIW,
Erick

On Wed, Oct 24, 2012 at 9:51 PM, Shawn Heisey s...@elyograg.org wrote:
 On 10/24/2012 6:29 PM, Aaron Daubman wrote:

 Let me be clear that that I am not interested in RAMDirectory.
 However, I would like to better understand the oft-recommended and
 currently-default MMapDirectory, and what the tradeoffs would be, when
 using a 64-bit linux server dedicated to this single solr instance,
 with plenty (more than 2x index size) of RAM, of storing the index
 files on SSDs versus on a ramfs mount.

 I understand that using the default MMapDirectory will allow caching
 of the index in-memory, however, my understanding is that mmaped files
 are demand-paged (lazy evaluated), meaning that only after a block is
 read from disk will it be paged into memory - is this correct? is it
 actually block-by-block (page size by page size?) - any pointers to
 decent documentation on this regardless of the effectiveness of the
 approach would be appreciated...


 You are correct that the data must have just been accessed to be in the disk
 cache. This does, however, include writes -- so any data that gets indexed will
 be in the cache because it has just been written.  I do believe that it is
 read in one page block at a time, and I believe that the blocks are 4k in
 size.


 My concern with using MMapDirectory for an index stored on disk (even
 SSDs), if my understanding is correct, is that there is still a large
 startup cost to MMapDirectory, as it may take many queries before even
 most of a 20G index has been loaded into memory, and there may yet
 still be dark corners that only come up in edge-case queries that
 cause QTime spikes should these queries ever occur.

 I would like to ensure that, at startup, no query will incur
 disk-seek/read penalties.

 Is the right way to achieve this to copy the index to a ramfs (NOT
 ramdisk) mount and then continue to use MMapDirectory in Solr to read
 the index? I am under the impression that when using ramfs (rather
 than ramdisk, for which this would not work) a file mmaped on a ramfs
 mount will actually share the same address space, and so would not
 incur the typical double-RAM overhead of mmapping a file in memory just
 to have yet another copy of the file created in a second memory
 location. Is this correct? If not, would you please point me to
 documentation stating otherwise (I haven't found much documentation
 either way).


 I am not familiar with any double-ram overhead from using mmap.  It should
 be extraordinarily efficient, so much so that even when your index won't fit
 in RAM, performance is typically still excellent.  Using an SSD instead of a
 spinning disk will increase performance across the board, until enough of
 the index is cached in RAM, after which it won't make a lot of difference.

 My parting thoughts, with a general note to the masses: Do not try this if
 you are not absolutely sure your index will fit in memory!  It will tend to
 cause WAY more problems than it will solve for most people with large
 indexes.

 If you actually do have considerably more RAM than your index size, and you
 know that the index will never grow to where it might not fit, you can use a
 simple trick to get it all cached, even before running queries.  Just read
 the entire contents of the index, discarding everything you read.  There are
 two main OS variants to consider here, and both can be scripted, as noted
 below.  Run the command twice to see the difference that caching makes for
 the second run.  Note that an SSD would speed the first run of these
 commands up considerably:

 *NIX (may work on a mac too):
 cat /path/to/index/files/* > /dev/null

 Windows:
 type C:\Path\To\Index\Files\* > NUL

 Thanks,
 Shawn



Re: MMapDirectory, demand paging, lazy evaluation, ramfs and the much maligned RAMDirectory (oh my!)

2012-10-24 Thread François Schiettecatte
Aaron

The best way to make sure the index is cached by the OS is to just cat it on 
startup:

cat `find /path/to/solr/index` > /dev/null

Just make sure your index is smaller than RAM otherwise data will be rotated 
out.

Memory mapping is built on the virtual memory system, and I suspect that ramfs 
is too, so I doubt very much that copying your index to ramfs will help at all. 
Sidebar - a while ago I did a bunch of testing copying indices to shared memory 
(/dev/shm in this case) and there was no advantage compared to just accessing 
indices on disc when using memory mapping once the system got to a steady state.

There has been a lot written about this topic on the list. Basically it comes 
down to using MMapDirectory (which is the default), making sure your index is 
smaller than your RAM, and allocating just enough memory to the Java VM. That 
last part requires some benchmarking because it is so workload dependent.

Best regards

François

On Oct 24, 2012, at 8:29 PM, Aaron Daubman daub...@gmail.com wrote:

 Greetings,
 
 Most times I've seen the topic of storing one's index in memory, it
 seems the asker was referring (or understood to be referring) to the
 (in)famous "not intended to work with huge indexes" Solr RAMDirectory.
 
 Let me be clear that that I am not interested in RAMDirectory.
 However, I would like to better understand the oft-recommended and
 currently-default MMapDirectory, and what the tradeoffs would be, when
 using a 64-bit linux server dedicated to this single solr instance,
 with plenty (more than 2x index size) of RAM, of storing the index
 files on SSDs versus on a ramfs mount.
 
 I understand that using the default MMapDirectory will allow caching
 of the index in-memory, however, my understanding is that mmaped files
 are demand-paged (lazy evaluated), meaning that only after a block is
 read from disk will it be paged into memory - is this correct? is it
 actually block-by-block (page size by page size?) - any pointers to
 decent documentation on this regardless of the effectiveness of the
 approach would be appreciated...
 
 My concern with using MMapDirectory for an index stored on disk (even
 SSDs), if my understanding is correct, is that there is still a large
 startup cost to MMapDirectory, as it may take many queries before even
 most of a 20G index has been loaded into memory, and there may yet
 still be dark corners that only come up in edge-case queries that
 cause QTime spikes should these queries ever occur.
 
 I would like to ensure that, at startup, no query will incur
 disk-seek/read penalties.
 
 Is the right way to achieve this to copy the index to a ramfs (NOT
 ramdisk) mount and then continue to use MMapDirectory in Solr to read
 the index? I am under the impression that when using ramfs (rather
 than ramdisk, for which this would not work) a file mmaped on a ramfs
 mount will actually share the same address space, and so would not
  incur the typical double-RAM overhead of mmapping a file in memory just
  to have yet another copy of the file created in a second memory
 location. Is this correct? If not, would you please point me to
 documentation stating otherwise (I haven't found much documentation
 either way).
 
 Finally, given the desire to be quick at startup with a large index
 that will still easily fit within a system's memory, am I thinking
 about this wrong or are there other better approaches?
 
 Thanks, as always,
 Aaron



Re: MMapDirectory, demand paging, lazy evaluation, ramfs and the much maligned RAMDirectory (oh my!)

2012-10-24 Thread Mark Miller
Was going to say the same thing. It's also usually a good idea to reduce paging 
(e.g. set swappiness to 0 on Linux).

- Mark

On Oct 24, 2012, at 9:36 PM, François Schiettecatte fschietteca...@gmail.com 
wrote:

 Aaron
 
 The best way to make sure the index is cached by the OS is to just cat it on 
 startup:
 
  cat `find /path/to/solr/index` > /dev/null
 
 Just make sure your index is smaller than RAM otherwise data will be rotated 
 out.
 
 Memory mapping is built on the virtual memory system, and I suspect that 
 ramfs is too, so I doubt very much that copying your index to ramfs will help 
 at all. Sidebar - a while ago I did a bunch of testing copying indices to 
 shared memory (/dev/shm in this case) and there was no advantage compared to 
 just accessing indices on disc when using memory mapping once the system got 
 to a steady state.
 
 There has been a lot written about this topic on the list. Basically it comes 
 down to using MMapDirectory (which is the default), making sure your index is 
 smaller than your RAM, and allocating just enough memory to the Java VM. That 
 last part requires some benchmarking because it is so workload dependent.
 
 Best regards
 
 François
 
 On Oct 24, 2012, at 8:29 PM, Aaron Daubman daub...@gmail.com wrote:
 
 Greetings,
 
 Most times I've seen the topic of storing one's index in memory, it
 seems the asker was referring (or understood to be referring) to the
  (in)famous "not intended to work with huge indexes" Solr RAMDirectory.
 
 Let me be clear that that I am not interested in RAMDirectory.
 However, I would like to better understand the oft-recommended and
 currently-default MMapDirectory, and what the tradeoffs would be, when
 using a 64-bit linux server dedicated to this single solr instance,
 with plenty (more than 2x index size) of RAM, of storing the index
 files on SSDs versus on a ramfs mount.
 
 I understand that using the default MMapDirectory will allow caching
 of the index in-memory, however, my understanding is that mmaped files
 are demand-paged (lazy evaluated), meaning that only after a block is
 read from disk will it be paged into memory - is this correct? is it
 actually block-by-block (page size by page size?) - any pointers to
 decent documentation on this regardless of the effectiveness of the
 approach would be appreciated...
 
 My concern with using MMapDirectory for an index stored on disk (even
 SSDs), if my understanding is correct, is that there is still a large
 startup cost to MMapDirectory, as it may take many queries before even
 most of a 20G index has been loaded into memory, and there may yet
 still be dark corners that only come up in edge-case queries that
 cause QTime spikes should these queries ever occur.
 
 I would like to ensure that, at startup, no query will incur
 disk-seek/read penalties.
 
 Is the right way to achieve this to copy the index to a ramfs (NOT
 ramdisk) mount and then continue to use MMapDirectory in Solr to read
 the index? I am under the impression that when using ramfs (rather
 than ramdisk, for which this would not work) a file mmaped on a ramfs
 mount will actually share the same address space, and so would not
  incur the typical double-RAM overhead of mmapping a file in memory just
  to have yet another copy of the file created in a second memory
 location. Is this correct? If not, would you please point me to
 documentation stating otherwise (I haven't found much documentation
 either way).
 
 Finally, given the desire to be quick at startup with a large index
 that will still easily fit within a system's memory, am I thinking
 about this wrong or are there other better approaches?
 
 Thanks, as always,
Aaron
 



Re: MMapDirectory, demand paging, lazy evaluation, ramfs and the much maligned RAMDirectory (oh my!)

2012-10-24 Thread Shawn Heisey

On 10/24/2012 6:29 PM, Aaron Daubman wrote:

Let me be clear that that I am not interested in RAMDirectory.
However, I would like to better understand the oft-recommended and
currently-default MMapDirectory, and what the tradeoffs would be, when
using a 64-bit linux server dedicated to this single solr instance,
with plenty (more than 2x index size) of RAM, of storing the index
files on SSDs versus on a ramfs mount.

I understand that using the default MMapDirectory will allow caching
of the index in-memory, however, my understanding is that mmaped files
are demand-paged (lazy evaluated), meaning that only after a block is
read from disk will it be paged into memory - is this correct? is it
actually block-by-block (page size by page size?) - any pointers to
decent documentation on this regardless of the effectiveness of the
approach would be appreciated...


You are correct that the data must have just been accessed to be in the 
disk cache. This does, however, include writes -- so any data that gets 
indexed will be in the cache because it has just been written.  I do 
believe that it is read in one page block at a time, and I believe that 
the blocks are 4k in size.



My concern with using MMapDirectory for an index stored on disk (even
SSDs), if my understanding is correct, is that there is still a large
startup cost to MMapDirectory, as it may take many queries before even
most of a 20G index has been loaded into memory, and there may yet
still be dark corners that only come up in edge-case queries that
cause QTime spikes should these queries ever occur.

I would like to ensure that, at startup, no query will incur
disk-seek/read penalties.

Is the right way to achieve this to copy the index to a ramfs (NOT
ramdisk) mount and then continue to use MMapDirectory in Solr to read
the index? I am under the impression that when using ramfs (rather
than ramdisk, for which this would not work) a file mmaped on a ramfs
mount will actually share the same address space, and so would not
incur the typical double-RAM overhead of mmapping a file in memory just
to have yet another copy of the file created in a second memory
location. Is this correct? If not, would you please point me to
documentation stating otherwise (I haven't found much documentation
either way).


I am not familiar with any double-ram overhead from using mmap.  It 
should be extraordinarily efficient, so much so that even when your 
index won't fit in RAM, performance is typically still excellent.  Using 
an SSD instead of a spinning disk will increase performance across the 
board, until enough of the index is cached in RAM, after which it won't 
make a lot of difference.


My parting thoughts, with a general note to the masses: Do not try this 
if you are not absolutely sure your index will fit in memory!  It will 
tend to cause WAY more problems than it will solve for most people with 
large indexes.


If you actually do have considerably more RAM than your index size, and 
you know that the index will never grow to where it might not fit, you 
can use a simple trick to get it all cached, even before running 
queries.  Just read the entire contents of the index, discarding 
everything you read.  There are two main OS variants to consider here, 
and both can be scripted, as noted below.  Run the command twice to see 
the difference that caching makes for the second run.  Note that an SSD 
would speed the first run of these commands up considerably:


*NIX (may work on a mac too):
cat /path/to/index/files/* > /dev/null

Windows:
type C:\Path\To\Index\Files\* > NUL
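The same read-and-discard trick can be sketched cross-platform in Java (this class and its names are hypothetical, not part of Solr): stream every file in the index directory and throw the bytes away, leaving the data in the OS page cache as a side effect.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class WarmIndexCache {
    // Read every regular file under dir and discard the bytes; the side
    // effect is that the OS page cache now holds the index data.
    static long warm(Path dir) throws IOException {
        long total = 0;
        byte[] buf = new byte[1 << 16];
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path p : files) {
                if (!Files.isRegularFile(p)) continue;
                try (InputStream in = Files.newInputStream(p)) {
                    int n;
                    while ((n = in.read(buf)) != -1) {
                        total += n;   // discard the data, count the bytes
                    }
                }
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical index directory; pass your real index path as an argument.
        Path dir = Paths.get(args.length > 0 ? args[0] : "index");
        System.out.println("warmed " + warm(dir) + " bytes");
    }
}
```

As with the shell commands, run it twice and compare wall-clock times to see the effect of the cache on the second pass.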

Thanks,
Shawn



RAMDirectory - still stores some docs on disk?

2012-10-21 Thread deniz
Hello 

I am using RAMDirectory to run some experiments and ran into a weird (well,
for me) situation. Basically, after indexing into RAM, I killed the JVM and
then restarted it after some time. I can still see some documents indexed and
searchable. I had indexed more than 2M docs before shutting down, and after
restart there were around 15K docs still in the index.

So how could this happen? Is there some caching mechanism which backs up some
amount(?) of the total index, or is it written directly to disk? (If so,
how?)








Solr 3.1 core with RAMDirectory isn't reloaded

2011-04-14 Thread nskmda
Hello,

We just tried core reloading on a freshly installed Solr 3.1.0 with
RAMDirectoryFactory. It doesn't seem to happen.
With FSDirectoryFactory everything works fine.

It looks like the RAMDirectoryFactory implementation caches the directory,
and if it is available it doesn't really reopen it, so the updated index is
never loaded into memory.

Can anyone comment on this?
Should we implement our own RAMDirectoryFactory?

Here is the code snippet from Solr 3.1.0. It looks a bit confusing.

public Directory open(String path) throws IOException {
  synchronized (RAMDirectoryFactory.class) {
    RefCntRamDirectory directory = directories.get(path);
    if (directory == null || !directory.isOpen()) {
      directory = (RefCntRamDirectory) openNew(path);
      directories.put(path, directory);
    } else {
      directory.incRef();
    }

    return directory;
  }
}


Shouldn't the directory reload the data whenever it gets an open request
(incRef doesn't really do much except increment the reference count)?
We expected it to reload the data (or at least check whether the data on disk
has been updated) even if the path to the filesystem directory is the same.

Regards,
Dmitry




Re: Ramdirectory

2011-02-25 Thread Matt Weber
I have used this without issue.  In the example solrconfig.xml replace
this line:

<directoryFactory name="DirectoryFactory"
 class="${solr.directoryFactory:solr.StandardDirectoryFactory}"/>

with this one:

<directoryFactory name="DirectoryFactory" class="solr.RAMDirectoryFactory"/>

Thanks,
Matt Weber

On Thu, Feb 24, 2011 at 7:47 PM, Bill Bell billnb...@gmail.com wrote:
 Thanks - yeah that is why I asked how to use it. But I still don't know
 how to use it.

 https://hudson.apache.org/hudson/job/Solr-3.x/javadoc/org/apache/solr/core/RAMDirectoryFactory.html


 https://issues.apache.org/jira/browse/SOLR-465

 <directoryProvider class="org.apache.lucene.store.RAMDirectory">
   <!-- Parameters as required by the implementation -->
 </directoryProvider>


 Is that right? Examples? Options?

 Where do I put that in solrconfig.xml? Do I put it in
 mainIndex/directoryProvider?

 I know that SOLR-465 is more generic, but
 https://issues.apache.org/jira/browse/SOLR-480 seems easier to use.



 Thanks.


 On 2/24/11 6:21 PM, Chris Hostetter hossman_luc...@fucit.org wrote:


: I could not figure out how to set up the RAMDirectory option in
solrconfig.xml. Does anyone have an example for 1.4?

it wasn't an option in 1.4.

as Koji had already mentioned in the other thread where you chimed in
and asked about this, it was added in the 3x branch...

http://lucene.472066.n3.nabble.com/Question-Solr-Index-main-in-RAM-td2567166.html



-Hoss






-- 
Thanks,
Matt Weber


Ramdirectory

2011-02-24 Thread Bill Bell
I could not figure out how to set up the RAMDirectory option in solrconfig.xml.
Does anyone have an example for 1.4?

Bill Bell
Sent from mobile



Re: Ramdirectory

2011-02-24 Thread Chris Hostetter

: I could not figure out how to set up the RAMDirectory option in
solrconfig.xml. Does anyone have an example for 1.4?

it wasn't an option in 1.4.

as Koji had already mentioned in the other thread where you chimed in
and asked about this, it was added in the 3x branch...

http://lucene.472066.n3.nabble.com/Question-Solr-Index-main-in-RAM-td2567166.html



-Hoss


Re: Ramdirectory

2011-02-24 Thread Bill Bell
Thanks - yeah that is why I asked how to use it. But I still don't know
how to use it.

https://hudson.apache.org/hudson/job/Solr-3.x/javadoc/org/apache/solr/core/RAMDirectoryFactory.html


https://issues.apache.org/jira/browse/SOLR-465

<directoryProvider class="org.apache.lucene.store.RAMDirectory">
  <!-- Parameters as required by the implementation -->
</directoryProvider>


Is that right? Examples? Options?

Where do I put that in solrconfig.xml? Do I put it in
mainIndex/directoryProvider?

I know that SOLR-465 is more generic, but
https://issues.apache.org/jira/browse/SOLR-480 seems easier to use.



Thanks.


On 2/24/11 6:21 PM, Chris Hostetter hossman_luc...@fucit.org wrote:


: I could not figure out how to set up the RAMDirectory option in
solrconfig.xml. Does anyone have an example for 1.4?

it wasn't an option in 1.4.

as Koji had already mentioned in the other thread where you chimed in
and asked about this, it was added in the 3x branch...

http://lucene.472066.n3.nabble.com/Question-Solr-Index-main-in-RAM-td2567166.html



-Hoss




Re: Problems with RAMDirectory in Solr

2010-10-01 Thread Chris Hostetter

: Hello. We just recently started using a RAMDirectory with Solr and found a
: problem. When we restart Solr, the RAMDirectory is refreshed as
: expected. However, when we use the snapinstaller script to update the index,
: the RAMDirectory is not updated. Is there any way to update the RAMDirectory
: after every commit? Thanks.

The snapinstaller scripts are guaranteed never to work with a RAMDirectory-based
Solr setup -- all those scripts do is manage some hardlinks on the
filesystem and trigger a commit -- nothing about them will copy from an
FSDirectory to a RAMDirectory.

Using the newer Java-based replication *might* work, but I'm not sure
about that -- I think it probably only works with the
StandardDirectoryProvider, but it could possibly be made to work (if it
doesn't already) by writing code to bulk-copy the files from an
FSDirectory opened on disk to the RAMDirectory.
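The bulk copy described here amounts to reading every index file from the on-disk directory into memory. The sketch below illustrates that idea with plain java.nio only; it is purely illustrative and does not use Lucene's Directory API (a real implementation would copy between Lucene Directory instances, not raw files and maps).

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

public class BulkCopyToRam {
    // Read every regular file under dir into an in-memory map of
    // file name -> bytes, i.e. a crude "RAMDirectory" loaded from disk.
    static Map<String, byte[]> loadIntoRam(Path dir) throws IOException {
        Map<String, byte[]> ram = new HashMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path f : files) {
                if (Files.isRegularFile(f)) {
                    ram.put(f.getFileName().toString(), Files.readAllBytes(f));
                }
            }
        }
        return ram;
    }

    public static void main(String[] args) throws IOException {
        // Simulate an index directory with one file, then copy it into memory.
        Path dir = Files.createTempDirectory("index");
        Files.write(dir.resolve("segments_1"), new byte[] {1, 2, 3});
        Map<String, byte[]> ram = loadIntoRam(dir);
        System.out.println(ram.keySet());  // [segments_1]
    }
}
```

Such a copy would have to be re-run after every snapshot install or commit, which is exactly the step the replication scripts never perform.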

Honestly though: this seems like it defeats the point of using a
RAMDirectory (totally transient, memory-only indexes).  If your goal is to
have an index read from disk that is then kept entirely in memory, why
not just leave some memory available for the OS's file system cache and
let it do its job?  (Add a few warming queries to ensure the index pages
are read into RAM, and you should be in business.)



-Hoss

--
http://lucenerevolution.org/  ...  October 7-8, Boston
http://bit.ly/stump-hoss  ...  Stump The Chump!



Re: Configuring Solr to use RAMDirectory

2010-01-04 Thread Shalin Shekhar Mangar
On Thu, Dec 31, 2009 at 3:36 PM, dipti khullar dipti.khul...@gmail.com wrote:

 Hi

 Can somebody let me know if it's possible to configure RAMDirectory from
 solrconfig.xml? Although it's clearly mentioned in
 https://issues.apache.org/jira/browse/SOLR-465 by Mark that he has worked
 on it, I still couldn't find any such property in the config file in the
 Solr 1.4 download.
 Maybe I am overlooking some simple property. Any help would be
 appreciated.


Note that there are things like replication which will not work if you are
using a RAMDirectory.

-- 
Regards,
Shalin Shekhar Mangar.


Re: Configuring Solr to use RAMDirectory

2010-01-02 Thread Raghuveer Kancherla
Hi Dipti,
Just out of curiosity, are you trying to use RAMDirectory for improvement in
speed? I tried doing that and did not see any significant improvement. Would
be nice to know what your experiment shows.

- Raghu


On Thu, Dec 31, 2009 at 4:17 PM, Erik Hatcher erik.hatc...@gmail.com wrote:

 It's possible, but requires a custom DirectoryFactory implementation.
 There isn't a built-in factory to construct a RAMDirectory.  You wire it
 into solrconfig.xml this way:

   <directoryFactory name="DirectoryFactory"
       class="[fully.qualified.classname]">
     <!-- Parameters as required by the implementation -->
   </directoryFactory>



 On Dec 31, 2009, at 5:06 AM, dipti khullar wrote:

  Hi

  Can somebody let me know if it's possible to configure RAMDirectory from
  solrconfig.xml? Although it's clearly mentioned in
  https://issues.apache.org/jira/browse/SOLR-465 by Mark that he has worked
  on it, I still couldn't find any such property in the config file in the
  Solr 1.4 download.
  Maybe I am overlooking some simple property. Any help would be
  appreciated.


 Thanks
 Dipti

 On Fri, Nov 20, 2009 at 2:27 PM, Andrey Klochkov 
 akloch...@griddynamics.com

 wrote:


  I thought that SOLR-465 just does what is asked, i.e. one can use any
 Directory implementation including RAMDirectory. Thomas, take a look at
 it.

 On Thu, Nov 12, 2009 at 7:55 AM, Otis Gospodnetic 
 otis_gospodne...@yahoo.com wrote:

  I think not out of the box, but look at SOLR-243 issue in JIRA.

 You could also put your index on ram disk (tmpfs), but it would be

 useless

 for writing to it.

 Note that when people ask about loading the whole index in memory
 explicitly, it's often a premature optimization attempt.

 Otis
 --
 Sematext is hiring -- http://sematext.com/about/jobs.html?mls
 Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



 - Original Message 

 From: Thomas Nguyen thngu...@ign.com
 To: solr-user@lucene.apache.org
 Sent: Wed, November 11, 2009 8:46:11 PM
 Subject: Configuring Solr to use RAMDirectory

 Is it possible to configure Solr to fully load indexes in memory?  I
 wasn't able to find any documentation about this on either their site

 or

 in the Solr 1.4 Enterprise Search Server book.





 --
 Andrew Klochkov
 Senior Software Engineer,
 Grid Dynamics





Re: Configuring Solr to use RAMDirectory

2009-12-31 Thread dipti khullar
Hi

Can somebody let me know if it's possible to configure RAMDirectory from
solrconfig.xml? Although it's clearly mentioned in
https://issues.apache.org/jira/browse/SOLR-465 by Mark that he has worked
on it, I still couldn't find any such property in the config file in the
Solr 1.4 download.
Maybe I am overlooking some simple property. Any help would be appreciated.


Thanks
Dipti

On Fri, Nov 20, 2009 at 2:27 PM, Andrey Klochkov akloch...@griddynamics.com
 wrote:

 I thought that SOLR-465 just does what is asked, i.e. one can use any
 Directory implementation including RAMDirectory. Thomas, take a look at it.

 On Thu, Nov 12, 2009 at 7:55 AM, Otis Gospodnetic 
 otis_gospodne...@yahoo.com wrote:

  I think not out of the box, but look at SOLR-243 issue in JIRA.
 
  You could also put your index on ram disk (tmpfs), but it would be
 useless
  for writing to it.
 
  Note that when people ask about loading the whole index in memory
  explicitly, it's often a premature optimization attempt.
 
  Otis
  --
  Sematext is hiring -- http://sematext.com/about/jobs.html?mls
  Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
 
 
 
  - Original Message 
   From: Thomas Nguyen thngu...@ign.com
   To: solr-user@lucene.apache.org
   Sent: Wed, November 11, 2009 8:46:11 PM
   Subject: Configuring Solr to use RAMDirectory
  
   Is it possible to configure Solr to fully load indexes in memory?  I
   wasn't able to find any documentation about this on either their site
 or
   in the Solr 1.4 Enterprise Search Server book.
 
 


 --
 Andrew Klochkov
 Senior Software Engineer,
 Grid Dynamics



Re: Configuring Solr to use RAMDirectory

2009-12-31 Thread Erik Hatcher
It's possible, but requires a custom DirectoryFactory implementation.
There isn't a built-in factory to construct a RAMDirectory.  You wire
it into solrconfig.xml this way:

  <directoryFactory name="DirectoryFactory"
      class="[fully.qualified.classname]">
    <!-- Parameters as required by the implementation -->
  </directoryFactory>


On Dec 31, 2009, at 5:06 AM, dipti khullar wrote:


Hi

Can somebody let me know if it's possible to configure RAMDirectory from
solrconfig.xml? Although it's clearly mentioned in
https://issues.apache.org/jira/browse/SOLR-465 by Mark that he has worked
on it, I still couldn't find any such property in the config file in the
Solr 1.4 download.
Maybe I am overlooking some simple property. Any help would be appreciated.



Thanks
Dipti

On Fri, Nov 20, 2009 at 2:27 PM, Andrey Klochkov akloch...@griddynamics.com

wrote:



I thought that SOLR-465 just does what is asked, i.e. one can use any
Directory implementation including RAMDirectory. Thomas, take a  
look at it.


On Thu, Nov 12, 2009 at 7:55 AM, Otis Gospodnetic 
otis_gospodne...@yahoo.com wrote:


I think not out of the box, but look at SOLR-243 issue in JIRA.

You could also put your index on ram disk (tmpfs), but it would be

useless

for writing to it.

Note that when people ask about loading the whole index in memory
explicitly, it's often a premature optimization attempt.

Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



- Original Message 

From: Thomas Nguyen thngu...@ign.com
To: solr-user@lucene.apache.org
Sent: Wed, November 11, 2009 8:46:11 PM
Subject: Configuring Solr to use RAMDirectory

Is it possible to configure Solr to fully load indexes in  
memory?  I
wasn't able to find any documentation about this on either their  
site

or

in the Solr 1.4 Enterprise Search Server book.






--
Andrew Klochkov
Senior Software Engineer,
Grid Dynamics





Configuring Solr to use RAMDirectory

2009-11-11 Thread Thomas Nguyen
Is it possible to configure Solr to fully load indexes in memory?  I
wasn't able to find any documentation about this on either their site or
in the Solr 1.4 Enterprise Search Server book.



Re: Configuring Solr to use RAMDirectory

2009-11-11 Thread Otis Gospodnetic
I think not out of the box, but look at SOLR-243 issue in JIRA.

You could also put your index on ram disk (tmpfs), but it would be useless for 
writing to it.

Note that when people ask about loading the whole index in memory explicitly, 
it's often a premature optimization attempt.

Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



- Original Message 
 From: Thomas Nguyen thngu...@ign.com
 To: solr-user@lucene.apache.org
 Sent: Wed, November 11, 2009 8:46:11 PM
 Subject: Configuring Solr to use RAMDirectory
 
 Is it possible to configure Solr to fully load indexes in memory?  I
 wasn't able to find any documentation about this on either their site or
 in the Solr 1.4 Enterprise Search Server book.



Re: Does SOLR support RAMDirectory ?

2008-06-02 Thread Grant Ingersoll
What are you looking to do?  Lucene inherently uses RAMDirectory under
the covers during indexing, but I'm not sure if that is your interest.


-Grant

On Jun 1, 2008, at 5:09 PM, s d wrote:


Can I use RAMDirectory in Solr? Thanks,
S


--
Grant Ingersoll
http://www.lucidimagination.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ









Does SOLR support RAMDirectory ?

2008-06-01 Thread s d
Can I use RAMDirectory in Solr? Thanks,
S


Re: RAMDirectory

2007-12-29 Thread Otis Gospodnetic
Hi,

If you have enough RAM to load the whole index into RAM using RAMDirectory, 
then you could also just use tmpfs to load your index in RAM... tmpfs exists 
under Linux, Solaris, and BSD, I believe.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message 
From: s d [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Thursday, December 27, 2007 8:18:14 PM
Subject: RAMDirectory

Is there a way to use RAMDirectory with Solr? If you can point me to
documentation that would be great.
Thanks,
S





RAMDirectory

2007-12-27 Thread s d
Is there a way to use RAMDirectory with Solr? If you can point me to
documentation that would be great.
Thanks,
S


RAMDirectory

2007-09-22 Thread Jae Joo
HI,

Does anyone know how to use a RAM disk for the index?

Thanks,

Jae Joo


RE: RAMDirectory

2007-09-22 Thread Jeryl Cook
Not yet implemented, hopefully soon:

http://jira.terracotta.org/jira/browse/CDV-399



Jeryl Cook 



/^\ Pharaoh /^\ 

http://pharaohofkush.blogspot.com/ 



..Act your age, and not your shoe size..

-Prince(1986)

 Date: Sat, 22 Sep 2007 15:33:58 -0400
 From: [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Subject: RAMDirectory
 
 HI,
 
 Does anyone know how to use a RAM disk for the index?
 
 Thanks,
 
 Jae Joo