Thanks Shawn. What do you mean by "important parts of the index"? And how do I calculate their size?
Thanks,
Mahmoud

Sent from my iPhone

> On Dec 29, 2014, at 8:19 PM, Shawn Heisey <apa...@elyograg.org> wrote:
>
>> On 12/29/2014 2:36 AM, Mahmoud Almokadem wrote:
>> I have the same index with a slightly different schema and 200M documents,
>> installed on 3 r3.xlarge (30GB RAM, and 600 General Purpose SSD). The size
>> of the index is about 1.5TB; it receives many updates every 5 minutes, with
>> complex queries and faceting at a response time of 100ms, which is
>> acceptable for us.
>>
>> Toke Eskildsen,
>>
>> Is the index updated while you are searching? *No*
>> Do you do any faceting or other heavy processing as part of a search? *No*
>> How many hits does a search typically have and how many documents are
>> returned? *The test measures QTime only, with no documents returned and the
>> number of hits varying from 50,000 to 50,000,000.*
>> How many concurrent searches do you need to support? How fast should the
>> response time be? *Maybe 100 concurrent searches at 100ms, with facets.*
>>
>> Would splitting the shard into two shards on the same node, so that each
>> shard sits on a single EBS volume, be better than using LVM?
>
> The basic problem is simply that the system has so little memory that it
> must read large amounts of data from the disk when it does a query.
> There is not enough RAM to cache the important parts of the index. RAM
> is much faster than disk, even SSD.
>
> Typical consumer-grade DDR3-1600 memory has a data transfer rate of
> about 12800 megabytes per second. If it's ECC memory (which I would say
> is a requirement), the transfer rate is probably a little slower than
> that: figuring 9 bits for every byte gets us about 11377 MB/s. That's
> only an estimate, and it could be wrong in either direction, but I'll
> go ahead and use it.
>
> http://en.wikipedia.org/wiki/DDR3_SDRAM#JEDEC_standard_modules
>
> If your SSD is SATA, the transfer rate will be limited to approximately
> 600MB/s -- the 6 gigabit per second transfer rate of the newest SATA
> standard.
> That makes memory about 18 times as fast as a SATA SSD. I saw one
> PCI Express SSD that claimed a transfer rate of 2900 MB/s. Even that
> is only about one fourth of the estimated speed of DDR3-1600 with
> ECC. I don't know what interface technology Amazon uses for their SSD
> volumes, but I would bet on it being the cheaper option, which would
> mean SATA. The networking between the EC2 instance and the EBS storage
> is unknown to me and may be a further bottleneck.
>
> http://ocz.com/enterprise/z-drive-4500/specifications
>
> Bottom line -- you need a lot more memory. Speeding up the disk may
> *help*, but it will not replace that simple requirement. With EC2 as
> the platform, you may need more instances and more shards.
>
> Your 200 million document index that works well with only 90GB of
> total memory ... that's surprising to me. It means that the important
> parts of that index *do* fit in memory ... but if the index gets much
> larger, performance is likely to drop off sharply.
>
> Thanks,
> Shawn
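For what it's worth, Shawn's back-of-the-envelope arithmetic above is easy to reproduce. The sketch below just replays the figures quoted in the thread (JEDEC DDR3-1600 peak rate, the 9-bits-per-byte ECC estimate, the SATA 6 Gb/s ceiling, and the advertised PCIe SSD rate); none of these are measurements of any actual EC2 instance.

```python
# Rough bandwidth comparison, using the estimates from the thread above.
# All numbers are approximations, not benchmarks.

ddr3_1600_mb_s = 12800                 # JEDEC PC3-12800 peak transfer rate
ecc_mb_s = ddr3_1600_mb_s * 8 / 9      # ECC stores roughly 9 bits per usable byte
sata3_mb_s = 600                       # SATA 6 Gb/s practical ceiling
pcie_ssd_mb_s = 2900                   # the PCIe SSD rate Shawn mentions

print(f"DDR3-1600 with ECC: ~{ecc_mb_s:.0f} MB/s")
print(f"RAM vs SATA SSD:    ~{ecc_mb_s / sata3_mb_s:.0f}x faster")
print(f"RAM vs PCIe SSD:    ~{ecc_mb_s / pcie_ssd_mb_s:.1f}x faster")
```

The exact ratio comes out closer to 19x than 18x, but either way the conclusion is the same: even the fastest SSD is a small fraction of memory bandwidth, so fitting the hot parts of the index in RAM matters far more than disk speed.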