Re: Bitcask won't merge without explicit merge() call

2011-12-12 Thread Justin Sheehy
Dmitry, What you are expecting is Bitcask's normal behavior, though I can see why it might not be what you expected. Bitcask does not quite auto-merge; instead it provides you with the tools to easily decide when a merge is needed, and to easily have a merge scheduled when you wish. Does this

Re: Open ticket for configurable R-value in MapReduce?

2011-12-14 Thread Justin Sheehy
Elias, On Dec 14, 2011, at 5:32 PM, Elias Levy wrote: > If you add a node, that node will be empty. If MR chooses the new node, the > choice of R=1 will cause it to think there is no data to process. As time > goes on that node will gain new data or be populated by read-repair, but it > will

Re: Python-riak links?

2011-12-24 Thread Justin Sheehy
They are just stored in the metadata field of the object; what you describe is roughly equivalent except that link traversal can occur without roundtrips between Riak and your client. Justin On Dec 24, 2011, at 11:38 AM, Shuhao Wu wrote: > How are the links implemented? > > Would it be fas

Re: Absolute consistency

2012-01-10 Thread Justin Sheehy
On Jan 10, 2012, at 9:42 PM, Les Mikesell wrote: > How do things like mongo and elasticsearch manage atomic operations > while still being redundant? Most such systems use some variant of primary copy replication, also known as master/slave replication. That approach can provide consistency, b

Re: Delete old record

2012-01-19 Thread Justin Sheehy
On Jan 18, 2012, at 7:12 PM, kser wrote: > Is there anyway to delete old record ?? This question could mean either of two things. You can of course issue a delete request against any records you like, using any of Riak's APIs. If you would instead like records to automatically be deleted when

licenses (was Re: riakkit, a python riak object mapper, has hit beta!)

2012-03-01 Thread Justin Sheehy
Hi, Andrey. On Mar 1, 2012, at 10:18 PM, "Andrey V. Martyanov" wrote: > Sorry for GPL, it's a typo. I just don't like GPL-based licenses, including > LGPL. I think it's overcomplicated. You are of course free to dislike anything you wish, but it is worth mentioning that GPL and LGPL are very

Re: What kind of protocol are used between Riak nodes?

2012-05-28 Thread Justin Sheehy
Hi, Alek. On May 28, 2012, at 1:40 PM, Alek Morfi wrote: > What kind of protocol is used between Riak nodes to communicate. Because if > all Riak nodes are located in the same cluster (LAN network scale) there is > no problem. > But when Riak nodes are located on different clusters which are co

Re: Riak as Binary File Store

2012-05-29 Thread Justin Sheehy
Hi, Praveen. Nothing about what you have said would cause a problem for Riak. Go for it! Justin On May 29, 2012, at 8:36 AM, Praveen Baratam wrote: > Hello Everybody! > > I have read abundantly over the web that Riak is very well suited to store > and retrieve small binary objects such as

Re: Atomicity of if_not_modified?

2013-01-04 Thread Justin Sheehy
On Jan 4, 2013, at 1:25 PM, Les Mikesell wrote: > And, doesn't every description of riak behavior have to include the > scenario where the network is partitioned and updates are > simultaneously performed by entities that can't contact each other? > If it weren't for that possibility, it could ju

Re: Atomicity of if_not_modified?

2013-01-06 Thread Justin Sheehy
On Jan 3, 2013, at 11:44 AM, Kaspar Thommen wrote: > Can someone confirm this? If it's true, what exactly is the purpose of > offering the if_not_modified flag? Yes, I confirmed this earlier in this thread: http://lists.basho.com/pipermail/riak-users_lists.basho.com/2013-January/010672.html -

Re: Stopping/Starting Riak.

2013-01-12 Thread Justin Sheehy
On Jan 12, 2013, at 10:26 AM, Kevin Burton wrote: > I noticed that I have no problem with ‘sudo /etc/init.d/riak stop’. But, when > I try to start the process with ‘sudo /etc/init.d/riak start’ I am met with a > prompt for a password. What is the password? I don’t recall setting a > password.

mailing list headers (was Re: riak cluster suddenly became unresponsive)

2013-03-19 Thread Justin Sheehy
Hi, Ingo. On Mar 19, 2013, at 10:41 AM, Ingo Rockel wrote: > and the riak-users mailer-daemon should really set a "reply-to"… Most email client programs have two well-understood controls for replies, one for "reply (to sender)" and one for "reply to all." We are not going to make one of them b

Re: two-node cluster for riak?

2013-04-18 Thread Justin Sheehy
Hi, Michael. Your spidey-sense is absolutely correct. Recall for a moment that Riak by default will store 3 copies of everything. This means that in a two-node configuration any given value will be stored once on one node and twice on the other. Not only does this mean a whole lot of wasted wo

Re: write value reality check

2013-06-28 Thread Justin Sheehy
Hi, Louis-Philippe. With a 2-node cluster and N=3, each value will be written to disk a total of three times: twice on one node, once on the other. (The W setting has no effect on the number of copies made or hosts used.) That behavior might seem a bit strange, but it's a strange configuration
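The "twice on one node, once on the other" outcome can be illustrated with a toy ring. This is only a sketch under stated assumptions: the 8-partition ring, the alternating ownership, and the `preflist` helper are hypothetical simplifications, not Riak's actual claim or hashing code.

```python
import hashlib
from collections import Counter

def preflist(key, owners, n_val):
    """Owners of the n_val consecutive partitions starting at the key's partition."""
    # Toy hash-to-partition mapping (assumption: real Riak maps into a 160-bit ring).
    start = int.from_bytes(hashlib.sha1(key).digest(), "big") % len(owners)
    return [owners[(start + i) % len(owners)] for i in range(n_val)]

# Hypothetical 8-partition ring claimed alternately by two nodes.
owners = ["node_a", "node_b"] * 4

# Count how many of the three replicas land on each node.
copies = Counter(preflist(b"bucket/key", owners, n_val=3))
print(copies)
```

Whichever partition the key hashes to, three consecutive partitions on an alternating two-node ring always split two to one.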

Re: Help with local restore for dev enviroment

2013-07-10 Thread Justin Sheehy
Hi, Mark. You've already received a little advice generally so I won't pile on that part, but one thing stood out to me: > My client has sent me a backup from one of their cluster nodes. bitcask > data,. rings and config. Unless I'm misunderstanding what you're doing, what you're working on wi

Re: What is the purpose of "rel" links?

2013-07-22 Thread Justin Sheehy
Hi, Age. The Link header in HTTP as used by Riak is defined by RFC 5988. In the Link Relation Type registry (http://tools.ietf.org/html/rfc5988#section-6.2.2) you can see that the relation type "up" refers to a parent document in a hierarchy of documents. In Riak, this means the bucket a key is
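As a rough illustration of how a client might pick the `rel="up"` entry out of a Link header, here is a minimal parsing sketch. The sample header value and the `parse_links` helper are assumptions made for the example, and the regular expression only handles one parameter per link rather than the full RFC 5988 grammar.

```python
import re

# Matches: <target>; name="value"  (one parameter per link only)
LINK_RE = re.compile(r'<([^>]*)>\s*;\s*(\w+)="([^"]*)"')

def parse_links(header_value):
    """Return (target, parameter, value) triples from a Link header value."""
    return LINK_RE.findall(header_value)

# Hypothetical header value, shaped like what an HTTP response might carry.
sample = '</riak/test>; rel="up", </riak/test/doc2>; riaktag="next"'

for target, param, value in parse_links(sample):
    print(target, param, value)
```

In this sample the entry with `rel="up"` points at the bucket, and the other entry is an ordinary tagged link to another object.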

Re: Funky List-Id headers in sent messages from this list

2013-11-08 Thread Justin Sheehy
The mailman host that is used to manage the list was moved last month, so that's probably the source of the change. -Justin ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

Re: Question about the source code: riak_get_fsm

2010-04-13 Thread Justin Sheehy
Hi, Marc. I understand your confusion as that code is a bit subtle. The reason this isn't a bug is that upon receiving the very first notfound in your situation, the "FailThreshold" case in the clause for notfound messages would return true -- since it would already know that it could never get

Re: Big Changes in Riak Tip

2010-04-14 Thread Justin Sheehy
On Wed, Apr 14, 2010 at 1:47 PM, Jonathan Lee wrote: > I'm having trouble building with the latest tip on OS X 10.6.  Does 0.10 > require Erlang R13B04? Yes, it does. That (and the reason for it) will be in the 0.10 release notes. Our apologies for not making that clearer earlier. -Justin

Re: sidebar :: quick webmachine question

2010-04-19 Thread Justin Sheehy
Hi, Richard. On Mon, Apr 19, 2010 at 12:08 PM, Richard Bucker wrote: > I read an article(from someone at basho) that said that WebMachine was going > to be more public or something like that. In the meantime it has been forked > several times and yet projects like riak integrate it. Other branch

Re: running naked : suggested firewall rules

2010-04-21 Thread Justin Sheehy
On Wed, Apr 21, 2010 at 8:27 AM, richard bucker wrote: > If a riak server is insecure in the DMZ then it's also insecure in the > enterprise. I might be misunderstanding what you mean by this. I don't know of any enterprises that think it is a good idea to run their Oracle databases directly av

Re: setting default bucket props

2010-04-28 Thread Justin Sheehy
If the N value for the bucket is lower than the R or W value in a request, then the request cannot succeed. That sounds likely in this case. An upcoming release will provide more useful messages when someone makes that particular client error. -Justin On Wed, Apr 28, 2010 at 12:35 PM, Matthew
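The rule in this reply can be written as a one-line check. This is a sketch only: `quorum_satisfiable` is a hypothetical helper for illustration, not part of any Riak client.

```python
def quorum_satisfiable(n_val, r=None, w=None):
    """True when neither the requested R nor W exceeds the bucket's N."""
    # A request cannot collect more replies than there are replicas.
    return all(q is None or q <= n_val for q in (r, w))

print(quorum_satisfiable(3, r=2, w=2))  # request fits within N=3
print(quorum_satisfiable(2, r=3))       # R=3 can never be met with N=2
```

A client could run a check like this before sending a request, to catch the mismatch early instead of waiting for the request to fail.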

Re: setting default bucket props

2010-04-28 Thread Justin Sheehy
On Wed, Apr 28, 2010 at 1:38 PM, Matthew Pflueger wrote: > Stupid question: Is there a way to set the default read values for a > request on the server side when a client doesn't explicitly set them? Not currently. The defaults at this time are in the client libraries. -Justin

Hello, Bitcask!

2010-05-05 Thread Justin Sheehy
Riak Users, You might have noticed that we released a new local key/value store recently: http://blog.basho.com/2010/04/27/hello,-bitcask/ As of just now, it is available as a storage engine ("backend") in the tip of the Riak repository. You can use it like any other backend just by setting the

Re: Riak Bitcask backend is very unstable on OS X 10.6.3

2010-05-08 Thread Justin Sheehy
Hello, That error message is due to running out of filehandles. I am guessing that you have a large number of empty files in your bitcask data directories. If so, there are two pieces of information you may find useful: 1 - it is safe to delete the empty files 2 - This will be addressed very s

Re: Replication behavior

2010-05-13 Thread Justin Sheehy
Hi, Jimmy. With an n_val of 3, there will be 3 copies of each data item in the cluster even when there are less than 3 hosts. With 2 nodes in that situation, each node will have either 1 or 2 copies of each item. Does that help with your understanding? -Justin

Re: CAP controls

2010-05-13 Thread Justin Sheehy
Hi, Jeremy. It sounds like an interesting project. At this time, there is no way to indicate in Riak that two nodes are actually on the same host (and therefore should not overlap in replica sets). It could certainly be done, but to do so today would require modification to the ring partition cl

Re: returning multiple documents

2010-05-14 Thread Justin Sheehy
Hi, Gareth, You've pretty much hit on it. Either of your two options will work fine. Regards, -Justin

Re: Recovering datas when a node was joining again the cluster (with all node datas lost)

2010-05-18 Thread Justin Sheehy
Hello, Germain. You've already come across read-repair. Between that and hinted-handoff a great deal of passive anti-entropy is performed in a Riak cluster. As long as one doesn't use requests with R=N these mechanisms are generally sufficient. We do have plans for a more "active" anti-entropy

Re: cannot query bucket when a node is down

2010-06-01 Thread Justin Sheehy
On Tue, Jun 1, 2010 at 1:56 PM, Sam Tingleff wrote: > With no single point of failure there is no single index of keys. So > the only way to get an exhaustive list of keys in a given bucket is to > ask all nodes (I do not know if this is what riak is actually doing). Sam is exactly right that Ri

Re: I need distributed file system ?

2010-06-03 Thread Justin Sheehy
Hello, Antoni. Riak handles all the distribution for you, and generally expects to store its data to a local filesystem. You do not need or want any sort of underlying distributed filesystem in addition to Riak. Best, -Justin

Re: Switching of backends

2010-06-08 Thread Justin Sheehy
Germain, If you have enough excess capacity that your cluster will be safe with one less machine for a little while, you can do this another way. Just "riak-admin leave" one machine, wait for it to hand off all of its data, "riak stop", set up that machine with a new install/config-file/backend/e

Re: Switching of backends

2010-06-08 Thread Justin Sheehy
On Tue, Jun 8, 2010 at 8:29 AM, Mårten Gustafson wrote: > How would I know when a node has handed off all its data - would the > status command report that it doesn't own any partitions? Good question. That won't quite do it, because the node will give up ownership of the partitions first, and

Re: [ANN] Riak Release 0.11.0

2010-06-11 Thread Justin Sheehy
Hi, Germain. On Fri, Jun 11, 2010 at 11:07 AM, Germain Maurice wrote: > Because of its append-only nature, stale data are created, so, how does > Bitcask to remove stale data ? An excellent question, and one that we haven't yet written enough about. > With CouchDB the compaction process on our

Re: [ANN] Riak Release 0.11.0

2010-06-14 Thread Justin Sheehy
Hi, Alan. Your replicas do in fact exist on both nodes. However, I understand that the situation you are observing is confusing. I will attempt to explain. Quite some time ago, something surprising was noticed by some of our users during their pre-production testing. Some intentional failure s

Re: Riak Recap for 6/10 - 6/13

2010-06-22 Thread Justin Sheehy
Hi, Joel. Thanks for your input! On Mon, Jun 14, 2010 at 4:17 PM, Joel Pitt wrote: > [re bitcask and in-memory data] > I'm sure it's probably already been considered, but just in case... > bloom filters could be an alternative to the requirement of keeping > *all* the keys in memory. I don't k

Re: Best way to back-up riak

2010-07-11 Thread Justin Sheehy
Hi, Jan. On Sun, Jul 11, 2010 at 8:53 PM, Jan Vincent wrote: > Given that riak is new in the database field, if ever I use riak in > production, > what would be the best way to back it up? I know that there's redundancy > on the different nodes and NRW may be modifiable per request, but I'm > w

Re: Conflict Resolution

2010-07-13 Thread Justin Sheehy
Hello, Misha. On Tue, Jul 13, 2010 at 1:06 PM, Misha Gorodnitzky wrote: > From doing a little testing, the last value in a multipart document is > the first, so "Thursday" in this case, can we assume that this will > always be the case? And is it a good idea to base conflict resolution > on this

Re: riak slides?

2010-07-13 Thread Justin Sheehy
Hi, Wilson. There are many sets out there. Which ones suit you best depends a lot on what you plan on saying in your talk. If you tell us a bit about the audience, the event, and what you hope to get across in your talk, then I bet that one of the people here who has given a Riak talk will have

Re: Conflict Resolution

2010-07-14 Thread Justin Sheehy
On Wed, Jul 14, 2010 at 5:25 AM, Misha Gorodnitzky wrote: > I don't suppose there are any examples anywhere of how people have > approached conflict resolution with Riak? That would be useful to help > people understand how to approach it ... maybe a section on the wiki > could be dedicated to it

Re: Expected vs Actual Bucket Behavior

2010-07-20 Thread Justin Sheehy
Hi, Eric! Thanks for your thoughts. On Tue, Jul 20, 2010 at 12:39 PM, Eric Filson wrote: > I would think that this requirement, > retrieving all objects in a bucket, to be a _very_ common > place occurrence for modern web development and perhaps (depending on > requirements) _the_ most common f

Re: Expected vs Actual Bucket Behavior

2010-07-21 Thread Justin Sheehy
I think that we are all (myself included) getting two different issues a bit mixed up in this discussion: 1: storing an implicit index of keys in the Riak key/value store 2: making buckets separate in that a per-bucket operation's performance would not be affected by the content of other buckets

Re: Expected vs Actual Bucket Behavior

2010-07-21 Thread Justin Sheehy
Hi, Alexander. On Wed, Jul 21, 2010 at 1:36 PM, Alexander Sicular wrote: > uses a separate bitcask per-bucket per-partition. What is a partition here? A > vnode or a physical host or something else? My apologies. Given that it was in our bugzilla I let myself use some Riak-internals jargon wit

Re: Is it inefficient to map over a small bucket when you have millions of other buckets?

2010-07-27 Thread Justin Sheehy
On Tue, Jul 13, 2010 at 6:02 AM, Nicolas Fouché wrote: > Giving just a bucket WILL traverse the entire keyspace. You may be interested in: https://issues.basho.com/show_bug.cgi?id=480 -Justin

Re: Best way to back-up riak

2010-07-27 Thread Justin Sheehy
On Wed, Jul 21, 2010 at 2:01 PM, Alan McConnell wrote: > I'm curious about this as well.  Say I have a ten node cluster.  Could I > just schedule a midnight copy of each bitcask data directory every night, > then restore to another ten node cluster by dropping one of each data > directories on ea

Re: Use of fallback nodes for get requests?

2010-08-02 Thread Justin Sheehy
Hi, Nico. On Mon, Aug 2, 2010 at 1:19 PM, Nico Meyer wrote: > What I mean is, if I do a get request for a key with R=N, and one of the > first N nodes in the preflist is down the request will still succeed. > Why is that? Doesn't that undermine the purpose of setting R to a high > number (specifi

Re: Riak Heterogeneity

2010-08-21 Thread Justin Sheehy
Hi, Michael. On Tue, Aug 17, 2010 at 12:52 PM, Michael Russo wrote: > In the Dynamo design, the number of vnodes per physical node can be tweaked > to satisfy the heterogeneity principle. > Is there any way to do something similar with Riak? This is something that we think is an important idea

Re: failed to merge?

2010-08-21 Thread Justin Sheehy
Hi, Wilson. On Sat, Aug 21, 2010 at 10:06 PM, Wilson MacGyver wrote: > =ERROR REPORT > Failed to merge > follow by a bunch of list of bitcask files > > with final status > > : no_files_to_merge > > how does this happen, does this mean some files in the bitcask are missing? That's just an o

list_keys is less bad

2010-08-23 Thread Justin Sheehy
Riak Users, One aspect of Riak's interface that has often been discouraged in the past is the listing of all keys in a bucket. This has been for two reasons: the first is that it is necessarily an operation that is more heavyweight than any of the more targeted get/put/delete sorts of things, but

Re: list_keys is less bad

2010-08-23 Thread Justin Sheehy
On Mon, Aug 23, 2010 at 10:05 PM, Alexander Sicular wrote: > Three cheers! :-) > Git clone && make all && make rel It looks like they haven't yet migrated out to the github repos, but should do so sometime soon. In the meantime, the bitbucket repos are up to date with tip so you can get the b

Re: Filesize in riak

2010-09-04 Thread Justin Sheehy
Hi, John. On Thu, Sep 2, 2010 at 11:24 AM, John Axel Eriksson wrote: > I know the recommendation of max 50 megs per file in riak currently... but I > tried > uploading a file that was around 120 megs and everything went fine. Riak doesn't itself mandate a maximum object size... but since a riak

Re: Riak and no of clients limit?

2010-09-04 Thread Justin Sheehy
Hello, Senthilkumar. On Fri, Sep 3, 2010 at 4:28 PM, Senthilkumar Peelikkampatti wrote: >    I am using Riak with distributed Erlang and I wanted to know what's > the limit on # of riak clients (I used it before erlang pb client, so yet to > migrate). I am using single client to talk to Riak,

Re: Riak and no of clients limit?

2010-09-04 Thread Justin Sheehy
Hi, Seth. On Sat, Sep 4, 2010 at 5:59 PM, Seth Falcon wrote: > I'm working on a project where we have a webmachine-backed service > that talks to Riak.  I currently initialize one pb client for each > node in the cluster as part of the webmachine startup.  Then the > resources in the webmachine

Re: Riak and no of clients limit?

2010-09-04 Thread Justin Sheehy
On Sat, Sep 4, 2010 at 7:31 PM, Seth Falcon wrote: > Given that, it sounds like one would want a pool of pb clients such > that each resource takes a client out of the pool when handling a > request and returns it when done.  So there would be no concurrent > requests going through the same clien

Re: Listing large key spaces, and bucket Links header

2010-09-06 Thread Justin Sheehy
Hi, Gavin. A couple of things you may be interested in: - There have been improvements in both Bitcask and Riak since 0.12.1 (in tip of trunk and will be in the next release) to speed up (and reduce the resource consumption of) key listing. - You should probably use keys=stream in your request

Re: File descriptor leaks?

2010-10-18 Thread Justin Sheehy
Hi, Dmitry. What version of Riak are you using? And is there anything interesting in the error logs? -Justin On Thu, Oct 14, 2010 at 7:53 AM, Dmitry Demeshchuk wrote: > A small update. I've just encountered the same problem. Just about 3-4 > hours have passed. > > lsof | wc -l showed only a

Re: File descriptor leaks?

2010-10-24 Thread Justin Sheehy
Hi, Dmitry. On Mon, Oct 18, 2010 at 11:07 PM, Dmitry Demeshchuk wrote: > We are using 0.12.1. There was indeed a file descriptor leak in that version of Riak, fixed between then and the 0.13 release. I hadn't seen any situations which were causing it to take effect nearly as quickly as you're

Re: RiakSearch Backend Innostore ?

2010-10-30 Thread Justin Sheehy
On Sat, Oct 30, 2010 at 11:51 AM, Prometheus wrote: > Can we use Innostore for RiakSearch ?  what is the performance comparison for > search backends ?  Any information will be valuable. That depends on whether you mean the actual Search index backend, or the KV backend used for storing complete

Re: Riak and Locks

2010-11-08 Thread Justin Sheehy
Hello, Neville. On Mon, Nov 8, 2010 at 10:35 PM, Neville Burnell wrote: > Are there any plans for a Distributed Lock Service for Riak, to allow for > apps that *need* locking for some KV ? It has been discussed and agreed that it would be interesting, but there is nothing currently being develo

Re: Understanding Riaks rebalancing and handoff behaviour

2010-11-09 Thread Justin Sheehy
On Tue, Nov 9, 2010 at 10:30 AM, Alexander Sicular wrote: > Mainly, I'm of the impression that you should join/leave a cluster one > node at a time. This impression is correct. I believe that in the not-too-distant future a feature may be added to enable stable addition of many nodes at once, bu

Re: How could we test/simulate siblings?

2010-11-12 Thread Justin Sheehy
Hi, Cagdas. On Fri, Nov 12, 2010 at 8:17 PM, Cagdas Tulek wrote: > What is the best way of creating sibling records to see if my logic is > handling them correctly? Ensure that allow_mult is set to true. Create some object B/K. Get that object. It will come with some vector clock V. Put som
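The general mechanism behind siblings is that two writes built from the same vector clock are concurrent, so neither supersedes the other. The sketch below is a toy in-memory model of that idea, not Riak's implementation; the `descends` and `put` helpers and the client names are hypothetical, and the message above is truncated, so this may not match its exact steps.

```python
def descends(a, b):
    """True if clock a has seen everything clock b has (a >= b entry-wise)."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())

def put(siblings, client_id, context, value):
    """Toy allow_mult=true store: return the sibling list after one write.

    siblings: list of (vclock, value); context: the vclock the writer last read.
    """
    new_clock = dict(context)
    new_clock[client_id] = new_clock.get(client_id, 0) + 1
    # Keep only the stored values that the new write has not superseded.
    survivors = [(vc, v) for vc, v in siblings if not descends(new_clock, vc)]
    return survivors + [(new_clock, value)]

store = put([], "client_a", {}, "v0")      # create the object
v = store[0][0]                            # a read returns its vector clock V
store = put(store, "client_b", v, "v1")    # write using V
store = put(store, "client_c", v, "v2")    # second write, also using V
print([value for _, value in store])
```

The second and third writes both replace `v0`, but neither has seen the other, so both values remain as siblings for the application to resolve.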

Re: Riak won't die all the way on OS X

2010-11-30 Thread Justin Sheehy
Jon, You can just leave epmd running. That is standard erlang runtime behavior and generally won't cause any problems. -Justin On Tue, Nov 30, 2010 at 9:59 AM, Jon Brisbin wrote: > I'm running the pre-built binaries for Riak 0.13 (and 0.12 x64, for that > matter) for OS X 10.6. > When I do a

Re: Storing relationship data

2010-12-30 Thread Justin Sheehy
Hi, Bryan. The link data is embedded in the riak_object metadata, so you can easily observe it from outside Riak even when not performing link-walking queries. To see this in action, check out the "Link" headers when using the HTTP interface. -Justin On Thu, Dec 30, 2010 at 6:36 PM, Bryan Nag

Re: allow_multi VS HTTP Conditional PUT

2011-01-03 Thread Justin Sheehy
Hi, Eric. On Mon, Jan 3, 2011 at 1:09 AM, Eric Moritz wrote: > Hi I just read "Why Vector Clocks are Easy". I am having trouble > seeing the advantage of letting a stale PUT into production and merge > afterwards vs HTTP's Conditional PUT, which never lets a stale PUT > into production. This i

Re: PDF or OpenOffice Impresse presentations instead of .KEY + Windows question

2011-01-05 Thread Justin Sheehy
Hello, Jérôme. It looks like Jeremiah has already answered one of your questions, so I'll get the other. On Wed, Jan 5, 2011 at 6:41 PM, Jérôme Verstrynge wrote: > My other question/remark is: there does not seem to be a downloadable > version of Riak for Windows. Is there a technical reason fo

Re: Storing relationship data

2011-01-05 Thread Justin Sheehy
Hi, Bryan. On Thu, Dec 30, 2010 at 8:02 PM, Bryan Nagle wrote: > Our current setup, is we are using webmachine;  Client connects to > webmachine, and webmachine connects to riak via the erlang pcb client.  So, > if we use links, and we want the client to be aware of the relationships, we > would

Re: Getting all the Keys

2011-01-22 Thread Justin Sheehy
On Sat, Jan 22, 2011 at 3:18 PM, Alexander Sicular wrote: > I'll drop a phat tangent and just mention that I watched @rk's talk at Qcon > SF 2010 the other day and am kinda crushing on how they implemented > distributed counters in cassandra (mainlined in 0.7.1 me thinks) which, > imho, is so cho

Re: What is the advantage of using luwak?

2011-01-25 Thread Justin Sheehy
Yes, Jeremiah and Thomas have hit on the reasons. The biggest single one is for streaming up a relatively static but very large content object such as a video or a virtual machine image. Luwak handles very large content much more nicely than if you stored that same content in plain riak objects.

Re: no access logs by default?

2011-03-01 Thread Justin Sheehy
Hi, Ryan. On Tue, Mar 1, 2011 at 8:07 PM, Ryan Zezeski wrote: > Is this intentional?  It seems like odd default behavior. Most databases, including Riak, do not write to a file every time you do a GET, SELECT, or other query as appropriate. This is because the additional disk I/O of an access

Re: Riak search - Lucene

2011-03-02 Thread Justin Sheehy
Hi, Joshua. On Wed, Mar 2, 2011 at 6:26 PM, Joshua Partogi wrote: > I am trying to picture the relationship between Riak and Lucene [or > how Riak interacts with Lucene], which makes Riak search. This is a very easy relationship to picture, as there is no such interaction. :-) Riak Search doe

Re: A script to check bitcask keydir sizes

2011-03-24 Thread Justin Sheehy
Hi, Greg. On Thu, Mar 24, 2011 at 10:17 AM, Greg Nelson wrote: > Wouldn't it be the common case that > there are relatively few buckets?  And so wouldn't it save a lot of memory > to keep a reference to an interned bucket name string in each entry, instead > of the whole bucket name? One reason

Re: Riak vs riak_core

2011-03-30 Thread Justin Sheehy
Hi, Mike. On Wed, Mar 30, 2011 at 5:46 PM, Mike Oxford wrote: > I thought I understood Riak, then I ran across the fact that riak_core was > split out separately. > When would you use riak_core that you wouldn't use Riak? Good question. Riak Core is the distributed systems center that Riak is

Re: Load question

2011-04-12 Thread Justin Sheehy
Hi, Runar. On Tue, Apr 12, 2011 at 3:22 AM, Runar Jordahl wrote: > It would be helpful if a wiki page (under Best Practices) was created > to discuss various load balance configurations. I am also wondering if > a Riak client could use strategy (2), like Dynamo clients can. There is not current

Re: Bitcask vs innostore, again

2011-04-28 Thread Justin Sheehy
Hi, Dmitry. I will try to reply to some of the questions you raised about bitcask. On Thu, Apr 7, 2011 at 12:30 AM, Dmitry Demeshchuk wrote: > Now being considered as the main Riak storage. It's not just being considered, it is the main Riak storage. We are very confident in bitcask's quality

Re: A function as an input for map/reduce

2011-05-05 Thread Justin Sheehy
Hi, Mikhail. On Tue, May 3, 2011 at 5:55 PM, Mikhail Sobolev wrote: > Is there more information about "it can through a few keys at a time, >   and the map/reduce chain would go ahead and start doing the >   processing on whatever keys it gets as soon as it gets them, it does >   not have to w

Re: A function as an input for map/reduce

2011-05-07 Thread Justin Sheehy
Hi, Mikhail. On Thu, May 5, 2011 at 5:15 PM, Mikhail Sobolev wrote: > Thank you for the description.  I now wonder if it's possible for a > map-function instead of returning the whole list of results, do > something that Riak would take as "ah! another map result, let's do pass > it to the next

Re: Make Riak use different Erlang version than that included in the .deb package

2011-05-12 Thread Justin Sheehy
Hi, Jeremy. If you build Riak from source, you'll end up with Riak using the version of Erlang that you used to build it. With a pre-packaged version, it will use the Erlang that was used to make the packages. Riak will be moving to a newer Erlang in upcoming releases, by the way. -Justin On

Re: Production Backup Strategies

2011-05-13 Thread Justin Sheehy
Hi, Mike. Assuming that the cluster is using the default storage engine (bitcask) then the backup story is straightforward. Bitcask only ever appends to files, and never re-opens a file for writing after it is closed. This means that your favorite existing server filesystem backup mechanism will

Re: Production Backup Strategies

2011-05-16 Thread Justin Sheehy
Hi, Jeremy. On Sat, May 14, 2011 at 2:45 PM, Jeremy Raymond wrote: > So just backing up the files from separate nodes works? There won't be > inconsistencies in the data say if all the nodes had to be restored? That's right, it works. :-) Inconsistencies due to modifications that occur betwee

Re: Issues with capacity planning pages on wiki

2011-05-25 Thread Justin Sheehy
Hi, Anthony. There are really three different things below: 1- reducing the minimum overhead of the {Bucket, Key} encoding when riak is storing into bitcask 2- reducing the size of the vector clock encoding 3- reducing the size of the overall riak_object structure and metadata All three of the

Re: Riak doesn't use consistent hashing.

2011-05-26 Thread Justin Sheehy
Hi, Greg. Thanks for your thoughtful analysis and the pull request. On Thu, May 26, 2011 at 1:54 AM, Greg Nelson wrote: > However, the skipping bit isn't part of > Riak's preflist calculation.  Instead, nodes claim partitions in such a way > as to be spaced out by target_n_val, to obviate the n

Re: riak locking and out of memory

2011-05-26 Thread Justin Sheehy
Hi, Ron. On Thu, May 26, 2011 at 4:33 PM, Ron Yang wrote: > On the macbook I looped across 400meg files using bash and curl to > upload them as documents into a bucket: There are other details in your post that I might comment on, but I will focus on the main point. What you describe here simp

Re: A script to check bitcask keydir sizes

2011-06-08 Thread Justin Sheehy
On Thu, Mar 24, 2011 at 1:51 PM, Nico Meyer wrote: > The bigger concern for me would be the way the bucket/key tuple is > serialized: > > Eshell V5.8  (abort with ^G) > 1> iolist_size(term_to_binary({<<>>,<<>>})). > 13 > > That's 13 bytes of overhead per key where only 2 bytes are needed with > rea
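The 13-byte figure follows from the layout of the Erlang external term format: one version byte, a small-tuple tag plus arity, and a tag plus 4-byte length for each binary. The sketch below rebuilds that encoding by hand; the helper names are hypothetical, and it covers only a two-element tuple of binaries.

```python
import struct

def ext_binary(data):
    # BINARY_EXT: tag 109, 4-byte big-endian length, then the bytes themselves.
    return bytes([109]) + struct.pack(">I", len(data)) + data

def encode_bucket_key(bucket, key):
    # Version byte 131, then SMALL_TUPLE_EXT (tag 104) with arity 2.
    return bytes([131, 104, 2]) + ext_binary(bucket) + ext_binary(key)

# 3 header bytes + 5 bytes per empty binary = 13 bytes of fixed overhead.
print(len(encode_bucket_key(b"", b"")))
```

With non-empty names the total is the same 13 bytes plus the lengths of the bucket and key.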

Re: Pruning (merging) after storage reaches a certain size?

2011-06-08 Thread Justin Sheehy
Hi, Steve. Check out this page: http://wiki.basho.com/Bitcask-Configuration.html#Disk-Usage-and-Merging-Settings Basically, a "merge trigger" must be met in order to have the merge process occur. When it does occur, it will affect all existing files that meet a "merge threshold." One note th
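The difference between a merge trigger and a merge threshold can be sketched as a toy model: a trigger decides whether a merge runs at all, while a threshold decides which files take part once it does. The `DataFile` class, the `files_to_merge` helper, and the 60 and 40 percent values are illustrative assumptions, not Bitcask's code or a statement of its defaults.

```python
from dataclasses import dataclass

@dataclass
class DataFile:
    name: str
    total_bytes: int
    dead_bytes: int

    @property
    def fragmentation(self):
        """Percentage of the file occupied by dead (overwritten or deleted) data."""
        return 100 * self.dead_bytes / self.total_bytes

def files_to_merge(files, frag_trigger=60, frag_threshold=40):
    """Return the files a merge would include, or [] if no file meets the trigger."""
    if not any(f.fragmentation >= frag_trigger for f in files):
        return []  # no trigger met, so nothing is merged at all
    return [f for f in files if f.fragmentation >= frag_threshold]

files = [
    DataFile("1.data", 1000, 700),  # 70% dead: meets the trigger
    DataFile("2.data", 1000, 450),  # 45% dead: meets only the threshold
    DataFile("3.data", 1000, 100),  # 10% dead: left alone
]
print([f.name for f in files_to_merge(files)])
```

Without the first file, nothing would be merged, even though the second file is above the threshold.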

Re: Pruning (merging) after storage reaches a certain size?

2011-06-13 Thread Justin Sheehy
Hi, Steve. The key to your situation was in my earlier email: One note that is relevant for your specific use: the expiry_secs parameter will cause a given item to disappear from the client API immediately after expiry, and to be cleaned if it is in a file already being merged, bu

Re: Riak Ruby Client Thread Safe?

2011-06-15 Thread Justin Sheehy
Hi, Keith. It is not safe to share a single Riak client instance across multiple client-facing threads. Riak's conflict detection mechanisms will be misled by that sort of sharing. Luckily, the client is quite lightweight so you shouldn't have to worry about the cost of doing it right. -Justin

Re: Benchmarks of backends

2011-06-21 Thread Justin Sheehy
Hi, Anthony. Most people using Riak today use either Bitcask or Innostore, as I suspect you know. Bitcask has excellent performance, but the limitation that you are aware of with a hard limit on number of keys per unit of available RAM. Innostore does not have that limitation, but is much harde

Re: LevelDB driver

2011-07-04 Thread Justin Sheehy
Hi, Jonathan. On Mon, Jul 4, 2011 at 9:42 AM, Jonathan Langevin wrote: > I've seen users show concern of Bitcask's space usage overhead. How does that > compare against LevelDB? Bitcask doesn't have much in the way of disk space "overhead" unless you mean that the space used by deleted or overw

Re: LevelDB driver

2011-07-04 Thread Justin Sheehy
On Mon, Jul 4, 2011 at 10:33 AM, Jonathan Langevin wrote: > Thanks Justin for the helpful response :-) Happy to help. > Can you define what you would consider "huge" regarding # keys? A bit depends on the details (such as key size) but generally the tipping point is somewhere near ten million

Re: LevelDB driver

2011-07-12 Thread Justin Sheehy
Hi, Phil. I might have caused a little confusion. I mentioned, but perhaps didn't sufficiently emphasize, that the benchmark comparing LevelDB to InnoDB was not a benchmark of Riak at all, but just directly talking to the storage engines in order to look at the feasibility of doing more with Level

Re: How much memory for 20GB of data?

2011-07-14 Thread Justin Sheehy
Hi, Maria. In addition to what others have said, I would note that (at least) the following issues matter quite a bit for such planning: - how many items the data is broken up into - how large the keys will be (especially if they are very large due to embedded structure) - what storage engine ("b

Re: How much memory for 20GB of data?

2011-07-14 Thread Justin Sheehy
iously > not enough, because only 700 records were inserted. So I thought > maybe 150GB should be enough? > > Cheers, > Maria > > 2011/7/15 Justin Sheehy : >> Hi, Maria. >> >> In addition to what others have said, I would note that (at least) the >>

Re: Connection Pool with Erlang PB Client Necessary?

2011-07-26 Thread Justin Sheehy
The simplest guidance on client IDs that I can give: If two mutation (PUT) operations could occur concurrently or without awareness of each other, then they should have different client IDs. As a result of the above: if you are sharing a connection, then you should use a different client ID for e
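One way to follow the guidance above is to give every concurrent writer its own ID. The sketch below is in Python and is not tied to any Riak client library: the function names are hypothetical, and the choice of 4 random bytes per ID is an assumption made for illustration.

```python
import os
import threading

# Per-thread storage, so each writer thread keeps its own client ID.
_local = threading.local()


def new_client_id() -> bytes:
    """Hypothetical helper: a fresh random ID (4 bytes is an assumption)."""
    return os.urandom(4)


def client_id_for_current_thread() -> bytes:
    """Return a stable ID for the calling thread, creating it on first use."""
    if not hasattr(_local, "client_id"):
        _local.client_id = new_client_id()
    return _local.client_id


# Two writers that may issue PUTs concurrently end up with different IDs.
ids = {}


def writer(name):
    ids[name] = client_id_for_current_thread()


threads = [threading.Thread(target=writer, args=(n,)) for n in ("w1", "w2")]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(ids["w1"] != ids["w2"])
```

A connection pool could apply the same idea by calling `new_client_id()` each time a connection is checked out, so that two holders of the same connection never write under the same ID.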

Re: Connection Pool with Erlang PB Client Necessary?

2011-07-26 Thread Justin Sheehy
, > each time it's checked out, a new client id is created. > > Does this sound reasonable and in line with proper client id usage? > > Thanks again! > > Andrew > > > On Tue, Jul 26, 2011 at 11:55 AM, Justin Sheehy wrote: >> The simplest guidance on client ID

Re: riak_core questions

2011-07-28 Thread Justin Sheehy
Hi, Dmitry. A couple of suggestions... The reason that you're not seeing an easy way to automatically have nodes be added or removed from the cluster upon going down or coming up is that we recommend strongly against such behavior. The idea is that intentional (administrative) outages are very

Re: riak_core questions

2011-07-28 Thread Justin Sheehy
Hi, Dmitry. On Thu, Jul 28, 2011 at 12:22 PM, Dmitry Demeshchuk wrote: > By master node, I mean the one that is used when we are joining new > nodes using riak-admin (as far as I remember, only one node can be > used for this). You can use any node at all in the existing cluster for this purpos

Re: Getting a value: get vs map

2011-07-29 Thread Justin Sheehy
Jeremiah, You were essentially correct. A "targeted" MR does not have to search for the data, and does not slow down with database size. It is a bucket-sweeping MR that currently has that behavior. -Justin On Fri, Jul 29, 2011 at 10:27 AM, Jeremiah Peschka wrote: > I would have suspected that

Re: how can I trigger a manual merge?

2011-07-30 Thread Justin Sheehy
A direct call to bitcask:merge could force all of the files to be processed, including the removal of expired entries. That won't happen under normal Riak operation as none of the triggers will be passed by your use, but you could certainly write a script to do it directly. -Justin On Fri, Jul

Re: Understanding put if_not_modified.

2011-09-18 Thread Justin Sheehy
Hi, Igor. Riak (quite intentionally, for availability reasons) does not provide any sort of global transactions or user-exposed locking. One result of this is that you can't do exactly what you tried -- or at least not that simply. You might be interested in https://github.com/mochi/statebox -Ju

Re: automatically expiring keys with LevelDB?

2011-10-21 Thread Justin Sheehy
On Oct 21, 2011, at 4:22 PM, Nate Lawson wrote: > I know Bitcask has the expiry_secs option for expiring keys, but what about > LevelDB? We're thinking of using Luwak as a file cache frontend to S3, and it > would be nice for older entries to be deleted in LRU order as we store newer > files.
