Using a commercial Cauchy Reed-Solomon implementation I get upwards of 600 MB/s 
on a high-end processor and 60 MB/s on a really low-end 4+ year old Intel.

This is using a 26-piece file, but testing hasn't shown significant degradation 
even with a lot of pieces, though I admittedly haven't tested up to 100.
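
A rough sketch of how one might test the 100-piece case in Python, assuming the 
zfec package that Tahoe itself uses (the encode()/decode() behaviour noted in 
the comments is an assumption and should be checked against the zfec docs):

    import os, time
    import zfec  # Tahoe-LAFS's Reed-Solomon erasure-coding library

    k, m = 75, 100                    # any 75 of 100 pieces recover the data
    size = 64 * 1024                  # 64 KiB per piece
    primary = [os.urandom(size) for _ in range(k)]

    # Assumption: encode() with no block-number list returns the m - k check blocks.
    check = zfec.Encoder(k, m).encode(primary)

    # Pretend the last 25 primary pieces were lost; decode from what's left.
    have = primary[:50] + check[:25]
    nums = list(range(50)) + list(range(75, 100))

    t0 = time.perf_counter()
    zfec.Decoder(k, m).decode(have, nums)
    print("%.0f MB/s" % (k * size / 1e6 / (time.perf_counter() - t0)))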

I would say that you should expect close to 30-60 MB/s locally with a decent 
implementation, but of course you'll need to account for more latency with 
remote nodes (unless you specifically mean just the decode once you have the 
parts).
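
For scale, a quick back-of-the-envelope in Python (assuming the "10mbps" in the 
question means megabits per second):

    low_end_MBps, high_end_MBps = 60, 600   # throughput figures quoted above
    target_MBps = 10 / 8                    # 10 Mbit/s = 1.25 MB/s

    print(low_end_MBps / target_MBps)       # 48.0  -- low-end margin over target
    print(high_end_MBps / target_MBps)      # 480.0 -- high-end margin over target

So even the low-end figure is roughly 50x the 10 Mbps target.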

-- Justin 

Typos by iPhone

On Jan 4, 2014, at 2:22 PM, David Vorick <david.vor...@gmail.com> wrote:

> I've been looking at different options for erasure coding; spinal codes seem 
> too slow, and LT codes don't seem to be effective against an intelligent 
> attacker (someone who gets to choose which nodes go offline).
> 
> Which essentially leaves us with Reed-Solomon codes.
> 
> If I have a file coded (using Reed-Solomon) into ~100 pieces, what is a 
> reasonable decoding speed? Could I expect to get over 10 Mbps on a standard 
> consumer processor?
> 
> 
> On Sun, Dec 1, 2013 at 4:37 PM, David Vorick <david.vor...@gmail.com> wrote:
>> Thanks Dirk, I'll be sure to check all those out as well. Haven't yet heard 
>> of spinal codes.
>> 
>> Natanael, all of the mining is based on the amount of storage that you are 
>> contributing. If you are hosting 100 nodes each with 10GB, you will mine the 
>> same amount as if you had just one node with 1TB. The only way you could 
>> mine extra credits is if you could convince the system that you are hosting 
>> more storage than you are actually hosting.
>> 
>> 
>> On Sun, Dec 1, 2013 at 2:40 PM, <jason.john...@p7n.net> wrote:
>>> What if you gave them the node to use? Like, they had to register for a 
>>> node. I started something like this but sort of stopped because I’m lazy.
>>> 
>>> 
>>> From: tahoe-dev-boun...@tahoe-lafs.org 
>>> [mailto:tahoe-dev-boun...@tahoe-lafs.org] On Behalf Of Natanael
>>> Sent: Sunday, December 1, 2013 1:37 PM
>>> To: David Vorick
>>> Cc: tahoe-dev@tahoe-lafs.org
>>> Subject: Re: Fwd: Erasure Coding
>>> 
>>> 
>>> Can't you pretend to run more nodes than you actually are running in order 
>>> to "mine" more credits? What could prevent that?
>>> 
>>> - Sent from my phone
>>> 
>>> On 1 Dec 2013 at 17:25, "David Vorick" <david.vor...@gmail.com> wrote:
>>> 
>>> 
>>> ---------- Forwarded message ----------
>>> From: David Vorick <david.vor...@gmail.com>
>>> Date: Sun, Dec 1, 2013 at 11:25 AM
>>> Subject: Re: Erasure Coding
>>> To: Alex Elsayed <eternal...@gmail.com>
>>> 
>>> 
>>> Alex, thanks for those resources. I will check them out later this week.
>>> 
>>> I'm trying to create something that will function as a market for cloud 
>>> storage. People can rent out storage on the network for credit (a 
>>> cryptocurrency - not bitcoin but something heavily inspired by bitcoin 
>>> and the other altcoins), and then people who have credit (which can be 
>>> obtained by trading on an exchange, or by renting to the network) can 
>>> rent storage from the network.
>>> 
>>> So the clusters will be spread out over large distances. With RAID5 and 5 
>>> disks, the network needs to communicate 4 bits to recover each lost bit. 
>>> That's really expensive. The computational cost is not the concern; the 
>>> bandwidth cost is the concern (though there are computational limits as 
>>> well).
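>>>
>>> To make that repair-bandwidth arithmetic concrete, here is a small sketch 
>>> (illustrative Python, single-parity RAID5-style coding over 4 data blocks):
>>>
>>>     import os
>>>
>>>     # RAID5-style single parity across 5 "disks": 4 data blocks + 1 parity.
>>>     data = [os.urandom(16) for _ in range(4)]
>>>     parity = bytes(a ^ b ^ c ^ d for a, b, c, d in zip(*data))
>>>
>>>     # Lose data[2]: rebuilding it needs the 3 surviving data blocks plus
>>>     # parity -- 4 blocks of network traffic for every 1 block recovered.
>>>     rebuilt = bytes(a ^ b ^ d ^ p for a, b, d, p in
>>>                     zip(data[0], data[1], data[3], parity))
>>>     assert rebuilt == data[2]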
>>> 
>>> When you buy storage, all of the redundancy and erasure coding happens 
>>> behind the scenes. So a network that needs 3x redundancy will be 3x as 
>>> expensive to rent storage from. To be competitive, this number should be as 
>>> low as possible. If we had Reed-Solomon and infinite bandwidth, I think we 
>>> could safely get the redundancy below 1.2. But with all the other 
>>> requirements, I'm not sure what a reasonable minimum is.
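>>>
>>> To put numbers on the expansion factor (a quick sketch; the k-of-n values 
>>> are just examples, not figures from this thread):
>>>
>>>     def expansion(k, n):
>>>         """Bytes stored per byte of original data for a k-of-n code."""
>>>         return n / k
>>>
>>>     print(expansion(1, 3))    # 3.0 -- plain triple replication
>>>     print(expansion(25, 30))  # 1.2 -- survives loss of any 5 of 30 pieces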
>>> 
>>> Since many people can be renting many different clusters, each machine on 
>>> the network may (will) be participating in many clusters at once (probably 
>>> in the hundreds to thousands). So the cost of handling a failure should be 
>>> fairly cheap. I don't think this requirement is as extreme as it may sound, 
>>> because if you are participating in 100 clusters each renting an average of 
>>> 50 GB of storage, your overall expenses should be similar to participating 
>>> in a few clusters each renting an average of 1 TB. The important part is 
>>> that you can keep up with multiple simultaneous network failures, and that 
>>> a single node is never a bottleneck in the repair process.
>>> 
>>> 
>>> We need hundreds to thousands of machines in a single cluster for multiple reasons. 
>>> The first is that it makes the cluster roughly as stable as the network as 
>>> a whole. If you have 100 machines randomly selected from the network, and 
>>> on average 1% of the machines on the network fail per day, your cluster 
>>> shouldn't stray too far from 1% failures per day. Even more so if you have 
>>> 300 or 1000 machines. But another reason is that the network is used to 
>>> mine currency based on how much storage you are contributing to the 
>>> network. If there is some way you can trick the network into thinking you 
>>> are storing data when you aren't (or you can somehow lie about the volume), 
>>> then you've broken the network. Having many nodes in every cluster is one 
>>> of the ways cheating is prevented (there are a few others too, but they're 
>>> off-topic).
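>>>
>>> A quick sketch of the "roughly as stable as the network" point (standard 
>>> binomial arithmetic, using the 1%/day figure from above and assuming 
>>> independent failures):
>>>
>>>     from math import sqrt
>>>
>>>     def daily_failure_spread(p, n):
>>>         """Std. dev. of the fraction of an n-machine cluster that fails in
>>>         a day, if each machine fails independently with probability p."""
>>>         return sqrt(p * (1 - p) / n)
>>>
>>>     for n in (100, 300, 1000):
>>>         print(n, daily_failure_spread(0.01, n))
>>>     # roughly 1.0%, 0.6%, and 0.3% of spread around the 1% mean, respectively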
>>> 
>>> 
>>> Cluster size should be dynamic (fountain codes?) to support a cluster that 
>>> grows and shrinks with demand. Imagine if some of the files become public 
>>> (for example, YouTube starts hosting videos over this network). If one 
>>> video goes viral, the bandwidth demands are going to spike and overwhelm 
>>> the network. But if the network can automatically expand and shrink as 
>>> demand changes, you may be able to solve the 'Reddit hug' problem.
>>> 
>>> And finally, machines that only need to be on some of the time give the 
>>> network tolerance for things like power failures, without needing to 
>>> immediately assume that the lost node is gone for good.
>>> 
>>> 
>>> 
>>> 
>>> _______________________________________________
>>> tahoe-dev mailing list
>>> tahoe-dev@tahoe-lafs.org
>>> https://tahoe-lafs.org/cgi-bin/mailman/listinfo/tahoe-dev
>>> 
> 
> _______________________________________________
> tahoe-dev mailing list
> tahoe-dev@tahoe-lafs.org
> https://tahoe-lafs.org/cgi-bin/mailman/listinfo/tahoe-dev
_______________________________________________
tahoe-dev mailing list
tahoe-dev@tahoe-lafs.org
https://tahoe-lafs.org/cgi-bin/mailman/listinfo/tahoe-dev
