Sent from my iPhone

> On 13 Mar 2017, at 19:30, Dimitri Racordon via swift-evolution 
> <swift-evolution@swift.org> wrote:
> 
> While I understand and agree with the goal of the proposal, I too feel like 
> it’s more difficult to understand than the current approach. HashVisitable 
> looks odd, and one would have to understand the role and API of the Hasher 
> protocol.
> 
> I would argue that if Swift provided a good default hashing API, then we 
> could simply call it when computing the hashValue.
> 
> struct GridPoint: Hashable {
>   var x: Int
>   var y: Int
>   var hashValue: Int {
>     let hasher = Swift.DefaultHasher()
>     hasher.hash(self.x)
>     hasher.hash(self.y)
>     return hasher.value
>   }
>   static func == (lhs: GridPoint, rhs: GridPoint) -> Bool { /* … */ }
> }
> 
> It might seem wordy, but it would allow one to leverage good hashing 
> algorithms, while not forcing people who do have a good hashing function for 
> their particular type to define their own hasher.

This is exactly what this proposal is proposing, with a slightly different API 
that allows you to more easily handle subtypes.

What in the current proposal is more complicated to you than what you propose?

> On an unrelated note, maybe I missed something but I think HashVisitable 
> would still need to conform to Equatable as well, if it were to replace 
> Hashable. This is unclear in your proposal, as the equality operator is 
> present in the motivational example, but absent from the proposed solution.
> 
>> On 13 Mar 2017, at 18:54, Sean Heber via swift-evolution 
>> <swift-evolution@swift.org> wrote:
>> 
>> I’m dumb when it comes to proper hashing, but it’s such a tediously common 
>> thing in my experience to need to add Hashable to some kind of a struct so I 
>> can stash it in a set or use it as a dictionary key. Is there really no way 
>> to make this all more automatic? I have to be honest - this proposal seems 
>> *harder* to understand than the way it works now. Of course the easiest 
>> would be if the language could just do this “good enough" for me using 
>> reflection or whatever and if I really did run into a problem where I wanted 
>> to do this myself, I could override something.
>> 
>> Perfect is the enemy of good.
>> 
>> l8r
>> Sean
>> 
>> 
>>> On Mar 13, 2017, at 10:38 AM, Vincent Esche via swift-evolution 
>>> <swift-evolution@swift.org> wrote:
>>> 
>>> Hi there,
>>> 
>>> I've written up a proposal for an overhaul/replacement of the Hashable 
>>> protocol and would love to hear your feedback on it!
>>> 
>>> Rendered | Blog Post
>>> 
>>> Cheers,
>>> Vincent
>>> 
>>> Ps: I'd like to thank David Hart (@hartbit) for his great editorial 
>>> feedback on this proposal. 👍
>>> 
>>> HashVisitable
>>> 
>>> • Proposal: SE-NNNN
>>> • Authors: Vincent Esche
>>> • Review Manager: TBD
>>> • Status: Awaiting review
>>> Introduction
>>> 
>>> Replace the Hashable protocol by two new procotols (Hasher and 
>>> HashVisitable) to improve safety, versatility and learnability.
>>> 
>>> Motivation
>>> 
>>> Implementing Hashable is difficult and the consequences if not done well 
>>> have dire performance and safety repercussions.
>>> 
>>> The documentation of Hashable lists a sample implementation of var 
>>> hashValue:
>>> 
>>> /// A point in an x-y coordinate system.
>>> struct GridPoint {
>>>    var x: Int
>>>    var y: Int
>>> }
>>> 
>>> extension GridPoint: Hashable {
>>>    var hashValue: Int {
>>>        return x.hashValue ^ y.hashValue
>>>    }
>>> 
>>>    static func == (lhs: GridPoint, rhs: GridPoint) -> Bool {
>>>        return lhs.x == rhs.x && lhs.y == rhs.y
>>>    }
>>> }
>>> 
>>> Calculating the hashes of all GridPoints (given the above implementation) 
>>> on a 1000 × 1000 grid …
>>> 
>>> let (width, height) = (1000, 1000)
>>> let total = width * height
>>> var hashes = Set<Int>()
>>> for x in 0..<width {
>>>    for y in 0..<height {
>>>        hashes.insert(GridPoint(x: x, y: y).hashValue)
>>>    }
>>> }
>>> print("\(hashes.count) unique hashes out of a total of \(total).")
>>> 
>>> … results in just 1024 unique hash values for 1_000_000 unique values.
>>> 
>>> In other words: The recommended implementation causes 99.9% of values to 
>>> trigger a hash collision.
>>> 
>>> Out of those 1_000_000 values the median collision count was 976 with min 
>>> and max being 976 and 1000respectively.
>>> 
>>> The collision rate will have negative impact in algorithms which heavily 
>>> use hashValue like the ones in Dictionaryand Set. Furthermore, it increases 
>>> the vulnerability to DDOS attacks when exposed to the web.
>>> 
>>> If even the official Swift documentation gets the implementation of 
>>> hashValue wrong, then who is to expect the average Swift programmer to do 
>>> any better?
>>> 
>>> In contrast, running the same snippet using HashVisitable and the 
>>> semi-secure Fnv1aHash (see below) results in zero collisions!
>>> 
>>> Finally, the design of the Hashable protocol forces the use of one 
>>> implementation without the possibility of switching between multiple 
>>> hashing algorithms.
>>> 
>>> Proposed solution
>>> 
>>> Instead of coupling the hashing algorithm with each and every Swift type, 
>>> we should provide a hashing API based on the visitor-pattern. By freeing 
>>> application developers from the burden of having to implement hashing 
>>> algorithms, the Standard Library can provide default ones which fulfill 
>>> collision, performance and security goals. Furthermore, it would allow 
>>> developers to swap to different algorithms based on the use case.
>>> 
>>> Detailed design
>>> 
>>> The proposal deprecates the Hashable protocol and introduces the following 
>>> two:
>>> 
>>> protocol Hasher
>>> {
>>> 
>>> mutating func finish() -> Int
>>> 
>>> 
>>> mutating func write(bytes
>>> : UnsafeRawBufferPointer)
>>> }
>>> 
>>> 
>>> protocol HashVisitable
>>> {
>>> 
>>> func hash<H: Hasher>(_ hasher: inout
>>> H)
>>> }
>>> 
>>> Hasher is the protocol which represents a hashing algorithm, and 
>>> HashVisitable replaces Hashable. For types entirely represented by their 
>>> memory layout, the following protocol would provide a default 
>>> implementation:
>>> 
>>> protocol ContiguouslyHashable: HashVisitable {}
>>> 
>>> extension ContiguouslyHashable {
>>>    func hash<H: Hasher>(_ hasher: inout H) {
>>>        var mutableSelf = self
>>>        try! Swift.withUnsafeBytes(of: &mutableSelf) {
>>>            hasher.write(bytes: $0)
>>>        }
>>>    }
>>> }
>>> 
>>> extension Bool : ContiguouslyHashable {}
>>> extension UInt8 : ContiguouslyHashable {}
>>> extension UInt16 : ContiguouslyHashable {}
>>> extension UInt32 : ContiguouslyHashable {}
>>> extension UInt64 : ContiguouslyHashable {}
>>> extension UInt : ContiguouslyHashable {}
>>> extension Int8 : ContiguouslyHashable {}
>>> extension Int16 : ContiguouslyHashable {}
>>> extension Int32 : ContiguouslyHashable {}
>>> extension Int64 : ContiguouslyHashable {}
>>> extension Int : ContiguouslyHashable {}
>>> 
>>> The Standard-Library would then provide a set of hashing implementations 
>>> specific to each purpose. A possible choice for hashing algorithms would be 
>>> the reasonably fast SipHash-2-4, and the reasonably secure SipHash-4-8.
>>> 
>>> FNV-1A is another popular semi-secure but blazingly fast hash algorithm, 
>>> which – for the sake of demonstration – could be implemented as follows:
>>> 
>>> struct Fnv1aHash
>>> {
>>> 
>>> fileprivate var state: UInt
>>> 
>>> 
>>> init(seed: UInt
>>> ) {
>>> 
>>> self.state = seed &+ 14695981039346656037
>>> 
>>>    }
>>> }
>>> 
>>> 
>>> extension Fnv1aHash: Hasher 
>>> {
>>> 
>>> mutating func write(bytes
>>> : UnsafeRawBufferPointer) {
>>> 
>>> for byte in
>>> bytes {
>>> 
>>> self.state = (self.state ^ UInt(byte)) &* 1099511628211
>>> 
>>>        }
>>>    }
>>> 
>>> mutating func finish() -> Int
>>> {
>>> 
>>> return unsafeBitCast(self.state, to: Int.self
>>> )
>>>    }
>>> }
>>> 
>>> Coming back to the sample code present in the Hashable documentation, the 
>>> new implementation would look like:
>>> 
>>> extension GridPoint: HashVisitable 
>>> {
>>> 
>>> func hash<H: Hasher>(_ hasher: inout
>>> H) {
>>> 
>>> self.x.hash(&
>>> hasher)
>>> 
>>> self.y.hash(&
>>> hasher)
>>>    }
>>> }
>>> 
>>> Source compatibility
>>> 
>>> Making use of "extending protocols to conform to protocols":
>>> 
>>> extension Hashable: HashVisitable 
>>> {
>>> 
>>> func hash<H: Hasher>(_ hasher: inout
>>> H) {
>>> 
>>> self.hashValue.hash(&
>>> hasher)
>>>    }
>>> }
>>> 
>>> Effect on ABI stability
>>> 
>>> n/a
>>> 
>>> Effect on API resilience
>>> 
>>> This feature should be possible to add/remove without breaking ABI.
>>> 
>>> Alternatives considered
>>> 
>>> n/a
>>> _______________________________________________
>>> swift-evolution mailing list
>>> swift-evolution@swift.org
>>> https://lists.swift.org/mailman/listinfo/swift-evolution
>> 
>> _______________________________________________
>> swift-evolution mailing list
>> swift-evolution@swift.org
>> https://lists.swift.org/mailman/listinfo/swift-evolution
> 
> _______________________________________________
> swift-evolution mailing list
> swift-evolution@swift.org
> https://lists.swift.org/mailman/listinfo/swift-evolution
_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

Reply via email to