Sent from my iPhone
> On 13 Mar 2017, at 19:30, Dimitri Racordon via swift-evolution > <swift-evolution@swift.org> wrote: > > While I understand and agree with the goal of the proposal, I too feel like > it’s more difficult to understand than the current approach. HashVisitable > looks odd, and one would have to understand the role and API of the Hasher > protocol. > > I would argue that if Swift provided a good default hashing API, then we > could simply call it when computing the hashValue. > > struct GridPoint: Hashable { > var x: Int > var y: Int > var hashValue: Int { > let hasher = Swift.DefaultHasher() > hasher.hash(self.x) > hasher.hash(self.y) > return hasher.value > } > static func == (lhs: GridPoint, rhs: GridPoint) -> Bool { /* … */ } > } > > It might seem wordy, but it would allow one to leverage good hashing > algorithms, while not forcing people who do have a good hashing function for > their particular type to define their own hasher. This is exactly what this proposal is proposing, with a slightly different API that allows you to more easily handle subtypes. What in the current proposal is more complicated to you than what you propose? > On an unrelated note, maybe I missed something but I think HashVisitable > would still need to conform to Equatable as well, if it were to replace > Hashable. This is unclear in your proposal, as the equality operator is > present in the motivational example, but absent from the proposed solution. > >> On 13 Mar 2017, at 18:54, Sean Heber via swift-evolution >> <swift-evolution@swift.org> wrote: >> >> I’m dumb when it comes to proper hashing, but it’s such a tediously common >> thing in my experience to need to add Hashable to some kind of a struct so I >> can stash it in a set or use it as a dictionary key. Is there really no way >> to make this all more automatic? I have to be honest - this proposal seems >> *harder* to understand than the way it works now. Of course the easiest >> would be if the language could just do this “good enough" for me using >> reflection or whatever and if I really did run into a problem where I wanted >> to do this myself, I could override something. >> >> Perfect is the enemy of good. >> >> l8r >> Sean >> >> >>> On Mar 13, 2017, at 10:38 AM, Vincent Esche via swift-evolution >>> <swift-evolution@swift.org> wrote: >>> >>> Hi there, >>> >>> I've written up a proposal for an overhaul/replacement of the Hashable >>> protocol and would love to hear your feedback on it! >>> >>> Rendered | Blog Post >>> >>> Cheers, >>> Vincent >>> >>> Ps: I'd like to thank David Hart (@hartbit) for his great editorial >>> feedback on this proposal. 👍 >>> >>> HashVisitable >>> >>> • Proposal: SE-NNNN >>> • Authors: Vincent Esche >>> • Review Manager: TBD >>> • Status: Awaiting review >>> Introduction >>> >>> Replace the Hashable protocol by two new procotols (Hasher and >>> HashVisitable) to improve safety, versatility and learnability. >>> >>> Motivation >>> >>> Implementing Hashable is difficult and the consequences if not done well >>> have dire performance and safety repercussions. >>> >>> The documentation of Hashable lists a sample implementation of var >>> hashValue: >>> >>> /// A point in an x-y coordinate system. >>> struct GridPoint { >>> var x: Int >>> var y: Int >>> } >>> >>> extension GridPoint: Hashable { >>> var hashValue: Int { >>> return x.hashValue ^ y.hashValue >>> } >>> >>> static func == (lhs: GridPoint, rhs: GridPoint) -> Bool { >>> return lhs.x == rhs.x && lhs.y == rhs.y >>> } >>> } >>> >>> Calculating the hashes of all GridPoints (given the above implementation) >>> on a 1000 × 1000 grid … >>> >>> let (width, height) = (1000, 1000) >>> let total = width * height >>> var hashes = Set<Int>() >>> for x in 0..<width { >>> for y in 0..<height { >>> hashes.insert(GridPoint(x: x, y: y).hashValue) >>> } >>> } >>> print("\(hashes.count) unique hashes out of a total of \(total).") >>> >>> … results in just 1024 unique hash values for 1_000_000 unique values. >>> >>> In other words: The recommended implementation causes 99.9% of values to >>> trigger a hash collision. >>> >>> Out of those 1_000_000 values the median collision count was 976 with min >>> and max being 976 and 1000respectively. >>> >>> The collision rate will have negative impact in algorithms which heavily >>> use hashValue like the ones in Dictionaryand Set. Furthermore, it increases >>> the vulnerability to DDOS attacks when exposed to the web. >>> >>> If even the official Swift documentation gets the implementation of >>> hashValue wrong, then who is to expect the average Swift programmer to do >>> any better? >>> >>> In contrast, running the same snippet using HashVisitable and the >>> semi-secure Fnv1aHash (see below) results in zero collisions! >>> >>> Finally, the design of the Hashable protocol forces the use of one >>> implementation without the possibility of switching between multiple >>> hashing algorithms. >>> >>> Proposed solution >>> >>> Instead of coupling the hashing algorithm with each and every Swift type, >>> we should provide a hashing API based on the visitor-pattern. By freeing >>> application developers from the burden of having to implement hashing >>> algorithms, the Standard Library can provide default ones which fulfill >>> collision, performance and security goals. Furthermore, it would allow >>> developers to swap to different algorithms based on the use case. >>> >>> Detailed design >>> >>> The proposal deprecates the Hashable protocol and introduces the following >>> two: >>> >>> protocol Hasher >>> { >>> >>> mutating func finish() -> Int >>> >>> >>> mutating func write(bytes >>> : UnsafeRawBufferPointer) >>> } >>> >>> >>> protocol HashVisitable >>> { >>> >>> func hash<H: Hasher>(_ hasher: inout >>> H) >>> } >>> >>> Hasher is the protocol which represents a hashing algorithm, and >>> HashVisitable replaces Hashable. For types entirely represented by their >>> memory layout, the following protocol would provide a default >>> implementation: >>> >>> protocol ContiguouslyHashable: HashVisitable {} >>> >>> extension ContiguouslyHashable { >>> func hash<H: Hasher>(_ hasher: inout H) { >>> var mutableSelf = self >>> try! Swift.withUnsafeBytes(of: &mutableSelf) { >>> hasher.write(bytes: $0) >>> } >>> } >>> } >>> >>> extension Bool : ContiguouslyHashable {} >>> extension UInt8 : ContiguouslyHashable {} >>> extension UInt16 : ContiguouslyHashable {} >>> extension UInt32 : ContiguouslyHashable {} >>> extension UInt64 : ContiguouslyHashable {} >>> extension UInt : ContiguouslyHashable {} >>> extension Int8 : ContiguouslyHashable {} >>> extension Int16 : ContiguouslyHashable {} >>> extension Int32 : ContiguouslyHashable {} >>> extension Int64 : ContiguouslyHashable {} >>> extension Int : ContiguouslyHashable {} >>> >>> The Standard-Library would then provide a set of hashing implementations >>> specific to each purpose. A possible choice for hashing algorithms would be >>> the reasonably fast SipHash-2-4, and the reasonably secure SipHash-4-8. >>> >>> FNV-1A is another popular semi-secure but blazingly fast hash algorithm, >>> which – for the sake of demonstration – could be implemented as follows: >>> >>> struct Fnv1aHash >>> { >>> >>> fileprivate var state: UInt >>> >>> >>> init(seed: UInt >>> ) { >>> >>> self.state = seed &+ 14695981039346656037 >>> >>> } >>> } >>> >>> >>> extension Fnv1aHash: Hasher >>> { >>> >>> mutating func write(bytes >>> : UnsafeRawBufferPointer) { >>> >>> for byte in >>> bytes { >>> >>> self.state = (self.state ^ UInt(byte)) &* 1099511628211 >>> >>> } >>> } >>> >>> mutating func finish() -> Int >>> { >>> >>> return unsafeBitCast(self.state, to: Int.self >>> ) >>> } >>> } >>> >>> Coming back to the sample code present in the Hashable documentation, the >>> new implementation would look like: >>> >>> extension GridPoint: HashVisitable >>> { >>> >>> func hash<H: Hasher>(_ hasher: inout >>> H) { >>> >>> self.x.hash(& >>> hasher) >>> >>> self.y.hash(& >>> hasher) >>> } >>> } >>> >>> Source compatibility >>> >>> Making use of "extending protocols to conform to protocols": >>> >>> extension Hashable: HashVisitable >>> { >>> >>> func hash<H: Hasher>(_ hasher: inout >>> H) { >>> >>> self.hashValue.hash(& >>> hasher) >>> } >>> } >>> >>> Effect on ABI stability >>> >>> n/a >>> >>> Effect on API resilience >>> >>> This feature should be possible to add/remove without breaking ABI. >>> >>> Alternatives considered >>> >>> n/a >>> _______________________________________________ >>> swift-evolution mailing list >>> swift-evolution@swift.org >>> https://lists.swift.org/mailman/listinfo/swift-evolution >> >> _______________________________________________ >> swift-evolution mailing list >> swift-evolution@swift.org >> https://lists.swift.org/mailman/listinfo/swift-evolution > > _______________________________________________ > swift-evolution mailing list > swift-evolution@swift.org > https://lists.swift.org/mailman/listinfo/swift-evolution
_______________________________________________ swift-evolution mailing list swift-evolution@swift.org https://lists.swift.org/mailman/listinfo/swift-evolution