Hello Brent, Thanks kindly for the flair!
You gave cases for which `underestimatedCount()` is used: > For sequences with easily-calculated counts, this should give us a size > that's just right. For sequences where they can kind of estimate the right > count (for instance, if you're decoding a fixed-size buffer of UTF-8 bytes, > and you know how many bytes there are but not exactly how many characters > they'll decode to), it will get us at least part of the way there. For > sequences whose size is completely unknown, it won't help but it won't hurt > much either. CC’ing swift-dev, I’m wondering: Was there a reason `underestimatedCount()` was chosen for Sequence instead of `estimatedCount()`. I suppose an overestimate would be bad for iterating `0..<estimatedCount()`, but I couldn’t find any existing loops over `0..<underestimatedCount()`. Does anything depend on an underestimate of Sequence count? And if not, would it be better to be less restrictive when asking a Sequence for a size estimate? Since the exact count is returned for collections, the name also isn’t very precise. Regards, Will Stanton > On Mar 19, 2016, at 3:53 PM, Brent Royal-Gordon via swift-users > <swift-us...@swift.org> wrote: > >>> I might have missed something since a lot of the results are for tests >>> <https://github.com/apple/swift/search?utf8=✓&q=underestimatedCount&type=Code> >>> But why would `underestimatedCount` ever be used instead of `count`? Don’t >>> most collections have O(1) `count` anyway? >> >> Hi Will, >> >> This API comes from Sequence, which does not have a count that you can >> inspect without possibly consuming the sequence. > > To add some color: > > A sequence merely says that it has a bunch of elements and you can walk > through them. It does *not* promise several important things: > > * That you can get the elements in the sequence more than once > * That it knows the number of elements in the sequence and can quickly return > them > * That you can get the number of elements in the sequence without walking > through and counting them one by one > > This adds up to an ugly conclusion: With just a sequence, it is impossible to > efficiently discover the count, and doing so might destroy the sequence. > > That's obviously not so great. In particular, it's not so great when you want > to load a sequence into an array. The simple, naive way is to do this: > > // Notionally > struct Array<Element> { > init<Seq: SequenceType where Seq.Generator.Element == > Element>(_ seq: Seq) { > self.init() > > for elem in seq { > appendElement(elem) > } > } > } > > But this may resize the array several times, which would be very slow. The > fastest way to do that will be to pre-allocate the exact right number of > elements so you don't have to keep resizing the array: > > // Notionally > struct Array<Element> { > init<Seq: SequenceType where Seq.Generator.Element == > Element>(_ seq: Seq) { > self.init() > > reserveCapacity(seq.count) > for elem in seq { > > _appendElementQuicklyBecauseWeAlreadyKnowWeHaveTheCapacity(elem) > } > } > } > > But `seq.count` might consume the sequence, leaving you unable to add any > elements. What do we do? > > Enter `underestimateCount()`. We can always call `underestimateCount()` and > size to whatever it says we should be. For sequences with easily-calculated > counts, this should give us a size that's just right. For sequences where > they can kind of estimate the right count (for instance, if you're decoding a > fixed-size buffer of UTF-8 bytes, and you know how many bytes there are but > not exactly how many characters they'll decode to), it will get us at least > part of the way there. For sequences whose size is completely unknown, it > won't help but it won't hurt much either. > > So you do this: > > // Notionally > struct Array<Element> { > init<Seq: SequenceType where Seq.Generator.Element == > Element>(_ seq: Seq) { > self.init() > > let fastCount = seq.underestimateCount() > reserveCapacity(fastCount) > > // We'll have to pull elements out manually. > let generator = seq.generate() > > // Load all the fast elements > for _ in 0..<fastCount { > let elem = generator.next()! > > _appendElementQuicklyBecauseWeAlreadyKnowWeHaveTheCapacity(elem) > } > > // Get anything else that's left > while let elem = generator.next() { > appendElement(elem) > } > } > } > > (If you're looking at Swift 3, remember: SequenceType is now Sequence, > Generator is now Iterator, generate() is now makeIterator(), and > underestimateCount() is now underestimatedCount().) > > And in fact, this is quite similar to some real source code inside Array: > > @warn_unused_result > internal func _copySequenceToNativeArrayBuffer< > S : SequenceType > >(source: S) -> _ContiguousArrayBuffer<S.Generator.Element> { > let initialCapacity = source.underestimateCount() > var builder = > > _UnsafePartiallyInitializedContiguousArrayBuffer<S.Generator.Element>( > initialCapacity: initialCapacity) > > var generator = source.generate() > > // FIXME(performance): use _initializeTo(). > > // Add elements up to the initial capacity without checking for > regrowth. > for _ in 0..<initialCapacity { > builder.addWithExistingCapacity(generator.next()!) > } > > // Add remaining elements, if any. > while let element = generator.next() { > builder.add(element) > } > > return builder.finish() > } > > Now, collections *do* promise that you can walk through a sequence more than > once, and some collections (ones that promise "random access") promise that > `count` can be calculated quickly. But because CollectionType conforms to > SequenceType, some code may receive a collection but only know it's a > sequence. Fortunately, the standard library provides a default implementation > of `underestimateCount` for collections: > > public func underestimateCount() -> Int { > return numericCast(count) > } > > So you just don't have to worry about this on collections. Implement `count` > and `understimateCount` will take care of itself. > > HTH, > -- > Brent Royal-Gordon > Architechies _______________________________________________ swift-dev mailing list swift-dev@swift.org https://lists.swift.org/mailman/listinfo/swift-dev