Re: [swift-dev] [swift-users] Collection underestimatedCount() does _?

Will Stanton via swift-dev Sat, 19 Mar 2016 15:28:56 -0700

Hello Brent,

Thanks kindly for the flair!


You gave cases for which `underestimatedCount()` is used:
> For sequences with easily-calculated counts, this should give us a size 
> that's just right. For sequences where they can kind of estimate the right 
> count (for instance, if you're decoding a fixed-size buffer of UTF-8 bytes, 
> and you know how many bytes there are but not exactly how many characters 
> they'll decode to), it will get us at least part of the way there. For 
> sequences whose size is completely unknown, it won't help but it won't hurt 
> much either.

CC’ing swift-dev, I’m wondering:
Was there a reason `underestimatedCount()` was chosen for Sequence instead of 
`estimatedCount()`. I suppose an overestimate would be bad for iterating 
`0..<estimatedCount()`, but I couldn’t find any existing loops over 
`0..<underestimatedCount()`.

Does anything depend on an underestimate of Sequence count? And if not, would 
it be better to be less restrictive when asking a Sequence for a size estimate?
Since the exact count is returned for collections, the name also isn’t very 
precise.

Regards,
Will Stanton

> On Mar 19, 2016, at 3:53 PM, Brent Royal-Gordon via swift-users 
> <swift-us...@swift.org> wrote:
> 
>>> I might have missed something since a lot of the results are for tests 
>>> <https://github.com/apple/swift/search?utf8=✓&q=underestimatedCount&type=Code>
>>> But why would `underestimatedCount` ever be used instead of `count`? Don’t 
>>> most collections have O(1) `count` anyway?
>> 
>> Hi Will,
>> 
>> This API comes from Sequence, which does not have a count that you can
>> inspect without possibly consuming the sequence.
> 
> To add some color:
> 
> A sequence merely says that it has a bunch of elements and you can walk 
> through them. It does *not* promise several important things:
> 
> * That you can get the elements in the sequence more than once
> * That it knows the number of elements in the sequence and can quickly return 
> them
> * That you can get the number of elements in the sequence without walking 
> through and counting them one by one
> 
> This adds up to an ugly conclusion: With just a sequence, it is impossible to 
> efficiently discover the count, and doing so might destroy the sequence.
> 
> That's obviously not so great. In particular, it's not so great when you want 
> to load a sequence into an array. The simple, naive way is to do this:
> 
>       // Notionally
>       struct Array<Element> {
>               init<Seq: SequenceType where Seq.Generator.Element == 
> Element>(_ seq: Seq) {
>                       self.init()
>                       
>                       for elem in seq {
>                               appendElement(elem)
>                       }
>               }
>       }
> 
> But this may resize the array several times, which would be very slow. The 
> fastest way to do that will be to pre-allocate the exact right number of 
> elements so you don't have to keep resizing the array:
> 
>       // Notionally
>       struct Array<Element> {
>               init<Seq: SequenceType where Seq.Generator.Element == 
> Element>(_ seq: Seq) {
>                       self.init()
>                       
>                       reserveCapacity(seq.count)
>                       for elem in seq {
>                               
> _appendElementQuicklyBecauseWeAlreadyKnowWeHaveTheCapacity(elem)
>                       }
>               }
>       }
> 
> But `seq.count` might consume the sequence, leaving you unable to add any 
> elements. What do we do?
> 
> Enter `underestimateCount()`. We can always call `underestimateCount()` and 
> size to whatever it says we should be. For sequences with easily-calculated 
> counts, this should give us a size that's just right. For sequences where 
> they can kind of estimate the right count (for instance, if you're decoding a 
> fixed-size buffer of UTF-8 bytes, and you know how many bytes there are but 
> not exactly how many characters they'll decode to), it will get us at least 
> part of the way there. For sequences whose size is completely unknown, it 
> won't help but it won't hurt much either.
> 
> So you do this:
> 
>       // Notionally
>       struct Array<Element> {
>               init<Seq: SequenceType where Seq.Generator.Element == 
> Element>(_ seq: Seq) {
>                       self.init()
>                       
>                       let fastCount = seq.underestimateCount()
>                       reserveCapacity(fastCount)
>                       
>                       // We'll have to pull elements out manually.
>                       let generator = seq.generate()
>                       
>                       // Load all the fast elements
>                       for _ in 0..<fastCount {
>                               let elem = generator.next()!
>                               
> _appendElementQuicklyBecauseWeAlreadyKnowWeHaveTheCapacity(elem)
>                       }
>                       
>                       // Get anything else that's left
>                       while let elem = generator.next() {
>                               appendElement(elem)
>                       }
>               }
>       }
> 
> (If you're looking at Swift 3, remember: SequenceType is now Sequence, 
> Generator is now Iterator, generate() is now makeIterator(), and 
> underestimateCount() is now underestimatedCount().)
> 
> And in fact, this is quite similar to some real source code inside Array:
> 
>       @warn_unused_result
>       internal func _copySequenceToNativeArrayBuffer<
>         S : SequenceType
>       >(source: S) -> _ContiguousArrayBuffer<S.Generator.Element> {
>         let initialCapacity = source.underestimateCount()
>         var builder =
>           
> _UnsafePartiallyInitializedContiguousArrayBuffer<S.Generator.Element>(
>             initialCapacity: initialCapacity)
>       
>         var generator = source.generate()
>       
>         // FIXME(performance): use _initializeTo().
>       
>         // Add elements up to the initial capacity without checking for 
> regrowth.
>         for _ in 0..<initialCapacity {
>           builder.addWithExistingCapacity(generator.next()!)
>         }
>       
>         // Add remaining elements, if any.
>         while let element = generator.next() {
>           builder.add(element)
>         }
>       
>         return builder.finish()
>       }
> 
> Now, collections *do* promise that you can walk through a sequence more than 
> once, and some collections (ones that promise "random access") promise that 
> `count` can be calculated quickly. But because CollectionType conforms to 
> SequenceType, some code may receive a collection but only know it's a 
> sequence. Fortunately, the standard library provides a default implementation 
> of `underestimateCount` for collections:
> 
>         public func underestimateCount() -> Int {
>           return numericCast(count)
>         }
> 
> So you just don't have to worry about this on collections. Implement `count` 
> and `understimateCount` will take care of itself.
> 
> HTH,
> -- 
> Brent Royal-Gordon
> Architechies

_______________________________________________
swift-dev mailing list
swift-dev@swift.org
https://lists.swift.org/mailman/listinfo/swift-dev

Re: [swift-dev] [swift-users] Collection underestimatedCount() does _?

Reply via email to