Re: [webkit-dev] size_t vs unsigned in WTF::Vector API ?

Geoffrey Garen Thu, 20 Nov 2014 11:02:22 -0800

Yes, good point. I should have been more specific. I believe this math might 
overflow and then truncate:


template<typename T, unsigned inlineCapacity, typename OverflowHandler>
bool Vector<T, inlineCapacity, OverflowHandler>::tryExpandCapacity(unsigned 
newMinCapacity)
{
    return tryReserveCapacity(std::max(newMinCapacity, std::max(16u, 
static_cast<unsigned>(capacity() + capacity() / 4 + 1))));
}

Geoff

> On Nov 19, 2014, at 2:04 PM, Chris Dumez <[email protected]> wrote:
> 
> Hi,
> 
> I believe the Vector class is already checking for overflows. I looked 
> quickly and it seems that when trying to allocate / reallocate the Vector 
> buffer, we will call CRASH() if:
> (newCapacity > std::numeric_limits<unsigned>::max() / sizeof(T))
> 
> Kr, 
> --
> Chris Dumez - Apple Inc.
> Cupertino, CA
> 
> 
> 
> 
>> On Nov 19, 2014, at 1:57 PM, Geoffrey Garen <[email protected] 
>> <mailto:[email protected]>> wrote:
>> 
>> I don’t haver a specific opinion on size_t vs unsigned, but I have a related 
>> point:
>> 
>> If Vector uses unsigned, then it must RELEASE_ASSERT that it does not 
>> overflow unsigned when growing. 
>> 
>> Even if most normal content allocates < 4GB, exploit content surely will try 
>> to allocate > 4GB.
>> 
>> Chris, maybe you can make this change in trunk, since trunk uses unsigned?
>> 
>> Geoff
>> 
>>> On Nov 19, 2014, at 1:50 PM, Alexey Proskuryakov <[email protected] 
>>> <mailto:[email protected]>> wrote:
>>> 
>>> 
>>> That makes good sense for internal implementation, do you think that class 
>>> API should also use a custom type, or should it use size_t?
>>> 
>>> With Vector though, I don't know how we would differentiate code paths that 
>>> need large allocations from ones that don't. Nearly anything that is 
>>> exposed as a JS API or deals with external world can hit sizes over 4Gb. 
>>> That's not out of reach in most scenarios, not even for resources loaded 
>>> from network.
>>> 
>>> - Alexey
>>> 
>>> 
>>> 19 нояб. 2014 г., в 13:19, Filip Pizlo <[email protected] 
>>> <mailto:[email protected]>> написал(а):
>>> 
>>>> ArrayBuffers are very special because they are part of ECMAScript.
>>>> 
>>>> We use unsigned for the length, because once upon a time, that would have 
>>>> been the right type; these days the right type would be a 53-bit integer.  
>>>> So, size_t would be wrong.  I believe that ArrayBuffers should be changed 
>>>> to use a very special size type; probably it wouldn't even be a primitive 
>>>> type but a class that carefully checked that you never overflowed 53 bits.
>>>> 
>>>> -Filip
>>>> 
>>>> 
>>>>> On Nov 19, 2014, at 12:54 PM, Alexey Proskuryakov <[email protected] 
>>>>> <mailto:[email protected]>> wrote:
>>>>> 
>>>>> 
>>>>> This is not exactly about Vector, but if one uses 
>>>>> FileReader.prototype.readAsArrayBuffer() on a large file, I think that it 
>>>>> overflows ArrayBuffer. WebKit actually crashes when uploading 
>>>>> multi-gigabyte files to YouTube, Google Drive and other similar services, 
>>>>> although I haven't checked whether it's because of ArrayBuffers, or 
>>>>> because of a use of int/unsigned in another code path.
>>>>> 
>>>>> I think that we should use size_t everywhere except for perhaps a few key 
>>>>> places where memory impact is critical (and of course we need explicit 
>>>>> checks when casting down to an unsigned). Or perhaps the rule can be even 
>>>>> simpler, and unsigned may never be used for indices and sizes, period.
>>>>> 
>>>>> - Alexey
>>>>> 
>>>>> 
>>>>> 19 нояб. 2014 г., в 12:32, Filip Pizlo <[email protected] 
>>>>> <mailto:[email protected]>> написал(а):
>>>>> 
>>>>>> Whatever we do, the clients of Vector should be consistent about what 
>>>>>> type they use.  I'm actually fine with Vector internally using unsigned 
>>>>>> even if the API uses size_t, but right now we have lots of code that 
>>>>>> uses a mix of size_t and unsigned when indexing into Vectors.  That's 
>>>>>> confusing.
>>>>>> 
>>>>>> If I picked one type to use for all Vector indices, it would be unsigned 
>>>>>> rather than size_t.  Vector being limited to unsigned doesn't imply 4GB 
>>>>>> unless you do Vector<char>.  Usually we have Vectors of pointer-width 
>>>>>> things, which means 32GB on 64-bit systems (UINT_MAX * sizeof(void*)).  
>>>>>> Even in a world where we had more than 32GB of memory to use within a 
>>>>>> single web process, I would hope that we wouldn't use it all on a single 
>>>>>> Vector and that if we did, we would treat that one specially for a bunch 
>>>>>> of other sensible reasons (among them being that allocating a contiguous 
>>>>>> slab of virtual memory of that size is rather taxing).  So, size_t would 
>>>>>> buy us almost nothing since if we had a vector grow large enough to need 
>>>>>> it, we would be having a bad time already.
>>>>>> 
>>>>>> I wonder: are there cases that anyone remembers where we have tried to 
>>>>>> use Vector to store more than UINT_MAX things?  Another interesting 
>>>>>> question is: What's the largest number of things that we store into any 
>>>>>> Vector?  We could use such a metric to project how big Vectors might get 
>>>>>> in the future.
>>>>>> 
>>>>>> -Filip
>>>>>> 
>>>>>> 
>>>>>>> On Nov 19, 2014, at 12:20 PM, Chris Dumez <[email protected] 
>>>>>>> <mailto:[email protected]>> wrote:
>>>>>>> 
>>>>>>> Hi all,
>>>>>>> 
>>>>>>> I recently started updating the WTF::Vector API to use unsigned types 
>>>>>>> instead of size_t [1][2], because:
>>>>>>> - WTF::Vector is already using unsigned type for size/capacity 
>>>>>>> internally to save memory on 64-bit, causing a mismatch between the API 
>>>>>>> and the internal representation [3]
>>>>>>> - Some reviewers have asked me to use unsigned for loop counters 
>>>>>>> iterating over vectors (which looks unsafe as the Vector API, e.g. 
>>>>>>> size(), returns a size_t).
>>>>>>> - I heard from Joseph that this type mismatch is forcing us (and other 
>>>>>>> projects using WTF) to disable some build time warnings
>>>>>>> - The few people I talked to before making that change said we should 
>>>>>>> do it
>>>>>>> 
>>>>>>> However, Alexey recently raised concerns about this change. it doesn't 
>>>>>>> "strike him as the right direction. 4Gb is not much, and we should have 
>>>>>>> more of WebKit work with the right data types, not less.”.
>>>>>>> I did not initially realize that this change was controversial, but now 
>>>>>>> that it seems it is, I thought I would raise the question on webkit-dev 
>>>>>>> to see what people think about this.
>>>>>>> 
>>>>>>> Kr,
>>>>>>> --
>>>>>>> Chris Dumez - Apple Inc.
>>>>>>> Cupertino, CA
>>>>>>> 
>>>>>>> 
>>>>>>> [1] http://trac.webkit.org/changeset/176275 
>>>>>>> <http://trac.webkit.org/changeset/176275>
>>>>>>> [2] http://trac.webkit.org/changeset/176293 
>>>>>>> <http://trac.webkit.org/changeset/176293>
>>>>>>> [3] http://trac.webkit.org/changeset/148891 
>>>>>>> <http://trac.webkit.org/changeset/148891>
>>>>>>> _______________________________________________
>>>>>>> webkit-dev mailing list
>>>>>>> [email protected] <mailto:[email protected]>
>>>>>>> https://lists.webkit.org/mailman/listinfo/webkit-dev 
>>>>>>> <https://lists.webkit.org/mailman/listinfo/webkit-dev>
>>>>>> 
>>>>>> _______________________________________________
>>>>>> webkit-dev mailing list
>>>>>> [email protected] <mailto:[email protected]>
>>>>>> https://lists.webkit.org/mailman/listinfo/webkit-dev 
>>>>>> <https://lists.webkit.org/mailman/listinfo/webkit-dev>
>>>>> 
>>>>> 
>>>> 
>>> 
>>> 
>>> _______________________________________________
>>> webkit-dev mailing list
>>> [email protected] <mailto:[email protected]>
>>> https://lists.webkit.org/mailman/listinfo/webkit-dev 
>>> <https://lists.webkit.org/mailman/listinfo/webkit-dev>
>> 
>> _______________________________________________
>> webkit-dev mailing list
>> [email protected] <mailto:[email protected]>
>> https://lists.webkit.org/mailman/listinfo/webkit-dev
>

_______________________________________________
webkit-dev mailing list
[email protected]
https://lists.webkit.org/mailman/listinfo/webkit-dev

Re: [webkit-dev] size_t vs unsigned in WTF::Vector API ?

Reply via email to