Title: [182803] trunk/Source/WebKit2
Revision: 182803
Author: an...@apple.com
Date: 2015-04-14 11:40:27 -0700 (Tue, 14 Apr 2015)

Log Message

Network Cache: Deduplicate body data
https://bugs.webkit.org/show_bug.cgi?id=143652

Reviewed by Darin Adler.

It is common for cache entries to have identical body data. This happens when the same resource is loaded from
different URLs (https vs. http, trailing slash vs. none, etc.). It also happens when the same URL is
referenced from different cache partitions.

We can improve disk space efficiency and use less memory by sharing identical body data between cache entries.

This patch splits the body data out of the record file. The new record file contains metadata and response
headers only. Body data is stored through the new BlobStorage interface. Bodies are deduplicated by computing
a SHA-1 hash over the data and looking for an existing blob with the same hash. If one is found, the existing
blob is reused by creating a hard link to it.
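The mechanism can be sketched with plain POSIX calls. This is a simplified illustration, not the patch's code: `contentHash` below is a toy FNV-1a stand-in for the real SHA-1 digest, and error handling is omitted.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>
#include <sys/stat.h>
#include <unistd.h>

// Toy stand-in for the cache's SHA1::hexDigest content hash.
static std::string contentHash(const std::string& data)
{
    uint64_t h = 1469598103934665603ull; // FNV-1a offset basis
    for (unsigned char c : data)
        h = (h ^ c) * 1099511628211ull;
    char buf[17];
    snprintf(buf, sizeof(buf), "%016llx", static_cast<unsigned long long>(h));
    return buf;
}

// Store data for a record's -body file, deduplicating via hard links:
// the blob is written once under its content hash, and every record
// body is just another hard link to the same inode.
static void addBlob(const std::string& blobDir, const std::string& bodyPath, const std::string& data)
{
    std::string blobPath = blobDir + "/" + contentHash(data);
    unlink(bodyPath.c_str()); // replace any previous body link
    if (access(blobPath.c_str(), F_OK) != 0) {
        // First client with these bytes: create the shared blob.
        FILE* f = fopen(blobPath.c_str(), "wb");
        fwrite(data.data(), 1, data.size(), f);
        fclose(f);
    }
    link(blobPath.c_str(), bodyPath.c_str());
}
```

Storing the same bytes under two record paths leaves a single blob on disk whose link count reflects both clients.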

The new disk structure looks like this:

WebKitCache/
    Version 3/
        Blobs/
            0A3C9A970ADA27FAE9BD7BC630BAD0B929C293C0
            0A6B8060BA77DF92C82A2FD7AF58F79524D8F34C
            ...
        Records/
            apple.com/
                0B8645B04E7EC78C178B7460052601C2
                0B8645B04E7EC78C178B7460052601C2-body
                0CB1A3638D1C5A09C5E3283A74FA040B
                0CB1A3638D1C5A09C5E3283A74FA040B-body
                ...

Each record file has an associated -body which is a hard link to a file in the Blobs directory.

The patch increases effective capacity by 10-20% for a typical cache. It also saves memory, especially when identical
resources are used in multiple tabs.

Currently all non-empty resources are stored as shared blobs. In the future, small resources should be integrated into
the record files, with blobs used only for larger bodies.

* NetworkProcess/cache/NetworkCache.cpp:
(WebKit::NetworkCache::Cache::store):
(WebKit::NetworkCache::Cache::update):

    Adopt the new storage interface.

(WebKit::NetworkCache::Cache::dumpContentsToFile):
* NetworkProcess/cache/NetworkCacheBlobStorage.cpp: Added.
(WebKit::NetworkCache::BlobStorage::BlobStorage):
(WebKit::NetworkCache::BlobStorage::synchronize):

    Compute size and delete unused files from the Blobs directory (link count == 1).

(WebKit::NetworkCache::BlobStorage::blobPath):
(WebKit::NetworkCache::BlobStorage::add):
(WebKit::NetworkCache::BlobStorage::get):

    Interface for storing and retrieving data blobs. Blobs are deduplicated on add.

(WebKit::NetworkCache::BlobStorage::remove):

    Removes the link but doesn't remove the blob, even if there are no other clients. That happens on the next synchronize().

(WebKit::NetworkCache::BlobStorage::shareCount):

    Checks the link count to get the number of clients.

* NetworkProcess/cache/NetworkCacheBlobStorage.h: Added.
(WebKit::NetworkCache::BlobStorage::approximateSize):
* NetworkProcess/cache/NetworkCacheCoders.cpp:
(WebKit::NetworkCache::Coder<SHA1::Digest>::encode):
(WebKit::NetworkCache::Coder<SHA1::Digest>::decode):
* NetworkProcess/cache/NetworkCacheCoders.h:
* NetworkProcess/cache/NetworkCacheData.h:
(WebKit::NetworkCache::Data::isEmpty):
* NetworkProcess/cache/NetworkCacheDataCocoa.mm:
(WebKit::NetworkCache::Data::empty):
(WebKit::NetworkCache::Data::fromMap):
(WebKit::NetworkCache::mapFile):
(WebKit::NetworkCache::computeSHA1):
(WebKit::NetworkCache::bytesEqual):

    Add some helpers.

* NetworkProcess/cache/NetworkCacheEntry.cpp:
(WebKit::NetworkCache::Entry::asJSON):
* NetworkProcess/cache/NetworkCacheIOChannelCocoa.mm:
(WebKit::NetworkCache::IOChannel::IOChannel):
* NetworkProcess/cache/NetworkCacheStorage.cpp:
(WebKit::NetworkCache::makeRecordDirectoryPath):
(WebKit::NetworkCache::makeBlobDirectoryPath):
(WebKit::NetworkCache::Storage::Storage):
(WebKit::NetworkCache::Storage::approximateSize):
(WebKit::NetworkCache::Storage::synchronize):
(WebKit::NetworkCache::partitionPathForKey):
(WebKit::NetworkCache::recordPathForKey):
(WebKit::NetworkCache::bodyPath):
(WebKit::NetworkCache::decodeRecordMetaData):
(WebKit::NetworkCache::decodeRecordHeader):
(WebKit::NetworkCache::createRecord):
(WebKit::NetworkCache::encodeRecordMetaData):
(WebKit::NetworkCache::encodeRecordHeader):
(WebKit::NetworkCache::Storage::remove):
(WebKit::NetworkCache::Storage::updateFileModificationTime):
(WebKit::NetworkCache::Storage::dispatchReadOperation):

    Read both the blob and the record entry.

(WebKit::NetworkCache::Storage::finishReadOperation):

    Factor to a function.

(WebKit::NetworkCache::Storage::store):
(WebKit::NetworkCache::Storage::traverse):
(WebKit::NetworkCache::Storage::dispatchPendingWriteOperations):
(WebKit::NetworkCache::Storage::dispatchWriteOperation):

    We don't need separate full-write and header-write paths anymore. Everything is treated
    as a full write, and deduplication keeps us from writing the body again.

    This simplifies the code and data structures.

(WebKit::NetworkCache::Storage::finishWriteOperation):

    Factor to a function.

(WebKit::NetworkCache::Storage::clear):
(WebKit::NetworkCache::deletionProbability):

    Take the share count into account when computing the deletion probability.
    It is less useful to delete a record that shares its body with others, as the body data won't actually be freed.

(WebKit::NetworkCache::Storage::shrinkIfNeeded):
(WebKit::NetworkCache::Storage::shrink):
(WebKit::NetworkCache::Storage::deleteOldVersions):
(WebKit::NetworkCache::directoryPathForKey): Deleted.
(WebKit::NetworkCache::filePathForKey): Deleted.
(WebKit::NetworkCache::openFileForKey): Deleted.
(WebKit::NetworkCache::decodeRecord): Deleted.
(WebKit::NetworkCache::Storage::update): Deleted.

    No need for a separate update interface anymore. A regular store() avoids the unnecessary body write.

(WebKit::NetworkCache::Storage::dispatchFullWriteOperation): Deleted.
(WebKit::NetworkCache::Storage::dispatchHeaderWriteOperation): Deleted.
* NetworkProcess/cache/NetworkCacheStorage.h:
* WebKit2.xcodeproj/project.pbxproj:
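The share-count-aware eviction note above can be illustrated with a hypothetical scaling. The factor below is illustrative only; the shipped deletionProbability() combines an adjustment like this with file-time-based record worth.

```cpp
// Hypothetical illustration: evicting a record whose body is shared frees
// only the small record file, not the blob, so scale the deletion
// probability down by the number of sharers.
static double adjustedDeletionProbability(double baseProbability, unsigned bodyShareCount)
{
    if (bodyShareCount > 1)
        return baseProbability / bodyShareCount;
    return baseProbability;
}
```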

Diff

Modified: trunk/Source/WebKit2/ChangeLog (182802 => 182803)


--- trunk/Source/WebKit2/ChangeLog	2015-04-14 17:57:36 UTC (rev 182802)
+++ trunk/Source/WebKit2/ChangeLog	2015-04-14 18:40:27 UTC (rev 182803)
@@ -1,3 +1,153 @@
+2015-04-14  Antti Koivisto  <an...@apple.com>
+
+        Network Cache: Deduplicate body data
+        https://bugs.webkit.org/show_bug.cgi?id=143652
+
+        Reviewed by Darin Adler.
+
+        It is common to have cache entries with identical body data. This happens when the same resource is loaded from
+        a different URL (https vs http, slash vs no-slash at end, etc.). It also happens when the same URL is
+        referenced from different cache partitions.
+
+        We can improve disk space efficiency and use less memory by sharing identical body data between cache entries.
+
+        This patch splits the body data out from the record file. The new record file contains meta data and response
+        headers only. Body data is stored using the new BlobStorage interface. Files are deduplicated by computing
+        SHA1 hash over the data and looking for an existing blob with the same hash. If found the existing entry
+        is reused by creating a hard link to it.
+
+        The new disk structure looks like this:
+
+        WebKitCache/
+            Version 3/
+                Blobs/
+                    0A3C9A970ADA27FAE9BD7BC630BAD0B929C293C0
+                    0A6B8060BA77DF92C82A2FD7AF58F79524D8F34C
+                    ...
+                Records/
+                    apple.com/
+                        0B8645B04E7EC78C178B7460052601C2
+                        0B8645B04E7EC78C178B7460052601C2-body
+                        0CB1A3638D1C5A09C5E3283A74FA040B
+                        0CB1A3638D1C5A09C5E3283A74FA040B-body
+                        ...
+
+        Each record file has an associated -body which is a hard link to a file in the Blobs directory.
+
+        The patch increases effective capacity by 10-20% with a typical cache. It also saves memory especially when identical
+        resources are used in multiple tabs.
+
+        Currently all >0 sized resources are stored as shared blobs. In future small resources should be integrated into record
+        files and blobs used for larger files only.
+
+        * NetworkProcess/cache/NetworkCache.cpp:
+        (WebKit::NetworkCache::Cache::store):
+        (WebKit::NetworkCache::Cache::update):
+
+            Adopt the new storage interface.
+
+        (WebKit::NetworkCache::Cache::dumpContentsToFile):
+        * NetworkProcess/cache/NetworkCacheBlobStorage.cpp: Added.
+        (WebKit::NetworkCache::BlobStorage::BlobStorage):
+        (WebKit::NetworkCache::BlobStorage::synchronize):
+
+            Compute size and delete unused files from the Blobs directory (link count == 1).
+
+        (WebKit::NetworkCache::BlobStorage::blobPath):
+        (WebKit::NetworkCache::BlobStorage::add):
+        (WebKit::NetworkCache::BlobStorage::get):
+
+            Interface for storing and retrieving data blobs. Blobs are deduplicated on add.
+
+        (WebKit::NetworkCache::BlobStorage::remove):
+
+            Removes the link but doesn't remove the blob even if there are no other clients. That happens on next synchronize().
+
+        (WebKit::NetworkCache::BlobStorage::shareCount):
+
+            Checks the link count to get the number of clients.
+
+        * NetworkProcess/cache/NetworkCacheBlobStorage.h: Added.
+        (WebKit::NetworkCache::BlobStorage::approximateSize):
+        * NetworkProcess/cache/NetworkCacheCoders.cpp:
+        (WebKit::NetworkCache::Coder<SHA1::Digest>::encode):
+        (WebKit::NetworkCache::Coder<SHA1::Digest>::decode):
+        * NetworkProcess/cache/NetworkCacheCoders.h:
+        * NetworkProcess/cache/NetworkCacheData.h:
+        (WebKit::NetworkCache::Data::isEmpty):
+        * NetworkProcess/cache/NetworkCacheDataCocoa.mm:
+        (WebKit::NetworkCache::Data::empty):
+        (WebKit::NetworkCache::Data::fromMap):
+        (WebKit::NetworkCache::mapFile):
+        (WebKit::NetworkCache::computeSHA1):
+        (WebKit::NetworkCache::bytesEqual):
+
+            Add some helpers.
+
+        * NetworkProcess/cache/NetworkCacheEntry.cpp:
+        (WebKit::NetworkCache::Entry::asJSON):
+        * NetworkProcess/cache/NetworkCacheIOChannelCocoa.mm:
+        (WebKit::NetworkCache::IOChannel::IOChannel):
+        * NetworkProcess/cache/NetworkCacheStorage.cpp:
+        (WebKit::NetworkCache::makeRecordDirectoryPath):
+        (WebKit::NetworkCache::makeBlobDirectoryPath):
+        (WebKit::NetworkCache::Storage::Storage):
+        (WebKit::NetworkCache::Storage::approximateSize):
+        (WebKit::NetworkCache::Storage::synchronize):
+        (WebKit::NetworkCache::partitionPathForKey):
+        (WebKit::NetworkCache::recordPathForKey):
+        (WebKit::NetworkCache::bodyPath):
+        (WebKit::NetworkCache::decodeRecordMetaData):
+        (WebKit::NetworkCache::decodeRecordHeader):
+        (WebKit::NetworkCache::createRecord):
+        (WebKit::NetworkCache::encodeRecordMetaData):
+        (WebKit::NetworkCache::encodeRecordHeader):
+        (WebKit::NetworkCache::Storage::remove):
+        (WebKit::NetworkCache::Storage::updateFileModificationTime):
+        (WebKit::NetworkCache::Storage::dispatchReadOperation):
+
+            Read both the blob and the record entry.
+
+        (WebKit::NetworkCache::Storage::finishReadOperation):
+
+            Factor to a function.
+
+        (WebKit::NetworkCache::Storage::store):
+        (WebKit::NetworkCache::Storage::traverse):
+        (WebKit::NetworkCache::Storage::dispatchPendingWriteOperations):
+        (WebKit::NetworkCache::Storage::dispatchWriteOperation):
+
+            We don't need separate full write and header write paths anymore. Everything is treated
+            as a full write and deduplication stops us writing the body again.
+
+            This simplifies the code and data structures.
+
+        (WebKit::NetworkCache::Storage::finishWriteOperation):
+
+            Factor to a function.
+
+        (WebKit::NetworkCache::Storage::clear):
+        (WebKit::NetworkCache::deletionProbability):
+
+            Take the sharing count into account when computing deletion probability.
+            It is less useful to delete a record that shares its body with others as data won't get deleted.
+
+        (WebKit::NetworkCache::Storage::shrinkIfNeeded):
+        (WebKit::NetworkCache::Storage::shrink):
+        (WebKit::NetworkCache::Storage::deleteOldVersions):
+        (WebKit::NetworkCache::directoryPathForKey): Deleted.
+        (WebKit::NetworkCache::filePathForKey): Deleted.
+        (WebKit::NetworkCache::openFileForKey): Deleted.
+        (WebKit::NetworkCache::decodeRecord): Deleted.
+        (WebKit::NetworkCache::Storage::update): Deleted.
+
+            No need for separate update interface anymore. Regular store() avoids unnecessary body write.
+
+        (WebKit::NetworkCache::Storage::dispatchFullWriteOperation): Deleted.
+        (WebKit::NetworkCache::Storage::dispatchHeaderWriteOperation): Deleted.
+        * NetworkProcess/cache/NetworkCacheStorage.h:
+        * WebKit2.xcodeproj/project.pbxproj:
+
 2015-04-14  Chris Dumez  <cdu...@apple.com>
 
         REGRESSION(r182603): [GTK] More than 500 crashes on the layout tests with the debug build.

Modified: trunk/Source/WebKit2/NetworkProcess/cache/NetworkCache.cpp (182802 => 182803)


--- trunk/Source/WebKit2/NetworkProcess/cache/NetworkCache.cpp	2015-04-14 17:57:36 UTC (rev 182802)
+++ trunk/Source/WebKit2/NetworkProcess/cache/NetworkCache.cpp	2015-04-14 18:40:27 UTC (rev 182803)
@@ -370,7 +370,7 @@
 
     auto record = cacheEntry.encodeAsStorageRecord();
 
-    m_storage->store(record, [completionHandler](bool success, const Data& bodyData) {
+    m_storage->store(record, [completionHandler](const Data& bodyData) {
         MappedBody mappedBody;
 #if ENABLE(SHAREABLE_RESOURCE)
         if (bodyData.isMap()) {
@@ -381,7 +381,7 @@
         }
 #endif
         completionHandler(mappedBody);
-        LOG(NetworkCache, "(NetworkProcess) store success=%d", success);
+        LOG(NetworkCache, "(NetworkProcess) stored");
     });
 }
 
@@ -396,9 +396,7 @@
 
     auto updateRecord = updateEntry.encodeAsStorageRecord();
 
-    m_storage->update(updateRecord, existingEntry.sourceStorageRecord(), [](bool success, const Data&) {
-        LOG(NetworkCache, "(NetworkProcess) updated, success=%d", success);
-    });
+    m_storage->store(updateRecord, { });
 }
 
 void Cache::remove(const Key& key)
@@ -447,7 +445,8 @@
         size_t bodySize { 0 };
     };
     Totals totals;
-    m_storage->traverse(Storage::TraverseFlag::ComputeWorth, [fd, totals](const Storage::Record* record, const Storage::RecordInfo& info) mutable {
+    auto flags = Storage::TraverseFlag::ComputeWorth | Storage::TraverseFlag::ShareCount;
+    m_storage->traverse(flags, [fd, totals](const Storage::Record* record, const Storage::RecordInfo& info) mutable {
         if (!record) {
             StringBuilder epilogue;
             epilogue.appendLiteral("{}\n],\n");

Added: trunk/Source/WebKit2/NetworkProcess/cache/NetworkCacheBlobStorage.cpp (0 => 182803)


--- trunk/Source/WebKit2/NetworkProcess/cache/NetworkCacheBlobStorage.cpp	                        (rev 0)
+++ trunk/Source/WebKit2/NetworkProcess/cache/NetworkCacheBlobStorage.cpp	2015-04-14 18:40:27 UTC (rev 182803)
@@ -0,0 +1,171 @@
+/*
+ * Copyright (C) 2015 Apple Inc. All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY APPLE INC. AND ITS CONTRIBUTORS ``AS IS''
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
+ * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+ * PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL APPLE INC. OR ITS CONTRIBUTORS
+ * BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
+ * THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include "config.h"
+#include "NetworkCacheBlobStorage.h"
+
+#if ENABLE(NETWORK_CACHE)
+
+#include "Logging.h"
+#include "NetworkCacheFileSystemPosix.h"
+#include <WebCore/FileSystem.h>
+#include <sys/mman.h>
+#include <wtf/RunLoop.h>
+#include <wtf/SHA1.h>
+#include <wtf/text/StringBuilder.h>
+
+namespace WebKit {
+namespace NetworkCache {
+
+BlobStorage::BlobStorage(const String& blobDirectoryPath)
+    : m_blobDirectoryPath(blobDirectoryPath)
+{
+}
+
+String BlobStorage::blobDirectoryPath() const
+{
+    return m_blobDirectoryPath.isolatedCopy();
+}
+
+void BlobStorage::synchronize()
+{
+    ASSERT(!RunLoop::isMain());
+
+    WebCore::makeAllDirectories(blobDirectoryPath());
+
+    m_approximateSize = 0;
+    auto blobDirectory = blobDirectoryPath();
+    traverseDirectory(blobDirectory, DT_REG, [this, &blobDirectory](const String& name) {
+        auto path = WebCore::pathByAppendingComponent(blobDirectory, name);
+        auto filePath = WebCore::fileSystemRepresentation(path);
+        struct stat stat;
+        ::stat(filePath.data(), &stat);
+        // No clients left for this blob.
+        if (stat.st_nlink == 1)
+            unlink(filePath.data());
+        else
+            m_approximateSize += stat.st_size;
+    });
+
+    LOG(NetworkCacheStorage, "(NetworkProcess) blob synchronization completed approximateSize=%zu", approximateSize());
+}
+
+String BlobStorage::blobPathForHash(const SHA1::Digest& hash) const
+{
+    auto hashAsString = SHA1::hexDigest(hash);
+    return WebCore::pathByAppendingComponent(blobDirectoryPath(), String::fromUTF8(hashAsString));
+}
+
+BlobStorage::Blob BlobStorage::add(const String& path, const Data& data)
+{
+    ASSERT(!RunLoop::isMain());
+
+    auto hash = computeSHA1(data);
+    if (data.isEmpty())
+        return { data, hash };
+
+    auto blobPath = WebCore::fileSystemRepresentation(blobPathForHash(hash));
+    auto linkPath = WebCore::fileSystemRepresentation(path);
+    unlink(linkPath.data());
+
+    bool blobExists = access(blobPath.data(), F_OK) != -1;
+    if (blobExists) {
+        auto existingData = mapFile(blobPath.data());
+        if (bytesEqual(existingData, data)) {
+            link(blobPath.data(), linkPath.data());
+            return { existingData, hash };
+        }
+        unlink(blobPath.data());
+    }
+
+    int fd = open(blobPath.data(), O_CREAT | O_EXCL | O_RDWR , S_IRUSR | S_IWUSR);
+    if (fd < 0)
+        return { };
+
+    size_t size = data.size();
+    if (ftruncate(fd, size) < 0) {
+        close(fd);
+        return { };
+    }
+
+    void* map = mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+    close(fd);
+
+    if (map == MAP_FAILED)
+        return { };
+
+    uint8_t* mapData = static_cast<uint8_t*>(map);
+    data.apply([&mapData](const uint8_t* bytes, size_t bytesSize) {
+        memcpy(mapData, bytes, bytesSize);
+        mapData += bytesSize;
+        return true;
+    });
+
+    // Drop the write permission.
+    mprotect(map, size, PROT_READ);
+
+    auto mappedData = Data::adoptMap(map, size);
+
+    link(blobPath.data(), linkPath.data());
+
+    m_approximateSize += size;
+
+    return { mappedData, hash };
+}
+
+BlobStorage::Blob BlobStorage::get(const String& path)
+{
+    ASSERT(!RunLoop::isMain());
+
+    auto linkPath = WebCore::fileSystemRepresentation(path);
+    auto data = mapFile(linkPath.data());
+
+    return { data, computeSHA1(data) };
+}
+
+void BlobStorage::remove(const String& path)
+{
+    ASSERT(!RunLoop::isMain());
+
+    auto linkPath = WebCore::fileSystemRepresentation(path);
+    unlink(linkPath.data());
+}
+
+unsigned BlobStorage::shareCount(const String& path)
+{
+    ASSERT(!RunLoop::isMain());
+
+    auto linkPath = WebCore::fileSystemRepresentation(path);
+    struct stat stat;
+    if (::stat(linkPath.data(), &stat) < 0)
+        return 0;
+    // Link count is 2 in the single client case (the blob file and a link).
+    return stat.st_nlink - 1;
+}
+
+}
+}
+
+#endif

Added: trunk/Source/WebKit2/NetworkProcess/cache/NetworkCacheBlobStorage.h (0 => 182803)


--- trunk/Source/WebKit2/NetworkProcess/cache/NetworkCacheBlobStorage.h	                        (rev 0)
+++ trunk/Source/WebKit2/NetworkProcess/cache/NetworkCacheBlobStorage.h	2015-04-14 18:40:27 UTC (rev 182803)
@@ -0,0 +1,74 @@
+/*
+ * Copyright (C) 2015 Apple Inc. All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY APPLE INC. AND ITS CONTRIBUTORS ``AS IS''
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
+ * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+ * PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL APPLE INC. OR ITS CONTRIBUTORS
+ * BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
+ * THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef NetworkCacheBlobStorage_h
+#define NetworkCacheBlobStorage_h
+
+#if ENABLE(NETWORK_CACHE)
+
+#include "NetworkCacheData.h"
+#include "NetworkCacheKey.h"
+#include <wtf/SHA1.h>
+
+namespace WebKit {
+namespace NetworkCache {
+
+// BlobStorage deduplicates the data using SHA1 hash computed over the blob bytes.
+class BlobStorage {
+    WTF_MAKE_NONCOPYABLE(BlobStorage);
+public:
+    BlobStorage(const String& blobDirectoryPath);
+
+    struct Blob {
+        Data data;
+        SHA1::Digest hash;
+    };
+    // These are all synchronous and should not be used from the main thread.
+    Blob add(const String& path, const Data&);
+    Blob get(const String& path);
+
+    // Blob won't be removed until synchronization.
+    void remove(const String& path);
+
+    unsigned shareCount(const String& path);
+
+    size_t approximateSize() const { return m_approximateSize; }
+
+    void synchronize();
+
+private:
+    String blobDirectoryPath() const;
+    String blobPathForHash(const SHA1::Digest&) const;
+
+    const String m_blobDirectoryPath;
+
+    std::atomic<size_t> m_approximateSize { 0 };
+};
+
+}
+}
+
+#endif
+#endif

Modified: trunk/Source/WebKit2/NetworkProcess/cache/NetworkCacheCoders.cpp (182802 => 182803)


--- trunk/Source/WebKit2/NetworkProcess/cache/NetworkCacheCoders.cpp	2015-04-14 17:57:36 UTC (rev 182802)
+++ trunk/Source/WebKit2/NetworkProcess/cache/NetworkCacheCoders.cpp	2015-04-14 18:40:27 UTC (rev 182803)
@@ -184,7 +184,17 @@
     return decoder.decodeFixedLengthData(digest.data(), sizeof(digest));
 }
 
+void Coder<SHA1::Digest>::encode(Encoder& encoder, const SHA1::Digest& digest)
+{
+    encoder.encodeFixedLengthData(digest.data(), sizeof(digest));
 }
+
+bool Coder<SHA1::Digest>::decode(Decoder& decoder, SHA1::Digest& digest)
+{
+    return decoder.decodeFixedLengthData(digest.data(), sizeof(digest));
 }
 
+}
+}
+
 #endif

Modified: trunk/Source/WebKit2/NetworkProcess/cache/NetworkCacheCoders.h (182802 => 182803)


--- trunk/Source/WebKit2/NetworkProcess/cache/NetworkCacheCoders.h	2015-04-14 17:57:36 UTC (rev 182802)
+++ trunk/Source/WebKit2/NetworkProcess/cache/NetworkCacheCoders.h	2015-04-14 18:40:27 UTC (rev 182803)
@@ -36,6 +36,7 @@
 #include <wtf/HashMap.h>
 #include <wtf/HashSet.h>
 #include <wtf/MD5.h>
+#include <wtf/SHA1.h>
 #include <wtf/Vector.h>
 
 namespace WebKit {
@@ -262,6 +263,11 @@
     static bool decode(Decoder&, MD5::Digest&);
 };
 
+template<> struct Coder<SHA1::Digest> {
+    static void encode(Encoder&, const SHA1::Digest&);
+    static bool decode(Decoder&, SHA1::Digest&);
+};
+
 }
 }
 #endif

Modified: trunk/Source/WebKit2/NetworkProcess/cache/NetworkCacheData.h (182802 => 182803)


--- trunk/Source/WebKit2/NetworkProcess/cache/NetworkCacheData.h	2015-04-14 17:57:36 UTC (rev 182802)
+++ trunk/Source/WebKit2/NetworkProcess/cache/NetworkCacheData.h	2015-04-14 18:40:27 UTC (rev 182803)
@@ -30,6 +30,7 @@
 
 #include <functional>
 #include <wtf/FunctionDispatcher.h>
+#include <wtf/SHA1.h>
 #include <wtf/ThreadSafeRefCounted.h>
 #include <wtf/text/WTFString.h>
 
@@ -98,11 +99,15 @@
     Data() { }
     Data(const uint8_t*, size_t);
 
+    static Data empty();
+    static Data adoptMap(void* map, size_t);
+
     enum class Backing { Buffer, Map };
 #if PLATFORM(COCOA)
     Data(DispatchPtr<dispatch_data_t>, Backing = Backing::Buffer);
 #endif
     bool isNull() const;
+    bool isEmpty() const { return !m_size; }
 
     const uint8_t* data() const;
     size_t size() const { return m_size; }
@@ -125,7 +130,10 @@
 };
 
 Data concatenate(const Data&, const Data&);
+bool bytesEqual(const Data&, const Data&);
 Data mapFile(int fd, size_t offset, size_t);
+Data mapFile(const char* path);
+SHA1::Digest computeSHA1(const Data&);
 
 }
 }

Modified: trunk/Source/WebKit2/NetworkProcess/cache/NetworkCacheDataCocoa.mm (182802 => 182803)


--- trunk/Source/WebKit2/NetworkProcess/cache/NetworkCacheDataCocoa.mm	2015-04-14 17:57:36 UTC (rev 182802)
+++ trunk/Source/WebKit2/NetworkProcess/cache/NetworkCacheDataCocoa.mm	2015-04-14 18:40:27 UTC (rev 182803)
@@ -30,6 +30,7 @@
 
 #include <dispatch/dispatch.h>
 #include <sys/mman.h>
+#include <sys/stat.h>
 
 namespace WebKit {
 namespace NetworkCache {
@@ -47,6 +48,11 @@
 {
 }
 
+Data Data::empty()
+{
+    return { DispatchPtr<dispatch_data_t>(dispatch_data_empty) };
+}
+
 const uint8_t* Data::data() const
 {
     if (!m_data && m_dispatchData) {
@@ -87,18 +93,65 @@
     return { adoptDispatch(dispatch_data_create_concat(a.dispatchData(), b.dispatchData())) };
 }
 
-Data mapFile(int fd, size_t offset, size_t size)
+Data Data::adoptMap(void* map, size_t size)
 {
-    void* map = mmap(nullptr, size, PROT_READ, MAP_PRIVATE, fd, offset);
-    if (map == MAP_FAILED)
-        return { };
+    ASSERT(map && map != MAP_FAILED);
     auto bodyMap = adoptDispatch(dispatch_data_create(map, size, dispatch_get_main_queue(), [map, size] {
         munmap(map, size);
     }));
     return { bodyMap, Data::Backing::Map };
 }
 
+Data mapFile(int fd, size_t offset, size_t size)
+{
+    if (!size)
+        return Data::empty();
+
+    void* map = mmap(nullptr, size, PROT_READ, MAP_PRIVATE, fd, offset);
+    if (map == MAP_FAILED)
+        return { };
+    return Data::adoptMap(map, size);
 }
+
+Data mapFile(const char* path)
+{
+    int fd = open(path, O_RDONLY, 0);
+    if (fd < 0)
+        return { };
+    struct stat stat;
+    if (fstat(fd, &stat) < 0) {
+        close(fd);
+        return { };
+    }
+    size_t size = stat.st_size;
+    auto data = mapFile(fd, 0, size);
+    close(fd);
+
+    return data;
 }
 
+SHA1::Digest computeSHA1(const Data& data)
+{
+    SHA1 sha1;
+    data.apply([&sha1](const uint8_t* data, size_t size) {
+        sha1.addBytes(data, size);
+        return true;
+    });
+    SHA1::Digest digest;
+    sha1.computeHash(digest);
+    return digest;
+}
+
+bool bytesEqual(const Data& a, const Data& b)
+{
+    if (a.isNull() || b.isNull())
+        return false;
+    if (a.size() != b.size())
+        return false;
+    return !memcmp(a.data(), b.data(), a.size());
+}
+
+}
+}
+
 #endif

Modified: trunk/Source/WebKit2/NetworkProcess/cache/NetworkCacheEntry.cpp (182802 => 182803)


--- trunk/Source/WebKit2/NetworkProcess/cache/NetworkCacheEntry.cpp	2015-04-14 17:57:36 UTC (rev 182802)
+++ trunk/Source/WebKit2/NetworkProcess/cache/NetworkCacheEntry.cpp	2015-04-14 18:40:27 UTC (rev 182803)
@@ -165,6 +165,12 @@
     json.appendLiteral("\"URL\": ");
     JSC::appendQuotedJSONStringToBuilder(json, m_response.url().string());
     json.appendLiteral(",\n");
+    json.appendLiteral("\"bodyHash\": ");
+    JSC::appendQuotedJSONStringToBuilder(json, info.bodyHash);
+    json.appendLiteral(",\n");
+    json.appendLiteral("\"bodyShareCount\": ");
+    json.appendNumber(info.bodyShareCount);
+    json.appendLiteral(",\n");
     json.appendLiteral("\"headers\": {\n");
     bool firstHeader = true;
     for (auto& header : m_response.httpHeaderFields()) {

Modified: trunk/Source/WebKit2/NetworkProcess/cache/NetworkCacheStorage.cpp (182802 => 182803)


--- trunk/Source/WebKit2/NetworkProcess/cache/NetworkCacheStorage.cpp	2015-04-14 17:57:36 UTC (rev 182802)
+++ trunk/Source/WebKit2/NetworkProcess/cache/NetworkCacheStorage.cpp	2015-04-14 18:40:27 UTC (rev 182803)
@@ -43,6 +43,9 @@
 
 static const char networkCacheSubdirectory[] = "WebKitCache";
 static const char versionDirectoryPrefix[] = "Version ";
+static const char recordsDirectoryName[] = "Records";
+static const char blobsDirectoryName[] = "Blobs";
+static const char bodyPostfix[] = "-body";
 
 static double computeRecordWorth(FileTimes);
 
@@ -62,17 +65,33 @@
     return WebCore::pathByAppendingComponent(baseDirectoryPath, versionSubdirectory);
 }
 
+static String makeRecordDirectoryPath(const String& baseDirectoryPath)
+{
+    return WebCore::pathByAppendingComponent(makeVersionedDirectoryPath(baseDirectoryPath), recordsDirectoryName);
+}
+
+static String makeBlobDirectoryPath(const String& baseDirectoryPath)
+{
+    return WebCore::pathByAppendingComponent(makeVersionedDirectoryPath(baseDirectoryPath), blobsDirectoryName);
+}
+
 Storage::Storage(const String& baseDirectoryPath)
     : m_baseDirectoryPath(baseDirectoryPath)
-    , m_directoryPath(makeVersionedDirectoryPath(baseDirectoryPath))
+    , m_directoryPath(makeRecordDirectoryPath(baseDirectoryPath))
     , m_ioQueue(WorkQueue::create("com.apple.WebKit.Cache.Storage", WorkQueue::Type::Concurrent))
     , m_backgroundIOQueue(WorkQueue::create("com.apple.WebKit.Cache.Storage.background", WorkQueue::Type::Concurrent, WorkQueue::QOS::Background))
     , m_serialBackgroundIOQueue(WorkQueue::create("com.apple.WebKit.Cache.Storage.serialBackground", WorkQueue::Type::Serial, WorkQueue::QOS::Background))
+    , m_blobStorage(makeBlobDirectoryPath(baseDirectoryPath))
 {
     deleteOldVersions();
     synchronize();
 }
 
+size_t Storage::approximateSize() const
+{
+    return m_approximateSize + m_blobStorage.approximateSize();
+}
+
 void Storage::synchronize()
 {
     ASSERT(RunLoop::isMain());
@@ -117,7 +136,9 @@
             m_synchronizationInProgress = false;
         });
 
-        LOG(NetworkCacheStorage, "(NetworkProcess) cache synchronization completed approximateSize=%zu count=%d", size, count);
+        m_blobStorage.synchronize();
+
+        LOG(NetworkCacheStorage, "(NetworkProcess) cache synchronization completed size=%zu count=%d", size, count);
     });
 }
 
@@ -139,7 +160,7 @@
     return !m_contentsFilter || m_contentsFilter->mayContain(key.hash());
 }
 
-static String directoryPathForKey(const Key& key, const String& cachePath)
+static String partitionPathForKey(const Key& key, const String& cachePath)
 {
     ASSERT(!key.partition().isEmpty());
     return WebCore::pathByAppendingComponent(cachePath, key.partition());
@@ -150,20 +171,21 @@
     return key.hashAsString();
 }
 
-static String filePathForKey(const Key& key, const String& cachePath)
+static String recordPathForKey(const Key& key, const String& cachePath)
 {
-    return WebCore::pathByAppendingComponent(directoryPathForKey(key, cachePath), fileNameForKey(key));
+    return WebCore::pathByAppendingComponent(partitionPathForKey(key, cachePath), fileNameForKey(key));
 }
 
-static Ref<IOChannel> openFileForKey(const Key& key, IOChannel::Type type, const String& cachePath)
+static String bodyPathForRecordPath(const String& recordPath)
 {
-    auto directoryPath = directoryPathForKey(key, cachePath);
-    auto filePath = WebCore::pathByAppendingComponent(directoryPath, fileNameForKey(key));
-    if (type == IOChannel::Type::Create)
-        WebCore::makeAllDirectories(directoryPath);
-    return IOChannel::open(filePath, type);
+    return recordPath + bodyPostfix;
 }
 
+static String bodyPathForKey(const Key& key, const String& cachePath)
+{
+    return bodyPathForRecordPath(recordPathForKey(key, cachePath));
+}
+
 static unsigned hashData(const Data& data)
 {
     StringHasher hasher;
@@ -188,8 +210,7 @@
     unsigned headerChecksum;
     uint64_t headerOffset;
     uint64_t headerSize;
-    unsigned bodyChecksum;
-    uint64_t bodyOffset;
+    SHA1::Digest bodyHash;
     uint64_t bodySize;
 };
 
@@ -208,14 +229,13 @@
             return false;
         if (!decoder.decode(metaData.headerSize))
             return false;
-        if (!decoder.decode(metaData.bodyChecksum))
+        if (!decoder.decode(metaData.bodyHash))
             return false;
         if (!decoder.decode(metaData.bodySize))
             return false;
         if (!decoder.verifyChecksum())
             return false;
         metaData.headerOffset = decoder.currentOffset();
-        metaData.bodyOffset = WTF::roundUpToMultipleOf(pageSize(), metaData.headerOffset + metaData.headerSize);
         success = true;
         return false;
     });
@@ -233,10 +253,6 @@
         LOG(NetworkCacheStorage, "(NetworkProcess) version mismatch");
         return false;
     }
-    if (metaData.headerOffset + metaData.headerSize > metaData.bodyOffset) {
-        LOG(NetworkCacheStorage, "(NetworkProcess) body offset mismatch");
-        return false;
-    }
 
     auto headerData = fileData.subrange(metaData.headerOffset, metaData.headerSize);
     if (metaData.headerChecksum != hashData(headerData)) {
@@ -247,11 +263,11 @@
     return true;
 }
 
-static std::unique_ptr<Storage::Record> decodeRecord(const Data& fileData, int fd, const Key& key)
+static std::unique_ptr<Storage::Record> createRecord(const Data& recordData, const BlobStorage::Blob& bodyBlob, const Key& key)
 {
     RecordMetaData metaData;
     Data headerData;
-    if (!decodeRecordHeader(fileData, metaData, headerData))
+    if (!decodeRecordHeader(recordData, metaData, headerData))
         return nullptr;
 
     if (metaData.key != key)
@@ -261,29 +277,16 @@
     auto timeStamp = std::chrono::system_clock::time_point(metaData.epochRelativeTimeStamp);
     if (timeStamp > std::chrono::system_clock::now())
         return nullptr;
+    if (metaData.bodySize != bodyBlob.data.size())
+        return nullptr;
+    if (metaData.bodyHash != bodyBlob.hash)
+        return nullptr;
 
-    Data bodyData;
-    if (metaData.bodySize) {
-        if (metaData.bodyOffset + metaData.bodySize != fileData.size())
-            return nullptr;
-
-        bodyData = mapFile(fd, metaData.bodyOffset, metaData.bodySize);
-        if (bodyData.isNull()) {
-            LOG(NetworkCacheStorage, "(NetworkProcess) map failed");
-            return nullptr;
-        }
-
-        if (metaData.bodyChecksum != hashData(bodyData)) {
-            LOG(NetworkCacheStorage, "(NetworkProcess) data checksum mismatch");
-            return nullptr;
-        }
-    }
-
     return std::make_unique<Storage::Record>(Storage::Record {
         metaData.key,
         timeStamp,
         headerData,
-        bodyData
+        bodyBlob.data
     });
 }
 
@@ -296,7 +299,7 @@
     encoder << metaData.epochRelativeTimeStamp;
     encoder << metaData.headerChecksum;
     encoder << metaData.headerSize;
-    encoder << metaData.bodyChecksum;
+    encoder << metaData.bodyHash;
     encoder << metaData.bodySize;
 
     encoder.encodeChecksum();
@@ -304,25 +307,18 @@
     return Data(encoder.buffer(), encoder.bufferSize());
 }
 
-static Data encodeRecordHeader(const Storage::Record& record)
+static Data encodeRecordHeader(const Storage::Record& record, SHA1::Digest bodyHash)
 {
     RecordMetaData metaData(record.key);
     metaData.epochRelativeTimeStamp = std::chrono::duration_cast<std::chrono::milliseconds>(record.timeStamp.time_since_epoch());
     metaData.headerChecksum = hashData(record.header);
     metaData.headerSize = record.header.size();
-    metaData.bodyChecksum = hashData(record.body);
+    metaData.bodyHash = bodyHash;
     metaData.bodySize = record.body.size();
 
     auto encodedMetaData = encodeRecordMetaData(metaData);
     auto headerData = concatenate(encodedMetaData, record.header);
-    if (!record.body.size())
-        return { headerData };
-
-    size_t dataOffset = WTF::roundUpToMultipleOf(pageSize(), headerData.size());
-    Vector<uint8_t, 4096> filler(dataOffset - headerData.size(), 0);
-    Data alignmentData(filler.data(), filler.size());
-
-    return concatenate(headerData, alignmentData);
+    return { headerData };
 }
 
 void Storage::remove(const Key& key)
@@ -333,15 +329,17 @@
     // For simplicity we also don't reduce m_approximateSize on removals.
     // The next synchronization will update everything.
 
-    StringCapture filePathCapture(filePathForKey(key, m_directoryPath));
-    serialBackgroundIOQueue().dispatch([this, filePathCapture] {
-        WebCore::deleteFile(filePathCapture.string());
+    StringCapture recordPathCapture(recordPathForKey(key, m_directoryPath));
+    StringCapture bodyPathCapture(bodyPathForKey(key, m_directoryPath));
+    serialBackgroundIOQueue().dispatch([this, recordPathCapture, bodyPathCapture] {
+        WebCore::deleteFile(recordPathCapture.string());
+        m_blobStorage.remove(bodyPathCapture.string());
     });
 }
 
-void Storage::updateFileModificationTime(IOChannel& channel)
+void Storage::updateFileModificationTime(const String& path)
 {
-    StringCapture filePathCapture(channel.path());
+    StringCapture filePathCapture(path);
     serialBackgroundIOQueue().dispatch([filePathCapture] {
         updateFileModificationTimeIfNeeded(filePathCapture.string());
     });
@@ -354,29 +352,35 @@
 
     StringCapture cachePathCapture(m_directoryPath);
     ioQueue().dispatch([this, &read, cachePathCapture] {
-        RefPtr<IOChannel> channel = openFileForKey(read.key, IOChannel::Type::Read, cachePathCapture.string());
-        channel->read(0, std::numeric_limits<size_t>::max(), [this, channel, &read](Data& fileData, int error) {
-            if (error) {
-                remove(read.key);
-                read.completionHandler(nullptr);
-            } else {
-                auto record = decodeRecord(fileData, channel->fileDescriptor(), read.key);
-                bool success = read.completionHandler(WTF::move(record));
-                if (success)
-                    updateFileModificationTime(*channel);
-                else
-                    remove(read.key);
-            }
+        auto recordPath = recordPathForKey(read.key, cachePathCapture.string());
+        auto bodyPath = bodyPathForKey(read.key, cachePathCapture.string());
+        // FIXME: Body and header retrievals can be done in parallel.
+        auto bodyBlob = m_blobStorage.get(bodyPath);
 
-            ASSERT(m_activeReadOperations.contains(&read));
-            m_activeReadOperations.remove(&read);
-            dispatchPendingReadOperations();
-
-            LOG(NetworkCacheStorage, "(NetworkProcess) read complete error=%d", error);
+        RefPtr<IOChannel> channel = IOChannel::open(recordPath, IOChannel::Type::Read);
+        channel->read(0, std::numeric_limits<size_t>::max(), [this, &read, bodyBlob](Data& fileData, int error) {
+            auto record = error ? nullptr : createRecord(fileData, bodyBlob, read.key);
+            finishReadOperation(read, WTF::move(record));
         });
     });
 }
 
+void Storage::finishReadOperation(const ReadOperation& read, std::unique_ptr<Record> record)
+{
+    ASSERT(RunLoop::isMain());
+
+    bool success = read.completionHandler(WTF::move(record));
+    if (success)
+        updateFileModificationTime(recordPathForKey(read.key, m_directoryPath));
+    else
+        remove(read.key);
+    ASSERT(m_activeReadOperations.contains(&read));
+    m_activeReadOperations.remove(&read);
+    dispatchPendingReadOperations();
+
+    LOG(NetworkCacheStorage, "(NetworkProcess) read complete success=%d", success);
+}
+
 void Storage::dispatchPendingReadOperations()
 {
     ASSERT(RunLoop::isMain());
@@ -413,6 +417,83 @@
     return false;
 }
 
+void Storage::dispatchPendingWriteOperations()
+{
+    ASSERT(RunLoop::isMain());
+
+    const int maximumActiveWriteOperationCount { 3 };
+
+    while (!m_pendingWriteOperations.isEmpty()) {
+        if (m_activeWriteOperations.size() >= maximumActiveWriteOperationCount) {
+            LOG(NetworkCacheStorage, "(NetworkProcess) limiting parallel writes");
+            return;
+        }
+        auto writeOperation = m_pendingWriteOperations.takeFirst();
+        auto& write = *writeOperation;
+        m_activeWriteOperations.add(WTF::move(writeOperation));
+
+        dispatchWriteOperation(write);
+    }
+}
+
+void Storage::dispatchWriteOperation(const WriteOperation& write)
+{
+    ASSERT(RunLoop::isMain());
+    ASSERT(m_activeWriteOperations.contains(&write));
+
+    // This was already added when starting the store, but the filter might have been wiped since.
+    addToContentsFilter(write.record.key);
+
+    StringCapture cachePathCapture(m_directoryPath);
+    backgroundIOQueue().dispatch([this, &write, cachePathCapture] {
+        auto partitionPath = partitionPathForKey(write.record.key, cachePathCapture.string());
+        auto recordPath = recordPathForKey(write.record.key, cachePathCapture.string());
+        auto bodyPath = bodyPathForKey(write.record.key, cachePathCapture.string());
+
+        WebCore::makeAllDirectories(partitionPath);
+
+        // Store the body.
+        auto blob = m_blobStorage.add(bodyPath, write.record.body);
+        if (blob.data.isNull()) {
+            RunLoop::main().dispatch([this, &write] {
+                finishWriteOperation(write);
+            });
+            return;
+        }
+
+        // Tell the client we now have a disk-backed map for this data.
+        size_t minimumMapSize = pageSize();
+        if (blob.data.size() >= minimumMapSize && blob.data.isMap() && write.mappedBodyHandler) {
+            auto& mappedBodyHandler = write.mappedBodyHandler;
+            RunLoop::main().dispatch([blob, mappedBodyHandler] {
+                mappedBodyHandler(blob.data);
+            });
+        }
+
+        // Store the header and meta data.
+        auto encodedHeader = encodeRecordHeader(write.record, blob.hash);
+        auto channel = IOChannel::open(recordPath, IOChannel::Type::Create);
+        int fd = channel->fileDescriptor();
+        size_t headerSize = encodedHeader.size();
+        channel->write(0, encodedHeader, [this, &write, headerSize, fd](int error) {
+            // On error the entry still stays in the contents filter until next synchronization.
+            m_approximateSize += headerSize;
+            finishWriteOperation(write);
+
+            LOG(NetworkCacheStorage, "(NetworkProcess) write complete error=%d", error);
+        });
+    });
+}
+
+void Storage::finishWriteOperation(const WriteOperation& write)
+{
+    ASSERT(m_activeWriteOperations.contains(&write));
+    m_activeWriteOperations.remove(&write);
+    dispatchPendingWriteOperations();
+
+    shrinkIfNeeded();
+}
+
 void Storage::retrieve(const Key& key, unsigned priority, RetrieveCompletionHandler&& completionHandler)
 {
     ASSERT(RunLoop::isMain());
@@ -438,17 +519,15 @@
     dispatchPendingReadOperations();
 }
 
-void Storage::store(const Record& record, StoreCompletionHandler&& completionHandler)
+void Storage::store(const Record& record, MappedBodyHandler&& mappedBodyHandler)
 {
     ASSERT(RunLoop::isMain());
     ASSERT(!record.key.isNull());
 
-    if (!m_capacity) {
-        completionHandler(false, { });
+    if (!m_capacity)
         return;
-    }
 
-    m_pendingWriteOperations.append(new WriteOperation { record, { }, WTF::move(completionHandler) });
+    m_pendingWriteOperations.append(new WriteOperation { record, WTF::move(mappedBodyHandler) });
 
     // Add key to the filter already here as we do lookups from the pending operations too.
     addToContentsFilter(record.key);
@@ -456,43 +535,29 @@
     dispatchPendingWriteOperations();
 }
 
-void Storage::update(const Record& updateRecord, const Record& existingRecord, StoreCompletionHandler&& completionHandler)
-{
-    ASSERT(RunLoop::isMain());
-    ASSERT(!existingRecord.key.isNull());
-    ASSERT(existingRecord.key == updateRecord.key);
-
-    if (!m_capacity) {
-        completionHandler(false, { });
-        return;
-    }
-
-    m_pendingWriteOperations.append(new WriteOperation { updateRecord, existingRecord, WTF::move(completionHandler) });
-
-    dispatchPendingWriteOperations();
-}
-
 void Storage::traverse(TraverseFlags flags, std::function<void (const Record*, const RecordInfo&)>&& traverseHandler)
 {
     StringCapture cachePathCapture(m_directoryPath);
     ioQueue().dispatch([this, flags, cachePathCapture, traverseHandler] {
         String cachePath = cachePathCapture.string();
         traverseCacheFiles(cachePath, [this, flags, &traverseHandler](const String& fileName, const String& partitionPath) {
-            auto filePath = WebCore::pathByAppendingComponent(partitionPath, fileName);
+            auto recordPath = WebCore::pathByAppendingComponent(partitionPath, fileName);
 
             RecordInfo info;
             if (flags & TraverseFlag::ComputeWorth)
-                info.worth = computeRecordWorth(fileTimes(filePath));
+                info.worth = computeRecordWorth(fileTimes(recordPath));
+            if (flags & TraverseFlag::ShareCount)
+                info.bodyShareCount = m_blobStorage.shareCount(bodyPathForRecordPath(recordPath));
 
-            auto channel = IOChannel::open(filePath, IOChannel::Type::Read);
-            const size_t headerReadSize = 16 << 10;
+            auto channel = IOChannel::open(recordPath, IOChannel::Type::Read);
             // FIXME: Traversal is slower than it should be due to lack of parallelism.
-            channel->readSync(0, headerReadSize, [this, &traverseHandler, &info](Data& fileData, int) {
+            channel->readSync(0, std::numeric_limits<size_t>::max(), [this, &traverseHandler, &info](Data& fileData, int) {
                 RecordMetaData metaData;
                 Data headerData;
                 if (decodeRecordHeader(fileData, metaData, headerData)) {
                     Record record { metaData.key, std::chrono::system_clock::time_point(metaData.epochRelativeTimeStamp), headerData, { } };
                     info.bodySize = metaData.bodySize;
+                    info.bodyHash = String::fromUTF8(SHA1::hexDigest(metaData.bodyHash));
                     traverseHandler(&record, info);
                 }
             });
@@ -503,107 +568,6 @@
     });
 }
 
-void Storage::dispatchPendingWriteOperations()
-{
-    ASSERT(RunLoop::isMain());
-
-    const int maximumActiveWriteOperationCount { 3 };
-
-    while (!m_pendingWriteOperations.isEmpty()) {
-        if (m_activeWriteOperations.size() >= maximumActiveWriteOperationCount) {
-            LOG(NetworkCacheStorage, "(NetworkProcess) limiting parallel writes");
-            return;
-        }
-        auto writeOperation = m_pendingWriteOperations.takeFirst();
-        auto& write = *writeOperation;
-        m_activeWriteOperations.add(WTF::move(writeOperation));
-
-        if (write.existingRecord && mayContain(write.record.key)) {
-            dispatchHeaderWriteOperation(write);
-            continue;
-        }
-        dispatchFullWriteOperation(write);
-    }
-}
-
-void Storage::dispatchFullWriteOperation(const WriteOperation& write)
-{
-    ASSERT(RunLoop::isMain());
-    ASSERT(m_activeWriteOperations.contains(&write));
-
-    // This was added already when starting the store but filter might have been wiped.
-    addToContentsFilter(write.record.key);
-
-    StringCapture cachePathCapture(m_directoryPath);
-    backgroundIOQueue().dispatch([this, &write, cachePathCapture] {
-        auto encodedHeader = encodeRecordHeader(write.record);
-        auto headerAndBodyData = concatenate(encodedHeader, write.record.body);
-
-        auto channel = openFileForKey(write.record.key, IOChannel::Type::Create, cachePathCapture.string());
-        int fd = channel->fileDescriptor();
-        size_t bodyOffset = encodedHeader.size();
-
-        channel->write(0, headerAndBodyData, [this, &write, bodyOffset, fd](int error) {
-            size_t bodySize = write.record.body.size();
-            size_t totalSize = bodyOffset + bodySize;
-
-            // On error the entry still stays in the contents filter until next synchronization.
-            m_approximateSize += totalSize;
-
-            bool shouldMapBody = !error && bodySize >= pageSize();
-            auto bodyMap = shouldMapBody ? mapFile(fd, bodyOffset, bodySize) : Data();
-
-            write.completionHandler(!error, bodyMap);
-
-            ASSERT(m_activeWriteOperations.contains(&write));
-            m_activeWriteOperations.remove(&write);
-            dispatchPendingWriteOperations();
-
-            LOG(NetworkCacheStorage, "(NetworkProcess) write complete error=%d", error);
-        });
-    });
-
-    shrinkIfNeeded();
-}
-
-void Storage::dispatchHeaderWriteOperation(const WriteOperation& write)
-{
-    ASSERT(RunLoop::isMain());
-    ASSERT(write.existingRecord);
-    ASSERT(m_activeWriteOperations.contains(&write));
-    ASSERT(mayContain(write.record.key));
-
-    // Try to update the header of an existing entry.
-    StringCapture cachePathCapture(m_directoryPath);
-    backgroundIOQueue().dispatch([this, &write, cachePathCapture] {
-        auto headerData = encodeRecordHeader(write.record);
-        auto existingHeaderData = encodeRecordHeader(write.existingRecord.value());
-
-        bool pageRoundedHeaderSizeChanged = headerData.size() != existingHeaderData.size();
-        if (pageRoundedHeaderSizeChanged) {
-            LOG(NetworkCacheStorage, "(NetworkProcess) page-rounded header size changed, storing full entry");
-            RunLoop::main().dispatch([this, &write] {
-                dispatchFullWriteOperation(write);
-            });
-            return;
-        }
-
-        auto channel = openFileForKey(write.record.key, IOChannel::Type::Write, cachePathCapture.string());
-        channel->write(0, headerData, [this, &write](int error) {
-            LOG(NetworkCacheStorage, "(NetworkProcess) update complete error=%d", error);
-
-            if (error)
-                remove(write.record.key);
-
-            write.completionHandler(!error, { });
-
-            ASSERT(m_activeWriteOperations.contains(&write));
-            m_activeWriteOperations.remove(&write);
-            dispatchPendingWriteOperations();
-        });
-    });
-}
-
 void Storage::setCapacity(size_t capacity)
 {
     ASSERT(RunLoop::isMain());
@@ -633,7 +597,7 @@
 
     StringCapture directoryPathCapture(m_directoryPath);
 
-    ioQueue().dispatch([directoryPathCapture] {
+    ioQueue().dispatch([this, directoryPathCapture] {
         String directoryPath = directoryPathCapture.string();
         traverseDirectory(directoryPath, DT_DIR, [&directoryPath](const String& subdirName) {
             String subdirPath = WebCore::pathByAppendingComponent(directoryPath, subdirName);
@@ -642,6 +606,9 @@
             });
             WebCore::deleteEmptyDirectory(subdirPath);
         });
+
+        // This cleans unreferenced blobs.
+        m_blobStorage.synchronize();
     });
 }
 
@@ -660,24 +627,30 @@
     return duration<double>(accessAge) / age;
 }
 
-
-static double deletionProbability(FileTimes times)
+static double deletionProbability(FileTimes times, unsigned bodyShareCount)
 {
     static const double maximumProbability { 0.33 };
+    static const unsigned maximumEffectiveShareCount { 5 };
 
     auto worth = computeRecordWorth(times);
 
     // Adjust a bit so the most valuable entries don't get deleted at all.
     auto effectiveWorth = std::min(1.1 * worth, 1.);
 
-    return (1 - effectiveWorth) * maximumProbability;
+    auto probability = (1 - effectiveWorth) * maximumProbability;
+
+    // It is less useful to remove an entry that shares its body data.
+    if (bodyShareCount)
+        probability /= std::min(bodyShareCount, maximumEffectiveShareCount);
+
+    return probability;
 }
 
 void Storage::shrinkIfNeeded()
 {
     ASSERT(RunLoop::isMain());
 
-    if (m_approximateSize > m_capacity)
+    if (approximateSize() > m_capacity)
         shrink();
 }
 
@@ -689,22 +662,27 @@
         return;
     m_shrinkInProgress = true;
 
-    LOG(NetworkCacheStorage, "(NetworkProcess) shrinking cache approximateSize=%zu capacity=%zu", static_cast<size_t>(m_approximateSize), m_capacity);
+    LOG(NetworkCacheStorage, "(NetworkProcess) shrinking cache approximateSize=%zu capacity=%zu", approximateSize(), m_capacity);
 
     StringCapture cachePathCapture(m_directoryPath);
     backgroundIOQueue().dispatch([this, cachePathCapture] {
         String cachePath = cachePathCapture.string();
-        traverseCacheFiles(cachePath, [](const String& fileName, const String& partitionPath) {
-            auto filePath = WebCore::pathByAppendingComponent(partitionPath, fileName);
+        traverseCacheFiles(cachePath, [this](const String& fileName, const String& partitionPath) {
+            auto recordPath = WebCore::pathByAppendingComponent(partitionPath, fileName);
+            auto bodyPath = bodyPathForRecordPath(recordPath);
 
-            auto times = fileTimes(filePath);
-            auto probability = deletionProbability(times);
+            auto times = fileTimes(recordPath);
+            unsigned bodyShareCount = m_blobStorage.shareCount(bodyPath);
+            auto probability = deletionProbability(times, bodyShareCount);
+
             bool shouldDelete = randomNumber() < probability;
 
-            LOG(NetworkCacheStorage, "Deletion probability=%f shouldDelete=%d", probability, shouldDelete);
+            LOG(NetworkCacheStorage, "Deletion probability=%f bodyLinkCount=%d shouldDelete=%d", probability, bodyShareCount, shouldDelete);
 
-            if (shouldDelete)
-                WebCore::deleteFile(filePath);
+            if (shouldDelete) {
+                WebCore::deleteFile(recordPath);
+                m_blobStorage.remove(bodyPath);
+            }
         });
 
         // Let system figure out if they are really empty.
@@ -739,6 +717,7 @@
             WebCore::deleteEmptyDirectory(partitionPath);
         });
     });
+    // FIXME: Delete V2 cache.
 }
 
 }

Modified: trunk/Source/WebKit2/NetworkProcess/cache/NetworkCacheStorage.h (182802 => 182803)


--- trunk/Source/WebKit2/NetworkProcess/cache/NetworkCacheStorage.h	2015-04-14 17:57:36 UTC (rev 182802)
+++ trunk/Source/WebKit2/NetworkProcess/cache/NetworkCacheStorage.h	2015-04-14 18:40:27 UTC (rev 182803)
@@ -28,6 +28,7 @@
 
 #if ENABLE(NETWORK_CACHE)
 
+#include "NetworkCacheBlobStorage.h"
 #include "NetworkCacheData.h"
 #include "NetworkCacheKey.h"
 #include <wtf/BloomFilter.h>
@@ -57,27 +58,30 @@
     typedef std::function<bool (std::unique_ptr<Record>)> RetrieveCompletionHandler;
     void retrieve(const Key&, unsigned priority, RetrieveCompletionHandler&&);
 
-    typedef std::function<void (bool success, const Data& mappedBody)> StoreCompletionHandler;
-    void store(const Record&, StoreCompletionHandler&&);
-    void update(const Record& updateRecord, const Record& existingRecord, StoreCompletionHandler&&);
+    typedef std::function<void (const Data& mappedBody)> MappedBodyHandler;
+    void store(const Record&, MappedBodyHandler&&);
 
     void remove(const Key&);
 
     struct RecordInfo {
         size_t bodySize { 0 };
         double worth { -1 }; // 0-1 where 1 is the most valuable.
+        unsigned bodyShareCount { 0 };
+        String bodyHash;
     };
     enum TraverseFlag {
         ComputeWorth = 1 << 0,
+        ShareCount = 1 << 1,
     };
     typedef unsigned TraverseFlags;
     // Null record signals end.
     void traverse(TraverseFlags, std::function<void (const Record*, const RecordInfo&)>&&);
 
     void setCapacity(size_t);
+    size_t approximateSize() const;
     void clear();
 
-    static const unsigned version = 2;
+    static const unsigned version = 3;
 
     const String& baseDirectoryPath() const { return m_baseDirectoryPath; }
     const String& directoryPath() const { return m_directoryPath; }
@@ -96,17 +100,17 @@
     };
     void dispatchReadOperation(const ReadOperation&);
     void dispatchPendingReadOperations();
+    void finishReadOperation(const ReadOperation&, std::unique_ptr<Record>);
 
     struct WriteOperation {
         Record record;
-        Optional<Record> existingRecord;
-        StoreCompletionHandler completionHandler;
+        MappedBodyHandler mappedBodyHandler;
     };
-    void dispatchFullWriteOperation(const WriteOperation&);
-    void dispatchHeaderWriteOperation(const WriteOperation&);
+    void dispatchWriteOperation(const WriteOperation&);
     void dispatchPendingWriteOperations();
+    void finishWriteOperation(const WriteOperation&);
 
-    void updateFileModificationTime(IOChannel&);
+    void updateFileModificationTime(const String& path);
 
     WorkQueue& ioQueue() { return m_ioQueue.get(); }
     WorkQueue& backgroundIOQueue() { return m_backgroundIOQueue.get(); }
@@ -141,6 +145,8 @@
     Ref<WorkQueue> m_ioQueue;
     Ref<WorkQueue> m_backgroundIOQueue;
     Ref<WorkQueue> m_serialBackgroundIOQueue;
+
+    BlobStorage m_blobStorage;
 };
 
 }

Modified: trunk/Source/WebKit2/WebKit2.xcodeproj/project.pbxproj (182802 => 182803)


--- trunk/Source/WebKit2/WebKit2.xcodeproj/project.pbxproj	2015-04-14 17:57:36 UTC (rev 182802)
+++ trunk/Source/WebKit2/WebKit2.xcodeproj/project.pbxproj	2015-04-14 18:40:27 UTC (rev 182803)
@@ -1796,6 +1796,8 @@
 		E489D28E1A0A2DB80078C06A /* NetworkCacheDecoder.h in Headers */ = {isa = PBXBuildFile; fileRef = E489D2871A0A2DB80078C06A /* NetworkCacheDecoder.h */; };
 		E489D28F1A0A2DB80078C06A /* NetworkCacheEncoder.cpp in Sources */ = {isa = PBXBuildFile; fileRef = E489D2881A0A2DB80078C06A /* NetworkCacheEncoder.cpp */; };
 		E489D2901A0A2DB80078C06A /* NetworkCacheEncoder.h in Headers */ = {isa = PBXBuildFile; fileRef = E489D2891A0A2DB80078C06A /* NetworkCacheEncoder.h */; };
+		E49D40D71AD3FB170066B7B9 /* NetworkCacheBlobStorage.h in Headers */ = {isa = PBXBuildFile; fileRef = E49D40D61AD3FB170066B7B9 /* NetworkCacheBlobStorage.h */; };
+		E49D40D91AD3FB210066B7B9 /* NetworkCacheBlobStorage.cpp in Sources */ = {isa = PBXBuildFile; fileRef = E49D40D81AD3FB210066B7B9 /* NetworkCacheBlobStorage.cpp */; };
 		ED82A7F2128C6FAF004477B3 /* WKBundlePageOverlay.h in Headers */ = {isa = PBXBuildFile; fileRef = 1A22F0FF1289FCD90085E74F /* WKBundlePageOverlay.h */; settings = {ATTRIBUTES = (Private, ); }; };
 		EDCA71B7128DDA8C00201B26 /* WKBundlePageOverlay.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 1A22F1001289FCD90085E74F /* WKBundlePageOverlay.cpp */; };
 		F036978815F4BF0500C3A80E /* WebColorPicker.cpp in Sources */ = {isa = PBXBuildFile; fileRef = F036978715F4BF0500C3A80E /* WebColorPicker.cpp */; };
@@ -4059,6 +4061,8 @@
 		E489D2871A0A2DB80078C06A /* NetworkCacheDecoder.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = NetworkCacheDecoder.h; sourceTree = "<group>"; };
 		E489D2881A0A2DB80078C06A /* NetworkCacheEncoder.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; path = NetworkCacheEncoder.cpp; sourceTree = "<group>"; };
 		E489D2891A0A2DB80078C06A /* NetworkCacheEncoder.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = NetworkCacheEncoder.h; sourceTree = "<group>"; };
+		E49D40D61AD3FB170066B7B9 /* NetworkCacheBlobStorage.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = NetworkCacheBlobStorage.h; sourceTree = "<group>"; };
+		E49D40D81AD3FB210066B7B9 /* NetworkCacheBlobStorage.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; path = NetworkCacheBlobStorage.cpp; sourceTree = "<group>"; };
 		F036978715F4BF0500C3A80E /* WebColorPicker.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; path = WebColorPicker.cpp; sourceTree = "<group>"; };
 		F6113E24126CE1820057D0A7 /* APIUserContentURLPattern.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = APIUserContentURLPattern.h; sourceTree = "<group>"; };
 		F6113E26126CE19B0057D0A7 /* WKUserContentURLPattern.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; path = WKUserContentURLPattern.cpp; sourceTree = "<group>"; };
@@ -7498,6 +7502,8 @@
 		E489D2821A0A2BE80078C06A /* cache */ = {
 			isa = PBXGroup;
 			children = (
+				E49D40D81AD3FB210066B7B9 /* NetworkCacheBlobStorage.cpp */,
+				E49D40D61AD3FB170066B7B9 /* NetworkCacheBlobStorage.h */,
 				E4436EBE1A0CFDB200EAD204 /* NetworkCache.cpp */,
 				E4436EBF1A0CFDB200EAD204 /* NetworkCache.h */,
 				E489D2831A0A2DB80078C06A /* NetworkCacheCoder.h */,
@@ -8149,6 +8155,7 @@
 				1AAF08A2192681D100B6390C /* WebUserContentControllerProxy.h in Headers */,
 				7C361D79192803BD0036A59D /* WebUserContentControllerProxyMessages.h in Headers */,
 				3F889D15188778C900FEADAF /* WebVideoFullscreenManagerProxy.h in Headers */,
+				E49D40D71AD3FB170066B7B9 /* NetworkCacheBlobStorage.h in Headers */,
 				29CD55AA128E294F00133C85 /* WKAccessibilityWebPageObjectBase.h in Headers */,
 				29232DF418B29D6800D0596F /* WKAccessibilityWebPageObjectMac.h in Headers */,
 				2D0730A319F9C7DA00E9D9C4 /* WKActionMenuController.h in Headers */,
@@ -9512,6 +9519,7 @@
 				CD67D30E15C08F9A00843ADF /* InjectedBundlePageDiagnosticLoggingClient.cpp in Sources */,
 				E1EE53E711F8CFFB00CCBEE4 /* InjectedBundlePageEditorClient.cpp in Sources */,
 				BC14E109120B905E00826C0C /* InjectedBundlePageFormClient.cpp in Sources */,
+				E49D40D91AD3FB210066B7B9 /* NetworkCacheBlobStorage.cpp in Sources */,
 				CD5C66A0134B9D38004FE2A8 /* InjectedBundlePageFullScreenClient.cpp in Sources */,
 				BCA8C6A811E3BA5F00812FB7 /* InjectedBundlePageLoaderClient.cpp in Sources */,
 				BC8147AA12F64CDA007B2C32 /* InjectedBundlePagePolicyClient.cpp in Sources */,
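The deduplication scheme this patch implements in BlobStorage — hash the body with SHA-1, store it once under `Blobs/`, and hard-link each record's `-body` file to the shared blob — can be illustrated with a small standalone sketch. This is not WebKit code; `add_blob` and the directory layout here are a hypothetical minimal model of the technique under the assumption of a POSIX-style filesystem with hard links.

```python
# Minimal sketch of content-addressed body deduplication via hard links,
# modeled on the WebKitCache/Version 3 layout (Blobs/ + Records/*-body).
# Hypothetical helper, not WebKit's BlobStorage.
import hashlib
import os
import tempfile

def add_blob(blobs_dir, body_path, data):
    """Store data keyed by its SHA-1; hard-link body_path to the shared blob."""
    # Blob file name is the uppercase SHA-1 hex digest of the contents.
    blob_path = os.path.join(blobs_dir, hashlib.sha1(data).hexdigest().upper())
    if not os.path.exists(blob_path):
        # First time we see this content: write the blob once.
        with open(blob_path, "wb") as f:
            f.write(data)
    if os.path.exists(body_path):
        os.remove(body_path)  # replace any stale -body link
    # Identical content from another record reuses the same inode.
    os.link(blob_path, body_path)

root = tempfile.mkdtemp()
blobs = os.path.join(root, "Blobs")
records = os.path.join(root, "Records")
os.mkdir(blobs)
os.mkdir(records)

# Two different records with identical body data.
add_blob(blobs, os.path.join(records, "A-body"), b"shared payload")
add_blob(blobs, os.path.join(records, "B-body"), b"shared payload")

# One blob plus two -body links share a single inode: link count is 3.
# This is also what shareCount()-style accounting can observe.
print(os.stat(os.path.join(records, "A-body")).st_nlink)  # -> 3
```

Deleting one record's `-body` file only drops the link count; the bytes survive until the last link (including the `Blobs/` entry, removed during blob synchronization when unreferenced) is gone, which is why record removal and blob cleanup are handled separately in the patch.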