Re: [Freedreno] [PATCH RFC v1 00/52] drm/crtc: Rename struct drm_crtc::dev to drm_dev
On 13/07/2023 16:09, Thomas Zimmermann wrote: Hi Am 13.07.23 um 16:41 schrieb Sean Paul: On Thu, Jul 13, 2023 at 9:04 AM Uwe Kleine-König wrote: hello Sean, On Wed, Jul 12, 2023 at 02:31:02PM -0400, Sean Paul wrote: I'd really prefer this patch (series or single) is not accepted. This will cause problems for everyone cherry-picking patches to a downstream kernel (LTS or distro tree). I usually wouldn't expect sympathy here, but the questionable benefit does not outweigh the cost IM[biased]O. I agree that for backports this isn't so nice. However with the split approach (that was argumented against here) it's not soo bad. Patch #1 (and similar changes for the other affected structures) could be trivially backported and with that it doesn't matter if you write dev or drm (or whatever name is chosen in the end); both work in the same way. Patch #1 avoids the need to backport the entire set, however every change occuring after the rename patches will cause conflicts on future cherry-picks. Downstream kernels will have to backport the whole set. Backporting the entire set will create an epoch in downstream kernels where cherry-picking patches preceding this set will need to undergo conflict resolution as well. As mentioned in my previous email, I don't expect sympathy here, it's part of maintaining a downstream kernel, but there is a real cost to kernel consumers. But even with the one-patch-per-rename approach I'd consider the renaming a net win, because ease of understanding code has a big value. It's value is not so easy measurable as "conflicts when backporting", but it also matters in say two years from now, while backporting shouldn't be an issue then any more. You've rightly identified the conjecture in your statement. I've been on both sides of the argument, having written/maintained drm code upstream and cherry-picked changes to a downstream kernel. Perhaps it's because drm's definition of dev is ingrained in my muscle memory, or maybe it's because I don't do a lot of upstream development these days, but I just have a hard time seeing the benefit here. I can only second what Sean writes. I've done quite a bit of backporting of DRM code. It's hard already. And this kind of change is going to to affect almost every backported DRM patch in the coming years. Not just for distribution kernels, but also for upstream's stable series. It's really only possible to do this change over many releases while keeping compatible with the old name. So the more I think about it, the less I like this change. I've done my share of backporting, and still am doing it, so I can say I dislike it as much as anyone, however.. Is this an argument which the kernel as a wider entity typically accepts? If not could it be a slippery slope to start a precedent? It is a honest question - I am not familiar if there were or were not any similar discussions in the past. My gut feeling is that *if* there is a consensus that something _improves_ the code base significantly, backporting pains should probably not be weighted very heavily as a contra argument. Regards, Tvrtko
Re: [patch 14/30] drm/i915/pmu: Replace open coded kstat_irqs() copy
On 10/12/2020 19:25, Thomas Gleixner wrote: Driver code has no business with the internals of the irq descriptor. Aside of that the count is per interrupt line and therefore takes interrupts from other devices into account which share the interrupt line and are not handled by the graphics driver. Replace it with a pmu private count which only counts interrupts which originate from the graphics card. To avoid atomics or heuristics of some sort make the counter field 'unsigned long'. That limits the count to 4e9 on 32bit which is a lot and postprocessing can easily deal with the occasional wraparound. After my failed hasty sketch from last night I had a different one which was kind of heuristics based (re-reading the upper dword and retrying if it changed on 32-bit). But you are right - it is okay to at least start like this today and if later there is a need we can either do that or deal with wrap at PMU read time. So thanks for dealing with it, some small comments below but overall it is fine. Signed-off-by: Thomas Gleixner Cc: Tvrtko Ursulin Cc: Jani Nikula Cc: Joonas Lahtinen Cc: Rodrigo Vivi Cc: David Airlie Cc: Daniel Vetter Cc: intel-...@lists.freedesktop.org Cc: dri-de...@lists.freedesktop.org --- drivers/gpu/drm/i915/i915_irq.c | 34 ++ drivers/gpu/drm/i915/i915_pmu.c | 18 +- drivers/gpu/drm/i915/i915_pmu.h |8 3 files changed, 43 insertions(+), 17 deletions(-) --- a/drivers/gpu/drm/i915/i915_irq.c +++ b/drivers/gpu/drm/i915/i915_irq.c @@ -60,6 +60,24 @@ * and related files, but that will be described in separate chapters. */ +/* + * Interrupt statistic for PMU. Increments the counter only if the + * interrupt originated from the the GPU so interrupts from a device which + * shares the interrupt line are not accounted. + */ +static inline void pmu_irq_stats(struct drm_i915_private *priv, We never use priv as a local name, it should be either i915 or dev_priv. +irqreturn_t res) +{ + if (unlikely(res != IRQ_HANDLED)) + return; + + /* +* A clever compiler translates that into INC. A not so clever one +* should at least prevent store tearing. +*/ + WRITE_ONCE(priv->pmu.irq_count, priv->pmu.irq_count + 1); Curious, probably more educational for me - given x86_32 and x86_64, and the context of it getting called, what is the difference from just doing irq_count++? +} + typedef bool (*long_pulse_detect_func)(enum hpd_pin pin, u32 val); static const u32 hpd_ilk[HPD_NUM_PINS] = { @@ -1599,6 +1617,8 @@ static irqreturn_t valleyview_irq_handle valleyview_pipestat_irq_handler(dev_priv, pipe_stats); } while (0); + pmu_irq_stats(dev_priv, ret); + enable_rpm_wakeref_asserts(_priv->runtime_pm); return ret; @@ -1676,6 +1696,8 @@ static irqreturn_t cherryview_irq_handle valleyview_pipestat_irq_handler(dev_priv, pipe_stats); } while (0); + pmu_irq_stats(dev_priv, ret); + enable_rpm_wakeref_asserts(_priv->runtime_pm); return ret; @@ -2103,6 +2125,8 @@ static irqreturn_t ilk_irq_handler(int i if (sde_ier) raw_reg_write(regs, SDEIER, sde_ier); + pmu_irq_stats(i915, ret); + /* IRQs are synced during runtime_suspend, we don't require a wakeref */ enable_rpm_wakeref_asserts(>runtime_pm); @@ -2419,6 +2443,8 @@ static irqreturn_t gen8_irq_handler(int gen8_master_intr_enable(regs); + pmu_irq_stats(dev_priv, IRQ_HANDLED); + return IRQ_HANDLED; } @@ -2514,6 +2540,8 @@ static __always_inline irqreturn_t gen11_gu_misc_irq_handler(gt, gu_misc_iir); + pmu_irq_stats(i915, IRQ_HANDLED); + return IRQ_HANDLED; } @@ -3688,6 +3716,8 @@ static irqreturn_t i8xx_irq_handler(int i8xx_pipestat_irq_handler(dev_priv, iir, pipe_stats); } while (0); + pmu_irq_stats(dev_priv, ret); + enable_rpm_wakeref_asserts(_priv->runtime_pm); return ret; @@ -3796,6 +3826,8 @@ static irqreturn_t i915_irq_handler(int i915_pipestat_irq_handler(dev_priv, iir, pipe_stats); } while (0); + pmu_irq_stats(dev_priv, ret); + enable_rpm_wakeref_asserts(_priv->runtime_pm); return ret; @@ -3941,6 +3973,8 @@ static irqreturn_t i965_irq_handler(int i965_pipestat_irq_handler(dev_priv, iir, pipe_stats); } while (0); + pmu_irq_stats(dev_priv, IRQ_HANDLED); + enable_rpm_wakeref_asserts(_priv->runtime_pm); return ret; --- a/drivers/gpu/drm/i915/i915_pmu.c +++ b/drivers/gpu/drm/i915/i915_pmu.c @@ -423,22 +423,6 @@ static enum hrtimer_restart i915_sample( return HRTIMER_RESTART; } In this file you can also drop the #include line. -static u64 count_interrupts(struct drm_i915_private *i915) -{ - /* open-coded kstat_irqs() */
Re: [PATCH 06/11] drm/i915: use vmap in shmem_pin_map
On 24/09/2020 14:58, Christoph Hellwig wrote: shmem_pin_map somewhat awkwardly reimplements vmap using alloc_vm_area and manual pte setup. The only practical difference is that alloc_vm_area prefeaults the vmalloc area PTEs, which doesn't seem to be required here (and could be added to vmap using a flag if actually required). Switch to use vmap, and use vfree to free both the vmalloc mapping and the page array, as well as dropping the references to each page. Signed-off-by: Christoph Hellwig --- drivers/gpu/drm/i915/gt/shmem_utils.c | 76 +++ 1 file changed, 18 insertions(+), 58 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/shmem_utils.c b/drivers/gpu/drm/i915/gt/shmem_utils.c index 43c7acbdc79dea..f011ea42487e11 100644 --- a/drivers/gpu/drm/i915/gt/shmem_utils.c +++ b/drivers/gpu/drm/i915/gt/shmem_utils.c @@ -49,80 +49,40 @@ struct file *shmem_create_from_object(struct drm_i915_gem_object *obj) return file; } -static size_t shmem_npte(struct file *file) -{ - return file->f_mapping->host->i_size >> PAGE_SHIFT; -} - -static void __shmem_unpin_map(struct file *file, void *ptr, size_t n_pte) -{ - unsigned long pfn; - - vunmap(ptr); - - for (pfn = 0; pfn < n_pte; pfn++) { - struct page *page; - - page = shmem_read_mapping_page_gfp(file->f_mapping, pfn, - GFP_KERNEL); - if (!WARN_ON(IS_ERR(page))) { - put_page(page); - put_page(page); - } - } -} - void *shmem_pin_map(struct file *file) { - const size_t n_pte = shmem_npte(file); - pte_t *stack[32], **ptes, **mem; - struct vm_struct *area; - unsigned long pfn; - - mem = stack; - if (n_pte > ARRAY_SIZE(stack)) { - mem = kvmalloc_array(n_pte, sizeof(*mem), GFP_KERNEL); - if (!mem) - return NULL; - } + struct page **pages; + size_t n_pages, i; + void *vaddr; - area = alloc_vm_area(n_pte << PAGE_SHIFT, mem); - if (!area) { - if (mem != stack) - kvfree(mem); + n_pages = file->f_mapping->host->i_size >> PAGE_SHIFT; + pages = kvmalloc_array(n_pages, sizeof(*pages), GFP_KERNEL); + if (!pages) return NULL; - } - ptes = mem; - for (pfn = 0; pfn < n_pte; pfn++) { - struct page *page; - - page = shmem_read_mapping_page_gfp(file->f_mapping, pfn, - GFP_KERNEL); - if (IS_ERR(page)) + for (i = 0; i < n_pages; i++) { + pages[i] = shmem_read_mapping_page_gfp(file->f_mapping, i, + GFP_KERNEL); + if (IS_ERR(pages[i])) goto err_page; - - **ptes++ = mk_pte(page, PAGE_KERNEL); } - if (mem != stack) - kvfree(mem); - + vaddr = vmap(pages, n_pages, VM_MAP_PUT_PAGES, PAGE_KERNEL); + if (!vaddr) + goto err_page; mapping_set_unevictable(file->f_mapping); - return area->addr; - + return vaddr; err_page: - if (mem != stack) - kvfree(mem); - - __shmem_unpin_map(file, area->addr, pfn); + while (--i >= 0) + put_page(pages[i]); + kvfree(pages); return NULL; } void shmem_unpin_map(struct file *file, void *ptr) { mapping_clear_unevictable(file->f_mapping); - __shmem_unpin_map(file, ptr, shmem_npte(file)); + vfree(ptr); } static int __shmem_rw(struct file *file, loff_t off, Reviewed-by: Tvrtko Ursulin Regards, Tvrtko
Re: [Intel-gfx] [PATCH 4/6] drm/i915: use vmap in i915_gem_object_map
On 23/09/2020 15:44, Christoph Hellwig wrote: On Wed, Sep 23, 2020 at 02:58:43PM +0100, Tvrtko Ursulin wrote: Series did not get a CI run from our side because of a different base so I don't know if you would like to have a run there? If so you would need to rebase against git://anongit.freedesktop.org/drm-tip drm-tip and you could even send a series to intel-gfx-try...@lists.freedesktop.org, suppressing cc, to check it out without sending a copy to the real mailing list. It doesn't seem like I can post to any freedesktop list, as I always get rejection messages. But I'll happily prepare a branch if one of you an feed it into your CI. That's fine, just ping me and I will forward it for testing, thanks! git://git.infradead.org/users/hch/misc.git i915-vmap-wip Gitweb: http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/i915-vmap-wip note that this includes a new commit to clean up one of the recent commits in the code. CI says series looks good from the i915 perspective (*). I don't know how will you handle it logistically, but when you have final version I am happy to re-read and r-b the i915 patches. Regards, Tvrtko *) https://patchwork.freedesktop.org/series/82051/
Re: [Intel-gfx] [PATCH 4/6] drm/i915: use vmap in i915_gem_object_map
On 23/09/2020 14:41, Christoph Hellwig wrote: On Wed, Sep 23, 2020 at 10:52:33AM +0100, Tvrtko Ursulin wrote: On 18/09/2020 17:37, Christoph Hellwig wrote: i915_gem_object_map implements fairly low-level vmap functionality in a driver. Split it into two helpers, one for remapping kernel memory which can use vmap, and one for I/O memory that uses vmap_pfn. The only practical difference is that alloc_vm_area prefeaults the vmalloc area PTEs, which doesn't seem to be required here for the kernel memory case (and could be added to vmap using a flag if actually required). Patch looks good to me. Series did not get a CI run from our side because of a different base so I don't know if you would like to have a run there? If so you would need to rebase against git://anongit.freedesktop.org/drm-tip drm-tip and you could even send a series to intel-gfx-try...@lists.freedesktop.org, suppressing cc, to check it out without sending a copy to the real mailing list. It doesn't seem like I can post to any freedesktop list, as I always get rejection messages. But I'll happily prepare a branch if one of you an feed it into your CI. That's fine, just ping me and I will forward it for testing, thanks! Regards, Tvrtko
Re: [Intel-gfx] [PATCH 4/6] drm/i915: use vmap in i915_gem_object_map
On 18/09/2020 17:37, Christoph Hellwig wrote: i915_gem_object_map implements fairly low-level vmap functionality in a driver. Split it into two helpers, one for remapping kernel memory which can use vmap, and one for I/O memory that uses vmap_pfn. The only practical difference is that alloc_vm_area prefeaults the vmalloc area PTEs, which doesn't seem to be required here for the kernel memory case (and could be added to vmap using a flag if actually required). Patch looks good to me. Series did not get a CI run from our side because of a different base so I don't know if you would like to have a run there? If so you would need to rebase against git://anongit.freedesktop.org/drm-tip drm-tip and you could even send a series to intel-gfx-try...@lists.freedesktop.org, suppressing cc, to check it out without sending a copy to the real mailing list. Regards, Tvrtko Signed-off-by: Christoph Hellwig --- drivers/gpu/drm/i915/Kconfig | 1 + drivers/gpu/drm/i915/gem/i915_gem_pages.c | 101 ++ 2 files changed, 47 insertions(+), 55 deletions(-) diff --git a/drivers/gpu/drm/i915/Kconfig b/drivers/gpu/drm/i915/Kconfig index 9afa5c4a6bf006..1e1cb245fca778 100644 --- a/drivers/gpu/drm/i915/Kconfig +++ b/drivers/gpu/drm/i915/Kconfig @@ -25,6 +25,7 @@ config DRM_I915 select CRC32 select SND_HDA_I915 if SND_HDA_CORE select CEC_CORE if CEC_NOTIFIER + select VMAP_PFN help Choose this option if you have a system that has "Intel Graphics Media Accelerator" or "HD Graphics" integrated graphics, diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c index e8a083743e0927..90029ea83aede9 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c @@ -234,50 +234,24 @@ int __i915_gem_object_put_pages(struct drm_i915_gem_object *obj) return err; } -static inline pte_t iomap_pte(resource_size_t base, - dma_addr_t offset, - pgprot_t prot) -{ - return pte_mkspecial(pfn_pte((base + offset) >> PAGE_SHIFT, prot)); -} - /* The 'mapping' part of i915_gem_object_pin_map() below */ -static void *i915_gem_object_map(struct drm_i915_gem_object *obj, +static void *i915_gem_object_map_page(struct drm_i915_gem_object *obj, enum i915_map_type type) { - unsigned long n_pte = obj->base.size >> PAGE_SHIFT; - struct sg_table *sgt = obj->mm.pages; - pte_t *stack[32], **mem; - struct vm_struct *area; + unsigned long n_pages = obj->base.size >> PAGE_SHIFT, i; + struct page *stack[32], **pages = stack, *page; + struct sgt_iter iter; pgprot_t pgprot; - - if (!i915_gem_object_has_struct_page(obj) && type != I915_MAP_WC) - return NULL; - - /* A single page can always be kmapped */ - if (n_pte == 1 && type == I915_MAP_WB) - return kmap(sg_page(sgt->sgl)); - - mem = stack; - if (n_pte > ARRAY_SIZE(stack)) { - /* Too big for stack -- allocate temporary array instead */ - mem = kvmalloc_array(n_pte, sizeof(*mem), GFP_KERNEL); - if (!mem) - return NULL; - } - - area = alloc_vm_area(obj->base.size, mem); - if (!area) { - if (mem != stack) - kvfree(mem); - return NULL; - } + void *vaddr; switch (type) { default: MISSING_CASE(type); fallthrough;/* to use PAGE_KERNEL anyway */ case I915_MAP_WB: + /* A single page can always be kmapped */ + if (n_pages == 1) + return kmap(sg_page(obj->mm.pages->sgl)); pgprot = PAGE_KERNEL; break; case I915_MAP_WC: @@ -285,30 +259,44 @@ static void *i915_gem_object_map(struct drm_i915_gem_object *obj, break; } - if (i915_gem_object_has_struct_page(obj)) { - struct sgt_iter iter; - struct page *page; - pte_t **ptes = mem; - - for_each_sgt_page(page, iter, sgt) - **ptes++ = mk_pte(page, pgprot); - } else { - resource_size_t iomap; - struct sgt_iter iter; - pte_t **ptes = mem; - dma_addr_t addr; + if (n_pages > ARRAY_SIZE(stack)) { + /* Too big for stack -- allocate temporary array instead */ + pages = kvmalloc_array(n_pages, sizeof(*pages), GFP_KERNEL); + if (!pages) + return NULL; + } - iomap = obj->mm.region->iomap.base; - iomap -= obj->mm.region->region.start; + for_each_sgt_page(page, iter, obj->mm.pages) + pages[i++] = page; + vaddr =
Re: [Intel-gfx] [PATCH 3/6] drm/i915: use vmap in shmem_pin_map
On 22/09/2020 17:33, Christoph Hellwig wrote: On Tue, Sep 22, 2020 at 05:13:45PM +0100, Tvrtko Ursulin wrote: void *shmem_pin_map(struct file *file) { - const size_t n_pte = shmem_npte(file); - pte_t *stack[32], **ptes, **mem; Chris can comment how much he'd miss the 32 page stack shortcut. I'd like to see a profile that claim that kmalloc matters in a path that does a vmap and reads pages through the page cache. Especially when the kmalloc saves doing another page cache lookup on the free side. Only reason I can come up with now is if mapping side is on a latency sensitive path, while un-mapping is lazy/delayed so can be more costly. Then fast map and extra cost on unmap may make sense. It more applies to the other i915 patch, which implements a much more used API, but whether or not we can demonstrate any difference in the perf profiles I couldn't tell you without trying to collect some. Is there something in vmap() preventing us from freeing the pages array here? I can't spot anything that is holding on to the pointer. Or it was just a sketch before you realized we could walk the vm_area? Also, I may be totally misunderstanding something, but I think you need to assign area->pages manually so shmem_unpin_map can access it below. We need area->pages to hold the pages for the free side. That being said the patch I posted is broken because it never assigned to that. As said it was a sketch. This is the patch I just rebooted into on my Laptop: http://git.infradead.org/users/hch/misc.git/commitdiff/048522dfa26b6667adfb0371ff530dc263abe829 it needs extra prep patches from the series: http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/alloc_vm_area mapping_clear_unevictable(file->f_mapping); - __shmem_unpin_map(file, ptr, shmem_npte(file)); + for (i = 0; i < shmem_npages(file); i++) + put_page(area->pages[i]); + kvfree(area->pages); + vunmap(ptr); Is the verdict from mm experts that we can't use vfree due __free_pages vs put_page differences? Switched to vfree now. Could we get from ptes to pages, so that we don't have to keep the area->pages array allocated for the duration of the pin? We could do vmalloc_to_page, but that is fairly expensive (not as bad as reading from the page cache..). Are you really worried about the allocation? Not so much given how we don't even use shmem_pin_map outside selftests. If we start using it I expect it will be for tiny objects anyway. Only if they end up being pinned for the lifetime of the driver, it may be a pointless waste of memory compared to the downsides of vmalloc_to_page. But we can revisit this particular edge case optimization if the need arises. I'll look at your other i915 patch tomorrow. Regards, Tvrtko
Re: [Intel-gfx] [PATCH 3/6] drm/i915: use vmap in shmem_pin_map
On 22/09/2020 15:31, Christoph Hellwig wrote: On Tue, Sep 22, 2020 at 09:23:59AM +0100, Tvrtko Ursulin wrote: If I understood this sub-thread correctly, iterating and freeing the pages via the vmapped ptes, so no need for a shmem_read_mapping_page_gfp loop in shmem_unpin_map looks plausible to me. I did not get the reference to kernel/dma/remap.c though, What I mean is the code in dma_common_find_pages, which returns the page array for freeing. Got it. and also not sure how to do the error unwind path in shmem_pin_map at which point the allocated vm area hasn't been fully populated yet. Hand-roll the loop walking vm area struct in there? Yes. What I originally did (re-created as I didn't save it) would be something like this: --- From 5605e77cda246df6dd7ded99ec22cb3f341ef5d5 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Wed, 16 Sep 2020 13:54:04 +0200 Subject: drm/i915: use vmap in shmem_pin_map shmem_pin_map somewhat awkwardly reimplements vmap using alloc_vm_area and manual pte setup. The only practical difference is that alloc_vm_area prefeaults the vmalloc area PTEs, which doesn't seem to be required here (and could be added to vmap using a flag if actually required). Signed-off-by: Christoph Hellwig --- drivers/gpu/drm/i915/gt/shmem_utils.c | 81 +-- 1 file changed, 27 insertions(+), 54 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/shmem_utils.c b/drivers/gpu/drm/i915/gt/shmem_utils.c index 43c7acbdc79dea..7ec6ba4c1065b2 100644 --- a/drivers/gpu/drm/i915/gt/shmem_utils.c +++ b/drivers/gpu/drm/i915/gt/shmem_utils.c @@ -49,80 +49,53 @@ struct file *shmem_create_from_object(struct drm_i915_gem_object *obj) return file; } -static size_t shmem_npte(struct file *file) +static size_t shmem_npages(struct file *file) { return file->f_mapping->host->i_size >> PAGE_SHIFT; } -static void __shmem_unpin_map(struct file *file, void *ptr, size_t n_pte) -{ - unsigned long pfn; - - vunmap(ptr); - - for (pfn = 0; pfn < n_pte; pfn++) { - struct page *page; - - page = shmem_read_mapping_page_gfp(file->f_mapping, pfn, - GFP_KERNEL); - if (!WARN_ON(IS_ERR(page))) { - put_page(page); - put_page(page); - } - } -} - void *shmem_pin_map(struct file *file) { - const size_t n_pte = shmem_npte(file); - pte_t *stack[32], **ptes, **mem; Chris can comment how much he'd miss the 32 page stack shortcut. - struct vm_struct *area; - unsigned long pfn; - - mem = stack; - if (n_pte > ARRAY_SIZE(stack)) { - mem = kvmalloc_array(n_pte, sizeof(*mem), GFP_KERNEL); - if (!mem) - return NULL; - } + size_t n_pages = shmem_npages(file), i; + struct page **pages; + void *vaddr; - area = alloc_vm_area(n_pte << PAGE_SHIFT, mem); - if (!area) { - if (mem != stack) - kvfree(mem); + pages = kvmalloc_array(n_pages, sizeof(*pages), GFP_KERNEL); + if (!pages) return NULL; - } - - ptes = mem; - for (pfn = 0; pfn < n_pte; pfn++) { - struct page *page; - page = shmem_read_mapping_page_gfp(file->f_mapping, pfn, - GFP_KERNEL); - if (IS_ERR(page)) + for (i = 0; i < n_pages; i++) { + pages[i] = shmem_read_mapping_page_gfp(file->f_mapping, i, + GFP_KERNEL); + if (IS_ERR(pages[i])) goto err_page; - - **ptes++ = mk_pte(page, PAGE_KERNEL); } - if (mem != stack) - kvfree(mem); - + vaddr = vmap(pages, n_pages, 0, PAGE_KERNEL); + if (!vaddr) + goto err_page; mapping_set_unevictable(file->f_mapping); - return area->addr; - + return vaddr; Is there something in vmap() preventing us from freeing the pages array here? I can't spot anything that is holding on to the pointer. Or it was just a sketch before you realized we could walk the vm_area? Also, I may be totally misunderstanding something, but I think you need to assign area->pages manually so shmem_unpin_map can access it below. err_page: - if (mem != stack) - kvfree(mem); - - __shmem_unpin_map(file, area->addr, pfn); + while (--i >= 0) + put_page(pages[i]); + kvfree(pages); return NULL; } void shmem_unpin_map(struct file *file, void *ptr) { + struct vm_struct *area = find_vm_area(ptr); + size_t i = shmem_npages(file); + + if (WARN_ON_ONCE(!area || !area->pages)) + return; + mapping_clear_unevi
Re: [Intel-gfx] [PATCH 3/6] drm/i915: use vmap in shmem_pin_map
On 22/09/2020 07:22, Christoph Hellwig wrote: On Mon, Sep 21, 2020 at 08:11:57PM +0100, Matthew Wilcox wrote: This is awkward. I'd like it if we had a vfree() variant which called put_page() instead of __free_pages(). I'd like it even more if we used release_pages() instead of our own loop that called put_page(). Note that we don't need a new vfree variant, we can do this manually if we want, take a look at kernel/dma/remap.c. But I thought this code intentionally doesn't want to do that to avoid locking in the memory for the pages array. Maybe the i915 maintainers can clarify. + Chris & Matt who were involved with this part of i915. If I understood this sub-thread correctly, iterating and freeing the pages via the vmapped ptes, so no need for a shmem_read_mapping_page_gfp loop in shmem_unpin_map looks plausible to me. I did not get the reference to kernel/dma/remap.c though, and also not sure how to do the error unwind path in shmem_pin_map at which point the allocated vm area hasn't been fully populated yet. Hand-roll the loop walking vm area struct in there? Regards, Tvrtko
Re: [PATCH v2 06/21] drm/i915: Introduce GEM object functions
On 15/09/2020 15:59, Thomas Zimmermann wrote: GEM object functions deprecate several similar callback interfaces in struct drm_driver. This patch replaces the per-driver callbacks with per-instance callbacks in i915. v2: * move object-function instance to i915_gem_object.c (Jani) Signed-off-by: Thomas Zimmermann --- drivers/gpu/drm/i915/gem/i915_gem_object.c| 21 --- drivers/gpu/drm/i915/gem/i915_gem_object.h| 3 --- drivers/gpu/drm/i915/i915_drv.c | 4 .../gpu/drm/i915/selftests/mock_gem_device.c | 3 --- 4 files changed, 18 insertions(+), 13 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c index c8421fd9d2dc..3389ac972d16 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c @@ -39,9 +39,18 @@ static struct i915_global_object { struct kmem_cache *slab_objects; } global; +static const struct drm_gem_object_funcs i915_gem_object_funcs; + struct drm_i915_gem_object *i915_gem_object_alloc(void) { - return kmem_cache_zalloc(global.slab_objects, GFP_KERNEL); + struct drm_i915_gem_object *obj; + + obj = kmem_cache_zalloc(global.slab_objects, GFP_KERNEL); + if (!obj) + return NULL; + obj->base.funcs = _gem_object_funcs; + + return obj; } void i915_gem_object_free(struct drm_i915_gem_object *obj) @@ -101,7 +110,7 @@ void i915_gem_object_set_cache_coherency(struct drm_i915_gem_object *obj, !(obj->cache_coherent & I915_BO_CACHE_COHERENT_FOR_WRITE); } -void i915_gem_close_object(struct drm_gem_object *gem, struct drm_file *file) +static void i915_gem_close_object(struct drm_gem_object *gem, struct drm_file *file) { struct drm_i915_gem_object *obj = to_intel_bo(gem); struct drm_i915_file_private *fpriv = file->driver_priv; @@ -264,7 +273,7 @@ static void __i915_gem_free_work(struct work_struct *work) i915_gem_flush_free_objects(i915); } -void i915_gem_free_object(struct drm_gem_object *gem_obj) +static void i915_gem_free_object(struct drm_gem_object *gem_obj) { struct drm_i915_gem_object *obj = to_intel_bo(gem_obj); struct drm_i915_private *i915 = to_i915(obj->base.dev); @@ -403,6 +412,12 @@ int __init i915_global_objects_init(void) return 0; } +static const struct drm_gem_object_funcs i915_gem_object_funcs = { + .free = i915_gem_free_object, + .close = i915_gem_close_object, + .export = i915_gem_prime_export, +}; + #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST) #include "selftests/huge_gem_object.c" #include "selftests/huge_pages.c" diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h index d46db8d8f38e..eaf3d4147be0 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h @@ -38,9 +38,6 @@ void __i915_gem_object_release_shmem(struct drm_i915_gem_object *obj, int i915_gem_object_attach_phys(struct drm_i915_gem_object *obj, int align); -void i915_gem_close_object(struct drm_gem_object *gem, struct drm_file *file); -void i915_gem_free_object(struct drm_gem_object *obj); - void i915_gem_flush_free_objects(struct drm_i915_private *i915); struct sg_table * diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c index 94e00e450683..011a3fb41ece 100644 --- a/drivers/gpu/drm/i915/i915_drv.c +++ b/drivers/gpu/drm/i915/i915_drv.c @@ -1750,12 +1750,8 @@ static struct drm_driver driver = { .lastclose = i915_driver_lastclose, .postclose = i915_driver_postclose, - .gem_close_object = i915_gem_close_object, - .gem_free_object_unlocked = i915_gem_free_object, - .prime_handle_to_fd = drm_gem_prime_handle_to_fd, .prime_fd_to_handle = drm_gem_prime_fd_to_handle, - .gem_prime_export = i915_gem_prime_export, .gem_prime_import = i915_gem_prime_import, .dumb_create = i915_gem_dumb_create, diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c index f127e633f7ca..9244b5d6fb01 100644 --- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c +++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c @@ -87,9 +87,6 @@ static struct drm_driver mock_driver = { .name = "mock", .driver_features = DRIVER_GEM, .release = mock_device_release, - - .gem_close_object = i915_gem_close_object, - .gem_free_object_unlocked = i915_gem_free_object, }; static void release_dev(struct device *dev) Looks obviously fine. Reviewed-by: Tvrtko Ursulin Regards, Tvrtko