Re: [Freedreno] [PATCH RFC v1 00/52] drm/crtc: Rename struct drm_crtc::dev to drm_dev

2023-07-13 Thread Tvrtko Ursulin



On 13/07/2023 16:09, Thomas Zimmermann wrote:

Hi

On 13.07.23 at 16:41, Sean Paul wrote:

On Thu, Jul 13, 2023 at 9:04 AM Uwe Kleine-König wrote:


hello Sean,

On Wed, Jul 12, 2023 at 02:31:02PM -0400, Sean Paul wrote:

I'd really prefer this patch (series or single) is not accepted. This
will cause problems for everyone cherry-picking patches to a
downstream kernel (LTS or distro tree). I usually wouldn't expect
sympathy here, but the questionable benefit does not outweigh the cost
IM[biased]O.


I agree that for backports this isn't so nice. However, with the split
approach (that was argued against here) it's not so bad. Patch #1
(and similar changes for the other affected structures) could be
trivially backported, and with that it doesn't matter whether you write dev or
drm (or whatever name is chosen in the end); both work in the same way.
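
To illustrate what I mean - only a sketch of the transitional step, not
necessarily the exact shape of patch #1 - an anonymous union lets both
spellings alias the same pointer, so a backported patch builds whichever
name it uses:

struct drm_device;

struct drm_crtc {
	/* transitional: old and new name refer to the same pointer */
	union {
		struct drm_device *dev;
		struct drm_device *drm;
	};
	/* ... remaining members unchanged ... */
};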


Patch #1 avoids the need to backport the entire set; however, every
change occurring after the rename patches will cause conflicts on
future cherry-picks. Downstream kernels will have to backport the
whole set. Backporting the entire set will create an epoch in
downstream kernels where cherry-picking patches preceding this set
will need to undergo conflict resolution as well. As mentioned in my
previous email, I don't expect sympathy here, it's part of maintaining
a downstream kernel, but there is a real cost to kernel consumers.



But even with the one-patch-per-rename approach I'd consider the
renaming a net win, because ease of understanding the code has a big value.
Its value is not as easily measurable as "conflicts when backporting",
but it also matters in, say, two years from now, when backporting
shouldn't be an issue any more.


You've rightly identified the conjecture in your statement. I've been
on both sides of the argument, having written/maintained drm code
upstream and cherry-picked changes to a downstream kernel. Perhaps
it's because drm's definition of dev is ingrained in my muscle memory,
or maybe it's because I don't do a lot of upstream development these
days, but I just have a hard time seeing the benefit here.


I can only second what Sean writes. I've done quite a bit of backporting 
of DRM code. It's hard already. And this kind of change is going to 
affect almost every backported DRM patch in the coming years. Not just 
for distribution kernels, but also for upstream's stable series. It's 
really only possible to do this change over many releases while keeping 
compatibility with the old name. So the more I think about it, the less I 
like this change.


I've done my share of backporting, and still am doing it, so I can say I 
dislike it as much as anyone. However, is this an argument which the 
kernel as a wider entity typically accepts? If not, could it be a 
slippery slope to set a precedent?


It is an honest question - I am not familiar with whether there have been 
any similar discussions in the past.


My gut feeling is that *if* there is a consensus that something 
_improves_ the code base significantly, backporting pains should 
probably not be weighed very heavily as a counter-argument.


Regards,

Tvrtko



Re: [patch 14/30] drm/i915/pmu: Replace open coded kstat_irqs() copy

2020-12-11 Thread Tvrtko Ursulin



On 10/12/2020 19:25, Thomas Gleixner wrote:

Driver code has no business with the internals of the irq descriptor.

Aside of that the count is per interrupt line and therefore takes
interrupts from other devices into account which share the interrupt line
and are not handled by the graphics driver.

Replace it with a pmu private count which only counts interrupts which
originate from the graphics card.

To avoid atomics or heuristics of some sort make the counter field
'unsigned long'. That limits the count to 4e9 on 32bit which is a lot and
postprocessing can easily deal with the occasional wraparound.


After my failed hasty sketch from last night I had a different one which 
was kind of heuristics-based (re-reading the upper dword and retrying if 
it changed, on 32-bit). But you are right - it is okay to at least start 
like this today and, if there is a need later, we can either do that or 
deal with the wrap at PMU read time.
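
For reference, the read side of that sketch was roughly the classic 
re-read loop below. Untested, and the irq_count_lo/irq_count_hi split is 
hypothetical, purely for illustration - a proper seqcount would be the 
non-heuristic way to do it:

static u64 pmu_read_irq_count(struct i915_pmu *pmu)
{
	u32 hi, lo, tmp;

	/* Retry if the high dword changed while the low dword was being read. */
	do {
		hi = READ_ONCE(pmu->irq_count_hi);
		lo = READ_ONCE(pmu->irq_count_lo);
		tmp = READ_ONCE(pmu->irq_count_hi);
	} while (tmp != hi);

	return ((u64)hi << 32) | lo;
}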


So thanks for dealing with it, some small comments below but overall it 
is fine.



Signed-off-by: Thomas Gleixner 
Cc: Tvrtko Ursulin 
Cc: Jani Nikula 
Cc: Joonas Lahtinen 
Cc: Rodrigo Vivi 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: intel-...@lists.freedesktop.org
Cc: dri-de...@lists.freedesktop.org
---
  drivers/gpu/drm/i915/i915_irq.c |   34 ++
  drivers/gpu/drm/i915/i915_pmu.c |   18 +-
  drivers/gpu/drm/i915/i915_pmu.h |8 
  3 files changed, 43 insertions(+), 17 deletions(-)

--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -60,6 +60,24 @@
   * and related files, but that will be described in separate chapters.
   */
  
+/*

+ * Interrupt statistic for PMU. Increments the counter only if the
+ * interrupt originated from the GPU so interrupts from a device which
+ * shares the interrupt line are not accounted.
+ */
+static inline void pmu_irq_stats(struct drm_i915_private *priv,


We never use priv as a local name, it should be either i915 or dev_priv.


+irqreturn_t res)
+{
+   if (unlikely(res != IRQ_HANDLED))
+   return;
+
+   /*
+* A clever compiler translates that into INC. A not so clever one
+* should at least prevent store tearing.
+*/
+   WRITE_ONCE(priv->pmu.irq_count, priv->pmu.irq_count + 1);


Curious, probably more educational for me - given x86_32 and x86_64, and 
the context of it getting called, what is the difference from just doing 
irq_count++?
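
(To check my understanding of the write side: I assume the point is the 
lockless pairing with the PMU read path, roughly as in the sketch below, 
where WRITE_ONCE/READ_ONCE stop the compiler from tearing or caching the 
accesses - hence the question what it buys over a plain increment in the 
handler itself.)

	/* irq handler - single writer (sketch only) */
	WRITE_ONCE(pmu->irq_count, pmu->irq_count + 1);

	/* PMU ->read() - lockless reader, possibly on another CPU (sketch only) */
	count = READ_ONCE(pmu->irq_count);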



+}
+
  typedef bool (*long_pulse_detect_func)(enum hpd_pin pin, u32 val);
  
  static const u32 hpd_ilk[HPD_NUM_PINS] = {

@@ -1599,6 +1617,8 @@ static irqreturn_t valleyview_irq_handle
valleyview_pipestat_irq_handler(dev_priv, pipe_stats);
} while (0);
  
+	pmu_irq_stats(dev_priv, ret);

+
enable_rpm_wakeref_asserts(&dev_priv->runtime_pm);
  
  	return ret;

@@ -1676,6 +1696,8 @@ static irqreturn_t cherryview_irq_handle
valleyview_pipestat_irq_handler(dev_priv, pipe_stats);
} while (0);
  
+	pmu_irq_stats(dev_priv, ret);

+
enable_rpm_wakeref_asserts(&dev_priv->runtime_pm);
  
  	return ret;

@@ -2103,6 +2125,8 @@ static irqreturn_t ilk_irq_handler(int i
if (sde_ier)
raw_reg_write(regs, SDEIER, sde_ier);
  
+	pmu_irq_stats(i915, ret);

+
/* IRQs are synced during runtime_suspend, we don't require a wakeref */
enable_rpm_wakeref_asserts(&i915->runtime_pm);
  
@@ -2419,6 +2443,8 @@ static irqreturn_t gen8_irq_handler(int
  
  	gen8_master_intr_enable(regs);
  
+	pmu_irq_stats(dev_priv, IRQ_HANDLED);

+
return IRQ_HANDLED;
  }
  
@@ -2514,6 +2540,8 @@ static __always_inline irqreturn_t
  
  	gen11_gu_misc_irq_handler(gt, gu_misc_iir);
  
+	pmu_irq_stats(i915, IRQ_HANDLED);

+
return IRQ_HANDLED;
  }
  
@@ -3688,6 +3716,8 @@ static irqreturn_t i8xx_irq_handler(int

i8xx_pipestat_irq_handler(dev_priv, iir, pipe_stats);
} while (0);
  
+	pmu_irq_stats(dev_priv, ret);

+
enable_rpm_wakeref_asserts(&dev_priv->runtime_pm);
  
  	return ret;

@@ -3796,6 +3826,8 @@ static irqreturn_t i915_irq_handler(int
i915_pipestat_irq_handler(dev_priv, iir, pipe_stats);
} while (0);
  
+	pmu_irq_stats(dev_priv, ret);

+
enable_rpm_wakeref_asserts(&dev_priv->runtime_pm);
  
  	return ret;

@@ -3941,6 +3973,8 @@ static irqreturn_t i965_irq_handler(int
i965_pipestat_irq_handler(dev_priv, iir, pipe_stats);
} while (0);
  
+	pmu_irq_stats(dev_priv, IRQ_HANDLED);

+
enable_rpm_wakeref_asserts(&dev_priv->runtime_pm);
  
  	return ret;

--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -423,22 +423,6 @@ static enum hrtimer_restart i915_sample(
return HRTIMER_RESTART;
  }


In this file you can also drop the #include  line.

  
-static u64 count_interrupts(struct drm_i915_private *i915)

-{
-   /* open-coded kstat_irqs() */

Re: [PATCH 06/11] drm/i915: use vmap in shmem_pin_map

2020-09-25 Thread Tvrtko Ursulin



On 24/09/2020 14:58, Christoph Hellwig wrote:

shmem_pin_map somewhat awkwardly reimplements vmap using
alloc_vm_area and manual pte setup.  The only practical difference
is that alloc_vm_area prefaults the vmalloc area PTEs, which doesn't
seem to be required here (and could be added to vmap using a flag if
actually required).  Switch to use vmap, and use vfree to free both the
vmalloc mapping and the page array, as well as dropping the references
to each page.

Signed-off-by: Christoph Hellwig 
---
  drivers/gpu/drm/i915/gt/shmem_utils.c | 76 +++
  1 file changed, 18 insertions(+), 58 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/shmem_utils.c b/drivers/gpu/drm/i915/gt/shmem_utils.c
index 43c7acbdc79dea..f011ea42487e11 100644
--- a/drivers/gpu/drm/i915/gt/shmem_utils.c
+++ b/drivers/gpu/drm/i915/gt/shmem_utils.c
@@ -49,80 +49,40 @@ struct file *shmem_create_from_object(struct drm_i915_gem_object *obj)
return file;
  }
  
-static size_t shmem_npte(struct file *file)

-{
-   return file->f_mapping->host->i_size >> PAGE_SHIFT;
-}
-
-static void __shmem_unpin_map(struct file *file, void *ptr, size_t n_pte)
-{
-   unsigned long pfn;
-
-   vunmap(ptr);
-
-   for (pfn = 0; pfn < n_pte; pfn++) {
-   struct page *page;
-
-   page = shmem_read_mapping_page_gfp(file->f_mapping, pfn,
-  GFP_KERNEL);
-   if (!WARN_ON(IS_ERR(page))) {
-   put_page(page);
-   put_page(page);
-   }
-   }
-}
-
  void *shmem_pin_map(struct file *file)
  {
-   const size_t n_pte = shmem_npte(file);
-   pte_t *stack[32], **ptes, **mem;
-   struct vm_struct *area;
-   unsigned long pfn;
-
-   mem = stack;
-   if (n_pte > ARRAY_SIZE(stack)) {
-   mem = kvmalloc_array(n_pte, sizeof(*mem), GFP_KERNEL);
-   if (!mem)
-   return NULL;
-   }
+   struct page **pages;
+   size_t n_pages, i;
+   void *vaddr;
  
-	area = alloc_vm_area(n_pte << PAGE_SHIFT, mem);

-   if (!area) {
-   if (mem != stack)
-   kvfree(mem);
+   n_pages = file->f_mapping->host->i_size >> PAGE_SHIFT;
+   pages = kvmalloc_array(n_pages, sizeof(*pages), GFP_KERNEL);
+   if (!pages)
return NULL;
-   }
  
-	ptes = mem;

-   for (pfn = 0; pfn < n_pte; pfn++) {
-   struct page *page;
-
-   page = shmem_read_mapping_page_gfp(file->f_mapping, pfn,
-  GFP_KERNEL);
-   if (IS_ERR(page))
+   for (i = 0; i < n_pages; i++) {
+   pages[i] = shmem_read_mapping_page_gfp(file->f_mapping, i,
+  GFP_KERNEL);
+   if (IS_ERR(pages[i]))
goto err_page;
-
-   **ptes++ = mk_pte(page,  PAGE_KERNEL);
}
  
-	if (mem != stack)

-   kvfree(mem);
-
+   vaddr = vmap(pages, n_pages, VM_MAP_PUT_PAGES, PAGE_KERNEL);
+   if (!vaddr)
+   goto err_page;
mapping_set_unevictable(file->f_mapping);
-   return area->addr;
-
+   return vaddr;
  err_page:
-   if (mem != stack)
-   kvfree(mem);
-
-   __shmem_unpin_map(file, area->addr, pfn);
+   while (--i >= 0)
+   put_page(pages[i]);
+   kvfree(pages);
return NULL;
  }
  
  void shmem_unpin_map(struct file *file, void *ptr)

  {
mapping_clear_unevictable(file->f_mapping);
-   __shmem_unpin_map(file, ptr, shmem_npte(file));
+   vfree(ptr);
  }
  
  static int __shmem_rw(struct file *file, loff_t off,




Reviewed-by: Tvrtko Ursulin 

Regards,

Tvrtko



Re: [Intel-gfx] [PATCH 4/6] drm/i915: use vmap in i915_gem_object_map

2020-09-24 Thread Tvrtko Ursulin



On 23/09/2020 15:44, Christoph Hellwig wrote:

On Wed, Sep 23, 2020 at 02:58:43PM +0100, Tvrtko Ursulin wrote:

Series did not get a CI run from our side because of a different base so I
don't know if you would like to have a run there? If so you would need to
rebase against git://anongit.freedesktop.org/drm-tip drm-tip and you could
even send a series to intel-gfx-try...@lists.freedesktop.org, suppressing
cc, to check it out without sending a copy to the real mailing list.


It doesn't seem like I can post to any freedesktop list, as I always
get rejection messages. But I'll happily prepare a branch if one
of you can feed it into your CI.


That's fine, just ping me and I will forward it for testing, thanks!


 git://git.infradead.org/users/hch/misc.git i915-vmap-wip

Gitweb:

 
http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/i915-vmap-wip

note that this includes a new commit to clean up one of the recent
commits in the code.


CI says series looks good from the i915 perspective (*).

I don't know how you will handle it logistically, but when you have the 
final version I am happy to re-read and r-b the i915 patches.



Regards,

Tvrtko

*)
https://patchwork.freedesktop.org/series/82051/



Re: [Intel-gfx] [PATCH 4/6] drm/i915: use vmap in i915_gem_object_map

2020-09-23 Thread Tvrtko Ursulin



On 23/09/2020 14:41, Christoph Hellwig wrote:

On Wed, Sep 23, 2020 at 10:52:33AM +0100, Tvrtko Ursulin wrote:


On 18/09/2020 17:37, Christoph Hellwig wrote:

i915_gem_object_map implements fairly low-level vmap functionality in
a driver.  Split it into two helpers, one for remapping kernel memory
which can use vmap, and one for I/O memory that uses vmap_pfn.

The only practical difference is that alloc_vm_area prefaults the
vmalloc area PTEs, which doesn't seem to be required here for the
kernel memory case (and could be added to vmap using a flag if actually
required).


Patch looks good to me.

Series did not get a CI run from our side because of a different base so I
don't know if you would like to have a run there? If so you would need to
rebase against git://anongit.freedesktop.org/drm-tip drm-tip and you could
even send a series to intel-gfx-try...@lists.freedesktop.org, suppressing
cc, to check it out without sending a copy to the real mailing list.


It doesn't seem like I can post to any freedesktop list, as I always
get rejection messages. But I'll happily prepare a branch if one
of you can feed it into your CI.


That's fine, just ping me and I will forward it for testing, thanks!

Regards,

Tvrtko



Re: [Intel-gfx] [PATCH 4/6] drm/i915: use vmap in i915_gem_object_map

2020-09-23 Thread Tvrtko Ursulin



On 18/09/2020 17:37, Christoph Hellwig wrote:

i915_gem_object_map implements fairly low-level vmap functionality in
a driver.  Split it into two helpers, one for remapping kernel memory
which can use vmap, and one for I/O memory that uses vmap_pfn.

The only practical difference is that alloc_vm_area prefaults the
vmalloc area PTEs, which doesn't seem to be required here for the
kernel memory case (and could be added to vmap using a flag if actually
required).


Patch looks good to me.

Series did not get a CI run from our side because of a different base so 
I don't know if you would like to have a run there? If so you would need 
to rebase against git://anongit.freedesktop.org/drm-tip drm-tip and you 
could even send a series to intel-gfx-try...@lists.freedesktop.org, 
suppressing cc, to check it out without sending a copy to the real 
mailing list.


Regards,

Tvrtko


Signed-off-by: Christoph Hellwig 
---
  drivers/gpu/drm/i915/Kconfig  |   1 +
  drivers/gpu/drm/i915/gem/i915_gem_pages.c | 101 ++
  2 files changed, 47 insertions(+), 55 deletions(-)

diff --git a/drivers/gpu/drm/i915/Kconfig b/drivers/gpu/drm/i915/Kconfig
index 9afa5c4a6bf006..1e1cb245fca778 100644
--- a/drivers/gpu/drm/i915/Kconfig
+++ b/drivers/gpu/drm/i915/Kconfig
@@ -25,6 +25,7 @@ config DRM_I915
select CRC32
select SND_HDA_I915 if SND_HDA_CORE
select CEC_CORE if CEC_NOTIFIER
+   select VMAP_PFN
help
  Choose this option if you have a system that has "Intel Graphics
  Media Accelerator" or "HD Graphics" integrated graphics,
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
index e8a083743e0927..90029ea83aede9 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
@@ -234,50 +234,24 @@ int __i915_gem_object_put_pages(struct drm_i915_gem_object *obj)
return err;
  }
  
-static inline pte_t iomap_pte(resource_size_t base,

- dma_addr_t offset,
- pgprot_t prot)
-{
-   return pte_mkspecial(pfn_pte((base + offset) >> PAGE_SHIFT, prot));
-}
-
  /* The 'mapping' part of i915_gem_object_pin_map() below */
-static void *i915_gem_object_map(struct drm_i915_gem_object *obj,
+static void *i915_gem_object_map_page(struct drm_i915_gem_object *obj,
 enum i915_map_type type)
  {
-   unsigned long n_pte = obj->base.size >> PAGE_SHIFT;
-   struct sg_table *sgt = obj->mm.pages;
-   pte_t *stack[32], **mem;
-   struct vm_struct *area;
+   unsigned long n_pages = obj->base.size >> PAGE_SHIFT, i;
+   struct page *stack[32], **pages = stack, *page;
+   struct sgt_iter iter;
pgprot_t pgprot;
-
-   if (!i915_gem_object_has_struct_page(obj) && type != I915_MAP_WC)
-   return NULL;
-
-   /* A single page can always be kmapped */
-   if (n_pte == 1 && type == I915_MAP_WB)
-   return kmap(sg_page(sgt->sgl));
-
-   mem = stack;
-   if (n_pte > ARRAY_SIZE(stack)) {
-   /* Too big for stack -- allocate temporary array instead */
-   mem = kvmalloc_array(n_pte, sizeof(*mem), GFP_KERNEL);
-   if (!mem)
-   return NULL;
-   }
-
-   area = alloc_vm_area(obj->base.size, mem);
-   if (!area) {
-   if (mem != stack)
-   kvfree(mem);
-   return NULL;
-   }
+   void *vaddr;
  
  	switch (type) {

default:
MISSING_CASE(type);
fallthrough;/* to use PAGE_KERNEL anyway */
case I915_MAP_WB:
+   /* A single page can always be kmapped */
+   if (n_pages == 1)
+   return kmap(sg_page(obj->mm.pages->sgl));
pgprot = PAGE_KERNEL;
break;
case I915_MAP_WC:
@@ -285,30 +259,44 @@ static void *i915_gem_object_map(struct drm_i915_gem_object *obj,
break;
}
  
-	if (i915_gem_object_has_struct_page(obj)) {

-   struct sgt_iter iter;
-   struct page *page;
-   pte_t **ptes = mem;
-
-   for_each_sgt_page(page, iter, sgt)
-   **ptes++ = mk_pte(page, pgprot);
-   } else {
-   resource_size_t iomap;
-   struct sgt_iter iter;
-   pte_t **ptes = mem;
-   dma_addr_t addr;
+   if (n_pages > ARRAY_SIZE(stack)) {
+   /* Too big for stack -- allocate temporary array instead */
+   pages = kvmalloc_array(n_pages, sizeof(*pages), GFP_KERNEL);
+   if (!pages)
+   return NULL;
+   }
  
-		iomap = obj->mm.region->iomap.base;

-   iomap -= obj->mm.region->region.start;
+   for_each_sgt_page(page, iter, obj->mm.pages)
+   pages[i++] = page;
+   vaddr = 

Re: [Intel-gfx] [PATCH 3/6] drm/i915: use vmap in shmem_pin_map

2020-09-22 Thread Tvrtko Ursulin



On 22/09/2020 17:33, Christoph Hellwig wrote:

On Tue, Sep 22, 2020 at 05:13:45PM +0100, Tvrtko Ursulin wrote:

   void *shmem_pin_map(struct file *file)
   {
-   const size_t n_pte = shmem_npte(file);
-   pte_t *stack[32], **ptes, **mem;


Chris can comment how much he'd miss the 32 page stack shortcut.


I'd like to see a profile that claim that kmalloc matters in a
path that does a vmap and reads pages through the page cache.
Especially when the kmalloc saves doing another page cache lookup
on the free side.


The only reason I can come up with now is if the mapping side is on a 
latency-sensitive path, while un-mapping is lazy/delayed so it can be more 
costly. Then a fast map and extra cost on unmap may make sense.


It applies more to the other i915 patch, which implements a much more 
heavily used API, but whether or not we could demonstrate any difference in 
the perf profiles I couldn't tell you without trying to collect some.



Is there something in vmap() preventing us from freeing the pages array
here? I can't spot anything that is holding on to the pointer. Or it was
just a sketch before you realized we could walk the vm_area?

Also, I may be totally misunderstanding something, but I think you need to
assign area->pages manually so shmem_unpin_map can access it below.


We need area->pages to hold the pages for the free side.  That being
said the patch I posted is broken because it never assigned to that.
As said it was a sketch.  This is the patch I just rebooted into on
my Laptop:

http://git.infradead.org/users/hch/misc.git/commitdiff/048522dfa26b6667adfb0371ff530dc263abe829

it needs extra prep patches from the series:

http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/alloc_vm_area


mapping_clear_unevictable(file->f_mapping);
-   __shmem_unpin_map(file, ptr, shmem_npte(file));
+   for (i = 0; i < shmem_npages(file); i++)
+   put_page(area->pages[i]);
+   kvfree(area->pages);
+   vunmap(ptr);


Is the verdict from mm experts that we can't use vfree due __free_pages vs
put_page differences?


Switched to vfree now.


Could we get from ptes to pages, so that we don't have to keep the
area->pages array allocated for the duration of the pin?


We could do vmalloc_to_page, but that is fairly expensive (not as bad
as reading from the page cache..).  Are you really worried about the
allocation?


Not so much given how we don't even use shmem_pin_map outside selftests.

If we start using it I expect it will be for tiny objects anyway. Only 
if they end up being pinned for the lifetime of the driver might keeping 
the pages array around be a pointless waste of memory compared to the 
downsides of vmalloc_to_page. But we can revisit this particular 
edge-case optimization if the need arises.
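
For completeness, the vmalloc_to_page variant of the unmap side I had in 
mind would be roughly the below - an untested sketch only, recomputing the 
page count from the file size as the current code does:

void shmem_unpin_map(struct file *file, void *ptr)
{
	size_t i, n_pages = file->f_mapping->host->i_size >> PAGE_SHIFT;

	mapping_clear_unevictable(file->f_mapping);
	/* Recover each struct page from the vmap'ed range instead of a stored array. */
	for (i = 0; i < n_pages; i++)
		put_page(vmalloc_to_page(ptr + i * PAGE_SIZE));
	vunmap(ptr);
}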


I'll look at your other i915 patch tomorrow.

Regards,

Tvrtko



Re: [Intel-gfx] [PATCH 3/6] drm/i915: use vmap in shmem_pin_map

2020-09-22 Thread Tvrtko Ursulin



On 22/09/2020 15:31, Christoph Hellwig wrote:

On Tue, Sep 22, 2020 at 09:23:59AM +0100, Tvrtko Ursulin wrote:

If I understood this sub-thread correctly, iterating and freeing the pages
via the vmapped ptes - so there is no need for a
shmem_read_mapping_page_gfp loop in shmem_unpin_map - looks plausible to me.

I did not get the reference to kernel/dma/remap.c though,


What I mean is the code in dma_common_find_pages, which returns the
page array for freeing.


Got it.


and also not sure
how to do the error unwind path in shmem_pin_map at which point the
allocated vm area hasn't been fully populated yet. Hand-roll the loop
walking vm area struct in there?


Yes.  What I originally did (re-created as I didn't save it) would be
something like this:

---

From 5605e77cda246df6dd7ded99ec22cb3f341ef5d5 Mon Sep 17 00:00:00 2001

From: Christoph Hellwig 
Date: Wed, 16 Sep 2020 13:54:04 +0200
Subject: drm/i915: use vmap in shmem_pin_map

shmem_pin_map somewhat awkwardly reimplements vmap using
alloc_vm_area and manual pte setup.  The only practical difference
is that alloc_vm_area prefaults the vmalloc area PTEs, which doesn't
seem to be required here (and could be added to vmap using a flag
if actually required).

Signed-off-by: Christoph Hellwig 
---
  drivers/gpu/drm/i915/gt/shmem_utils.c | 81 +--
  1 file changed, 27 insertions(+), 54 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/shmem_utils.c b/drivers/gpu/drm/i915/gt/shmem_utils.c
index 43c7acbdc79dea..7ec6ba4c1065b2 100644
--- a/drivers/gpu/drm/i915/gt/shmem_utils.c
+++ b/drivers/gpu/drm/i915/gt/shmem_utils.c
@@ -49,80 +49,53 @@ struct file *shmem_create_from_object(struct drm_i915_gem_object *obj)
return file;
  }
  
-static size_t shmem_npte(struct file *file)

+static size_t shmem_npages(struct file *file)
  {
return file->f_mapping->host->i_size >> PAGE_SHIFT;
  }
  
-static void __shmem_unpin_map(struct file *file, void *ptr, size_t n_pte)

-{
-   unsigned long pfn;
-
-   vunmap(ptr);
-
-   for (pfn = 0; pfn < n_pte; pfn++) {
-   struct page *page;
-
-   page = shmem_read_mapping_page_gfp(file->f_mapping, pfn,
-  GFP_KERNEL);
-   if (!WARN_ON(IS_ERR(page))) {
-   put_page(page);
-   put_page(page);
-   }
-   }
-}
-
  void *shmem_pin_map(struct file *file)
  {
-   const size_t n_pte = shmem_npte(file);
-   pte_t *stack[32], **ptes, **mem;


Chris can comment how much he'd miss the 32 page stack shortcut.


-   struct vm_struct *area;
-   unsigned long pfn;
-
-   mem = stack;
-   if (n_pte > ARRAY_SIZE(stack)) {
-   mem = kvmalloc_array(n_pte, sizeof(*mem), GFP_KERNEL);
-   if (!mem)
-   return NULL;
-   }
+   size_t n_pages = shmem_npages(file), i;
+   struct page **pages;
+   void *vaddr;
  
-	area = alloc_vm_area(n_pte << PAGE_SHIFT, mem);

-   if (!area) {
-   if (mem != stack)
-   kvfree(mem);
+   pages = kvmalloc_array(n_pages, sizeof(*pages), GFP_KERNEL);
+   if (!pages)
return NULL;
-   }
-
-   ptes = mem;
-   for (pfn = 0; pfn < n_pte; pfn++) {
-   struct page *page;
  
-		page = shmem_read_mapping_page_gfp(file->f_mapping, pfn,

-  GFP_KERNEL);
-   if (IS_ERR(page))
+   for (i = 0; i < n_pages; i++) {
+   pages[i] = shmem_read_mapping_page_gfp(file->f_mapping, i,
+  GFP_KERNEL);
+   if (IS_ERR(pages[i]))
goto err_page;
-
-   **ptes++ = mk_pte(page,  PAGE_KERNEL);
}
  
-	if (mem != stack)

-   kvfree(mem);
-
+   vaddr = vmap(pages, n_pages, 0, PAGE_KERNEL);
+   if (!vaddr)
+   goto err_page;
mapping_set_unevictable(file->f_mapping);
-   return area->addr;
-
+   return vaddr;


Is there something in vmap() preventing us from freeing the pages array 
here? I can't spot anything that is holding on to the pointer. Or it was 
just a sketch before you realized we could walk the vm_area?


Also, I may be totally misunderstanding something, but I think you need 
to assign area->pages manually so shmem_unpin_map can access it below.



  err_page:
-   if (mem != stack)
-   kvfree(mem);
-
-   __shmem_unpin_map(file, area->addr, pfn);
+   while (--i >= 0)
+   put_page(pages[i]);
+   kvfree(pages);
return NULL;
  }
  
  void shmem_unpin_map(struct file *file, void *ptr)

  {
+   struct vm_struct *area = find_vm_area(ptr);
+   size_t i = shmem_npages(file);
+
+   if (WARN_ON_ONCE(!area || !area->pages))
+   return;
+
mapping_clear_unevi

Re: [Intel-gfx] [PATCH 3/6] drm/i915: use vmap in shmem_pin_map

2020-09-22 Thread Tvrtko Ursulin



On 22/09/2020 07:22, Christoph Hellwig wrote:

On Mon, Sep 21, 2020 at 08:11:57PM +0100, Matthew Wilcox wrote:

This is awkward.  I'd like it if we had a vfree() variant which called
put_page() instead of __free_pages().  I'd like it even more if we
used release_pages() instead of our own loop that called put_page().


Note that we don't need a new vfree variant, we can do this manually if
we want, take a look at kernel/dma/remap.c.  But I thought this code
intentionally doesn't want to do that to avoid locking in the memory
for the pages array.  Maybe the i915 maintainers can clarify.


+ Chris & Matt who were involved with this part of i915.

If I understood this sub-thread correctly, iterating and freeing the 
pages via the vmapped ptes - so there is no need for a 
shmem_read_mapping_page_gfp loop in shmem_unpin_map - looks plausible to me.

I did not get the reference to kernel/dma/remap.c though, and also not 
sure how to do the error unwind path in shmem_pin_map at which point the 
allocated vm area hasn't been fully populated yet. Hand-roll the loop 
walking vm area struct in there?


Regards,

Tvrtko



Re: [PATCH v2 06/21] drm/i915: Introduce GEM object functions

2020-09-15 Thread Tvrtko Ursulin



On 15/09/2020 15:59, Thomas Zimmermann wrote:

GEM object functions deprecate several similar callback interfaces in
struct drm_driver. This patch replaces the per-driver callbacks with
per-instance callbacks in i915.

v2:
* move object-function instance to i915_gem_object.c (Jani)

Signed-off-by: Thomas Zimmermann 
---
  drivers/gpu/drm/i915/gem/i915_gem_object.c| 21 ---
  drivers/gpu/drm/i915/gem/i915_gem_object.h|  3 ---
  drivers/gpu/drm/i915/i915_drv.c   |  4 
  .../gpu/drm/i915/selftests/mock_gem_device.c  |  3 ---
  4 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
index c8421fd9d2dc..3389ac972d16 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
@@ -39,9 +39,18 @@ static struct i915_global_object {
struct kmem_cache *slab_objects;
  } global;
  
+static const struct drm_gem_object_funcs i915_gem_object_funcs;

+
  struct drm_i915_gem_object *i915_gem_object_alloc(void)
  {
-   return kmem_cache_zalloc(global.slab_objects, GFP_KERNEL);
+   struct drm_i915_gem_object *obj;
+
+   obj = kmem_cache_zalloc(global.slab_objects, GFP_KERNEL);
+   if (!obj)
+   return NULL;
+   obj->base.funcs = &i915_gem_object_funcs;
+
+   return obj;
  }
  
  void i915_gem_object_free(struct drm_i915_gem_object *obj)

@@ -101,7 +110,7 @@ void i915_gem_object_set_cache_coherency(struct drm_i915_gem_object *obj,
!(obj->cache_coherent & I915_BO_CACHE_COHERENT_FOR_WRITE);
  }
  
-void i915_gem_close_object(struct drm_gem_object *gem, struct drm_file *file)

+static void i915_gem_close_object(struct drm_gem_object *gem, struct drm_file *file)
  {
struct drm_i915_gem_object *obj = to_intel_bo(gem);
struct drm_i915_file_private *fpriv = file->driver_priv;
@@ -264,7 +273,7 @@ static void __i915_gem_free_work(struct work_struct *work)
i915_gem_flush_free_objects(i915);
  }
  
-void i915_gem_free_object(struct drm_gem_object *gem_obj)

+static void i915_gem_free_object(struct drm_gem_object *gem_obj)
  {
struct drm_i915_gem_object *obj = to_intel_bo(gem_obj);
struct drm_i915_private *i915 = to_i915(obj->base.dev);
@@ -403,6 +412,12 @@ int __init i915_global_objects_init(void)
return 0;
  }
  
+static const struct drm_gem_object_funcs i915_gem_object_funcs = {

+   .free = i915_gem_free_object,
+   .close = i915_gem_close_object,
+   .export = i915_gem_prime_export,
+};
+
  #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
  #include "selftests/huge_gem_object.c"
  #include "selftests/huge_pages.c"
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h
index d46db8d8f38e..eaf3d4147be0 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
@@ -38,9 +38,6 @@ void __i915_gem_object_release_shmem(struct drm_i915_gem_object *obj,
  
  int i915_gem_object_attach_phys(struct drm_i915_gem_object *obj, int align);
  
-void i915_gem_close_object(struct drm_gem_object *gem, struct drm_file *file);

-void i915_gem_free_object(struct drm_gem_object *obj);
-
  void i915_gem_flush_free_objects(struct drm_i915_private *i915);
  
  struct sg_table *

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 94e00e450683..011a3fb41ece 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -1750,12 +1750,8 @@ static struct drm_driver driver = {
.lastclose = i915_driver_lastclose,
.postclose = i915_driver_postclose,
  
-	.gem_close_object = i915_gem_close_object,

-   .gem_free_object_unlocked = i915_gem_free_object,
-
.prime_handle_to_fd = drm_gem_prime_handle_to_fd,
.prime_fd_to_handle = drm_gem_prime_fd_to_handle,
-   .gem_prime_export = i915_gem_prime_export,
.gem_prime_import = i915_gem_prime_import,
  
  	.dumb_create = i915_gem_dumb_create,

diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
index f127e633f7ca..9244b5d6fb01 100644
--- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
+++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
@@ -87,9 +87,6 @@ static struct drm_driver mock_driver = {
.name = "mock",
.driver_features = DRIVER_GEM,
.release = mock_device_release,
-
-   .gem_close_object = i915_gem_close_object,
-   .gem_free_object_unlocked = i915_gem_free_object,
  };
  
  static void release_dev(struct device *dev)




Looks obviously fine.

Reviewed-by: Tvrtko Ursulin 

Regards,

Tvrtko