[PATCH 3/3] mm, vmscan: Prevent kswapd sleeping prematurely due to mismatched classzone_idx

2017-03-08 Thread Mel Gorman
kswapd is woken to reclaim a node based on a failed allocation request
from any eligible zone. Once reclaiming in balance_pgdat(), it will
continue reclaiming until there is an eligible zone available for the
zone it was woken for. kswapd tracks what zone it was recently woken for
in pgdat->kswapd_classzone_idx. If it has not been woken recently, this
zone will be 0.

However, the decision on whether to sleep is made on kswapd_classzone_idx
which is 0 without a recent wakeup request and that classzone does not
account for lowmem reserves.  This allows kswapd to sleep when a low
small zone such as ZONE_DMA is balanced for a GFP_DMA request even if
a stream of allocations cannot use that zone. While kswapd may be woken
again shortly in the near future there are two consequences -- the pgdat
bits that control congestion are cleared prematurely and direct reclaim
is more likely as kswapd slept prematurely.

This patch flips kswapd_classzone_idx to default to MAX_NR_ZONES (an invalid
index) when there has been no recent wakeups. If there are no wakeups,
it'll decide whether to sleep based on the highest possible zone available
(MAX_NR_ZONES - 1). It then becomes critical that the "pgdat balanced"
decisions during reclaim and when deciding to sleep are the same. If there is
a mismatch, kswapd can stay awake continually trying to balance tiny zones.

simoop was used to evaluate it again. Two of the preparation patches
regressed the workload so they are included as the second set of
results. Otherwise this patch looks artifically excellent

 4.11.0-rc14.11.0-rc1   
 4.11.0-rc1
vanilla  clear-v2   
   keepawake-v2
Ameanp50-Read 21670074.18 (  0.00%) 19786774.76 (  8.69%) 
22668332.52 ( -4.61%)
Ameanp95-Read 25456267.64 (  0.00%) 24101956.27 (  5.32%) 
26738688.00 ( -5.04%)
Ameanp99-Read 29369064.73 (  0.00%) 27691872.71 (  5.71%) 
30991404.52 ( -5.52%)
Ameanp50-Write1390.30 (  0.00%) 1011.91 ( 27.22%)  
924.91 ( 33.47%)
Ameanp95-Write  412901.57 (  0.00%)34874.98 ( 91.55%) 
1362.62 ( 99.67%)
Ameanp99-Write 6668722.09 (  0.00%)   575449.60 ( 91.37%)
16854.04 ( 99.75%)
Ameanp50-Allocation  78714.31 (  0.00%)84246.26 ( -7.03%)
74729.74 (  5.06%)
Ameanp95-Allocation 175533.51 (  0.00%)   400058.43 (-127.91%)   
101609.74 ( 42.11%)
Ameanp99-Allocation 247003.02 (  0.00%) 10905600.00 (-4315.17%)   
125765.57 ( 49.08%)

With this patch on top, write and allocation latencies are massively
improved.  The read latencies are slightly impaired but it's worth noting
that this is mostly due to the IO scheduler and not directly related to
reclaim. The vmstats are a bit of a mix but the relevant ones are as follows;

4.10.0-rc7  4.10.0-rc7  4.10.0-rc7
  mmots-20170209 clear-v1r25keepawake-v1r25
Swap Ins 0   0   0
Swap Outs0 608   0
Direct pages scanned   6910672 3132699 6357298
Kswapd pages scanned  570369468248866556986286
Kswapd pages reclaimed559934886347432955939113
Direct pages reclaimed 6905990 2964843 6352115
Kswapd efficiency  98% 76% 98%
Kswapd velocity  12494.375   17597.507   12488.065
Direct efficiency  99% 94% 99%
Direct velocity   1513.835 668.3061393.148
Page writes by reclaim   0.000 4410243.000   0.000
Page writes file 0 4409635   0
Page writes anon 0 608   0
Page reclaim immediate 103679214175203 1042571

4.11.0-rc1  4.11.0-rc1  4.11.0-rc1
   vanilla  clear-v2  keepawake-v2
Swap Ins 0  12   0
Swap Outs0 838   0
Direct pages scanned   6579706 3237270 6256811
Kswapd pages scanned  618537027996148654837791
Kswapd pages reclaimed607687646075578853849586
Direct pages reclaimed 6579055 2987453 6256151
Kswapd efficiency  98% 75% 98%
Page writes by reclaim   0.000 4389496.000   0.000
Page writes file 0 4388658   0
Page writes anon 0 838   0
Page reclaim immediate 107357314473009  982507

Swap-outs are equivalent to baseline.
Direct reclaim is reduced but not eliminated. It's worth noting
that there are two periods of direct reclaim for this workload. The
first is when it switches from preparing the files for 

[PATCH 3/3] mm, vmscan: Prevent kswapd sleeping prematurely due to mismatched classzone_idx

2017-03-08 Thread Mel Gorman
kswapd is woken to reclaim a node based on a failed allocation request
from any eligible zone. Once reclaiming in balance_pgdat(), it will
continue reclaiming until there is an eligible zone available for the
zone it was woken for. kswapd tracks what zone it was recently woken for
in pgdat->kswapd_classzone_idx. If it has not been woken recently, this
zone will be 0.

However, the decision on whether to sleep is made on kswapd_classzone_idx
which is 0 without a recent wakeup request and that classzone does not
account for lowmem reserves.  This allows kswapd to sleep when a low
small zone such as ZONE_DMA is balanced for a GFP_DMA request even if
a stream of allocations cannot use that zone. While kswapd may be woken
again shortly in the near future there are two consequences -- the pgdat
bits that control congestion are cleared prematurely and direct reclaim
is more likely as kswapd slept prematurely.

This patch flips kswapd_classzone_idx to default to MAX_NR_ZONES (an invalid
index) when there has been no recent wakeups. If there are no wakeups,
it'll decide whether to sleep based on the highest possible zone available
(MAX_NR_ZONES - 1). It then becomes critical that the "pgdat balanced"
decisions during reclaim and when deciding to sleep are the same. If there is
a mismatch, kswapd can stay awake continually trying to balance tiny zones.

simoop was used to evaluate it again. Two of the preparation patches
regressed the workload so they are included as the second set of
results. Otherwise this patch looks artifically excellent

 4.11.0-rc14.11.0-rc1   
 4.11.0-rc1
vanilla  clear-v2   
   keepawake-v2
Ameanp50-Read 21670074.18 (  0.00%) 19786774.76 (  8.69%) 
22668332.52 ( -4.61%)
Ameanp95-Read 25456267.64 (  0.00%) 24101956.27 (  5.32%) 
26738688.00 ( -5.04%)
Ameanp99-Read 29369064.73 (  0.00%) 27691872.71 (  5.71%) 
30991404.52 ( -5.52%)
Ameanp50-Write1390.30 (  0.00%) 1011.91 ( 27.22%)  
924.91 ( 33.47%)
Ameanp95-Write  412901.57 (  0.00%)34874.98 ( 91.55%) 
1362.62 ( 99.67%)
Ameanp99-Write 6668722.09 (  0.00%)   575449.60 ( 91.37%)
16854.04 ( 99.75%)
Ameanp50-Allocation  78714.31 (  0.00%)84246.26 ( -7.03%)
74729.74 (  5.06%)
Ameanp95-Allocation 175533.51 (  0.00%)   400058.43 (-127.91%)   
101609.74 ( 42.11%)
Ameanp99-Allocation 247003.02 (  0.00%) 10905600.00 (-4315.17%)   
125765.57 ( 49.08%)

With this patch on top, write and allocation latencies are massively
improved.  The read latencies are slightly impaired but it's worth noting
that this is mostly due to the IO scheduler and not directly related to
reclaim. The vmstats are a bit of a mix but the relevant ones are as follows;

4.10.0-rc7  4.10.0-rc7  4.10.0-rc7
  mmots-20170209 clear-v1r25keepawake-v1r25
Swap Ins 0   0   0
Swap Outs0 608   0
Direct pages scanned   6910672 3132699 6357298
Kswapd pages scanned  570369468248866556986286
Kswapd pages reclaimed559934886347432955939113
Direct pages reclaimed 6905990 2964843 6352115
Kswapd efficiency  98% 76% 98%
Kswapd velocity  12494.375   17597.507   12488.065
Direct efficiency  99% 94% 99%
Direct velocity   1513.835 668.3061393.148
Page writes by reclaim   0.000 4410243.000   0.000
Page writes file 0 4409635   0
Page writes anon 0 608   0
Page reclaim immediate 103679214175203 1042571

4.11.0-rc1  4.11.0-rc1  4.11.0-rc1
   vanilla  clear-v2  keepawake-v2
Swap Ins 0  12   0
Swap Outs0 838   0
Direct pages scanned   6579706 3237270 6256811
Kswapd pages scanned  618537027996148654837791
Kswapd pages reclaimed607687646075578853849586
Direct pages reclaimed 6579055 2987453 6256151
Kswapd efficiency  98% 75% 98%
Page writes by reclaim   0.000 4389496.000   0.000
Page writes file 0 4388658   0
Page writes anon 0 838   0
Page reclaim immediate 107357314473009  982507

Swap-outs are equivalent to baseline.
Direct reclaim is reduced but not eliminated. It's worth noting
that there are two periods of direct reclaim for this workload. The
first is when it switches from preparing the files for 

[PATCH 0/3] Reduce amount of time kswapd sleeps prematurely v2

2017-03-08 Thread Mel Gorman
Changelog since v1
o Rebase to 4.11-rc1
o Add small clarifying comment based on review

The series is unusual in that the first patch fixes one problem and
introduces of other issues that are noted in the changelog. Patch 2 makes
a minor modification that is worth considering on its own but leaves the
kernel in a state where it behaves badly. It's not until patch 3 that
there is an improvement against baseline.

This was mostly motivated by examining Chris Mason's "simoop" benchmark
which puts the VM under similar pressure to HADOOP. It has been reported
that the benchmark has regressed severely during the last number of
releases. While I cannot reproduce all the same problems Chris experienced
due to hardware limitations, there was a number of problems on a 2-socket
machine with a single disk.

simoop latencies
 4.11.0-rc14.11.0-rc1
vanilla  keepawake-v2
Ameanp50-Read 21670074.18 (  0.00%) 22668332.52 ( -4.61%)
Ameanp95-Read 25456267.64 (  0.00%) 26738688.00 ( -5.04%)
Ameanp99-Read 29369064.73 (  0.00%) 30991404.52 ( -5.52%)
Ameanp50-Write1390.30 (  0.00%)  924.91 ( 33.47%)
Ameanp95-Write  412901.57 (  0.00%) 1362.62 ( 99.67%)
Ameanp99-Write 6668722.09 (  0.00%)16854.04 ( 99.75%)
Ameanp50-Allocation  78714.31 (  0.00%)74729.74 (  5.06%)
Ameanp95-Allocation 175533.51 (  0.00%)   101609.74 ( 42.11%)
Ameanp99-Allocation 247003.02 (  0.00%)   125765.57 ( 49.08%)

These are latencies. Read/write are threads reading fixed-size random blocks
from a simulated database. The allocation latency is mmaping and faulting
regions of memory. The p50, 95 and p99 reports the worst latencies for 50%
of the samples, 95% and 99% respectively.

For example, the report indicates that while the test was running 99% of
writes completed 99.75% faster. It's worth noting that on a UMA machine that
no difference in performance with simoop was observed so milage will vary.

It's noted that there is a slight impact to read latencies but it's mostly
due to IO scheduler decisions and offset by the large reduction in other
latencies.

 mm/memory_hotplug.c |   2 +-
 mm/vmscan.c | 136 ++--
 2 files changed, 79 insertions(+), 59 deletions(-)

-- 
2.11.0



[PATCH 0/3] Reduce amount of time kswapd sleeps prematurely v2

2017-03-08 Thread Mel Gorman
Changelog since v1
o Rebase to 4.11-rc1
o Add small clarifying comment based on review

The series is unusual in that the first patch fixes one problem and
introduces of other issues that are noted in the changelog. Patch 2 makes
a minor modification that is worth considering on its own but leaves the
kernel in a state where it behaves badly. It's not until patch 3 that
there is an improvement against baseline.

This was mostly motivated by examining Chris Mason's "simoop" benchmark
which puts the VM under similar pressure to HADOOP. It has been reported
that the benchmark has regressed severely during the last number of
releases. While I cannot reproduce all the same problems Chris experienced
due to hardware limitations, there was a number of problems on a 2-socket
machine with a single disk.

simoop latencies
 4.11.0-rc14.11.0-rc1
vanilla  keepawake-v2
Ameanp50-Read 21670074.18 (  0.00%) 22668332.52 ( -4.61%)
Ameanp95-Read 25456267.64 (  0.00%) 26738688.00 ( -5.04%)
Ameanp99-Read 29369064.73 (  0.00%) 30991404.52 ( -5.52%)
Ameanp50-Write1390.30 (  0.00%)  924.91 ( 33.47%)
Ameanp95-Write  412901.57 (  0.00%) 1362.62 ( 99.67%)
Ameanp99-Write 6668722.09 (  0.00%)16854.04 ( 99.75%)
Ameanp50-Allocation  78714.31 (  0.00%)74729.74 (  5.06%)
Ameanp95-Allocation 175533.51 (  0.00%)   101609.74 ( 42.11%)
Ameanp99-Allocation 247003.02 (  0.00%)   125765.57 ( 49.08%)

These are latencies. Read/write are threads reading fixed-size random blocks
from a simulated database. The allocation latency is mmaping and faulting
regions of memory. The p50, 95 and p99 reports the worst latencies for 50%
of the samples, 95% and 99% respectively.

For example, the report indicates that while the test was running 99% of
writes completed 99.75% faster. It's worth noting that on a UMA machine that
no difference in performance with simoop was observed so milage will vary.

It's noted that there is a slight impact to read latencies but it's mostly
due to IO scheduler decisions and offset by the large reduction in other
latencies.

 mm/memory_hotplug.c |   2 +-
 mm/vmscan.c | 136 ++--
 2 files changed, 79 insertions(+), 59 deletions(-)

-- 
2.11.0



[PATCH 1/3] mm, vmscan: fix zone balance check in prepare_kswapd_sleep

2017-03-08 Thread Mel Gorman
From: Shantanu Goel 

The check in prepare_kswapd_sleep needs to match the one in balance_pgdat
since the latter will return as soon as any one of the zones in the
classzone is above the watermark.  This is specially important for higher
order allocations since balance_pgdat will typically reset the order to
zero relying on compaction to create the higher order pages.  Without this
patch, prepare_kswapd_sleep fails to wake up kcompactd since the zone
balance check fails.

It was first reported against 4.9.7 that kswapd is failing to wake up
kcompactd due to a mismatch in the zone balance check between balance_pgdat()
and prepare_kswapd_sleep().  balance_pgdat() returns as soon as a single
zone satisfies the allocation but prepare_kswapd_sleep() requires all zones
to do +the same.  This causes prepare_kswapd_sleep() to never succeed except
in the order == 0 case and consequently, wakeup_kcompactd() is never called.
For the machine that originally motivated this patch, the state of compaction
from /proc/vmstat looked this way after a day and a half +of uptime:

compact_migrate_scanned 240496
compact_free_scanned 76238632
compact_isolated 123472
compact_stall 1791
compact_fail 29
compact_success 1762
compact_daemon_wake 0

After applying the patch and about 10 hours of uptime the state looks
like this:

compact_migrate_scanned 59927299
compact_free_scanned 2021075136
compact_isolated 640926
compact_stall 4
compact_fail 2
compact_success 2
compact_daemon_wake 5160

Further notes from Mel that motivated him to pick this patch up and
resend it;

It was observed for the simoop workload (pressures the VM similar to HADOOP)
that kswapd was failing to keep ahead of direct reclaim. The investigation
noted that there was a need to rationalise kswapd decisions to reclaim
with kswapd decisions to sleep. With this patch on a 2-socket box, there
was a 49% reduction in direct reclaim scanning.

However, the impact otherwise is extremely negative. Kswapd reclaim
efficiency dropped from 98% to 76%. simoop has three latency-related
metrics for read, write and allocation (an anonymous mmap and fault).

 4.11.0-rc14.11.0-rc1
vanilla   fixcheck-v2
Ameanp50-Read 21670074.18 (  0.00%) 20464344.18 (  5.56%)
Ameanp95-Read 25456267.64 (  0.00%) 25721423.64 ( -1.04%)
Ameanp99-Read 29369064.73 (  0.00%) 30174230.76 ( -2.74%)
Ameanp50-Write1390.30 (  0.00%) 1395.28 ( -0.36%)
Ameanp95-Write  412901.57 (  0.00%)37737.74 ( 90.86%)
Ameanp99-Write 6668722.09 (  0.00%)   666489.04 ( 90.01%)
Ameanp50-Allocation  78714.31 (  0.00%)86286.22 ( -9.62%)
Ameanp95-Allocation 175533.51 (  0.00%)   351812.27 (-100.42%)
Ameanp99-Allocation 247003.02 (  0.00%)  6291171.56 (-2447.00%)

Of greater concern is that the patch causes swapping and page writes
from kswapd context rose from 0 pages to 4189753 pages during the hour
the workload ran for. By and large, the patch has very bad behaviour but
easily missed as the impact on a UMA machine is negligible.

This patch is included with the data in case a bisection leads to this area.
This patch is also a pre-requisite for the rest of the series.

Signed-off-by: Shantanu Goel 
Signed-off-by: Mel Gorman 
Acked-by: Hillf Danton 
---
 mm/vmscan.c | 14 ++
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index bc8031ef994d..4ea444142c2e 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3134,11 +3134,11 @@ static bool prepare_kswapd_sleep(pg_data_t *pgdat, int 
order, int classzone_idx)
if (!managed_zone(zone))
continue;
 
-   if (!zone_balanced(zone, order, classzone_idx))
-   return false;
+   if (zone_balanced(zone, order, classzone_idx))
+   return true;
}
 
-   return true;
+   return false;
 }
 
 /*
@@ -3331,7 +3331,13 @@ static void kswapd_try_to_sleep(pg_data_t *pgdat, int 
alloc_order, int reclaim_o
 
prepare_to_wait(>kswapd_wait, , TASK_INTERRUPTIBLE);
 
-   /* Try to sleep for a short interval */
+   /*
+* Try to sleep for a short interval. Note that kcompactd will only be
+* woken if it is possible to sleep for a short interval. This is
+* deliberate on the assumption that if reclaim cannot keep an
+* eligible zone balanced that it's also unlikely that compaction will
+* succeed.
+*/
if (prepare_kswapd_sleep(pgdat, reclaim_order, classzone_idx)) {
/*
 * Compaction records what page blocks it recently failed to
-- 
2.11.0



[PATCH 2/3] mm, vmscan: Only clear pgdat congested/dirty/writeback state when balanced

2017-03-08 Thread Mel Gorman
A pgdat tracks if recent reclaim encountered too many dirty, writeback
or congested pages. The flags control whether kswapd writes pages back
from reclaim context, tags pages for immediate reclaim when IO completes,
whether processes block on wait_iff_congested and whether kswapd blocks
when too many pages marked for immediate reclaim are encountered.

The state is cleared in a check function with side-effects. With the patch
"mm, vmscan: fix zone balance check in prepare_kswapd_sleep", the timing
of when the bits get cleared changed. Due to the way the check works,
it'll clear the bits if ZONE_DMA is balanced for a GFP_DMA allocation
because it does not account for lowmem reserves properly.

For the simoop workload, kswapd is not stalling when it should due to
the premature clearing, writing pages from reclaim context like crazy and
generally being unhelpful.

This patch resets the pgdat bits related to page reclaim only when kswapd
is going to sleep. The comparison with simoop is then

 4.11.0-rc14.11.0-rc1   
 4.11.0-rc1
vanilla   fixcheck-v2   
   clear-v2
Ameanp50-Read 21670074.18 (  0.00%) 20464344.18 (  5.56%) 
19786774.76 (  8.69%)
Ameanp95-Read 25456267.64 (  0.00%) 25721423.64 ( -1.04%) 
24101956.27 (  5.32%)
Ameanp99-Read 29369064.73 (  0.00%) 30174230.76 ( -2.74%) 
27691872.71 (  5.71%)
Ameanp50-Write1390.30 (  0.00%) 1395.28 ( -0.36%) 
1011.91 ( 27.22%)
Ameanp95-Write  412901.57 (  0.00%)37737.74 ( 90.86%)
34874.98 ( 91.55%)
Ameanp99-Write 6668722.09 (  0.00%)   666489.04 ( 90.01%)   
575449.60 ( 91.37%)
Ameanp50-Allocation  78714.31 (  0.00%)86286.22 ( -9.62%)
84246.26 ( -7.03%)
Ameanp95-Allocation 175533.51 (  0.00%)   351812.27 (-100.42%)   
400058.43 (-127.91%)
Ameanp99-Allocation 247003.02 (  0.00%)  6291171.56 (-2447.00%) 
10905600.00 (-4315.17%)

Read latency is improved, write latency is mostly improved but allocation
latency is regressed.  kswapd is still reclaiming inefficiently,
pages are being written back from writeback context and a host of other
issues. However, given the change, it needed to be spelled out why the
side-effect was moved.

Signed-off-by: Mel Gorman 
---
 mm/vmscan.c | 20 +++-
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 4ea444142c2e..17b1afbce88e 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3091,17 +3091,17 @@ static bool zone_balanced(struct zone *zone, int order, 
int classzone_idx)
if (!zone_watermark_ok_safe(zone, order, mark, classzone_idx))
return false;
 
-   /*
-* If any eligible zone is balanced then the node is not considered
-* to be congested or dirty
-*/
-   clear_bit(PGDAT_CONGESTED, >zone_pgdat->flags);
-   clear_bit(PGDAT_DIRTY, >zone_pgdat->flags);
-   clear_bit(PGDAT_WRITEBACK, >zone_pgdat->flags);
-
return true;
 }
 
+/* Clear pgdat state for congested, dirty or under writeback. */
+static void clear_pgdat_congested(pg_data_t *pgdat)
+{
+   clear_bit(PGDAT_CONGESTED, >flags);
+   clear_bit(PGDAT_DIRTY, >flags);
+   clear_bit(PGDAT_WRITEBACK, >flags);
+}
+
 /*
  * Prepare kswapd for sleeping. This verifies that there are no processes
  * waiting in throttle_direct_reclaim() and that watermarks have been met.
@@ -3134,8 +3134,10 @@ static bool prepare_kswapd_sleep(pg_data_t *pgdat, int 
order, int classzone_idx)
if (!managed_zone(zone))
continue;
 
-   if (zone_balanced(zone, order, classzone_idx))
+   if (zone_balanced(zone, order, classzone_idx)) {
+   clear_pgdat_congested(pgdat);
return true;
+   }
}
 
return false;
-- 
2.11.0



[PATCH 1/3] mm, vmscan: fix zone balance check in prepare_kswapd_sleep

2017-03-08 Thread Mel Gorman
From: Shantanu Goel 

The check in prepare_kswapd_sleep needs to match the one in balance_pgdat
since the latter will return as soon as any one of the zones in the
classzone is above the watermark.  This is specially important for higher
order allocations since balance_pgdat will typically reset the order to
zero relying on compaction to create the higher order pages.  Without this
patch, prepare_kswapd_sleep fails to wake up kcompactd since the zone
balance check fails.

It was first reported against 4.9.7 that kswapd is failing to wake up
kcompactd due to a mismatch in the zone balance check between balance_pgdat()
and prepare_kswapd_sleep().  balance_pgdat() returns as soon as a single
zone satisfies the allocation but prepare_kswapd_sleep() requires all zones
to do +the same.  This causes prepare_kswapd_sleep() to never succeed except
in the order == 0 case and consequently, wakeup_kcompactd() is never called.
For the machine that originally motivated this patch, the state of compaction
from /proc/vmstat looked this way after a day and a half +of uptime:

compact_migrate_scanned 240496
compact_free_scanned 76238632
compact_isolated 123472
compact_stall 1791
compact_fail 29
compact_success 1762
compact_daemon_wake 0

After applying the patch and about 10 hours of uptime the state looks
like this:

compact_migrate_scanned 59927299
compact_free_scanned 2021075136
compact_isolated 640926
compact_stall 4
compact_fail 2
compact_success 2
compact_daemon_wake 5160

Further notes from Mel that motivated him to pick this patch up and
resend it;

It was observed for the simoop workload (pressures the VM similar to HADOOP)
that kswapd was failing to keep ahead of direct reclaim. The investigation
noted that there was a need to rationalise kswapd decisions to reclaim
with kswapd decisions to sleep. With this patch on a 2-socket box, there
was a 49% reduction in direct reclaim scanning.

However, the impact otherwise is extremely negative. Kswapd reclaim
efficiency dropped from 98% to 76%. simoop has three latency-related
metrics for read, write and allocation (an anonymous mmap and fault).

 4.11.0-rc14.11.0-rc1
vanilla   fixcheck-v2
Ameanp50-Read 21670074.18 (  0.00%) 20464344.18 (  5.56%)
Ameanp95-Read 25456267.64 (  0.00%) 25721423.64 ( -1.04%)
Ameanp99-Read 29369064.73 (  0.00%) 30174230.76 ( -2.74%)
Ameanp50-Write1390.30 (  0.00%) 1395.28 ( -0.36%)
Ameanp95-Write  412901.57 (  0.00%)37737.74 ( 90.86%)
Ameanp99-Write 6668722.09 (  0.00%)   666489.04 ( 90.01%)
Ameanp50-Allocation  78714.31 (  0.00%)86286.22 ( -9.62%)
Ameanp95-Allocation 175533.51 (  0.00%)   351812.27 (-100.42%)
Ameanp99-Allocation 247003.02 (  0.00%)  6291171.56 (-2447.00%)

Of greater concern is that the patch causes swapping and page writes
from kswapd context rose from 0 pages to 4189753 pages during the hour
the workload ran for. By and large, the patch has very bad behaviour but
easily missed as the impact on a UMA machine is negligible.

This patch is included with the data in case a bisection leads to this area.
This patch is also a pre-requisite for the rest of the series.

Signed-off-by: Shantanu Goel 
Signed-off-by: Mel Gorman 
Acked-by: Hillf Danton 
---
 mm/vmscan.c | 14 ++
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index bc8031ef994d..4ea444142c2e 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3134,11 +3134,11 @@ static bool prepare_kswapd_sleep(pg_data_t *pgdat, int 
order, int classzone_idx)
if (!managed_zone(zone))
continue;
 
-   if (!zone_balanced(zone, order, classzone_idx))
-   return false;
+   if (zone_balanced(zone, order, classzone_idx))
+   return true;
}
 
-   return true;
+   return false;
 }
 
 /*
@@ -3331,7 +3331,13 @@ static void kswapd_try_to_sleep(pg_data_t *pgdat, int 
alloc_order, int reclaim_o
 
prepare_to_wait(>kswapd_wait, , TASK_INTERRUPTIBLE);
 
-   /* Try to sleep for a short interval */
+   /*
+* Try to sleep for a short interval. Note that kcompactd will only be
+* woken if it is possible to sleep for a short interval. This is
+* deliberate on the assumption that if reclaim cannot keep an
+* eligible zone balanced that it's also unlikely that compaction will
+* succeed.
+*/
if (prepare_kswapd_sleep(pgdat, reclaim_order, classzone_idx)) {
/*
 * Compaction records what page blocks it recently failed to
-- 
2.11.0



[PATCH 2/3] mm, vmscan: Only clear pgdat congested/dirty/writeback state when balanced

2017-03-08 Thread Mel Gorman
A pgdat tracks if recent reclaim encountered too many dirty, writeback
or congested pages. The flags control whether kswapd writes pages back
from reclaim context, tags pages for immediate reclaim when IO completes,
whether processes block on wait_iff_congested and whether kswapd blocks
when too many pages marked for immediate reclaim are encountered.

The state is cleared in a check function with side-effects. With the patch
"mm, vmscan: fix zone balance check in prepare_kswapd_sleep", the timing
of when the bits get cleared changed. Due to the way the check works,
it'll clear the bits if ZONE_DMA is balanced for a GFP_DMA allocation
because it does not account for lowmem reserves properly.

For the simoop workload, kswapd is not stalling when it should due to
the premature clearing, writing pages from reclaim context like crazy and
generally being unhelpful.

This patch resets the pgdat bits related to page reclaim only when kswapd
is going to sleep. The comparison with simoop is then

 4.11.0-rc14.11.0-rc1   
 4.11.0-rc1
vanilla   fixcheck-v2   
   clear-v2
Ameanp50-Read 21670074.18 (  0.00%) 20464344.18 (  5.56%) 
19786774.76 (  8.69%)
Ameanp95-Read 25456267.64 (  0.00%) 25721423.64 ( -1.04%) 
24101956.27 (  5.32%)
Ameanp99-Read 29369064.73 (  0.00%) 30174230.76 ( -2.74%) 
27691872.71 (  5.71%)
Ameanp50-Write1390.30 (  0.00%) 1395.28 ( -0.36%) 
1011.91 ( 27.22%)
Ameanp95-Write  412901.57 (  0.00%)37737.74 ( 90.86%)
34874.98 ( 91.55%)
Ameanp99-Write 6668722.09 (  0.00%)   666489.04 ( 90.01%)   
575449.60 ( 91.37%)
Ameanp50-Allocation  78714.31 (  0.00%)86286.22 ( -9.62%)
84246.26 ( -7.03%)
Ameanp95-Allocation 175533.51 (  0.00%)   351812.27 (-100.42%)   
400058.43 (-127.91%)
Ameanp99-Allocation 247003.02 (  0.00%)  6291171.56 (-2447.00%) 
10905600.00 (-4315.17%)

Read latency is improved, write latency is mostly improved but allocation
latency is regressed.  kswapd is still reclaiming inefficiently,
pages are being written back from writeback context and a host of other
issues. However, given the change, it needed to be spelled out why the
side-effect was moved.

Signed-off-by: Mel Gorman 
---
 mm/vmscan.c | 20 +++-
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 4ea444142c2e..17b1afbce88e 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3091,17 +3091,17 @@ static bool zone_balanced(struct zone *zone, int order, 
int classzone_idx)
if (!zone_watermark_ok_safe(zone, order, mark, classzone_idx))
return false;
 
-   /*
-* If any eligible zone is balanced then the node is not considered
-* to be congested or dirty
-*/
-   clear_bit(PGDAT_CONGESTED, >zone_pgdat->flags);
-   clear_bit(PGDAT_DIRTY, >zone_pgdat->flags);
-   clear_bit(PGDAT_WRITEBACK, >zone_pgdat->flags);
-
return true;
 }
 
+/* Clear pgdat state for congested, dirty or under writeback. */
+static void clear_pgdat_congested(pg_data_t *pgdat)
+{
+   clear_bit(PGDAT_CONGESTED, >flags);
+   clear_bit(PGDAT_DIRTY, >flags);
+   clear_bit(PGDAT_WRITEBACK, >flags);
+}
+
 /*
  * Prepare kswapd for sleeping. This verifies that there are no processes
  * waiting in throttle_direct_reclaim() and that watermarks have been met.
@@ -3134,8 +3134,10 @@ static bool prepare_kswapd_sleep(pg_data_t *pgdat, int 
order, int classzone_idx)
if (!managed_zone(zone))
continue;
 
-   if (zone_balanced(zone, order, classzone_idx))
+   if (zone_balanced(zone, order, classzone_idx)) {
+   clear_pgdat_congested(pgdat);
return true;
+   }
}
 
return false;
-- 
2.11.0



Re: [TRIVIAL PATCH 2/2] kprobes: Convert kprobe_exceptions_notify to use NOKPROBE_SYMBOL

2017-03-08 Thread Masami Hiramatsu
On Wed,  8 Mar 2017 22:34:15 +0530
"Naveen N. Rao"  wrote:

> commit fc62d0207ae0 ("kprobes: Introduce weak variant of
> kprobe_exceptions_notify()") used the __kprobes annotation to exclude
> kprobe_exceptions_notify from being probed. Since NOKPROBE_SYMBOL() is a
> better way to do this enabling the symbol to be discovered as being
> blacklisted, change over to using NOKPROBE_SYMBOL().
> 

Oops, yes it should be.

Acked-by: Masami Hiramatsu 

Thanks,

> Signed-off-by: Naveen N. Rao 
> ---
>  kernel/kprobes.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index 699c5bc51a92..b52d952d6d41 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -1740,11 +1740,12 @@ void unregister_kprobes(struct kprobe **kps, int num)
>  }
>  EXPORT_SYMBOL_GPL(unregister_kprobes);
>  
> -int __weak __kprobes kprobe_exceptions_notify(struct notifier_block *self,
> -   unsigned long val, void *data)
> +int __weak kprobe_exceptions_notify(struct notifier_block *self,
> + unsigned long val, void *data)
>  {
>   return NOTIFY_DONE;
>  }
> +NOKPROBE_SYMBOL(kprobe_exceptions_notify);
>  
>  static struct notifier_block kprobe_exceptions_nb = {
>   .notifier_call = kprobe_exceptions_notify,
> -- 
> 2.11.1
> 


-- 
Masami Hiramatsu 


Re: [TRIVIAL PATCH 2/2] kprobes: Convert kprobe_exceptions_notify to use NOKPROBE_SYMBOL

2017-03-08 Thread Masami Hiramatsu
On Wed,  8 Mar 2017 22:34:15 +0530
"Naveen N. Rao"  wrote:

> commit fc62d0207ae0 ("kprobes: Introduce weak variant of
> kprobe_exceptions_notify()") used the __kprobes annotation to exclude
> kprobe_exceptions_notify from being probed. Since NOKPROBE_SYMBOL() is a
> better way to do this enabling the symbol to be discovered as being
> blacklisted, change over to using NOKPROBE_SYMBOL().
> 

Oops, yes it should be.

Acked-by: Masami Hiramatsu 

Thanks,

> Signed-off-by: Naveen N. Rao 
> ---
>  kernel/kprobes.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index 699c5bc51a92..b52d952d6d41 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -1740,11 +1740,12 @@ void unregister_kprobes(struct kprobe **kps, int num)
>  }
>  EXPORT_SYMBOL_GPL(unregister_kprobes);
>  
> -int __weak __kprobes kprobe_exceptions_notify(struct notifier_block *self,
> -   unsigned long val, void *data)
> +int __weak kprobe_exceptions_notify(struct notifier_block *self,
> + unsigned long val, void *data)
>  {
>   return NOTIFY_DONE;
>  }
> +NOKPROBE_SYMBOL(kprobe_exceptions_notify);
>  
>  static struct notifier_block kprobe_exceptions_nb = {
>   .notifier_call = kprobe_exceptions_notify,
> -- 
> 2.11.1
> 


-- 
Masami Hiramatsu 


Re: [RFC PATCH net] net: Work around lockdep limitation in sockets that use sockets

2017-03-08 Thread David Howells
David Miller  wrote:

> I guess this is fine, but I think you can use one of the two "sk_padding"
> bits in struct sock instead of making the structure larger.

It shouldn't make the structure larger since there's a hole in the structure:

unsigned intsk_padding : 2,
sk_no_check_tx : 1,
sk_no_check_rx : 1,
sk_userlocks : 4,
sk_protocol  : 8,
sk_type  : 16;
#define SK_PROTOCOL_MAX U8_MAX
kmemcheck_bitfield_end(flags);

u16 sk_gso_max_segs;
  ---> 2-byte hole here
unsigned long   sk_lingertime;

but I'll quite happily shift it to sk_padding.

Thanks,
David


Re: [RFC PATCH net] net: Work around lockdep limitation in sockets that use sockets

2017-03-08 Thread David Howells
David Miller  wrote:

> I guess this is fine, but I think you can use one of the two "sk_padding"
> bits in struct sock instead of making the structure larger.

It shouldn't make the structure larger since there's a hole in the structure:

unsigned intsk_padding : 2,
sk_no_check_tx : 1,
sk_no_check_rx : 1,
sk_userlocks : 4,
sk_protocol  : 8,
sk_type  : 16;
#define SK_PROTOCOL_MAX U8_MAX
kmemcheck_bitfield_end(flags);

u16 sk_gso_max_segs;
  ---> 2-byte hole here
unsigned long   sk_lingertime;

but I'll quite happily shift it to sk_padding.

Thanks,
David


Re: [PATCH v7 kernel 5/5] This patch contains two parts:

2017-03-08 Thread Wei Wang
On 03/06/2017 09:23 PM, David Hildenbrand wrote:Am 03.03.2017 um 06:40 
schrieb Wei Wang:

From: Liang Li 

Sorry, I just saw the message due to an email issue.


I'd prefer to split this into two parts then and to create proper subjects.

Agree, will do.



If I remember correctly, the general concept was accepted by most reviewers.



Yes, that's also what I was told.

Best,
Wei


Re: [PATCH v7 kernel 5/5] This patch contains two parts:

2017-03-08 Thread Wei Wang
On 03/06/2017 09:23 PM, David Hildenbrand wrote:Am 03.03.2017 um 06:40 
schrieb Wei Wang:

From: Liang Li 

Sorry, I just saw the message due to an email issue.


I'd prefer to split this into two parts then and to create proper subjects.

Agree, will do.



If I remember correctly, the general concept was accepted by most reviewers.



Yes, that's also what I was told.

Best,
Wei


[PATCH] drivers, scsi: convert fc_fcp_pkt.ref_cnt from atomic_t to refcount_t

2017-03-08 Thread Elena Reshetova
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova 
Signed-off-by: Hans Liljestrand 
Signed-off-by: Kees Cook 
Signed-off-by: David Windsor 
Acked-by: Johannes Thumshirn 
---
 drivers/scsi/libfc/fc_fcp.c | 6 +++---
 include/scsi/libfc.h| 3 ++-
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/libfc/fc_fcp.c b/drivers/scsi/libfc/fc_fcp.c
index 0e67621..a808e8e 100644
--- a/drivers/scsi/libfc/fc_fcp.c
+++ b/drivers/scsi/libfc/fc_fcp.c
@@ -154,7 +154,7 @@ static struct fc_fcp_pkt *fc_fcp_pkt_alloc(struct fc_lport 
*lport, gfp_t gfp)
memset(fsp, 0, sizeof(*fsp));
fsp->lp = lport;
fsp->xfer_ddp = FC_XID_UNKNOWN;
-   atomic_set(>ref_cnt, 1);
+   refcount_set(>ref_cnt, 1);
init_timer(>timer);
fsp->timer.data = (unsigned long)fsp;
INIT_LIST_HEAD(>list);
@@ -175,7 +175,7 @@ static struct fc_fcp_pkt *fc_fcp_pkt_alloc(struct fc_lport 
*lport, gfp_t gfp)
  */
 static void fc_fcp_pkt_release(struct fc_fcp_pkt *fsp)
 {
-   if (atomic_dec_and_test(>ref_cnt)) {
+   if (refcount_dec_and_test(>ref_cnt)) {
struct fc_fcp_internal *si = fc_get_scsi_internal(fsp->lp);
 
mempool_free(fsp, si->scsi_pkt_pool);
@@ -188,7 +188,7 @@ static void fc_fcp_pkt_release(struct fc_fcp_pkt *fsp)
  */
 static void fc_fcp_pkt_hold(struct fc_fcp_pkt *fsp)
 {
-   atomic_inc(>ref_cnt);
+   refcount_inc(>ref_cnt);
 }
 
 /**
diff --git a/include/scsi/libfc.h b/include/scsi/libfc.h
index da5033d..2109844 100644
--- a/include/scsi/libfc.h
+++ b/include/scsi/libfc.h
@@ -23,6 +23,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -321,7 +322,7 @@ struct fc_seq_els_data {
  */
 struct fc_fcp_pkt {
spinlock_tscsi_pkt_lock;
-   atomic_t  ref_cnt;
+   refcount_tref_cnt;
 
/* SCSI command and data transfer information */
u32   data_len;
-- 
2.7.4



[PATCH] drivers, scsi: convert fc_fcp_pkt.ref_cnt from atomic_t to refcount_t

2017-03-08 Thread Elena Reshetova
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova 
Signed-off-by: Hans Liljestrand 
Signed-off-by: Kees Cook 
Signed-off-by: David Windsor 
Acked-by: Johannes Thumshirn 
---
 drivers/scsi/libfc/fc_fcp.c | 6 +++---
 include/scsi/libfc.h| 3 ++-
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/libfc/fc_fcp.c b/drivers/scsi/libfc/fc_fcp.c
index 0e67621..a808e8e 100644
--- a/drivers/scsi/libfc/fc_fcp.c
+++ b/drivers/scsi/libfc/fc_fcp.c
@@ -154,7 +154,7 @@ static struct fc_fcp_pkt *fc_fcp_pkt_alloc(struct fc_lport 
*lport, gfp_t gfp)
memset(fsp, 0, sizeof(*fsp));
fsp->lp = lport;
fsp->xfer_ddp = FC_XID_UNKNOWN;
-   atomic_set(>ref_cnt, 1);
+   refcount_set(>ref_cnt, 1);
init_timer(>timer);
fsp->timer.data = (unsigned long)fsp;
INIT_LIST_HEAD(>list);
@@ -175,7 +175,7 @@ static struct fc_fcp_pkt *fc_fcp_pkt_alloc(struct fc_lport 
*lport, gfp_t gfp)
  */
 static void fc_fcp_pkt_release(struct fc_fcp_pkt *fsp)
 {
-   if (atomic_dec_and_test(>ref_cnt)) {
+   if (refcount_dec_and_test(>ref_cnt)) {
struct fc_fcp_internal *si = fc_get_scsi_internal(fsp->lp);
 
mempool_free(fsp, si->scsi_pkt_pool);
@@ -188,7 +188,7 @@ static void fc_fcp_pkt_release(struct fc_fcp_pkt *fsp)
  */
 static void fc_fcp_pkt_hold(struct fc_fcp_pkt *fsp)
 {
-   atomic_inc(>ref_cnt);
+   refcount_inc(>ref_cnt);
 }
 
 /**
diff --git a/include/scsi/libfc.h b/include/scsi/libfc.h
index da5033d..2109844 100644
--- a/include/scsi/libfc.h
+++ b/include/scsi/libfc.h
@@ -23,6 +23,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -321,7 +322,7 @@ struct fc_seq_els_data {
  */
 struct fc_fcp_pkt {
spinlock_tscsi_pkt_lock;
-   atomic_t  ref_cnt;
+   refcount_tref_cnt;
 
/* SCSI command and data transfer information */
u32   data_len;
-- 
2.7.4



Re: kexec regression since 4.9 caused by efi

2017-03-08 Thread Omar Sandoval
On Thu, Mar 09, 2017 at 10:21:36AM +0800, Dave Young wrote:
> I have no esrt machine to test, can you share the full kernel log with
> efi=debug in kernel cmdline?
> 
> *) normal boot kernel log without the reverting
> *) kexec boot log with and without the reverting

Attached.
[0.00] Linux version 4.11.0-rc1 (osan...@devbig561.prn1.facebook.com) 
(gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #53 SMP Wed Mar 8 
12:07:16 PST 2017
[0.00] Command line: BOOT_IMAGE=/vmlinuz-4.11.0-rc1 ro root=LABEL=/ 
ipv6.autoconf=0 erst_disable biosdevname=0 net.ifnames=0 fsck.repair=yes 
pcie_pme=nomsi 
netconsole=+@2401:db00:0011:b03e:face::0009:/eth0,1514@2401:db00:eef0:a59::/02:90:fb:5b:b7:1e
 crashkernel=128M console=tty0 console=ttyS1,57600 efi=debug
[0.00] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point 
registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[0.00] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
[0.00] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, 
using 'standard' format.
[0.00] e820: BIOS-provided physical RAM map:
[0.00] BIOS-e820: [mem 0x-0x0009] usable
[0.00] BIOS-e820: [mem 0x0010-0x750bdfff] usable
[0.00] BIOS-e820: [mem 0x750be000-0x75ddbfff] reserved
[0.00] BIOS-e820: [mem 0x75ddc000-0x75e32fff] ACPI data
[0.00] BIOS-e820: [mem 0x75e33000-0x7642dfff] ACPI NVS
[0.00] BIOS-e820: [mem 0x7642e000-0x7bcd9fff] reserved
[0.00] BIOS-e820: [mem 0x7bcda000-0x7bcdafff] usable
[0.00] BIOS-e820: [mem 0x7bcdb000-0x7bd60fff] reserved
[0.00] BIOS-e820: [mem 0x7bd61000-0x7bff] usable
[0.00] BIOS-e820: [mem 0x7c00-0x8fff] reserved
[0.00] BIOS-e820: [mem 0xfed1c000-0xfed44fff] reserved
[0.00] BIOS-e820: [mem 0xff00-0x] reserved
[0.00] BIOS-e820: [mem 0x0001-0x00407fff] usable
[0.00] NX (Execute Disable) protection: active
[0.00] extended physical RAM map:
[0.00] reserve setup_data: [mem 0x-0x0009] 
usable
[0.00] reserve setup_data: [mem 0x0010-0x709fb017] 
usable
[0.00] reserve setup_data: [mem 0x709fb018-0x70a4b857] 
usable
[0.00] reserve setup_data: [mem 0x70a4b858-0x70a4c017] 
usable
[0.00] reserve setup_data: [mem 0x70a4c018-0x70a53c57] 
usable
[0.00] reserve setup_data: [mem 0x70a53c58-0x750bdfff] 
usable
[0.00] reserve setup_data: [mem 0x750be000-0x75ddbfff] 
reserved
[0.00] reserve setup_data: [mem 0x75ddc000-0x75e32fff] 
ACPI data
[0.00] reserve setup_data: [mem 0x75e33000-0x7642dfff] 
ACPI NVS
[0.00] reserve setup_data: [mem 0x7642e000-0x7bcd9fff] 
reserved
[0.00] reserve setup_data: [mem 0x7bcda000-0x7bcdafff] 
usable
[0.00] reserve setup_data: [mem 0x7bcdb000-0x7bd60fff] 
reserved
[0.00] reserve setup_data: [mem 0x7bd61000-0x7bff] 
usable
[0.00] reserve setup_data: [mem 0x7c00-0x8fff] 
reserved
[0.00] reserve setup_data: [mem 0xfed1c000-0xfed44fff] 
reserved
[0.00] reserve setup_data: [mem 0xff00-0x] 
reserved
[0.00] reserve setup_data: [mem 0x0001-0x00407fff] 
usable
[0.00] efi: EFI v2.40 by American Megatrends
[0.00] efi:  ACPI=0x75f5c000  ACPI 2.0=0x75f5c000  ESRT=0x7bc4d018  
SMBIOS=0xf05e0  SMBIOS 3.0=0x7bb2f000  MPS=0xfc9e0 
[0.00] efi: mem00: [Boot Code  |   |  |  |  |  |  |  |   
|WB|WT|WC|UC] range=[0x-0x7fff] (0MB)
[0.00] efi: mem01: [Loader Data|   |  |  |  |  |  |  |   
|WB|WT|WC|UC] range=[0x8000-0x8fff] (0MB)
[0.00] efi: mem02: [Conventional Memory|   |  |  |  |  |  |  |   
|WB|WT|WC|UC] range=[0x9000-0x0003] (0MB)
[0.00] efi: mem03: [Boot Code  |   |  |  |  |  |  |  |   
|WB|WT|WC|UC] range=[0x0004-0x0009] (0MB)
[0.00] efi: mem04: [Conventional Memory|   |  |  |  |  |  |  |   
|WB|WT|WC|UC] range=[0x0010-0x00ff] (15MB)
[0.00] efi: mem05: [Loader Data|   |  |  |  |  |  |  |   
|WB|WT|WC|UC] range=[0x0100-0x024cdfff] (20MB)
[0.00] efi: mem06: [Conventional Memory|   |  |  |  |  |  |  |   
|WB|WT|WC|UC] 

Re: kexec regression since 4.9 caused by efi

2017-03-08 Thread Omar Sandoval
On Thu, Mar 09, 2017 at 10:21:36AM +0800, Dave Young wrote:
> I have no esrt machine to test, can you share the full kernel log with
> efi=debug in kernel cmdline?
> 
> *) normal boot kernel log without the reverting
> *) kexec boot log with and without the reverting

Attached.
[0.00] Linux version 4.11.0-rc1 (osan...@devbig561.prn1.facebook.com) 
(gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #53 SMP Wed Mar 8 
12:07:16 PST 2017
[0.00] Command line: BOOT_IMAGE=/vmlinuz-4.11.0-rc1 ro root=LABEL=/ 
ipv6.autoconf=0 erst_disable biosdevname=0 net.ifnames=0 fsck.repair=yes 
pcie_pme=nomsi 
netconsole=+@2401:db00:0011:b03e:face::0009:/eth0,1514@2401:db00:eef0:a59::/02:90:fb:5b:b7:1e
 crashkernel=128M console=tty0 console=ttyS1,57600 efi=debug
[0.00] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point 
registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[0.00] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
[0.00] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, 
using 'standard' format.
[0.00] e820: BIOS-provided physical RAM map:
[0.00] BIOS-e820: [mem 0x-0x0009] usable
[0.00] BIOS-e820: [mem 0x0010-0x750bdfff] usable
[0.00] BIOS-e820: [mem 0x750be000-0x75ddbfff] reserved
[0.00] BIOS-e820: [mem 0x75ddc000-0x75e32fff] ACPI data
[0.00] BIOS-e820: [mem 0x75e33000-0x7642dfff] ACPI NVS
[0.00] BIOS-e820: [mem 0x7642e000-0x7bcd9fff] reserved
[0.00] BIOS-e820: [mem 0x7bcda000-0x7bcdafff] usable
[0.00] BIOS-e820: [mem 0x7bcdb000-0x7bd60fff] reserved
[0.00] BIOS-e820: [mem 0x7bd61000-0x7bff] usable
[0.00] BIOS-e820: [mem 0x7c00-0x8fff] reserved
[0.00] BIOS-e820: [mem 0xfed1c000-0xfed44fff] reserved
[0.00] BIOS-e820: [mem 0xff00-0x] reserved
[0.00] BIOS-e820: [mem 0x0001-0x00407fff] usable
[0.00] NX (Execute Disable) protection: active
[0.00] extended physical RAM map:
[0.00] reserve setup_data: [mem 0x-0x0009] 
usable
[0.00] reserve setup_data: [mem 0x0010-0x709fb017] 
usable
[0.00] reserve setup_data: [mem 0x709fb018-0x70a4b857] 
usable
[0.00] reserve setup_data: [mem 0x70a4b858-0x70a4c017] 
usable
[0.00] reserve setup_data: [mem 0x70a4c018-0x70a53c57] 
usable
[0.00] reserve setup_data: [mem 0x70a53c58-0x750bdfff] 
usable
[0.00] reserve setup_data: [mem 0x750be000-0x75ddbfff] 
reserved
[0.00] reserve setup_data: [mem 0x75ddc000-0x75e32fff] 
ACPI data
[0.00] reserve setup_data: [mem 0x75e33000-0x7642dfff] 
ACPI NVS
[0.00] reserve setup_data: [mem 0x7642e000-0x7bcd9fff] 
reserved
[0.00] reserve setup_data: [mem 0x7bcda000-0x7bcdafff] 
usable
[0.00] reserve setup_data: [mem 0x7bcdb000-0x7bd60fff] 
reserved
[0.00] reserve setup_data: [mem 0x7bd61000-0x7bff] 
usable
[0.00] reserve setup_data: [mem 0x7c00-0x8fff] 
reserved
[0.00] reserve setup_data: [mem 0xfed1c000-0xfed44fff] 
reserved
[0.00] reserve setup_data: [mem 0xff00-0x] 
reserved
[0.00] reserve setup_data: [mem 0x0001-0x00407fff] 
usable
[0.00] efi: EFI v2.40 by American Megatrends
[0.00] efi:  ACPI=0x75f5c000  ACPI 2.0=0x75f5c000  ESRT=0x7bc4d018  
SMBIOS=0xf05e0  SMBIOS 3.0=0x7bb2f000  MPS=0xfc9e0 
[0.00] efi: mem00: [Boot Code  |   |  |  |  |  |  |  |   
|WB|WT|WC|UC] range=[0x-0x7fff] (0MB)
[0.00] efi: mem01: [Loader Data|   |  |  |  |  |  |  |   
|WB|WT|WC|UC] range=[0x8000-0x8fff] (0MB)
[0.00] efi: mem02: [Conventional Memory|   |  |  |  |  |  |  |   
|WB|WT|WC|UC] range=[0x9000-0x0003] (0MB)
[0.00] efi: mem03: [Boot Code  |   |  |  |  |  |  |  |   
|WB|WT|WC|UC] range=[0x0004-0x0009] (0MB)
[0.00] efi: mem04: [Conventional Memory|   |  |  |  |  |  |  |   
|WB|WT|WC|UC] range=[0x0010-0x00ff] (15MB)
[0.00] efi: mem05: [Loader Data|   |  |  |  |  |  |  |   
|WB|WT|WC|UC] range=[0x0100-0x024cdfff] (20MB)
[0.00] efi: mem06: [Conventional Memory|   |  |  |  |  |  |  |   
|WB|WT|WC|UC] 

Re: [PATCH 2/4] ubifs: Fix unlink code wrt. double hash lookups

2017-03-08 Thread Hyunchul Lee
Richard,

this patch works well. but i found some trivial mistakes.

On Thu, Feb 09, 2017 at 10:28:35PM +0100, Richard Weinberger wrote:
> When removing an encrypted file with a long anem and without having
> the key we have to be able to locate and remove the directory entry
> via a double hash. This corner case was simply forgotten.
> 
> Reported-by: David Oberhollenzer 
> Signed-off-by: Richard Weinberger 
> ---
>  fs/ubifs/journal.c |  10 -
>  fs/ubifs/tnc.c | 129 
> -
>  fs/ubifs/ubifs.h   |   2 +
>  3 files changed, 117 insertions(+), 24 deletions(-)
> 
> diff --git a/fs/ubifs/journal.c b/fs/ubifs/journal.c
> index f3b620cbdda4..7aef413ea2a9 100644
> --- a/fs/ubifs/journal.c
> +++ b/fs/ubifs/journal.c
> @@ -585,7 +585,10 @@ int ubifs_jnl_update(struct ubifs_info *c, const struct 
> inode *dir,
>  
>   if (!xent) {
>   dent->ch.node_type = UBIFS_DENT_NODE;
> - dent_key_init(c, _key, dir->i_ino, nm);
> + if (nm->hash)
> + dent_key_init_hash(c, _key, dir->i_ino, nm->hash);
> + else
> + dent_key_init(c, _key, dir->i_ino, nm);
>   } else {
>   dent->ch.node_type = UBIFS_XENT_NODE;
>   xent_key_init(c, _key, dir->i_ino, nm);
> @@ -629,7 +632,10 @@ int ubifs_jnl_update(struct ubifs_info *c, const struct 
> inode *dir,
>   kfree(dent);
>  
>   if (deletion) {
> - err = ubifs_tnc_remove_nm(c, _key, nm);
> + if (nm->hash)
> + err = ubifs_tnc_remove_dh(c, _key, nm->minor_hash);
> + else
> + err = ubifs_tnc_remove_nm(c, _key, nm);
>   if (err)
>   goto out_ro;
>   err = ubifs_add_dirt(c, lnum, dlen);
> diff --git a/fs/ubifs/tnc.c b/fs/ubifs/tnc.c
> index 709aa098dd46..d84f4ba467a3 100644
> --- a/fs/ubifs/tnc.c
> +++ b/fs/ubifs/tnc.c
> @@ -1880,48 +1880,65 @@ int ubifs_tnc_lookup_nm(struct ubifs_info *c, const 
> union ubifs_key *key,
>   return do_lookup_nm(c, key, node, nm);
>  }
>  
> -static int do_lookup_dh(struct ubifs_info *c, const union ubifs_key *key,
> - struct ubifs_dent_node *dent, uint32_t cookie)
> +static int search_dh_cookie(struct ubifs_info *c, const union ubifs_key *key,
> + struct ubifs_dent_node *dent, uint32_t cookie,
> + struct ubifs_znode **zn, int *n)
>  {
> - int n, err, type = key_type(c, key);
> - struct ubifs_znode *znode;
> + int err;

i guess that err should be initialized with -ENOENT to avoid the first call of  
tnc_next(c, , n). 

> + struct ubifs_znode *znode = *zn;
>   struct ubifs_zbranch *zbr;
> - union ubifs_key *dkey, start_key;
> -
> - ubifs_assert(is_hash_key(c, key));
> -
> - lowest_dent_key(c, _key, key_inum(c, key));
> -
> - mutex_lock(>tnc_mutex);
> - err = ubifs_lookup_level0(c, _key, , );
> - if (unlikely(err < 0))
> - goto out_unlock;
> + union ubifs_key *dkey;
>  
>   for (;;) {
>   if (!err) {
> - err = tnc_next(c, , );
> + err = tnc_next(c, , n);
>   if (err)
> - goto out_unlock;
> + goto out;
>   }
>  
> - zbr = >zbranch[n];
> + zbr = >zbranch[*n];
>   dkey = >key;
>  
>   if (key_inum(c, dkey) != key_inum(c, key) ||
> - key_type(c, dkey) != type) {
> + key_type(c, dkey) != key_type(c, key)) {
>   err = -ENOENT;
> - goto out_unlock;
> + goto out;
>   }
>  
>   err = tnc_read_hashed_node(c, zbr, dent);
>   if (err)
> - goto out_unlock;
> + goto out;
>  
>   if (key_hash(c, key) == key_hash(c, dkey) &&
> - le32_to_cpu(dent->cookie) == cookie)
> - goto out_unlock;
> + le32_to_cpu(dent->cookie) == cookie) {
> + *zn = znode;
> + goto out;
> + }
>   }
>  
> +out:
> +
> + return err;
> +}
> +
> +static int do_lookup_dh(struct ubifs_info *c, const union ubifs_key *key,
> + struct ubifs_dent_node *dent, uint32_t cookie)
> +{
> + int n, err;
> + struct ubifs_znode *znode;
> + union ubifs_key start_key;
> +
> + ubifs_assert(is_hash_key(c, key));
> +
> + lowest_dent_key(c, _key, key_inum(c, key));
> +
> + mutex_lock(>tnc_mutex);
> + err = ubifs_lookup_level0(c, _key, , );
> + if (unlikely(err < 0))
> + goto out_unlock;
> +
> + err = search_dh_cookie(c, key, dent, cookie, , );
> +
>  out_unlock:
>   mutex_unlock(>tnc_mutex);
>   return err;

Re: linux-4.11-rc1/drivers/gpu/drm/exynos/exynos5433_drm_decon.c:681: suspicious mask ?

2017-03-08 Thread Inki Dae
Hello David,

Thanks for report.

2017년 03월 06일 19:05에 David Binderman 이(가) 쓴 글:
> Hello there,
> 
> linux-4.11-rc1/drivers/gpu/drm/exynos/exynos5433_drm_decon.c:681]: (warning) 
> Result of operator '|' is always true if one operand is non-zero. Did you 
> intend to use '&'?
> 

Right. this is known issue and below patch fixes this,
http://www.spinics.net/lists/dri-devel/msg132589.html

This patch will go to -fixes.

> Source code is
> 
> if (ctx->out_type | I80_HW_TRG) {
> 
> Also in the same file:
> 
> [drivers/gpu/drm/exynos/exynos5433_drm_decon.c:131]: (style) Same expression 
> on both sides of '|'.
> 
> Source code is
> 
>writel(TRIGCON_TE_AUTO_MASK | TRIGCON_SWTRIGEN
>| TRIGCON_TE_AUTO_MASK | TRIGCON_SWTRIGEN,

In this case, only problem is two flags are set in duplicate. This should be 
cleaned up. Thanks. :)

> 
> Regards
> 
> David Binderman
> 
> 


Re: [PATCH 2/4] ubifs: Fix unlink code wrt. double hash lookups

2017-03-08 Thread Hyunchul Lee
Richard,

this patch works well. but i found some trivial mistakes.

On Thu, Feb 09, 2017 at 10:28:35PM +0100, Richard Weinberger wrote:
> When removing an encrypted file with a long anem and without having
> the key we have to be able to locate and remove the directory entry
> via a double hash. This corner case was simply forgotten.
> 
> Reported-by: David Oberhollenzer 
> Signed-off-by: Richard Weinberger 
> ---
>  fs/ubifs/journal.c |  10 -
>  fs/ubifs/tnc.c | 129 
> -
>  fs/ubifs/ubifs.h   |   2 +
>  3 files changed, 117 insertions(+), 24 deletions(-)
> 
> diff --git a/fs/ubifs/journal.c b/fs/ubifs/journal.c
> index f3b620cbdda4..7aef413ea2a9 100644
> --- a/fs/ubifs/journal.c
> +++ b/fs/ubifs/journal.c
> @@ -585,7 +585,10 @@ int ubifs_jnl_update(struct ubifs_info *c, const struct 
> inode *dir,
>  
>   if (!xent) {
>   dent->ch.node_type = UBIFS_DENT_NODE;
> - dent_key_init(c, _key, dir->i_ino, nm);
> + if (nm->hash)
> + dent_key_init_hash(c, _key, dir->i_ino, nm->hash);
> + else
> + dent_key_init(c, _key, dir->i_ino, nm);
>   } else {
>   dent->ch.node_type = UBIFS_XENT_NODE;
>   xent_key_init(c, _key, dir->i_ino, nm);
> @@ -629,7 +632,10 @@ int ubifs_jnl_update(struct ubifs_info *c, const struct 
> inode *dir,
>   kfree(dent);
>  
>   if (deletion) {
> - err = ubifs_tnc_remove_nm(c, _key, nm);
> + if (nm->hash)
> + err = ubifs_tnc_remove_dh(c, _key, nm->minor_hash);
> + else
> + err = ubifs_tnc_remove_nm(c, _key, nm);
>   if (err)
>   goto out_ro;
>   err = ubifs_add_dirt(c, lnum, dlen);
> diff --git a/fs/ubifs/tnc.c b/fs/ubifs/tnc.c
> index 709aa098dd46..d84f4ba467a3 100644
> --- a/fs/ubifs/tnc.c
> +++ b/fs/ubifs/tnc.c
> @@ -1880,48 +1880,65 @@ int ubifs_tnc_lookup_nm(struct ubifs_info *c, const 
> union ubifs_key *key,
>   return do_lookup_nm(c, key, node, nm);
>  }
>  
> -static int do_lookup_dh(struct ubifs_info *c, const union ubifs_key *key,
> - struct ubifs_dent_node *dent, uint32_t cookie)
> +static int search_dh_cookie(struct ubifs_info *c, const union ubifs_key *key,
> + struct ubifs_dent_node *dent, uint32_t cookie,
> + struct ubifs_znode **zn, int *n)
>  {
> - int n, err, type = key_type(c, key);
> - struct ubifs_znode *znode;
> + int err;

i guess that err should be initialized with -ENOENT to avoid the first call of  
tnc_next(c, , n). 

> + struct ubifs_znode *znode = *zn;
>   struct ubifs_zbranch *zbr;
> - union ubifs_key *dkey, start_key;
> -
> - ubifs_assert(is_hash_key(c, key));
> -
> - lowest_dent_key(c, _key, key_inum(c, key));
> -
> - mutex_lock(>tnc_mutex);
> - err = ubifs_lookup_level0(c, _key, , );
> - if (unlikely(err < 0))
> - goto out_unlock;
> + union ubifs_key *dkey;
>  
>   for (;;) {
>   if (!err) {
> - err = tnc_next(c, , );
> + err = tnc_next(c, , n);
>   if (err)
> - goto out_unlock;
> + goto out;
>   }
>  
> - zbr = >zbranch[n];
> + zbr = >zbranch[*n];
>   dkey = >key;
>  
>   if (key_inum(c, dkey) != key_inum(c, key) ||
> - key_type(c, dkey) != type) {
> + key_type(c, dkey) != key_type(c, key)) {
>   err = -ENOENT;
> - goto out_unlock;
> + goto out;
>   }
>  
>   err = tnc_read_hashed_node(c, zbr, dent);
>   if (err)
> - goto out_unlock;
> + goto out;
>  
>   if (key_hash(c, key) == key_hash(c, dkey) &&
> - le32_to_cpu(dent->cookie) == cookie)
> - goto out_unlock;
> + le32_to_cpu(dent->cookie) == cookie) {
> + *zn = znode;
> + goto out;
> + }
>   }
>  
> +out:
> +
> + return err;
> +}
> +
> +static int do_lookup_dh(struct ubifs_info *c, const union ubifs_key *key,
> + struct ubifs_dent_node *dent, uint32_t cookie)
> +{
> + int n, err;
> + struct ubifs_znode *znode;
> + union ubifs_key start_key;
> +
> + ubifs_assert(is_hash_key(c, key));
> +
> + lowest_dent_key(c, _key, key_inum(c, key));
> +
> + mutex_lock(>tnc_mutex);
> + err = ubifs_lookup_level0(c, _key, , );
> + if (unlikely(err < 0))
> + goto out_unlock;
> +
> + err = search_dh_cookie(c, key, dent, cookie, , );
> +
>  out_unlock:
>   mutex_unlock(>tnc_mutex);
>   return err;
> @@ -2663,6 +2680,74 @@ int 

Re: linux-4.11-rc1/drivers/gpu/drm/exynos/exynos5433_drm_decon.c:681: suspicious mask ?

2017-03-08 Thread Inki Dae
Hello David,

Thanks for report.

2017년 03월 06일 19:05에 David Binderman 이(가) 쓴 글:
> Hello there,
> 
> linux-4.11-rc1/drivers/gpu/drm/exynos/exynos5433_drm_decon.c:681]: (warning) 
> Result of operator '|' is always true if one operand is non-zero. Did you 
> intend to use '&'?
> 

Right. this is known issue and below patch fixes this,
http://www.spinics.net/lists/dri-devel/msg132589.html

This patch will go to -fixes.

> Source code is
> 
> if (ctx->out_type | I80_HW_TRG) {
> 
> Also in the same file:
> 
> [drivers/gpu/drm/exynos/exynos5433_drm_decon.c:131]: (style) Same expression 
> on both sides of '|'.
> 
> Source code is
> 
>writel(TRIGCON_TE_AUTO_MASK | TRIGCON_SWTRIGEN
>| TRIGCON_TE_AUTO_MASK | TRIGCON_SWTRIGEN,

In this case, only problem is two flags are set in duplicate. This should be 
cleaned up. Thanks. :)

> 
> Regards
> 
> David Binderman
> 
> 


Re: [RFC PATCH net] net: Work around lockdep limitation in sockets that use sockets

2017-03-08 Thread David Miller
From: David Howells 
Date: Mon, 06 Mar 2017 15:04:44 +

> Fix the general case by:
> 
>  (1) Double up all the locking keys used in sockets so that one set are
>  used if the socket is created by userspace and the other set is used
>  if the socket is created by the kernel.
> 
>  (2) Store the kern parameter passed to sk_alloc() in a variable in the
>  sock struct (sk_kern_sock).  This informs sock_lock_init(),
>  sock_init_data() and sk_clone_lock() as to the lock keys to be used.
> 
>  Note that the child created by sk_clone_lock() inherits the parent's
>  kern setting.
> 
>  (3) Add a 'kern' parameter to ->accept() that is analogous to the one
>  passed in to ->create() that distinguishes whether kernel_accept() or
>  sys_accept4() was the caller and can be passed to sk_alloc().
> 
>  Note that a lot of accept functions merely dequeue an already
>  allocated socket.  I haven't touched these as the new socket already
>  exists before we get the parameter.
> 
>  Note also that there are a couple of places where I've made the accepted
>  socket unconditionally kernel-based:
> 
>   irda_accept()
>   rds_rcp_accept_one()
>   tcp_accept_from_sock()
> 
>  because they follow a sock_create_kern() and accept off of that.

I guess this is fine, but I think you can use one of the two "sk_padding"
bits in struct sock instead of making the structure larger.


Re: [RFC PATCH net] net: Work around lockdep limitation in sockets that use sockets

2017-03-08 Thread David Miller
From: David Howells 
Date: Mon, 06 Mar 2017 15:04:44 +

> Fix the general case by:
> 
>  (1) Double up all the locking keys used in sockets so that one set are
>  used if the socket is created by userspace and the other set is used
>  if the socket is created by the kernel.
> 
>  (2) Store the kern parameter passed to sk_alloc() in a variable in the
>  sock struct (sk_kern_sock).  This informs sock_lock_init(),
>  sock_init_data() and sk_clone_lock() as to the lock keys to be used.
> 
>  Note that the child created by sk_clone_lock() inherits the parent's
>  kern setting.
> 
>  (3) Add a 'kern' parameter to ->accept() that is analogous to the one
>  passed in to ->create() that distinguishes whether kernel_accept() or
>  sys_accept4() was the caller and can be passed to sk_alloc().
> 
>  Note that a lot of accept functions merely dequeue an already
>  allocated socket.  I haven't touched these as the new socket already
>  exists before we get the parameter.
> 
>  Note also that there are a couple of places where I've made the accepted
>  socket unconditionally kernel-based:
> 
>   irda_accept()
>   rds_rcp_accept_one()
>   tcp_accept_from_sock()
> 
>  because they follow a sock_create_kern() and accept off of that.

I guess this is fine, but I think you can use one of the two "sk_padding"
bits in struct sock instead of making the structure larger.


RE: [PATCH 22/29] drivers, scsi: convert iscsi_task.refcount from atomic_t to refcount_t

2017-03-08 Thread Reshetova, Elena
> On Mon, Mar 06, 2017 at 04:21:09PM +0200, Elena Reshetova wrote:
> > refcount_t type and corresponding API should be
> > used instead of atomic_t when the variable is used as
> > a reference counter. This allows to avoid accidental
> > refcounter overflows that might lead to use-after-free
> > situations.
> >
> > Signed-off-by: Elena Reshetova 
> > Signed-off-by: Hans Liljestrand 
> > Signed-off-by: Kees Cook 
> > Signed-off-by: David Windsor 
> 
> This looks OK to me.
> 
> Acked-by: Chris Leech 

Thank you for review! Do you have a tree that can take this change? 

Best Regards,
Elena.

> 
> > ---
> >  drivers/scsi/libiscsi.c| 8 
> >  drivers/scsi/qedi/qedi_iscsi.c | 2 +-
> >  include/scsi/libiscsi.h| 3 ++-
> >  3 files changed, 7 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
> > index 834d121..7eb1d2c 100644
> > --- a/drivers/scsi/libiscsi.c
> > +++ b/drivers/scsi/libiscsi.c
> > @@ -516,13 +516,13 @@ static void iscsi_free_task(struct iscsi_task *task)
> >
> >  void __iscsi_get_task(struct iscsi_task *task)
> >  {
> > -   atomic_inc(>refcount);
> > +   refcount_inc(>refcount);
> >  }
> >  EXPORT_SYMBOL_GPL(__iscsi_get_task);
> >
> >  void __iscsi_put_task(struct iscsi_task *task)
> >  {
> > -   if (atomic_dec_and_test(>refcount))
> > +   if (refcount_dec_and_test(>refcount))
> > iscsi_free_task(task);
> >  }
> >  EXPORT_SYMBOL_GPL(__iscsi_put_task);
> > @@ -744,7 +744,7 @@ __iscsi_conn_send_pdu(struct iscsi_conn *conn, struct
> iscsi_hdr *hdr,
> >  * released by the lld when it has transmitted the task for
> >  * pdus we do not expect a response for.
> >  */
> > -   atomic_set(>refcount, 1);
> > +   refcount_set(>refcount, 1);
> > task->conn = conn;
> > task->sc = NULL;
> > INIT_LIST_HEAD(>running);
> > @@ -1616,7 +1616,7 @@ static inline struct iscsi_task 
> > *iscsi_alloc_task(struct
> iscsi_conn *conn,
> > sc->SCp.phase = conn->session->age;
> > sc->SCp.ptr = (char *) task;
> >
> > -   atomic_set(>refcount, 1);
> > +   refcount_set(>refcount, 1);
> > task->state = ISCSI_TASK_PENDING;
> > task->conn = conn;
> > task->sc = sc;
> > diff --git a/drivers/scsi/qedi/qedi_iscsi.c b/drivers/scsi/qedi/qedi_iscsi.c
> > index b9f79d3..3895bd5 100644
> > --- a/drivers/scsi/qedi/qedi_iscsi.c
> > +++ b/drivers/scsi/qedi/qedi_iscsi.c
> > @@ -1372,7 +1372,7 @@ static void qedi_cleanup_task(struct iscsi_task *task)
> >  {
> > if (!task->sc || task->state == ISCSI_TASK_PENDING) {
> > QEDI_INFO(NULL, QEDI_LOG_IO, "Returning
> ref_cnt=%d\n",
> > - atomic_read(>refcount));
> > + refcount_read(>refcount));
> > return;
> > }
> >
> > diff --git a/include/scsi/libiscsi.h b/include/scsi/libiscsi.h
> > index b0e275d..24d74b5 100644
> > --- a/include/scsi/libiscsi.h
> > +++ b/include/scsi/libiscsi.h
> > @@ -29,6 +29,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  #include 
> >  #include 
> >  #include 
> > @@ -139,7 +140,7 @@ struct iscsi_task {
> >
> > /* state set/tested under session->lock */
> > int state;
> > -   atomic_trefcount;
> > +   refcount_t  refcount;
> > struct list_headrunning;/* running cmd list */
> > void*dd_data;   /*
> driver/transport data */
> >  };
> > --
> > 2.7.4
> >


RE: [Xen-devel] [PATCH 29/29] drivers, xen: convert grant_map.users from atomic_t to refcount_t

2017-03-08 Thread Reshetova, Elena

> On 03/08/2017 08:49 AM, Reshetova, Elena wrote:
> >> On 03/06/2017 09:21 AM, Elena Reshetova wrote:
> >>> refcount_t type and corresponding API should be
> >>> used instead of atomic_t when the variable is used as
> >>> a reference counter. This allows to avoid accidental
> >>> refcounter overflows that might lead to use-after-free
> >>> situations.
> >>>
> >>> Signed-off-by: Elena Reshetova 
> >>> Signed-off-by: Hans Liljestrand 
> >>> Signed-off-by: Kees Cook 
> >>> Signed-off-by: David Windsor 
> >>> ---
> >>>  drivers/xen/gntdev.c | 11 ++-
> >>>  1 file changed, 6 insertions(+), 5 deletions(-)
> >> Reviewed-by: Boris Ostrovsky 
> > Is there a tree that can take this change? Turns out it is better to 
> > propagate
> changes via separate trees and only leftovers can be taken via Greg's tree.
> >
> 
> Sure, we can take it via Xen tree for rc3.

Thank you very much!

Best Regards,
Elena.

> 
> -boris


RE: [PATCH 22/29] drivers, scsi: convert iscsi_task.refcount from atomic_t to refcount_t

2017-03-08 Thread Reshetova, Elena
> On Mon, Mar 06, 2017 at 04:21:09PM +0200, Elena Reshetova wrote:
> > refcount_t type and corresponding API should be
> > used instead of atomic_t when the variable is used as
> > a reference counter. This allows to avoid accidental
> > refcounter overflows that might lead to use-after-free
> > situations.
> >
> > Signed-off-by: Elena Reshetova 
> > Signed-off-by: Hans Liljestrand 
> > Signed-off-by: Kees Cook 
> > Signed-off-by: David Windsor 
> 
> This looks OK to me.
> 
> Acked-by: Chris Leech 

Thank you for review! Do you have a tree that can take this change? 

Best Regards,
Elena.

> 
> > ---
> >  drivers/scsi/libiscsi.c| 8 
> >  drivers/scsi/qedi/qedi_iscsi.c | 2 +-
> >  include/scsi/libiscsi.h| 3 ++-
> >  3 files changed, 7 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
> > index 834d121..7eb1d2c 100644
> > --- a/drivers/scsi/libiscsi.c
> > +++ b/drivers/scsi/libiscsi.c
> > @@ -516,13 +516,13 @@ static void iscsi_free_task(struct iscsi_task *task)
> >
> >  void __iscsi_get_task(struct iscsi_task *task)
> >  {
> > -   atomic_inc(>refcount);
> > +   refcount_inc(>refcount);
> >  }
> >  EXPORT_SYMBOL_GPL(__iscsi_get_task);
> >
> >  void __iscsi_put_task(struct iscsi_task *task)
> >  {
> > -   if (atomic_dec_and_test(>refcount))
> > +   if (refcount_dec_and_test(>refcount))
> > iscsi_free_task(task);
> >  }
> >  EXPORT_SYMBOL_GPL(__iscsi_put_task);
> > @@ -744,7 +744,7 @@ __iscsi_conn_send_pdu(struct iscsi_conn *conn, struct
> iscsi_hdr *hdr,
> >  * released by the lld when it has transmitted the task for
> >  * pdus we do not expect a response for.
> >  */
> > -   atomic_set(>refcount, 1);
> > +   refcount_set(>refcount, 1);
> > task->conn = conn;
> > task->sc = NULL;
> > INIT_LIST_HEAD(>running);
> > @@ -1616,7 +1616,7 @@ static inline struct iscsi_task 
> > *iscsi_alloc_task(struct
> iscsi_conn *conn,
> > sc->SCp.phase = conn->session->age;
> > sc->SCp.ptr = (char *) task;
> >
> > -   atomic_set(>refcount, 1);
> > +   refcount_set(>refcount, 1);
> > task->state = ISCSI_TASK_PENDING;
> > task->conn = conn;
> > task->sc = sc;
> > diff --git a/drivers/scsi/qedi/qedi_iscsi.c b/drivers/scsi/qedi/qedi_iscsi.c
> > index b9f79d3..3895bd5 100644
> > --- a/drivers/scsi/qedi/qedi_iscsi.c
> > +++ b/drivers/scsi/qedi/qedi_iscsi.c
> > @@ -1372,7 +1372,7 @@ static void qedi_cleanup_task(struct iscsi_task *task)
> >  {
> > if (!task->sc || task->state == ISCSI_TASK_PENDING) {
> > QEDI_INFO(NULL, QEDI_LOG_IO, "Returning
> ref_cnt=%d\n",
> > - atomic_read(>refcount));
> > + refcount_read(>refcount));
> > return;
> > }
> >
> > diff --git a/include/scsi/libiscsi.h b/include/scsi/libiscsi.h
> > index b0e275d..24d74b5 100644
> > --- a/include/scsi/libiscsi.h
> > +++ b/include/scsi/libiscsi.h
> > @@ -29,6 +29,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  #include 
> >  #include 
> >  #include 
> > @@ -139,7 +140,7 @@ struct iscsi_task {
> >
> > /* state set/tested under session->lock */
> > int state;
> > -   atomic_trefcount;
> > +   refcount_t  refcount;
> > struct list_headrunning;/* running cmd list */
> > void*dd_data;   /*
> driver/transport data */
> >  };
> > --
> > 2.7.4
> >


RE: [Xen-devel] [PATCH 29/29] drivers, xen: convert grant_map.users from atomic_t to refcount_t

2017-03-08 Thread Reshetova, Elena

> On 03/08/2017 08:49 AM, Reshetova, Elena wrote:
> >> On 03/06/2017 09:21 AM, Elena Reshetova wrote:
> >>> refcount_t type and corresponding API should be
> >>> used instead of atomic_t when the variable is used as
> >>> a reference counter. This allows to avoid accidental
> >>> refcounter overflows that might lead to use-after-free
> >>> situations.
> >>>
> >>> Signed-off-by: Elena Reshetova 
> >>> Signed-off-by: Hans Liljestrand 
> >>> Signed-off-by: Kees Cook 
> >>> Signed-off-by: David Windsor 
> >>> ---
> >>>  drivers/xen/gntdev.c | 11 ++-
> >>>  1 file changed, 6 insertions(+), 5 deletions(-)
> >> Reviewed-by: Boris Ostrovsky 
> > Is there a tree that can take this change? Turns out it is better to 
> > propagate
> changes via separate trees and only leftovers can be taken via Greg's tree.
> >
> 
> Sure, we can take it via Xen tree for rc3.

Thank you very much!

Best Regards,
Elena.

> 
> -boris


Re: [PATCH 1/3] usb: dwc3-omap: Fix missing break in dwc3_omap_set_mailbox()

2017-03-08 Thread Roger Quadros
Felipe,

On 15/02/17 13:38, Roger Quadros wrote:
> We need to break from all cases if we want to treat
> each one of them separately.
> 
> Reported-by: Gustavo A. R. Silva 
> Fixes: d2728fb3e01f ("usb: dwc3: omap: Pass VBUS and ID events transparently")
> Cc:  #v4.8+
> Signed-off-by: Roger Quadros 

Can you please pick this one for v4.11-rc? Thanks.

> ---
>  drivers/usb/dwc3/dwc3-omap.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/usb/dwc3/dwc3-omap.c b/drivers/usb/dwc3/dwc3-omap.c
> index eb1b9cb..35b6351 100644
> --- a/drivers/usb/dwc3/dwc3-omap.c
> +++ b/drivers/usb/dwc3/dwc3-omap.c
> @@ -250,6 +250,7 @@ static void dwc3_omap_set_mailbox(struct dwc3_omap *omap,
>   val = dwc3_omap_read_utmi_ctrl(omap);
>   val |= USBOTGSS_UTMI_OTG_CTRL_IDDIG;
>   dwc3_omap_write_utmi_ctrl(omap, val);
> + break;
>  
>   case OMAP_DWC3_VBUS_OFF:
>   val = dwc3_omap_read_utmi_ctrl(omap);
> 

-- 
cheers,
-roger


Re: [PATCH 1/3] usb: dwc3-omap: Fix missing break in dwc3_omap_set_mailbox()

2017-03-08 Thread Roger Quadros
Felipe,

On 15/02/17 13:38, Roger Quadros wrote:
> We need to break from all cases if we want to treat
> each one of them separately.
> 
> Reported-by: Gustavo A. R. Silva 
> Fixes: d2728fb3e01f ("usb: dwc3: omap: Pass VBUS and ID events transparently")
> Cc:  #v4.8+
> Signed-off-by: Roger Quadros 

Can you please pick this one for v4.11-rc? Thanks.

> ---
>  drivers/usb/dwc3/dwc3-omap.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/usb/dwc3/dwc3-omap.c b/drivers/usb/dwc3/dwc3-omap.c
> index eb1b9cb..35b6351 100644
> --- a/drivers/usb/dwc3/dwc3-omap.c
> +++ b/drivers/usb/dwc3/dwc3-omap.c
> @@ -250,6 +250,7 @@ static void dwc3_omap_set_mailbox(struct dwc3_omap *omap,
>   val = dwc3_omap_read_utmi_ctrl(omap);
>   val |= USBOTGSS_UTMI_OTG_CTRL_IDDIG;
>   dwc3_omap_write_utmi_ctrl(omap, val);
> + break;
>  
>   case OMAP_DWC3_VBUS_OFF:
>   val = dwc3_omap_read_utmi_ctrl(omap);
> 

-- 
cheers,
-roger


Re: [PATCH v2] selinux: check for address length in selinux_socket_bind()

2017-03-08 Thread David Miller
From: Alexander Potapenko 
Date: Mon,  6 Mar 2017 19:46:14 +0100

> KMSAN (KernelMemorySanitizer, a new error detection tool) reports use of
> uninitialized memory in selinux_socket_bind():
 ...
> (the line numbers are relative to 4.8-rc6, but the bug persists upstream)
> 
> , when I run the following program as root:
 ...
> (for different values of |size| other error reports are printed).
> 
> This happens because bind() unconditionally copies |size| bytes of
> |addr| to the kernel, leaving the rest uninitialized. Then
> security_socket_bind() reads the IP address bytes, including the
> uninitialized ones, to determine the port, or e.g. pass them further to
> sel_netnode_find(), which uses them to calculate a hash.
> 
> Signed-off-by: Alexander Potapenko 

Are the SELINUX folks going to pick this up or should I?


Re: [PATCH v5 04/19] net: e100: Replace PCI pool old API

2017-03-08 Thread Peter Senna Tschudin
On Wed, Mar 08, 2017 at 02:40:25PM -0800, Jeff Kirsher wrote:
> On Wed, 2017-03-08 at 17:19 +0100, Romain Perier wrote:
> > The PCI pool API is deprecated. This commit replaces the PCI pool old
> > API by the appropriate function with the DMA pool API.
> > 
> > Signed-off-by: Romain Perier 
> > Acked-by: Peter Senna Tschudin 
> > Tested-by: Peter Senna Tschudin 
> > ---
> >  drivers/net/ethernet/intel/e100.c | 12 ++--
> >  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> Acked-by: Jeff Kirsher 
> 
> My only concern is:
> - what hardware did this get tested with?  Since this affects all e100
> parts, it would be hard to believe that all the affected hardware was
> used in testing.

This was tested by compilation only(See
https://lkml.org/lkml/2017/2/8/661). However this series removes macro
definitions of the old pci_pool interface and replace call sites by what
the macra was calling.

Here are the macros that this series removes from include/pci.h:

#define pci_pool dma_pool
#define pci_pool_create(name, pdev, size, align, allocation) \
dma_pool_create(name, >dev, size, align, allocation)
#define pci_pool_destroy(pool) dma_pool_destroy(pool)
#define pci_pool_alloc(pool, flags, handle) dma_pool_alloc(pool, flags, handle)
#define pci_pool_zalloc(pool, flags, handle) \
dma_pool_zalloc(pool, flags, handle)
#define pci_pool_free(pool, vaddr, addr) dma_pool_free(pool, vaddr, add

So this should not affect run time.


Re: [PATCH v2] selinux: check for address length in selinux_socket_bind()

2017-03-08 Thread David Miller
From: Alexander Potapenko 
Date: Mon,  6 Mar 2017 19:46:14 +0100

> KMSAN (KernelMemorySanitizer, a new error detection tool) reports use of
> uninitialized memory in selinux_socket_bind():
 ...
> (the line numbers are relative to 4.8-rc6, but the bug persists upstream)
> 
> , when I run the following program as root:
 ...
> (for different values of |size| other error reports are printed).
> 
> This happens because bind() unconditionally copies |size| bytes of
> |addr| to the kernel, leaving the rest uninitialized. Then
> security_socket_bind() reads the IP address bytes, including the
> uninitialized ones, to determine the port, or e.g. pass them further to
> sel_netnode_find(), which uses them to calculate a hash.
> 
> Signed-off-by: Alexander Potapenko 

Are the SELINUX folks going to pick this up or should I?


Re: [PATCH v5 04/19] net: e100: Replace PCI pool old API

2017-03-08 Thread Peter Senna Tschudin
On Wed, Mar 08, 2017 at 02:40:25PM -0800, Jeff Kirsher wrote:
> On Wed, 2017-03-08 at 17:19 +0100, Romain Perier wrote:
> > The PCI pool API is deprecated. This commit replaces the PCI pool old
> > API by the appropriate function with the DMA pool API.
> > 
> > Signed-off-by: Romain Perier 
> > Acked-by: Peter Senna Tschudin 
> > Tested-by: Peter Senna Tschudin 
> > ---
> >  drivers/net/ethernet/intel/e100.c | 12 ++--
> >  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> Acked-by: Jeff Kirsher 
> 
> My only concern is:
> - what hardware did this get tested with?  Since this affects all e100
> parts, it would be hard to believe that all the affected hardware was
> used in testing.

This was tested by compilation only(See
https://lkml.org/lkml/2017/2/8/661). However this series removes macro
definitions of the old pci_pool interface and replace call sites by what
the macra was calling.

Here are the macros that this series removes from include/pci.h:

#define pci_pool dma_pool
#define pci_pool_create(name, pdev, size, align, allocation) \
dma_pool_create(name, >dev, size, align, allocation)
#define pci_pool_destroy(pool) dma_pool_destroy(pool)
#define pci_pool_alloc(pool, flags, handle) dma_pool_alloc(pool, flags, handle)
#define pci_pool_zalloc(pool, flags, handle) \
dma_pool_zalloc(pool, flags, handle)
#define pci_pool_free(pool, vaddr, addr) dma_pool_free(pool, vaddr, add

So this should not affect run time.


[PATCH 3/6] mm/migrate: Add copy_pages_mthread function

2017-03-08 Thread Anshuman Khandual
From: Zi Yan 

This change adds a new function copy_pages_mthread to enable multi threaded
page copy which can be utilized during migration. This function splits the
page copy request into multiple threads which will handle individual chunk
and send them as jobs to system_highpri_wq work queue.

Signed-off-by: Zi Yan 
Signed-off-by: Anshuman Khandual 
---
* Updated cthread calculations, taking care of divide by zero issues,
  picking up the right single thread, defining NR_COPYTHREADS, fixing
  the build problem on i386 etc.

 include/linux/highmem.h |  2 ++
 mm/Makefile |  2 ++
 mm/copy_pages_mthread.c | 95 +
 3 files changed, 99 insertions(+)
 create mode 100644 mm/copy_pages_mthread.c

diff --git a/include/linux/highmem.h b/include/linux/highmem.h
index bb3f329..e1f4f1b 100644
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -236,6 +236,8 @@ static inline void copy_user_highpage(struct page *to, 
struct page *from,
 
 #endif
 
+int copy_pages_mthread(struct page *to, struct page *from, int nr_pages);
+
 static inline void copy_highpage(struct page *to, struct page *from)
 {
char *vfrom, *vto;
diff --git a/mm/Makefile b/mm/Makefile
index 295bd7a..cc27e76 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -41,6 +41,8 @@ obj-y := filemap.o mempool.o oom_kill.o \
 
 obj-y += init-mm.o
 
+obj-y += copy_pages_mthread.o
+
 ifdef CONFIG_NO_BOOTMEM
obj-y   += nobootmem.o
 else
diff --git a/mm/copy_pages_mthread.c b/mm/copy_pages_mthread.c
new file mode 100644
index 000..5af861c
--- /dev/null
+++ b/mm/copy_pages_mthread.c
@@ -0,0 +1,95 @@
+/*
+ * This implements parallel page copy function through multi
+ * threaded work queues.
+ *
+ * Copyright (C) Zi Yan , Nov 2016
+ *
+ * Licensed under the terms of the GNU GPL, version 2.
+ */
+#include 
+#include 
+#include 
+#include 
+
+/*
+ * NR_COPYTHREADS can be the highest number of threads for given
+ * node on any architecture. The actual number of copy threads
+ * will be limited by the cpumask weight of the target node.
+ */
+#define NR_COPYTHREADS 8
+
+struct copy_info {
+   struct work_struct copy_work;
+   char *to;
+   char *from;
+   unsigned long chunk_size;
+};
+
+static void copy_pages(char *vto, char *vfrom, unsigned long size)
+{
+   memcpy(vto, vfrom, size);
+}
+
+static void copythread(struct work_struct *work)
+{
+   struct copy_info *info = (struct copy_info *) work;
+
+   copy_pages(info->to, info->from, info->chunk_size);
+}
+
+int copy_pages_mthread(struct page *to, struct page *from, int nr_pages)
+{
+   struct cpumask *cpumask;
+   struct copy_info *work_items;
+   char *vto, *vfrom;
+   unsigned long i, cthreads, cpu, node, chunk_size;
+   int cpu_id_list[NR_COPYTHREADS] = {0};
+
+   node = page_to_nid(to);
+   cpumask = (struct cpumask *) cpumask_of_node(node);
+   cthreads = min_t(unsigned int, NR_COPYTHREADS, cpumask_weight(cpumask));
+   cthreads = (cthreads / 2) * 2;
+   if (!cthreads)
+   cthreads = 1;
+
+   work_items = kcalloc(cthreads, sizeof(struct copy_info), GFP_KERNEL);
+   if (!work_items)
+   return -ENOMEM;
+
+   /*
+* XXX: On a memory-only CPU-less NUMA node it will
+* just fallback using cpu[0] in a single threaded
+* manner to do the page copy. On single CPU target
+* node that CPU will be used for the page copy.
+*/
+   i = 0;
+   for_each_cpu(cpu, cpumask) {
+   if (i >= cthreads)
+   break;
+   cpu_id_list[i] = cpu;
+   ++i;
+   }
+
+   vfrom = kmap(from);
+   vto = kmap(to);
+   chunk_size = PAGE_SIZE * nr_pages / cthreads;
+
+   for (i = 0; i < cthreads; ++i) {
+   INIT_WORK((struct work_struct *) _items[i], copythread);
+
+   work_items[i].to = vto + i * chunk_size;
+   work_items[i].from = vfrom + i * chunk_size;
+   work_items[i].chunk_size = chunk_size;
+
+   queue_work_on(cpu_id_list[i], system_highpri_wq,
+   (struct work_struct *) _items[i]);
+   }
+
+   for (i = 0; i < cthreads; ++i)
+   flush_work((struct work_struct *) _items[i]);
+
+   kunmap(to);
+   kunmap(from);
+   kfree(work_items);
+   return 0;
+}
-- 
2.1.4



Re: [PATCH v7 kernel 2/5] virtio-balloon: VIRTIO_BALLOON_F_CHUNK_TRANSFER

2017-03-08 Thread Wei Wang

On 03/08/2017 12:01 PM, Michael S. Tsirkin wrote:

On Fri, Mar 03, 2017 at 01:40:27PM +0800, Wei Wang wrote:

From: Liang Li 

Add a new feature bit, VIRTIO_BALLOON_F_CHUNK_TRANSFER. Please check
the implementation patch commit for details about this feature.


better squash into next patch.


OK, will do.

Best,
Wei


[PATCH 3/6] mm/migrate: Add copy_pages_mthread function

2017-03-08 Thread Anshuman Khandual
From: Zi Yan 

This change adds a new function copy_pages_mthread to enable multi threaded
page copy which can be utilized during migration. This function splits the
page copy request into multiple threads which will handle individual chunk
and send them as jobs to system_highpri_wq work queue.

Signed-off-by: Zi Yan 
Signed-off-by: Anshuman Khandual 
---
* Updated cthread calculations, taking care of divide by zero issues,
  picking up the right single thread, defining NR_COPYTHREADS, fixing
  the build problem on i386 etc.

 include/linux/highmem.h |  2 ++
 mm/Makefile |  2 ++
 mm/copy_pages_mthread.c | 95 +
 3 files changed, 99 insertions(+)
 create mode 100644 mm/copy_pages_mthread.c

diff --git a/include/linux/highmem.h b/include/linux/highmem.h
index bb3f329..e1f4f1b 100644
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -236,6 +236,8 @@ static inline void copy_user_highpage(struct page *to, 
struct page *from,
 
 #endif
 
+int copy_pages_mthread(struct page *to, struct page *from, int nr_pages);
+
 static inline void copy_highpage(struct page *to, struct page *from)
 {
char *vfrom, *vto;
diff --git a/mm/Makefile b/mm/Makefile
index 295bd7a..cc27e76 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -41,6 +41,8 @@ obj-y := filemap.o mempool.o oom_kill.o \
 
 obj-y += init-mm.o
 
+obj-y += copy_pages_mthread.o
+
 ifdef CONFIG_NO_BOOTMEM
obj-y   += nobootmem.o
 else
diff --git a/mm/copy_pages_mthread.c b/mm/copy_pages_mthread.c
new file mode 100644
index 000..5af861c
--- /dev/null
+++ b/mm/copy_pages_mthread.c
@@ -0,0 +1,95 @@
+/*
+ * This implements parallel page copy function through multi
+ * threaded work queues.
+ *
+ * Copyright (C) Zi Yan , Nov 2016
+ *
+ * Licensed under the terms of the GNU GPL, version 2.
+ */
+#include 
+#include 
+#include 
+#include 
+
+/*
+ * NR_COPYTHREADS can be the highest number of threads for given
+ * node on any architecture. The actual number of copy threads
+ * will be limited by the cpumask weight of the target node.
+ */
+#define NR_COPYTHREADS 8
+
+struct copy_info {
+   struct work_struct copy_work;
+   char *to;
+   char *from;
+   unsigned long chunk_size;
+};
+
+static void copy_pages(char *vto, char *vfrom, unsigned long size)
+{
+   memcpy(vto, vfrom, size);
+}
+
+static void copythread(struct work_struct *work)
+{
+   struct copy_info *info = (struct copy_info *) work;
+
+   copy_pages(info->to, info->from, info->chunk_size);
+}
+
+int copy_pages_mthread(struct page *to, struct page *from, int nr_pages)
+{
+   struct cpumask *cpumask;
+   struct copy_info *work_items;
+   char *vto, *vfrom;
+   unsigned long i, cthreads, cpu, node, chunk_size;
+   int cpu_id_list[NR_COPYTHREADS] = {0};
+
+   node = page_to_nid(to);
+   cpumask = (struct cpumask *) cpumask_of_node(node);
+   cthreads = min_t(unsigned int, NR_COPYTHREADS, cpumask_weight(cpumask));
+   cthreads = (cthreads / 2) * 2;
+   if (!cthreads)
+   cthreads = 1;
+
+   work_items = kcalloc(cthreads, sizeof(struct copy_info), GFP_KERNEL);
+   if (!work_items)
+   return -ENOMEM;
+
+   /*
+* XXX: On a memory-only CPU-less NUMA node it will
+* just fallback using cpu[0] in a single threaded
+* manner to do the page copy. On single CPU target
+* node that CPU will be used for the page copy.
+*/
+   i = 0;
+   for_each_cpu(cpu, cpumask) {
+   if (i >= cthreads)
+   break;
+   cpu_id_list[i] = cpu;
+   ++i;
+   }
+
+   vfrom = kmap(from);
+   vto = kmap(to);
+   chunk_size = PAGE_SIZE * nr_pages / cthreads;
+
+   for (i = 0; i < cthreads; ++i) {
+   INIT_WORK((struct work_struct *) _items[i], copythread);
+
+   work_items[i].to = vto + i * chunk_size;
+   work_items[i].from = vfrom + i * chunk_size;
+   work_items[i].chunk_size = chunk_size;
+
+   queue_work_on(cpu_id_list[i], system_highpri_wq,
+   (struct work_struct *) _items[i]);
+   }
+
+   for (i = 0; i < cthreads; ++i)
+   flush_work((struct work_struct *) _items[i]);
+
+   kunmap(to);
+   kunmap(from);
+   kfree(work_items);
+   return 0;
+}
-- 
2.1.4



Re: [PATCH v7 kernel 2/5] virtio-balloon: VIRTIO_BALLOON_F_CHUNK_TRANSFER

2017-03-08 Thread Wei Wang

On 03/08/2017 12:01 PM, Michael S. Tsirkin wrote:

On Fri, Mar 03, 2017 at 01:40:27PM +0800, Wei Wang wrote:

From: Liang Li 

Add a new feature bit, VIRTIO_BALLOON_F_CHUNK_TRANSFER. Please check
the implementation patch commit for details about this feature.


better squash into next patch.


OK, will do.

Best,
Wei


Re: [PATCH 3/6] dax: add tracepoint infrastructure, PMD tracing

2017-03-08 Thread Mike Marshall
This is a reply to a thread from back in Nov 2016...

Linus> The thing is, with function tracing, you *can* get the return value
Linus> and arguments. Sure, you'll probably need to write eBPF and just
Linus> attach it to that fentry call point, and yes, if something is inlined
Linus> you're just screwed, but Christ, if you do debugging that way you
Linus> shouldn't be writing kernel code in the first place.

I've been thinking about this thread ever since then... lately I've been
reading about eBPF and looking at Brendan Gregg's bcc tool, and
reading about kprobes and I made some test tracepoints too...

Orangefs has a function:

orangefs_create(struct inode *dir,
struct dentry *dentry,
umode_t mode,
bool exclusive)

I found it was easy to use a kprobe to get a function's return code:

echo 'r:orangefs_create_retprobe orangefs_create $retval' >>
/sys/kernel/debug/tracing/kprobe_events

It was also easy to use a kprobe to print the arguments to a function,
but since arguments are often pointers, maybe the pointer value isn't
really what you want to see, rather you might wish you could "see into"
a structure that you have a pointer to...

Here's a kprobe for looking at the arguments to a function, the funny
di, si, dx, stack
stuff has to do with a particular architecture's parameter passing mechanism as
it relates to registers and the stack... I mention that because the
example in kprobetrace.txt
seems to be for a 32-bit computer and I'm (like most of us?) on a
64-bit computer.

echo 'p:orangefs_create_probe orangefs_create dir=%di dentry=%si
mode=%dx exclusive=+4($stack)' >
/sys/kernel/debug/tracing/kprobe_events

I wrote a bcc program to extract the name of an object from the dentry struct
passed into orangefs_create. The bcc program is kind of verbose, but is
mostly "templatey", here it is... is this the kind of thing Linus was
talking about?

Either way, I think it is kind of cool ...

#!/usr/bin/python
#
# Brendan Gregg's mdflush used as a template.
#

from __future__ import print_function
from bcc import BPF
from time import strftime
import ctypes as ct

# load BPF program
b = BPF(text="""
#include 
#include 
#include 
#include 
#include 

struct data_t {
u64 pid;
char comm[TASK_COMM_LEN];
char objname[NAME_MAX];
};
BPF_PERF_OUTPUT(events);

int kprobe__orangefs_create(struct pt_regs *ctx,
void *dir,
struct dentry *dentry)
{
struct data_t data = {};
u32 pid = bpf_get_current_pid_tgid();
data.pid = pid;
bpf_get_current_comm(, sizeof(data.comm));
bpf_probe_read(,
   sizeof(data.objname),
   (void *)dentry->d_name.name);
events.perf_submit(ctx, , sizeof(data));
return 0;
}
""")

# event data
TASK_COMM_LEN = 16  # linux/sched.h
NAME_MAX = 255  # uapi/linux/limits.h
class Data(ct.Structure):
_fields_ = [
("pid", ct.c_ulonglong),
("comm", ct.c_char * TASK_COMM_LEN),
("objname", ct.c_char * NAME_MAX)
]

# header
print("orangefs creates... Hit Ctrl-C to end.")
print("%-8s %-6s %-16s %s" % ("TIME", "PID", "COMM", "OBJNAME"))

# process event
# print_event is the callback invoked from open_perf_buffers...
# cpu refers to a set of per-cpu ring buffers that will receive the event data.
def print_event(cpu, data, size):
event = ct.cast(data, ct.POINTER(Data)).contents
print("%-8s %-6d %-16s %s" % (strftime("%H:%M:%S"), event.pid, event.comm,
event.objname))

# read events
b["events"].open_perf_buffer(print_event)
while 1:
b.kprobe_poll()




-Mike

On Mon, Nov 28, 2016 at 9:02 PM, Dave Chinner  wrote:
> On Mon, Nov 28, 2016 at 03:46:51PM -0700, Ross Zwisler wrote:
>> On Fri, Nov 25, 2016 at 02:00:59PM +1100, Dave Chinner wrote:
>> > On Wed, Nov 23, 2016 at 11:44:19AM -0700, Ross Zwisler wrote:
>> > > Tracepoints are the standard way to capture debugging and tracing
>> > > information in many parts of the kernel, including the XFS and ext4
>> > > filesystems.  Create a tracepoint header for FS DAX and add the first DAX
>> > > tracepoints to the PMD fault handler.  This allows the tracing for DAX to
>> > > be done in the same way as the filesystem tracing so that developers can
>> > > look at them together and get a coherent idea of what the system is 
>> > > doing.
>> > >
>> > > I added both an entry and exit tracepoint because future patches will add
>> > > tracepoints to child functions of dax_iomap_pmd_fault() like
>> > > dax_pmd_load_hole() and dax_pmd_insert_mapping(). We want those messages 
>> > > to
>> > > be wrapped by the parent function tracepoints so the code flow is more
>> > > easily understood.  Having entry and exit tracepoints for faults also
>> > > allows us to easily see what filesystems functions were called during the
>> > > fault.  These filesystem functions get executed via iomap_begin() and
>> > > iomap_end() calls, for 

Re: [PATCH 3/6] dax: add tracepoint infrastructure, PMD tracing

2017-03-08 Thread Mike Marshall
This is a reply to a thread from back in Nov 2016...

Linus> The thing is, with function tracing, you *can* get the return value
Linus> and arguments. Sure, you'll probably need to write eBPF and just
Linus> attach it to that fentry call point, and yes, if something is inlined
Linus> you're just screwed, but Christ, if you do debugging that way you
Linus> shouldn't be writing kernel code in the first place.

I've been thinking about this thread ever since then... lately I've been
reading about eBPF and looking at Brendan Gregg's bcc tool, and
reading about kprobes and I made some test tracepoints too...

Orangefs has a function:

orangefs_create(struct inode *dir,
struct dentry *dentry,
umode_t mode,
bool exclusive)

I found it was easy to use a kprobe to get a function's return code:

echo 'r:orangefs_create_retprobe orangefs_create $retval' >>
/sys/kernel/debug/tracing/kprobe_events

It was also easy to use a kprobe to print the arguments to a function,
but since arguments are often pointers, maybe the pointer value isn't
really what you want to see, rather you might wish you could "see into"
a structure that you have a pointer to...

Here's a kprobe for looking at the arguments to a function, the funny
di, si, dx, stack
stuff has to do with a particular architecture's parameter passing mechanism as
it relates to registers and the stack... I mention that because the
example in kprobetrace.txt
seems to be for a 32-bit computer and I'm (like most of us?) on a
64-bit computer.

echo 'p:orangefs_create_probe orangefs_create dir=%di dentry=%si
mode=%dx exclusive=+4($stack)' >
/sys/kernel/debug/tracing/kprobe_events

I wrote a bcc program to extract the name of an object from the dentry struct
passed into orangefs_create. The bcc program is kind of verbose, but is
mostly "templatey", here it is... is this the kind of thing Linus was
talking about?

Either way, I think it is kind of cool ...

#!/usr/bin/python
#
# Brendan Gregg's mdflush used as a template.
#

from __future__ import print_function
from bcc import BPF
from time import strftime
import ctypes as ct

# load BPF program
b = BPF(text="""
#include 
#include 
#include 
#include 
#include 

struct data_t {
u64 pid;
char comm[TASK_COMM_LEN];
char objname[NAME_MAX];
};
BPF_PERF_OUTPUT(events);

int kprobe__orangefs_create(struct pt_regs *ctx,
void *dir,
struct dentry *dentry)
{
struct data_t data = {};
u32 pid = bpf_get_current_pid_tgid();
data.pid = pid;
bpf_get_current_comm(, sizeof(data.comm));
bpf_probe_read(,
   sizeof(data.objname),
   (void *)dentry->d_name.name);
events.perf_submit(ctx, , sizeof(data));
return 0;
}
""")

# event data
TASK_COMM_LEN = 16  # linux/sched.h
NAME_MAX = 255  # uapi/linux/limits.h
class Data(ct.Structure):
_fields_ = [
("pid", ct.c_ulonglong),
("comm", ct.c_char * TASK_COMM_LEN),
("objname", ct.c_char * NAME_MAX)
]

# header
print("orangefs creates... Hit Ctrl-C to end.")
print("%-8s %-6s %-16s %s" % ("TIME", "PID", "COMM", "OBJNAME"))

# process event
# print_event is the callback invoked from open_perf_buffers...
# cpu refers to a set of per-cpu ring buffers that will receive the event data.
def print_event(cpu, data, size):
event = ct.cast(data, ct.POINTER(Data)).contents
print("%-8s %-6d %-16s %s" % (strftime("%H:%M:%S"), event.pid, event.comm,
event.objname))

# read events
b["events"].open_perf_buffer(print_event)
while 1:
b.kprobe_poll()




-Mike

On Mon, Nov 28, 2016 at 9:02 PM, Dave Chinner  wrote:
> On Mon, Nov 28, 2016 at 03:46:51PM -0700, Ross Zwisler wrote:
>> On Fri, Nov 25, 2016 at 02:00:59PM +1100, Dave Chinner wrote:
>> > On Wed, Nov 23, 2016 at 11:44:19AM -0700, Ross Zwisler wrote:
>> > > Tracepoints are the standard way to capture debugging and tracing
>> > > information in many parts of the kernel, including the XFS and ext4
>> > > filesystems.  Create a tracepoint header for FS DAX and add the first DAX
>> > > tracepoints to the PMD fault handler.  This allows the tracing for DAX to
>> > > be done in the same way as the filesystem tracing so that developers can
>> > > look at them together and get a coherent idea of what the system is 
>> > > doing.
>> > >
>> > > I added both an entry and exit tracepoint because future patches will add
>> > > tracepoints to child functions of dax_iomap_pmd_fault() like
>> > > dax_pmd_load_hole() and dax_pmd_insert_mapping(). We want those messages 
>> > > to
>> > > be wrapped by the parent function tracepoints so the code flow is more
>> > > easily understood.  Having entry and exit tracepoints for faults also
>> > > allows us to easily see what filesystems functions were called during the
>> > > fault.  These filesystem functions get executed via iomap_begin() and
>> > > iomap_end() calls, for example, and will have 

RE: [PATCH 3/5] fs, xfs: convert xlog_ticket.t_ref from atomic_t to refcount_t

2017-03-08 Thread Reshetova, Elena

> On Wed, Mar 8, 2017 at 7:50 AM, Christoph Hellwig  wrote:
> >> - ASSERT(atomic_read(>t_ref) > 0);
> >> - atomic_inc(>t_ref);
> >> + ASSERT(refcount_read(>t_ref) > 0);
> >> + refcount_inc(>t_ref);
> >
> > With strict refcount semantics refcount_inc should check that
> > the count is larger than 0, otherwise we'd need to use
> > recount_inc_not_zero or whatever you're going to call it.
> >
> > Is that something the recount code does / could do?
> 
> Yes, refcount_inc() will not increment from 0 and WARNs. It looks like
> xfs's ASSERT is also a warn (though with XFS-specific formatting), so
> perhaps the ASSERT could be dropped? IIUC, Elena's approach to these
> changes was to be conservative about removing the existing checks.

I am removing the existing WARNs now where they are 100% not needed, but 
leaving the ones like this ASSERT, 
because I am thinking that you might have test cases that are dependent on 
these ASSERTs with specific formatting and I don't want to break them. 
If you don't need them, I can remove them also. 

Best Regards,
Elena

> 
> -Kees
> 
> --
> Kees Cook
> Pixel Security


RE: [PATCH 3/5] fs, xfs: convert xlog_ticket.t_ref from atomic_t to refcount_t

2017-03-08 Thread Reshetova, Elena

> On Wed, Mar 8, 2017 at 7:50 AM, Christoph Hellwig  wrote:
> >> - ASSERT(atomic_read(>t_ref) > 0);
> >> - atomic_inc(>t_ref);
> >> + ASSERT(refcount_read(>t_ref) > 0);
> >> + refcount_inc(>t_ref);
> >
> > With strict refcount semantics refcount_inc should check that
> > the count is larger than 0, otherwise we'd need to use
> > recount_inc_not_zero or whatever you're going to call it.
> >
> > Is that something the recount code does / could do?
> 
> Yes, refcount_inc() will not increment from 0 and WARNs. It looks like
> xfs's ASSERT is also a warn (though with XFS-specific formatting), so
> perhaps the ASSERT could be dropped? IIUC, Elena's approach to these
> changes was to be conservative about removing the existing checks.

I am removing the existing WARNs now where they are 100% not needed, but 
leaving the ones like this ASSERT, 
because I am thinking that you might have test cases that are dependent on 
these ASSERTs with specific formatting and I don't want to break them. 
If you don't need them, I can remove them also. 

Best Regards,
Elena

> 
> -Kees
> 
> --
> Kees Cook
> Pixel Security


[PATCH 3/3] Documentation: cpu-freq: cpu-drivers: Fix repetition of word 'to'

2017-03-08 Thread sayli karnik
The patch replaces 'to to' with 'to' in the documentation.

Signed-off-by: sayli karnik 
---
 Documentation/cpu-freq/cpu-drivers.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/cpu-freq/cpu-drivers.txt 
b/Documentation/cpu-freq/cpu-drivers.txt
index f71e6be..434c49c 100644
--- a/Documentation/cpu-freq/cpu-drivers.txt
+++ b/Documentation/cpu-freq/cpu-drivers.txt
@@ -231,7 +231,7 @@ the reference implementation in drivers/cpufreq/longrun.c
 Only for drivers with target_index() and CPUFREQ_ASYNC_NOTIFICATION unset.
 
 get_intermediate should return a stable intermediate frequency platform wants 
to
-switch to, and target_intermediate() should set CPU to to that frequency, 
before
+switch to, and target_intermediate() should set CPU to that frequency, before
 jumping to the frequency corresponding to 'index'. Core will take care of
 sending notifications and driver doesn't have to handle them in
 target_intermediate() or target_index().
-- 
2.7.4



[PATCH 3/3] Documentation: cpu-freq: cpu-drivers: Fix repetition of word 'to'

2017-03-08 Thread sayli karnik
The patch replaces 'to to' with 'to' in the documentation.

Signed-off-by: sayli karnik 
---
 Documentation/cpu-freq/cpu-drivers.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/cpu-freq/cpu-drivers.txt 
b/Documentation/cpu-freq/cpu-drivers.txt
index f71e6be..434c49c 100644
--- a/Documentation/cpu-freq/cpu-drivers.txt
+++ b/Documentation/cpu-freq/cpu-drivers.txt
@@ -231,7 +231,7 @@ the reference implementation in drivers/cpufreq/longrun.c
 Only for drivers with target_index() and CPUFREQ_ASYNC_NOTIFICATION unset.
 
 get_intermediate should return a stable intermediate frequency platform wants 
to
-switch to, and target_intermediate() should set CPU to to that frequency, 
before
+switch to, and target_intermediate() should set CPU to that frequency, before
 jumping to the frequency corresponding to 'index'. Core will take care of
 sending notifications and driver doesn't have to handle them in
 target_intermediate() or target_index().
-- 
2.7.4



[PATCH 0/3] Fix unintentional repetition of words

2017-03-08 Thread sayli karnik
Fix typos in the form of consecutive repetition of words.

sayli karnik (3):
  Documentation: phy: Fix repetition of word 'the'
  Documentation: ABI: testing: sysfs-bus-pci: Fix repetition of word
'the'
  Documentation: cpu-freq: cpu-drivers: Fix repetition of word 'to'

 Documentation/ABI/testing/sysfs-bus-pci | 2 +-
 Documentation/cpu-freq/cpu-drivers.txt  | 2 +-
 Documentation/phy.txt   | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

-- 
2.7.4



[PATCH 0/3] Fix unintentional repetition of words

2017-03-08 Thread sayli karnik
Fix typos in the form of consecutive repetition of words.

sayli karnik (3):
  Documentation: phy: Fix repetition of word 'the'
  Documentation: ABI: testing: sysfs-bus-pci: Fix repetition of word
'the'
  Documentation: cpu-freq: cpu-drivers: Fix repetition of word 'to'

 Documentation/ABI/testing/sysfs-bus-pci | 2 +-
 Documentation/cpu-freq/cpu-drivers.txt  | 2 +-
 Documentation/phy.txt   | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

-- 
2.7.4



[PATCH 2/3] Documentation: ABI: testing: sysfs-bus-pci: Fix repetition of word 'the'

2017-03-08 Thread sayli karnik
The patch replaces 'the the' with 'the' in the documentation.

Signed-off-by: sayli karnik 
---
 Documentation/ABI/testing/sysfs-bus-pci | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/ABI/testing/sysfs-bus-pci 
b/Documentation/ABI/testing/sysfs-bus-pci
index 5a1732b..e4e9010 100644
--- a/Documentation/ABI/testing/sysfs-bus-pci
+++ b/Documentation/ABI/testing/sysfs-bus-pci
@@ -299,5 +299,5 @@ What:   /sys/bus/pci/devices/.../revision
 Date:  November 2016
 Contact:   Emil Velikov 
 Description:
-   This file contains the revision field of the the PCI device.
+   This file contains the revision field of the PCI device.
The value comes from device config space. The file is read only.
-- 
2.7.4



[PATCH 2/3] Documentation: ABI: testing: sysfs-bus-pci: Fix repetition of word 'the'

2017-03-08 Thread sayli karnik
The patch replaces 'the the' with 'the' in the documentation.

Signed-off-by: sayli karnik 
---
 Documentation/ABI/testing/sysfs-bus-pci | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/ABI/testing/sysfs-bus-pci 
b/Documentation/ABI/testing/sysfs-bus-pci
index 5a1732b..e4e9010 100644
--- a/Documentation/ABI/testing/sysfs-bus-pci
+++ b/Documentation/ABI/testing/sysfs-bus-pci
@@ -299,5 +299,5 @@ What:   /sys/bus/pci/devices/.../revision
 Date:  November 2016
 Contact:   Emil Velikov 
 Description:
-   This file contains the revision field of the the PCI device.
+   This file contains the revision field of the PCI device.
The value comes from device config space. The file is read only.
-- 
2.7.4



[PATCH 1/3] Documentation: phy: Fix repetition of word 'the'

2017-03-08 Thread sayli karnik
The patch replaces 'the the' with 'the' in the documantation.

Signed-off-by: sayli karnik 
---
 Documentation/phy.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/phy.txt b/Documentation/phy.txt
index 0aa994b..383cdd8 100644
--- a/Documentation/phy.txt
+++ b/Documentation/phy.txt
@@ -97,7 +97,7 @@ should contain the phy name as given in the dt data and in 
the case of
 non-dt boot, it should contain the label of the PHY.  The two
 devm_phy_get associates the device with the PHY using devres on
 successful PHY get. On driver detach, release function is invoked on
-the the devres data and devres data is freed. phy_optional_get and
+the devres data and devres data is freed. phy_optional_get and
 devm_phy_optional_get should be used when the phy is optional. These
 two functions will never return -ENODEV, but instead returns NULL when
 the phy cannot be found.Some generic drivers, such as ehci, may use multiple
-- 
2.7.4



[PATCH 1/3] Documentation: phy: Fix repetition of word 'the'

2017-03-08 Thread sayli karnik
The patch replaces 'the the' with 'the' in the documantation.

Signed-off-by: sayli karnik 
---
 Documentation/phy.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/phy.txt b/Documentation/phy.txt
index 0aa994b..383cdd8 100644
--- a/Documentation/phy.txt
+++ b/Documentation/phy.txt
@@ -97,7 +97,7 @@ should contain the phy name as given in the dt data and in 
the case of
 non-dt boot, it should contain the label of the PHY.  The two
 devm_phy_get associates the device with the PHY using devres on
 successful PHY get. On driver detach, release function is invoked on
-the the devres data and devres data is freed. phy_optional_get and
+the devres data and devres data is freed. phy_optional_get and
 devm_phy_optional_get should be used when the phy is optional. These
 two functions will never return -ENODEV, but instead returns NULL when
 the phy cannot be found.Some generic drivers, such as ehci, may use multiple
-- 
2.7.4



Re: [PATCH] pinctrl: samsung: fix segfault when using external interrupts on s3c24xx

2017-03-08 Thread Krzysztof Kozlowski
On Thu, Mar 9, 2017 at 7:56 AM, Tomasz Figa  wrote:
> 2017-03-09 1:34 GMT+09:00 Krzysztof Kozlowski :
>> On Mon, Mar 06, 2017 at 09:15:16AM -0400, Sergio Prado wrote:
>>> Hi Krzysztof,
>>>
>>> > > This is a regression from commit 
>>> > > 8b1bd11c1f8f529057369c5b3702d13fd24e2765.
>>> >
>>> > Checkpatch should complain here about commit format.
>>> >
>>> > >
>>> > > Tested on FriendlyARM mini2440.
>>> > >
>>> >
>>> > Please add:
>>> >   Fixes: 8b1bd11c1f8f ("pinctrl: samsung: Add the support the multiple 
>>> > IORESOURCE_MEM for one pin-bank")
>>> >   Cc: 
>>> >
>>>
>>> OK.
>>>
>>> > > Signed-off-by: Sergio Prado 
>>> > > ---
>>> > >  drivers/pinctrl/samsung/pinctrl-s3c24xx.c | 4 ++--
>>> > >  1 file changed, 2 insertions(+), 2 deletions(-)
>>> > >
>>> > > diff --git a/drivers/pinctrl/samsung/pinctrl-s3c24xx.c 
>>> > > b/drivers/pinctrl/samsung/pinctrl-s3c24xx.c
>>> > > index b82a003546ae..1b8d887796e8 100644
>>> > > --- a/drivers/pinctrl/samsung/pinctrl-s3c24xx.c
>>> > > +++ b/drivers/pinctrl/samsung/pinctrl-s3c24xx.c
>>> > > @@ -356,8 +356,8 @@ static inline void s3c24xx_demux_eint(struct 
>>> > > irq_desc *desc,
>>> > >  {
>>> > >   struct s3c24xx_eint_data *data = irq_desc_get_handler_data(desc);
>>> > >   struct irq_chip *chip = irq_desc_get_chip(desc);
>>> > > - struct irq_data *irqd = irq_desc_get_irq_data(desc);
>>> > > - struct samsung_pin_bank *bank = irq_data_get_irq_chip_data(irqd);
>>> > > + struct samsung_pinctrl_drv_data *d = data->drvdata;
>>> > > + struct samsung_pin_bank *bank = d->pin_banks;
>>> >
>>> > I think 'pin_banks' point to all banks of given controller not to the
>>> > currently accessed one.
>>>
>>> Understood. I think it worked in my tests because on s3c2440 all banks
>>> have the same eint base address.
>>>
>>> So what do you think is the best approach to solve this problem?
>>
>> Maybe you can get to this through:
>> s3c24xx_eint_domain_data = 
>> s3c24xx_eint_data->domains[virq].host_data;
>> s3c24xx_eint_domain_data->bank
>>
>> It is getting slightly more complicated...
>
> How about the suggestions I made in my reply from March 4 (JST)?

Yes, this also looks like solution. I am not sure how much you would
like to revert but wouldn't it create duplicated members in pinctrl
structures? One for Exynos and other for S3C?

Best regards,
Krzysztof


Re: [PATCH] pinctrl: samsung: fix segfault when using external interrupts on s3c24xx

2017-03-08 Thread Krzysztof Kozlowski
On Thu, Mar 9, 2017 at 7:56 AM, Tomasz Figa  wrote:
> 2017-03-09 1:34 GMT+09:00 Krzysztof Kozlowski :
>> On Mon, Mar 06, 2017 at 09:15:16AM -0400, Sergio Prado wrote:
>>> Hi Krzysztof,
>>>
>>> > > This is a regression from commit 
>>> > > 8b1bd11c1f8f529057369c5b3702d13fd24e2765.
>>> >
>>> > Checkpatch should complain here about commit format.
>>> >
>>> > >
>>> > > Tested on FriendlyARM mini2440.
>>> > >
>>> >
>>> > Please add:
>>> >   Fixes: 8b1bd11c1f8f ("pinctrl: samsung: Add the support the multiple 
>>> > IORESOURCE_MEM for one pin-bank")
>>> >   Cc: 
>>> >
>>>
>>> OK.
>>>
>>> > > Signed-off-by: Sergio Prado 
>>> > > ---
>>> > >  drivers/pinctrl/samsung/pinctrl-s3c24xx.c | 4 ++--
>>> > >  1 file changed, 2 insertions(+), 2 deletions(-)
>>> > >
>>> > > diff --git a/drivers/pinctrl/samsung/pinctrl-s3c24xx.c 
>>> > > b/drivers/pinctrl/samsung/pinctrl-s3c24xx.c
>>> > > index b82a003546ae..1b8d887796e8 100644
>>> > > --- a/drivers/pinctrl/samsung/pinctrl-s3c24xx.c
>>> > > +++ b/drivers/pinctrl/samsung/pinctrl-s3c24xx.c
>>> > > @@ -356,8 +356,8 @@ static inline void s3c24xx_demux_eint(struct 
>>> > > irq_desc *desc,
>>> > >  {
>>> > >   struct s3c24xx_eint_data *data = irq_desc_get_handler_data(desc);
>>> > >   struct irq_chip *chip = irq_desc_get_chip(desc);
>>> > > - struct irq_data *irqd = irq_desc_get_irq_data(desc);
>>> > > - struct samsung_pin_bank *bank = irq_data_get_irq_chip_data(irqd);
>>> > > + struct samsung_pinctrl_drv_data *d = data->drvdata;
>>> > > + struct samsung_pin_bank *bank = d->pin_banks;
>>> >
>>> > I think 'pin_banks' point to all banks of given controller not to the
>>> > currently accessed one.
>>>
>>> Understood. I think it worked in my tests because on s3c2440 all banks
>>> have the same eint base address.
>>>
>>> So what do you think is the best approach to solve this problem?
>>
>> Maybe you can get to this through:
>> s3c24xx_eint_domain_data = 
>> s3c24xx_eint_data->domains[virq].host_data;
>> s3c24xx_eint_domain_data->bank
>>
>> It is getting slightly more complicated...
>
> How about the suggestions I made in my reply from March 4 (JST)?

Yes, this also looks like solution. I am not sure how much you would
like to revert but wouldn't it create duplicated members in pinctrl
structures? One for Exynos and other for S3C?

Best regards,
Krzysztof


Re: [RFC 08/11] mm: make ttu's return boolean

2017-03-08 Thread John Hubbard

On 03/08/2017 10:37 PM, Minchan Kim wrote:
>[...]


I think it's the matter of taste.

if (try_to_unmap(xxx))
something
else
something

It's perfectly understandable to me. IOW, if try_to_unmap returns true,
it means it did unmap successfully. Otherwise, failed.

IMHO, SWAP_SUCCESS or TTU_RESULT_* seems to be an over-engineering.
If the user want it, user can do it by introducing right variable name
in his context. See below.


I'm OK with that approach. Just something to avoid the "what does !ret mean in this 
function call" is what I was looking for...




[...]

forcekill = PageDirty(hpage) || (flags & MF_MUST_KILL);
-   kill_procs(, forcekill, trapno,
- ret != SWAP_SUCCESS, p, pfn, flags);
+   kill_procs(, forcekill, trapno, !ret , p, pfn, flags);


The kill_procs() invocation was a little more readable before.


Indeed but I think it's not a problem of try_to_unmap but ret variable name
isn't good any more. How about this?

bool unmap_success;

unmap_success = try_to_unmap(hpage, ttu);

..

kill_procs(, forcekill, trapno, !unmap_success , p, pfn, flags);

..

return unmap_success;

My point is user can introduce whatever variable name depends on his
context. No need to make return variable complicated, IMHO.


Yes, the local variable basically achieves what I was hoping for, so sure, works for 
me.



[...]

-   case SWAP_FAIL:


Again: the SWAP_FAIL makes it crystal clear which case we're in.


To me, I don't feel it.
To me, below is perfectly understandable.

if (try_to_unmap())
do something

That's why I think it's matter of taste. Okay, I admit I might be
biased, too so I will consider what you suggested if others votes
it.


Yes, if it's really just a matter of taste, then not worth debating. Your change 
above is fine I think.


thanks
john h



Thanks.



Re: [RFC 08/11] mm: make ttu's return boolean

2017-03-08 Thread John Hubbard

On 03/08/2017 10:37 PM, Minchan Kim wrote:
>[...]


I think it's the matter of taste.

if (try_to_unmap(xxx))
something
else
something

It's perfectly understandable to me. IOW, if try_to_unmap returns true,
it means it did unmap successfully. Otherwise, failed.

IMHO, SWAP_SUCCESS or TTU_RESULT_* seems to be an over-engineering.
If the user want it, user can do it by introducing right variable name
in his context. See below.


I'm OK with that approach. Just something to avoid the "what does !ret mean in this 
function call" is what I was looking for...




[...]

forcekill = PageDirty(hpage) || (flags & MF_MUST_KILL);
-   kill_procs(, forcekill, trapno,
- ret != SWAP_SUCCESS, p, pfn, flags);
+   kill_procs(, forcekill, trapno, !ret , p, pfn, flags);


The kill_procs() invocation was a little more readable before.


Indeed but I think it's not a problem of try_to_unmap but ret variable name
isn't good any more. How about this?

bool unmap_success;

unmap_success = try_to_unmap(hpage, ttu);

..

kill_procs(, forcekill, trapno, !unmap_success , p, pfn, flags);

..

return unmap_success;

My point is user can introduce whatever variable name depends on his
context. No need to make return variable complicated, IMHO.


Yes, the local variable basically achieves what I was hoping for, so sure, works for 
me.



[...]

-   case SWAP_FAIL:


Again: the SWAP_FAIL makes it crystal clear which case we're in.


To me, I don't feel it.
To me, below is perfectly understandable.

if (try_to_unmap())
do something

That's why I think it's matter of taste. Okay, I admit I might be
biased, too so I will consider what you suggested if others votes
it.


Yes, if it's really just a matter of taste, then not worth debating. Your change 
above is fine I think.


thanks
john h



Thanks.



[RESEND PATCH v3 6/7] PCI: dwc: designware: Move _unroll configurations to a separate function

2017-03-08 Thread Kishon Vijay Abraham I
No functional change. Rename dw_pcie_writel_unroll/dw_pcie_readl_unroll
to dw_pcie_writel_ob_unroll/dw_pcie_readl_ob_unroll respectively as these
functions are used to perform only outbound configurations. Also move
these _unroll configurations to a separate function.

Signed-off-by: Kishon Vijay Abraham I 
---
 drivers/pci/dwc/pcie-designware.c |  112 ++---
 1 file changed, 67 insertions(+), 45 deletions(-)

diff --git a/drivers/pci/dwc/pcie-designware.c 
b/drivers/pci/dwc/pcie-designware.c
index 557ee53..6657a84 100644
--- a/drivers/pci/dwc/pcie-designware.c
+++ b/drivers/pci/dwc/pcie-designware.c
@@ -92,22 +92,64 @@ void dw_pcie_write_dbi(struct dw_pcie *pci, void __iomem 
*base, u32 reg,
dev_err(pci->dev, "write DBI address failed\n");
 }
 
-static u32 dw_pcie_readl_unroll(struct dw_pcie *pci, void __iomem *base,
-   u32 index, u32 reg)
+static u32 dw_pcie_readl_ob_unroll(struct dw_pcie *pci, void __iomem *base,
+  u32 index, u32 reg)
 {
u32 offset = PCIE_GET_ATU_OUTB_UNR_REG_OFFSET(index);
 
return dw_pcie_read_dbi(pci, base, offset + reg, 0x4);
 }
 
-static void dw_pcie_writel_unroll(struct dw_pcie *pci, void __iomem *base,
- u32 index, u32 reg, u32 val)
+static void dw_pcie_writel_ob_unroll(struct dw_pcie *pci, void __iomem *base,
+u32 index, u32 reg, u32 val)
 {
u32 offset = PCIE_GET_ATU_OUTB_UNR_REG_OFFSET(index);
 
dw_pcie_write_dbi(pci, base, offset + reg, 0x4, val);
 }
 
+void dw_pcie_prog_outbound_atu_unroll(struct dw_pcie *pci, int index, int type,
+ u64 cpu_addr, u64 pci_addr, u32 size)
+{
+   u32 retries, val;
+   void __iomem *base = pci->dbi_base;
+
+   dw_pcie_writel_ob_unroll(pci, base, index,
+PCIE_ATU_UNR_LOWER_BASE,
+lower_32_bits(cpu_addr));
+   dw_pcie_writel_ob_unroll(pci, base, index,
+PCIE_ATU_UNR_UPPER_BASE,
+upper_32_bits(cpu_addr));
+   dw_pcie_writel_ob_unroll(pci, base, index, PCIE_ATU_UNR_LIMIT,
+lower_32_bits(cpu_addr + size - 1));
+   dw_pcie_writel_ob_unroll(pci, base, index,
+PCIE_ATU_UNR_LOWER_TARGET,
+lower_32_bits(pci_addr));
+   dw_pcie_writel_ob_unroll(pci, base, index,
+PCIE_ATU_UNR_UPPER_TARGET,
+upper_32_bits(pci_addr));
+   dw_pcie_writel_ob_unroll(pci, base, index,
+PCIE_ATU_UNR_REGION_CTRL1,
+type);
+   dw_pcie_writel_ob_unroll(pci, base, index,
+PCIE_ATU_UNR_REGION_CTRL2,
+PCIE_ATU_ENABLE);
+
+   /*
+* Make sure ATU enable takes effect before any subsequent config
+* and I/O accesses.
+*/
+   for (retries = 0; retries < LINK_WAIT_MAX_IATU_RETRIES; retries++) {
+   val = dw_pcie_readl_ob_unroll(pci, base, index,
+ PCIE_ATU_UNR_REGION_CTRL2);
+   if (val & PCIE_ATU_ENABLE)
+   return;
+
+   usleep_range(LINK_WAIT_IATU_MIN, LINK_WAIT_IATU_MAX);
+   }
+   dev_err(pci->dev, "outbound iATU is not being enabled\n");
+}
+
 void dw_pcie_prog_outbound_atu(struct dw_pcie *pci, int index, int type,
   u64 cpu_addr, u64 pci_addr, u32 size)
 {
@@ -118,59 +160,39 @@ void dw_pcie_prog_outbound_atu(struct dw_pcie *pci, int 
index, int type,
cpu_addr = pci->ops->cpu_addr_fixup(cpu_addr);
 
if (pci->iatu_unroll_enabled) {
-   dw_pcie_writel_unroll(pci, base, index, PCIE_ATU_UNR_LOWER_BASE,
- lower_32_bits(cpu_addr));
-   dw_pcie_writel_unroll(pci, base, index, PCIE_ATU_UNR_UPPER_BASE,
- upper_32_bits(cpu_addr));
-   dw_pcie_writel_unroll(pci, base, index, PCIE_ATU_UNR_LIMIT,
- lower_32_bits(cpu_addr + size - 1));
-   dw_pcie_writel_unroll(pci, base, index,
- PCIE_ATU_UNR_LOWER_TARGET,
- lower_32_bits(pci_addr));
-   dw_pcie_writel_unroll(pci, base, index,
- PCIE_ATU_UNR_UPPER_TARGET,
- upper_32_bits(pci_addr));
-   dw_pcie_writel_unroll(pci, base, index,
- PCIE_ATU_UNR_REGION_CTRL1,
- type);
-   dw_pcie_writel_unroll(pci, base, index,
- 

[RESEND PATCH v3 6/7] PCI: dwc: designware: Move _unroll configurations to a separate function

2017-03-08 Thread Kishon Vijay Abraham I
No functional change. Rename dw_pcie_writel_unroll/dw_pcie_readl_unroll
to dw_pcie_writel_ob_unroll/dw_pcie_readl_ob_unroll respectively as these
functions are used to perform only outbound configurations. Also move
these _unroll configurations to a separate function.

Signed-off-by: Kishon Vijay Abraham I 
---
 drivers/pci/dwc/pcie-designware.c |  112 ++---
 1 file changed, 67 insertions(+), 45 deletions(-)

diff --git a/drivers/pci/dwc/pcie-designware.c 
b/drivers/pci/dwc/pcie-designware.c
index 557ee53..6657a84 100644
--- a/drivers/pci/dwc/pcie-designware.c
+++ b/drivers/pci/dwc/pcie-designware.c
@@ -92,22 +92,64 @@ void dw_pcie_write_dbi(struct dw_pcie *pci, void __iomem 
*base, u32 reg,
dev_err(pci->dev, "write DBI address failed\n");
 }
 
-static u32 dw_pcie_readl_unroll(struct dw_pcie *pci, void __iomem *base,
-   u32 index, u32 reg)
+static u32 dw_pcie_readl_ob_unroll(struct dw_pcie *pci, void __iomem *base,
+  u32 index, u32 reg)
 {
u32 offset = PCIE_GET_ATU_OUTB_UNR_REG_OFFSET(index);
 
return dw_pcie_read_dbi(pci, base, offset + reg, 0x4);
 }
 
-static void dw_pcie_writel_unroll(struct dw_pcie *pci, void __iomem *base,
- u32 index, u32 reg, u32 val)
+static void dw_pcie_writel_ob_unroll(struct dw_pcie *pci, void __iomem *base,
+u32 index, u32 reg, u32 val)
 {
u32 offset = PCIE_GET_ATU_OUTB_UNR_REG_OFFSET(index);
 
dw_pcie_write_dbi(pci, base, offset + reg, 0x4, val);
 }
 
+void dw_pcie_prog_outbound_atu_unroll(struct dw_pcie *pci, int index, int type,
+ u64 cpu_addr, u64 pci_addr, u32 size)
+{
+   u32 retries, val;
+   void __iomem *base = pci->dbi_base;
+
+   dw_pcie_writel_ob_unroll(pci, base, index,
+PCIE_ATU_UNR_LOWER_BASE,
+lower_32_bits(cpu_addr));
+   dw_pcie_writel_ob_unroll(pci, base, index,
+PCIE_ATU_UNR_UPPER_BASE,
+upper_32_bits(cpu_addr));
+   dw_pcie_writel_ob_unroll(pci, base, index, PCIE_ATU_UNR_LIMIT,
+lower_32_bits(cpu_addr + size - 1));
+   dw_pcie_writel_ob_unroll(pci, base, index,
+PCIE_ATU_UNR_LOWER_TARGET,
+lower_32_bits(pci_addr));
+   dw_pcie_writel_ob_unroll(pci, base, index,
+PCIE_ATU_UNR_UPPER_TARGET,
+upper_32_bits(pci_addr));
+   dw_pcie_writel_ob_unroll(pci, base, index,
+PCIE_ATU_UNR_REGION_CTRL1,
+type);
+   dw_pcie_writel_ob_unroll(pci, base, index,
+PCIE_ATU_UNR_REGION_CTRL2,
+PCIE_ATU_ENABLE);
+
+   /*
+* Make sure ATU enable takes effect before any subsequent config
+* and I/O accesses.
+*/
+   for (retries = 0; retries < LINK_WAIT_MAX_IATU_RETRIES; retries++) {
+   val = dw_pcie_readl_ob_unroll(pci, base, index,
+ PCIE_ATU_UNR_REGION_CTRL2);
+   if (val & PCIE_ATU_ENABLE)
+   return;
+
+   usleep_range(LINK_WAIT_IATU_MIN, LINK_WAIT_IATU_MAX);
+   }
+   dev_err(pci->dev, "outbound iATU is not being enabled\n");
+}
+
 void dw_pcie_prog_outbound_atu(struct dw_pcie *pci, int index, int type,
   u64 cpu_addr, u64 pci_addr, u32 size)
 {
@@ -118,59 +160,39 @@ void dw_pcie_prog_outbound_atu(struct dw_pcie *pci, int 
index, int type,
cpu_addr = pci->ops->cpu_addr_fixup(cpu_addr);
 
if (pci->iatu_unroll_enabled) {
-   dw_pcie_writel_unroll(pci, base, index, PCIE_ATU_UNR_LOWER_BASE,
- lower_32_bits(cpu_addr));
-   dw_pcie_writel_unroll(pci, base, index, PCIE_ATU_UNR_UPPER_BASE,
- upper_32_bits(cpu_addr));
-   dw_pcie_writel_unroll(pci, base, index, PCIE_ATU_UNR_LIMIT,
- lower_32_bits(cpu_addr + size - 1));
-   dw_pcie_writel_unroll(pci, base, index,
- PCIE_ATU_UNR_LOWER_TARGET,
- lower_32_bits(pci_addr));
-   dw_pcie_writel_unroll(pci, base, index,
- PCIE_ATU_UNR_UPPER_TARGET,
- upper_32_bits(pci_addr));
-   dw_pcie_writel_unroll(pci, base, index,
- PCIE_ATU_UNR_REGION_CTRL1,
- type);
-   dw_pcie_writel_unroll(pci, base, index,
- PCIE_ATU_UNR_REGION_CTRL2,
- 

Re: [PATCH] mm: Do not use double negation for testing page flags

2017-03-08 Thread Minchan Kim
Hi Vlastimil,

On Wed, Mar 08, 2017 at 08:51:23AM +0100, Vlastimil Babka wrote:
> On 03/08/2017 06:25 AM, Minchan Kim wrote:
> > Hi Anshuman,
> > 
> > On Tue, Mar 07, 2017 at 09:31:18PM +0530, Anshuman Khandual wrote:
> >> On 03/07/2017 12:06 PM, Minchan Kim wrote:
> >>> With the discussion[1], I found it seems there are every PageFlags
> >>> functions return bool at this moment so we don't need double
> >>> negation any more.
> >>> Although it's not a problem to keep it, it makes future users
> >>> confused to use dobule negation for them, too.
> >>>
> >>> Remove such possibility.
> >>
> >> A quick search of '!!Page' in the source tree does not show any other
> >> place having this double negation. So I guess this is all which need
> >> to be fixed.
> > 
> > Yeb. That's the why my patch includes only khugepagd part but my
> > concern is PageFlags returns int type not boolean so user might
> > be confused easily and tempted to use dobule negation.
> > 
> > Other side is they who create new custom PageXXX(e.g., PageMovable)
> > should keep it in mind that they should return 0 or 1 although
> > fucntion prototype's return value is int type.
> 
> > It shouldn't be
> > documented nowhere.
> 
> Was this double negation intentional? :P

Nice catch!
It seems you have a crystal ball. ;-)

> 
> > Although we can add a little description
> > somewhere in page-flags.h, I believe changing to boolean is more
> > clear/not-error-prone so Chen's work is enough worth, I think.
> 
> Agree, unless some arches benefit from the int by performance
> for some reason (no idea if it's possible).
> 
> Anyway, to your original patch:
> 
> Acked-by: Vlastimil Babka 

Thanks!


Re: [PATCH] mm: Do not use double negation for testing page flags

2017-03-08 Thread Minchan Kim
Hi Vlastimil,

On Wed, Mar 08, 2017 at 08:51:23AM +0100, Vlastimil Babka wrote:
> On 03/08/2017 06:25 AM, Minchan Kim wrote:
> > Hi Anshuman,
> > 
> > On Tue, Mar 07, 2017 at 09:31:18PM +0530, Anshuman Khandual wrote:
> >> On 03/07/2017 12:06 PM, Minchan Kim wrote:
> >>> With the discussion[1], I found it seems there are every PageFlags
> >>> functions return bool at this moment so we don't need double
> >>> negation any more.
> >>> Although it's not a problem to keep it, it makes future users
> >>> confused to use dobule negation for them, too.
> >>>
> >>> Remove such possibility.
> >>
> >> A quick search of '!!Page' in the source tree does not show any other
> >> place having this double negation. So I guess this is all which need
> >> to be fixed.
> > 
> > Yeb. That's the why my patch includes only khugepagd part but my
> > concern is PageFlags returns int type not boolean so user might
> > be confused easily and tempted to use dobule negation.
> > 
> > Other side is they who create new custom PageXXX(e.g., PageMovable)
> > should keep it in mind that they should return 0 or 1 although
> > fucntion prototype's return value is int type.
> 
> > It shouldn't be
> > documented nowhere.
> 
> Was this double negation intentional? :P

Nice catch!
It seems you have a crystal ball. ;-)

> 
> > Although we can add a little description
> > somewhere in page-flags.h, I believe changing to boolean is more
> > clear/not-error-prone so Chen's work is enough worth, I think.
> 
> Agree, unless some arches benefit from the int by performance
> for some reason (no idea if it's possible).
> 
> Anyway, to your original patch:
> 
> Acked-by: Vlastimil Babka 

Thanks!


[RESEND PATCH v3 0/7] PCI: dwc: Miscellaneous fixes and cleanups

2017-03-08 Thread Kishon Vijay Abraham I
Resending since it bounced from quite a few lists.

This should be the final set of cleanups/fixes before endpoint
support can be merged.

Keerthy's patch is a general fix in dra7xx driver and is not
directly related to endpoint mode.

This v1 of this series was previously sent with a different
cover letter $subject [1]

Changes from v2:
*) Kconfig changes that was spilled into a patch is removed.
*) In addition to renaming _unroll() to _ob_unroll(), all the
   _unroll configurations is also moved a separate function.

Changes from v1:
*) included a patch to rename _unroll() to _ob_unroll() as
   similar thing has to be done for inbound window in the case
   of EP mode.
*) used 'size_t' instead of 'int' for specifying the size
   in read_dbi/write_dbi function arguments.
*) Populate cpu_addr_fixup ops for artpec6 as suggested by
   Niklas

This series is based on 4.11-rc1

[1] -> https://lkml.org/lkml/2017/2/16/270

Keerthy (1):
  PCI: dwc: dra7xx: Push request_irq call to the bottom of probe

Kishon Vijay Abraham I (6):
  PCI: dwc: designware: Add new *ops* for cpu addr fixup
  PCI: dwc: dra7xx: Populate cpu_addr_fixup ops
  PCI: dwc: artpec6: Populate cpu_addr_fixup ops
  PCI: dwc: all: Modify dbi accessors to take dbi_base as argument
  PCI: dwc: all: Modify dbi accessors to access data of 4/2/1  bytes
  PCI: dwc: designware: Move _unroll configurations to a separate
function

 drivers/pci/dwc/pci-dra7xx.c   |   35 +++
 drivers/pci/dwc/pci-exynos.c   |   14 +--
 drivers/pci/dwc/pci-imx6.c |   62 +++--
 drivers/pci/dwc/pci-keystone-dw.c  |   16 ++--
 drivers/pci/dwc/pcie-armada8k.c|   39 
 drivers/pci/dwc/pcie-artpec6.c |   22 +++--
 drivers/pci/dwc/pcie-designware-host.c |   20 ++--
 drivers/pci/dwc/pcie-designware.c  |  156 +---
 drivers/pci/dwc/pcie-designware.h  |   13 ++-
 drivers/pci/dwc/pcie-hisi.c|   17 ++--
 10 files changed, 241 insertions(+), 153 deletions(-)

-- 
1.7.9.5



[RESEND PATCH v3 0/7] PCI: dwc: Miscellaneous fixes and cleanups

2017-03-08 Thread Kishon Vijay Abraham I
Resending since it bounced from quite a few lists.

This should be the final set of cleanups/fixes before endpoint
support can be merged.

Keerthy's patch is a general fix in dra7xx driver and is not
directly related to endpoint mode.

This v1 of this series was previously sent with a different
cover letter $subject [1]

Changes from v2:
*) Kconfig changes that was spilled into a patch is removed.
*) In addition to renaming _unroll() to _ob_unroll(), all the
   _unroll configurations is also moved a separate function.

Changes from v1:
*) included a patch to rename _unroll() to _ob_unroll() as
   similar thing has to be done for inbound window in the case
   of EP mode.
*) used 'size_t' instead of 'int' for specifying the size
   in read_dbi/write_dbi function arguments.
*) Populate cpu_addr_fixup ops for artpec6 as suggested by
   Niklas

This series is based on 4.11-rc1

[1] -> https://lkml.org/lkml/2017/2/16/270

Keerthy (1):
  PCI: dwc: dra7xx: Push request_irq call to the bottom of probe

Kishon Vijay Abraham I (6):
  PCI: dwc: designware: Add new *ops* for cpu addr fixup
  PCI: dwc: dra7xx: Populate cpu_addr_fixup ops
  PCI: dwc: artpec6: Populate cpu_addr_fixup ops
  PCI: dwc: all: Modify dbi accessors to take dbi_base as argument
  PCI: dwc: all: Modify dbi accessors to access data of 4/2/1  bytes
  PCI: dwc: designware: Move _unroll configurations to a separate
function

 drivers/pci/dwc/pci-dra7xx.c   |   35 +++
 drivers/pci/dwc/pci-exynos.c   |   14 +--
 drivers/pci/dwc/pci-imx6.c |   62 +++--
 drivers/pci/dwc/pci-keystone-dw.c  |   16 ++--
 drivers/pci/dwc/pcie-armada8k.c|   39 
 drivers/pci/dwc/pcie-artpec6.c |   22 +++--
 drivers/pci/dwc/pcie-designware-host.c |   20 ++--
 drivers/pci/dwc/pcie-designware.c  |  156 +---
 drivers/pci/dwc/pcie-designware.h  |   13 ++-
 drivers/pci/dwc/pcie-hisi.c|   17 ++--
 10 files changed, 241 insertions(+), 153 deletions(-)

-- 
1.7.9.5



[PATCH v2] serial: 8250_dw: Honor clk_round_rate errors in dw8250_set_termios

2017-03-08 Thread Heiko Stuebner
clk_round_rate returns a signed long and may possibly return errors
in it, for example if there is no possible rate.

Till now dw8250_set_termios ignored any error, the signednes and would
just use the value as input to clk_set_rate. This of course falls apart
if there is an actual error, so check for errors and only try to set
a rate if the value is actually valid.

This turned up on some Rockchip platforms after commit
6a171b299379 ("serial: 8250_dw: Allow hardware flow control to be used")
enabled set_termios callback in all cases, not only ACPI.

Fixes: 6a171b299379 ("serial: 8250_dw: Allow hardware flow control to be used")
Signed-off-by: Heiko Stuebner 
Reviewed-by: Andy Shevchenko 
---
There is also another patch floating around, fixing a separate issue
on top of this one: "serial: 8250_dw: Fix breakage when HAVE_CLK=n"

changes in v2:
- adapt commit message to make it more explicit, that this is a
  somewhat critical fix
- add Andy's Reviewed-by

 drivers/tty/serial/8250/8250_dw.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/tty/serial/8250/8250_dw.c 
b/drivers/tty/serial/8250/8250_dw.c
index 6ee55a2d47bb..223ac234ddb2 100644
--- a/drivers/tty/serial/8250/8250_dw.c
+++ b/drivers/tty/serial/8250/8250_dw.c
@@ -257,7 +257,7 @@ static void dw8250_set_termios(struct uart_port *p, struct 
ktermios *termios,
 {
unsigned int baud = tty_termios_baud_rate(termios);
struct dw8250_data *d = p->private_data;
-   unsigned int rate;
+   long rate;
int ret;
 
if (IS_ERR(d->clk) || !old)
@@ -265,7 +265,10 @@ static void dw8250_set_termios(struct uart_port *p, struct 
ktermios *termios,
 
clk_disable_unprepare(d->clk);
rate = clk_round_rate(d->clk, baud * 16);
-   ret = clk_set_rate(d->clk, rate);
+   if (rate < 0)
+   ret = rate;
+   else
+   ret = clk_set_rate(d->clk, rate);
clk_prepare_enable(d->clk);
 
if (!ret)
-- 
2.11.0



[PATCH v2] serial: 8250_dw: Honor clk_round_rate errors in dw8250_set_termios

2017-03-08 Thread Heiko Stuebner
clk_round_rate returns a signed long and may possibly return errors
in it, for example if there is no possible rate.

Till now dw8250_set_termios ignored any error, the signednes and would
just use the value as input to clk_set_rate. This of course falls apart
if there is an actual error, so check for errors and only try to set
a rate if the value is actually valid.

This turned up on some Rockchip platforms after commit
6a171b299379 ("serial: 8250_dw: Allow hardware flow control to be used")
enabled set_termios callback in all cases, not only ACPI.

Fixes: 6a171b299379 ("serial: 8250_dw: Allow hardware flow control to be used")
Signed-off-by: Heiko Stuebner 
Reviewed-by: Andy Shevchenko 
---
There is also another patch floating around, fixing a separate issue
on top of this one: "serial: 8250_dw: Fix breakage when HAVE_CLK=n"

changes in v2:
- adapt commit message to make it more explicit, that this is a
  somewhat critical fix
- add Andy's Reviewed-by

 drivers/tty/serial/8250/8250_dw.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/tty/serial/8250/8250_dw.c 
b/drivers/tty/serial/8250/8250_dw.c
index 6ee55a2d47bb..223ac234ddb2 100644
--- a/drivers/tty/serial/8250/8250_dw.c
+++ b/drivers/tty/serial/8250/8250_dw.c
@@ -257,7 +257,7 @@ static void dw8250_set_termios(struct uart_port *p, struct 
ktermios *termios,
 {
unsigned int baud = tty_termios_baud_rate(termios);
struct dw8250_data *d = p->private_data;
-   unsigned int rate;
+   long rate;
int ret;
 
if (IS_ERR(d->clk) || !old)
@@ -265,7 +265,10 @@ static void dw8250_set_termios(struct uart_port *p, struct 
ktermios *termios,
 
clk_disable_unprepare(d->clk);
rate = clk_round_rate(d->clk, baud * 16);
-   ret = clk_set_rate(d->clk, rate);
+   if (rate < 0)
+   ret = rate;
+   else
+   ret = clk_set_rate(d->clk, rate);
clk_prepare_enable(d->clk);
 
if (!ret)
-- 
2.11.0



[RESEND PATCH v3 2/7] PCI: dwc: dra7xx: Populate cpu_addr_fixup ops

2017-03-08 Thread Kishon Vijay Abraham I
Populate cpu_addr_fixup ops to extract the least 28 bits of the
corresponding cpu address.

Acked-by: Joao Pinto 
Signed-off-by: Kishon Vijay Abraham I 
---
 drivers/pci/dwc/pci-dra7xx.c |   11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/pci/dwc/pci-dra7xx.c b/drivers/pci/dwc/pci-dra7xx.c
index 0984baf..07c45ec 100644
--- a/drivers/pci/dwc/pci-dra7xx.c
+++ b/drivers/pci/dwc/pci-dra7xx.c
@@ -88,6 +88,11 @@ static inline void dra7xx_pcie_writel(struct dra7xx_pcie 
*pcie, u32 offset,
writel(value, pcie->base + offset);
 }
 
+static u64 dra7xx_pcie_cpu_addr_fixup(u64 pci_addr)
+{
+   return pci_addr & DRA7XX_CPU_TO_BUS_ADDR;
+}
+
 static int dra7xx_pcie_link_up(struct dw_pcie *pci)
 {
struct dra7xx_pcie *dra7xx = to_dra7xx_pcie(pci);
@@ -152,11 +157,6 @@ static void dra7xx_pcie_host_init(struct pcie_port *pp)
struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
struct dra7xx_pcie *dra7xx = to_dra7xx_pcie(pci);
 
-   pp->io_base &= DRA7XX_CPU_TO_BUS_ADDR;
-   pp->mem_base &= DRA7XX_CPU_TO_BUS_ADDR;
-   pp->cfg0_base &= DRA7XX_CPU_TO_BUS_ADDR;
-   pp->cfg1_base &= DRA7XX_CPU_TO_BUS_ADDR;
-
dw_pcie_setup_rc(pp);
 
dra7xx_pcie_establish_link(dra7xx);
@@ -329,6 +329,7 @@ static int __init dra7xx_add_pcie_port(struct dra7xx_pcie 
*dra7xx,
 }
 
 static const struct dw_pcie_ops dw_pcie_ops = {
+   .cpu_addr_fixup = dra7xx_pcie_cpu_addr_fixup,
.link_up = dra7xx_pcie_link_up,
 };
 
-- 
1.7.9.5



[RESEND PATCH v3 2/7] PCI: dwc: dra7xx: Populate cpu_addr_fixup ops

2017-03-08 Thread Kishon Vijay Abraham I
Populate cpu_addr_fixup ops to extract the least 28 bits of the
corresponding cpu address.

Acked-by: Joao Pinto 
Signed-off-by: Kishon Vijay Abraham I 
---
 drivers/pci/dwc/pci-dra7xx.c |   11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/pci/dwc/pci-dra7xx.c b/drivers/pci/dwc/pci-dra7xx.c
index 0984baf..07c45ec 100644
--- a/drivers/pci/dwc/pci-dra7xx.c
+++ b/drivers/pci/dwc/pci-dra7xx.c
@@ -88,6 +88,11 @@ static inline void dra7xx_pcie_writel(struct dra7xx_pcie 
*pcie, u32 offset,
writel(value, pcie->base + offset);
 }
 
+static u64 dra7xx_pcie_cpu_addr_fixup(u64 pci_addr)
+{
+   return pci_addr & DRA7XX_CPU_TO_BUS_ADDR;
+}
+
 static int dra7xx_pcie_link_up(struct dw_pcie *pci)
 {
struct dra7xx_pcie *dra7xx = to_dra7xx_pcie(pci);
@@ -152,11 +157,6 @@ static void dra7xx_pcie_host_init(struct pcie_port *pp)
struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
struct dra7xx_pcie *dra7xx = to_dra7xx_pcie(pci);
 
-   pp->io_base &= DRA7XX_CPU_TO_BUS_ADDR;
-   pp->mem_base &= DRA7XX_CPU_TO_BUS_ADDR;
-   pp->cfg0_base &= DRA7XX_CPU_TO_BUS_ADDR;
-   pp->cfg1_base &= DRA7XX_CPU_TO_BUS_ADDR;
-
dw_pcie_setup_rc(pp);
 
dra7xx_pcie_establish_link(dra7xx);
@@ -329,6 +329,7 @@ static int __init dra7xx_add_pcie_port(struct dra7xx_pcie 
*dra7xx,
 }
 
 static const struct dw_pcie_ops dw_pcie_ops = {
+   .cpu_addr_fixup = dra7xx_pcie_cpu_addr_fixup,
.link_up = dra7xx_pcie_link_up,
 };
 
-- 
1.7.9.5



[RESEND PATCH v3 4/7] PCI: dwc: all: Modify dbi accessors to take dbi_base as argument

2017-03-08 Thread Kishon Vijay Abraham I
dwc has 2 dbi address space labeled dbics and dbics2. The existing
helper to access dbi address space can access only dbics. However
dbics2 has to be accessed for programming the BAR registers in the
case of EP mode. This is in preparation for adding EP mode support
to dwc driver.

Cc: Jingoo Han 
Cc: Richard Zhu 
Cc: Lucas Stach 
Cc: Murali Karicheri 
Cc: Thomas Petazzoni 
Cc: Niklas Cassel 
Cc: Jesper Nilsson 
Cc: Joao Pinto 
Cc: Zhou Wang 
Cc: Gabriele Paoloni 
Acked-by: Joao Pinto 
Signed-off-by: Kishon Vijay Abraham I 
---
 drivers/pci/dwc/pci-dra7xx.c   |   10 +++--
 drivers/pci/dwc/pci-exynos.c   |   10 +++--
 drivers/pci/dwc/pci-imx6.c |   62 +++---
 drivers/pci/dwc/pci-keystone-dw.c  |   15 ---
 drivers/pci/dwc/pcie-armada8k.c|   39 +---
 drivers/pci/dwc/pcie-artpec6.c |7 +--
 drivers/pci/dwc/pcie-designware-host.c |   20 +
 drivers/pci/dwc/pcie-designware.c  |   76 ++--
 drivers/pci/dwc/pcie-designware.h  |   10 +++--
 drivers/pci/dwc/pcie-hisi.c|   17 ---
 10 files changed, 152 insertions(+), 114 deletions(-)

diff --git a/drivers/pci/dwc/pci-dra7xx.c b/drivers/pci/dwc/pci-dra7xx.c
index 07c45ec..3708bd6 100644
--- a/drivers/pci/dwc/pci-dra7xx.c
+++ b/drivers/pci/dwc/pci-dra7xx.c
@@ -495,12 +495,13 @@ static int dra7xx_pcie_suspend(struct device *dev)
 {
struct dra7xx_pcie *dra7xx = dev_get_drvdata(dev);
struct dw_pcie *pci = dra7xx->pci;
+   void __iomem *base = pci->dbi_base;
u32 val;
 
/* clear MSE */
-   val = dw_pcie_readl_dbi(pci, PCI_COMMAND);
+   val = dw_pcie_readl_dbi(pci, base, PCI_COMMAND);
val &= ~PCI_COMMAND_MEMORY;
-   dw_pcie_writel_dbi(pci, PCI_COMMAND, val);
+   dw_pcie_writel_dbi(pci, base, PCI_COMMAND, val);
 
return 0;
 }
@@ -509,12 +510,13 @@ static int dra7xx_pcie_resume(struct device *dev)
 {
struct dra7xx_pcie *dra7xx = dev_get_drvdata(dev);
struct dw_pcie *pci = dra7xx->pci;
+   void __iomem *base = pci->dbi_base;
u32 val;
 
/* set MSE */
-   val = dw_pcie_readl_dbi(pci, PCI_COMMAND);
+   val = dw_pcie_readl_dbi(pci, base, PCI_COMMAND);
val |= PCI_COMMAND_MEMORY;
-   dw_pcie_writel_dbi(pci, PCI_COMMAND, val);
+   dw_pcie_writel_dbi(pci, base, PCI_COMMAND, val);
 
return 0;
 }
diff --git a/drivers/pci/dwc/pci-exynos.c b/drivers/pci/dwc/pci-exynos.c
index 993b650..a0d40f7 100644
--- a/drivers/pci/dwc/pci-exynos.c
+++ b/drivers/pci/dwc/pci-exynos.c
@@ -521,23 +521,25 @@ static void exynos_pcie_enable_interrupts(struct 
exynos_pcie *ep)
exynos_pcie_msi_init(ep);
 }
 
-static u32 exynos_pcie_readl_dbi(struct dw_pcie *pci, u32 reg)
+static u32 exynos_pcie_readl_dbi(struct dw_pcie *pci, void __iomem *base,
+u32 reg)
 {
struct exynos_pcie *ep = to_exynos_pcie(pci);
u32 val;
 
exynos_pcie_sideband_dbi_r_mode(ep, true);
-   val = readl(pci->dbi_base + reg);
+   val = readl(base + reg);
exynos_pcie_sideband_dbi_r_mode(ep, false);
return val;
 }
 
-static void exynos_pcie_writel_dbi(struct dw_pcie *pci, u32 reg, u32 val)
+static void exynos_pcie_writel_dbi(struct dw_pcie *pci, void __iomem *base,
+  u32 reg, u32 val)
 {
struct exynos_pcie *ep = to_exynos_pcie(pci);
 
exynos_pcie_sideband_dbi_w_mode(ep, true);
-   writel(val, pci->dbi_base + reg);
+   writel(val, base + reg);
exynos_pcie_sideband_dbi_w_mode(ep, false);
 }
 
diff --git a/drivers/pci/dwc/pci-imx6.c b/drivers/pci/dwc/pci-imx6.c
index 801e46c..85dd901 100644
--- a/drivers/pci/dwc/pci-imx6.c
+++ b/drivers/pci/dwc/pci-imx6.c
@@ -98,12 +98,13 @@ struct imx6_pcie {
 static int pcie_phy_poll_ack(struct imx6_pcie *imx6_pcie, int exp_val)
 {
struct dw_pcie *pci = imx6_pcie->pci;
+   void __iomem *base = pci->dbi_base;
u32 val;
u32 max_iterations = 10;
u32 wait_counter = 0;
 
do {
-   val = dw_pcie_readl_dbi(pci, PCIE_PHY_STAT);
+   val = dw_pcie_readl_dbi(pci, base, PCIE_PHY_STAT);
val = (val >> PCIE_PHY_STAT_ACK_LOC) & 0x1;
wait_counter++;
 
@@ -119,21 +120,22 @@ static int pcie_phy_poll_ack(struct imx6_pcie *imx6_pcie, 
int exp_val)
 static int pcie_phy_wait_ack(struct imx6_pcie *imx6_pcie, int addr)
 {
struct dw_pcie *pci = imx6_pcie->pci;
+   void __iomem *base = pci->dbi_base;
u32 val;
int ret;
 
val = addr << PCIE_PHY_CTRL_DATA_LOC;
-   dw_pcie_writel_dbi(pci, 

[RESEND PATCH v3 4/7] PCI: dwc: all: Modify dbi accessors to take dbi_base as argument

2017-03-08 Thread Kishon Vijay Abraham I
dwc has 2 dbi address space labeled dbics and dbics2. The existing
helper to access dbi address space can access only dbics. However
dbics2 has to be accessed for programming the BAR registers in the
case of EP mode. This is in preparation for adding EP mode support
to dwc driver.

Cc: Jingoo Han 
Cc: Richard Zhu 
Cc: Lucas Stach 
Cc: Murali Karicheri 
Cc: Thomas Petazzoni 
Cc: Niklas Cassel 
Cc: Jesper Nilsson 
Cc: Joao Pinto 
Cc: Zhou Wang 
Cc: Gabriele Paoloni 
Acked-by: Joao Pinto 
Signed-off-by: Kishon Vijay Abraham I 
---
 drivers/pci/dwc/pci-dra7xx.c   |   10 +++--
 drivers/pci/dwc/pci-exynos.c   |   10 +++--
 drivers/pci/dwc/pci-imx6.c |   62 +++---
 drivers/pci/dwc/pci-keystone-dw.c  |   15 ---
 drivers/pci/dwc/pcie-armada8k.c|   39 +---
 drivers/pci/dwc/pcie-artpec6.c |7 +--
 drivers/pci/dwc/pcie-designware-host.c |   20 +
 drivers/pci/dwc/pcie-designware.c  |   76 ++--
 drivers/pci/dwc/pcie-designware.h  |   10 +++--
 drivers/pci/dwc/pcie-hisi.c|   17 ---
 10 files changed, 152 insertions(+), 114 deletions(-)

diff --git a/drivers/pci/dwc/pci-dra7xx.c b/drivers/pci/dwc/pci-dra7xx.c
index 07c45ec..3708bd6 100644
--- a/drivers/pci/dwc/pci-dra7xx.c
+++ b/drivers/pci/dwc/pci-dra7xx.c
@@ -495,12 +495,13 @@ static int dra7xx_pcie_suspend(struct device *dev)
 {
struct dra7xx_pcie *dra7xx = dev_get_drvdata(dev);
struct dw_pcie *pci = dra7xx->pci;
+   void __iomem *base = pci->dbi_base;
u32 val;
 
/* clear MSE */
-   val = dw_pcie_readl_dbi(pci, PCI_COMMAND);
+   val = dw_pcie_readl_dbi(pci, base, PCI_COMMAND);
val &= ~PCI_COMMAND_MEMORY;
-   dw_pcie_writel_dbi(pci, PCI_COMMAND, val);
+   dw_pcie_writel_dbi(pci, base, PCI_COMMAND, val);
 
return 0;
 }
@@ -509,12 +510,13 @@ static int dra7xx_pcie_resume(struct device *dev)
 {
struct dra7xx_pcie *dra7xx = dev_get_drvdata(dev);
struct dw_pcie *pci = dra7xx->pci;
+   void __iomem *base = pci->dbi_base;
u32 val;
 
/* set MSE */
-   val = dw_pcie_readl_dbi(pci, PCI_COMMAND);
+   val = dw_pcie_readl_dbi(pci, base, PCI_COMMAND);
val |= PCI_COMMAND_MEMORY;
-   dw_pcie_writel_dbi(pci, PCI_COMMAND, val);
+   dw_pcie_writel_dbi(pci, base, PCI_COMMAND, val);
 
return 0;
 }
diff --git a/drivers/pci/dwc/pci-exynos.c b/drivers/pci/dwc/pci-exynos.c
index 993b650..a0d40f7 100644
--- a/drivers/pci/dwc/pci-exynos.c
+++ b/drivers/pci/dwc/pci-exynos.c
@@ -521,23 +521,25 @@ static void exynos_pcie_enable_interrupts(struct 
exynos_pcie *ep)
exynos_pcie_msi_init(ep);
 }
 
-static u32 exynos_pcie_readl_dbi(struct dw_pcie *pci, u32 reg)
+static u32 exynos_pcie_readl_dbi(struct dw_pcie *pci, void __iomem *base,
+u32 reg)
 {
struct exynos_pcie *ep = to_exynos_pcie(pci);
u32 val;
 
exynos_pcie_sideband_dbi_r_mode(ep, true);
-   val = readl(pci->dbi_base + reg);
+   val = readl(base + reg);
exynos_pcie_sideband_dbi_r_mode(ep, false);
return val;
 }
 
-static void exynos_pcie_writel_dbi(struct dw_pcie *pci, u32 reg, u32 val)
+static void exynos_pcie_writel_dbi(struct dw_pcie *pci, void __iomem *base,
+  u32 reg, u32 val)
 {
struct exynos_pcie *ep = to_exynos_pcie(pci);
 
exynos_pcie_sideband_dbi_w_mode(ep, true);
-   writel(val, pci->dbi_base + reg);
+   writel(val, base + reg);
exynos_pcie_sideband_dbi_w_mode(ep, false);
 }
 
diff --git a/drivers/pci/dwc/pci-imx6.c b/drivers/pci/dwc/pci-imx6.c
index 801e46c..85dd901 100644
--- a/drivers/pci/dwc/pci-imx6.c
+++ b/drivers/pci/dwc/pci-imx6.c
@@ -98,12 +98,13 @@ struct imx6_pcie {
 static int pcie_phy_poll_ack(struct imx6_pcie *imx6_pcie, int exp_val)
 {
struct dw_pcie *pci = imx6_pcie->pci;
+   void __iomem *base = pci->dbi_base;
u32 val;
u32 max_iterations = 10;
u32 wait_counter = 0;
 
do {
-   val = dw_pcie_readl_dbi(pci, PCIE_PHY_STAT);
+   val = dw_pcie_readl_dbi(pci, base, PCIE_PHY_STAT);
val = (val >> PCIE_PHY_STAT_ACK_LOC) & 0x1;
wait_counter++;
 
@@ -119,21 +120,22 @@ static int pcie_phy_poll_ack(struct imx6_pcie *imx6_pcie, 
int exp_val)
 static int pcie_phy_wait_ack(struct imx6_pcie *imx6_pcie, int addr)
 {
struct dw_pcie *pci = imx6_pcie->pci;
+   void __iomem *base = pci->dbi_base;
u32 val;
int ret;
 
val = addr << PCIE_PHY_CTRL_DATA_LOC;
-   dw_pcie_writel_dbi(pci, PCIE_PHY_CTRL, val);
+   dw_pcie_writel_dbi(pci, base, PCIE_PHY_CTRL, val);
 
val |= (0x1 << PCIE_PHY_CTRL_CAP_ADR_LOC);
-   dw_pcie_writel_dbi(pci, PCIE_PHY_CTRL, val);
+   dw_pcie_writel_dbi(pci, base, PCIE_PHY_CTRL, val);
 
ret = pcie_phy_poll_ack(imx6_pcie, 

[RESEND PATCH v3 1/7] PCI: dwc: designware: Add new *ops* for cpu addr fixup

2017-03-08 Thread Kishon Vijay Abraham I
Some platforms (like dra7xx) require only the least 28 bits of the
corresponding 32 bit CPU address to be programmed in the address
translation unit. This modified address is stored in io_base/mem_base/
cfg0_base/cfg1_base in dra7xx_pcie_host_init. While this is okay for
host mode where the address range is fixed, device mode requires
different addresses to be programmed based on the host buffer address.
Add a new ops to get the least 28 bits of the corresponding 32 bit
CPU address and invoke it before programming the address translation
unit.

Acked-by: Joao Pinto 
Signed-off-by: Kishon Vijay Abraham I 
---
 drivers/pci/dwc/pcie-designware.c |3 +++
 drivers/pci/dwc/pcie-designware.h |1 +
 2 files changed, 4 insertions(+)

diff --git a/drivers/pci/dwc/pcie-designware.c 
b/drivers/pci/dwc/pcie-designware.c
index 7e1fb7d..14ee7a3 100644
--- a/drivers/pci/dwc/pcie-designware.c
+++ b/drivers/pci/dwc/pcie-designware.c
@@ -97,6 +97,9 @@ void dw_pcie_prog_outbound_atu(struct dw_pcie *pci, int 
index, int type,
 {
u32 retries, val;
 
+   if (pp->ops->cpu_addr_fixup)
+   cpu_addr = pp->ops->cpu_addr_fixup(cpu_addr);
+
if (pci->iatu_unroll_enabled) {
dw_pcie_writel_unroll(pci, index, PCIE_ATU_UNR_LOWER_BASE,
  lower_32_bits(cpu_addr));
diff --git a/drivers/pci/dwc/pcie-designware.h 
b/drivers/pci/dwc/pcie-designware.h
index cd3b871..8f3dcb2 100644
--- a/drivers/pci/dwc/pcie-designware.h
+++ b/drivers/pci/dwc/pcie-designware.h
@@ -143,6 +143,7 @@ struct pcie_port {
 };
 
 struct dw_pcie_ops {
+   u64 (*cpu_addr_fixup)(u64 cpu_addr);
u32 (*readl_dbi)(struct dw_pcie *pcie, u32 reg);
void(*writel_dbi)(struct dw_pcie *pcie, u32 reg, u32 val);
int (*link_up)(struct dw_pcie *pcie);
-- 
1.7.9.5



[RESEND PATCH v3 3/7] PCI: dwc: artpec6: Populate cpu_addr_fixup ops

2017-03-08 Thread Kishon Vijay Abraham I
Populate cpu_addr_fixup ops to extract the least 28 bits of the
corresponding cpu address.

Cc: Niklas Cassel 
Acked-by: Joao Pinto 
Signed-off-by: Kishon Vijay Abraham I 
---
 drivers/pci/dwc/pcie-artpec6.c |   15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/drivers/pci/dwc/pcie-artpec6.c b/drivers/pci/dwc/pcie-artpec6.c
index fcd3ef8..5b3b3af 100644
--- a/drivers/pci/dwc/pcie-artpec6.c
+++ b/drivers/pci/dwc/pcie-artpec6.c
@@ -78,6 +78,11 @@ static void artpec6_pcie_writel(struct artpec6_pcie 
*artpec6_pcie, u32 offset, u
regmap_write(artpec6_pcie->regmap, offset, val);
 }
 
+static u64 artpec6_pcie_cpu_addr_fixup(u64 pci_addr)
+{
+   return pci_addr & ARTPEC6_CPU_TO_BUS_ADDR;
+}
+
 static int artpec6_pcie_establish_link(struct artpec6_pcie *artpec6_pcie)
 {
struct dw_pcie *pci = artpec6_pcie->pci;
@@ -142,11 +147,6 @@ static int artpec6_pcie_establish_link(struct artpec6_pcie 
*artpec6_pcie)
 */
dw_pcie_writel_dbi(pci, MISC_CONTROL_1_OFF, DBI_RO_WR_EN);
 
-   pp->io_base &= ARTPEC6_CPU_TO_BUS_ADDR;
-   pp->mem_base &= ARTPEC6_CPU_TO_BUS_ADDR;
-   pp->cfg0_base &= ARTPEC6_CPU_TO_BUS_ADDR;
-   pp->cfg1_base &= ARTPEC6_CPU_TO_BUS_ADDR;
-
/* setup root complex */
dw_pcie_setup_rc(pp);
 
@@ -234,6 +234,10 @@ static int artpec6_add_pcie_port(struct artpec6_pcie 
*artpec6_pcie,
return 0;
 }
 
+static const struct dw_pcie_ops dw_pcie_ops = {
+   .cpu_addr_fixup = artpec6_pcie_cpu_addr_fixup,
+};
+
 static int artpec6_pcie_probe(struct platform_device *pdev)
 {
struct device *dev = >dev;
@@ -252,6 +256,7 @@ static int artpec6_pcie_probe(struct platform_device *pdev)
return -ENOMEM;
 
pci->dev = dev;
+   pci->ops = _pcie_ops;
 
artpec6_pcie->pci = pci;
 
-- 
1.7.9.5



[RESEND PATCH v3 1/7] PCI: dwc: designware: Add new *ops* for cpu addr fixup

2017-03-08 Thread Kishon Vijay Abraham I
Some platforms (like dra7xx) require only the least 28 bits of the
corresponding 32 bit CPU address to be programmed in the address
translation unit. This modified address is stored in io_base/mem_base/
cfg0_base/cfg1_base in dra7xx_pcie_host_init. While this is okay for
host mode where the address range is fixed, device mode requires
different addresses to be programmed based on the host buffer address.
Add a new ops to get the least 28 bits of the corresponding 32 bit
CPU address and invoke it before programming the address translation
unit.

Acked-by: Joao Pinto 
Signed-off-by: Kishon Vijay Abraham I 
---
 drivers/pci/dwc/pcie-designware.c |3 +++
 drivers/pci/dwc/pcie-designware.h |1 +
 2 files changed, 4 insertions(+)

diff --git a/drivers/pci/dwc/pcie-designware.c 
b/drivers/pci/dwc/pcie-designware.c
index 7e1fb7d..14ee7a3 100644
--- a/drivers/pci/dwc/pcie-designware.c
+++ b/drivers/pci/dwc/pcie-designware.c
@@ -97,6 +97,9 @@ void dw_pcie_prog_outbound_atu(struct dw_pcie *pci, int 
index, int type,
 {
u32 retries, val;
 
+   if (pp->ops->cpu_addr_fixup)
+   cpu_addr = pp->ops->cpu_addr_fixup(cpu_addr);
+
if (pci->iatu_unroll_enabled) {
dw_pcie_writel_unroll(pci, index, PCIE_ATU_UNR_LOWER_BASE,
  lower_32_bits(cpu_addr));
diff --git a/drivers/pci/dwc/pcie-designware.h 
b/drivers/pci/dwc/pcie-designware.h
index cd3b871..8f3dcb2 100644
--- a/drivers/pci/dwc/pcie-designware.h
+++ b/drivers/pci/dwc/pcie-designware.h
@@ -143,6 +143,7 @@ struct pcie_port {
 };
 
 struct dw_pcie_ops {
+   u64 (*cpu_addr_fixup)(u64 cpu_addr);
u32 (*readl_dbi)(struct dw_pcie *pcie, u32 reg);
void(*writel_dbi)(struct dw_pcie *pcie, u32 reg, u32 val);
int (*link_up)(struct dw_pcie *pcie);
-- 
1.7.9.5



[RESEND PATCH v3 3/7] PCI: dwc: artpec6: Populate cpu_addr_fixup ops

2017-03-08 Thread Kishon Vijay Abraham I
Populate cpu_addr_fixup ops to extract the least 28 bits of the
corresponding cpu address.

Cc: Niklas Cassel 
Acked-by: Joao Pinto 
Signed-off-by: Kishon Vijay Abraham I 
---
 drivers/pci/dwc/pcie-artpec6.c |   15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/drivers/pci/dwc/pcie-artpec6.c b/drivers/pci/dwc/pcie-artpec6.c
index fcd3ef8..5b3b3af 100644
--- a/drivers/pci/dwc/pcie-artpec6.c
+++ b/drivers/pci/dwc/pcie-artpec6.c
@@ -78,6 +78,11 @@ static void artpec6_pcie_writel(struct artpec6_pcie 
*artpec6_pcie, u32 offset, u
regmap_write(artpec6_pcie->regmap, offset, val);
 }
 
+static u64 artpec6_pcie_cpu_addr_fixup(u64 pci_addr)
+{
+   return pci_addr & ARTPEC6_CPU_TO_BUS_ADDR;
+}
+
 static int artpec6_pcie_establish_link(struct artpec6_pcie *artpec6_pcie)
 {
struct dw_pcie *pci = artpec6_pcie->pci;
@@ -142,11 +147,6 @@ static int artpec6_pcie_establish_link(struct artpec6_pcie 
*artpec6_pcie)
 */
dw_pcie_writel_dbi(pci, MISC_CONTROL_1_OFF, DBI_RO_WR_EN);
 
-   pp->io_base &= ARTPEC6_CPU_TO_BUS_ADDR;
-   pp->mem_base &= ARTPEC6_CPU_TO_BUS_ADDR;
-   pp->cfg0_base &= ARTPEC6_CPU_TO_BUS_ADDR;
-   pp->cfg1_base &= ARTPEC6_CPU_TO_BUS_ADDR;
-
/* setup root complex */
dw_pcie_setup_rc(pp);
 
@@ -234,6 +234,10 @@ static int artpec6_add_pcie_port(struct artpec6_pcie 
*artpec6_pcie,
return 0;
 }
 
+static const struct dw_pcie_ops dw_pcie_ops = {
+   .cpu_addr_fixup = artpec6_pcie_cpu_addr_fixup,
+};
+
 static int artpec6_pcie_probe(struct platform_device *pdev)
 {
struct device *dev = >dev;
@@ -252,6 +256,7 @@ static int artpec6_pcie_probe(struct platform_device *pdev)
return -ENOMEM;
 
pci->dev = dev;
+   pci->ops = _pcie_ops;
 
artpec6_pcie->pci = pci;
 
-- 
1.7.9.5



[RESEND PATCH v3 5/7] PCI: dwc: all: Modify dbi accessors to access data of 4/2/1 bytes

2017-03-08 Thread Kishon Vijay Abraham I
Previously dbi accessors can be used to access data of size 4
bytes. But there might be situations (like accessing
MSI_MESSAGE_CONTROL in order to set/get the number of required
MSI interrupts in EP mode) where dbi accessors must
be used to access data of size 2. This is in preparation for
adding endpoint mode support to designware driver.

Cc: Jingoo Han 
Cc: Richard Zhu 
Cc: Lucas Stach 
Cc: Murali Karicheri 
Cc: Thomas Petazzoni 
Cc: Niklas Cassel 
Cc: Jesper Nilsson 
Cc: Joao Pinto 
Cc: Zhou Wang 
Cc: Gabriele Paoloni 
Signed-off-by: Kishon Vijay Abraham I 
---
 drivers/pci/dwc/pci-dra7xx.c   |8 ++--
 drivers/pci/dwc/pci-exynos.c   |   16 +++
 drivers/pci/dwc/pci-imx6.c |   54 +++---
 drivers/pci/dwc/pci-keystone-dw.c  |   13 +++---
 drivers/pci/dwc/pcie-armada8k.c|   38 
 drivers/pci/dwc/pcie-artpec6.c |6 +--
 drivers/pci/dwc/pcie-designware-host.c |   18 
 drivers/pci/dwc/pcie-designware.c  |   77 +++-
 drivers/pci/dwc/pcie-designware.h  |   14 +++---
 drivers/pci/dwc/pcie-hisi.c|   14 +++---
 10 files changed, 138 insertions(+), 120 deletions(-)

diff --git a/drivers/pci/dwc/pci-dra7xx.c b/drivers/pci/dwc/pci-dra7xx.c
index 3708bd6..c6fef0a 100644
--- a/drivers/pci/dwc/pci-dra7xx.c
+++ b/drivers/pci/dwc/pci-dra7xx.c
@@ -499,9 +499,9 @@ static int dra7xx_pcie_suspend(struct device *dev)
u32 val;
 
/* clear MSE */
-   val = dw_pcie_readl_dbi(pci, base, PCI_COMMAND);
+   val = dw_pcie_read_dbi(pci, base, PCI_COMMAND, 0x4);
val &= ~PCI_COMMAND_MEMORY;
-   dw_pcie_writel_dbi(pci, base, PCI_COMMAND, val);
+   dw_pcie_write_dbi(pci, base, PCI_COMMAND, 0x4, val);
 
return 0;
 }
@@ -514,9 +514,9 @@ static int dra7xx_pcie_resume(struct device *dev)
u32 val;
 
/* set MSE */
-   val = dw_pcie_readl_dbi(pci, base, PCI_COMMAND);
+   val = dw_pcie_read_dbi(pci, base, PCI_COMMAND, 0x4);
val |= PCI_COMMAND_MEMORY;
-   dw_pcie_writel_dbi(pci, base, PCI_COMMAND, val);
+   dw_pcie_write_dbi(pci, base, PCI_COMMAND, 0x4, val);
 
return 0;
 }
diff --git a/drivers/pci/dwc/pci-exynos.c b/drivers/pci/dwc/pci-exynos.c
index a0d40f7..37d6d2b 100644
--- a/drivers/pci/dwc/pci-exynos.c
+++ b/drivers/pci/dwc/pci-exynos.c
@@ -521,25 +521,25 @@ static void exynos_pcie_enable_interrupts(struct 
exynos_pcie *ep)
exynos_pcie_msi_init(ep);
 }
 
-static u32 exynos_pcie_readl_dbi(struct dw_pcie *pci, void __iomem *base,
-u32 reg)
+static u32 exynos_pcie_read_dbi(struct dw_pcie *pci, void __iomem *base,
+   u32 reg, size_t size)
 {
struct exynos_pcie *ep = to_exynos_pcie(pci);
u32 val;
 
exynos_pcie_sideband_dbi_r_mode(ep, true);
-   val = readl(base + reg);
+   dw_pcie_read(base + reg, size, );
exynos_pcie_sideband_dbi_r_mode(ep, false);
return val;
 }
 
-static void exynos_pcie_writel_dbi(struct dw_pcie *pci, void __iomem *base,
-  u32 reg, u32 val)
+static void exynos_pcie_write_dbi(struct dw_pcie *pci, void __iomem *base,
+ u32 reg, size_t size, u32 val)
 {
struct exynos_pcie *ep = to_exynos_pcie(pci);
 
exynos_pcie_sideband_dbi_w_mode(ep, true);
-   writel(val, base + reg);
+   dw_pcie_write(base + reg, size, val);
exynos_pcie_sideband_dbi_w_mode(ep, false);
 }
 
@@ -646,8 +646,8 @@ static int __init exynos_add_pcie_port(struct exynos_pcie 
*ep,
 }
 
 static const struct dw_pcie_ops dw_pcie_ops = {
-   .readl_dbi = exynos_pcie_readl_dbi,
-   .writel_dbi = exynos_pcie_writel_dbi,
+   .read_dbi = exynos_pcie_read_dbi,
+   .write_dbi = exynos_pcie_write_dbi,
.link_up = exynos_pcie_link_up,
 };
 
diff --git a/drivers/pci/dwc/pci-imx6.c b/drivers/pci/dwc/pci-imx6.c
index 85dd901..e58ca7a 100644
--- a/drivers/pci/dwc/pci-imx6.c
+++ b/drivers/pci/dwc/pci-imx6.c
@@ -104,7 +104,7 @@ static int pcie_phy_poll_ack(struct imx6_pcie *imx6_pcie, 
int exp_val)
u32 wait_counter = 0;
 
do {
-   val = dw_pcie_readl_dbi(pci, base, PCIE_PHY_STAT);
+   val = dw_pcie_read_dbi(pci, base, PCIE_PHY_STAT, 0x4);
val = (val >> PCIE_PHY_STAT_ACK_LOC) & 0x1;
wait_counter++;
 
@@ -125,17 +125,17 @@ static int pcie_phy_wait_ack(struct imx6_pcie *imx6_pcie, 
int addr)
int ret;
 
val = addr << PCIE_PHY_CTRL_DATA_LOC;
-   dw_pcie_writel_dbi(pci, base, PCIE_PHY_CTRL, val);
+   dw_pcie_write_dbi(pci, base, PCIE_PHY_CTRL, 0x4, val);
 
 

[RESEND PATCH v3 5/7] PCI: dwc: all: Modify dbi accessors to access data of 4/2/1 bytes

2017-03-08 Thread Kishon Vijay Abraham I
Previously dbi accessors can be used to access data of size 4
bytes. But there might be situations (like accessing
MSI_MESSAGE_CONTROL in order to set/get the number of required
MSI interrupts in EP mode) where dbi accessors must
be used to access data of size 2. This is in preparation for
adding endpoint mode support to designware driver.

Cc: Jingoo Han 
Cc: Richard Zhu 
Cc: Lucas Stach 
Cc: Murali Karicheri 
Cc: Thomas Petazzoni 
Cc: Niklas Cassel 
Cc: Jesper Nilsson 
Cc: Joao Pinto 
Cc: Zhou Wang 
Cc: Gabriele Paoloni 
Signed-off-by: Kishon Vijay Abraham I 
---
 drivers/pci/dwc/pci-dra7xx.c   |8 ++--
 drivers/pci/dwc/pci-exynos.c   |   16 +++
 drivers/pci/dwc/pci-imx6.c |   54 +++---
 drivers/pci/dwc/pci-keystone-dw.c  |   13 +++---
 drivers/pci/dwc/pcie-armada8k.c|   38 
 drivers/pci/dwc/pcie-artpec6.c |6 +--
 drivers/pci/dwc/pcie-designware-host.c |   18 
 drivers/pci/dwc/pcie-designware.c  |   77 +++-
 drivers/pci/dwc/pcie-designware.h  |   14 +++---
 drivers/pci/dwc/pcie-hisi.c|   14 +++---
 10 files changed, 138 insertions(+), 120 deletions(-)

diff --git a/drivers/pci/dwc/pci-dra7xx.c b/drivers/pci/dwc/pci-dra7xx.c
index 3708bd6..c6fef0a 100644
--- a/drivers/pci/dwc/pci-dra7xx.c
+++ b/drivers/pci/dwc/pci-dra7xx.c
@@ -499,9 +499,9 @@ static int dra7xx_pcie_suspend(struct device *dev)
u32 val;
 
/* clear MSE */
-   val = dw_pcie_readl_dbi(pci, base, PCI_COMMAND);
+   val = dw_pcie_read_dbi(pci, base, PCI_COMMAND, 0x4);
val &= ~PCI_COMMAND_MEMORY;
-   dw_pcie_writel_dbi(pci, base, PCI_COMMAND, val);
+   dw_pcie_write_dbi(pci, base, PCI_COMMAND, 0x4, val);
 
return 0;
 }
@@ -514,9 +514,9 @@ static int dra7xx_pcie_resume(struct device *dev)
u32 val;
 
/* set MSE */
-   val = dw_pcie_readl_dbi(pci, base, PCI_COMMAND);
+   val = dw_pcie_read_dbi(pci, base, PCI_COMMAND, 0x4);
val |= PCI_COMMAND_MEMORY;
-   dw_pcie_writel_dbi(pci, base, PCI_COMMAND, val);
+   dw_pcie_write_dbi(pci, base, PCI_COMMAND, 0x4, val);
 
return 0;
 }
diff --git a/drivers/pci/dwc/pci-exynos.c b/drivers/pci/dwc/pci-exynos.c
index a0d40f7..37d6d2b 100644
--- a/drivers/pci/dwc/pci-exynos.c
+++ b/drivers/pci/dwc/pci-exynos.c
@@ -521,25 +521,25 @@ static void exynos_pcie_enable_interrupts(struct 
exynos_pcie *ep)
exynos_pcie_msi_init(ep);
 }
 
-static u32 exynos_pcie_readl_dbi(struct dw_pcie *pci, void __iomem *base,
-u32 reg)
+static u32 exynos_pcie_read_dbi(struct dw_pcie *pci, void __iomem *base,
+   u32 reg, size_t size)
 {
struct exynos_pcie *ep = to_exynos_pcie(pci);
u32 val;
 
exynos_pcie_sideband_dbi_r_mode(ep, true);
-   val = readl(base + reg);
+   dw_pcie_read(base + reg, size, );
exynos_pcie_sideband_dbi_r_mode(ep, false);
return val;
 }
 
-static void exynos_pcie_writel_dbi(struct dw_pcie *pci, void __iomem *base,
-  u32 reg, u32 val)
+static void exynos_pcie_write_dbi(struct dw_pcie *pci, void __iomem *base,
+ u32 reg, size_t size, u32 val)
 {
struct exynos_pcie *ep = to_exynos_pcie(pci);
 
exynos_pcie_sideband_dbi_w_mode(ep, true);
-   writel(val, base + reg);
+   dw_pcie_write(base + reg, size, val);
exynos_pcie_sideband_dbi_w_mode(ep, false);
 }
 
@@ -646,8 +646,8 @@ static int __init exynos_add_pcie_port(struct exynos_pcie 
*ep,
 }
 
 static const struct dw_pcie_ops dw_pcie_ops = {
-   .readl_dbi = exynos_pcie_readl_dbi,
-   .writel_dbi = exynos_pcie_writel_dbi,
+   .read_dbi = exynos_pcie_read_dbi,
+   .write_dbi = exynos_pcie_write_dbi,
.link_up = exynos_pcie_link_up,
 };
 
diff --git a/drivers/pci/dwc/pci-imx6.c b/drivers/pci/dwc/pci-imx6.c
index 85dd901..e58ca7a 100644
--- a/drivers/pci/dwc/pci-imx6.c
+++ b/drivers/pci/dwc/pci-imx6.c
@@ -104,7 +104,7 @@ static int pcie_phy_poll_ack(struct imx6_pcie *imx6_pcie, 
int exp_val)
u32 wait_counter = 0;
 
do {
-   val = dw_pcie_readl_dbi(pci, base, PCIE_PHY_STAT);
+   val = dw_pcie_read_dbi(pci, base, PCIE_PHY_STAT, 0x4);
val = (val >> PCIE_PHY_STAT_ACK_LOC) & 0x1;
wait_counter++;
 
@@ -125,17 +125,17 @@ static int pcie_phy_wait_ack(struct imx6_pcie *imx6_pcie, 
int addr)
int ret;
 
val = addr << PCIE_PHY_CTRL_DATA_LOC;
-   dw_pcie_writel_dbi(pci, base, PCIE_PHY_CTRL, val);
+   dw_pcie_write_dbi(pci, base, PCIE_PHY_CTRL, 0x4, val);
 
val |= (0x1 << PCIE_PHY_CTRL_CAP_ADR_LOC);
-   dw_pcie_writel_dbi(pci, base, PCIE_PHY_CTRL, val);
+   dw_pcie_write_dbi(pci, base, PCIE_PHY_CTRL, 0x4, val);
 
ret = pcie_phy_poll_ack(imx6_pcie, 1);
if (ret)
return ret;
 
 

[RESEND PATCH v3 7/7] PCI: dwc: dra7xx: Push request_irq call to the bottom of probe

2017-03-08 Thread Kishon Vijay Abraham I
From: Keerthy 

Currently devm_request_irq is being called before base, pci fields
of dra7xx_pcie structure are populated. It is called even before
pm_runtime_enable and pm_runtime_get_sync are called. This will
lead to exceptions if in case an interrupt is triggered before
the all of the above are done. Hence push the devm_request_irq
call to the end of the probe.

Signed-off-by: Keerthy 
Signed-off-by: Kishon Vijay Abraham I 
---
 drivers/pci/dwc/pci-dra7xx.c |   14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/pci/dwc/pci-dra7xx.c b/drivers/pci/dwc/pci-dra7xx.c
index c6fef0a..8c53233 100644
--- a/drivers/pci/dwc/pci-dra7xx.c
+++ b/drivers/pci/dwc/pci-dra7xx.c
@@ -410,13 +410,6 @@ static int __init dra7xx_pcie_probe(struct platform_device 
*pdev)
return -EINVAL;
}
 
-   ret = devm_request_irq(dev, irq, dra7xx_pcie_irq_handler,
-  IRQF_SHARED, "dra7xx-pcie-main", dra7xx);
-   if (ret) {
-   dev_err(dev, "failed to request irq\n");
-   return ret;
-   }
-
res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "ti_conf");
base = devm_ioremap_nocache(dev, res->start, resource_size(res));
if (!base)
@@ -478,6 +471,13 @@ static int __init dra7xx_pcie_probe(struct platform_device 
*pdev)
if (ret < 0)
goto err_gpio;
 
+   ret = devm_request_irq(dev, irq, dra7xx_pcie_irq_handler,
+  IRQF_SHARED, "dra7xx-pcie-main", dra7xx);
+   if (ret) {
+   dev_err(dev, "failed to request irq\n");
+   goto err_gpio;
+   }
+
return 0;
 
 err_gpio:
-- 
1.7.9.5



[RESEND PATCH v3 7/7] PCI: dwc: dra7xx: Push request_irq call to the bottom of probe

2017-03-08 Thread Kishon Vijay Abraham I
From: Keerthy 

Currently devm_request_irq is being called before base, pci fields
of dra7xx_pcie structure are populated. It is called even before
pm_runtime_enable and pm_runtime_get_sync are called. This will
lead to exceptions if in case an interrupt is triggered before
the all of the above are done. Hence push the devm_request_irq
call to the end of the probe.

Signed-off-by: Keerthy 
Signed-off-by: Kishon Vijay Abraham I 
---
 drivers/pci/dwc/pci-dra7xx.c |   14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/pci/dwc/pci-dra7xx.c b/drivers/pci/dwc/pci-dra7xx.c
index c6fef0a..8c53233 100644
--- a/drivers/pci/dwc/pci-dra7xx.c
+++ b/drivers/pci/dwc/pci-dra7xx.c
@@ -410,13 +410,6 @@ static int __init dra7xx_pcie_probe(struct platform_device 
*pdev)
return -EINVAL;
}
 
-   ret = devm_request_irq(dev, irq, dra7xx_pcie_irq_handler,
-  IRQF_SHARED, "dra7xx-pcie-main", dra7xx);
-   if (ret) {
-   dev_err(dev, "failed to request irq\n");
-   return ret;
-   }
-
res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "ti_conf");
base = devm_ioremap_nocache(dev, res->start, resource_size(res));
if (!base)
@@ -478,6 +471,13 @@ static int __init dra7xx_pcie_probe(struct platform_device 
*pdev)
if (ret < 0)
goto err_gpio;
 
+   ret = devm_request_irq(dev, irq, dra7xx_pcie_irq_handler,
+  IRQF_SHARED, "dra7xx-pcie-main", dra7xx);
+   if (ret) {
+   dev_err(dev, "failed to request irq\n");
+   goto err_gpio;
+   }
+
return 0;
 
 err_gpio:
-- 
1.7.9.5



Re: kexec regression since 4.9 caused by efi

2017-03-08 Thread Dave Young
Add efi/kexec list.

On 03/08/17 at 12:16pm, Omar Sandoval wrote:
> Hi, everyone,
> 
> Since 4.9, kexec results in the following panic on some of our servers:
> 
> [0.001000] general protection fault:  [#1] SMP
> [0.001000] Modules linked in:
> [0.001000] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.11.0-rc1 #53
> [0.001000] Hardware name: Wiwynn Leopard-Orv2/Leopard-DDR BW, BIOS LBM05  
>  09/30/2016
> [0.001000] task: 81e0e4c0 task.stack: 81e0
> [0.001000] RIP: 0010:virt_efi_set_variable+0x85/0x1a0
> [0.001000] RSP: :81e03e18 EFLAGS: 00010202
> [0.001000] RAX: afafafafafafafaf RBX: 81e3a4e0 RCX: 
> 0007
> [0.001000] RDX: 81e03e70 RSI: 81e3a4e0 RDI: 
> 88407f8c2de0
> [0.001000] RBP: 81e03e60 R08:  R09: 
> 
> [0.001000] R10:  R11:  R12: 
> 81e03e70
> [0.001000] R13: 0007 R14:  R15: 
> 
> [0.001000] FS:  () GS:881fff60() 
> knlGS:
> [0.001000] CS:  0010 DS:  ES:  CR0: 80050033
> [0.001000] CR2: 88407f30f000 CR3: 001fff102000 CR4: 
> 000406b0
> [0.001000] DR0:  DR1:  DR2: 
> 
> [0.001000] DR3:  DR6: fffe0ff0 DR7: 
> 0400
> [0.001000] Call Trace:
> [0.001000]  efi_delete_dummy_variable+0x7a/0x80
> [0.001000]  efi_enter_virtual_mode+0x3e2/0x494
> [0.001000]  start_kernel+0x392/0x418
> [0.001000]  ? set_init_arg+0x55/0x55
> [0.001000]  x86_64_start_reservations+0x2a/0x2c
> [0.001000]  x86_64_start_kernel+0xea/0xed
> [0.001000]  start_cpu+0x14/0x14
> [0.001000] Code: 42 25 8d ff 80 3d 43 77 95 00 00 75 68 9c 8f 04 24 48 8b 
> 05 3e 7d 7e 00 48 89 de 4d 89 f9 4d 89 f0 44 89 e9 4c 89 e2 48 8b 40 58 <48> 
> 8b 78 58 31 c0 e8 90 e4 92 ff 48 8b 3c 24 48 c7 c6 2b 0a ca
> [0.001000] RIP: virt_efi_set_variable+0x85/0x1a0 RSP: 81e03e18
> [0.001000] ---[ end trace 0bd213e540e9b19f ]---
> [0.001000] Kernel panic - not syncing: Fatal exception
> [0.001000] ---[ end Kernel panic - not syncing: Fatal exception
> 
> Booting normally (i.e., not kexec) still works.
> 
> The decoded code is:
> 
> 
>0:   42 25 8d ff 80 3d   rex.X and $0x3d80ff8d,%eax
>6:   43 77 95rex.XB ja 0xff9e
>9:   00 00   add%al,(%rax)
>b:   75 68   jne0x75
>d:   9c  pushfq
>e:   8f 04 24popq   (%rsp)
>   11:   48 8b 05 3e 7d 7e 00mov0x7e7d3e(%rip),%rax# 0x7e7d56
>   18:   48 89 demov%rbx,%rsi
>   1b:   4d 89 f9mov%r15,%r9
>   1e:   4d 89 f0mov%r14,%r8
>   21:   44 89 e9mov%r13d,%ecx
>   24:   4c 89 e2mov%r12,%rdx
>   27:   48 8b 40 58 mov0x58(%rax),%rax
>   2b:*  48 8b 78 58 mov0x58(%rax),%rdi  <-- trapping 
> instruction
>   2f:   31 c0   xor%eax,%eax
>   31:   e8 90 e4 92 ff  callq  0xff92e4c6
>   36:   48 8b 3c 24 mov(%rsp),%rdi
>   3a:   48  rex.W
>   3b:   c7  .byte 0xc7
>   3c:   c6  (bad)
>   3d:   2b 0a   sub(%rdx),%ecx
>   3f:   ca  .byte 0xca
> 
> If I'm reading this correctly, efi.systab->runtime == 0xafafafafafafafaf,

I have no more clue yet from your provided log, but the runtime value is
odd to me. It is set in below code:

arch/x86/platform/efi/efi.c: efi_systab_init()
efi_systab.runtime = data ?
 (void *)(unsigned long)data->runtime :
 (void *)(unsigne long)systab64->runtime;

Here data is the setup_data passed by kexec-tools from normal kernel to
kexec kernel, efi_setup_data structure is like below: 
struct efi_setup_data {
u64 fw_vendor;
u64 runtime;
u64 tables;
u64 smbios;
u64 reserved[8];
};

kexec-tools get the runtime address from /sys/firmware/efi/runtime

So can you do some debuggin on your side, eg. see the sysfs runtime
value is correct or not. And add some printk in efi init path etc.

> and we're crashing when we try to dereference that.
> 
> Here is the output of efi=debug from before the crash:
> 
> [0.00] Linux version 4.11.0-rc1 (osan...@devbig561.prn1.facebook.com) 
> (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #53 SMP Wed Mar 8 
> 12:07:16 PST 2017
> [0.00] Command line: BOOT_IMAGE=/vmlinuz-4.6.7-34_fbk7_2504_g8275185 
> ro root=LABEL=/ ipv6.autoconf=0 erst_disable biosdevname=0 net.ifnames=0 
> fsck.repair=yes pcie_pme=nomsi 
> 

Re: kexec regression since 4.9 caused by efi

2017-03-08 Thread Dave Young
Add efi/kexec list.

On 03/08/17 at 12:16pm, Omar Sandoval wrote:
> Hi, everyone,
> 
> Since 4.9, kexec results in the following panic on some of our servers:
> 
> [0.001000] general protection fault:  [#1] SMP
> [0.001000] Modules linked in:
> [0.001000] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.11.0-rc1 #53
> [0.001000] Hardware name: Wiwynn Leopard-Orv2/Leopard-DDR BW, BIOS LBM05  
>  09/30/2016
> [0.001000] task: 81e0e4c0 task.stack: 81e0
> [0.001000] RIP: 0010:virt_efi_set_variable+0x85/0x1a0
> [0.001000] RSP: :81e03e18 EFLAGS: 00010202
> [0.001000] RAX: afafafafafafafaf RBX: 81e3a4e0 RCX: 
> 0007
> [0.001000] RDX: 81e03e70 RSI: 81e3a4e0 RDI: 
> 88407f8c2de0
> [0.001000] RBP: 81e03e60 R08:  R09: 
> 
> [0.001000] R10:  R11:  R12: 
> 81e03e70
> [0.001000] R13: 0007 R14:  R15: 
> 
> [0.001000] FS:  () GS:881fff60() 
> knlGS:
> [0.001000] CS:  0010 DS:  ES:  CR0: 80050033
> [0.001000] CR2: 88407f30f000 CR3: 001fff102000 CR4: 
> 000406b0
> [0.001000] DR0:  DR1:  DR2: 
> 
> [0.001000] DR3:  DR6: fffe0ff0 DR7: 
> 0400
> [0.001000] Call Trace:
> [0.001000]  efi_delete_dummy_variable+0x7a/0x80
> [0.001000]  efi_enter_virtual_mode+0x3e2/0x494
> [0.001000]  start_kernel+0x392/0x418
> [0.001000]  ? set_init_arg+0x55/0x55
> [0.001000]  x86_64_start_reservations+0x2a/0x2c
> [0.001000]  x86_64_start_kernel+0xea/0xed
> [0.001000]  start_cpu+0x14/0x14
> [0.001000] Code: 42 25 8d ff 80 3d 43 77 95 00 00 75 68 9c 8f 04 24 48 8b 
> 05 3e 7d 7e 00 48 89 de 4d 89 f9 4d 89 f0 44 89 e9 4c 89 e2 48 8b 40 58 <48> 
> 8b 78 58 31 c0 e8 90 e4 92 ff 48 8b 3c 24 48 c7 c6 2b 0a ca
> [0.001000] RIP: virt_efi_set_variable+0x85/0x1a0 RSP: 81e03e18
> [0.001000] ---[ end trace 0bd213e540e9b19f ]---
> [0.001000] Kernel panic - not syncing: Fatal exception
> [0.001000] ---[ end Kernel panic - not syncing: Fatal exception
> 
> Booting normally (i.e., not kexec) still works.
> 
> The decoded code is:
> 
> 
>0:   42 25 8d ff 80 3d   rex.X and $0x3d80ff8d,%eax
>6:   43 77 95rex.XB ja 0xff9e
>9:   00 00   add%al,(%rax)
>b:   75 68   jne0x75
>d:   9c  pushfq
>e:   8f 04 24popq   (%rsp)
>   11:   48 8b 05 3e 7d 7e 00mov0x7e7d3e(%rip),%rax# 0x7e7d56
>   18:   48 89 demov%rbx,%rsi
>   1b:   4d 89 f9mov%r15,%r9
>   1e:   4d 89 f0mov%r14,%r8
>   21:   44 89 e9mov%r13d,%ecx
>   24:   4c 89 e2mov%r12,%rdx
>   27:   48 8b 40 58 mov0x58(%rax),%rax
>   2b:*  48 8b 78 58 mov0x58(%rax),%rdi  <-- trapping 
> instruction
>   2f:   31 c0   xor%eax,%eax
>   31:   e8 90 e4 92 ff  callq  0xff92e4c6
>   36:   48 8b 3c 24 mov(%rsp),%rdi
>   3a:   48  rex.W
>   3b:   c7  .byte 0xc7
>   3c:   c6  (bad)
>   3d:   2b 0a   sub(%rdx),%ecx
>   3f:   ca  .byte 0xca
> 
> If I'm reading this correctly, efi.systab->runtime == 0xafafafafafafafaf,

I have no more clue yet from your provided log, but the runtime value is
odd to me. It is set in below code:

arch/x86/platform/efi/efi.c: efi_systab_init()
efi_systab.runtime = data ?
 (void *)(unsigned long)data->runtime :
 (void *)(unsigne long)systab64->runtime;

Here data is the setup_data passed by kexec-tools from normal kernel to
kexec kernel, efi_setup_data structure is like below: 
struct efi_setup_data {
u64 fw_vendor;
u64 runtime;
u64 tables;
u64 smbios;
u64 reserved[8];
};

kexec-tools get the runtime address from /sys/firmware/efi/runtime

So can you do some debuggin on your side, eg. see the sysfs runtime
value is correct or not. And add some printk in efi init path etc.

> and we're crashing when we try to dereference that.
> 
> Here is the output of efi=debug from before the crash:
> 
> [0.00] Linux version 4.11.0-rc1 (osan...@devbig561.prn1.facebook.com) 
> (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #53 SMP Wed Mar 8 
> 12:07:16 PST 2017
> [0.00] Command line: BOOT_IMAGE=/vmlinuz-4.6.7-34_fbk7_2504_g8275185 
> ro root=LABEL=/ ipv6.autoconf=0 erst_disable biosdevname=0 net.ifnames=0 
> fsck.repair=yes pcie_pme=nomsi 
> 

Re: [PATCH v5 2/5] powerpc: kretprobes: override default function entry offset

2017-03-08 Thread Michael Ellerman
"Naveen N. Rao"  writes:
> On 2017/03/08 11:29AM, Arnaldo Carvalho de Melo wrote:
>> > I wasn't sure if you were planning on picking up KPROBES_ON_FTRACE for 
>> > v4.11. If so, it would be good to take this patch through the powerpc 
>> > tree. Otherwise, this can go via Ingo's tree.
>> 
>> If you guys convince Ingo that this should go _now_, then just cherry
>> pick what was merged into tip/perf/core that is needed for the arch
>> specific stuff and go from there.
>
> Ok, in hindsight, I think Michael's concern was actually for v4.12 

Yes I was talking about 4.12, sorry I thought that was implied :)

> itself, in which case this particular patch can go via powerpc tree, 
> while the rest of the patches in this series can go via your tree.
>
> Michael?

Yeah I think that's the easiest option. The function will be temporarily
unused until the two trees are merged, but I think that's fine.

cheers


Re: [PATCH v5 2/5] powerpc: kretprobes: override default function entry offset

2017-03-08 Thread Michael Ellerman
"Naveen N. Rao"  writes:
> On 2017/03/08 11:29AM, Arnaldo Carvalho de Melo wrote:
>> > I wasn't sure if you were planning on picking up KPROBES_ON_FTRACE for 
>> > v4.11. If so, it would be good to take this patch through the powerpc 
>> > tree. Otherwise, this can go via Ingo's tree.
>> 
>> If you guys convince Ingo that this should go _now_, then just cherry
>> pick what was merged into tip/perf/core that is needed for the arch
>> specific stuff and go from there.
>
> Ok, in hindsight, I think Michael's concern was actually for v4.12 

Yes I was talking about 4.12, sorry I thought that was implied :)

> itself, in which case this particular patch can go via powerpc tree, 
> while the rest of the patches in this series can go via your tree.
>
> Michael?

Yeah I think that's the easiest option. The function will be temporarily
unused until the two trees are merged, but I think that's fine.

cheers


Re: [f2fs-dev] [PATCH] f2fs: allocate a bio for discarding when actually issuing it

2017-03-08 Thread Chao Yu
Hi Jaegeuk,

On 2017/3/8 10:33, Jaegeuk Kim wrote:
> Let's allocate a bio when issuing discard commands later.
> 
> Signed-off-by: Jaegeuk Kim 
> ---
>  fs/f2fs/f2fs.h|   4 +-
>  fs/f2fs/segment.c | 113 
> --
>  2 files changed, 62 insertions(+), 55 deletions(-)
> 
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index a58c2e43bd2a..870bb4d9bc65 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -197,10 +197,12 @@ enum {
>  struct discard_cmd {
>   struct list_head list;  /* command list */
>   struct completion wait; /* compleation */
> + struct block_device *bdev;  /* bdev */
>   block_t lstart; /* logical start address */
> + block_t start;  /* actual start address in dev */
>   block_t len;/* length */
> - struct bio *bio;/* bio */
>   int state;  /* state */
> + int error;  /* bio error */
>  };
>  
>  struct discard_cmd_control {
> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> index 50c65cc4645a..d8f9e6c895cd 100644
> --- a/fs/f2fs/segment.c
> +++ b/fs/f2fs/segment.c
> @@ -636,7 +636,8 @@ static void locate_dirty_segment(struct f2fs_sb_info 
> *sbi, unsigned int segno)
>  }
>  
>  static void __add_discard_cmd(struct f2fs_sb_info *sbi,
> - struct bio *bio, block_t lstart, block_t len)
> + struct block_device *bdev, block_t lstart,
> + block_t start, block_t len)
>  {
>   struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
>   struct list_head *cmd_list = &(dcc->discard_cmd_list);
> @@ -644,9 +645,9 @@ static void __add_discard_cmd(struct f2fs_sb_info *sbi,
>  
>   dc = f2fs_kmem_cache_alloc(discard_cmd_slab, GFP_NOFS);
>   INIT_LIST_HEAD(>list);
> - dc->bio = bio;
> - bio->bi_private = dc;
> + dc->bdev = bdev;
>   dc->lstart = lstart;
> + dc->start = start;
>   dc->len = len;
>   dc->state = D_PREP;
>   init_completion(>wait);
> @@ -658,22 +659,66 @@ static void __add_discard_cmd(struct f2fs_sb_info *sbi,
>  
>  static void __remove_discard_cmd(struct f2fs_sb_info *sbi, struct 
> discard_cmd *dc)
>  {
> - int err = dc->bio->bi_error;
> -
>   if (dc->state == D_DONE)
>   atomic_dec(&(SM_I(sbi)->dcc_info->submit_discard));
>  
> - if (err == -EOPNOTSUPP)
> - err = 0;
> + if (dc->error == -EOPNOTSUPP)
> + dc->error = 0;
>  
> - if (err)
> + if (dc->error)
>   f2fs_msg(sbi->sb, KERN_INFO,
> - "Issue discard failed, ret: %d", err);
> - bio_put(dc->bio);
> + "Issue discard failed, ret: %d", dc->error);
>   list_del(>list);
>   kmem_cache_free(discard_cmd_slab, dc);
>  }
>  
> +static void f2fs_submit_discard_endio(struct bio *bio)
> +{
> + struct discard_cmd *dc = (struct discard_cmd *)bio->bi_private;
> +
> + complete(>wait);
> + dc->error = bio->bi_error;
> + dc->state = D_DONE;
> + bio_put(bio);
> +}
> +
> +/* this function is copied from blkdev_issue_discard from block/blk-lib.c */
> +static int __submit_discard_cmd(struct discard_cmd *dc)
> +{
> + struct bio *bio = NULL;
> + int err;
> +
> + err = __blkdev_issue_discard(dc->bdev,
> + SECTOR_FROM_BLOCK(dc->start),
> + SECTOR_FROM_BLOCK(dc->len),
> + GFP_NOFS, 0, );
> + if (!err && bio) {
> + bio->bi_private = dc;
> + bio->bi_end_io = f2fs_submit_discard_endio;
> + bio->bi_opf |= REQ_SYNC;
> + submit_bio(bio);

Should set flag only if __blkdev_issue_discard is successful?

dc->state = D_SUBMIT;

And how about moving atomic_inc(>submit_discard); here?

Thanks,

> + }
> + dc->state = D_SUBMIT;
> + return err;
> +}
> +
> +static int __queue_discard_cmd(struct f2fs_sb_info *sbi,
> + struct block_device *bdev, block_t blkstart, block_t blklen)
> +{
> + block_t lblkstart = blkstart;
> +
> + trace_f2fs_issue_discard(bdev, blkstart, blklen);
> +
> + if (sbi->s_ndevs) {
> + int devi = f2fs_target_device_index(sbi, blkstart);
> +
> + blkstart -= FDEV(devi).start_blk;
> + }
> + __add_discard_cmd(sbi, bdev, lblkstart, blkstart, blklen);
> + wake_up(_I(sbi)->dcc_info->discard_wait_queue);
> + return 0;
> +}
> +
>  /* This should be covered by global mutex, _i->sentry_lock */
>  void f2fs_wait_discard_bio(struct f2fs_sb_info *sbi, block_t blkaddr)
>  {
> @@ -690,8 +735,7 @@ void f2fs_wait_discard_bio(struct f2fs_sb_info *sbi, 
> block_t blkaddr)
>  
>   if (blkaddr == NULL_ADDR) {
>   if (dc->state == D_PREP) {
> - dc->state = D_SUBMIT;
> - submit_bio(dc->bio);

Re: [RFC 08/11] mm: make ttu's return boolean

2017-03-08 Thread Minchan Kim
Hi John,

On Tue, Mar 07, 2017 at 11:13:26PM -0800, John Hubbard wrote:
> On 03/01/2017 10:39 PM, Minchan Kim wrote:
> >try_to_unmap returns SWAP_SUCCESS or SWAP_FAIL so it's suitable for
> >boolean return. This patch changes it.
> 
> Hi Minchan,
> 
> So, up until this patch, I definitely like the cleanup, because as you
> observed, the return values didn't need so many different values. However,
> at this point, I think you should stop, and keep the SWAP_SUCCESS and
> SWAP_FAIL (or maybe even rename them to UNMAP_* or TTU_RESULT_*, to match
> their functions' names better), because removing them makes the code
> considerably less readable.
> 
> And since this is billed as a cleanup, we care here, even though this is a
> minor point. :)
> 
> Bool return values are sometimes perfect, such as when asking a question:
> 
>bool mode_changed = needs_modeset(crtc_state);
> 
> The above is very nice. However, for returning success or failure, bools are
> not as nice, because *usually* success == true, except when you use the
> errno-based system, in which success == 0 (which would translate to false,
> if you mistakenly treated it as a bool). That leads to the reader having to
> remember which system is in use, usually with no visual cues to help.

I think it's the matter of taste.

if (try_to_unmap(xxx))
something
else
something

It's perfectly understandable to me. IOW, if try_to_unmap returns true,
it means it did unmap successfully. Otherwise, failed.

IMHO, SWAP_SUCCESS or TTU_RESULT_* seems to be an over-engineering.
If the user want it, user can do it by introducing right variable name
in his context. See below.

> 
> >
> [...]
> > if (PageSwapCache(p)) {
> >@@ -971,7 +971,7 @@ static int hwpoison_user_mappings(struct page *p, 
> >unsigned long pfn,
> > collect_procs(hpage, , flags & MF_ACTION_REQUIRED);
> >
> > ret = try_to_unmap(hpage, ttu);
> >-if (ret != SWAP_SUCCESS)
> >+if (!ret)
> > pr_err("Memory failure: %#lx: failed to unmap page 
> > (mapcount=%d)\n",
> >pfn, page_mapcount(hpage));
> >
> >@@ -986,8 +986,7 @@ static int hwpoison_user_mappings(struct page *p, 
> >unsigned long pfn,
> >  * any accesses to the poisoned memory.
> >  */
> > forcekill = PageDirty(hpage) || (flags & MF_MUST_KILL);
> >-kill_procs(, forcekill, trapno,
> >-  ret != SWAP_SUCCESS, p, pfn, flags);
> >+kill_procs(, forcekill, trapno, !ret , p, pfn, flags);
> 
> The kill_procs() invocation was a little more readable before.

Indeed but I think it's not a problem of try_to_unmap but ret variable name
isn't good any more. How about this?

bool unmap_success;

unmap_success = try_to_unmap(hpage, ttu);

..

kill_procs(, forcekill, trapno, !unmap_success , p, pfn, flags);

..

return unmap_success;

My point is user can introduce whatever variable name depends on his
context. No need to make return variable complicated, IMHO.

> 
> >
> [...]
> >diff --git a/mm/vmscan.c b/mm/vmscan.c
> >index 170c61f..e4b74f1 100644
> >--- a/mm/vmscan.c
> >+++ b/mm/vmscan.c
> >@@ -966,7 +966,6 @@ static unsigned long shrink_page_list(struct list_head 
> >*page_list,
> > int may_enter_fs;
> > enum page_references references = PAGEREF_RECLAIM_CLEAN;
> > bool dirty, writeback;
> >-int ret = SWAP_SUCCESS;
> >
> > cond_resched();
> >
> >@@ -1139,13 +1138,9 @@ static unsigned long shrink_page_list(struct 
> >list_head *page_list,
> >  * processes. Try to unmap it here.
> >  */
> > if (page_mapped(page)) {
> >-switch (ret = try_to_unmap(page,
> >-ttu_flags | TTU_BATCH_FLUSH)) {
> >-case SWAP_FAIL:
> 
> Again: the SWAP_FAIL makes it crystal clear which case we're in.

To me, I don't feel it.
To me, below is perfectly understandable.

if (try_to_unmap())
do something
 
That's why I think it's matter of taste. Okay, I admit I might be
biased, too so I will consider what you suggested if others votes
it.

Thanks.

> 
> I also wonder if UNMAP_FAIL or TTU_RESULT_FAIL is a better name?
> 
> thanks,
> John Hubbard
> NVIDIA
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majord...@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: mailto:"d...@kvack.org;> em...@kvack.org 


Re: [f2fs-dev] [PATCH] f2fs: allocate a bio for discarding when actually issuing it

2017-03-08 Thread Chao Yu
Hi Jaegeuk,

On 2017/3/8 10:33, Jaegeuk Kim wrote:
> Let's allocate a bio when issuing discard commands later.
> 
> Signed-off-by: Jaegeuk Kim 
> ---
>  fs/f2fs/f2fs.h|   4 +-
>  fs/f2fs/segment.c | 113 
> --
>  2 files changed, 62 insertions(+), 55 deletions(-)
> 
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index a58c2e43bd2a..870bb4d9bc65 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -197,10 +197,12 @@ enum {
>  struct discard_cmd {
>   struct list_head list;  /* command list */
>   struct completion wait; /* compleation */
> + struct block_device *bdev;  /* bdev */
>   block_t lstart; /* logical start address */
> + block_t start;  /* actual start address in dev */
>   block_t len;/* length */
> - struct bio *bio;/* bio */
>   int state;  /* state */
> + int error;  /* bio error */
>  };
>  
>  struct discard_cmd_control {
> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> index 50c65cc4645a..d8f9e6c895cd 100644
> --- a/fs/f2fs/segment.c
> +++ b/fs/f2fs/segment.c
> @@ -636,7 +636,8 @@ static void locate_dirty_segment(struct f2fs_sb_info 
> *sbi, unsigned int segno)
>  }
>  
>  static void __add_discard_cmd(struct f2fs_sb_info *sbi,
> - struct bio *bio, block_t lstart, block_t len)
> + struct block_device *bdev, block_t lstart,
> + block_t start, block_t len)
>  {
>   struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
>   struct list_head *cmd_list = &(dcc->discard_cmd_list);
> @@ -644,9 +645,9 @@ static void __add_discard_cmd(struct f2fs_sb_info *sbi,
>  
>   dc = f2fs_kmem_cache_alloc(discard_cmd_slab, GFP_NOFS);
>   INIT_LIST_HEAD(>list);
> - dc->bio = bio;
> - bio->bi_private = dc;
> + dc->bdev = bdev;
>   dc->lstart = lstart;
> + dc->start = start;
>   dc->len = len;
>   dc->state = D_PREP;
>   init_completion(>wait);
> @@ -658,22 +659,66 @@ static void __add_discard_cmd(struct f2fs_sb_info *sbi,
>  
>  static void __remove_discard_cmd(struct f2fs_sb_info *sbi, struct 
> discard_cmd *dc)
>  {
> - int err = dc->bio->bi_error;
> -
>   if (dc->state == D_DONE)
>   atomic_dec(&(SM_I(sbi)->dcc_info->submit_discard));
>  
> - if (err == -EOPNOTSUPP)
> - err = 0;
> + if (dc->error == -EOPNOTSUPP)
> + dc->error = 0;
>  
> - if (err)
> + if (dc->error)
>   f2fs_msg(sbi->sb, KERN_INFO,
> - "Issue discard failed, ret: %d", err);
> - bio_put(dc->bio);
> + "Issue discard failed, ret: %d", dc->error);
>   list_del(>list);
>   kmem_cache_free(discard_cmd_slab, dc);
>  }
>  
> +static void f2fs_submit_discard_endio(struct bio *bio)
> +{
> + struct discard_cmd *dc = (struct discard_cmd *)bio->bi_private;
> +
> + complete(>wait);
> + dc->error = bio->bi_error;
> + dc->state = D_DONE;
> + bio_put(bio);
> +}
> +
> +/* this function is copied from blkdev_issue_discard from block/blk-lib.c */
> +static int __submit_discard_cmd(struct discard_cmd *dc)
> +{
> + struct bio *bio = NULL;
> + int err;
> +
> + err = __blkdev_issue_discard(dc->bdev,
> + SECTOR_FROM_BLOCK(dc->start),
> + SECTOR_FROM_BLOCK(dc->len),
> + GFP_NOFS, 0, );
> + if (!err && bio) {
> + bio->bi_private = dc;
> + bio->bi_end_io = f2fs_submit_discard_endio;
> + bio->bi_opf |= REQ_SYNC;
> + submit_bio(bio);

Should set flag only if __blkdev_issue_discard is successful?

dc->state = D_SUBMIT;

And how about moving atomic_inc(>submit_discard); here?

Thanks,

> + }
> + dc->state = D_SUBMIT;
> + return err;
> +}
> +
> +static int __queue_discard_cmd(struct f2fs_sb_info *sbi,
> + struct block_device *bdev, block_t blkstart, block_t blklen)
> +{
> + block_t lblkstart = blkstart;
> +
> + trace_f2fs_issue_discard(bdev, blkstart, blklen);
> +
> + if (sbi->s_ndevs) {
> + int devi = f2fs_target_device_index(sbi, blkstart);
> +
> + blkstart -= FDEV(devi).start_blk;
> + }
> + __add_discard_cmd(sbi, bdev, lblkstart, blkstart, blklen);
> + wake_up(_I(sbi)->dcc_info->discard_wait_queue);
> + return 0;
> +}
> +
>  /* This should be covered by global mutex, _i->sentry_lock */
>  void f2fs_wait_discard_bio(struct f2fs_sb_info *sbi, block_t blkaddr)
>  {
> @@ -690,8 +735,7 @@ void f2fs_wait_discard_bio(struct f2fs_sb_info *sbi, 
> block_t blkaddr)
>  
>   if (blkaddr == NULL_ADDR) {
>   if (dc->state == D_PREP) {
> - dc->state = D_SUBMIT;
> - submit_bio(dc->bio);
> + 

Re: [RFC 08/11] mm: make ttu's return boolean

2017-03-08 Thread Minchan Kim
Hi John,

On Tue, Mar 07, 2017 at 11:13:26PM -0800, John Hubbard wrote:
> On 03/01/2017 10:39 PM, Minchan Kim wrote:
> >try_to_unmap returns SWAP_SUCCESS or SWAP_FAIL so it's suitable for
> >boolean return. This patch changes it.
> 
> Hi Minchan,
> 
> So, up until this patch, I definitely like the cleanup, because as you
> observed, the return values didn't need so many different values. However,
> at this point, I think you should stop, and keep the SWAP_SUCCESS and
> SWAP_FAIL (or maybe even rename them to UNMAP_* or TTU_RESULT_*, to match
> their functions' names better), because removing them makes the code
> considerably less readable.
> 
> And since this is billed as a cleanup, we care here, even though this is a
> minor point. :)
> 
> Bool return values are sometimes perfect, such as when asking a question:
> 
>bool mode_changed = needs_modeset(crtc_state);
> 
> The above is very nice. However, for returning success or failure, bools are
> not as nice, because *usually* success == true, except when you use the
> errno-based system, in which success == 0 (which would translate to false,
> if you mistakenly treated it as a bool). That leads to the reader having to
> remember which system is in use, usually with no visual cues to help.

I think it's the matter of taste.

if (try_to_unmap(xxx))
something
else
something

It's perfectly understandable to me. IOW, if try_to_unmap returns true,
it means it did unmap successfully. Otherwise, failed.

IMHO, SWAP_SUCCESS or TTU_RESULT_* seems to be an over-engineering.
If the user want it, user can do it by introducing right variable name
in his context. See below.

> 
> >
> [...]
> > if (PageSwapCache(p)) {
> >@@ -971,7 +971,7 @@ static int hwpoison_user_mappings(struct page *p, 
> >unsigned long pfn,
> > collect_procs(hpage, , flags & MF_ACTION_REQUIRED);
> >
> > ret = try_to_unmap(hpage, ttu);
> >-if (ret != SWAP_SUCCESS)
> >+if (!ret)
> > pr_err("Memory failure: %#lx: failed to unmap page 
> > (mapcount=%d)\n",
> >pfn, page_mapcount(hpage));
> >
> >@@ -986,8 +986,7 @@ static int hwpoison_user_mappings(struct page *p, 
> >unsigned long pfn,
> >  * any accesses to the poisoned memory.
> >  */
> > forcekill = PageDirty(hpage) || (flags & MF_MUST_KILL);
> >-kill_procs(, forcekill, trapno,
> >-  ret != SWAP_SUCCESS, p, pfn, flags);
> >+kill_procs(, forcekill, trapno, !ret , p, pfn, flags);
> 
> The kill_procs() invocation was a little more readable before.

Indeed but I think it's not a problem of try_to_unmap but ret variable name
isn't good any more. How about this?

bool unmap_success;

unmap_success = try_to_unmap(hpage, ttu);

..

kill_procs(, forcekill, trapno, !unmap_success , p, pfn, flags);

..

return unmap_success;

My point is user can introduce whatever variable name depends on his
context. No need to make return variable complicated, IMHO.

> 
> >
> [...]
> >diff --git a/mm/vmscan.c b/mm/vmscan.c
> >index 170c61f..e4b74f1 100644
> >--- a/mm/vmscan.c
> >+++ b/mm/vmscan.c
> >@@ -966,7 +966,6 @@ static unsigned long shrink_page_list(struct list_head 
> >*page_list,
> > int may_enter_fs;
> > enum page_references references = PAGEREF_RECLAIM_CLEAN;
> > bool dirty, writeback;
> >-int ret = SWAP_SUCCESS;
> >
> > cond_resched();
> >
> >@@ -1139,13 +1138,9 @@ static unsigned long shrink_page_list(struct 
> >list_head *page_list,
> >  * processes. Try to unmap it here.
> >  */
> > if (page_mapped(page)) {
> >-switch (ret = try_to_unmap(page,
> >-ttu_flags | TTU_BATCH_FLUSH)) {
> >-case SWAP_FAIL:
> 
> Again: the SWAP_FAIL makes it crystal clear which case we're in.

To me, I don't feel it.
To me, below is perfectly understandable.

if (try_to_unmap())
do something
 
That's why I think it's matter of taste. Okay, I admit I might be
biased, too so I will consider what you suggested if others votes
it.

Thanks.

> 
> I also wonder if UNMAP_FAIL or TTU_RESULT_FAIL is a better name?
> 
> thanks,
> John Hubbard
> NVIDIA
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majord...@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: mailto:"d...@kvack.org;> em...@kvack.org 


Re: [PATCH] mm, add_memory_resource: hold device_hotplug lock over mem_hotplug_{begin, done}

2017-03-08 Thread Dan Williams
On Mon, Mar 6, 2017 at 12:22 AM, Heiko Carstens
 wrote:
> Hello Dan,
>
>> > If you look at commit 5e33bc4165f3 ("driver core / ACPI: Avoid device hot
>> > remove locking issues") then lock_device_hotplug_sysfs() was introduced to
>> > avoid a different subtle deadlock, but it also sleeps uninterruptible, but
>> > not for more than 5ms ;)
>> >
>> > However I'm not sure if the device hotplug lock should also be used to fix
>> > an unrelated bug that was introduced with the get_online_mems() /
>> > put_online_mems() interface. Should it?
>>
>> No, I don't think it should.
>>
>> I like your proposed direction of creating a new lock internal to
>> mem_hotplug_begin() to protect active_writer, and stop relying on
>> lock_device_hotplug to serve this purpose.
>>
>> > If so, we need to sprinkle around a couple of lock_device_hotplug() calls
>> > near mem_hotplug_begin() calls, like Sebastian already started, and give it
>> > additional semantics (protecting mem_hotplug.active_writer), and hope it
>> > doesn't lead to deadlocks anywhere.
>>
>> I'll put your proposed patch through some testing.
>
> On s390 it _seems_ to work. Did it pass your testing too?
> If so I would send a patch with proper patch description for inclusion.

Looks ok here. No lockdep warnings running it through it paces with
the persistent memory use case.


Re: [PATCH] mm, add_memory_resource: hold device_hotplug lock over mem_hotplug_{begin, done}

2017-03-08 Thread Dan Williams
On Mon, Mar 6, 2017 at 12:22 AM, Heiko Carstens
 wrote:
> Hello Dan,
>
>> > If you look at commit 5e33bc4165f3 ("driver core / ACPI: Avoid device hot
>> > remove locking issues") then lock_device_hotplug_sysfs() was introduced to
>> > avoid a different subtle deadlock, but it also sleeps uninterruptible, but
>> > not for more than 5ms ;)
>> >
>> > However I'm not sure if the device hotplug lock should also be used to fix
>> > an unrelated bug that was introduced with the get_online_mems() /
>> > put_online_mems() interface. Should it?
>>
>> No, I don't think it should.
>>
>> I like your proposed direction of creating a new lock internal to
>> mem_hotplug_begin() to protect active_writer, and stop relying on
>> lock_device_hotplug to serve this purpose.
>>
>> > If so, we need to sprinkle around a couple of lock_device_hotplug() calls
>> > near mem_hotplug_begin() calls, like Sebastian already started, and give it
>> > additional semantics (protecting mem_hotplug.active_writer), and hope it
>> > doesn't lead to deadlocks anywhere.
>>
>> I'll put your proposed patch through some testing.
>
> On s390 it _seems_ to work. Did it pass your testing too?
> If so I would send a patch with proper patch description for inclusion.

Looks ok here. No lockdep warnings running it through it paces with
the persistent memory use case.


Re: Race condition in ext4 (was Re: 4.11-rc1 acpi stomping ext4 slabs)

2017-03-08 Thread Nikolay Borisov


On  9.03.2017 03:58, Theodore Ts'o wrote:
> On Tue, Mar 07, 2017 at 10:40:53PM +0200, Nikolay Borisov wrote:
>> So this is wrong, the reason why the issues seemed fix is because I
>> switched my compiler to version 5.4.0. So this manifests only if I'm
>> using gcc 4.7.4. With the pr_info added here is the output of a boot. So
>> there are multiple invocations of ext4_ext_map_blocks and the freeing,
>> including with the address being used in subsequent kasan reports :
>> 88006ae8fdb0
> 
> Can you help bisect this, then?  I'm using Debian Testing, and the
> default gcc is gcc 6.3.0.  I'm currently forcing the use of gcc 5.4.1
> because I was running into problems with gcc 6.x a while back.  (TBH,
> I was thinking about trying to see if gcc 6.3 was stable for kernel
> compiles when I had some spare time.)  But I don't have access to
> *any* gcc 4.x on my development system, and I don't think I've tried
> using gcc 4.x in a long, Long, LONG time.
> 
> I'm currently kicking off a test run using 5.4.1 with KASAN enabled to
> see if I can trigger it myself.  Can you send me a copy of your
> .config so I can see what else might be interesting with your config?
> (e.g., SLAB vs SLUB, etc.)

Attached the config. FUrther debugging and talking with the kasan
developers I think this actually might be a kasan problem when used with
an old compiler.  I bisected this all the way to 1771c6e1a567ea0ba2,
which is the commit introducing the user access instrumentation. Here is
a mail thread where I confirmed that this might be a kasan issue :
https://lkml.org/lkml/2017/3/8/69

What I believe is happening is that the manual checks inserted in user
access code misses some context information due to instrumentation not
inserted by the compiler. Kasan gets confused as a result, hence the
warnings.


> 
> Thanks,
> 
>  - Ted
> 
#
# Automatically generated file; DO NOT EDIT.
# Linux/x86 4.7.0 Kernel Configuration
#
CONFIG_64BIT=y
CONFIG_X86_64=y
CONFIG_X86=y
CONFIG_INSTRUCTION_DECODER=y
CONFIG_OUTPUT_FORMAT="elf64-x86-64"
CONFIG_ARCH_DEFCONFIG="arch/x86/configs/x86_64_defconfig"
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_MMU=y
CONFIG_ARCH_MMAP_RND_BITS_MIN=28
CONFIG_ARCH_MMAP_RND_BITS_MAX=32
CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MIN=8
CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MAX=16
CONFIG_NEED_DMA_MAP_STATE=y
CONFIG_NEED_SG_DMA_LENGTH=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_BUG_RELATIVE_POINTERS=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_ARCH_HAS_CPU_RELAX=y
CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y
CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
CONFIG_ARCH_WANT_HUGE_PMD_SHARE=y
CONFIG_ARCH_WANT_GENERAL_HUGETLB=y
CONFIG_ZONE_DMA32=y
CONFIG_AUDIT_ARCH=y
CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_X86_64_SMP=y
CONFIG_ARCH_HWEIGHT_CFLAGS="-fcall-saved-rdi -fcall-saved-rsi -fcall-saved-rdx 
-fcall-saved-rcx -fcall-saved-r8 -fcall-saved-r9 -fcall-saved-r10 
-fcall-saved-r11"
CONFIG_ARCH_SUPPORTS_UPROBES=y
CONFIG_FIX_EARLYCON_MEM=y
CONFIG_DEBUG_RODATA=y
CONFIG_PGTABLE_LEVELS=4
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_EXTABLE_SORT=y

#
# General setup
#
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_CROSS_COMPILE=""
# CONFIG_COMPILE_TEST is not set
CONFIG_LOCALVERSION="-nbor"
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_BZIP2=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_HAVE_KERNEL_XZ=y
CONFIG_HAVE_KERNEL_LZO=y
CONFIG_HAVE_KERNEL_LZ4=y
CONFIG_KERNEL_GZIP=y
# CONFIG_KERNEL_BZIP2 is not set
# CONFIG_KERNEL_LZMA is not set
# CONFIG_KERNEL_XZ is not set
# CONFIG_KERNEL_LZO is not set
# CONFIG_KERNEL_LZ4 is not set
CONFIG_DEFAULT_HOSTNAME="(none)"
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
# CONFIG_POSIX_MQUEUE is not set
CONFIG_CROSS_MEMORY_ATTACH=y
CONFIG_FHANDLE=y
# CONFIG_USELIB is not set
# CONFIG_AUDIT is not set
CONFIG_HAVE_ARCH_AUDITSYSCALL=y

#
# IRQ subsystem
#
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_GENERIC_PENDING_IRQ=y
CONFIG_IRQ_DOMAIN=y
CONFIG_IRQ_DOMAIN_HIERARCHY=y
CONFIG_GENERIC_MSI_IRQ=y
CONFIG_GENERIC_MSI_IRQ_DOMAIN=y
# CONFIG_IRQ_DOMAIN_DEBUG is not set
CONFIG_IRQ_FORCED_THREADING=y
CONFIG_SPARSE_IRQ=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_ARCH_CLOCKSOURCE_DATA=y
CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y
CONFIG_GENERIC_CMOS_UPDATE=y

#
# Timers subsystem
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
CONFIG_NO_HZ_IDLE=y
# CONFIG_NO_HZ_FULL is not set
# CONFIG_NO_HZ is not set
CONFIG_HIGH_RES_TIMERS=y

#
# CPU/Task time and stats accounting
#
CONFIG_TICK_CPU_ACCOUNTING=y
# 

Re: Race condition in ext4 (was Re: 4.11-rc1 acpi stomping ext4 slabs)

2017-03-08 Thread Nikolay Borisov


On  9.03.2017 03:58, Theodore Ts'o wrote:
> On Tue, Mar 07, 2017 at 10:40:53PM +0200, Nikolay Borisov wrote:
>> So this is wrong, the reason why the issues seemed fix is because I
>> switched my compiler to version 5.4.0. So this manifests only if I'm
>> using gcc 4.7.4. With the pr_info added here is the output of a boot. So
>> there are multiple invocations of ext4_ext_map_blocks and the freeing,
>> including with the address being used in subsequent kasan reports :
>> 88006ae8fdb0
> 
> Can you help bisect this, then?  I'm using Debian Testing, and the
> default gcc is gcc 6.3.0.  I'm currently forcing the use of gcc 5.4.1
> because I was running into problems with gcc 6.x a while back.  (TBH,
> I was thinking about trying to see if gcc 6.3 was stable for kernel
> compiles when I had some spare time.)  But I don't have access to
> *any* gcc 4.x on my development system, and I don't think I've tried
> using gcc 4.x in a long, Long, LONG time.
> 
> I'm currently kicking off a test run using 5.4.1 with KASAN enabled to
> see if I can trigger it myself.  Can you send me a copy of your
> .config so I can see what else might be interesting with your config?
> (e.g., SLAB vs SLUB, etc.)

Attached the config. FUrther debugging and talking with the kasan
developers I think this actually might be a kasan problem when used with
an old compiler.  I bisected this all the way to 1771c6e1a567ea0ba2,
which is the commit introducing the user access instrumentation. Here is
a mail thread where I confirmed that this might be a kasan issue :
https://lkml.org/lkml/2017/3/8/69

What I believe is happening is that the manual checks inserted in user
access code misses some context information due to instrumentation not
inserted by the compiler. Kasan gets confused as a result, hence the
warnings.


> 
> Thanks,
> 
>  - Ted
> 
#
# Automatically generated file; DO NOT EDIT.
# Linux/x86 4.7.0 Kernel Configuration
#
CONFIG_64BIT=y
CONFIG_X86_64=y
CONFIG_X86=y
CONFIG_INSTRUCTION_DECODER=y
CONFIG_OUTPUT_FORMAT="elf64-x86-64"
CONFIG_ARCH_DEFCONFIG="arch/x86/configs/x86_64_defconfig"
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_MMU=y
CONFIG_ARCH_MMAP_RND_BITS_MIN=28
CONFIG_ARCH_MMAP_RND_BITS_MAX=32
CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MIN=8
CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MAX=16
CONFIG_NEED_DMA_MAP_STATE=y
CONFIG_NEED_SG_DMA_LENGTH=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_BUG_RELATIVE_POINTERS=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_ARCH_HAS_CPU_RELAX=y
CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y
CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
CONFIG_ARCH_WANT_HUGE_PMD_SHARE=y
CONFIG_ARCH_WANT_GENERAL_HUGETLB=y
CONFIG_ZONE_DMA32=y
CONFIG_AUDIT_ARCH=y
CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_X86_64_SMP=y
CONFIG_ARCH_HWEIGHT_CFLAGS="-fcall-saved-rdi -fcall-saved-rsi -fcall-saved-rdx 
-fcall-saved-rcx -fcall-saved-r8 -fcall-saved-r9 -fcall-saved-r10 
-fcall-saved-r11"
CONFIG_ARCH_SUPPORTS_UPROBES=y
CONFIG_FIX_EARLYCON_MEM=y
CONFIG_DEBUG_RODATA=y
CONFIG_PGTABLE_LEVELS=4
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_EXTABLE_SORT=y

#
# General setup
#
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_CROSS_COMPILE=""
# CONFIG_COMPILE_TEST is not set
CONFIG_LOCALVERSION="-nbor"
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_BZIP2=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_HAVE_KERNEL_XZ=y
CONFIG_HAVE_KERNEL_LZO=y
CONFIG_HAVE_KERNEL_LZ4=y
CONFIG_KERNEL_GZIP=y
# CONFIG_KERNEL_BZIP2 is not set
# CONFIG_KERNEL_LZMA is not set
# CONFIG_KERNEL_XZ is not set
# CONFIG_KERNEL_LZO is not set
# CONFIG_KERNEL_LZ4 is not set
CONFIG_DEFAULT_HOSTNAME="(none)"
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
# CONFIG_POSIX_MQUEUE is not set
CONFIG_CROSS_MEMORY_ATTACH=y
CONFIG_FHANDLE=y
# CONFIG_USELIB is not set
# CONFIG_AUDIT is not set
CONFIG_HAVE_ARCH_AUDITSYSCALL=y

#
# IRQ subsystem
#
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_GENERIC_PENDING_IRQ=y
CONFIG_IRQ_DOMAIN=y
CONFIG_IRQ_DOMAIN_HIERARCHY=y
CONFIG_GENERIC_MSI_IRQ=y
CONFIG_GENERIC_MSI_IRQ_DOMAIN=y
# CONFIG_IRQ_DOMAIN_DEBUG is not set
CONFIG_IRQ_FORCED_THREADING=y
CONFIG_SPARSE_IRQ=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_ARCH_CLOCKSOURCE_DATA=y
CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y
CONFIG_GENERIC_CMOS_UPDATE=y

#
# Timers subsystem
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
CONFIG_NO_HZ_IDLE=y
# CONFIG_NO_HZ_FULL is not set
# CONFIG_NO_HZ is not set
CONFIG_HIGH_RES_TIMERS=y

#
# CPU/Task time and stats accounting
#
CONFIG_TICK_CPU_ACCOUNTING=y
# 

[PATCH 1/6] mm/migrate: Add new mode parameter to migrate_page_copy() function

2017-03-08 Thread Anshuman Khandual
From: Zi Yan 

This is a prerequisite change required to make page migration framewok
copy in different modes like the default single threaded or the new
multi threaded one yet to be introduced in follow up patches. This
does not change any existing functionality. Only migrate_page_copy()
and copy_huge_page() function's signatures are affected.

Signed-off-by: Zi Yan 
Signed-off-by: Anshuman Khandual 
---
* Updated include/linux/migrate_mode.h comment as per Naoya

 fs/aio.c |  2 +-
 fs/f2fs/data.c   |  2 +-
 fs/hugetlbfs/inode.c |  2 +-
 fs/ubifs/file.c  |  2 +-
 include/linux/migrate.h  |  6 --
 include/linux/migrate_mode.h |  2 ++
 mm/migrate.c | 14 --
 7 files changed, 18 insertions(+), 12 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 873b4ca..ba3f6eb 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -418,7 +418,7 @@ static int aio_migratepage(struct address_space *mapping, 
struct page *new,
 * events from being lost.
 */
spin_lock_irqsave(>completion_lock, flags);
-   migrate_page_copy(new, old);
+   migrate_page_copy(new, old, MIGRATE_ST);
BUG_ON(ctx->ring_pages[idx] != old);
ctx->ring_pages[idx] = new;
spin_unlock_irqrestore(>completion_lock, flags);
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 9ac2625..ad41356 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -1997,7 +1997,7 @@ int f2fs_migrate_page(struct address_space *mapping,
SetPagePrivate(newpage);
set_page_private(newpage, page_private(page));
 
-   migrate_page_copy(newpage, page);
+   migrate_page_copy(newpage, page, MIGRATE_ST);
 
return MIGRATEPAGE_SUCCESS;
 }
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 54de77e..0e16512f 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -850,7 +850,7 @@ static int hugetlbfs_migrate_page(struct address_space 
*mapping,
rc = migrate_huge_page_move_mapping(mapping, newpage, page);
if (rc != MIGRATEPAGE_SUCCESS)
return rc;
-   migrate_page_copy(newpage, page);
+   migrate_page_copy(newpage, page, MIGRATE_ST);
 
return MIGRATEPAGE_SUCCESS;
 }
diff --git a/fs/ubifs/file.c b/fs/ubifs/file.c
index b0d7837..293616f 100644
--- a/fs/ubifs/file.c
+++ b/fs/ubifs/file.c
@@ -1482,7 +1482,7 @@ static int ubifs_migrate_page(struct address_space 
*mapping,
SetPagePrivate(newpage);
}
 
-   migrate_page_copy(newpage, page);
+   migrate_page_copy(newpage, page, MIGRATE_ST);
return MIGRATEPAGE_SUCCESS;
 }
 #endif
diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index ae8d475..d843b8f 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -42,7 +42,8 @@ extern void putback_movable_page(struct page *page);
 
 extern int migrate_prep(void);
 extern int migrate_prep_local(void);
-extern void migrate_page_copy(struct page *newpage, struct page *page);
+extern void migrate_page_copy(struct page *newpage, struct page *page,
+   enum migrate_mode mode);
 extern int migrate_huge_page_move_mapping(struct address_space *mapping,
  struct page *newpage, struct page *page);
 extern int migrate_page_move_mapping(struct address_space *mapping,
@@ -61,7 +62,8 @@ static inline int migrate_prep(void) { return -ENOSYS; }
 static inline int migrate_prep_local(void) { return -ENOSYS; }
 
 static inline void migrate_page_copy(struct page *newpage,
-struct page *page) {}
+struct page *page,
+enum migrate_mode mode) {}
 
 static inline int migrate_huge_page_move_mapping(struct address_space *mapping,
  struct page *newpage, struct page *page)
diff --git a/include/linux/migrate_mode.h b/include/linux/migrate_mode.h
index ebf3d89..deaeba5 100644
--- a/include/linux/migrate_mode.h
+++ b/include/linux/migrate_mode.h
@@ -6,11 +6,13 @@
  * on most operations but not ->writepage as the potential stall time
  * is too significant
  * MIGRATE_SYNC will block when migrating pages
+ * MIGRATE_ST will use single thread when migrating pages
  */
 enum migrate_mode {
MIGRATE_ASYNC,
MIGRATE_SYNC_LIGHT,
MIGRATE_SYNC,
+   MIGRATE_ST
 };
 
 #endif /* MIGRATE_MODE_H_INCLUDED */
diff --git a/mm/migrate.c b/mm/migrate.c
index e8ae11a..5ef4aa4 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -631,7 +631,8 @@ static void __copy_gigantic_page(struct page *dst, struct 
page *src,
}
 }
 
-static void copy_huge_page(struct page *dst, struct page *src)
+static void copy_huge_page(struct page *dst, struct page *src,
+   enum migrate_mode mode)
 {
int i;
int nr_pages;
@@ -660,12 +661,13 @@ 

[PATCH 1/6] mm/migrate: Add new mode parameter to migrate_page_copy() function

2017-03-08 Thread Anshuman Khandual
From: Zi Yan 

This is a prerequisite change required to make page migration framewok
copy in different modes like the default single threaded or the new
multi threaded one yet to be introduced in follow up patches. This
does not change any existing functionality. Only migrate_page_copy()
and copy_huge_page() function's signatures are affected.

Signed-off-by: Zi Yan 
Signed-off-by: Anshuman Khandual 
---
* Updated include/linux/migrate_mode.h comment as per Naoya

 fs/aio.c |  2 +-
 fs/f2fs/data.c   |  2 +-
 fs/hugetlbfs/inode.c |  2 +-
 fs/ubifs/file.c  |  2 +-
 include/linux/migrate.h  |  6 --
 include/linux/migrate_mode.h |  2 ++
 mm/migrate.c | 14 --
 7 files changed, 18 insertions(+), 12 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 873b4ca..ba3f6eb 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -418,7 +418,7 @@ static int aio_migratepage(struct address_space *mapping, 
struct page *new,
 * events from being lost.
 */
spin_lock_irqsave(>completion_lock, flags);
-   migrate_page_copy(new, old);
+   migrate_page_copy(new, old, MIGRATE_ST);
BUG_ON(ctx->ring_pages[idx] != old);
ctx->ring_pages[idx] = new;
spin_unlock_irqrestore(>completion_lock, flags);
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 9ac2625..ad41356 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -1997,7 +1997,7 @@ int f2fs_migrate_page(struct address_space *mapping,
SetPagePrivate(newpage);
set_page_private(newpage, page_private(page));
 
-   migrate_page_copy(newpage, page);
+   migrate_page_copy(newpage, page, MIGRATE_ST);
 
return MIGRATEPAGE_SUCCESS;
 }
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 54de77e..0e16512f 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -850,7 +850,7 @@ static int hugetlbfs_migrate_page(struct address_space 
*mapping,
rc = migrate_huge_page_move_mapping(mapping, newpage, page);
if (rc != MIGRATEPAGE_SUCCESS)
return rc;
-   migrate_page_copy(newpage, page);
+   migrate_page_copy(newpage, page, MIGRATE_ST);
 
return MIGRATEPAGE_SUCCESS;
 }
diff --git a/fs/ubifs/file.c b/fs/ubifs/file.c
index b0d7837..293616f 100644
--- a/fs/ubifs/file.c
+++ b/fs/ubifs/file.c
@@ -1482,7 +1482,7 @@ static int ubifs_migrate_page(struct address_space 
*mapping,
SetPagePrivate(newpage);
}
 
-   migrate_page_copy(newpage, page);
+   migrate_page_copy(newpage, page, MIGRATE_ST);
return MIGRATEPAGE_SUCCESS;
 }
 #endif
diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index ae8d475..d843b8f 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -42,7 +42,8 @@ extern void putback_movable_page(struct page *page);
 
 extern int migrate_prep(void);
 extern int migrate_prep_local(void);
-extern void migrate_page_copy(struct page *newpage, struct page *page);
+extern void migrate_page_copy(struct page *newpage, struct page *page,
+   enum migrate_mode mode);
 extern int migrate_huge_page_move_mapping(struct address_space *mapping,
  struct page *newpage, struct page *page);
 extern int migrate_page_move_mapping(struct address_space *mapping,
@@ -61,7 +62,8 @@ static inline int migrate_prep(void) { return -ENOSYS; }
 static inline int migrate_prep_local(void) { return -ENOSYS; }
 
 static inline void migrate_page_copy(struct page *newpage,
-struct page *page) {}
+struct page *page,
+enum migrate_mode mode) {}
 
 static inline int migrate_huge_page_move_mapping(struct address_space *mapping,
  struct page *newpage, struct page *page)
diff --git a/include/linux/migrate_mode.h b/include/linux/migrate_mode.h
index ebf3d89..deaeba5 100644
--- a/include/linux/migrate_mode.h
+++ b/include/linux/migrate_mode.h
@@ -6,11 +6,13 @@
  * on most operations but not ->writepage as the potential stall time
  * is too significant
  * MIGRATE_SYNC will block when migrating pages
+ * MIGRATE_ST will use single thread when migrating pages
  */
 enum migrate_mode {
MIGRATE_ASYNC,
MIGRATE_SYNC_LIGHT,
MIGRATE_SYNC,
+   MIGRATE_ST
 };
 
 #endif /* MIGRATE_MODE_H_INCLUDED */
diff --git a/mm/migrate.c b/mm/migrate.c
index e8ae11a..5ef4aa4 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -631,7 +631,8 @@ static void __copy_gigantic_page(struct page *dst, struct 
page *src,
}
 }
 
-static void copy_huge_page(struct page *dst, struct page *src)
+static void copy_huge_page(struct page *dst, struct page *src,
+   enum migrate_mode mode)
 {
int i;
int nr_pages;
@@ -660,12 +661,13 @@ static void copy_huge_page(struct page *dst, struct page 
*src)
 /*
  * 

[GIT PULL] xen features and fixes for 4.11 rc1

2017-03-08 Thread Juergen Gross
Linus,

Please git pull the following tag:

 git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip.git 
for-linus-4.11-rc1-tag

features and fixes for 4.11 rc1

It contains one fix for MSIX handling under Xen and a trivial cleanup
patch.

Thanks.

Juergen

 arch/x86/pci/xen.c   | 23 +++
 drivers/xen/xenbus/xenbus_dev_frontend.c |  1 -
 2 files changed, 7 insertions(+), 17 deletions(-)

Dan Streetman (1):
  xen: do not re-use pirq number cached in pci device msi msg data

Masanari Iida (1):
  xenbus: Remove duplicate inclusion of linux/init.h


[GIT PULL] xen features and fixes for 4.11 rc1

2017-03-08 Thread Juergen Gross
Linus,

Please git pull the following tag:

 git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip.git 
for-linus-4.11-rc1-tag

features and fixes for 4.11 rc1

It contains one fix for MSIX handling under Xen and a trivial cleanup
patch.

Thanks.

Juergen

 arch/x86/pci/xen.c   | 23 +++
 drivers/xen/xenbus/xenbus_dev_frontend.c |  1 -
 2 files changed, 7 insertions(+), 17 deletions(-)

Dan Streetman (1):
  xen: do not re-use pirq number cached in pci device msi msg data

Masanari Iida (1):
  xenbus: Remove duplicate inclusion of linux/init.h


Re: [PATCH 3/4] phy: rockchip-typec: support DP phy switch

2017-03-08 Thread Brian Norris
Hi,

On Thu, Mar 09, 2017 at 02:02:54AM +0100, Heiko Stuebner wrote:
> Am Mittwoch, 8. März 2017, 16:39:23 CET schrieb Brian Norris:
> > On Fri, Feb 10, 2017 at 03:44:13PM +0800, Chris Zhong wrote:
> > > There are 2 Type-c PHYs in RK3399, but only one DP controller. Hence
> > > only one PHY can connect to DP controller at one time, the other should
> > > be disconnected. The GRF_SOC_CON26 register has a switch bit to do it,
> > > set this bit means enable PHY 1, clear this bit means enable PHY 0.
> > > 
> > > Signed-off-by: Chris Zhong 
> > > ---
> > > 
> > >  drivers/phy/phy-rockchip-typec.c | 9 +
> > >  1 file changed, 9 insertions(+)
> > > 
> > > diff --git a/drivers/phy/phy-rockchip-typec.c
> > > b/drivers/phy/phy-rockchip-typec.c index 7cfb0f8..1604aaa 100644
> > > --- a/drivers/phy/phy-rockchip-typec.c
> > > +++ b/drivers/phy/phy-rockchip-typec.c

...

> > > @@ -869,6 +873,11 @@ static int tcphy_parse_dt(struct rockchip_typec_phy
> > > *tcphy,> 
> > >   if (ret)
> > >   
> > >   return ret;
> > > 
> > > + ret = tcphy_get_param(dev, >uphy_dp_sel,
> > > +   "rockchip,uphy-dp-sel");
> > > + if (ret)
> > > + return ret;
> > 
> > What about existing device trees? You're essentially adding this
> > new property and requiring it at the same time.
> > 
> > Or are we considering no RK3399 DP stable at the moment? I guess we
> > haven't actually merged any device trees that support this yet, no?
> 
> An interesting situation we're in here. On the one hand, you're right this 
> breaks "backwards compatiblity".
> 
> But on the other hand, the type-c phy is currently very much unused. The only 
> current board rk3399-evb.dts does not enable them (so they're disabled 
> everywhere) and we have neither dwc3 nor dp nodes in any rk3399 devicetrees 
> so 
> far. Also Rob was ok with the binding change :-) .
> 
> So from my pov, I'd say it _should_ be ok, as nothing is using the phys at 
> all 
> yet and thus there is nothing that could get broken.

Yeah, I guess it's OK... but BTW out-of-tree DTs are perfectly
legit, once the bindings are accepted.

Another random point of contention (not worth too much, as the pattern
is already set), but why do these deserve DT properties at all? The
device already has a "rk3399" compatible property, so can't we derive
GRF offsets from that?

Brian


Re: [PATCH 3/4] phy: rockchip-typec: support DP phy switch

2017-03-08 Thread Brian Norris
Hi,

On Thu, Mar 09, 2017 at 02:02:54AM +0100, Heiko Stuebner wrote:
> Am Mittwoch, 8. März 2017, 16:39:23 CET schrieb Brian Norris:
> > On Fri, Feb 10, 2017 at 03:44:13PM +0800, Chris Zhong wrote:
> > > There are 2 Type-c PHYs in RK3399, but only one DP controller. Hence
> > > only one PHY can connect to DP controller at one time, the other should
> > > be disconnected. The GRF_SOC_CON26 register has a switch bit to do it,
> > > set this bit means enable PHY 1, clear this bit means enable PHY 0.
> > > 
> > > Signed-off-by: Chris Zhong 
> > > ---
> > > 
> > >  drivers/phy/phy-rockchip-typec.c | 9 +
> > >  1 file changed, 9 insertions(+)
> > > 
> > > diff --git a/drivers/phy/phy-rockchip-typec.c
> > > b/drivers/phy/phy-rockchip-typec.c index 7cfb0f8..1604aaa 100644
> > > --- a/drivers/phy/phy-rockchip-typec.c
> > > +++ b/drivers/phy/phy-rockchip-typec.c

...

> > > @@ -869,6 +873,11 @@ static int tcphy_parse_dt(struct rockchip_typec_phy
> > > *tcphy,> 
> > >   if (ret)
> > >   
> > >   return ret;
> > > 
> > > + ret = tcphy_get_param(dev, >uphy_dp_sel,
> > > +   "rockchip,uphy-dp-sel");
> > > + if (ret)
> > > + return ret;
> > 
> > What about existing device trees? You're essentially adding this
> > new property and requiring it at the same time.
> > 
> > Or are we considering no RK3399 DP stable at the moment? I guess we
> > haven't actually merged any device trees that support this yet, no?
> 
> An interesting situation we're in here. On the one hand, you're right this 
> breaks "backwards compatiblity".
> 
> But on the other hand, the type-c phy is currently very much unused. The only 
> current board rk3399-evb.dts does not enable them (so they're disabled 
> everywhere) and we have neither dwc3 nor dp nodes in any rk3399 devicetrees 
> so 
> far. Also Rob was ok with the binding change :-) .
> 
> So from my pov, I'd say it _should_ be ok, as nothing is using the phys at 
> all 
> yet and thus there is nothing that could get broken.

Yeah, I guess it's OK... but BTW out-of-tree DTs are perfectly
legit, once the bindings are accepted.

Another random point of contention (not worth too much, as the pattern
is already set), but why do these deserve DT properties at all? The
device already has a "rk3399" compatible property, so can't we derive
GRF offsets from that?

Brian


Re: [PATCH net] team: use ETH_MAX_MTU as max mtu

2017-03-08 Thread David Miller
From: Jarod Wilson 
Date: Mon,  6 Mar 2017 08:48:58 -0500

> This restores the ability to set a team device's mtu to anything higher
> than 1500. Similar to the reported issue with bonding, the team driver
> calls ether_setup(), which sets an initial max_mtu of 1500, while the
> underlying hardware can handle something much larger. Just set it to
> ETH_MAX_MTU to support all possible values, and the limitations of the
> underlying devices will prevent setting anything too large.
> 
> Fixes: 91572088e3fd ("net: use core MTU range checking in core net infra")
> CC: Cong Wang 
> CC: Jiri Pirko 
> CC: net...@vger.kernel.org
> Signed-off-by: Jarod Wilson 

Applied and queued up for -stable, thanks.


Re: [PATCH net] team: use ETH_MAX_MTU as max mtu

2017-03-08 Thread David Miller
From: Jarod Wilson 
Date: Mon,  6 Mar 2017 08:48:58 -0500

> This restores the ability to set a team device's mtu to anything higher
> than 1500. Similar to the reported issue with bonding, the team driver
> calls ether_setup(), which sets an initial max_mtu of 1500, while the
> underlying hardware can handle something much larger. Just set it to
> ETH_MAX_MTU to support all possible values, and the limitations of the
> underlying devices will prevent setting anything too large.
> 
> Fixes: 91572088e3fd ("net: use core MTU range checking in core net infra")
> CC: Cong Wang 
> CC: Jiri Pirko 
> CC: net...@vger.kernel.org
> Signed-off-by: Jarod Wilson 

Applied and queued up for -stable, thanks.


Re: [PATCH] net: toshiba: spider_net: use new api ethtool_{get|set}_link_ksettings

2017-03-08 Thread David Miller
From: Philippe Reynes 
Date: Sun,  5 Mar 2017 23:46:00 +0100

> The ethtool api {get|set}_settings is deprecated.
> We move this driver to new api {get|set}_link_ksettings.
> 
> As I don't have the hardware, I'd be very pleased if
> someone may test this patch.
> 
> Signed-off-by: Philippe Reynes 

Applied.


Re: [PATCH] net: toshiba: spider_net: use new api ethtool_{get|set}_link_ksettings

2017-03-08 Thread David Miller
From: Philippe Reynes 
Date: Sun,  5 Mar 2017 23:46:00 +0100

> The ethtool api {get|set}_settings is deprecated.
> We move this driver to new api {get|set}_link_ksettings.
> 
> As I don't have the hardware, I'd be very pleased if
> someone may test this patch.
> 
> Signed-off-by: Philippe Reynes 

Applied.


Re: [PATCH] net: toshiba: ps3_genic_net: use new api ethtool_{get|set}_link_ksettings

2017-03-08 Thread David Miller
From: Philippe Reynes 
Date: Sun,  5 Mar 2017 23:21:06 +0100

> The ethtool api {get|set}_settings is deprecated.
> We move this driver to new api {get|set}_link_ksettings.
> 
> As I don't have the hardware, I'd be very pleased if
> someone may test this patch.
> 
> Signed-off-by: Philippe Reynes 

Applied.


Re: [PATCH] net: toshiba: ps3_genic_net: use new api ethtool_{get|set}_link_ksettings

2017-03-08 Thread David Miller
From: Philippe Reynes 
Date: Sun,  5 Mar 2017 23:21:06 +0100

> The ethtool api {get|set}_settings is deprecated.
> We move this driver to new api {get|set}_link_ksettings.
> 
> As I don't have the hardware, I'd be very pleased if
> someone may test this patch.
> 
> Signed-off-by: Philippe Reynes 

Applied.


Re: blk: improve order of bio handling in generic_make_request()

2017-03-08 Thread NeilBrown
On Wed, Mar 08 2017, Mikulas Patocka wrote:

> On Wed, 8 Mar 2017, NeilBrown wrote:
>> 
>> I don't think this will fix the DM snapshot deadlock by itself.
>> Rather, it make it possible for some internal changes to DM to fix it.
>> The DM change might be something vaguely like:
>> 
>> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
>> index 3086da5664f3..06ee0960e415 100644
>> --- a/drivers/md/dm.c
>> +++ b/drivers/md/dm.c
>> @@ -1216,6 +1216,14 @@ static int __split_and_process_non_flush(struct 
>> clone_info *ci)
>> 
>>  len = min_t(sector_t, max_io_len(ci->sector, ti), ci->sector_count);
>> 
>> +if (len < ci->sector_count) {
>> +struct bio *split = bio_split(bio, len, GFP_NOIO, fs_bio_set);
>
> fs_bio_set is a shared bio set, so it is prone to deadlocks. For this 
> change, we would need two bio sets per dm device, one for the split bio 
> and one for the outgoing bio. (this also means having one more kernel 
> thread per dm device)

Yes, two local bio_sets would be best.
But we don't really need those extra kernel threads.  I'll start working
on patches to make them optional, and then to start removing them.

Thanks,
NeilBrown


signature.asc
Description: PGP signature


Re: blk: improve order of bio handling in generic_make_request()

2017-03-08 Thread NeilBrown
On Wed, Mar 08 2017, Mikulas Patocka wrote:

> On Wed, 8 Mar 2017, NeilBrown wrote:
>> 
>> I don't think this will fix the DM snapshot deadlock by itself.
>> Rather, it make it possible for some internal changes to DM to fix it.
>> The DM change might be something vaguely like:
>> 
>> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
>> index 3086da5664f3..06ee0960e415 100644
>> --- a/drivers/md/dm.c
>> +++ b/drivers/md/dm.c
>> @@ -1216,6 +1216,14 @@ static int __split_and_process_non_flush(struct 
>> clone_info *ci)
>> 
>>  len = min_t(sector_t, max_io_len(ci->sector, ti), ci->sector_count);
>> 
>> +if (len < ci->sector_count) {
>> +struct bio *split = bio_split(bio, len, GFP_NOIO, fs_bio_set);
>
> fs_bio_set is a shared bio set, so it is prone to deadlocks. For this 
> change, we would need two bio sets per dm device, one for the split bio 
> and one for the outgoing bio. (this also means having one more kernel 
> thread per dm device)

Yes, two local bio_sets would be best.
But we don't really need those extra kernel threads.  I'll start working
on patches to make them optional, and then to start removing them.

Thanks,
NeilBrown


signature.asc
Description: PGP signature


  1   2   3   4   5   6   7   8   9   10   >