[PATCH v5 1/5] ARM: dma-mapping: Optimize allocation

public inbox for linux-arm-kernel@lists.infradead.org 

From: robin.murphy@arm•com (Robin Murphy)
To: linux-arm-kernel@lists•infradead.org
Subject: [PATCH v5 1/5] ARM: dma-mapping: Optimize allocation
Date: Wed, 13 Jan 2016 17:44:18 +0000
Message-ID: <56968CF2.1020409@arm.com>
In-Reply-To: <CAAFQd5C8asfo8wSa=jKvp4Vmg6A83R-vG7vQXtsHyOTADLo+9g@mail.gmail.com>

On 13/01/16 17:33, Tomasz Figa wrote:
> On Wed, Jan 13, 2016 at 9:17 PM, Robin Murphy <robin.murphy@arm•com> wrote:
>> Hi Doug,
>>
>>
>> On 08/01/16 23:05, Douglas Anderson wrote:
>>>
>>> The __iommu_alloc_buffer() is expected to be called to allocate pretty
>>> sizeable buffers.  Upon simple tests of video I saw it trying to
>>> allocate 4,194,304 bytes.  The function tries to allocate large chunks
>>> in order to optimize IOMMU TLB usage.
>>>
>>> The current function is very, very slow.
>>>
>>> One problem is the way it keeps trying and trying to allocate big
>>> chunks.  Imagine a very fragmented memory that has 4M free but no
>>> contiguous pages at all.  Further imagine allocating 4M (1024 pages).
>>> We'll do the following memory allocations:
>>> - For page 1:
>>>     - Try to allocate order 10 (no retry)
>>>     - Try to allocate order 9 (no retry)
>>>     - ...
>>>     - Try to allocate order 0 (with retry, but not needed)
>>> - For page 2:
>>>     - Try to allocate order 9 (no retry)
>>>     - Try to allocate order 8 (no retry)
>>>     - ...
>>>     - Try to allocate order 0 (with retry, but not needed)
>>> - ...
>>> - ...
>>>
>>> Total number of alloc() calls for this case is:
>>>     sum(int(math.log(i, 2)) + 1 for i in range(1, 1025))
>>>     => 9228
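
The count quoted above can be reproduced by modelling the loop as described: for each remaining page, every order from the highest one that still fits down to order 0 is attempted, and only order 0 succeeds. This is a minimal sketch of that description, not the kernel code; the function name `count_alloc_calls` is made up for illustration.

```python
def count_alloc_calls(total_pages):
    """Count allocation attempts when only order-0 allocations succeed."""
    calls = 0
    remaining = total_pages
    while remaining > 0:
        # Highest order whose chunk still fits in what is left to allocate.
        order = remaining.bit_length() - 1
        # Orders `order` down to 0 are all tried; only order 0 succeeds,
        # so exactly one page is obtained per pass.
        calls += order + 1
        remaining -= 1
    return calls


print(count_alloc_calls(1024))  # 9228
```

Using `int.bit_length()` instead of `math.log(i, 2)` keeps the arithmetic in integers, so the result does not depend on floating-point rounding.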
>>>
>>> The above is obviously worst case, but given how slow alloc can be we
>>> really want to try to avoid even somewhat bad cases.  I timed the old
>>> code with a device under memory pressure and it wasn't hard to see it
>>> take more than 120 seconds to allocate 4 megs of memory! (NOTE: testing
>>> was done on kernel 3.14, so possibly mainline would behave
>>> differently).
>>>
>>> A second problem is that allocating big chunks under memory pressure
>>> is just not a great idea unless we really need them.  We can make do
>>> pretty well with smaller chunks so it's probably wise to leave bigger
>>> chunks for other users once memory pressure is on.
>>>
>>> Let's adjust the allocation like this:
>>>
>>> 1. If a big chunk fails, stop trying too hard and bump down to lower
>>>      order allocations.
>>> 2. Don't try useless orders.  The whole point of big chunks is to
>>>      optimize the TLB and it can really only make use of 2M, 1M, 64K and
>>>      4K sizes.
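
The two adjustments above can be sketched as a small simulation. This is an illustration under stated assumptions, not the patch itself: it assumes 4K pages (so 2M, 1M, 64K and 4K correspond to orders 9, 8, 4 and 0), and `alloc_buffer` and the `try_alloc` callback are hypothetical names standing in for the real allocator.

```python
ORDERS = [9, 8, 4, 0]  # 2M, 1M, 64K, 4K chunks, assuming 4K pages


def alloc_buffer(total_pages, try_alloc):
    """Return (orders of the chunks obtained, number of allocation calls).

    try_alloc(order) is a hypothetical callback that returns True when an
    allocation of 2**order pages succeeds.
    """
    chunks = []
    calls = 0
    idx = 0
    remaining = total_pages
    while remaining > 0:
        order = ORDERS[idx]
        # Drop down once the remaining size is smaller than this chunk.
        if (1 << order) > remaining:
            idx += 1
            continue
        calls += 1
        if try_alloc(order):
            chunks.append(order)
            remaining -= 1 << order
        elif order == 0:
            raise MemoryError("order-0 allocation failed")
        else:
            # First failure at this order: bump down and never retry it.
            idx += 1
    return chunks, calls


# Worst case from the description: only single pages are available.
chunks, calls = alloc_buffer(1024, lambda order: order == 0)
print(len(chunks), calls)  # 1024 1027
```

In this model the fully fragmented 4M case costs 1027 calls (three failed high-order attempts plus 1024 single-page allocations) rather than 9228, while an unfragmented system still gets two order-9 chunks in two calls.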
>>>
>>> We'll still tend to eat up a bunch of big chunks, but that might be the
>>> right answer for some users.  A future patch could possibly add a new
>>> DMA_ATTR that would let the caller decide that TLB optimization isn't
>>> important and that we should use smaller chunks.  Presumably this would
>>> be a sane strategy for some callers.
>>
>>
>> Now that I've had time to think about it properly:
>>
>> Reviewed-by: Robin Murphy <robin.murphy@arm•com>
>>
>> I just had an absolutely disgusting idea of how to get the same progression
>> with just a single variable and no static array, but I'll keep that firmly
>> to myself as it's almost IOCCC-grade WTF :D
>
> Just out of curiosity, a bitmap and loop with fls() and clearing bit
> on failure or something more freaky? :)

Got a Python interpreter handy?

order = 9
for i in range(4):
    print(order)
    order = (order - 1) & 0xc

Like I said, disgusting :D
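
For anyone without an interpreter handy, here is a slightly expanded sketch of the same trick. The mask 0xc keeps only bits 2 and 3, so decrementing and masking steps 9 -> 8 -> 4 -> 0. The helper name `order_progression` and the 4K `PAGE_SIZE` are assumptions added for illustration.

```python
def order_progression(start=9, mask=0xC, steps=4):
    """Generate the order sequence using a single variable and a mask."""
    orders = []
    order = start
    for _ in range(steps):
        orders.append(order)
        # Decrement, then keep only bits 2 and 3 (values 4 and 8).
        order = (order - 1) & mask
    return orders


PAGE_SIZE = 4096  # assumption: 4K pages

for order in order_progression():
    print(order, (PAGE_SIZE << order) // 1024, "KiB")
# 9 2048 KiB
# 8 1024 KiB
# 4 64 KiB
# 0 4 KiB
```

The sequence matches the 2M, 1M, 64K and 4K sizes named in the patch description, which is what a static order array would otherwise spell out explicitly.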

Robin.

>
> Anyway:
>
> Reviewed-by: Tomasz Figa <tfiga@chromium•org>
>
> Best regards,
> Tomasz
>


Thread overview: 7+ messages
2016-01-08 23:05 [PATCH v5 0/5] dma-mapping: Patches for speeding up allocation Douglas Anderson
2016-01-08 23:05 ` [PATCH v5 1/5] ARM: dma-mapping: Optimize allocation Douglas Anderson
2016-01-13 12:17   ` Robin Murphy
2016-01-13 17:33     ` Tomasz Figa
2016-01-13 17:44       ` Robin Murphy [this message]
2016-01-08 23:05 ` [PATCH v5 3/5] ARM: dma-mapping: Use DMA_ATTR_NO_HUGE_PAGE hint to optimize allocation Douglas Anderson
2016-01-08 23:05 ` [PATCH v5 5/5] [media] s5p-mfc: Set DMA_ATTR_NO_HUGE_PAGE Douglas Anderson
