public inbox for netdev@vger.kernel.org 
 help / color / mirror / Atom feed
From: Al Viro <viro@ZenIV•linux.org.uk>
To: David Miller <davem@davemloft•net>
Cc: herbert@gondor•apana.org.au, netdev@vger•kernel.org,
	linux-kernel@vger•kernel.org, bcrl@kvack•org
Subject: Re: [0/3] net: Kill skb_copy_datagram_const_iovec
Date: Tue, 4 Nov 2014 05:45:13 +0000	[thread overview]
Message-ID: <20141104054513.GB7996@ZenIV.linux.org.uk> (raw)
In-Reply-To: <20141103.150553.633529234958324183.davem@davemloft.net>

On Mon, Nov 03, 2014 at 03:05:53PM -0500, David Miller wrote:

> I'll see if I can make some progress converting the networking over
> to iov_iter.  It can't be that difficult... albeit perhaps a little
> time consuming.

FWIW, I have a queue that got started back in April; basically, the plan
of attack was
	* separate kernel-side and userland msghdr.
	* localize ->msg_iov uses - most of that gets taken care of by
several new helpers, as in
    new helper: skb_copy_datagram_msg()
    
    Absolute majority of skb_copy_datagram_iovec() callers (49 out of 56)
    are passing it msg->msg_iov as iovec.  Provide a trivial wrapper that
    takes msg as argument instead of iovec.
and several like that (the numbers in the above are probably incorrect these
days - it was done more than half a year ago).
	* switch kernel-side msghdr to iov_iter.  That means diverging
layouts; it's really not hard, since we have copying of msghdr from
userland already localized.  Initially - just a mechanical conversion
(i.e. direct uses of iov_iter fields instead of ->msg_iov/->msg_iovlen;
note that after the introduction of wrappers the number of such places
is very much reduced).
	* start converting those relatively few places to iov_iter primitives.

And that's where it got stalled, since we have to deal with expectations
of callers.  Syscall ones are trivial; that's not a problem.  Unfortunately,
there are kernel_{send,recv}msg() users, and those do care about the state the
iovec is left in.  Strictly speaking, the state of iovec after e.g.
->sendmsg() is undefined.  And it's not just protocol-dependent - unless
I'm seriously misreading it, tcp_sendmsg() ends up modifying iovec in case
when it hits tcp_send_rcvq(), while in the normal case it leaves iovec
unmodified.  So in general you need to feed ->{send,recv}msg() a throwaway copy
of iovec.  Leads to wonders like
        /* NB we can't trust socket ops to either consume our iovs
         * or leave them alone. */
        LASSERT (niov > 0);

        for (nob = i = 0; i < niov; i++) {
                scratchiov[i] = iov[i];
                nob += scratchiov[i].iov_len;
        }
        LASSERT (nob <= conn->ksnc_rx_nob_wanted);

        rc = kernel_recvmsg(conn->ksnc_sock, &msg,
                (struct kvec *)scratchiov, niov, nob, MSG_DONTWAIT);
etc.  However, there are places that don't bother and do this:
        while (total_rx < data) {
                rx_loop = kernel_recvmsg(conn->sock, &msg, iov_p, iov_len,
                                        (data - total_rx), MSG_WAITALL);
                if (rx_loop <= 0) {
                        pr_debug("rx_loop: %d total_rx: %d\n",
                                rx_loop, total_rx); 
                        return rx_loop;
                }
                total_rx += rx_loop;
                pr_debug("rx_loop: %d, total_rx: %d, data: %d\n",
                                rx_loop, total_rx, data);
        }
(that's iscsit_do_rx_data()).  Maybe it's a bug; maybe it's relying on
specific behaviour of the protocol known to be used - this code clearly
expects recvmsg to advance iovec, which seems to depend only on the
protocol.  At the moment.  In any case, it's very brittle...

Hell knows; I hadn't finished digging through that zoo - got sidetracked back
then.  *IF* all such places either use a throwaway copy or assume that iovec
gets modified, we can do the following: switch the access to iovecs to
iov_iter primitives, with the first kind of callers creating a throwaway
iov_iter and the second just feeding the same iov_iter to e.g.
kernel_recvmsg().  iovec will remain constant, iov_iter will be advanced.
Moreover, in a lot of cases of first kind will be able to get rid of
throwaway iov_iter (and of manually advancing it), effectively converting
to the second one.

If we have places that currently rely on iovec remaining unchanged (i.e.
manually advancing it after kernel_{send,recv}msg()), the series will be
more painful ;-/  I very much hope that no such places exist...

FWIW, there is also a tactical question that needs to be dealt with.  We
can, of course, start with renaming the "kernel-side" (i.e. post
copy_msghdr_from_user()/get_compat_msghdr()) to struct kmsghdr.  OTOH,
that's a _lot_ of churn for very little reason - most of the instances
in the tree are of that kind.  So I did it the other way round - introduced
struct user_msghdr (only in linux/socket.h; note that we do *not* have
struct msghdr in uabi/linux/socket.h, or anywhere else in uabi/*),
made the syscalls take pointers to it and (initially) rely upon the identical
layouts in copy_msghdr_from_user(); once we put iov_iter into kernel-side
msghdr, we'll just do it like get_compat_msghdr() does.

Is that acceptable?  It would greatly reduce the amount of churn in net/* -
we don't need to pass iov_iter separately and most of the functions in
the middle of call chains are completely unchanged.  Only the originators
of ->sendmsg()/->recvmsg() and the places doing actual data copying
need to be touched.  OTOH, it makes for kernel struct msghdr looking
odd - instead of normal ->msg_iov and ->msg_iovlen it would have
->msg_iov_iter, with ->sendmsg()/->recvmsg() callers needing to set it
up...  OTTH, the things *are* odd from userland programmer POV - sendmsg(2)
and recvmsg(2) leave the iovec unchanged, and having it changed unpredicatably
in the kernel-side counterparts seems to make for a nasty trap.  Certainly
makes for a bunch of nasty comments in the code using those...

Comments?

  parent reply	other threads:[~2014-11-04  5:45 UTC|newest]

Thread overview: 82+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-11-02 23:05 fs: Use non-const iov in aio_read/aio_write Herbert Xu
2014-11-03  0:16 ` Al Viro
2014-11-03  0:21   ` Al Viro
2014-11-03  0:22   ` Herbert Xu
2014-11-03  0:45     ` Al Viro
2014-11-03  5:37       ` [0/3] net: Kill skb_copy_datagram_const_iovec Herbert Xu
2014-11-03  5:44         ` [PATCH 1/3] tun: Modify const aio_read iovec per do_sock_read Herbert Xu
2014-11-03  5:44         ` [PATCH 3/3] net: Kill skb_copy_datagram_const_iovec Herbert Xu
2014-11-03  5:44         ` [PATCH 2/3] macvtap: Modify const aio_read iovec per do_sock_read Herbert Xu
2014-11-03 20:05         ` [0/3] net: Kill skb_copy_datagram_const_iovec David Miller
2014-11-04  3:38           ` Herbert Xu
2014-11-04  8:31             ` [PATCH 2/4] tun: Use iovec iterators Herbert Xu
2014-11-04  8:37               ` Herbert Xu
2014-11-05  2:49                 ` YOSHIFUJI Hideaki
2014-11-05  3:41                   ` Herbert Xu
2014-11-04  8:31             ` [PATCH 1/4] inet: Add skb_copy_datagram_iter Herbert Xu
2014-11-04 14:32               ` Al Viro
2014-11-04 14:35                 ` Al Viro
2014-11-04 14:44                   ` Herbert Xu
2014-11-04 14:52                     ` Al Viro
2014-11-04 14:55                       ` Herbert Xu
2014-11-04 14:42                 ` Herbert Xu
2014-11-04 15:13                   ` Al Viro
2014-11-05  2:22                     ` Herbert Xu
2014-11-05  3:27                       ` David Miller
2014-11-05  3:55                         ` Al Viro
2014-11-05  4:12                           ` Al Viro
2014-11-05 20:51                             ` David Miller
2014-11-05 20:50                           ` David Miller
2014-11-05 21:07                             ` Al Viro
2014-11-05 21:57                               ` David Miller
2014-11-06  3:25                                 ` Al Viro
2014-11-06  5:50                                   ` ipv4: Use standard iovec primitive in raw_probe_proto_opt Herbert Xu
2014-11-06  6:43                                     ` Al Viro
2014-11-06  6:46                                       ` Herbert Xu
2014-11-06  7:11                                         ` Al Viro
2014-11-06  9:55                                           ` Jon Maloy
2014-11-06 22:16                                             ` Al Viro
2014-11-28  5:14                                               ` Al Viro
2014-11-06 21:28                                         ` David Miller
2014-11-07  2:00                                           ` Herbert Xu
2014-11-07 13:25                                             ` [PATCH 0/2] ipv4: Simplify raw_probe_proto_opt and avoid reading user iov twice Herbert Xu
2014-11-07 13:27                                               ` [PATCH 1/2] ipv4: Use standard iovec primitive in raw_probe_proto_opt Herbert Xu
2014-11-07 13:27                                               ` [PATCH 2/2] ipv4: Avoid reading user iov twice after raw_probe_proto_opt Herbert Xu
2014-11-10 19:26                                               ` [PATCH 0/2] ipv4: Simplify raw_probe_proto_opt and avoid reading user iov twice David Miller
2014-11-06  9:50                                   ` [PATCH 1/4] inet: Add skb_copy_datagram_iter Jon Maloy
2014-11-07 21:48                                   ` David Miller
2014-11-07 22:11                                     ` Al Viro
2014-11-07 22:31                                       ` Al Viro
2014-11-07 22:35                                         ` Al Viro
2014-11-07 23:42                                       ` Al Viro
2014-11-08  2:21                                         ` Herbert Xu
2014-11-09 21:19                                         ` Al Viro
2014-11-10  5:20                                           ` David Miller
2014-11-10  6:58                                             ` Al Viro
2014-11-10  7:30                                               ` David Miller
2014-11-10  9:09                                                 ` Al Viro
2014-11-10 16:18                                                   ` David Miller
2014-11-10 10:14                                           ` Michael S. Tsirkin
2014-11-07 21:52                                   ` David Miller
2014-11-05 20:24               ` David Miller
2014-11-06  8:23                 ` Herbert Xu
2014-11-06 17:25                   ` David Miller
2014-11-07  1:59                     ` Herbert Xu
2014-11-07  3:13                       ` David Miller
2014-11-07 13:21                         ` [PATCH 0/4] Replace skb_copy_datagram_const_iovec with iterator version Herbert Xu
2014-11-07 13:22                           ` [PATCH 1/4] inet: Add skb_copy_datagram_iter Herbert Xu
2014-11-07 13:22                           ` [PATCH 2/4] tun: Use iovec iterators Herbert Xu
2014-11-07 13:22                           ` [PATCH 3/4] macvtap: " Herbert Xu
2014-11-07 13:22                           ` [PATCH 4/4] net: Kill skb_copy_datagram_const_iovec Herbert Xu
2014-11-06  8:27                 ` [PATCH 0/4] Replace skb_copy_datagram_const_iovec with iterator version Herbert Xu
2014-11-06  8:28                   ` [PATCH 1/4] inet: Add skb_copy_datagram_iter Herbert Xu
2014-11-06 17:30                     ` Al Viro
2014-11-07  1:58                       ` Herbert Xu
2014-11-06  8:28                   ` [PATCH 2/4] tun: Use iovec iterators Herbert Xu
2014-11-06  8:28                   ` [PATCH 3/4] macvtap: " Herbert Xu
2014-11-06 17:33                     ` Al Viro
2014-11-06  8:28                   ` [PATCH 4/4] net: Kill skb_copy_datagram_const_iovec Herbert Xu
2014-11-04  8:31             ` [PATCH 3/4] macvtap: Use iovec iterators Herbert Xu
2014-11-04  8:31             ` [PATCH 4/4] net: Kill skb_copy_datagram_const_iovec Herbert Xu
2014-11-04  5:45           ` Al Viro [this message]
2014-11-05  1:53             ` [0/3] " Al Viro

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20141104054513.GB7996@ZenIV.linux.org.uk \
    --to=viro@zeniv$(echo .)linux.org.uk \
    --cc=bcrl@kvack$(echo .)org \
    --cc=davem@davemloft$(echo .)net \
    --cc=herbert@gondor$(echo .)apana.org.au \
    --cc=linux-kernel@vger$(echo .)kernel.org \
    --cc=netdev@vger$(echo .)kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox