From: Al Viro <viro@ZenIV•linux.org.uk>
To: David Miller <davem@davemloft•net>
Cc: herbert@gondor•apana.org.au, netdev@vger•kernel.org,
linux-kernel@vger•kernel.org, bcrl@kvack•org
Subject: Re: [0/3] net: Kill skb_copy_datagram_const_iovec
Date: Tue, 4 Nov 2014 05:45:13 +0000 [thread overview]
Message-ID: <20141104054513.GB7996@ZenIV.linux.org.uk> (raw)
In-Reply-To: <20141103.150553.633529234958324183.davem@davemloft.net>
On Mon, Nov 03, 2014 at 03:05:53PM -0500, David Miller wrote:
> I'll see if I can make some progress converting the networking over
> to iov_iter. It can't be that difficult... albeit perhaps a little
> time consuming.
FWIW, I have a queue that got started back in April; basically, the plan
of attack was
* separate kernel-side and userland msghdr.
* localize ->msg_iov uses - most of that gets taken care of by
several new helpers, as in
new helper: skb_copy_datagram_msg()
Absolute majority of skb_copy_datagram_iovec() callers (49 out of 56)
are passing it msg->msg_iov as iovec. Provide a trivial wrapper that
takes msg as argument instead of iovec.
and several like that (the numbers in the above are probably incorrect these
days - it was done more than half a year ago).
* switch kernel-side msghdr to iov_iter. That means diverging
layouts; it's really not hard, since we have copying of msghdr from
userland already localized. Initially - just a mechanical conversion
(i.e. direct uses of iov_iter fields instead of ->msg_iov/->msg_iovlen;
note that after the introduction of wrappers the number of such places
is very much reduced).
* start converting those relatively few places to iov_iter primitives.
And that's where it got stalled, since we have to deal with expectations
of callers. Syscall ones are trivial; that's not a problem. Unfortunately,
there are kernel_{send,recv}msg() users, and those do care about the state the
iovec is left in. Strictly speaking, the state of iovec after e.g.
->sendmsg() is undefined. And it's not just protocol-dependent - unless
I'm seriously misreading it, tcp_sendmsg() ends up modifying iovec in case
when it hits tcp_send_rcvq(), while in the normal case it leaves iovec
unmodified. So in general you need to feed ->{send,recv}msg() a throwaway copy
of iovec. Leads to wonders like
/* NB we can't trust socket ops to either consume our iovs
* or leave them alone. */
LASSERT (niov > 0);
for (nob = i = 0; i < niov; i++) {
scratchiov[i] = iov[i];
nob += scratchiov[i].iov_len;
}
LASSERT (nob <= conn->ksnc_rx_nob_wanted);
rc = kernel_recvmsg(conn->ksnc_sock, &msg,
(struct kvec *)scratchiov, niov, nob, MSG_DONTWAIT);
etc. However, there are places that don't bother and do this:
while (total_rx < data) {
rx_loop = kernel_recvmsg(conn->sock, &msg, iov_p, iov_len,
(data - total_rx), MSG_WAITALL);
if (rx_loop <= 0) {
pr_debug("rx_loop: %d total_rx: %d\n",
rx_loop, total_rx);
return rx_loop;
}
total_rx += rx_loop;
pr_debug("rx_loop: %d, total_rx: %d, data: %d\n",
rx_loop, total_rx, data);
}
(that's iscsit_do_rx_data()). Maybe it's a bug; maybe it's relying on
specific behaviour of the protocol known to be used - this code clearly
expects recvmsg to advance iovec, which seems to depend only on the
protocol. At the moment. In any case, it's very brittle...
Hell knows; I hadn't finished digging through that zoo - got sidetracked back
then. *IF* all such places either use a throwaway copy or assume that iovec
gets modified, we can do the following: switch the access to iovecs to
iov_iter primitives, with the first kind of callers creating a throwaway
iov_iter and the second just feeding the same iov_iter to e.g.
kernel_recvmsg(). iovec will remain constant, iov_iter will be advanced.
Moreover, in a lot of cases of first kind will be able to get rid of
throwaway iov_iter (and of manually advancing it), effectively converting
to the second one.
If we have places that currently rely on iovec remaining unchanged (i.e.
manually advancing it after kernel_{send,recv}msg()), the series will be
more painful ;-/ I very much hope that no such places exist...
FWIW, there is also a tactical question that needs to be dealt with. We
can, of course, start with renaming the "kernel-side" (i.e. post
copy_msghdr_from_user()/get_compat_msghdr()) to struct kmsghdr. OTOH,
that's a _lot_ of churn for very little reason - most of the instances
in the tree are of that kind. So I did it the other way round - introduced
struct user_msghdr (only in linux/socket.h; note that we do *not* have
struct msghdr in uabi/linux/socket.h, or anywhere else in uabi/*),
made the syscalls take pointers to it and (initially) rely upon the identical
layouts in copy_msghdr_from_user(); once we put iov_iter into kernel-side
msghdr, we'll just do it like get_compat_msghdr() does.
Is that acceptable? It would greatly reduce the amount of churn in net/* -
we don't need to pass iov_iter separately and most of the functions in
the middle of call chains are completely unchanged. Only the originators
of ->sendmsg()/->recvmsg() and the places doing actual data copying
need to be touched. OTOH, it makes for kernel struct msghdr looking
odd - instead of normal ->msg_iov and ->msg_iovlen it would have
->msg_iov_iter, with ->sendmsg()/->recvmsg() callers needing to set it
up... OTTH, the things *are* odd from userland programmer POV - sendmsg(2)
and recvmsg(2) leave the iovec unchanged, and having it changed unpredicatably
in the kernel-side counterparts seems to make for a nasty trap. Certainly
makes for a bunch of nasty comments in the code using those...
Comments?
next prev parent reply other threads:[~2014-11-04 5:45 UTC|newest]
Thread overview: 82+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-11-02 23:05 fs: Use non-const iov in aio_read/aio_write Herbert Xu
2014-11-03 0:16 ` Al Viro
2014-11-03 0:21 ` Al Viro
2014-11-03 0:22 ` Herbert Xu
2014-11-03 0:45 ` Al Viro
2014-11-03 5:37 ` [0/3] net: Kill skb_copy_datagram_const_iovec Herbert Xu
2014-11-03 5:44 ` [PATCH 1/3] tun: Modify const aio_read iovec per do_sock_read Herbert Xu
2014-11-03 5:44 ` [PATCH 3/3] net: Kill skb_copy_datagram_const_iovec Herbert Xu
2014-11-03 5:44 ` [PATCH 2/3] macvtap: Modify const aio_read iovec per do_sock_read Herbert Xu
2014-11-03 20:05 ` [0/3] net: Kill skb_copy_datagram_const_iovec David Miller
2014-11-04 3:38 ` Herbert Xu
2014-11-04 8:31 ` [PATCH 2/4] tun: Use iovec iterators Herbert Xu
2014-11-04 8:37 ` Herbert Xu
2014-11-05 2:49 ` YOSHIFUJI Hideaki
2014-11-05 3:41 ` Herbert Xu
2014-11-04 8:31 ` [PATCH 1/4] inet: Add skb_copy_datagram_iter Herbert Xu
2014-11-04 14:32 ` Al Viro
2014-11-04 14:35 ` Al Viro
2014-11-04 14:44 ` Herbert Xu
2014-11-04 14:52 ` Al Viro
2014-11-04 14:55 ` Herbert Xu
2014-11-04 14:42 ` Herbert Xu
2014-11-04 15:13 ` Al Viro
2014-11-05 2:22 ` Herbert Xu
2014-11-05 3:27 ` David Miller
2014-11-05 3:55 ` Al Viro
2014-11-05 4:12 ` Al Viro
2014-11-05 20:51 ` David Miller
2014-11-05 20:50 ` David Miller
2014-11-05 21:07 ` Al Viro
2014-11-05 21:57 ` David Miller
2014-11-06 3:25 ` Al Viro
2014-11-06 5:50 ` ipv4: Use standard iovec primitive in raw_probe_proto_opt Herbert Xu
2014-11-06 6:43 ` Al Viro
2014-11-06 6:46 ` Herbert Xu
2014-11-06 7:11 ` Al Viro
2014-11-06 9:55 ` Jon Maloy
2014-11-06 22:16 ` Al Viro
2014-11-28 5:14 ` Al Viro
2014-11-06 21:28 ` David Miller
2014-11-07 2:00 ` Herbert Xu
2014-11-07 13:25 ` [PATCH 0/2] ipv4: Simplify raw_probe_proto_opt and avoid reading user iov twice Herbert Xu
2014-11-07 13:27 ` [PATCH 1/2] ipv4: Use standard iovec primitive in raw_probe_proto_opt Herbert Xu
2014-11-07 13:27 ` [PATCH 2/2] ipv4: Avoid reading user iov twice after raw_probe_proto_opt Herbert Xu
2014-11-10 19:26 ` [PATCH 0/2] ipv4: Simplify raw_probe_proto_opt and avoid reading user iov twice David Miller
2014-11-06 9:50 ` [PATCH 1/4] inet: Add skb_copy_datagram_iter Jon Maloy
2014-11-07 21:48 ` David Miller
2014-11-07 22:11 ` Al Viro
2014-11-07 22:31 ` Al Viro
2014-11-07 22:35 ` Al Viro
2014-11-07 23:42 ` Al Viro
2014-11-08 2:21 ` Herbert Xu
2014-11-09 21:19 ` Al Viro
2014-11-10 5:20 ` David Miller
2014-11-10 6:58 ` Al Viro
2014-11-10 7:30 ` David Miller
2014-11-10 9:09 ` Al Viro
2014-11-10 16:18 ` David Miller
2014-11-10 10:14 ` Michael S. Tsirkin
2014-11-07 21:52 ` David Miller
2014-11-05 20:24 ` David Miller
2014-11-06 8:23 ` Herbert Xu
2014-11-06 17:25 ` David Miller
2014-11-07 1:59 ` Herbert Xu
2014-11-07 3:13 ` David Miller
2014-11-07 13:21 ` [PATCH 0/4] Replace skb_copy_datagram_const_iovec with iterator version Herbert Xu
2014-11-07 13:22 ` [PATCH 1/4] inet: Add skb_copy_datagram_iter Herbert Xu
2014-11-07 13:22 ` [PATCH 2/4] tun: Use iovec iterators Herbert Xu
2014-11-07 13:22 ` [PATCH 3/4] macvtap: " Herbert Xu
2014-11-07 13:22 ` [PATCH 4/4] net: Kill skb_copy_datagram_const_iovec Herbert Xu
2014-11-06 8:27 ` [PATCH 0/4] Replace skb_copy_datagram_const_iovec with iterator version Herbert Xu
2014-11-06 8:28 ` [PATCH 1/4] inet: Add skb_copy_datagram_iter Herbert Xu
2014-11-06 17:30 ` Al Viro
2014-11-07 1:58 ` Herbert Xu
2014-11-06 8:28 ` [PATCH 2/4] tun: Use iovec iterators Herbert Xu
2014-11-06 8:28 ` [PATCH 3/4] macvtap: " Herbert Xu
2014-11-06 17:33 ` Al Viro
2014-11-06 8:28 ` [PATCH 4/4] net: Kill skb_copy_datagram_const_iovec Herbert Xu
2014-11-04 8:31 ` [PATCH 3/4] macvtap: Use iovec iterators Herbert Xu
2014-11-04 8:31 ` [PATCH 4/4] net: Kill skb_copy_datagram_const_iovec Herbert Xu
2014-11-04 5:45 ` Al Viro [this message]
2014-11-05 1:53 ` [0/3] " Al Viro
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20141104054513.GB7996@ZenIV.linux.org.uk \
--to=viro@zeniv$(echo .)linux.org.uk \
--cc=bcrl@kvack$(echo .)org \
--cc=davem@davemloft$(echo .)net \
--cc=herbert@gondor$(echo .)apana.org.au \
--cc=linux-kernel@vger$(echo .)kernel.org \
--cc=netdev@vger$(echo .)kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox