public inbox for netdev@vger.kernel.org 
 help / color / mirror / Atom feed
From: Pavel Emelyanov <xemul@parallels•com>
To: David Miller <davem@davemloft•net>,
	Eric Dumazet <eric.dumazet@gmail•com>
Cc: "tj@kernel•org" <tj@kernel•org>,
	"netdev@vger•kernel.org" <netdev@vger•kernel.org>
Subject: Re: [PATCH] datagram: Extend the datagram queue MSG_PEEK-ing
Date: Mon, 20 Feb 2012 16:17:11 +0400	[thread overview]
Message-ID: <4F4239C7.6040905@parallels.com> (raw)
In-Reply-To: <20120215.155226.1446930776120577759.davem@davemloft.net>

> I'm still thinking about how I feel about this.
> 
> But meanwhile you can fix some thing up, for example I really
> feel that you should make sure that multiple bits aren't set
> at the same time.
> 
> You either mean MSG_PEEK_MORE or MSG_PEEK_AGAIN, so setting both
> makes no sense and should be flagged as an error.

Actually Eric has pointed out the critical issue with this -- if a task
(whose socket's queue we're about to peek) is currently peek-ing this
queue himself, we will peek only the queue's tail and will potentially 
break this task's peeking in case he uses the _MORE/_AGAIN macros :(


I was thinking about another option of doing the same, how about introducing
the peek offset member on sock (get/set via sockopt) which works like

* if == -1, then peek works as before
* if >= 0, then each peek/recvmsg will increase/decrease the value so that
  the next peek peeks next data

It's questionable what to do if the peek_off points into the middle of a
datagram however. Here's an example of how this looks for datagram sockets
(tested on pf_unix), for stream this requires more patching.

What do you think? Does it make sense to go on with this making other 
->recvmsg handlers support peeking offset?

---

diff --git a/include/asm-generic/socket.h b/include/asm-generic/socket.h
index 49c1704..832c270 100644
--- a/include/asm-generic/socket.h
+++ b/include/asm-generic/socket.h
@@ -66,5 +66,6 @@
 #define SO_RXQ_OVFL             40
 
 #define SO_WIFI_STATUS		41
+#define SO_PEEK_OFF		42
 #define SCM_WIFI_STATUS	SO_WIFI_STATUS
 #endif /* __ASM_GENERIC_SOCKET_H */
diff --git a/include/net/sock.h b/include/net/sock.h
index 91c1c8b..94b0372 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -357,6 +357,7 @@ struct sock {
 	struct page		*sk_sndmsg_page;
 	struct sk_buff		*sk_send_head;
 	__u32			sk_sndmsg_off;
+	__s32			sk_peek_off;
 	int			sk_write_pending;
 #ifdef CONFIG_SECURITY
 	void			*sk_security;
diff --git a/net/core/datagram.c b/net/core/datagram.c
index 68bbf9f..78e5147 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -180,21 +180,37 @@ struct sk_buff *__skb_recv_datagram(struct sock *sk, unsigned flags,
 		 * However, this function was correct in any case. 8)
 		 */
 		unsigned long cpu_flags;
+		long skip;
 
 		spin_lock_irqsave(&sk->sk_receive_queue.lock, cpu_flags);
-		skb = skb_peek(&sk->sk_receive_queue);
-		if (skb) {
+		skip = sk->sk_peek_off;
+		skb_queue_walk(&sk->sk_receive_queue, skb) {
 			*peeked = skb->peeked;
 			if (flags & MSG_PEEK) {
 				skb->peeked = 1;
 				atomic_inc(&skb->users);
-			} else
+				if (skip >= skb->len) {
+					skip -= skb->len;
+					continue;
+				}
+
+				if (sk->sk_peek_off >= 0)
+					sk->sk_peek_off += (skb->len - skip);
+			} else {
 				__skb_unlink(skb, &sk->sk_receive_queue);
-		}
-		spin_unlock_irqrestore(&sk->sk_receive_queue.lock, cpu_flags);
 
-		if (skb)
+				if (sk->sk_peek_off >= 0) {
+					if (sk->sk_peek_off >= skb->len)
+						sk->sk_peek_off -= skb->len;
+					else
+						sk->sk_peek_off = 0;
+				}
+			}
+
+			spin_unlock_irqrestore(&sk->sk_receive_queue.lock, cpu_flags);
 			return skb;
+		}
+		spin_unlock_irqrestore(&sk->sk_receive_queue.lock, cpu_flags);
 
 		/* User doesn't want to wait */
 		error = -EAGAIN;
diff --git a/net/core/sock.c b/net/core/sock.c
index 02f8dfe..5a5d581 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -792,7 +792,9 @@ set_rcvbuf:
 	case SO_WIFI_STATUS:
 		sock_valbool_flag(sk, SOCK_WIFI_STATUS, valbool);
 		break;
-
+	case SO_PEEK_OFF:
+		sk->sk_peek_off = val;
+		break;
 	default:
 		ret = -ENOPROTOOPT;
 		break;
@@ -1017,6 +1019,9 @@ int sock_getsockopt(struct socket *sock, int level, int optname,
 	case SO_WIFI_STATUS:
 		v.val = !!sock_flag(sk, SOCK_WIFI_STATUS);
 		break;
+	case SO_PEEK_OFF:
+		v.val = sk->sk_peek_off;
+		break;
 
 	default:
 		return -ENOPROTOOPT;
@@ -2092,6 +2097,7 @@ void sock_init_data(struct socket *sock, struct sock *sk)
 
 	sk->sk_sndmsg_page	=	NULL;
 	sk->sk_sndmsg_off	=	0;
+	sk->sk_peek_off		=	-1;
 
 	sk->sk_peer_pid 	=	NULL;
 	sk->sk_peer_cred	=	NULL;

  reply	other threads:[~2012-02-20 12:17 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-02-10 13:54 [PATCH] datagram: Extend the datagram queue MSG_PEEK-ing Pavel Emelyanov
2012-02-10 14:41 ` Eric Dumazet
2012-02-10 14:52   ` Pavel Emelyanov
2012-02-15 20:52 ` David Miller
2012-02-20 12:17   ` Pavel Emelyanov [this message]
2012-02-21  5:01     ` David Miller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4F4239C7.6040905@parallels.com \
    --to=xemul@parallels$(echo .)com \
    --cc=davem@davemloft$(echo .)net \
    --cc=eric.dumazet@gmail$(echo .)com \
    --cc=netdev@vger$(echo .)kernel.org \
    --cc=tj@kernel$(echo .)org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox