From: "Matheus Afonso Martins Moreira via GitGitGadget" <gitgitgadget@gmail•com>
To: git@vger•kernel.org
Cc: Matheus Moreira <matheus.a.m.moreira@gmail•com>,
	Matheus Afonso Martins Moreira <matheus@matheusmoreira•com>
Subject: [PATCH 02/13] urlmatch: define url_parse function
Date: Sun, 28 Apr 2024 22:30:50 +0000
Message-ID: <13b81b8aa06cfd63a5fd9d1acbaf21a8b388ff47.1714343461.git.gitgitgadget@gmail.com>
In-Reply-To: <pull.1715.git.git.1714343461.gitgitgadget@gmail.com>

From: Matheus Afonso Martins Moreira <matheus@matheusmoreira•com>

Define a general parsing function that supports all Git URLs,
including scp-style URLs such as hostname:~user/repo.
It has the same interface as the URL normalization function
and uses the same data structures, which makes it easy to adopt.
It is adapted from the algorithm used to process URLs in connect.c,
so it should support the same inputs.

Signed-off-by: Matheus Afonso Martins Moreira <matheus@matheusmoreira•com>
---
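Note for readers: the scp-style handling described above can be
illustrated outside of Git with a minimal standalone sketch. It is not
the code in this patch. scp_to_ssh_url() is a hypothetical helper, it
uses plain libc instead of strbuf, and its local-path test (no colon,
or a slash before the first colon) is a simplified assumption standing
in for url_is_local_not_ssh().

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, for illustration only.
 * Returns a malloc'd "ssh://host/path" string for scp-style input
 * such as "host:path", or NULL when the input already has a scheme,
 * looks like a local path, or memory allocation fails.
 */
static char *scp_to_ssh_url(const char *in)
{
	const char *colon = strchr(in, ':');
	const char *slash = strchr(in, '/');
	const char *path;
	size_t host_len, len;
	char *out;

	if (strstr(in, "://"))
		return NULL; /* already a scheme://... URL */
	if (!colon || (slash && slash < colon))
		return NULL; /* assumed to be a local path */

	host_len = (size_t)(colon - in);
	path = colon + 1;
	if (*path == '/')
		path++; /* "host:/abs" becomes "ssh://host/abs" */

	/* "ssh://" + host + "/" + path + NUL */
	len = strlen("ssh://") + host_len + 1 + strlen(path) + 1;
	out = malloc(len);
	if (!out)
		return NULL;
	snprintf(out, len, "ssh://%.*s/%s", (int)host_len, in, path);
	return out;
}
```

With this sketch, "host.xz:~user/repo" becomes
"ssh://host.xz/~user/repo" and "host.xz:/srv/repo.git" becomes
"ssh://host.xz/srv/repo.git". The caller frees the result.
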
 urlmatch.c | 90 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 urlmatch.h |  1 +
 2 files changed, 91 insertions(+)

diff --git a/urlmatch.c b/urlmatch.c
index 1d0254abacb..5a442e31fa2 100644
--- a/urlmatch.c
+++ b/urlmatch.c
@@ -3,6 +3,7 @@
 #include "hex-ll.h"
 #include "strbuf.h"
 #include "urlmatch.h"
+#include "url.h"
 
 #define URL_ALPHA "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
 #define URL_DIGIT "0123456789"
@@ -438,6 +439,95 @@ char *url_normalize(const char *url, struct url_info *out_info)
 	return url_normalize_1(url, out_info, 0);
 }
 
+enum protocol {
+	PROTO_UNKNOWN = 0,
+	PROTO_LOCAL,
+	PROTO_FILE,
+	PROTO_SSH,
+	PROTO_GIT,
+};
+
+static enum protocol url_get_protocol(const char *name, size_t n)
+{
+	if (n == 3 && !strncmp(name, "ssh", n))
+		return PROTO_SSH;
+	if (n == 3 && !strncmp(name, "git", n))
+		return PROTO_GIT;
+	if (n == 7 && !strncmp(name, "git+ssh", n)) /* deprecated - do not use */
+		return PROTO_SSH;
+	if (n == 7 && !strncmp(name, "ssh+git", n)) /* deprecated - do not use */
+		return PROTO_SSH;
+	if (n == 4 && !strncmp(name, "file", n))
+		return PROTO_FILE;
+	return PROTO_UNKNOWN;
+}
+
+char *url_parse(const char *url_orig, struct url_info *out_info)
+{
+	struct strbuf url;
+	char *host, *separator;
+	char *detached, *normalized;
+	enum protocol protocol = PROTO_LOCAL;
+	struct url_info local_info;
+	struct url_info *info = out_info ? out_info : &local_info;
+	bool scp_syntax = false;
+
+	if (is_url(url_orig)) {
+		url_orig = url_decode(url_orig);
+	} else {
+		url_orig = xstrdup(url_orig);
+	}
+
+	strbuf_init(&url, strlen(url_orig) + sizeof("ssh://"));
+	strbuf_addstr(&url, url_orig);
+
+	host = strstr(url.buf, "://");
+	if (host) {
+		protocol = url_get_protocol(url.buf, host - url.buf);
+		host += 3;
+	} else {
+		if (!url_is_local_not_ssh(url.buf)) {
+			scp_syntax = true;
+			protocol = PROTO_SSH;
+			strbuf_insertstr(&url, 0, "ssh://");
+			host = url.buf + 6;
+		}
+	}
+
+	/* path starts after ':' in scp style SSH URLs */
+	if (scp_syntax) {
+		separator = strchr(host, ':');
+		if (separator) {
+			if (separator[1] == '/')
+				strbuf_remove(&url, separator - url.buf, 1);
+			else
+				*separator = '/';
+		}
+	}
+
+	detached = strbuf_detach(&url, NULL);
+	normalized = url_normalize(detached, info);
+	free(detached);
+
+	if (!normalized) {
+		return NULL;
+	}
+
+	/* point path to ~ for URLs like this:
+	 *
+	 *     ssh://host.xz/~user/repo
+	 *     git://host.xz/~user/repo
+	 *     host.xz:~user/repo
+	 *
+	 */
+	if (protocol == PROTO_GIT || protocol == PROTO_SSH) {
+		if (normalized[info->path_off + 1] == '~')
+			info->path_off++;
+	}
+
+	return normalized;
+}
+
 static size_t url_match_prefix(const char *url,
 			       const char *url_prefix,
 			       size_t url_prefix_len)
diff --git a/urlmatch.h b/urlmatch.h
index 5ba85cea139..6b3ce428582 100644
--- a/urlmatch.h
+++ b/urlmatch.h
@@ -35,6 +35,7 @@ struct url_info {
 };
 
 char *url_normalize(const char *, struct url_info *);
+char *url_parse(const char *, struct url_info *);
 
 struct urlmatch_item {
 	size_t hostmatch_len;
-- 
gitgitgadget