public inbox for git@vger.kernel.org 
From: simon@hollie•ento.csiro.au (Simon Fowler)
To: Chris Mason <mason@suse•com>
Cc: Linus Torvalds <torvalds@osdl•org>, git@vger•kernel.org
Subject: Re: Finding file revisions
Date: Thu, 28 Apr 2005 18:41:57 +1000
Message-ID: <20050428084156.GK17682@himi.org>
In-Reply-To: <200504271831.47830.mason@suse.com>


[-- Attachment #1.1: Type: text/plain, Size: 1826 bytes --]

On Wed, Apr 27, 2005 at 06:31:47PM -0400, Chris Mason wrote:
> On Wednesday 27 April 2005 18:19, Linus Torvalds wrote:
> > On Wed, 27 Apr 2005, Chris Mason wrote:
> > > So, new prog attached.  New usage:
> > >
> > > file-changes [-c commit_id] [-s commit_id] file ...
> > >
> > > -c is the commit where you want to start searching
> > > -s is the commit where you want to stop searching
> >
> > Your script will do some funky stuff, because you incorrectly think that
> > the rev-list is sorted linearly. It's not. It's sorted in a rough
> > chronological order, but you really can't do the "last" vs "cur" thing
> > that you do, because two commits after each other in the rev-list listing
> > may well be from two totally different branches, so when you compare one
> > tree against the other, you're really doing something pretty nonsensical.
> 
> Aha, didn't realize that one.  Thanks, I'll rework things here.
> 
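Linus's objection comes down to one rule: a commit has to be diffed against its own parents, not against whatever happens to sit next to it in the rev-list output. A minimal sketch of that rule in C, on a hypothetical toy model (struct toy_commit and find_changes() are invented for illustration, and each tree is reduced to a single blob id for the file of interest). It also marks commits as visited so that a merge does not cause the shared history below it to be walked twice:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model: each commit records which version (blob id) of
 * the file of interest its tree holds (0 = absent), plus up to two parents. */
struct toy_commit {
	int file_blob;
	struct toy_commit *parents[2];  /* unused slots are NULL */
	int visited;
};

/* Visit every ancestor of c exactly once and append to out[] each commit
 * whose file version differs from at least one of its OWN parents.
 * Nothing is ever compared with a neighbour in some linear listing. */
static void find_changes(struct toy_commit *c, struct toy_commit **out,
			 size_t *nout, size_t cap)
{
	size_t i;
	int changed;

	if (c->visited)
		return;                 /* already reached through another path */
	c->visited = 1;

	/* A root commit is compared against an empty tree. */
	changed = !c->parents[0] && c->file_blob != 0;
	for (i = 0; i < 2 && c->parents[i]; i++) {
		if (c->parents[i]->file_blob != c->file_blob)
			changed = 1;
		find_changes(c->parents[i], out, nout, cap);
	}
	if (changed) {
		assert(*nout < cap);
		out[(*nout)++] = c;     /* ancestors are appended first */
	}
}
```

With a root A, two children B (file unchanged) and C (file changed), and a merge M of B and C, this reports A, C and M; M is included because it differs from its first parent B.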
I've got a version of this written in C that I've been working on
for a bit - some example output:

+040000 tree    bfb75011c32589b282dd9c86621dadb0f0bb3866        ppc
+100644 blob    5ba4fc5259b063dab6417c142938d987ee894fc0        ppc/sha1.c
+100644 blob    c3c51aa4d487f2e85c02b0257c1f0b57d6158d76        ppc/sha1.h
+100644 blob    e85611a4ef0598f45911357d0d2f1fc354039de4        ppc/sha1ppc.S
commit b5af9107270171b79d46b099ee0b198e653f3a24->a6ef3518f9ac8a1c46a36c8d27173b1f73d839c4

You run it as:
find-changes commit_id file_prefix ...

The file_prefix is a path prefix to match - it's not as flexible as
regexes, but it shouldn't be too much less useful.
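As attached, check_file() appears to compare each full path against the target with cache_name_compare(), i.e. for equality, so a change to a file below a directory named on the command line would not be reported. A true prefix match has to stop at a path-component boundary; a sketch, where path_has_prefix() is a hypothetical helper rather than anything in git:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: does path lie at or below prefix?  The match must
 * end on a path-component boundary, so "ppc" matches "ppc" and
 * "ppc/sha1.c" but not "ppcfoo.c".  Trailing '/' on prefix is ignored. */
static int path_has_prefix(const char *path, const char *prefix)
{
	size_t len;

	assert(path && prefix);
	len = strlen(prefix);
	while (len && prefix[len - 1] == '/')
		len--;                  /* treat "ppc/" like "ppc" */
	if (len == 0)
		return 1;               /* empty prefix matches every path */
	if (strncmp(path, prefix, len))
		return 0;
	return path[len] == '\0' || path[len] == '/';
}
```

Checking the character right after the prefix is what keeps "ppc" from matching an unrelated "ppcfoo.c".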

Simon

-- 
PGP public key Id 0x144A991C, or http://himi.org/stuff/himi.asc
(crappy) Homepage: http://himi.org
doe #237 (see http://www.lemuria.org/DeCSS) 
My DeCSS mirror: ftp://himi.org/pub/mirrors/css/ 

[-- Attachment #1.2: find-changes.diff --]
[-- Type: text/plain, Size: 8905 bytes --]

Find commits that changed files matching the prefix given on the command line.

Signed-off-by: Simon Fowler <simon@dreamcraft•com.au>
---

Index: Makefile
===================================================================
--- c3aa1e6b53cc59d5fbe261f3f859584904ae3a63/Makefile  (mode:100644 sha1:d73bea1cbb9451a89b03d6066bf2ed7fec32fd31)
+++ uncommitted/Makefile  (mode:100644)
@@ -38,7 +38,7 @@
 	cat-file fsck-cache checkout-cache diff-tree rev-tree show-files \
 	check-files ls-tree merge-base merge-cache unpack-file git-export \
 	diff-cache convert-cache http-pull rpush rpull rev-list git-mktag \
-	diff-tree-helper
+	diff-tree-helper find-changes
 
 SCRIPT=	commit-id tree-id parent-id cg-Xdiffdo cg-Xmergefile \
 	cg-add cg-admin-lsobj cg-cancel cg-clone cg-commit cg-diff \
Index: find-changes.c
===================================================================
--- /dev/null  (tree:c3aa1e6b53cc59d5fbe261f3f859584904ae3a63)
+++ uncommitted/find-changes.c  (mode:100644 sha1:64c0c3627d84969ee1596b05f97705455fba1871)
@@ -0,0 +1,279 @@
+/*
+ * find-changes.c - find the commits that changed a particular file.
+ */
+
+#include "cache.h"
+//#include "revision.h"
+#include "commit.h"
+#include <sys/param.h>
+
+/* 
+ * This is a simple tool that walks through the revisions cache and
+ * checks the parent-child diffs to see if they include the given
+ * filename. 
+ */
+
+static int recursive = 1;
+static int found = 0;
+
+static char *malloc_base(const char *base, const char *path, int pathlen)
+{
+	int baselen = strlen(base);
+	char *newbase = malloc(baselen + pathlen + 2);
+	memcpy(newbase, base, baselen);
+	memcpy(newbase + baselen, path, pathlen);
+	memcpy(newbase + baselen + pathlen, "/", 2);
+	return newbase;
+}
+
+static void update_tree_entry(void **bufp, unsigned long *sizep)
+{
+	void *buf = *bufp;
+	unsigned long size = *sizep;
+	int len = strlen(buf) + 1 + 20;
+
+	if (size < len)
+		die("corrupt tree file");
+	*bufp = buf + len;
+	*sizep = size - len;
+}
+
+static const unsigned char *extract(void *tree, unsigned long size, const char **pathp, unsigned int *modep)
+{
+	int len = strlen(tree)+1;
+	const unsigned char *sha1 = tree + len;
+	const char *path = strchr(tree, ' ');
+
+	if (!path || size < len + 20 || sscanf(tree, "%o", modep) != 1)
+		die("corrupt tree file");
+	*pathp = path+1;
+	return sha1;
+}
+
+static int check_file(void *tree, unsigned long size, const char *base, const char *target);
+
+/* A whole sub-tree went away or appeared */
+static int check_tree(void *tree, unsigned long size, const char *base, const char *target)
+{
+	int retval = 0;
+
+	while (size && !retval) {
+		retval = check_file(tree, size, base, target);
+		update_tree_entry(&tree, &size);
+	}
+	return retval;
+}
+
+/* A file entry went away or appeared.
+ * Check the entire subtree under this, and set the global 'found' flag
+ * if the target path is anywhere in it. */
+static int check_file(void *tree, unsigned long size, const char *base, const char *target)
+{
+	unsigned mode;
+	const char *path;
+	char full_path[MAXPATHLEN + 1];
+	int pathlen, retval;
+	const unsigned char *sha1 = extract(tree, size, &path, &mode);
+
+	pathlen = snprintf(full_path, sizeof(full_path), "%s%s", base, path);
+	if (pathlen < sizeof(full_path) && !cache_name_compare(full_path, pathlen, target, strlen(target)))
+		found = 1;
+
+	if (recursive && S_ISDIR(mode)) {
+		char type[20];
+		unsigned long subsize;
+		char *newbase = malloc_base(base, path, strlen(path));
+		void *subtree;
+
+		subtree = read_sha1_file(sha1, type, &subsize);
+		if (!subtree || strcmp(type, "tree"))
+			die("corrupt tree sha %s", sha1_to_hex(sha1));
+
+		retval = check_tree(subtree, subsize, newbase, target);
+
+		free(subtree);
+		free(newbase);
+		return retval;
+	}
+	return 0;
+}
+	
+static int diff_tree_sha1(const unsigned char *old, const unsigned char *new, const char *base, const char *target);
+
+/* The diff-tree algorithm depends on compare_tree_entry() returning the
+ * same ordering that memcmp() would on the filenames (-1, 0 or 1): tree
+ * entries are sorted by name, so the result decides which side to advance. */
+static int compare_tree_entry(void *tree1, unsigned long size1, 
+			      void *tree2, unsigned long size2, 
+			      const char *base, const char *target)
+{
+	unsigned mode1, mode2;
+	const char *path1, *path2;
+	const unsigned char *sha1, *sha2;
+	int cmp, pathlen1, pathlen2;
+
+	if (found)
+		return 0;
+
+	sha1 = extract(tree1, size1, &path1, &mode1);
+	sha2 = extract(tree2, size2, &path2, &mode2);
+
+	pathlen1 = strlen(path1);
+	pathlen2 = strlen(path2);
+	cmp = cache_name_compare(path1, pathlen1, path2, pathlen2);
+	/* Names differ: only the entry that sorts first was added or removed
+	 * (a whole subtree if it is a directory), so check just that one. */
+	if (cmp < 0) {
+		check_file(tree1, size1, base, target);
+		return -1;
+	} else if (cmp > 0) {
+		check_file(tree2, size2, base, target);
+		return 1;
+	}
+
+	if (!memcmp(sha1, sha2, 20) && mode1 == mode2)
+		return 0;
+
+	/*
+	 * If the filemode has changed to/from a directory from/to a regular
+	 * file, we need to consider it a remove and an add.
+	 */
+	if (S_ISDIR(mode1) != S_ISDIR(mode2)) {
+		check_file(tree1, size1, base, target);
+		check_file(tree2, size2, base, target);
+		return 0;
+	}
+
+	if (recursive && S_ISDIR(mode1)) {
+		int retval;
+		char *newbase = malloc_base(base, path1, pathlen1);
+		retval = diff_tree_sha1(sha1, sha2, newbase, target);
+		free(newbase);
+		return retval;
+	}
+	
+	check_file(tree1, size1, base, target);
+	check_file(tree2, size2, base, target);
+	return 0;
+}
+
+static int diff_tree(void *tree1, unsigned long size1, void *tree2, unsigned long size2, 
+		     const char *base, const char *target)
+{
+	while (size1 | size2) {
+		if (!size1) {
+			check_file(tree2, size2, base, target);
+			update_tree_entry(&tree2, &size2);
+			continue;
+		}
+		if (!size2) {
+			check_file(tree1, size1, base, target);
+			update_tree_entry(&tree1, &size1);
+			continue;
+		}
+		switch (compare_tree_entry(tree1, size1, tree2, size2, base, target)) {
+		case -1:
+			update_tree_entry(&tree1, &size1);
+			continue;
+		case 0:
+			update_tree_entry(&tree1, &size1);
+			/* Fallthrough */
+		case 1:
+			update_tree_entry(&tree2, &size2);
+			continue;
+		}
+		die("diff-tree: internal error");
+	}
+	return 0;
+}
+
+static int diff_tree_sha1(const unsigned char *old, const unsigned char *new, const char *base,
+			  const char *target)
+{
+	void *tree1, *tree2;
+	unsigned long size1, size2;
+	char type[20];
+	int retval;
+
+	tree1 = read_sha1_file(old, type, &size1);
+	if (!tree1 || strcmp(type, "tree"))
+		die("unable to read source tree %s", sha1_to_hex(old));
+	tree2 = read_sha1_file(new, type, &size2);
+	if (!tree2 || strcmp(type, "tree"))
+		die("unable to read destination tree %s", sha1_to_hex(new));
+	retval = diff_tree(tree1, size1, tree2, size2, base, target);
+	free(tree1);
+	free(tree2);
+	return retval;
+}
+
+static int process_diffs(struct commit *parent, struct commit *commit, const char *target)
+{
+	found = 0;
+	diff_tree_sha1(parent->tree->object.sha1, commit->tree->object.sha1, "", target);
+	if (found)
+		printf("%s\n", sha1_to_hex(commit->object.sha1));
+	return 0;
+}
+
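+/* Walk all ancestors of item, parsing each one exactly once, so that
+ * they all end up in the global object table (objs[]). */
+void process_commit(struct commit *item)
+{
+	struct commit_list *parents;
+
+	if (item->object.parsed)
+		return;	/* already walked via another path */
+	if (parse_commit(item))
+		die("unable to parse commit %s", sha1_to_hex(item->object.sha1));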
+	parents = item->parents;
+	while (parents) {
+		process_commit(parents->item);
+		parents = parents->next;
+	}
+}
+
+/*
+ * Usage: find-changes <parent-id> <filename>
+ *
+ * Note that this only finds the commits that changed the given file
+ * among the ancestors of the commit given on the command line,
+ * including that commit itself.
+ */ 
+
+int main(int argc, char **argv)
+{
+	int i;
+	unsigned char sha1[20];
+	struct commit *orig;
+
+	if (argc != 3)
+		usage("find-changes <parent-id> <filename>");
+	if (get_sha1_hex(argv[1], sha1))
+		die("invalid commit id %s", argv[1]);
+	orig = lookup_commit(sha1);
+	process_commit(orig);
+	mark_reachable(&orig->object, 1);
+
+	/* this code needs to use tree.c to do most of the work - this
+	 * will simplify things a lot. 
+	 * XXX: rewrite diff-tree.c to do the same. */
+	
+	for (i = 0; i < nr_objs; i++) {
+		struct object *obj = objs[i];
+		struct commit *commit;
+		struct commit_list *p;
+
+		if (obj->type != commit_type)
+			continue;
+
+		commit = (struct commit *) obj;
+
+		p = commit->parents;
+		while (p) {
+			process_diffs(p->item, commit, argv[2]);
+			p = p->next;
+		}
+	}
+	return 0;
+}
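
The extract() and update_tree_entry() helpers in the patch walk the raw tree object format, in which each entry is an octal mode, a space, the name, a NUL byte and then the 20-byte binary SHA-1. A self-contained sketch of the same parsing, assuming that layout, which reports a corrupt entry through its return value instead of calling die() (struct tree_entry and next_tree_entry() are hypothetical names):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One entry of a raw tree object: "<octal mode> <name>\0" + 20-byte SHA-1. */
struct tree_entry {
	unsigned mode;
	const char *name;               /* points into the tree buffer */
	const unsigned char *sha1;      /* 20 raw bytes, also in the buffer */
};

/* Parse the entry at *bufp and advance *bufp / *sizep past it, roughly
 * what extract() plus update_tree_entry() do.  Returns 0 on success,
 * -1 if the entry is truncated or malformed. */
static int next_tree_entry(const char **bufp, size_t *sizep,
			   struct tree_entry *e)
{
	const char *buf, *nul, *sp;
	char *end;
	size_t len;

	assert(bufp && *bufp && sizep && e);
	buf = *bufp;
	nul = memchr(buf, '\0', *sizep);
	if (!nul)
		return -1;                      /* no terminating NUL */
	len = (size_t)(nul - buf) + 1 + 20;     /* "mode name", NUL, SHA-1 */
	sp = memchr(buf, ' ', (size_t)(nul - buf));
	if (!sp || len > *sizep)
		return -1;
	e->mode = (unsigned)strtoul(buf, &end, 8);
	if (end == buf || end != sp)
		return -1;                      /* mode is not a plain octal number */
	e->name = sp + 1;
	e->sha1 = (const unsigned char *)nul + 1;
	*bufp = buf + len;
	*sizep -= len;
	return 0;
}
```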

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
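The loop in diff_tree() together with compare_tree_entry() is a merge walk over two name-sorted lists. A minimal sketch of that walk on hypothetical flattened lists (struct entry and diff_sorted() are illustrative only; integer ids stand in for SHA-1s and there is no recursion into subtrees). When the names differ, only the entry that sorts first has no counterpart, so only that side is reported and advanced:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical flattened tree: entries sorted by name, as in a git tree. */
struct entry {
	const char *name;
	int id;                         /* stands in for the SHA-1 */
};

/* Merge-walk two sorted entry lists the way diff_tree() walks two trees.
 * Writes the names of removed, added or modified entries to out[] and
 * returns how many there are.  out[] needs room for na + nb names. */
static size_t diff_sorted(const struct entry *a, size_t na,
			  const struct entry *b, size_t nb,
			  const char **out, size_t cap)
{
	size_t i = 0, j = 0, n = 0;

	assert(cap >= na + nb);         /* worst case: nothing matches */
	while (i < na || j < nb) {
		int cmp;

		if (i == na)
			cmp = 1;                /* only b left: additions */
		else if (j == nb)
			cmp = -1;               /* only a left: removals */
		else
			cmp = strcmp(a[i].name, b[j].name);

		if (cmp < 0) {
			out[n++] = a[i++].name; /* only in a: removed */
		} else if (cmp > 0) {
			out[n++] = b[j++].name; /* only in b: added */
		} else {
			if (a[i].id != b[j].id)
				out[n++] = a[i].name;   /* same name: modified */
			i++;
			j++;
		}
	}
	return n;
}
```

Because both sides advance in lockstep on equal names, an unchanged entry that merely appears at a different position in the two lists is never reported as a change.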

