From: simon@hollie•ento.csiro.au (Simon Fowler)
To: Chris Mason <mason@suse•com>
Cc: Linus Torvalds <torvalds@osdl•org>, git@vger•kernel.org
Subject: Re: Finding file revisions
Date: Thu, 28 Apr 2005 18:41:57 +1000
Message-ID: <20050428084156.GK17682@himi.org>
In-Reply-To: <200504271831.47830.mason@suse.com>
[-- Attachment #1.1: Type: text/plain, Size: 1826 bytes --]
On Wed, Apr 27, 2005 at 06:31:47PM -0400, Chris Mason wrote:
> On Wednesday 27 April 2005 18:19, Linus Torvalds wrote:
> > On Wed, 27 Apr 2005, Chris Mason wrote:
> > > So, new prog attached. New usage:
> > >
> > > file-changes [-c commit_id] [-s commit_id] file ...
> > >
> > > -c is the commit where you want to start searching
> > > -s is the commit where you want to stop searching
> >
> > Your script will do some funky stuff, because you incorrectly think that
> > the rev-list is sorted linearly. It's not. It's sorted in a rough
> > chronological order, but you really can't do the "last" vs "cur" thing
> > that you do, because two commits after each other in the rev-list listing
> > may well be from two totally different branches, so when you compare one
> > tree against the other, you're really doing something pretty nonsensical.
>
> Aha, didn't realize that one. Thanks, I'll rework things here.
>
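Linus's point above is that adjacent rev-list entries need not be parent and child, so the meaningful comparison is each commit against its own parents. A minimal sketch of that idea in C, using a hypothetical `toy_commit` struct and an integer `tree_id` standing in for a tree SHA-1 (neither is git's real `struct commit`):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified commit node; NOT git's struct commit. */
struct toy_commit {
	int tree_id;                          /* stands in for the tree SHA-1 */
	size_t nr_parents;                    /* 0 for a root, 2 for a merge */
	const struct toy_commit *parents[2];
};

/*
 * Count how many of a commit's parents have a different tree.
 * The comparison follows the parent links, so it does not depend on
 * the order in which a rev-list style listing prints the commits.
 */
static size_t changed_vs_parents(const struct toy_commit *c)
{
	size_t i, n = 0;

	assert(c != NULL);
	for (i = 0; i < c->nr_parents; i++)
		if (c->parents[i]->tree_id != c->tree_id)
			n++;
	return n;
}
```

For a merge whose tree equals one parent's tree, `changed_vs_parents()` reports one differing parent, whatever order the two branches appear in the listing.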
I've got a version of this written in C that I've been working on
for a bit - some example output:

+040000 tree bfb75011c32589b282dd9c86621dadb0f0bb3866 ppc
+100644 blob 5ba4fc5259b063dab6417c142938d987ee894fc0 ppc/sha1.c
+100644 blob c3c51aa4d487f2e85c02b0257c1f0b57d6158d76 ppc/sha1.h
+100644 blob e85611a4ef0598f45911357d0d2f1fc354039de4 ppc/sha1ppc.S
commit b5af9107270171b79d46b099ee0b198e653f3a24->a6ef3518f9ac8a1c46a36c8d27173b1f73d839c4

You run it as:

    find-changes commit_id file_prefix ...

The file_prefix is a path prefix to match - it's not as flexible as
regexes, but it shouldn't be too much less useful.
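
As an illustration of what matching on a path prefix could look like, here is a small sketch. It assumes a plain byte-wise prefix test; `has_path_prefix` is a hypothetical helper and does not appear in the attached patch, which compares full paths with cache_name_compare().

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: return 1 if `path` starts with `prefix`,
 * 0 otherwise. An empty prefix matches every path.
 */
static int has_path_prefix(const char *path, const char *prefix)
{
	assert(path != NULL && prefix != NULL);
	return strncmp(path, prefix, strlen(prefix)) == 0;
}
```

With this test a prefix of "ppc/" would match "ppc/sha1.c" but not "Makefile". The trailing slash matters: a bare "ppc" would also match a file named "ppcfoo".
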
Simon
--
PGP public key Id 0x144A991C, or http://himi.org/stuff/himi.asc
(crappy) Homepage: http://himi.org
doe #237 (see http://www.lemuria.org/DeCSS)
My DeCSS mirror: ftp://himi.org/pub/mirrors/css/
[-- Attachment #1.2: find-changes.diff --]
[-- Type: text/plain, Size: 8905 bytes --]
Find commits that changed files matching the prefix given on the command line.
Signed-off-by: Simon Fowler <simon@dreamcraft•com.au>
---
Index: Makefile
===================================================================
--- c3aa1e6b53cc59d5fbe261f3f859584904ae3a63/Makefile (mode:100644 sha1:d73bea1cbb9451a89b03d6066bf2ed7fec32fd31)
+++ uncommitted/Makefile (mode:100644)
@@ -38,7 +38,7 @@
cat-file fsck-cache checkout-cache diff-tree rev-tree show-files \
check-files ls-tree merge-base merge-cache unpack-file git-export \
diff-cache convert-cache http-pull rpush rpull rev-list git-mktag \
- diff-tree-helper
+ diff-tree-helper find-changes
SCRIPT= commit-id tree-id parent-id cg-Xdiffdo cg-Xmergefile \
cg-add cg-admin-lsobj cg-cancel cg-clone cg-commit cg-diff \
Index: find-changes.c
===================================================================
--- /dev/null (tree:c3aa1e6b53cc59d5fbe261f3f859584904ae3a63)
+++ uncommitted/find-changes.c (mode:100644 sha1:64c0c3627d84969ee1596b05f97705455fba1871)
@@ -0,0 +1,279 @@
+/*
+ * find-changes.c - find the commits that changed a particular file.
+ */
+
+#include "cache.h"
+//#include "revision.h"
+#include "commit.h"
+#include <sys/param.h>
+
+/*
+ * This is a simple tool that walks through the revisions cache and
+ * checks the parent-child diffs to see if they include the given
+ * filename.
+ */
+
+static int recursive = 1;
+static int found = 0;
+
+static char *malloc_base(const char *base, const char *path, int pathlen)
+{
+ int baselen = strlen(base);
+ char *newbase = malloc(baselen + pathlen + 2);
+ memcpy(newbase, base, baselen);
+ memcpy(newbase + baselen, path, pathlen);
+ memcpy(newbase + baselen + pathlen, "/", 2);
+ return newbase;
+}
+
+static void update_tree_entry(void **bufp, unsigned long *sizep)
+{
+ void *buf = *bufp;
+ unsigned long size = *sizep;
+ int len = strlen(buf) + 1 + 20;
+
+ if (size < len)
+ die("corrupt tree file");
+ *bufp = buf + len;
+ *sizep = size - len;
+}
+
+static const unsigned char *extract(void *tree, unsigned long size, const char **pathp, unsigned int *modep)
+{
+ int len = strlen(tree)+1;
+ const unsigned char *sha1 = tree + len;
+ const char *path = strchr(tree, ' ');
+
+ if (!path || size < len + 20 || sscanf(tree, "%o", modep) != 1)
+ die("corrupt tree file");
+ *pathp = path+1;
+ return sha1;
+}
+
+static int check_file(void *tree, unsigned long size, const char *base, const char *target);
+
+/* A whole sub-tree went away or appeared */
+static int check_tree(void *tree, unsigned long size, const char *base, const char *target)
+{
+ int retval = 0;
+
+ while (size && !retval) {
+ retval = check_file(tree, size, base, target);
+ update_tree_entry(&tree, &size);
+ }
+ return retval;
+}
+
+/* A file entry went away or appeared.
+ * Check the entire subtree under this, and set the global `found` flag
+ * (tested in process_diffs()) if we find the target. */
+static int check_file(void *tree, unsigned long size, const char *base, const char *target)
+{
+ unsigned mode;
+ const char *path;
+ char full_path[MAXPATHLEN + 1];
+ int pathlen, retval;
+ const unsigned char *sha1 = extract(tree, size, &path, &mode);
+
+ pathlen = snprintf(full_path, MAXPATHLEN, "%s%s", base, path);
+ if (!cache_name_compare(full_path, pathlen, target, strlen(target)))
+ found = 1;
+
+ if (recursive && S_ISDIR(mode)) {
+ char type[20];
+ unsigned long size;
+ char *newbase = malloc_base(base, path, strlen(path));
+ void *tree;
+
+ tree = read_sha1_file(sha1, type, &size);
+ if (!tree || strcmp(type, "tree"))
+ die("corrupt tree sha %s", sha1_to_hex(sha1));
+
+ retval = check_tree(tree, size, newbase, target);
+
+ free(tree);
+ free(newbase);
+ return retval;
+ }
+ return 0;
+}
+
+static int diff_tree_sha1(const unsigned char *old, const unsigned char *new, const char *base, const char *target);
+
+/* the diff-tree algorithm depends on compare_tree_entry returning basically
+ * the same thing that memcmp() would on the filenames - this is important
+ * because the directories are sorted, and the sign decides which tree to advance */
+static int compare_tree_entry(void *tree1, unsigned long size1,
+ void *tree2, unsigned long size2,
+ const char *base, const char *target)
+{
+ unsigned mode1, mode2;
+ const char *path1, *path2;
+ const unsigned char *sha1, *sha2;
+ int cmp, pathlen1, pathlen2;
+
+ if (found)
+ return 0;
+
+ sha1 = extract(tree1, size1, &path1, &mode1);
+ sha2 = extract(tree2, size2, &path2, &mode2);
+
+ pathlen1 = strlen(path1);
+ pathlen2 = strlen(path2);
+ cmp = cache_name_compare(path1, pathlen1, path2, pathlen2);
+ /* these files are different - if this is a directory then the
+ * contents of the subtree are all different. So, we need to
+ * run over the subtree and see if our target is in there
+ * . . . */
+ if (cmp) {
+ check_file(tree1, size1, base, target);
+ check_file(tree2, size2, base, target);
+ return cmp;
+ }
+
+ if (!memcmp(sha1, sha2, 20) && mode1 == mode2)
+ return 0;
+
+ /*
+ * If the filemode has changed to/from a directory from/to a regular
+ * file, we need to consider it a remove and an add.
+ */
+ if (S_ISDIR(mode1) != S_ISDIR(mode2)) {
+ check_file(tree1, size1, base, target);
+ check_file(tree2, size2, base, target);
+ return 0;
+ }
+
+ if (recursive && S_ISDIR(mode1)) {
+ int retval;
+ char *newbase = malloc_base(base, path1, pathlen1);
+ retval = diff_tree_sha1(sha1, sha2, newbase, target);
+ free(newbase);
+ return retval;
+ }
+
+ check_file(tree1, size1, base, target);
+ check_file(tree2, size2, base, target);
+ return 0;
+}
+
+static int diff_tree(void *tree1, unsigned long size1, void *tree2, unsigned long size2,
+ const char *base, const char *target)
+{
+ while (size1 | size2) {
+ if (!size1) {
+ check_file(tree2, size2, base, target);
+ update_tree_entry(&tree2, &size2);
+ continue;
+ }
+ if (!size2) {
+ check_file(tree1, size1, base, target);
+ update_tree_entry(&tree1, &size1);
+ continue;
+ }
+ switch (compare_tree_entry(tree1, size1, tree2, size2, base, target)) {
+ case -1:
+ update_tree_entry(&tree1, &size1);
+ continue;
+ case 0:
+ update_tree_entry(&tree1, &size1);
+ /* Fallthrough */
+ case 1:
+ update_tree_entry(&tree2, &size2);
+ continue;
+ }
+ die("diff-tree: internal error");
+ }
+ return 0;
+}
+
+static int diff_tree_sha1(const unsigned char *old, const unsigned char *new, const char *base,
+ const char *target)
+{
+ void *tree1, *tree2;
+ unsigned long size1, size2;
+ char type[20];
+ int retval;
+
+ tree1 = read_sha1_file(old, type, &size1);
+ if (!tree1 || strcmp(type, "tree"))
+ die("unable to read source tree %s", sha1_to_hex(old));
+ tree2 = read_sha1_file(new, type, &size2);
+ if (!tree2 || strcmp(type, "tree"))
+ die("unable to read destination tree %s", sha1_to_hex(new));
+ retval = diff_tree(tree1, size1, tree2, size2, base, target);
+ free(tree1);
+ free(tree2);
+ return retval;
+}
+
+static int process_diffs(struct commit *parent, struct commit *commit, const char *target)
+{
+ found = 0;
+ diff_tree_sha1(parent->tree->object.sha1, commit->tree->object.sha1, "", target);
+ if (found)
+ printf("%s\n", sha1_to_hex(commit->object.sha1));
+ return 0;
+}
+
+/*
+ * Walk the set of parents, and collect a list of the objects.
+ */
+void process_commit(struct commit *item)
+{
+ struct commit_list *parents;
+
+ if (parse_commit(item))
+ die("unable to parse commit %s", sha1_to_hex(item->object.sha1));
+
+ parents = item->parents;
+ while (parents) {
+ process_commit(parents->item);
+ parents = parents->next;
+ }
+}
+
+/*
+ * Usage: find-changes <parent-id> <filename>
+ *
+ * Note that this code will find the commits that change the given
+ * file in the set of commits that are parents of the one given on the
+ * command line.
+ */
+
+int main(int argc, char **argv)
+{
+ int i;
+ unsigned char sha1[20];
+ struct commit *orig;
+
+ if (argc != 3)
+ usage("find-changes <parent-id> <filename>");
+
+ get_sha1_hex(argv[1], sha1);
+ orig = lookup_commit(sha1);
+ process_commit(orig);
+ mark_reachable(&orig->object, 1);
+
+ /* this code needs to use tree.c to do most of the work - this
+ * will simplify things a lot.
+ * XXX: rewrite diff-tree.c to do the same. */
+
+ for (i = 0; i < nr_objs; i++) {
+ struct object *obj = objs[i];
+ struct commit *commit;
+ struct commit_list *p;
+
+ if (obj->type != commit_type)
+ continue;
+
+ commit = (struct commit *) obj;
+
+ p = commit->parents;
+ while (p) {
+ process_diffs(p->item, commit, argv[2]);
+ p = p->next;
+ }
+ }
+ return 0;
+}
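
The extract() and update_tree_entry() helpers in the patch rely on each tree entry being laid out as an octal mode, a space, the path, a NUL byte, and then 20 raw SHA-1 bytes. A bounds-checked sketch of parsing one such entry follows; `toy_entry` and `parse_tree_entry` are hypothetical names, not part of git, and the sketch returns an error code where the patch calls die().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical view of one entry: "<octal mode> <path>\0<20-byte sha1>". */
struct toy_entry {
	unsigned int mode;
	const char *path;           /* points into the tree buffer */
	const unsigned char *sha1;  /* 20 raw bytes, also in the buffer */
	size_t len;                 /* total bytes used by this entry */
};

/* Returns 0 on success, -1 if the entry is truncated or malformed. */
static int parse_tree_entry(const unsigned char *buf, size_t size,
			    struct toy_entry *out)
{
	const unsigned char *nul;
	const char *space;
	char *end;
	size_t hdr;

	assert(buf != NULL && out != NULL);

	/* The "mode path" header must be NUL-terminated inside the buffer. */
	nul = memchr(buf, '\0', size);
	if (!nul)
		return -1;
	hdr = (size_t)(nul - buf) + 1;
	if (size - hdr < 20)
		return -1;

	/* Safe to use string functions now: a NUL exists within bounds. */
	space = strchr((const char *)buf, ' ');
	if (!space || space == (const char *)buf)
		return -1;
	out->mode = (unsigned int)strtoul((const char *)buf, &end, 8);
	if (end != space)
		return -1;

	out->path = space + 1;
	out->sha1 = nul + 1;
	out->len = hdr + 20;
	return 0;
}
```

Advancing by `len` after each successful parse walks the whole tree buffer, which is the job update_tree_entry() does in the patch.
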
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]