On 2025-10-03 at 20:48:40, Elijah Newren wrote:
> Would this mean that you wanted to ban contributions like d12166d3c8bb
> (Merge branch 'en/docfixes', 2023-10-23), available on the list over
> at https://lore.kernel.org/git/pull.1595.git.1696747527.gitgitgadget@gmail.com/
> ?   We don't need to go theoretical, I've already contributed such a
> patch series before -- 2 years ago -- and it was merged.  Granted,
> that was entirely documentation, and I called out the usage of AI in
> the cover letter, and I manually checked every change (discarding many
> of them) and split it into commits on my own, could easily explain any
> change and why it was good, etc.  And I was upfront about all of it.

I think the main problem here is that we don't know the copyright
status of LLM outputs.  It is not uncommon for them to produce output
that reflects their training input, and we see evidence of that in, for
instance, the New York Times lawsuit against OpenAI.

As I said, the situation is very unclear legally, with active litigation
in multiple countries, and we have to comply with pretty much every
country's laws in this situation.  Whether something is legal in the
United States, where you're located, is completely irrelevant to whether
it is legal in Canada, where I'm located, or Germany or the UK, where we
have other contributors.  We also have to consider whether it's legal in
all of the countries that Git is distributed in, which includes every
country in which Debian has a mirror[0], even countries under
international sanctions, such as Iran, Russia, and Belarus.

It doesn't matter if the person using AI has indemnification, either,
since that only covers civil matters, and at least in the U.S. and
Canada, knowingly violating copyright is also a criminal offence.

The sign-off process is designed to clearly state that a person has the
ability to contribute code under the license and I don't think, as
things stand, it's possible to make that assertion with code or
documentation generated by an LLM except in very limited
circumstances.  I don't allow LLM-generated code in my personal projects
that require sign-off for that reason, and neither does QEMU[1].  I
don't think I could honestly assert either (a) or (b) in the DCO with
LLM-generated code because it's not clear to me whether "I have the
right to submit it under the…license."

To quote the QEMU policy:

  To satisfy the DCO, the patch contributor has to fully understand the
  copyright and license status of content they are contributing to QEMU. With AI
  content generators, the copyright and license status of the output is
  ill-defined with no generally accepted, settled legal foundation.

  Where the training material is known, it is common for it to include large
  volumes of material under restrictive licensing/copyright terms. Even where
  the training material is all known to be under open source licenses, it is
  likely to be under a variety of terms, not all of which will be compatible
  with QEMU's licensing requirements.

I remember the SCO situation with Linux: SCO created FUD around Linux
licensing, which caused a lot of uncertainty and led to the DCO being
created.  I am aware of the fact that
many open source contributors are very unhappy that their code has been
used to train LLMs without retaining credits and copyright notices or
honouring the license terms[2].  And I have spent many years working
with non-profits[3], where I have always been taught that we should
avoid even the appearance of impropriety.

It may matter less what the situation actually ends up being legally
(although it could end up being quite bad) and more whether someone can
imply or suggest that Git is not being distributed in compliance with
the license or contains infringing code, which could effectively make it
undistributable because nobody wants to take that risk.  And litigation,
even if Git and its contributors are successful, can be extraordinarily
expensive.

So I think, given the circumstances, yes, the right thing to do is to
ban LLM-generated contributions with a policy very similar or identical
to QEMU's.  If, in the future, the legal situation changes and it
becomes unambiguously legal to use LLMs across the world, we can
reconsider that policy then.

[0] https://www.debian.org/mirror/list
[1] https://github.com/qemu/qemu/commit/3d40db0efc22520fa6c399cf73960dced423b048
[2] Regardless of the legal concerns, this raises professional ethics
concerns, such as those covered by §1.5 of the ACM Code of Ethics[4].
Ethics requirements usually go well beyond what the law requires.
[3] Software Freedom Conservancy, which handles legal matters for the
Git project, is a non-profit.
[4] https://www.acm.org/code-of-ethics
-- 
brian m. carlson (they/them)
Toronto, Ontario, CA