Fixed #373 -- Added CompositePrimaryKey. #18056

Open
wants to merge 107 commits into
base: main

csirmazbendeguz

@csirmazbendeguz csirmazbendeguz commented Apr 7, 2024

Trac ticket number

ticket-373

Branch description

This branch adds the CompositePrimaryKey field. If present, Django will create a composite primary key.

Proposal
Previous PR
Composite FK
Serial Fields

class Tenant(models.Model):
    pass


class User(models.Model):
    primary_key = models.CompositePrimaryKey("tenant_id", "id")
    tenant = models.ForeignKey(Tenant, on_delete=models.CASCADE)
    id = models.IntegerField()


class Comment(models.Model):
    primary_key = models.CompositePrimaryKey("tenant_id", "id")
    tenant = models.ForeignKey(Tenant, on_delete=models.CASCADE)
    id = models.IntegerField()
    user_id = models.IntegerField()
    user = models.ForeignObject(
        User,
        on_delete=models.CASCADE,
        from_fields=("tenant_id", "user_id"),
        to_fields=("tenant_id", "id"),
        related_name="+",
    )

Checklist

  • This PR targets the main branch.
  • The commit message is written in past tense, mentions the ticket number, and ends with a period.
  • I have checked the "Has patch" ticket flag in the Trac system.
  • I have added or updated relevant tests.
  • I have added or updated relevant docs, including release notes if applicable.
  • For UI changes, I have attached screenshots in both light and dark modes.

@grjones

grjones commented Apr 17, 2024

I was trying out this exciting branch and ran into this error when running a test:

<...>/lib/python3.12/site-packages/django/db/models/lookups.py:30: in __init__
    self.rhs = self.get_prep_lookup()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = TupleIn(<django.db.models.fields.composite.Cols object at 0x107560980>, <django.db.models.sql.query.Query object at 0x1074e23f0>)

    def get_prep_lookup(self):
        if not isinstance(self.lhs, Cols):
            raise ValueError(
                "The left-hand side of the 'in' lookup must be an instance of Cols"
            )
        if not isinstance(self.rhs, Iterable):
>           raise ValueError(
                "The right-hand side of the 'in' lookup must be an iterable"
            )
E           ValueError: The right-hand side of the 'in' lookup must be an iterable

The issue stems from the use of isnull like so:

MyModel.objects.filter(
    type_override__severity__isnull=False
).update(severity="high")

Curious if anyone ran into this as well.

Edited for traceback:

<...>
lib/python3.12/site-packages/django/db/models/sql/compiler.py:2080: in pre_sql_setup
    self.query.add_filter("pk__in", query)
lib/python3.12/site-packages/django/db/models/sql/query.py:1601: in add_filter
    self.add_q(Q((filter_lhs, filter_rhs)))
lib/python3.12/site-packages/django/db/models/sql/query.py:1617: in add_q
    clause, _ = self._add_q(q_object, self.used_aliases)
lib/python3.12/site-packages/django/db/models/sql/query.py:1649: in _add_q
    child_clause, needed_inner = self.build_filter(
lib/python3.12/site-packages/django/db/models/sql/query.py:1563: in build_filter
    condition = self.build_lookup(lookups, col, value)
lib/python3.12/site-packages/django/db/models/sql/query.py:1393: in build_lookup
    lookup = lookup_class(lhs, rhs)
lib/python3.12/site-packages/django/db/models/lookups.py:30: in __init__
    self.rhs = self.get_prep_lookup()

So, this is part of SQLUpdateCompiler and is coming from the update code path.

@csirmazbendeguz
Author

csirmazbendeguz commented Apr 18, 2024

Thanks for testing and reporting the issue @grjones! Indeed, I forgot to handle this use case. I'll look into it this week.

@csirmazbendeguz csirmazbendeguz force-pushed the ticket_373 branch 2 times, most recently from 6a26b19 to c75dcdd Compare April 19, 2024 12:22
@csirmazbendeguz
Author

@grjones, FYI I pushed the fix

@grjones

grjones commented Apr 20, 2024

@grjones, FYI I pushed the fix

Nice! I hope this gets merged in soon. Your branch has been working great for me.

@grjones

grjones commented Apr 22, 2024

I may have found one other small issue. When adding a regular primary_key=True on a single field, a unique constraint is added. But when using this branch, it becomes an IntegrityError instead. Adding a UniqueConstraint on the composite fields is a workaround, but ideally this would be handled in this PR. IMO, this PR is sooooo close. I'm excited for it to be merged.

@csirmazbendeguz
Author

@grjones , thanks, I appreciate the feedback, I'll look into it. If a model defines Meta.primary_key, defining primary_key=True on a field should not be possible - could you give me a code example so I know how to reproduce the issue? I didn't know Django added unique constraints to primary keys, I'll check, but isn't that redundant?

@grjones

grjones commented Apr 23, 2024

@grjones , thanks, I appreciate the feedback, I'll look into it. If a model defines Meta.primary_key, defining primary_key=True on a field should not be possible - could you give me a code example so I know how to reproduce the issue? I didn't know Django added unique constraints to primary keys, I'll check, but isn't that redundant?

I'll see if I can give you a solid failing test. My "unique constraint" phrasing might not be exactly right. But ultimately, I believe Django queries the DB first to see if the new object's PK already exists and throws a validation error. The composite key logic doesn't seem to be doing that and so an unhandled IntegrityError is raised instead.
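A minimal sketch of the difference described above, using Python's built-in sqlite3 module rather than Django. The `app_user` table, the `insert_user` helper, and the local `ValidationError` class are illustrative assumptions, not code from this branch. It only shows why checking for an existing composite key before the INSERT turns a database-level IntegrityError into a validation error.

```python
import sqlite3


class ValidationError(Exception):
    """Hypothetical stand-in for django.core.exceptions.ValidationError."""


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE app_user ("
    "tenant_id INTEGER NOT NULL, id INTEGER NOT NULL, email TEXT, "
    "PRIMARY KEY (tenant_id, id))"
)


def insert_user(tenant_id, user_id, email, validate=True):
    """Insert a row, optionally checking the composite key first."""
    if validate:
        # Look the full (tenant_id, id) pair up before inserting.
        exists = conn.execute(
            "SELECT 1 FROM app_user WHERE tenant_id = ? AND id = ?",
            (tenant_id, user_id),
        ).fetchone()
        if exists:
            raise ValidationError("User with this primary key already exists.")
    conn.execute(
        "INSERT INTO app_user VALUES (?, ?, ?)", (tenant_id, user_id, email)
    )


def outcome(validate):
    """Try to insert a duplicate key and report which error surfaced."""
    try:
        insert_user(1, 2412, "user2412@example.com", validate=validate)
    except ValidationError:
        return "ValidationError"
    except sqlite3.IntegrityError:
        return "IntegrityError"
    return "inserted"


insert_user(1, 2412, "user2412@example.com")
```

Calling `outcome(True)` exercises the pre-check path, while `outcome(False)` lets the database reject the duplicate.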

@csirmazbendeguz
Author

csirmazbendeguz commented May 1, 2024

@grjones , sorry for the late reply, I was busy last week. Could you give me more specifics? What's the error message you expect?

@grjones

grjones commented May 2, 2024

@grjones , sorry for the late reply, I was busy last week. Could you give me more specifics? What's the error message you expect?

Actually, I think it's mostly ok. I was using Django Spanner and it's just not quite working with composite keys and will need to be fixed there. I wrote this and it passed. It probably shouldn't say Id though?

from django.core.exceptions import ValidationError
from django.test import TestCase

from .models import Tenant, User


class CompositePKCleanTests(TestCase):
    """
    Test the .clean() method of composite_pk models.
    """

    @classmethod
    def setUpTestData(cls):
        cls.tenant = Tenant.objects.create()

    def test_validation_error_is_raised_when_pk_already_exists(self):
        test_cases = [
            {"tenant": self.tenant, "id": 2412, "email": "user2412@example.com"},
            {"tenant_id": self.tenant.id, "id": 5316, "email": "user5316@example.com"},
            {"pk": (self.tenant.id, 7424), "email": "user7424@example.com"},
        ]
        expected = "{'id': ['User with this Id already exists.']}"
        for fields in test_cases:
            User.objects.create(**fields)
            with self.assertRaisesMessage(ValidationError, expected):
                User(**fields).clean()

Contributor

@LilyFoote LilyFoote left a comment

This is a great start!

I've left a bunch of ideas for improvement. Feel free to push back if you think I'm wrong about anything.

django/db/models/base.py (outdated)
django/db/models/fields/composite.py (outdated)
django/db/models/fields/composite.py (outdated)
django/db/models/fields/composite.py (outdated)
django/db/models/fields/composite.py (outdated)
tests/composite_pk/test_get.py (outdated)
tests/composite_pk/test_get.py (outdated)
tests/composite_pk/test_update.py (outdated)
tests/composite_pk/tests.py (outdated)
tests/composite_pk/tests.py (outdated)
@csirmazbendeguz
Author

Thank you so much for taking the time to review my changes @LilyFoote !
I have two questions:

  1. If Meta.primary_key is defined, this PR will automatically add a composite field called primary_key to the model. What do you think about this approach? I felt like it was easier to handle the composite primary keys this way as we can run checks against the meta class instead of traversing the model's fields for a composite field.
  2. I wrote a lot of tests that check the underlying queries made by the ORM. This makes a lot of sense to me, but I haven't seen this type of test much in the Django source code. Do these tests look okay to you?

@LilyFoote
Contributor

If Meta.primary_key is defined, this PR will automatically add a composite field called primary_key to the model. What do you think about this approach?

I don't feel strongly that this is better or worse than another option here, so happy to go with what you think is best.

I wrote a lot of tests that check the underlying queries made by the ORM. This makes a lot of sense to me, but I haven't seen this type of test much in the Django source code. Do these tests look okay to you?

I like your tests quite a bit - they're pretty readable and comprehensive. The main issue I have with them is that they're written for specific databases instead of for generic database features. Where possible Django strongly prefers to test based on features because then the tests apply to as many databases as possible (including third party database libraries). I think the asserts of the actual SQL might be a bit tricky to adapt though, so we might need a different way to check what they're checking.

Also, after I reviewed yesterday, I thought of some more things:

  • We should add migrations tests to make sure that adding/removing Meta.primary_key works correctly and that removing a field that's part of a primary key also does something appropriate.
  • We might want tests for composite keys in forms and the admin. There may be other areas where we need to check the interactions, too.

Comment on lines 26 to 32
    user = models.ForeignObject(
        User,
        on_delete=models.CASCADE,
        from_fields=("tenant_id", "user_id"),
        to_fields=("tenant_id", "id"),
        related_name="+",
    )
Member

yeah I think this should be a stretch goal to get it working. See the comment above about MultiColSource.

return compiler.compile(WhereNode(exprs, connector=OR))


class Cols(Expression):
Member

I wonder if there is an opportunity to merge this TupleIn, Cols, and friends logic with MultiColSource so it's less of an 👽. They both do very similar things.

Author

Thanks @charettes , I'll need to look into this, I wasn't aware.

Author

I merged Cols with MultiColSource (ad51da4); however, I'm not sure this is correct.

As far as I understand, MultiColSource was meant to represent columns in a JOIN, and as such, it has a sources field. Cols, on the other hand, was meant to represent a list of columns and it doesn't need a sources field. WDYT?

Author

I reverted to Cols. Please resolve if you agree.

# to the table definition.
# It's expected 'primary_key=True' isn't set on any fields (see E043).
pk = model._meta.pk
if hasattr(pk, "columns"):
Member

This might have been answered somewhere else already, so please ignore the question if it was. I am wondering if it would make sense to handle single column primary keys like this as well. What would be the upside of having different objects (?) here for single vs multi-columns fields? Or do we gain some backwards compat wins here by having it like this?

Contributor

I don't think we've considered this yet. My feeling without digging in is that it's not necessary for this feature, but it might be a nice follow-up refactor.

Member

Ok, if we want to do it in a follow-up refactor we should still ensure that it ends up in the same release. While Meta is probably not documented, changes there will still hurt more advanced 3rd party packages.

Author

The only difference is, primary_key=True creates the primary key inline, while Meta.primary_key creates it as a table level constraint.

CREATE TABLE foo (
    id INTEGER PRIMARY KEY
)

vs.

CREATE TABLE foo (
    id INTEGER,
    PRIMARY KEY (id)
)

Author

Afaik all db backends support both inline and table-level primary keys, so there's no real difference in practice. It would take some work to move away from inline primary keys and it's out of scope for this ticket.
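A minimal sketch of the inline versus table-level distinction, run against SQLite through Python's built-in sqlite3 module. The table names and the `pk_columns` helper are illustrative assumptions; only SQLite is exercised here, so this says nothing about other backends. It reads the `pk` column of `PRAGMA table_info` to show that both spellings produce the same single-column key, and that the table-level form extends naturally to several columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Inline primary key, as produced by primary_key=True on a field.
conn.execute("CREATE TABLE inline_pk (id INTEGER PRIMARY KEY)")
# Table-level primary key on the same single column.
conn.execute("CREATE TABLE table_pk (id INTEGER, PRIMARY KEY (id))")
# Table-level primary key spanning two columns.
conn.execute(
    "CREATE TABLE composite_pk ("
    "tenant_id INTEGER, id INTEGER, PRIMARY KEY (tenant_id, id))"
)


def pk_columns(table):
    """Return the primary key column names of a table, in key order."""
    # Each row is (cid, name, type, notnull, dflt_value, pk); pk is the
    # 1-based position of the column within the key, or 0 if not part of it.
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return [row[1] for row in sorted(rows, key=lambda r: r[5]) if row[5]]


for name in ("inline_pk", "table_pk", "composite_pk"):
    print(name, pk_columns(name))
```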

Author

Actually, I am second guessing if Meta.primary_key is really necessary now.

You raise an entirely valid point: why is it None if primary_key=True is set on a field? As a user, I would expect it to always store some info related to the primary key. And I would expect this info to be the source of truth.

So, how about we consider the following API instead?

class Foo(models.Model):
    primary_key = models.CompositePrimaryKey("tenant_id", "id")
    tenant_id = models.IntegerField()
    id = models.IntegerField()

This is how @LilyFoote did it too in the previous PR, I just thought adding a Meta option is more user friendly.
But if it leads to confusion, I'm not sure it's worth it anymore.

Member

Yes, that explicit approach certainly feels less magical (ie we don't have to add an extra field to the class and the user can name it whatever they want). All in all it feels more consistent.

Contributor

Yeah, this feels like the right API to me given this discussion.

Author

I adjusted it 👍

Author

Let me know what you think, and thanks for the review @apollo13 , I appreciate it! ❤️

@csirmazbendeguz csirmazbendeguz changed the title from "Fixed #373 -- Added CompositePrimaryKey-based Meta.primary_key." to "Fixed #373 -- Added CompositePrimaryKey." on Jun 10, 2024
@omerfarukabaci
Contributor

omerfarukabaci commented Jun 10, 2024

Hey @csirmazbendeguz, thank you for the amazing work out there! I was trying to test this branch on my local with SQLite and realised a few things:

  1. If you run makemigrations for a model with a CompositePrimaryKey, the resulting migration file has erroneous imports. To fix this, I believe we need to add django.db.models.fields.composite path to the if...elif block here.

  2. Assume that I have the following models:

    class Author(models.Model):
        name = models.CharField(max_length=100)
    
    class Book(models.Model):
        id = models.CompositePrimaryKey("author", "title")
        author = models.ForeignKey(Author, on_delete=models.CASCADE, related_name="books")
        title = models.CharField(max_length=255)

    With the current implementation, the following test fails:

    class TestCompositeFks(TestCase):
        def test_composite_fks(self):
            author = Author.objects.create(name="Author")
            book = Book.objects.create(author=author, title="Title")
            assert list(Author.objects.filter(books__in=[book])) == [author]

    with an OperationalError, caused by a syntax error. The executed SQL is as follows:

    SELECT
        "books_author"."id",
        "books_author"."name"
    FROM
        "books_author"
        INNER JOIN "books_book" ON ("books_author"."id" = "books_book"."author_id")
    WHERE
        "books_book"."author_id", "books_book"."title" IN ((1, 'Title'))

    because the LHS in the WHERE clause should have been wrapped in parentheses, like this:

    ...
    WHERE
        ("books_book"."author_id", "books_book"."title") IN ((1, 'Title'))

    Unfortunately, I didn't have time to dig deeper into this.

  3. Not a big issue, but my code editor (VSCode) does not recognize models.CompositePrimaryKey, although the import works fine. This is probably related to Pylance or whatever VSCode uses to recognize fields under the models module.
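For item 2 above, here is a minimal sketch of the parenthesization problem, reproduced with Python's built-in sqlite3 module instead of the ORM. The simplified `books_book` table and the `run` helper are assumptions for illustration. The right-hand side is written as a VALUES subquery, which is a choice made for this example and not the SQL emitted by the branch; row values need SQLite 3.15 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE books_book ("
    "author_id INTEGER, title TEXT, PRIMARY KEY (author_id, title))"
)
conn.execute("INSERT INTO books_book VALUES (1, 'Title')")

# Left-hand side without parentheses, as in the failing query.
BROKEN = (
    "SELECT author_id, title FROM books_book "
    "WHERE author_id, title IN (VALUES (1, 'Title'))"
)
# Left-hand side wrapped in parentheses, forming a row value.
FIXED = (
    "SELECT author_id, title FROM books_book "
    "WHERE (author_id, title) IN (VALUES (1, 'Title'))"
)


def run(sql):
    """Execute a query and return its rows, or the error text on failure."""
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.OperationalError as exc:
        return f"OperationalError: {exc}"


print(run(BROKEN))
print(run(FIXED))
```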

Again thanks for this amazing initiative! 🚀

Comment on lines 285 to 299
    def get_lookup(self, lookup_name):
        if lookup_name == "exact":
            return TupleExact
        elif lookup_name == "gt":
            return TupleGreaterThan
        elif lookup_name == "gte":
            return TupleGreaterThanOrEqual
        elif lookup_name == "lt":
            return TupleLessThan
        elif lookup_name == "lte":
            return TupleLessThanOrEqual
        elif lookup_name == "in":
            return TupleIn

        raise NotImplementedError
Contributor

Is there a specific reason not to use CompositePrimaryKey.register_lookup, as we do for other fields (e.g. JSONField)?
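To illustrate the registration pattern being suggested, here is a toy sketch in plain Python. `ToyField`, `ToyTupleLookup`, and its subclasses are hypothetical stand-ins that only mimic the shape of a class-level lookup registry; they are not Django's classes and do not generate SQL. The point is that a dictionary filled by a decorator can replace the if/elif chain.

```python
import operator


class ToyField:
    """Hypothetical stand-in for a field class; not Django's implementation."""

    class_lookups = {}

    @classmethod
    def register_lookup(cls, lookup, lookup_name=None):
        # Usable as a decorator: store the class under its name, return it.
        cls.class_lookups[lookup_name or lookup.lookup_name] = lookup
        return lookup

    def get_lookup(self, lookup_name):
        # A dict lookup replaces the if/elif chain; unknown names give None.
        return self.class_lookups.get(lookup_name)


class ToyTupleLookup:
    """Hypothetical base class: compares two tuples with a Python operator."""

    lookup_name = None
    op = None

    def __init__(self, lhs, rhs):
        self.lhs, self.rhs = tuple(lhs), tuple(rhs)

    def matches(self):
        return type(self).op(self.lhs, self.rhs)


@ToyField.register_lookup
class ToyTupleExact(ToyTupleLookup):
    lookup_name = "exact"
    op = operator.eq


@ToyField.register_lookup
class ToyTupleGreaterThan(ToyTupleLookup):
    lookup_name = "gt"
    op = operator.gt


field = ToyField()
print(field.get_lookup("gt")((1, 5), (1, 2)).matches())
```

With this shape, adding another lookup means defining and decorating one more class, without touching `get_lookup`.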

@csirmazbendeguz
Author

Thanks a lot for the review @omerfarukabaci ! I'll take a look

@csirmazbendeguz
Author

csirmazbendeguz commented Jun 11, 2024

Author.objects.filter(books__in=[book])

@omerfarukabaci , I pushed the changes to support this, but note that filtering on reverse relations is one of those "gotchas" in Django, it may not produce the results you expect.

EDIT: I mean it might return duplicates, you probably already know this, I'm just mentioning it just in case.
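A minimal sketch of the duplicate rows mentioned above, using Python's built-in sqlite3 module with hand-written SQL. The simplified schema and sample rows are assumptions for illustration. One author with two matching books appears twice after the join, and DISTINCT collapses the result to one row; in the ORM, QuerySet.distinct() is the usual way to request that.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE books_author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books_book (
        author_id INTEGER REFERENCES books_author (id),
        title TEXT,
        PRIMARY KEY (author_id, title)
    );
    INSERT INTO books_author VALUES (1, 'Author');
    INSERT INTO books_book VALUES (1, 'First'), (1, 'Second');
    """
)

# The join yields one output row per matching book, not per author.
JOIN_SQL = """
    SELECT {distinct} a.id, a.name
    FROM books_author a
    INNER JOIN books_book b ON a.id = b.author_id
    WHERE b.title IN ('First', 'Second')
"""

with_duplicates = conn.execute(JOIN_SQL.format(distinct="")).fetchall()
deduplicated = conn.execute(JOIN_SQL.format(distinct="DISTINCT")).fetchall()
print(with_duplicates)
print(deduplicated)
```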

@csirmazbendeguz
Author

If you run makemigrations for a model with a CompositePrimaryKey, the resulting migration file has erroneous imports

Yes, I recently changed the API to CompositePrimaryKey, the migrations are not 100% yet. I'm working on sorting them out.
I pushed the fix for the issue you mentioned, thanks 👍

@omerfarukabaci
Contributor

@csirmazbendeguz Thanks for your answers. The above issues now seem to be fixed: the created migration is correct and the reverse relation lookup works as expected. Thank you! 🚀

While I was testing it further with the exact same models, I realized another issue:

class TestCompositeFks(TestCase):
    def test_composite_fks(self):
        author = Author.objects.create(name="Author")
        Book.objects.create(author=author, title="Title")
        author = Author.objects.annotate(book_count=Count("books")).get()
        assert author.book_count == 1

This test fails with the following error:

django.db.utils.OperationalError: wrong number of arguments to function COUNT()

The executed SQL is as follows:

SELECT
    "books_author"."id",
    "books_author"."name",
    COUNT("books_book"."author_id", "books_book"."title") AS "book_count"
FROM
    "books_author"
    LEFT OUTER JOIN "books_book" ON ("books_author"."id" = "books_book"."author_id")
GROUP BY
    "books_author"."id",
    "books_author"."name"

If we could change the parameter we pass to the COUNT function to a concatenation as below:

COUNT("books_book"."author_id" || '-' || "books_book"."title")

it should work fine (if I am not missing something), with the exception that for some databases we need to use CONCAT function instead of || operator, which might be resolved using the existing db.models.functions.Concat function.

Note: I am not sure if concatenation works between every data type that is allowed to be a primary key, although this could be considered as an edge case.
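A minimal sketch of the COUNT behaviour discussed above, reproduced on SQLite with Python's built-in sqlite3 module. The simplified schema, the second author without books, and the `book_counts` helper are assumptions added for illustration. It shows the multi-argument COUNT failing and the concatenated form returning the expected counts, including zero for an author with no books.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE books_author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books_book (
        author_id INTEGER, title TEXT, PRIMARY KEY (author_id, title)
    );
    INSERT INTO books_author VALUES (1, 'Author'), (2, 'No books');
    INSERT INTO books_book VALUES (1, 'Title');
    """
)

QUERY = """
    SELECT a.id, a.name, COUNT({expr}) AS book_count
    FROM books_author a
    LEFT OUTER JOIN books_book b ON a.id = b.author_id
    GROUP BY a.id, a.name
    ORDER BY a.id
"""


def book_counts(expr):
    """Run the aggregate with the given COUNT argument; return rows or error text."""
    try:
        return conn.execute(QUERY.format(expr=expr)).fetchall()
    except sqlite3.OperationalError as exc:
        return str(exc)


# Two arguments, as in the generated SQL: SQLite rejects the call.
multi_arg = book_counts("b.author_id, b.title")
# Concatenated key: NULL for unmatched rows, so those are not counted.
concatenated = book_counts("b.author_id || '-' || b.title")
print(multi_arg)
print(concatenated)
```

A plain COUNT only counts non-NULL values, so two different keys that happen to concatenate to the same string would not change the result here; that collision would matter for COUNT(DISTINCT ...).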

@csirmazbendeguz
Author

Thanks @omerfarukabaci , these bug reports are very helpful. Yes, I hadn't considered annotations with multi-column pks. I'll look into this.

7 participants