3rd updated version with a lot of input from the GAE group. Thanks to everyone!
ER-Modeling with Google App Engine is somewhat different to "normal" modeling for a relational database.
Here is a small tutorial on how to create well-known relationship models One-to-One (1:1), One-to-Many (1:n), Many-to-Many (m:n) and a special one (Cascading relations) with GAE:
We need the following Model classes for our example. Also if you want to run it, don't forget to import the sys module - we need it for printing to stdout:
import sys #import sys module for printing to stdout
class Car(db.Model):
brand = db.StringProperty(required=True)
wheels = db.ListProperty(db.Key)
class Human(db.Model):
name = db.StringProperty(required=True)
drives = db.ReferenceProperty(reference_class=Car)
spouse = db.SelfReferenceProperty()
owns = db.ListProperty(db.Key)
class Wheel(db.Model):
isBroken = db.BooleanProperty(default=False)
position = db.StringProperty(choices=set(["left_front",
"left_back",
"right_front",
"right_back"]))
One-to-One (1:1)
A simple relationship between two entities.
Let's say a human called Jack drives one car.
We can model this relationship as following:
# one-to-one
jack = Human(name="Jack")
mercedes = Car(brand="Mercedes")
jack.drives = mercedes.put()
jack.put()
print >> sys.stdout, "Jack drives a "+jack.drives.brand
# Jack drives a Mercedes
As you can see, we create a human called "Jack" and a car "Mercedes". After that we assign the car to the drives-property of Jack and save Jack.
Hint (Thanks to Miguel for pointing this out): be careful with 1:1 relationships done with ReferenceProperty-properties - you could easily write something like this:
jack = Human(name="Jack")
mike = Human(name="Mike")
mercedes = Car(brand="Mercedes")
mercedesid = mercedes.put()
jack.drives = mercedesid
jack.put()
mike.drives = mercedesid
mike.put()
which won't be a 1:1 relation any more. So if you really need to make sure that an entity is only referenced once, you need to do this by your code design (by searching within your existing model kinds and throwing exceptions).
Even using a cascading-relationship with a parent entity (see later in this article) does not make sure, that there is only one car per human and therefore includes the same difficulty just the other way round.
Special: self references
Now a special sort of references are self-references. That means a Model references an entity of the same Model class (e.g. a Human references a Human):
# one-to-one self
bob = Human(name="Bob")
jane = Human(name="Jane")
bob.spouse = jane.put()
bob.put()
b_spouse = Human.get(bob.spouse.key())
print >> sys.stdout, "Bob's spouse is "+b_spouse.name
# Bob's spouse is Jane
We created two humans (Bob and Jane) and set Bob's
spouse-property to reference Jane.
Now we can easily find out who Bob's spouse is.
Special: mutual references
But if Bob is married to Jane, Jane is also married to Bob, isn't she?
So let's do this semantically correct:
# one-to-one self mutual
bob = Human(name="Bob")
jane = Human(name="Jane")
bob.spouse = jane.put()
jane.spouse = bob.put()
jane.put()
j_spouse = Human.get(jane.spouse.key())
print >> sys.stdout, "Jane's spouse is "+j_spouse.name
b_spouse = Human.get(bob.spouse.key())
print >> sys.stdout, "Bob's spouse is "+b_spouse.name
# Jane's spouse is Bob
# Bob's spouse is Jane
Be careful to check if reflexive references are semantically allowed or not – in our case it wouldn't be valid if bob's spouse is himself.
Also in a monogamous society, it wouldn't be valid if Bob is the spouse of more than one other Human entity (!)
If you don't want this behavior, you need to prevent it by throwing exceptions. You could use the validator-parameter of every Property class.
One-to-Many (1:n)
Sometimes we need to reference more than one entity from another.
When modeling 1:n relationships, a special of the Model class comes in handy: parent models (or ancestors).
Let's think of a car having four wheels. Those wheels belong exactly to one car, so we define the car as parent for every wheel we create:
# one-to-many using parent
bmw = Car(brand="BMW")
bmw.put()
lf = Wheel(parent=bmw,position="left_front")
lf.put()
lb = Wheel(parent=bmw,position="left_back")
lb.put()
rf = Wheel(parent=bmw,position="right_front")
rf.put()
# uh, snap, the 4th wheel is broken!
rb = Wheel(parent=bmw,position="right_back",isBroken=True)
rb.put()
# from car to wheels
bmwWheels = Wheel.all().ancestor(bmw)
print >> sys.stdout, "The BMW has the wheels: "
for wheel in bmwWheels:
print >> sys.stdout, "- "+wheel.position
# The BMW has the wheels:
# - left_front
# - right_back
# - right_front
# - left_back
# from wheel to car
brokenWheels = Wheel.gql("WHERE isBroken = :broken",
broken=True)
print >> sys.stdout, "The following cars are broken: "
for wheel in brokenWheels:
print >> sys.stdout, "- "+wheel.parent().brand
# The following cars are broken:
# - BMW
We can see: it is easy to get the wheels for a car and it is also possible to get the car a certain wheel belongs to. Also an entity can only have one parent at a time - so we made sure, that a wheel is not used by more than one car at the same time. You could even add some spare wheels to a car (there is no way to define a maximum number an entity can be used as parent entity).
The other way round
It is also possible to do this the other way round. Lets say we have an additional Model like this:
class OwnedCar(db.Model):
brand = db.StringProperty(required=True)
owner = db.ReferenceProperty(Human, required=True)
then we could add cars to a person as following:
paul = Human(name="Paul")
paul.put()
pauls_bmw = OwnedCar(brand="BMW", owner=paul)
pauls_bmw.put()
pauls_mercedes = OwnedCar(brand="Mercedes", owner=paul)
pauls_mercedes.put()
pauls_cars = paul.ownedcar_set
print >> sys.stdout, "Paul's cars: "
for car in pauls_cars:
print >> sys.stdout, "- "+car.brand
# Paul's cars:
# - BMW
# - Mercedes
This makes for example sure, that one car is only owned by one human at a time.
Also notice the part on how to get Paul's cars. You don't even need to create a GQL query - the property modelname_set holds the references. You can find more about this in the docs.
Thanks to Aprigio for his input!
Special: One-to-Many using a list
You can also create 1:n relationships using a list:
# one-to-many using list
dodge = Car(brand="Dodge")
w1 = Wheel(position="left_front")
w2 = Wheel(position="left_back")
w3 = Wheel(position="right_front")
w4 = Wheel(position="right_back")
dodge.wheels = [w1.put(),w2.put(),w3.put(),w4.put()]
dodge.put()
dodgeWheels = Wheel.get(dodge.wheels)
print >> sys.stdout, "The Dodge has the wheels: "
for wheel in dodgeWheels:
print >> sys.stdout, "-"+wheel.position
# The Dodge has the wheels:
# -left_front
# -left_back
# -right_front
# -right_back
but: be careful, this model does not make sure, a wheel does belong to exactly one car! You could reference a wheel from more than one car, which would be semantically wrong in our example. So rather use this model for n:m relationships!
Many-to-Many (m:n)
For creating m:n relationships, use the following model:
# many-to-many using list+db.Key
jack = Human(name="Jack")
bob = Human(name="Bob")
vw = Car(brand="VW")
chevy = Car(brand="Chevy")
carpool = [vw.put(),chevy.put()]
jack.owns = carpool
jack.put()
bob.owns = carpool
bob.put()
chrysler = Car(brand="Chrysler")
jack.owns.append(chrysler.put())
jack.put()
jackOwns = Car.get(jack.owns)
print >> sys.stdout, "Jack owns: "
for car in jackOwns:
print >> sys.stdout, "- "+car.brand
# Jack owns:
# - VW
# - Chevy
# - Chrysler
whoOwnsTheChevy = Human.gql("WHERE owns = :car",car=chevy)
print >> sys.stdout, "These humans own the Chevy: "
for who in whoOwnsTheChevy:
print >> sys.stdout, "- "+who.name
# These humans own the Chevy:
# - Jack
# - Bob
First we create our two guys Jack and Bob. Then they decide on sharing cars. Their carpool consists of a VW and a Chevy. After that, Jack decides on buying an additional car for himself (the Chrysler).
When adding to a list like in this example, there is no check if the value already is present within the list - so it would be possible to reference the same entity multiple times. If you want to prevent this, you need to run a function on the list, which makes sure there are only unique entities:
def unique(lst):
d = {}
for item in lst:
d[item] = 1
return d.keys()
jeep = Car(brand="Jeep")
jack.owns = unique(jack.owns + [jeep.put()])
jack.put()
You could also check to content of the list before adding an entity, if it is already referenced.
Most flexible: using mapping entitiesIt might be a better idea to implement a different entity to represent the relationship
between a human and the cars they own.
The reason for this is that you could add more fields to this which may be beneficial later in a query. Lets say we add a bought field that contains the date the car was bough. Then one could get cars owned by Jack that he bought after a certain date.
class CarOwner(db.Model):
car = db.Reference(Car, required=True)
owner = db.Reference(Human, required=True)
bought = db.DateProperty(auto_now_add=True)
@staticmethod
def get_owner_cars(human, bought):
"""Returns the cars that the given human
bought before a specified date"""
if not human: return []
query = db.Query(CarOwner)
query.filter('owner =',human)
query.filter('bought >= ',bought)
return [entry.car for entry in query]
also we'd like to add a function to the Car-Model, so that it looks like this:
class Car(db.Model):
brand = db.StringProperty(required=True)
wheels = db.ListProperty(db.Key)
def human_owns(self, human):
"""Returns true if the given human owns this car."""
if not human: return False
query = db.Query(CarOwner)
query.filter('car =', self)
query.filter('owner =', human)
return query.get()
When we have the Models defined like that, we can go on using them:
max = Human(name="Max")
max.put()
saab = Car(brand="Saab")
saab.put()
ownership = CarOwner(car=saab,owner=max)
ownership.put()
import datetime
delta = datetime.timedelta(days=-1)
yesterday = (datetime.datetime.utcnow() + delta)
maxs_cars = CarOwner.get_owner_cars(max,yesterday)
print >> sys.stdout, "Max's cars since yesterday: "
for car in maxs_cars:
print >> sys.stdout, "- "+car.brand
# Max's cars since yesterday:
# - Saab
max_owns_saab = bool(saab.human_owns(max))
print >> sys.stdout, "Max owns the Saab: "+str(max_owns_saab)
# Max owns the Saab: True
As you can see, the additional property on the mapping entity comes in handy when selecting the cars Max bought since yesterday.
And also the method on the car-Model is pretty neat to bundle functionality related to a model within it.
Note: you can use this sort of referencing also use for 1:1 and 1:n relationships. You just need to make sure, that either one referenced entity (1:n) is unique, or both (the combination of those - for 1:1)
Thanks to Brian, who brought this up!
Cascading relations
With the
parent-property, cascading relationships are possible. This is very useful, if you have a folder-like structure (e.g. categories and products of an online shop). Let's have a look on how to do this:
import sys
class Category(db.Model):
name = db.StringProperty(required=True)
class Product(db.Model):
name = db.StringProperty(required=True)
price = db.FloatProperty()
categories = db.ListProperty(db.Key)
root = Category(name="Products")
root.put()
tech = Category(parent=root,name="Tech stuff").put()
books = Category(parent=root,name="Books")
books.put()
fantasy = Category(parent=books,name="Fantasy")
fantasy.put()
scifi = Category(parent=books,name="Science Fiction")
scifi.put()
scomics = Category(parent=scifi,name="SciFi comics")
scomics.put()
somebook = Product(name="Some book")
somebook.categories.append(books.key())
somebook.price = 9.99
somebook.put()
lotr = Product(name="Lord Of The Rings")
lotr.categories.append(fantasy.key())
lotr.price = 29.99
lotr.put()
allFantasyBooks = Product.gql("WHERE categories = :cat",
cat=fantasy)
print >> sys.stdout, "All fantasy books: "
for book in allFantasyBooks:
print >> sys.stdout, "- "+book.name
path = []
p = scomics
while p.parent():
path.append(p.name)
p = p.parent()
path.reverse()
print >> sys.stdout, " > ".join(path)
# Books > Science Fiction > SciFi comics
Now getting all books in a category is pretty easy, huh? And also generating a breadcrumb-like "where am I" is simple by iterating through all parent entities for a given entity (here it is the science fiction comics Category).
Attention: while the use of parent relationships if fine for the use case here, it would not be very efficient in the case where there are a lot of children associated with a parent.
This is because this results in a very large entity group.
The more entity groups your application has - that is, the more root entities there are - the more efficiently the datastore can distribute the entity groups across datastore nodes.
Thus for efficiency you should avoid the case where there are a lot of children.
According to the docs:
"A good rule of thumb for entity groups is that they should be about the size of a
single user's worth of data or smaller."
Thanks to Ben, Michael and Brian for pointing this out.