3rd updated version with a lot of input from the GAE group. Thanks to everyone!
ER-Modeling with Google App Engine is somewhat different to "normal" modeling for a relational database.
Here is a small tutorial on how to create well-known relationship models One-to-One (1:1), One-to-Many (1:n), Many-to-Many (m:n) and a special one (Cascading relations) with GAE:
We need the following Model classes for our example. Also if you want to run it, don't forget to import the sys module - we need it for printing to stdout:
import sys #import sys module for printing to stdout
class Car(db.Model):
brand = db.StringProperty(required=True)
wheels = db.ListProperty(db.Key)
class Human(db.Model):name = db.StringProperty(required=True)drives = db.ReferenceProperty(reference_class=Car)spouse = db.SelfReferenceProperty()owns = db.ListProperty(db.Key)
class Wheel(db.Model):isBroken = db.BooleanProperty(default=False)position = db.StringProperty(choices=set(["left_front",
"left_back",
"right_front",
"right_back"]))
One-to-One (1:1)
A simple relationship between two entities.
Let's say a human called Jack drives one car.
We can model this relationship as following:
# one-to-one# Jack drives a Mercedes
jack = Human(name="Jack")
mercedes = Car(brand="Mercedes")
jack.drives = mercedes.put()
jack.put()
print >> sys.stdout, "Jack drives a "+jack.drives.brand
As you can see, we create a human called "Jack" and a car "Mercedes". After that we assign the car to the drives-property of Jack and save Jack.
Hint (Thanks to Miguel for pointing this out): be careful with 1:1 relationships done with ReferenceProperty-properties - you could easily write something like this:
jack = Human(name="Jack")
mike = Human(name="Mike")
mercedes = Car(brand="Mercedes")
mercedesid = mercedes.put()
jack.drives = mercedesid
jack.put()
mike.drives = mercedesid
mike.put()which won't be a 1:1 relation any more. So if you really need to make sure that an entity is only referenced once, you need to do this by your code design (by searching within your existing model kinds and throwing exceptions).Even using a cascading-relationship with a parent entity (see later in this article) does not make sure, that there is only one car per human and therefore includes the same difficulty just the other way round.
Special: self references
Now a special sort of references are self-references. That means a Model references an entity of the same Model class (e.g. a Human references a Human):We created two humans (Bob and Jane) and set Bob's spouse-property to reference Jane.# one-to-one self# Bob's spouse is Jane
bob = Human(name="Bob")
jane = Human(name="Jane")
bob.spouse = jane.put()
bob.put()
b_spouse = Human.get(bob.spouse.key())
print >> sys.stdout, "Bob's spouse is "+b_spouse.name
Now we can easily find out who Bob's spouse is.
Special: mutual references
But if Bob is married to Jane, Jane is also married to Bob, isn't she?So let's do this semantically correct:
# one-to-one self mutual
bob = Human(name="Bob")
jane = Human(name="Jane")
bob.spouse = jane.put()
jane.spouse = bob.put()
jane.put()j_spouse = Human.get(jane.spouse.key())print >> sys.stdout, "Jane's spouse is "+j_spouse.nameb_spouse = Human.get(bob.spouse.key())print >> sys.stdout, "Bob's spouse is "+b_spouse.name# Jane's spouse is Bob
# Bob's spouse is Jane
Be careful to check if reflexive references are semantically allowed or not – in our case it wouldn't be valid if bob's spouse is himself.
Also in a monogamous society, it wouldn't be valid if Bob is the spouse of more than one other Human entity (!)
If you don't want this behavior, you need to prevent it by throwing exceptions. You could use the validator-parameter of every Property class.
One-to-Many (1:n)
Sometimes we need to reference more than one entity from another.
When modeling 1:n relationships, a special of the Model class comes in handy: parent models (or ancestors).
Let's think of a car having four wheels. Those wheels belong exactly to one car, so we define the car as parent for every wheel we create:
# one-to-many using parent
bmw = Car(brand="BMW")
bmw.put()
lf = Wheel(parent=bmw,position="left_front")
lf.put()
lb = Wheel(parent=bmw,position="left_back")
lb.put()
rf = Wheel(parent=bmw,position="right_front")
rf.put()# uh, snap, the 4th wheel is broken!rb = Wheel(parent=bmw,position="right_back",isBroken=True)# The BMW has the wheels:
rb.put()
# from car to wheels
bmwWheels = Wheel.all().ancestor(bmw)
print >> sys.stdout, "The BMW has the wheels: "
for wheel in bmwWheels:
print >> sys.stdout, "- "+wheel.position
# - left_front
# - right_back
# - right_front
# - left_back# The following cars are broken:
# from wheel to car
brokenWheels = Wheel.gql("WHERE isBroken = :broken",
broken=True)
print >> sys.stdout, "The following cars are broken: "
for wheel in brokenWheels:
print >> sys.stdout, "- "+wheel.parent().brand
# - BMW
We can see: it is easy to get the wheels for a car and it is also possible to get the car a certain wheel belongs to. Also an entity can only have one parent at a time - so we made sure, that a wheel is not used by more than one car at the same time. You could even add some spare wheels to a car (there is no way to define a maximum number an entity can be used as parent entity).
The other way round
It is also possible to do this the other way round. Lets say we have an additional Model like this:
class OwnedCar(db.Model):
brand = db.StringProperty(required=True)
owner = db.ReferenceProperty(Human, required=True)then we could add cars to a person as following:
paul = Human(name="Paul")# Paul's cars:
paul.put()
pauls_bmw = OwnedCar(brand="BMW", owner=paul)
pauls_bmw.put()
pauls_mercedes = OwnedCar(brand="Mercedes", owner=paul)
pauls_mercedes.put()
pauls_cars = paul.ownedcar_set
print >> sys.stdout, "Paul's cars: "
for car in pauls_cars:
print >> sys.stdout, "- "+car.brand
# - BMW
# - Mercedes
This makes for example sure, that one car is only owned by one human at a time.
Also notice the part on how to get Paul's cars. You don't even need to create a GQL query - the property modelname_set holds the references. You can find more about this in the docs.
Thanks to Aprigio for his input!
Special: One-to-Many using a list
You can also create 1:n relationships using a list:
# one-to-many using list# The Dodge has the wheels:
dodge = Car(brand="Dodge")
w1 = Wheel(position="left_front")
w2 = Wheel(position="left_back")
w3 = Wheel(position="right_front")
w4 = Wheel(position="right_back")
dodge.wheels = [w1.put(),w2.put(),w3.put(),w4.put()]
dodge.put()
dodgeWheels = Wheel.get(dodge.wheels)
print >> sys.stdout, "The Dodge has the wheels: "
for wheel in dodgeWheels:
print >> sys.stdout, "-"+wheel.position
# -left_front
# -left_back
# -right_front
# -right_back
but: be careful, this model does not make sure, a wheel does belong to exactly one car! You could reference a wheel from more than one car, which would be semantically wrong in our example. So rather use this model for n:m relationships!
Many-to-Many (m:n)
For creating m:n relationships, use the following model:
# many-to-many using list+db.Key# Jack owns:
jack = Human(name="Jack")
bob = Human(name="Bob")
vw = Car(brand="VW")
chevy = Car(brand="Chevy")
carpool = [vw.put(),chevy.put()]
jack.owns = carpool
jack.put()
bob.owns = carpool
bob.put()
chrysler = Car(brand="Chrysler")
jack.owns.append(chrysler.put())
jack.put()
jackOwns = Car.get(jack.owns)
print >> sys.stdout, "Jack owns: "
for car in jackOwns:
print >> sys.stdout, "- "+car.brand
# - VW
# - Chevy
# - Chrysler# These humans own the Chevy:
whoOwnsTheChevy = Human.gql("WHERE owns = :car",car=chevy)
print >> sys.stdout, "These humans own the Chevy: "
for who in whoOwnsTheChevy:
print >> sys.stdout, "- "+who.name
# - Jack
# - Bob
First we create our two guys Jack and Bob. Then they decide on sharing cars. Their carpool consists of a VW and a Chevy. After that, Jack decides on buying an additional car for himself (the Chrysler).
When adding to a list like in this example, there is no check if the value already is present within the list - so it would be possible to reference the same entity multiple times. If you want to prevent this, you need to run a function on the list, which makes sure there are only unique entities:
You could also check to content of the list before adding an entity, if it is already referenced.def unique(lst):d = {}for item in lst:d[item] = 1return d.keys()
jeep = Car(brand="Jeep")
jack.owns = unique(jack.owns + [jeep.put()])
jack.put()
Most flexible: using mapping entities
It might be a better idea to implement a different entity to represent the relationship
between a human and the cars they own.
The reason for this is that you could add more fields to this which may be beneficial later in a query. Lets say we add a bought field that contains the date the car was bough. Then one could get cars owned by Jack that he bought after a certain date.
class CarOwner(db.Model):
car = db.Reference(Car, required=True)
owner = db.Reference(Human, required=True)
bought = db.DateProperty(auto_now_add=True)
@staticmethod
def get_owner_cars(human, bought):
"""Returns the cars that the given human
bought before a specified date"""
if not human: return []
query =db.Query(CarOwner)
query.filter('owner =',human)
query.filter('bought >= ',bought)
return [entry.car for entry in query]
also we'd like to add a function to the Car-Model, so that it looks like this:
class Car(db.Model):
brand = db.StringProperty(required=True)
wheels = db.ListProperty(db.Key)
def human_owns(self, human):
"""Returns true if the given human owns this car."""
if not human: return False
query = db.Query(CarOwner)
query.filter('car =', self)
query.filter('owner =', human)
return query.get()When we have the Models defined like that, we can go on using them:max = Human(name="Max")
max.put()
saab = Car(brand="Saab")
saab.put()
ownership = CarOwner(car=saab,owner=max)
ownership.put()
import datetime
delta = datetime.timedelta(days=-1)
yesterday = (datetime.datetime.utcnow() + delta)
maxs_cars = CarOwner.get_owner_cars(max,yesterday)
print >> sys.stdout, "Max's cars since yesterday: "
for car in maxs_cars:
print >> sys.stdout, "- "+car.brand
# Max's cars since yesterday:
# - Saab
max_owns_saab = bool(saab.human_owns(max))
print >> sys.stdout, "Max owns the Saab: "+str(max_owns_saab)
# Max owns the Saab: True
As you can see, the additional property on the mapping entity comes in handy when selecting the cars Max bought since yesterday.
And also the method on the car-Model is pretty neat to bundle functionality related to a model within it.
Note: you can use this sort of referencing also use for 1:1 and 1:n relationships. You just need to make sure, that either one referenced entity (1:n) is unique, or both (the combination of those - for 1:1)
Thanks to Brian, who brought this up!
Cascading relations
With the parent-property, cascading relationships are possible. This is very useful, if you have a folder-like structure (e.g. categories and products of an online shop). Let's have a look on how to do this:import sys
class Category(db.Model):name = db.StringProperty(required=True)
class Product(db.Model):name = db.StringProperty(required=True)price = db.FloatProperty()
categories = db.ListProperty(db.Key)
root = Category(name="Products")
root.put()
tech = Category(parent=root,name="Tech stuff").put()
books = Category(parent=root,name="Books")
books.put()
fantasy = Category(parent=books,name="Fantasy")
fantasy.put()
scifi = Category(parent=books,name="Science Fiction")
scifi.put()
scomics = Category(parent=scifi,name="SciFi comics")
scomics.put()
somebook = Product(name="Some book")
somebook.categories.append(books.key())
somebook.price = 9.99
somebook.put()
lotr = Product(name="Lord Of The Rings")
lotr.categories.append(fantasy.key())
lotr.price = 29.99
lotr.put()
allFantasyBooks = Product.gql("WHERE categories = :cat",
cat=fantasy)
print >> sys.stdout, "All fantasy books: "
for book in allFantasyBooks:
print >> sys.stdout, "- "+book.name
path = []
p = scomics
while p.parent():path.append(p.name)p = p.parent()
path.reverse()
print >> sys.stdout, " > ".join(path)
# Books > Science Fiction > SciFi comics
Now getting all books in a category is pretty easy, huh? And also generating a breadcrumb-like "where am I" is simple by iterating through all parent entities for a given entity (here it is the science fiction comics Category).
Attention: while the use of parent relationships if fine for the use case here, it would not be very efficient in the case where there are a lot of children associated with a parent.
This is because this results in a very large entity group.
The more entity groups your application has - that is, the more root entities there are - the more efficiently the datastore can distribute the entity groups across datastore nodes.
Thus for efficiency you should avoid the case where there are a lot of children.
According to the docs:
"A good rule of thumb for entity groups is that they should be about the size of a
single user's worth of data or smaller."
Thanks to Ben, Michael and Brian for pointing this out.

Das ist zehr gut. Veilen Danke.
AntwortenLöschenVenkatesh
Very, very useful and helpful ! Thanks.
AntwortenLöschenI'm sure I speak for all when i say how helpful your writeup has been!
AntwortenLöschenMy question involves 'entity groups':
Shouldn't entity groups be applied in all cases where you have a reference property?
For instance, I have blog posts, and comments which have a reference property to the blog post. If I want to delete a post and all its comments, I want to run them as a transaction, and therefore I need the comments to have the blog post as its 'parent'.
So, should your analysis mention this for ALL reference property use cases?
Thanks again!
@g-man:
AntwortenLöschenactually in the case of blog posts and comments it might be a good idea to store all entities in one entity group if (and only if) as far as I understand you always fetch all comments with the blog post. If not, it might be better to store them in different groups...specifically if you expect a high number of comments.
Yes, I should use the entity group idea, but my problem has been getting Django to do that from its form editing.
AntwortenLöschenBecause the GAE API requirement is:
comment=Comment(parent=item)
which must be assigned at the moment of instantiation, and the Django forms are immutable, it makes it tricky for me to figure out how to insert my parent information before the Model instance is saved.
I appreciated you previous Django writing, but this is a tricky problem...
Quick question:
AntwortenLöschenIs it true that in your example (quoted below) bob != j_spouse?
It seems this would be difficult to deal with if the model in question were mutable since every dereference will give you a different copy of the model. No?
# one-to-one self mutual
bob = Human(name="Bob")
jane = Human(name="Jane")
bob.spouse = jane.put()
jane.spouse = bob.put()
jane.put()
j_spouse = Human.get(jane.spouse.key())
print >> sys.stdout, "Jane's spouse is "+j_spouse.name
b_spouse = Human.get(bob.spouse.key())
print >> sys.stdout, "Bob's spouse is "+b_spouse.name
Hello, can you please help me:
AntwortenLöschenGiven only the books category, how do I get all the products that has category's parent of books category?
@Anonymous: you can use the parent relationship to the categories to walk up in the tree and fetch any categories' items you like.
AntwortenLöschenHello Joscha, thanks for your reply, I assume you want me to do this:
AntwortenLöschenchild_categories = Category.all().ancestor(books)
the_books = []
for category in child_categories: the_books.append(Product.all().filter("category =", category))
This method seems to work just fine, but then I can't do the fetch with limit and offset just for the books category (since I have to do the fetch(limit, offset) on Product that has parent of books category):
for category in child_categories: the_books.append(Product.all().filter("category =", category).fetch(0, 10))
I hope I don't confuse you so far. Anyway, I was hoping if there's some way to fetch the products using :
the_books = Product.all().filter("category.parent =", books).fetch(0,10)
so the_books will contain only 10 books that has parent category of books. Thank you.
vielen danke, a great port.
AntwortenLöschenVery helpful man.
AntwortenLöschenAwesome article. Thanks a lot.
AntwortenLöschenThanks ' for this article. I have problem. Actuaaly i ma working in Python language and want to know the searching creteria. Like in SQl we used "Like" operator but in GQL how we do this same Thing. So please help me..
AntwortenLöschenThanks..
Hi - LIKE queries are not possible with GAE, but what you can do to get some of this functionality is described here: http://code.google.com/appengine/docs/python/datastore/queriesandindexes.html#Introducing_Indexes
AntwortenLöschenAlso, there is an undocumented (and unsupported) way of doing fulltext searches on GAE, you can find more information here: http://appengineguy.com/2008/06/how-to-full-text-search-in-google-app.html
hey.. this was a really really nice article.. I was in need of such..
AntwortenLöschenthanks a lot.. :)
Best summary by far, and all in one spot.
AntwortenLöschenIch danke dir.
Thanks
I really appreciate,nice.
AntwortenLöschenI've written about it on my following blog:-
http://bygsoft.wordpress.com/2010/01/09/cloudy-combo-google-app-engine-and-amazon-s3-combo-pack/
2 years later this is still one of the best posts on the internet for this topic.
AntwortenLöschenI know this post is quite old, but it seems it is still relevant, so I'll post the comment anyway.
AntwortenLöschen1. Is there a way to ensure 1-to-1 on the database level?
2. In the car-driver example, I think it would be more logical for the car to have an owner property. Car ownership is specific to cars (i.e., every car needs a driver in order to move), whereas a person is not necessarily a car owner, or even interested in cars.
Interesting post. You have gained a new fan. Please keep up the good work and I look forward to more of your amazing posts.
AntwortenLöschenSouthwest Engines Company
No doubt this is an excellent post I got a lot of knowledge after reading good luck.
AntwortenLöschenSouthwest Engines