Mittwoch, 16. April 2008

ER-Modeling with Google App Engine (updated)

3rd updated version with a lot of input from the GAE group. Thanks to everyone!

ER-Modeling with Google App Engine is somewhat different to "normal" modeling for a relational database.

Here is a small tutorial on how to create well-known relationship models One-to-One (1:1), One-to-Many (1:n), Many-to-Many (m:n) and a special one (Cascading relations) with GAE:

We need the following Model classes for our example. Also if you want to run it, don't forget to import the sys module - we need it for printing to stdout:

import sys #import sys module for printing to stdout

class Car(db.Model):
brand = db.StringProperty(required=True)
wheels = db.ListProperty(db.Key)

class Human(db.Model):
name = db.StringProperty(required=True)
drives = db.ReferenceProperty(reference_class=Car)
spouse = db.SelfReferenceProperty()
owns = db.ListProperty(db.Key)

class Wheel(db.Model):
isBroken = db.BooleanProperty(default=False)
position = db.StringProperty(choices=set(["left_front",
"left_back",
"right_front",
"right_back"]))

One-to-One (1:1)

A simple relationship between two entities.

Let's say a human called Jack drives one car.
We can model this relationship as following:

# one-to-one
jack = Human(name="Jack")
mercedes = Car(brand="Mercedes")
jack.drives = mercedes.put()
jack.put()
print >> sys.stdout, "Jack drives a "+jack.drives.brand

# Jack drives a Mercedes

As you can see, we create a human called "Jack" and a car "Mercedes". After that we assign the car to the drives-property of Jack and save Jack.

Hint (Thanks to Miguel for pointing this out): be careful with 1:1 relationships done with ReferenceProperty-properties - you could easily write something like this:

jack        = Human(name="Jack")
mike = Human(name="Mike")
mercedes = Car(brand="Mercedes")
mercedesid = mercedes.put()

jack.drives = mercedesid
jack.put()

mike.drives = mercedesid
mike.put()
which won't be a 1:1 relation any more. So if you really need to make sure that an entity is only referenced once, you need to do this by your code design (by searching within your existing model kinds and throwing exceptions).
Even using a cascading-relationship with a parent entity (see later in this article) does not make sure, that there is only one car per human and therefore includes the same difficulty just the other way round.

Special: self references

Now a special sort of references are self-references. That means a Model references an entity of the same Model class (e.g. a Human references a Human):
# one-to-one self
bob = Human(name="Bob")
jane = Human(name="Jane")

bob.spouse = jane.put()
bob.put()
b_spouse = Human.get(bob.spouse.key())
print >> sys.stdout, "Bob's spouse is "+b_spouse.name

# Bob's spouse is Jane
We created two humans (Bob and Jane) and set Bob's spouse-property to reference Jane.
Now we can easily find out who Bob's spouse is.

Special: mutual references

But if Bob is married to Jane, Jane is also married to Bob, isn't she?
So let's do this semantically correct:
# one-to-one self mutual
bob = Human(name="Bob")
jane = Human(name="Jane")

bob.spouse = jane.put()
jane.spouse = bob.put()
jane.put()

j_spouse = Human.get(jane.spouse.key())
print >> sys.stdout, "Jane's spouse is "+j_spouse.name
b_spouse = Human.get(bob.spouse.key())
print >> sys.stdout, "Bob's spouse is "+b_spouse.name

# Jane's spouse is Bob
# Bob's spouse is Jane

Be careful to check if reflexive references are semantically allowed or not – in our case it wouldn't be valid if bob's spouse is himself.

Also in a monogamous society, it wouldn't be valid if Bob is the spouse of more than one other Human entity (!)
If you don't want this behavior, you need to prevent it by throwing exceptions. You could use the validator-parameter of every Property class.

One-to-Many (1:n)

Sometimes we need to reference more than one entity from another.
When modeling 1:n relationships, a special of the Model class comes in handy: parent models (or ancestors).
Let's think of a car having four wheels. Those wheels belong exactly to one car, so we define the car as parent for every wheel we create:

# one-to-many using parent
bmw = Car(brand="BMW")
bmw.put()

lf = Wheel(parent=bmw,position="left_front")
lf.put()

lb = Wheel(parent=bmw,position="left_back")
lb.put()

rf = Wheel(parent=bmw,position="right_front")
rf.put()

# uh, snap, the 4th wheel is broken!
rb = Wheel(parent=bmw,position="right_back",isBroken=True)
rb.put()

# from car to wheels
bmwWheels = Wheel.all().ancestor(bmw)
print >> sys.stdout, "The BMW has the wheels: "
for wheel in bmwWheels:
print >> sys.stdout, "- "+wheel.position

# The BMW has the wheels:
# - left_front
# - right_back
# - right_front
# - left_back

# from wheel to car
brokenWheels = Wheel.gql("WHERE isBroken = :broken",
broken=True)
print >> sys.stdout, "The following cars are broken: "
for wheel in brokenWheels:
print >> sys.stdout, "- "+wheel.parent().brand

# The following cars are broken:
# - BMW

We can see: it is easy to get the wheels for a car and it is also possible to get the car a certain wheel belongs to. Also an entity can only have one parent at a time - so we made sure, that a wheel is not used by more than one car at the same time. You could even add some spare wheels to a car (there is no way to define a maximum number an entity can be used as parent entity).

The other way round

It is also possible to do this the other way round. Lets say we have an additional Model like this:

class OwnedCar(db.Model):
brand = db.StringProperty(required=True)
owner = db.ReferenceProperty(Human, required=True)

then we could add cars to a person as following:

paul = Human(name="Paul")
paul.put()

pauls_bmw = OwnedCar(brand="BMW", owner=paul)
pauls_bmw.put()

pauls_mercedes = OwnedCar(brand="Mercedes", owner=paul)
pauls_mercedes.put()

pauls_cars = paul.ownedcar_set
print >> sys.stdout, "Paul's cars: "
for car in pauls_cars:
print >> sys.stdout, "- "+car.brand

# Paul's cars:
# - BMW
# - Mercedes

This makes for example sure, that one car is only owned by one human at a time.

Also notice the part on how to get Paul's cars. You don't even need to create a GQL query - the property modelname_set holds the references. You can find more about this in the docs.

Thanks to Aprigio for his input!

Special: One-to-Many using a list

You can also create 1:n relationships using a list:

# one-to-many using list
dodge = Car(brand="Dodge")
w1 = Wheel(position="left_front")
w2 = Wheel(position="left_back")
w3 = Wheel(position="right_front")
w4 = Wheel(position="right_back")
dodge.wheels = [w1.put(),w2.put(),w3.put(),w4.put()]
dodge.put()

dodgeWheels = Wheel.get(dodge.wheels)
print >> sys.stdout, "The Dodge has the wheels: "
for wheel in dodgeWheels:
print >> sys.stdout, "-"+wheel.position

# The Dodge has the wheels:
# -left_front
# -left_back
# -right_front
# -right_back

but: be careful, this model does not make sure, a wheel does belong to exactly one car! You could reference a wheel from more than one car, which would be semantically wrong in our example. So rather use this model for n:m relationships!

Many-to-Many (m:n)

For creating m:n relationships, use the following model:

# many-to-many using list+db.Key
jack = Human(name="Jack")
bob = Human(name="Bob")

vw = Car(brand="VW")
chevy = Car(brand="Chevy")

carpool = [vw.put(),chevy.put()]

jack.owns = carpool
jack.put()

bob.owns = carpool
bob.put()

chrysler = Car(brand="Chrysler")
jack.owns.append(chrysler.put())
jack.put()

jackOwns = Car.get(jack.owns)
print >> sys.stdout, "Jack owns: "
for car in jackOwns:
print >> sys.stdout, "- "+car.brand

# Jack owns:
# - VW
# - Chevy
# - Chrysler

whoOwnsTheChevy = Human.gql("WHERE owns = :car",car=chevy)
print >> sys.stdout, "These humans own the Chevy: "
for who in whoOwnsTheChevy:
print >> sys.stdout, "- "+who.name

# These humans own the Chevy:
# - Jack
# - Bob

First we create our two guys Jack and Bob. Then they decide on sharing cars. Their carpool consists of a VW and a Chevy. After that, Jack decides on buying an additional car for himself (the Chrysler).

When adding to a list like in this example, there is no check if the value already is present within the list - so it would be possible to reference the same entity multiple times. If you want to prevent this, you need to run a function on the list, which makes sure there are only unique entities:

def unique(lst):
d = {}
for item in lst:
d[item] = 1
return d.keys()

jeep = Car(brand="Jeep")
jack.owns = unique(jack.owns + [jeep.put()])
jack.put()
You could also check to content of the list before adding an entity, if it is already referenced.

Most flexible: using mapping entities

It might be a better idea to implement a different entity to represent the relationship
between a human and the cars they own.
The reason for this is that you could add more fields to this which may be beneficial later in a query. Lets say we add a bought field that contains the date the car was bough. Then one could get cars owned by Jack that he bought after a certain date.

class CarOwner(db.Model):
car = db.Reference(Car, required=True)
owner = db.Reference(Human, required=True)
bought = db.DateProperty(auto_now_add=True)

@staticmethod
def get_owner_cars(human, bought):
"""Returns the cars that the given human
bought before a specified date"""
if not human: return []
query =
db.Query(CarOwner)
query.filter(
'owner =',human)
query
.filter('bought >= ',bought)
return [entry.car for entry in query]

also we'd like to add a function to the Car-Model, so that it looks like this:

class Car(db.Model):
brand = db.StringProperty(required=True)
wheels = db.ListProperty(db.Key)

def human_owns(self, human):
"""Returns true if the given human owns this car."""
if not human: return False
query = db.Query(CarOwner)
query.filter('car =', self)
query.filter('owner =', human)
return query.get()
When we have the Models defined like that, we can go on using them:
max = Human(name="Max")
max.put()
saab = Car(brand="Saab")
saab.put()

ownership = CarOwner(car=saab,owner=max)
ownership.put()

import datetime
delta = datetime.timedelta(days=-1)
yesterday = (datetime.datetime.utcnow() + delta)

maxs_cars = CarOwner.get_owner_cars(max,yesterday)
print >> sys.stdout, "Max's cars since yesterday: "
for car in maxs_cars:
print >> sys.stdout, "- "+car.brand

# Max's cars since yesterday:
# - Saab

max_owns_saab = bool(saab.human_owns(max))
print >> sys.stdout, "Max owns the Saab: "+str(max_owns_saab)

# Max owns the Saab: True

As you can see, the additional property on the mapping entity comes in handy when selecting the cars Max bought since yesterday.
And also the method on the car-Model is pretty neat to bundle functionality related to a model within it.

Note: you can use this sort of referencing also use for 1:1 and 1:n relationships. You just need to make sure, that either one referenced entity (1:n) is unique, or both (the combination of those - for 1:1)

Thanks to Brian, who brought this up!

Cascading relations

With the parent-property, cascading relationships are possible. This is very useful, if you have a folder-like structure (e.g. categories and products of an online shop). Let's have a look on how to do this:
import sys

class Category(db.Model):
name = db.StringProperty(required=True)

class Product(db.Model):
name = db.StringProperty(required=True)
price = db.FloatProperty()
categories = db.ListProperty(db.Key)

root = Category(name="Products")
root.put()
tech = Category(parent=root,name="Tech stuff").put()
books = Category(parent=root,name="Books")
books.put()
fantasy = Category(parent=books,name="Fantasy")
fantasy.put()
scifi = Category(parent=books,name="Science Fiction")
scifi.put()
scomics = Category(parent=scifi,name="SciFi comics")
scomics.put()

somebook = Product(name="Some book")
somebook.categories.append(books.key())
somebook.price = 9.99
somebook.put()

lotr = Product(name="Lord Of The Rings")
lotr.categories.append(fantasy.key())
lotr.price = 29.99
lotr.put()

allFantasyBooks = Product.gql("WHERE categories = :cat",
cat=fantasy)
print >> sys.stdout, "All fantasy books: "
for book in allFantasyBooks:
print >> sys.stdout, "- "+book.name



path = []
p = scomics
while p.parent():
path.append(p.name)
p = p.parent()

path.reverse()
print >> sys.stdout, " > ".join(path)

# Books > Science Fiction > SciFi comics

Now getting all books in a category is pretty easy, huh? And also generating a breadcrumb-like "where am I" is simple by iterating through all parent entities for a given entity (here it is the science fiction comics Category).

Attention: while the use of parent relationships if fine for the use case here, it would not be very efficient in the case where there are a lot of children associated with a parent.
This is because this results in a very large entity group.
The more entity groups your application has - that is, the more root entities there are - the more efficiently the datastore can distribute the entity groups across datastore nodes.
Thus for efficiency you should avoid the case where there are a lot of children.
According to the docs:

"A good rule of thumb for entity groups is that they should be about the size of a
single user's worth of data or smaller."

Thanks to Ben, Michael and Brian for pointing this out.

Kommentare:

  1. Das ist zehr gut. Veilen Danke.

    Venkatesh

    AntwortenLöschen
  2. Very, very useful and helpful ! Thanks.

    AntwortenLöschen
  3. I'm sure I speak for all when i say how helpful your writeup has been!

    My question involves 'entity groups':

    Shouldn't entity groups be applied in all cases where you have a reference property?

    For instance, I have blog posts, and comments which have a reference property to the blog post. If I want to delete a post and all its comments, I want to run them as a transaction, and therefore I need the comments to have the blog post as its 'parent'.

    So, should your analysis mention this for ALL reference property use cases?

    Thanks again!

    AntwortenLöschen
  4. @g-man:
    actually in the case of blog posts and comments it might be a good idea to store all entities in one entity group if (and only if) as far as I understand you always fetch all comments with the blog post. If not, it might be better to store them in different groups...specifically if you expect a high number of comments.

    AntwortenLöschen
  5. Yes, I should use the entity group idea, but my problem has been getting Django to do that from its form editing.

    Because the GAE API requirement is:

    comment=Comment(parent=item)

    which must be assigned at the moment of instantiation, and the Django forms are immutable, it makes it tricky for me to figure out how to insert my parent information before the Model instance is saved.

    I appreciated you previous Django writing, but this is a tricky problem...

    AntwortenLöschen
  6. Quick question:

    Is it true that in your example (quoted below) bob != j_spouse?

    It seems this would be difficult to deal with if the model in question were mutable since every dereference will give you a different copy of the model. No?

    # one-to-one self mutual
    bob = Human(name="Bob")
    jane = Human(name="Jane")

    bob.spouse = jane.put()
    jane.spouse = bob.put()
    jane.put()

    j_spouse = Human.get(jane.spouse.key())
    print >> sys.stdout, "Jane's spouse is "+j_spouse.name
    b_spouse = Human.get(bob.spouse.key())
    print >> sys.stdout, "Bob's spouse is "+b_spouse.name

    AntwortenLöschen
  7. Hello, can you please help me:

    Given only the books category, how do I get all the products that has category's parent of books category?

    AntwortenLöschen
  8. @Anonymous: you can use the parent relationship to the categories to walk up in the tree and fetch any categories' items you like.

    AntwortenLöschen
  9. Hello Joscha, thanks for your reply, I assume you want me to do this:

    child_categories = Category.all().ancestor(books)

    the_books = []

    for category in child_categories: the_books.append(Product.all().filter("category =", category))

    This method seems to work just fine, but then I can't do the fetch with limit and offset just for the books category (since I have to do the fetch(limit, offset) on Product that has parent of books category):

    for category in child_categories: the_books.append(Product.all().filter("category =", category).fetch(0, 10))

    I hope I don't confuse you so far. Anyway, I was hoping if there's some way to fetch the products using :

    the_books = Product.all().filter("category.parent =", books).fetch(0,10)

    so the_books will contain only 10 books that has parent category of books. Thank you.

    AntwortenLöschen
  10. vielen danke, a great port.

    AntwortenLöschen
  11. Thanks ' for this article. I have problem. Actuaaly i ma working in Python language and want to know the searching creteria. Like in SQl we used "Like" operator but in GQL how we do this same Thing. So please help me..

    Thanks..

    AntwortenLöschen
  12. Hi - LIKE queries are not possible with GAE, but what you can do to get some of this functionality is described here: http://code.google.com/appengine/docs/python/datastore/queriesandindexes.html#Introducing_Indexes
    Also, there is an undocumented (and unsupported) way of doing fulltext searches on GAE, you can find more information here: http://appengineguy.com/2008/06/how-to-full-text-search-in-google-app.html

    AntwortenLöschen
  13. hey.. this was a really really nice article.. I was in need of such..

    thanks a lot.. :)

    AntwortenLöschen
  14. Best summary by far, and all in one spot.
    Ich danke dir.
    Thanks

    AntwortenLöschen
  15. I really appreciate,nice.
    I've written about it on my following blog:-
    http://bygsoft.wordpress.com/2010/01/09/cloudy-combo-google-app-engine-and-amazon-s3-combo-pack/

    AntwortenLöschen
  16. 2 years later this is still one of the best posts on the internet for this topic.

    AntwortenLöschen
  17. I know this post is quite old, but it seems it is still relevant, so I'll post the comment anyway.

    1. Is there a way to ensure 1-to-1 on the database level?
    2. In the car-driver example, I think it would be more logical for the car to have an owner property. Car ownership is specific to cars (i.e., every car needs a driver in order to move), whereas a person is not necessarily a car owner, or even interested in cars.

    AntwortenLöschen
  18. Interesting post. You have gained a new fan. Please keep up the good work and I look forward to more of your amazing posts.


    Southwest Engines Company

    AntwortenLöschen
  19. No doubt this is an excellent post I got a lot of knowledge after reading good luck.

    Southwest Engines

    AntwortenLöschen