Google Big Table structure
Posted: 27 January 2010
Do you ever have those moments of insight where you realize that you implicitly understand something, you more or less understand its advantages and disadvantages, but you haven’t really consciously thought through how it all works and how the pieces fit together? I had that moment this morning thinking about the Google App Engine data store, Big Table.
What I realized is that with Big Table the structure you’re forced to work with is the ubiquitous tree. It’s like the directory structure on disk: a directory has zero or one parent and zero to many children. Object inheritance works the same way: one parent, many subclasses.
For data store transactions the Google documentation often uses the term entity group; for example, when explaining that all of the objects in a transaction must be in the same entity group. I think the term parenting would be clearer: all of the objects in a transaction must share the same parent (or grandparent, if they’re further down in the tree), or the transaction covers a parent together with its children or grandchildren. If an object has no parent and no children (a childless root object), it is the only object you can operate on in that transaction.
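To make the parenting rule concrete, here is a minimal sketch in plain Python. It is a toy model, not the App Engine API: the `Key` class, the `same_entity_group` helper, and the customer/order data are all hypothetical names invented for illustration. The idea is simply that two objects can share a transaction only when walking up their parent chains lands on the same root.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Key:
    """Toy stand-in for a datastore key: a kind, a name, and an optional parent."""
    kind: str
    name: str
    parent: Optional["Key"] = None

    def root(self) -> "Key":
        # Walk up the parent chain until we reach a key with no parent.
        key = self
        while key.parent is not None:
            key = key.parent
        return key


def same_entity_group(*keys: Key) -> bool:
    """True when every key hangs off the same root object."""
    return len({key.root() for key in keys}) == 1


# Hypothetical data: one customer with an order and a line item,
# plus a second, unrelated customer.
alice = Key("Customer", "alice")
order = Key("Order", "1001", parent=alice)
item = Key("LineItem", "1", parent=order)
bob = Key("Customer", "bob")

print(same_entity_group(alice, order, item))  # parent, child, grandchild
print(same_entity_group(order, bob))          # two different roots
```

Under this model the first check passes because all three keys lead back to `alice`, and the second fails because `order` and `bob` have different roots.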
Compare this to an SQL relational database, where your objects can have many relationships. Just add a foreign key to a table and you’ve got another relationship. And there aren’t any restrictions on which objects can be together in a transaction.
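For contrast, here is a small sketch of the relational side using Python's built-in `sqlite3` module. The `customers`, `products`, and `orders` tables are hypothetical, made up for this illustration. The point is that `orders` picks up two relationships just by having two foreign key columns, and a single transaction can touch rows in all three tables even though they have no parent in common.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE products  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        product_id  INTEGER NOT NULL REFERENCES products(id)
    );
""")

# One transaction, three unrelated tables: the connection used as a context
# manager commits on success and rolls back if an exception is raised.
with conn:
    conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Alice')")
    conn.execute("INSERT INTO products (id, name) VALUES (1, 'Shirt')")
    conn.execute(
        "INSERT INTO orders (id, customer_id, product_id) VALUES (1, 1, 1)"
    )

# Follow both relationships with joins.
row = conn.execute("""
    SELECT c.name, p.name
    FROM orders o
    JOIN customers c ON c.id = o.customer_id
    JOIN products  p ON p.id = o.product_id
""").fetchone()
print(row)
```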
I think what adds to the confusion is that the object relational mapping packages they use on Google App Engine, JDO and JPA, are both designed for SQL relational databases. I think what’s needed is a specialized ORM that makes the parenting restrictions and issues obvious, and doesn’t have all of the unusable SQL relational database stuff.
So it can be very confusing to come to Google App Engine from the SQL world and try to understand how to use Big Table. That was my epiphany this morning, realizing that it’s one big tree; the more you can bend the relationships between your objects into a tree, the less likely you’ll be pulling your hair out.
For example, you’ve probably had tables that contain what is really a property of other objects. Visually, with a web app, you can think of such a table as supplying the values in a drop down list. If you were tracking clothing you might want a property for the available colors: Red, Green, Blue. With SQL I’d have a little table called Colors and my Shirts table would have a foreign key column for Colors. But that won’t work with App Engine, because every time you add a Color object to a Shirt object it needs to make that Shirt the parent of the Color, and an object can have only one parent. You could denormalize and add a copy of the Color object to each Shirt object, but that causes problems if you discover you misspelled a Color or want to change it; Blue becomes Cornflower Blue. The other alternative is to store the Color’s primary key in the Shirt object, which I also don’t like because then you’re doing the work that the ORM should be doing for you; you get the Shirt object, then you have to fetch the Color object using its key that’s stored in the Shirt.
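The key-in-the-Shirt alternative can be sketched like this. It is a toy model only: a plain dict stands in for the data store, and `Shirt`, `Color`, `color_key`, and `color_of` are hypothetical names, not anything from JDO, JPA, or App Engine. It shows both the manual two-step fetch and the one upside, which is that a rename happens in a single place.

```python
from dataclasses import dataclass


@dataclass
class Color:
    name: str


@dataclass
class Shirt:
    label: str
    color_key: str  # the Color's key, stored by hand instead of a managed relationship


# A plain dict standing in for the data store (an assumption for illustration).
store = {
    "color/blue": Color("Blue"),
    "shirt/1": Shirt("Oxford", color_key="color/blue"),
    "shirt/2": Shirt("Polo", color_key="color/blue"),
}


def color_of(shirt_key: str) -> Color:
    shirt = store[shirt_key]       # first fetch: the Shirt
    return store[shirt.color_key]  # second fetch: the Color it points to


# Renaming the shared Color once is seen by every Shirt that references it.
store["color/blue"].name = "Cornflower Blue"
print(color_of("shirt/1").name, color_of("shirt/2").name)
```

With the denormalized approach, the same rename would mean finding and updating the copy inside every Shirt.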
But in some cases you can turn things around; for example, suppose that instead of making Color a property of Shirt you make Color the aggregating (parent) object, and the Color object holds a List of Shirt objects. That’s not the best example but I hope you can see what I mean.
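Turned around, the shape might look something like the sketch below. Again this is plain Python with hypothetical `Color` and `Shirt` classes, not real ORM-mapped entities; it only illustrates which object owns which.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Shirt:
    label: str


@dataclass
class Color:
    name: str
    # The Color is the parent; its Shirts are the children in the tree.
    shirts: List[Shirt] = field(default_factory=list)


blue = Color("Blue")
blue.shirts.append(Shirt("Oxford"))
blue.shirts.append(Shirt("Polo"))

# The name lives in exactly one place, so a rename touches one object.
blue.name = "Cornflower Blue"
labels = [shirt.label for shirt in blue.shirts]
print(blue.name, labels)
```

Since a Color and its Shirts now sit in the same tree, they could be changed together in one transaction. The cost is that a Shirt can hang under only one parent, so this trick works for just one such property per object.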