In most applications we want to work and think with objects and the relationships between them. So for instance we might have student objects and course objects and also enrollment objects which contain in them a student object and a course object and maybe some more information about them. We may want to change the information of an enrollment for a student, for example when they withdraw from a class. We might want to check that each student is enrolled in a specific number of courses, and never exceed 4.5 credits, etc.
This is all what we might call the business logic of our application. This is the level at which we want to think about our application, and we want to write code that talks in these terms, so that it is easy for someone to read that code and understand the key business decisions. We often also refer to this understanding of our main objects as the domain model.
By contrast, at some point we need to persist our data to a database. Databases typically don't talk in terms of "objects" and "relationships". They have tables, columns, primary keys, tuples, foreign keys and a host of other limitations. If possible, we would rather not have to program in those terms.
This is where Object-Relational Mapping enters. After some initial setup, ORM takes care of the nitty gritty details of converting between "business logic" and "database queries". For example it converts the statement "add this student to the system" to an appropriate INSERT query that would actually carry out the creation.
This comes at a price: There are often complicated situations that are somewhat slow in an ORM setting but would be faster if we were to write the SQL query directly. The good news is that most ORM systems live in harmony with more "core" systems, so you can write the majority of your application in ORM form, only occasionally resorting to more direct queries.
This also comes with tremendous benefits. For example, if you suddenly want to completely change your storage, and move from one kind of database to a completely different kind that maybe doesn't use SQL queries the same way, then most of your business logic does not need to change at all.
In an ORM setting you typically declare a mapping between a certain class in your application and a certain table. You then leave many of the details up to the system to handle.
In ORM our domain model consists of a set of classes that implement the required behavior. Each class is typically mapped to a database table by means of a mapper. Our classes typicaly inherit from a base class that is responsible for handling that mapping behavior and other similar setup functionalities.
Work in ORM is maintained via a session. A session represents a group of operations that need to happen. It contains information about new objects that are to be created, updates to existing objects, as well as objects that are to be deleted. This is often implemented by the so called unit of work pattern.
Changes that exist in a session are not written to the database yet, they are just scheduled to be written. When a commit (typically called a flush) occurs, every change in the session will be converted to appropriate SQL queries and will be executed. We can also "back out" of the changes in the session by doing a rollback.
Objects in an ORM application can be in one of five states at any given time (see here):
The terminology may differ slightly from system to system.
Another essential component of an ORM system is the relationships between classes. There are roughly three kinds of relationships, which differ in the number of elements that can map from each side to the other.
While this section in general avoids discussing concepts at the level of the database, it is worth mentioning how these three kinds of relationships would typically be represented in the database.
Foreign key constraint along with a unique constraint
CREATE TABLE departments (
...
head VARCHAR(40) UNIQUE FOREIGN KEY REFERENCES instructors(login),
...
)
Foreign key without a unique constraint
CREATE TABLE sections (
...
term INT FOREIGN KEY REFERENCES terms(id),
...
)
Using a third "associations" table that links through foreign keys to both tables
CREATE TABLE enrollments (
...
section INT FOREIGN KEY REFERENCES sections(id),
student INT FOREIGN KEY REFERENCES students(id),
...
)
When working with relationships in an ORM setting, you typically can access the relationship from one end or the other. So for example in the many-to-many example above, a student
object may actually contain a list of the section
objects that the student is enrolled in, and conversely a section
object may contain a list of the students enrolled in it.
When you set up relationships, you have an option to make then one-directional or bi-directional. Our example above is bi-directional: We can access the relationship from either end. But for example we could easily imagine a setup where section
objects do not themselves contain a list of all the student
objects enrolled in them; we can only access the enrollments by looking at the student side.
Now we will discuss how ORM is implemented in SQLAlchemy. We need to start with the usual setup of an engine:
from sqlalchemy import *
## Set up however you need to
engine = create_engine('sqlite:///:memory:', echo=True)
The next essential component is to create a "base class" that knows how to set up ORM. SQLAlchemy provides a system called declarative_base
that has all the needed functionality:
from sqlalchemy.ext.declarative import declarative_base
Base = declarative_base()
Now we will build classes for our entities, and will make them extend
the functionality provided by Base
.
class Student(Base):
__tablename__ = 'en_students'
id = Column(Integer, primary_key = True)
login = Column(String(20), unique=True, nullable=False)
first = Column(String(20))
last = Column(String(20))
credits = Column(Integer, default=0)
gpa = Column(Float(precision=32), default=0)
def __repr__(self):
return "Student<%s %s>" % (self.first, self.last)
Student.__table__ # Shows us the corresponding Table construction
Base.metadata.create_all(engine) # Creates the tables in the database
Now let's create a student object. We would do it in a standard way. The Base
class that we inherit from provides a simple constructor, that expects key-value pairs and creates corresponding entries from them:
student = Student(last="Turing", first="Alan", login="turinga37")
student # See the printout we defined above via __repr__
student.last
student.id == None # Student has no id yet.
insp = inspect(student) # An inspection object for student
insp.transient # true
insp.persistent # false
In order to work with the system, we need to create a section. First we must create a "session maker" class:
from sqlalchemy.orm import sessionmaker
Session = sessionmaker(bind=engine)
Then we can use this to create a session when we need one:
session = Session()
This session can be used to manage the interaction with the database. Let us start by adding our student object to the session:
session.add(student)
insp.transient # false
insp.pending # true
student.id == None # Still no id
If we try to now perform a query, for example, the session will automatically issue a flush and save this student to the database. For example, let's ask for a student whose last name is "Turing":
alan = session.query(Student).filter_by(last="Turing").first()
There is of course a whole lot more to querying, look at the notes.
One important property of ORM systems is known as identity: Once an object is known to the system by a specific primary key, any query will return that exact same object, and not a different copy of that object. In our instance, the two objects student
, which we created directly, and alan
, which we obtained from the query, are literally the same object:
alan is student
alan.first = "Alamo"
student.first # Also is Alamo now
insp.persistent # True
session.dirty # The name change is not yet saved to the database.
session.flush()
session.dirty # Empty now
NOTE: flush vs commit: In SQLAlchemy a session manages "transactions" with the database. A transaction is a set of changes that the database knows about and remembers and maintains, but it has also given you the option to possibly undo all these changes. A flush moves any changes to objects in SQLAlchemy into the current transaction buffer, in other words it makes the database aware of them. This makes future queries know about them and have access to them. However, the transaction is not yet completed. You can complete the transaction by issuing a commit (session.commit()
). This closes the current transaction and permanently writes the changes to the database. Alternatively, you can use session.rollback()
to roll back the entire transaction.
Let's look at the basic relationship patterns we have in SQLAlchemy and how to set them up.
We have already defined a Student class. We will also consider a Course class that represents course sections being offered. We will develop two kinds of relationships between students and courses:
We start with the favorite
relationship. Here's how it might look:
class Course(Base):
__tablename__ = "en_courses"
id = Column(Integer, primary_key=True)
prefix = Column(String(4), nullable=False)
no = Column(String(20), nullable=False)
title = Column(String(55), nullable=False)
credits = Column(Integer, nullable=False, default=4)
__table_args__ = (
UniqueConstraint('prefix', 'no', name="fullCode")
)
favoritedBy = relationship("Student", order_by=Student.id,
back_populates="favoriteCourse")
#
def __repr__(self):
return "Course<%s%s %s>" % (self.prefix, self.no, self.title)
#
def isFullCredit(self):
return self.credits == 4
# Back in student class:
# Here are the "favorites" relationship bits
favorite_id = Column(Integer, ForeignKey("en_courses.id"))
# And we talk about the relationship:
favoriteCourse = relationship("Course", back_populates="favoritedBy")
So there are two parts to the definition:
relationship
method to create the actual relationship variables. If you want the relationship to be bidirectional, then you need to define the variables on both sides, like we did above with the favoritedBy
and favoriteCourse
variables.We can then access a student's favorite course by doing for example:
alan.favoriteCourse
And conversely if we have a course c
in mind, we can find all the students favoriting it by doing:
c.favoritedBy # A list of students
In order to establish a many-to-many relationship, we have to take some extra steps:
secondary=...
parameter to the relationship
call to point to the table to be used for the linkage.For example, let's suppose that we wanted the favorites
relationship above between students courses and courses to be many-to-many: Each student can favorite more than one course, and each course can be favorited by more than one student. Here's how that might be set up:
# We first create a table:
favoritesTbl = Table('en_favorites', Base.metadata,
Column('student_id', ForeignKey('en_students.id'), primary_key=True),
Column('course_id', ForeignKey('en_courses.id'), primary_key=True))
# Inside the Student class definition:
favoriteCourses = relationship("Course",
secondary=favoritesTbl,
back_populates="favoritedBy")
# Inside the Course class definition:
favoritedBy = relationship("Student",
secondary=favoritesTbl,
back_populates="favoriteCourses")
As an example, of using this, suppose we want to find out all the course that the student bob
has favorited. we could do:
bob.favoriteCourses
And if we want to find out all students who favorite the same course as bob, we could do a list comprehension:
[ student
for course in bob.favoriteCourses
for student in course.favoritedBy
if student != bob ]
When we're dealing with many relationships, one important consideration is the loading of objects from the database.
For example consider the above setting of students, courses, and students choosing courses as favorites:
Bob
object. Along with it comes the list of courses that Bob has favorited, via the favoriteCourses
relationship. So this brings along the CS220
object.favoritedBy
relationship, so for it to be fully loaded would require that we include all those students with it, like Zoe and many more.Clearly this is unsustainable. We need a way to load just the Bob
object, along with perhaps an inkling of the fact that there is a list of courses that Bob has favorited, but without fully loading those course objects unless our desired query somehow needs them. There are various loading strategies we can employ in such a situation, documented in this page. The broader classification is between lazy loading, eager loading and no loading.
These loading strategies can be set up when the relationship is defined, or they can be activated at specific queries.
In lazy loading, there will be no SELECT components created for objects that are not the primary queried object, unless the corresponding object is being accessed by the Python code.
Lazy loading is the default behavior.