We often need to communicate with databases from within a language like Python. One way is to form an SQL query and then communicate with the database server to submit that query. There are many reasons to not do that. One of these is the various forms of SQL injection, that have catastrophic consequences and are harder to guard against when writing your own query code.
Let's consider a simple example. We have someone type in their name in a web form, then query the database about their information. We may have in mind a query like:
SELECT * FROM users WHERE name = 'usernameHere';
Our script is in Python, and we'll need to create that string. We may do something like this:
username = .... # We've read the username from a webpage. User provided it
query = "SELECT * FROM users WHERE name ='" + username + "';"
If we are not careful, the provided username might be something like: '; DROP TABLE users; --
. In that case the query we are sending to SQL would be:
SELECT * FROM users WHERE name = ''; DROP TABLE users; --';
This would effectively execute the DROP TABLES command on the database, deleting our entire database. You must always take care to "clean up" your input and not blindly feed it into an SQL query. This is called sanitizing.
Always sanitize user input!
This is easier to do when you use built-in libraries for database access. We will learn exactly one such library for Python, called SQLAlchemy. But we would be remiss if we didn't first link to this awesome and relevant xkcd comic:
There are many different libraries to use in order to interface with SQL databases. We will see the basics of SQLAlchemy in this section. SQLAlchemy has two main user-facing components:
We will spend this section looking at parts of the core, and specifically the expression language. We will use this also as an opportunity to revisit the Twitter API and store tweets in the database.
Typically an interaction of SQLAlchemy with a database involves a number of steps:
We start by importing SQLAlchemy and setting up the database engine:
from sqlalchemy import *
## Reading database keys
import json
with open('keys.json', 'r') as f:
vault = json.loads(f.read())['vault']
engineString = 'mysql+mysqldb://{username}:{password}@{server}/{schema}'
engineUrl = engineString.format(**vault) # Learn about * and ** !!!
# Establishing a specific database connection
engine = create_engine(engineUrl, echo = True)
# Now engine can be used to interact with the database
This engine will be our connection to the database.
In order to make the above work, you will need to add some appropriate entries to the keys.json
file. Your values will be different of course:
"vault": {
"username": "skiadas",
"password": "....",
"server": "vault.hanover.edu",
"schema": "skiadas"
}
To make sure that the engine is set up properly, the following will create an actual connection to the database:
conn = engine.connect()
conn
In order to do further work with the database, we need to describe the tables (If we were using an existing database, can also let the system "infer" the table structure from the database).
Let us first discuss in SQL terms what we want. Let's consider the student enrollments tables we have been using. Here's how those were defined:
CREATE TABLE students (
id INT UNIQUE NOT NULL AUTO_INCREMENT,
login VARCHAR(20) UNIQUE NOT NULL,
first VARCHAR(20),
last VARCHAR(20),
credits INT DEFAULT 0,
gpa DOUBLE DEFAULT 0,
PRIMARY KEY (id)
);
CREATE TABLE courses (
id INT UNIQUE NOT NULL AUTO_INCREMENT,
prefix CHAR(4) NOT NULL,
no INT NOT NULL,
title VARCHAR(55) NOT NULL,
credits INT NOT NULL DEFAULT 4,
UNIQUE KEY fullCode (prefix, no),
PRIMARY KEY (id)
);
CREATE TABLE enrollments (
student_id INT NOT NULL,
course_id INT NOT NULL,
letter_grade CHAR(2) DEFAULT NULL,
point_grade DOUBLE DEFAULT NULL,
FOREIGN KEY (student_id) REFERENCES students(id) ON DELETE CASCADE,
FOREIGN KEY (course_id) REFERENCES courses(id) ON DELETE CASCADE,
PRIMARY KEY (student_id, course_id)
);
This is how we would create these tables in MySQL. These should hopefully be familiar to you by now. We will instead learn how to describe the same information in SQLAlchemy.
One of the challenges we will encounter is the AUTO_INCREMENT
part. SQLAlchemy tries to be "implementation agnostic". So you write your code once and it runs on all databases. But not all databases have the auto-increment feature. Some don't have anything like it, while others like Firebird use a SEQUENCE
system.
Such information is referred to as metadata. We start with creating a "Metadata object" with the Metadata
constructor, then use the Table
and Column
methods to add specifications:
metadata = MetaData()
metadata.bind(engine)
tblStudents = Table('en_students', metadata,
Column('id', Integer, primary_key = True),
Column('login', String(20), unique=True, nullable=False),
Column('first', String(20)),
Column('last', String(20)),
Column('credits', Integer, default=0),
Column('gpa', Float(precision=32), default=0)
)
tblCourses = Table('en_courses', metadata,
Column('id', Integer, primary_key=True),
Column('prefix', String(4), nullable=False),
Column('no', String(20), nullable=False),
Column('title', String(55), nullable=False),
Column('credits', Integer, nullable=False, default=4),
UniqueConstraint('prefix', 'no', name="fullCode")
)
tblEnrollments = Table('en_enrollments', metadata,
Column('student_id', Integer,
ForeignKey("en_students.id", ondelete="CASCADE"),
nullable=False),
Column('course_id', Integer,
ForeignKey("en_courses.id", ondelete="CASCADE"),
nullable=False),
Column('letter_grade', String(2)),
Column('point_grade', Float(32)),
PrimaryKeyConstraint('student_id', 'course_id')
)
# drop the tables if they existed already. Don't always need this.
metadata.drop_all(engine)
# Create these tables if they do not exist
metadata.create_all(engine)
Now let's look into creating some students. Typically the steps for inserting new values would be:
So let's take a look at how this might look in our case:
conn = engine.connect() # Only if it doesn't already exist
users = [
{ "login": "somebodyj1", "first": "Joe", "last": "Somebody" },
{ "login": "somebodyj2", "first": "Joel", "last": "Somebody" },
{ "login": "otherp1", "first": "Peter", "last": "Other" },
{ "login": "otherm1", "first": "Mary", "last": "Other" },
{ "login": "doem1", "first": "Mary", "last": "Doe" },
{ "login": "doep1", "first": "Peter", "last": "Doe" },
{ "login": "doed1", "first": "David", "last": "Doe" }
]
ins = tblStudents.insert() # Create an "insert object"
result = conn.execute(ins, users) # Execute the insert on a connection
dir(result) # examine what properties the result object has
result.rowcount # 7
Let's do the same for the courses:
courses = [
{ "prefix": "MAT", "no": 121, "title": "Calculus 1" },
{ "prefix": "CS", "no": 220, "title": "Intro to CS" },
{ "prefix": "MAT", "no": 122, "title": "Calculus 2" },
{ "prefix": "MAT", "no": 221, "title": "Calculus 3" },
{ "prefix": "CS", "no": 223, "title": "Data Structures" }
]
ins = tblCourses.insert()
result = conn.execute(ins, courses)
We will discuss the INSERT-SELECT variant later. Let us now turn to querying the data.
Now let us discuss how we can query the database for information from within SQLAlchemy. We start with some basic queries. For example, let's see how we would do a basic "select all" query:
SELECT * FROM students;
In SQLAlchemy, you would break this into steps:
Let's take a look:
s = select([tblStudents])
result = conn.execute(s)
result.fetchall() # A list of tuples
# The result object is enumerable
result = conn.execute(s)
for student in result:
print(student)
result.close() # Done using it
One important thing to notice is that the result object acts as what is called a DBAPI cursor: It offers you an iterative pattern over the results set, but once you process the set it then closes and is not available again. You need to execute a new query to process the list a second time, unless you stored the result of the iteration in some way (for example stored the result of result.fetchall
).
Another really important thing is that while the results are tuples, they are actually named tuples:
result = conn.execute(s)
student = result.fetchone() # Just grabbing one match
student.keys() # all the keys
student['login'] # value for key "login"
student.login # also works
Let's now do a specific query for some fields/columns only:
SELECT last, first FROM students;
In SQLAlchemy that might look like so:
s = select([tblStudents.c.last, tblStudents.c.first])
result = conn.execute(s)
result.fetchall()
result = conn.execute(s)
for last, first in result:
print(last + ", " + first)
We can modify the select object to add other components. This is usually done by so-called "method chaining": We add on method calls one after the other, and each one modifies and returns the object. For example we can add a condition for avoiding duplicates (DISTINCT):
s = select([tblStudents.c.last]).distinct()
Or let's suppose we want to get all students whose last name is "Somebody"
. We did this in MySQL via:
SELECT first
FROM students
WHERE last = "Somebody";
In SQLAlchemy, we would use the where
method that can be tacked on to a select
object:
s = select([tblStudents]).\
where(tblStudents.c.last=="Somebody")
Let's scale things up! We want to add two conditions:
SELECT *
FROM students
WHERE first = "Joe"
AND last = "Somebody";
In Python:
s = select([tblStudents]).\
where(tblStudents.c.first=="Joe").\
where(tblStudents.c.last=="Somebody");
Or we can add a GROUP BY
clause:
SELECT *
FROM students
ORDER BY last, first;
In Python:
s = select([tblStudents]).\
order_by(tblStudents.c.last, tblStudents.c.first);
Let's see how we can do an INSERT
querty that uses a SELECT
query to determine the values. For example we had the following:
INSERT INTO enrollments (student_id, course_id)
SELECT id, 1
FROM students;
In Python, this would become:
ins = tblEnrollments.insert().\
from_select([tblEnrollments.c.student_id, tblEnrollments.c.course_id],
select([tblStudents.c.id, literal(1)]));
We can perform joins in SQLAlchemy in two different ways, just as in MySQL. In MySQL, a join can be made implicitly by combining tables via appropriate where
clauses:
SELECT s.last, s.first, c.prefix, c.no
FROM students s, enrollments e, courses c
WHERE e.student_id = s.id
AND e.course_id = c.id;
In Python:
s = select([tblStudents.c.last, tblStudents.c.first,
tblCourses.c.prefix, tblCourses.c.no]).\
where(tblEnrollments.c.student_id == tblStudents.c.id).\
where(tblEnrollments.c.course_id == tblCourses.c.id);
There is also an alternative way to do multiple where
steps, using and_
:
s = select([tblStudents.c.last, tblStudents.c.first,
tblCourses.c.prefix, tblCourses.c.no]).\
where(and_(
tblEnrollments.c.student_id == tblStudents.c.id,
tblEnrollments.c.course_id == tblCourses.c.id
));
We also had another way of performing joins, with the JOIN ... ON
construct. Let's look at the same example in that setup:
SELECT s.last, s.first, c.prefix, c.no
FROM enrollments e
JOIN students s ON e.student_id = s.id
JOIN courses c ON e.course_id = c.id;
One of the nice things about SQLAlchemy is that it will automatically figure out which fields to compare in the ON
portion, by examining the FOREIGN KEY
restrictions. So we can omit that. We do however need an extra step to get our join started, via the select_from
construct:
s = select([tblStudents.c.last, tblStudents.c.first,
tblCourses.c.prefix, tblCourses.c.no]).\
select_from(tblStudents.\
join(tblEnrollments).\
join(tblCourses));
We can certainly also do these steps in parts, a value of using Python instead of MySQL directly, and taking advantage of the fact that Python has objects representing the various SQL elements. So for example we can give a name to the triple join and the selected columns:
allData = tblStudents.join(tblEnrollments).join(tblCourses);
columns = [tblStudents.c.last, tblStudents.c.first,
tblCourses.c.prefix, tblCourses.c.no];
s = select(columns).select_from(allData);