MongoDb is a premier document-based database with many features.
You can run MongoDb in various ways:
We will follow this last method. You gain access to 500MB space for free. Here are the steps:
wranglingclass
for the account name.There are two standard ways to interact with your Mongodb database. In this section we'll use the mongo
shell.
mongo ds011168.mlab.com:11168/wranging -u <dbuser> -p <dbpassword>
. Yours will have a different database number probably. You'll need to paste that link and change the <dbuser>
and <dbpassword>
entries to your setup. You can also omit the password part, and just type -p
, then you will be prompted to type your password.You should now be presented with a welcome message. This is an interactive shell from where you can ask for details from the database. Start by typing:
db
which should show you the name of the database you are currently using. In general db
is like an "object" that we will use to access the current database.We will now create a new collection. Collections are like tables in mysql. But since there is no strict form that documents in MongoDb need to follow, we don't need to really specify anything about the collection. We just start using it. We're going to call our collection "gpas". Here is how we can add an entry to it:
db.gpas.insert({ name: 'student0', gpa: _rand() * 4 })
db.gpas.find()
Let's remove the entry we added:
db.gpas.remove({ name: 'student0' })
db.gpas.find() // No results
The MongoDb shell uses a mini programming language that looks a bit like Javascript, for those familiar with Javascript. For example we used _rand()
to generate a random number. We will now insert a large number of values all at once. We first create them as a "Javascript" object:
var d = [];
for (var i = 0; i < 10000; i++) {
d.push({ name: "student" + i, gpa: _rand()*4 })
}
db.gpas.insertMany(d)
Let us now learn how to search for information in the documents. The main tool at our disposal is the find
method. Its parameter is an object that describes the query. For example we can get the entries with a specific student name:
db.gpas.find({ name: 'student100' })
or we can perform more complex queries. For example, this asks for all entries whose gpa is over 3.95:
db.gpas.find({ gpa: { $gt: 3.98 }})
Here are two ways to ask for all values between 3.78 and 3.98 gpa:
db.gpas.find({
$and: [
{ gpa: { $gte: 3.78 }},
{ gpa: { $lte: 3.98 }}
]
})
db.gpas.find({
gpa: { $gte: 3.78, $lte: 3.98 }
})
The shell will probably link only some of the results. We will discuss how to work with the result of a find
, which is what is known as a cursor. In the meantime, if we only want to know how many results there are, we can use count
:
db.gpas.count({ gpa: { $gt: 3.98 }})
Next we will do an update query: We will add an "atRisk" field to all students with a gpa of 2 or less. The query takes two parameters: The first specifies which entries to locate, the other specifies what changes to make.
db.gpas.updateMany(
{ gpa: { $lte: 2 }}, // <-- which values to change
{ $set: { atRisk: true }} // <-- how to change them
)
Let us now do a more complex query, that captures all students whose gpa is less than 2.5, and adjusts that gpa by up to plus/minus 1 point (we do this my picking a random number from 0 to 2, then subtracting 1).
db.gpas.updateMany(
{ gpa: { $lte: 2.5 }}, // <-- which values to change
{ $inc: { gpa: _rand() * 2 - 1 } // <-- how to change them
})
NOTE: We need to be careful here, as this does not do exactly what you think it should do. It would be nice it if generated a new random number for each entry, but it will in fact use the same random number for all entries. There is no way to do a batch update like that with variable values. We will see a way to do this in a few paragraphs.
Now we will look for students who were at-risk but whose gpa is now over 2. We will then mark all those to no-longer be at risk:
db.gpas.count({ gpa: { $gt: 2 }, atRisk: true })
db.gpas.updateMany(
{ gpa: { $gt: 2 }, atRisk: true },
{ $set: { atRisk: false }}
)
Now we want to search for all students that are not at risk. We cannot simply look for atRisk: false
because this doesn't include those students where there is no atRisk
entry at all. We can do this in a number of ways ways:
// Search for either atRisk not exists or atRisk equals false.
db.gpas.count({
$or: [
{ atRisk: { $exists: false } },
{ atRisk: false }
]
})
// Search for "not having atRisk equal to true"
db.gpas.count({
atRisk: { $ne: true }
})
db.gpas.count({
atRisk: { $not: { $eq: true }}
})
We will now arbitrarily assign all students into four groups. This may take a while as it has to update each entry:
db.gpas.find().forEach(function(doc) {
db.gpas.update(doc, { $set: { group: Math.ceil(_rand() * 4) }})
})
This is also the first time where we say the use of a cursor method: The result of the find
call is a "cursor", which is basically a fancy word for something that we can iterate over. We therefore perform a forEach
on it. That takes an arbitrary function as input, and it executes that function for every result. Let's see how the above worked. Everyone should be more or less equally divided into four "groups", identified by the numbers 1 through 4:
for (var i = 1; i < 5; i++) {
print(db.gpas.count({ group: i }))
}
Some practice problems: