Section 7.2 Lambdas and Collections

We interrupt our discussion of OOP ideas to address a tangential but important topic from functional programming that has made its way into the mainstream: lambda functions and their relation to processing collections. This discussion is a natural follow-up to the tell-don’t-ask principle.

Subsection 7.2.1 Lambda functions

Let’s revisit our GradeCollection example from Listing 7.1.1, and more specifically its getTotalPoints function. It says, in effect, "if it counts for credit, then tell me how many points it is worth and I will add them to the total". It still feels like we are doing work that should be done by the Grade class. All we should be telling the Grade class is what we want done with the points. So something like this:

double getTotalPoints() {
  double total = 0.0;
  for (Grade g : grades) {
    g.doIfCountsForCredit(p -> { total += p; }); // does not compile, see below
  }
  return total;
}

Unfortunately Java won’t let us do that quite as it is written, but let’s discuss it first. Imagine that the Grade object has a function doIfCountsForCredit, which is supposed to do something if the grade it represents counts for credit, and do nothing otherwise. The thing in the parentheses, p -> { total += p; }, is what we call a lambda function. It is an anonymous function: it has no name and is not a method of any object, but it takes some parameters, has a body, and may return a value. In this case the body increments the total. This is in general a great thing to have: we simply pass the Grade object a function telling it what to do in case it counts for credit. We don’t ask it any questions at all, and that is quite nice indeed. Compare this to the original solution, and make sure you understand how this version follows the tell-don’t-ask principle.

But alas, Java won’t let us do this, for an interesting but somewhat technical reason: a lambda may only use local variables of the enclosing method that are final or effectively final, meaning they are never reassigned after being initialized. Our lambda reassigns the local variable total (total += p), so the compiler rejects it. Other, more functional languages would be perfectly OK with this, but Java is not.
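Here is a small sketch of that rule at work. The names CaptureDemo, applyBonus, and bonus are made up for this illustration and are not part of the GradeCollection example. The lambda only reads bonus, which is never reassigned, so it compiles; the commented-out line would reassign total inside a lambda and would be rejected.

```java
import java.util.function.Function;

// Hypothetical illustration of which local variables a lambda may use.
class CaptureDemo {
  static double applyBonus(double points) {
    double bonus = 2.0; // assigned once and never again: "effectively final"
    Function<Double, Double> addBonus = p -> p + bonus; // OK: the lambda only reads bonus

    double total = 0.0;
    // Uncommenting the next line causes a compile error, because the lambda
    // would reassign total, so total would no longer be effectively final:
    // java.util.function.Consumer<Double> addToTotal = p -> { total += p; };

    return addBonus.apply(points);
  }

  public static void main(String[] args) {
    System.out.println(applyBonus(8.0)); // prints 10.0
  }
}
```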

Luckily there is a reasonably clean workaround, and we will discuss it here: we turn the concept of keeping the total into a tiny helper class. Here’s how it works. We’ll create a Total class (maybe you can find a better name for it). It starts with a total of zero and gives you a method for increasing it:

class Total {
  private double total = 0.0;
  void add(double points) { total += points; }
  double getTotal() { return total; }
}

Using this small class, we can rewrite our getTotalPoints method thus:

double getTotalPoints() {
  Total total = new Total();
  for (Grade g : grades) {
    if (g.countsForCredit()) { total.add(g.getPoints()); }
  }
  return total.getTotal();
}

Now let’s try the same trick, turning the whole if block into a method of the Grade object:

double getTotalPoints() {
  Total total = new Total();
  for (Grade g : grades) {
    g.doIfCountsForCredit(p -> total.add(p));
  }
  return total.getTotal();
}

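Compare the last two versions, and make sure you understand how they are alike and how they differ. Notice also why the lambda p -> total.add(p) is allowed when p -> { total += p; } was not: the local variable total is never reassigned, since it always refers to the same Total object, and only that object’s internal state changes.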

For this to work, our Grade class now needs this new method, and here is what it might look like:

// Needs: import java.util.function.Consumer;
void doIfCountsForCredit(Consumer<Double> f) {
  if (countsForCredit()) f.accept(getPoints());
}

There’s something new here: the type Consumer<Double> of the parameter f. Consumer is one of the so-called functional interfaces (found in the java.util.function package), which represent the various situations where we pass lambda functions as parameters. What they all have in common is that each declares a single abstract method, which represents the act of calling the function. In the case above, that’s what f.accept does: it calls the function we provided, giving it the result of getPoints() as input.
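To put the pieces together, here is a runnable sketch that contains both getTotalPoints versions side by side, the "ask" loop and the "tell" loop. The Grade class is a hypothetical minimal stand-in, since Listing 7.1.1 is not reproduced here; its constructor and the fields points and forCredit are assumptions, and the method names getTotalPointsAsk and getTotalPointsTell are made up so that both versions can live in one class.

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical minimal Grade; the real class from Listing 7.1.1 has more in it.
class Grade {
  private final double points;
  private final boolean forCredit;

  Grade(double points, boolean forCredit) {
    this.points = points;
    this.forCredit = forCredit;
  }

  boolean countsForCredit() { return forCredit; }
  double getPoints() { return points; }

  // Tell, don't ask: the caller says what to do with the points,
  // and the Grade decides whether to do it.
  void doIfCountsForCredit(Consumer<Double> f) {
    if (countsForCredit()) f.accept(getPoints()); // points are autoboxed to Double
  }
}

// The small helper class from the text: it owns the running total.
class Total {
  private double total = 0.0;
  void add(double points) { total += points; }
  double getTotal() { return total; }
}

class GradeCollection {
  private final List<Grade> grades;

  GradeCollection(List<Grade> grades) { this.grades = grades; }

  // "Ask" version: the collection questions each grade and decides itself.
  double getTotalPointsAsk() {
    Total total = new Total();
    for (Grade g : grades) {
      if (g.countsForCredit()) { total.add(g.getPoints()); }
    }
    return total.getTotal();
  }

  // "Tell" version: each grade is told what to do with its points if they count.
  double getTotalPointsTell() {
    Total total = new Total(); // never reassigned, so the lambda may capture it
    for (Grade g : grades) {
      g.doIfCountsForCredit(p -> total.add(p)); // p is a Double, unboxed for add
    }
    return total.getTotal();
  }
}
```

With grades worth 90.0 (for credit), 75.0 (not for credit) and 80.0 (for credit), both versions return 170.0.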

Here is a list of the main functional interfaces, for reference:

Consumer<T>: a function that takes a single input, of type T, and has void return (i.e. it consumes values).
BiConsumer<T, U>: a function that takes two inputs, of types T and U respectively, and has void return.
Supplier<T>: a function that takes no input but returns a value of type T (i.e. it supplies values).
Function<T, R>: a function that takes input of type T and returns a value of type R.
BiFunction<T, U, R>: a function that takes two inputs, of types T and U respectively, and returns a value of type R.
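As a quick reference, the sketch below creates one lambda for each interface in the list and calls it through that interface’s single method (accept, get, or apply). The class name and the sample values are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.BiFunction;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical demo: one lambda per functional interface from the list.
class FunctionalInterfacesDemo {
  static List<String> run() {
    List<String> log = new ArrayList<>();

    Consumer<String> logIt = s -> log.add(s);                             // T -> void
    BiConsumer<String, Integer> logPair = (s, n) -> log.add(s + "=" + n); // (T, U) -> void
    Supplier<Double> zero = () -> 0.0;                                    // () -> T
    Function<Double, String> format = p -> p + " pts";                    // T -> R
    BiFunction<Double, Double, Double> add = (a, b) -> a + b;             // (T, U) -> R

    logIt.accept("start");                    // Consumer: accept
    logPair.accept("count", 2);               // BiConsumer: accept
    double sum = add.apply(zero.get(), 12.5); // Supplier: get, BiFunction: apply
    logIt.accept(format.apply(sum));          // Function: apply
    return log;
  }

  public static void main(String[] args) {
    System.out.println(run()); // prints [start, count=2, 12.5 pts]
  }
}
```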

Whenever Java encounters a lambda function (or a method reference, which we will see shortly) in a place that expects one of the above functional interfaces, it automatically creates an object implementing that interface for us.
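One way to picture what Java does for us: the sketch below builds the same kind of Consumer<Double> twice, once by hand as an anonymous class and once as a lambda. The compiled form of a lambda is not literally an anonymous class, but the resulting object behaves the same way. The class name and output format are made up for illustration.

```java
import java.util.function.Consumer;

// Hypothetical comparison of a hand-written Consumer and a lambda.
class LambdaVsAnonymousClass {
  static String demo() {
    StringBuilder out = new StringBuilder();

    // Written by hand: an anonymous class implementing Consumer<Double>.
    Consumer<Double> byHand = new Consumer<Double>() {
      @Override
      public void accept(Double p) { out.append("hand:").append(p).append(' '); }
    };

    // The lambda form: Java creates an equivalent Consumer<Double> object for us.
    Consumer<Double> byLambda = p -> out.append("lambda:").append(p).append(' ');

    byHand.accept(1.5);
    byLambda.accept(2.5);
    return out.toString().trim(); // "hand:1.5 lambda:2.5"
  }
}
```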

Before moving on, let’s discuss one more item that often comes up. Notice that in our example above, the function we provided had this form:

p -> total.add(p)

The function did nothing more than call another method with the same parameter (and return its result, if there was one). In this case there is a way to skip the middle-man, so to speak, and provide the add method directly. This is called a method reference. There are a few different forms of method reference, but their general shape is xx::yy, where xx is a class or an object, and yy is a method name. In our case we would write total::add, to indicate the add method of the total object. We can then write our code thus:

double getTotalPoints() {
  Total total = new Total();
  for (Grade g : grades) {
    g.doIfCountsForCredit(total::add);
  }
  return total.getTotal();
}

At this point we could probably nicely write the whole for loop in one line:

double getTotalPoints() {
  Total total = new Total();
  for (Grade g : grades) g.doIfCountsForCredit(total::add);
  return total.getTotal();
}

It’s up to you to decide if you are OK with that or not. In any case, passing the total::add reference in reads nicely, once you get a bit used to it.
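For reference, here is a sketch of the main method reference forms, each next to the lambda it stands for. Only total::add comes from the text; the other references (Double::sum, String::length, ArrayList::new) are standard library examples chosen for illustration, and the class name and sample values are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical tour of the four method reference forms.
class MethodReferenceForms {
  static class Total {
    private double total = 0.0;
    void add(double points) { total += points; }
    double getTotal() { return total; }
  }

  static String demo() {
    Total total = new Total();

    // object::method, bound to one specific object
    Consumer<Double> addToTotal = total::add;              // same as p -> total.add(p)

    // Class::staticMethod
    BiFunction<Double, Double, Double> plus = Double::sum; // same as (a, b) -> Double.sum(a, b)

    // Class::instanceMethod, where the first argument becomes the receiver
    Function<String, Integer> length = String::length;     // same as s -> s.length()

    // Class::new, a constructor reference
    Supplier<List<String>> newList = ArrayList::new;       // same as () -> new ArrayList<>()

    addToTotal.accept(plus.apply(1.5, 2.5)); // total is now 4.0
    List<String> names = newList.get();
    names.add("Calculus");
    return total.getTotal() + " " + length.apply(names.get(0)); // "4.0 8"
  }
}
```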

Subsection 7.2.2 Collections, loops, forEach

The above piece of code is perhaps a good opportunity to discuss another application of the tell-don’t-ask principle, related to the collections classes.

So in the example above we used a for loop, in which we told each grade what to do if it counts for credit. That is nice enough, but there is a problem: the use of the for loop in the first place.

So let’s discuss for a minute the very nature of using a for loop. In some sense we are intruding upon the job of the List interface, and more generally of the Collection interface. These interfaces are perfectly capable of performing some task over all their elements, thank you very much; they don’t need you taking that process over with your own for loop. In fact, both of them provide a wonderful forEach method (inherited from the Iterable interface). All we need to do is tell them what to do for each element, and we can now do that with a lambda function or a method reference. Here’s what this example would look like with that in mind:

double getTotalPoints() {
  Total total = new Total();
  grades.forEach(g -> g.doIfCountsForCredit(total::add));
  return total.getTotal();
}

I’ll grant you that this may look just as complicated, depending on your comfort level with lambda functions. But it has one important feature: it puts the collection in charge of the iteration, as it should be. This is again the tell-don’t-ask principle: let the collection take care of the iteration, while you focus on what to do with the individual elements.
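Here is the forEach version as a complete, runnable sketch. As before, the Grade class is a hypothetical minimal stand-in for the one in Listing 7.1.1, with an assumed constructor and fields.

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical minimal Grade, assumed for this sketch.
class Grade {
  private final double points;
  private final boolean forCredit;

  Grade(double points, boolean forCredit) {
    this.points = points;
    this.forCredit = forCredit;
  }

  boolean countsForCredit() { return forCredit; }
  double getPoints() { return points; }

  void doIfCountsForCredit(Consumer<Double> f) {
    if (countsForCredit()) f.accept(getPoints());
  }
}

class Total {
  private double total = 0.0;
  void add(double points) { total += points; }
  double getTotal() { return total; }
}

class GradeCollection {
  private final List<Grade> grades;

  GradeCollection(List<Grade> grades) { this.grades = grades; }

  double getTotalPoints() {
    Total total = new Total();
    // The list runs the iteration; we only say what to do with each grade,
    // and each grade decides whether to pass its points to total::add.
    grades.forEach(g -> g.doIfCountsForCredit(total::add));
    return total.getTotal();
  }
}
```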

Subsection 7.2.3 Streams, stream-processing methods

This is still a bit unsatisfactory; let me try to explain why. For loops fall into a number of common patterns, and simply writing a for loop, or simply calling the forEach method, doesn’t make the pattern immediately visible. Here are some example patterns:

Perform some step on each element
Accumulate some value over all the elements
Filter the list of values by eliminating some that should not be considered
Transform each value of the list into a new value
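To make the four patterns concrete, here is a sketch with one classic for loop per pattern, over a list of point values. The class name and the passing threshold of 50.0 are made up for illustration. Notice that the loops look alike on the surface; you have to read each body to tell which pattern it is.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example: one plain loop for each of the four patterns.
class LoopPatterns {
  static String demo(List<Double> points) {
    // 1. Perform some step on each element
    StringBuilder log = new StringBuilder();
    for (double p : points) { log.append(p).append(';'); }

    // 2. Accumulate a value over all the elements
    double sum = 0.0;
    for (double p : points) { sum += p; }

    // 3. Filter: keep only the elements that should be considered
    List<Double> passing = new ArrayList<>();
    for (double p : points) { if (p >= 50.0) passing.add(p); }

    // 4. Transform each element into a new value
    List<Double> percents = new ArrayList<>();
    for (double p : points) { percents.add(p / 100.0); }

    return log + " sum=" + sum + " passing=" + passing + " percents=" + percents;
  }
}
```

For the list 90.0, 45.0, 80.0 this returns "90.0;45.0;80.0; sum=215.0 passing=[90.0, 80.0] percents=[0.9, 0.45, 0.8]".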

Take a moment to recall various loops you have written in your life, and consider which of these patterns they fall under. They may actually combine more than one pattern.

These patterns, and combinations of them, cover most of our use cases for for loops. But when we use a for loop we make no distinction between them: a reader has to study more of the loop’s structure to tell which case it is. And this is important: loops don’t accurately communicate the code’s intent. In our example, for instance, what we have is really the second case in the list (perhaps combined with the third, depending on whether you consider the "counts for credit" conditional a filter); but we express it in the first form, by using the forEach method.

You are already familiar with a similar concept, even though you perhaps did not think of it this way. The for loop itself is another example of this principle of "communicating intent". Do we need for loops, or could we just use while loops instead? Of course we could (and if you don’t know how, you should pause your reading right now and write a while loop that does the job of a for loop); but we don’t, because a for loop better expresses the idea of an iteration with a clear and fixed number of steps, while a while loop is reserved for more complex iteration patterns. We use the syntax structure that fits the job.
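If you want to check your answer to that exercise, here is one possible sketch (the class and method names are made up). Both methods add up 1 through n; the while version has to spread the initialization, condition, and update over three places, which is exactly the intent that the for header gathers into one line.

```java
// Hypothetical example: the same counting loop written with for and with while.
class ForVersusWhile {
  static int sumFor(int n) {
    int sum = 0;
    for (int i = 1; i <= n; i++) { sum += i; } // init; condition; update in one header
    return sum;
  }

  static int sumWhile(int n) {
    int sum = 0;
    int i = 1;       // initialization, moved before the loop
    while (i <= n) { // condition
      sum += i;
      i++;           // update, moved to the end of the body
    }
    return sum;
  }
}
```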

Choose code structures that best communicate your intent.

What I am talking about is the next step in this evolution: using different syntax for the different kinds of for loops. This is very common in functional programming languages, where these kinds go under the following names:

iterate, which is what forEach is for.
reduce or collect are other names for accumulate; they suggest reducing the whole list of values into one (e.g. summing them), or collecting all the values into one whole in some way.
filter where we want to restrict our focus to a subset of the elements.
map when we want to transform each element to a new form in a prescribed way.
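As a preview of how these four names look in Java (the Stream type and the stream() method are discussed in the paragraph that follows), here is a sketch that applies iterate, reduce, filter, and map to a list of point values. The class name and the passing threshold of 50.0 are made up for illustration; Collectors.toList is used to gather the filtered and mapped values back into lists.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical example: the four patterns as named stream operations.
class StreamPatterns {
  static String demo(List<Double> points) {
    // iterate: perform a step on each element
    StringBuilder log = new StringBuilder();
    points.stream().forEach(p -> log.append(p).append(';'));

    // reduce: accumulate all elements into one value
    double sum = points.stream().reduce(0.0, Double::sum);

    // filter: keep only the elements that should be considered
    List<Double> passing = points.stream()
        .filter(p -> p >= 50.0)
        .collect(Collectors.toList());

    // map: transform each element into a new value
    List<Double> percents = points.stream()
        .map(p -> p / 100.0)
        .collect(Collectors.toList());

    return log + " sum=" + sum + " passing=" + passing + " percents=" + percents;
  }
}
```

Here the name of each operation announces the pattern, so you no longer have to read the loop body to find it.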

Unfortunately, lists don’t provide all this functionality. Luckily for us, another structure, the Stream, does, and Java has offered streams since version 8. I will not be doing streams justice by saying this, but for the purposes of this book you can basically think of streams like lists. In fact, we can get a stream from a list by calling its stream() method. Streams then let us chain operations back to back until we reach the desired result. Here’s what this might look like in our example, returning for the moment to an earlier version that didn’t use the Total helper class:

double getTotalPoints() {
  return grades.stream()
      .filter(g -> g.countsForCredit())
      .map(g -> g.getPoints())
      .reduce(0.0, (t, p) -> t + p);
}

Let’s explain this a bit. We are using method chaining, which is something you need to get used to, especially when working with streams. The idea is that we have a sequence of method calls, each called on the result of the previous one: a.do1(...).do2(...).do3(...)

In general you want to avoid such deep chains, because they suggest a violation of the tell-don’t-ask principle: we pry more and more into other objects’ business (depending on a friend of a friend of a friend of a friend ...). There is a code smell for this, called a train wreck. There is one exception, however: when we are effectively describing a sequence of steps to be performed on a stream, as in the example above. Such a sequence is often referred to as a pipeline.

Before we adjourn, let’s clean the above up a bit, using method references:

double getTotalPoints() {
  return grades.stream()
    .filter(Grade::countsForCredit)
    .map(Grade::getPoints)
    .reduce(0.0, Double::sum);
}

Nice and clean, once you get used to it! It clearly lays out the steps. And here is a variant that uses mapToDouble to turn the grades into a DoubleStream, whose sum method adds up the values. Streams have lots of such handy features; we could probably spend whole chapters on them.

double getTotalPoints() {
  return grades.stream()
    .filter(Grade::countsForCredit)
    .mapToDouble(Grade::getPoints)
    .sum();
}
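For experimenting, here are the three pipeline versions from this subsection in one runnable sketch, so you can check that they agree. The Grade class is again a hypothetical minimal stand-in with an assumed constructor and fields, and the method names totalWithLambdas, totalWithMethodRefs, and totalWithDoubleStream are made up so that all three versions fit in one class.

```java
import java.util.List;

// Hypothetical minimal Grade, assumed for this sketch.
class Grade {
  private final double points;
  private final boolean forCredit;

  Grade(double points, boolean forCredit) {
    this.points = points;
    this.forCredit = forCredit;
  }

  boolean countsForCredit() { return forCredit; }
  double getPoints() { return points; }
}

class GradeCollection {
  private final List<Grade> grades;

  GradeCollection(List<Grade> grades) { this.grades = grades; }

  // Pipeline with lambdas: filter, then map, then reduce.
  double totalWithLambdas() {
    return grades.stream()
        .filter(g -> g.countsForCredit())
        .map(g -> g.getPoints())
        .reduce(0.0, (t, p) -> t + p);
  }

  // The same pipeline with method references.
  double totalWithMethodRefs() {
    return grades.stream()
        .filter(Grade::countsForCredit)
        .map(Grade::getPoints)
        .reduce(0.0, Double::sum);
  }

  // DoubleStream variant: mapToDouble yields primitive doubles, and sum() does the reduce.
  double totalWithDoubleStream() {
    return grades.stream()
        .filter(Grade::countsForCredit)
        .mapToDouble(Grade::getPoints)
        .sum();
  }
}
```

With grades worth 90.0 (for credit), 75.0 (not for credit) and 80.0 (for credit), all three return 170.0. One small caveat: the documentation for DoubleStream.sum allows it to use a more careful summation technique, so on other data its result can differ from the reduce versions in the last few bits.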
