Based on Section 12.5 of the book.
Goal: Create an index for a document: makeIndex :: Doc -> Index
Question: What types should Doc and Index have?
It makes sense to decompose makeIndex into a “pipeline” of steps:
makeIndex :: Doc -> Index
makeIndex
  = lines
  >.> numberLines
  >.> allNumberedWords
  >.> sortWords
  >.> intsToLists
  >.> groupByWord
  >.> eliminateSmallWords
lines takes the document and splits it into a list of lines.
numberLines takes the list of lines and adds line numbers to them, forming pairs.
allNumberedWords replaces each numbered line with a list of number-word pairs.
sortWords reorders the list of number-word pairs by word.
intsToLists turns each integer into a one-element list.
groupByWord puts together those lists corresponding to the same word.
eliminateSmallWords eliminates all words of length at most 4.

What should be the types for each of these intermediate functions?
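
The pipeline is written with >.>, which is not part of the standard Prelude. A minimal sketch, assuming >.> means forward composition (apply the left function first), together with one possible choice for Doc and Index. The type synonyms and the helper countLines are assumptions made for illustration, not the only reasonable answer to the question above.

```haskell
-- Assumed definition: forward composition, so (f >.> g) x == g (f x).
infixl 9 >.>
(>.>) :: (a -> b) -> (b -> c) -> (a -> c)
f >.> g = g . f

-- One possible choice of types (an assumption, other choices work too).
type Doc   = String               -- the whole document as one string
type Index = [([Int], String)]    -- line numbers paired with a word

-- Hypothetical helper showing the pipeline style on a tiny example.
countLines :: Doc -> Int
countLines = lines >.> length
```

With these definitions, each stage of makeIndex has to accept exactly the type the previous stage produces, which is a useful check when choosing the intermediate types.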
lines is a built-in function. What is its type? Does that match our usage of it?
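
As a quick check, the Prelude gives lines the type String -> [String]. The small sketch below shows its behaviour on a sample string; the name exampleLines is made up for illustration.

```haskell
-- Prelude:  lines :: String -> [String]
-- It breaks a string at newline characters and drops the newlines.
exampleLines :: [String]
exampleLines = lines "first line\nsecond line\n"
-- exampleLines == ["first line", "second line"]
```

So lines fits as the first stage of the pipeline as long as Doc is String (or a synonym for it).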
numberLines is supposed to replace each line with the pair of an increasing number and the line. How can we implement that using list functions?
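
One way to do this, as a sketch that assumes numbering starts at 1 and that lines are plain strings, is to zip the lines with the infinite list [1 ..]:

```haskell
-- Pair each line with its line number, counting from 1 (an assumption).
numberLines :: [String] -> [(Int, String)]
numberLines ls = zip [1 ..] ls
```

zip stops at the end of the shorter list, so the infinite list of numbers is cut off after the last line.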
allNumberedWords is supposed to take each line and split it into a list of words, then put those words together with the line’s number. We can split this into steps; write each step.
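
A possible decomposition, sketched under the assumption that the Prelude function words is good enough for splitting (it splits on whitespace only, so punctuation stays attached to words). The helper name numberedWords is hypothetical.

```haskell
-- Step 1 and 2: split one line into words and pair each word
-- with that line's number.
numberedWords :: (Int, String) -> [(Int, String)]
numberedWords (n, line) = [ (n, w) | w <- words line ]

-- Step 3: do this for every numbered line and concatenate the results.
allNumberedWords :: [(Int, String)] -> [(Int, String)]
allNumberedWords = concatMap numberedWords
```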
sortWords sorts the list based on the word comparison.
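
A minimal sketch using the library function sortBy rather than a hand-written sort (the notes may intend the latter). It compares only the word in each pair. Because sortBy is a stable sort, entries for the same word keep their original order, so their line numbers stay in increasing order.

```haskell
import Data.List (sortBy)
import Data.Ord (comparing)

-- Sort number-word pairs by the word; the sort is stable, so pairs
-- with equal words keep their relative (line-number) order.
sortWords :: [(Int, String)] -> [(Int, String)]
sortWords = sortBy (comparing snd)
```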
intsToLists turns each integer into a 1-element list. You can do this via a map.
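
For instance, a sketch assuming the input is a list of number-word pairs:

```haskell
-- Wrap each line number in a singleton list, leaving the word unchanged.
intsToLists :: [(Int, String)] -> [([Int], String)]
intsToLists = map (\(n, w) -> ([n], w))
```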
groupByWord needs to put together the lists corresponding to the same word. You need to actually write a function for that one, with a case for two consecutive elements having the same word.
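
One possible recursive definition, as a sketch that assumes the input is already sorted by word so that equal words are adjacent. It does not remove repeated line numbers when a word occurs twice on the same line.

```haskell
-- Merge consecutive entries that share the same word.
groupByWord :: [([Int], String)] -> [([Int], String)]
groupByWord ((ns1, w1) : (ns2, w2) : rest)
  -- Same word: combine the two number lists and keep going.
  | w1 == w2  = groupByWord ((ns1 ++ ns2, w1) : rest)
  -- Different words: the first entry is finished.
  | otherwise = (ns1, w1) : groupByWord ((ns2, w2) : rest)
-- Zero or one entry: nothing left to merge.
groupByWord short = short
```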
eliminateSmallWords is a simple filter.
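
A sketch of that filter, reading "length at most 4" as the words to drop, so only words of length greater than 4 are kept:

```haskell
-- Keep only entries whose word has more than 4 characters.
eliminateSmallWords :: [([Int], String)] -> [([Int], String)]
eliminateSmallWords = filter (\(_, w) -> length w > 4)
```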