4. Scheduling with Deadlines
This write up is borrowed (with minor adaptations) from Chapter E of Jeff Erickson’s textbook on Algorithms.
Suppose you have tasks to complete in days; each task requires your attention for a full day. Each task comes with a deadline, the last day by which the job should be completed, and a penalty that you must pay if you do not complete each task by its assigned deadline. What order should you perform your tasks in to minimize the total penalty you must pay?
More formally, you are given:
- an array of deadlines and
- an array of penalties.
Each deadline is an integer between 1 and , and each penalty is a non-negative real number. A schedule is a permutation of the integers .
The scheduling problem asks you to find a schedule that minimizes the following cost:
This doesn’t look anything like a matroid optimization problem. For one thing, matroid optimization problems ask us to find an optimal set; this problem asks us to find an optimal permutation. Surprisingly, however, this scheduling problem is actually a matroid optimization in disguise!
For any schedule , call tasks such that late or delayed, and all other tasks on time.
To see this, consider that two schedules are equivalent from a cost perspective if they delay the same set of tasks. In particular, let be some subset of tasks. Suppose we have a schedule that delays all tasks in and finishes all tasks that are not in on time, and another schedule that also delays all tasks in and finishes all tasks that are not in on time. Observe that . Intuitively, we might as well turn our attention to subsets of completed tasks as opposed to the sequence in which we execute them. We summarize our discussion here in the following claim:
The cost of a schedule is determined by the subset of tasks that are on time.
Call a subset of the tasks realistic if there is a schedule in which every task in is on time. We can precisely characterize the realistic subsets as follows. Let denote the subset of tasks in whose deadline is on before :
In particular, and .
Lemma. Let be an arbitrary subset of the tasks. is realistic if and only if for every integer .
Proof. Let be a schedule in which every task in is on time. Let be the th task in to be completed. On the one hand, we have , since otherwise, we could not have completed other jobs in before . On the other hand, , because is on time. We conclude that , which immediately implies that .
Now suppose for every integer . If we perform the tasks in in increasing order of deadline, then we complete all tasks in with deadlines or less by day . In particular, for any , we perform task on or before its deadline . Thus, is realistic.
We can now define a canonical schedule for any set as follows: execute the tasks in in increasing deadline order, and then execute the remaining tasks in any order. The previous proof implies that a set is realistic if and only if every task in is on time in the canonical schedule for . Thus, our scheduling problem can be rephrased as follows:
So we’re looking for optimal subsets after all! Now we could describe a greedy algorithm and show that it works out, or…
Lemma. The collection of realistic sets of jobs forms a matroid.
Proof: The empty set is vacuously realistic, and any subset of a realistic set is clearly realistic. Thus, to prove the lemma, it suffices to show that the exchange property holds. Let and be realistic sets of jobs with .
Consider keeping two dated notebooks, where the cover page is day 0, and the last page is day . In a thought experiment, let us go over the days one by one, starting from the 0-th day, and on each day, we make a note of the number of tasks in on the first notebook, and the number of tasks in on the second notebook — and we keep doing this until we observe that the number on the first notebook is bigger than the second. Notice that this has to happen at some point, since and both notebooks start at zero.
Now, let be the smallest integer for which . Notice that this means that there is at least one task in with a deadline of which is not in . Pick any one such — say — for inclusion in : we claim that is realistic. Indeed, on any day that is before , we can appeal to the fact that was realistic up to day and we have introduced no new tasks in that have a deadline of or earlier. On the fated day , the number of tasks we have in is at most the number of tasks we have in up to day , and since that is at most , we are done.
This lemma implies that our scheduling problem is actually a matroid optimization problem, so the greedy algorithm finds the optimal schedule.
1 . . n], P[1 . . n]):
GREEDYSCHEDULE(D[in increasing order, and permute D to match
Sort P 0
j = for i in 1,n:
1] = i
X[j+if X[1,...,j+1] is realistic
1
j = j+for X[1...j] return the canonical schedule
We leave as an exercise to the reader to figure out an implementation of the oracle access to the family of realistic sets. In particular, this involves coming up an efficient procedure to test whether a given subset of jobs is realistic. It is possible to do this in linear time based on the properties of realistic sets established from before.