4. Scheduling with Deadlines
Suppose you have n tasks to complete in n days; each task requires your attention for a full day. Each task comes with a deadline, the last day by which the job should be completed, and a penalty that you must pay if you do not complete that task by its deadline. In what order should you perform your tasks to minimize the total penalty you must pay?
More formally, you are given:
- an array D[1,...,n] of deadlines and
- an array P[1,...,n] of penalties.
Each deadline D[i] is an integer between 1 and n, and each penalty P[i] is a non-negative real number. A schedule is a permutation of the integers \{1,2, \ldots, n\}.
The scheduling problem asks you to find a schedule \pi that minimizes the following cost: \operatorname{cost}(\pi):=\sum_{i=1}^n P[i] \cdot[\pi(i)>D[i]].
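To make the cost function concrete, here is a small worked instance (the specific deadlines and penalties are invented for illustration); the Iverson bracket [\pi(i)>D[i]] becomes an ordinary comparison:

```python
# Hypothetical 3-task instance: task i has deadline D[i] and penalty P[i].
# Tasks 1 and 3 both have deadline 1, so at least one of them must be late.
D = {1: 1, 2: 2, 3: 1}
P = {1: 10.0, 2: 5.0, 3: 3.0}

def cost(pi, D, P):
    """Total penalty of late tasks, where pi[i] is the day task i runs."""
    return sum(P[i] for i in pi if pi[i] > D[i])

# Run task 1 on day 1, task 2 on day 2, task 3 on day 3:
print(cost({1: 1, 2: 2, 3: 3}, D, P))  # only task 3 is late: cost 3.0
# Run task 3 first instead, delaying task 1:
print(cost({1: 3, 2: 2, 3: 1}, D, P))  # only task 1 is late: cost 10.0
```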
This doesn’t look anything like a matroid optimization problem. For one thing, matroid optimization problems ask us to find an optimal set; this problem asks us to find an optimal permutation. Surprisingly, however, this scheduling problem is actually a matroid optimization in disguise!
For any schedule \pi, call tasks i such that \pi(i)>D[i] late or delayed, and all other tasks on time.
To see this, observe that two schedules are equivalent from a cost perspective if they delay the same set of tasks. In particular, let S \subseteq [n] be any subset of tasks, and let \pi_1 and \pi_2 be two schedules that each delay exactly the tasks in S and finish every other task on time. Then \operatorname{cost}(\pi_1)=\operatorname{cost}(\pi_2)=\sum_{i \in S} P[i]. Intuitively, we might as well turn our attention to subsets of on-time tasks rather than the sequence in which we execute them. We summarize our discussion in the following claim: the cost of a schedule depends only on its set of delayed tasks.
Call a subset X of the tasks realistic if there is a schedule \pi in which every task in X is on time. We can precisely characterize the realistic subsets as follows. Let X(t) denote the subset of tasks in X whose deadline is on or before day t:
X(t):=\{i \in X \mid D[i] \leq t\}
In particular, X(0)=\varnothing and X(n)=X. The key observation is that X is realistic if and only if |X(t)| \leq t for every integer t. On the one hand, all |X(t)| tasks in X(t) must be executed during the first t days, so if |X(t)|>t, some task in X(t) must be late. On the other hand, if the inequality holds for every t, executing the tasks in X in increasing deadline order finishes every task in X on time.
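The counting condition over all values of t can be checked directly. A minimal sketch, assuming tasks are identified by the integers 1 through n and deadlines are given as a mapping D (the function name is ours, not from the text):

```python
def is_realistic(X, D, n):
    """Return True iff some schedule finishes every task in X on time.

    By the characterization above, X is realistic iff |X(t)| <= t for
    every t, where X(t) = {i in X : D[i] <= t}.
    """
    for t in range(1, n + 1):
        if sum(1 for i in X if D[i] <= t) > t:
            return False
    return True
```

For example, with deadlines D = {1: 1, 2: 2, 3: 1}, the set \{1,2\} is realistic, but \{1,3\} is not, since both of its tasks compete for day 1.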
We can now define a canonical schedule for any set X as follows: execute the tasks in X in increasing deadline order, and then execute the remaining tasks in any order. The previous argument implies that a set X is realistic if and only if every task in X is on time in the canonical schedule for X. Thus, our scheduling problem can be rephrased as follows: find a realistic subset X of tasks that maximizes the total penalty \sum_{i \in X} P[i] of the tasks in X, or equivalently, minimizes the total penalty of the remaining tasks.
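The canonical schedule is easy to construct. A sketch, using the same representation as above (the helper name and the choice to return a task-to-day mapping are ours):

```python
def canonical_schedule(X, D, n):
    """Run the tasks of X in increasing deadline order on days 1, 2, ...,
    then fill the remaining days with the other tasks in arbitrary order."""
    order = sorted(X, key=lambda i: D[i])                  # X, earliest deadline first
    order += [i for i in range(1, n + 1) if i not in X]    # then everything else
    return {task: day for day, task in enumerate(order, start=1)}
```

With D = {1: 1, 2: 2, 3: 1}, the canonical schedule for X = \{1, 2\} runs tasks 1, 2, 3 on days 1, 2, 3; both tasks in X are on time, confirming that X is realistic.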
So we’re looking for optimal subsets after all! Now we could describe a greedy algorithm and show that it works out, or…
In fact, the family of realistic sets forms a matroid: every subset of a realistic set is clearly realistic, and the exchange property can be verified using the counting characterization |X(t)| \leq t. This implies that our scheduling problem is actually a matroid optimization problem, so the greedy algorithm finds the optimal schedule.
GREEDYSCHEDULE(D[1..n], P[1..n]):
    Sort P in decreasing order, and permute D to match
    j = 0
    for i in 1..n:
        X[j+1] = i
        if X[1..j+1] is realistic
            j = j+1
    return the canonical schedule for X[1..j]
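Putting the pieces together, a direct (and deliberately unoptimized) Python transcription of GREEDYSCHEDULE might look like this; the inner realism test and the final canonical-schedule construction are the obvious implementations of the definitions above, not code from the text:

```python
def greedy_schedule(D, P, n):
    """Greedily build a maximum-penalty realistic set X, then return
    the canonical schedule for X as a mapping from task to day."""

    def realistic(X):
        # X is realistic iff at most t of its tasks have deadline <= t, for all t.
        return all(sum(1 for i in X if D[i] <= t) <= t for t in range(1, n + 1))

    X = set()
    # Consider tasks in decreasing penalty order, as in the matroid greedy algorithm.
    for i in sorted(range(1, n + 1), key=lambda i: P[i], reverse=True):
        if realistic(X | {i}):       # oracle call: can X + task i still be on time?
            X.add(i)

    # Canonical schedule: tasks in X in increasing deadline order, then the rest.
    order = sorted(X, key=lambda i: D[i]) + [i for i in range(1, n + 1) if i not in X]
    return {task: day for day, task in enumerate(order, start=1)}

# Hypothetical instance: tasks 1 and 3 compete for day 1, so one must be late.
D = {1: 1, 2: 2, 3: 1}
P = {1: 10.0, 2: 5.0, 3: 3.0}
print(greedy_schedule(D, P, 3))  # delays only task 3, the cheapest of the two
```

This naive version spends O(n) time per oracle call; the exercise below asks for something better, but the greedy structure is unchanged.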
We leave it as an exercise for the reader to figure out how to implement oracle access to the family of realistic sets. In particular, this involves coming up with an efficient procedure to test whether a given subset of jobs is realistic. It is possible to do this in linear time using the properties of realistic sets established above.
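As a hint toward the exercise, one possible approach (a sketch, not the only solution) is to bucket the tasks of X by deadline and verify the prefix counts |X(t)| \leq t in a single pass:

```python
def is_realistic_linear(X, D, n):
    """O(n) realism test: X is realistic iff |X(t)| <= t for every t."""
    count = [0] * (n + 1)       # count[d] = number of tasks in X with deadline d
    for i in X:
        count[D[i]] += 1
    seen = 0
    for t in range(1, n + 1):
        seen += count[t]        # seen is now |X(t)|
        if seen > t:
            return False
    return True
```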