Using it with PuLP¶
Copied from pulp
:
PuLP is a free open source software written in Python. It is used to describe optimisation problems as mathematical models. PuLP can then call any of numerous external LP solvers (CBC, GLPK, CPLEX, Gurobi etc) to solve this model and then use python commands to manipulate and display the solution.
Nomenclature¶
- Variable declaration¶
A group of decision variables that share the same naming and significance. In PuLP they are all created within a single call to
pulp.LpVariable.dicts()
and they are all accessed by slicing the created python variable as a dictionary. An optimization problem can have one variable declaration but thousands of variables.- Variable domain¶
The complete space over which variables in a single variable declaration are created. The length of the domain is the number of variables in the declaration. Each variable declaration has its own domain and is indicated with the argument index when calling the function
pulp.LpVariable.dicts()
. It usually consists of some kind of iterator where all its elements should be unique.
When using pulp
, I use the following guidelines:
Use
pulp.LpVariable.dicts()
for variable declaration, unless there is a very good reason not to.Use a list of tuples to declare the variable domain.
Use pytups to index parts of the domain for clean, pretty and fast constraint generation.
Example: Scheduling problem¶
This example was taken with permission from the Naval Postgraduate School course Mathematical Programming given by Matt Carlyle in 2019.
Assume we are given a set of jobs \(j\) with processing times \(p_j\). For a single machine non-preemptive scheduling problem (which we assume here), a schedule of the jobs \(N\) is given by specifying a start time, \(Y_j\) , for each job. If job j starts at time \(Y_j\), it is processing during the time interval \([Y_j , Y_j + p_j ]\). A schedule is feasible if no two jobs have overlapping processing times.
This means that, for every pair of jobs \(i\) and \(j\), we have that either \(Y_j \geq Y_i +p_i\) , or \(Y_i \geq Y_j +p_j\).
Job \(j\) will complete at time \(C_j =(Y_j + p_j )\), and if it also has an associated due date, \(d_j\) , its tardiness, \(T_j\) , is defined to be:
If we are given a set of weights, \(w_j\) , which indicate a relative importance between individual jobs, we can calculate the total weighted tardiness of a given feasible schedule:
The objective is to schedule all the jobs without overlapping them and minimizing the total weighted tardiness \(TWT\).
Mathematical formulation¶
The following is just one of many possible formulations to solve this problem.
SETS
\(j \in J\) |
jobs |
\(k \in K = \{1, ..., C_{max}\}\) |
periods |
DATA [UNITS]
\(p_j\) |
processing time of job j [periods] |
\(d_j\) |
due date of job j [period] |
\(w_j\) |
weight or priority of job j [dimensionless] |
DERIVED DATA
\(C_{max}\) |
Size of planning horizon. |
\(JK\) |
Al possible starting combinations (j, k) such that job \(j \in J\) can start in period \(k \in K\). |
\(K_j\) |
Al possible starting periods for job \(j \in J\). |
\(t_{jk}\) |
Penalty for starting job \(j \in J\) in period \(k \in K\). |
\(K2_{jk}\) |
Periods that become unavailable by starting job \(j\) in period \(k \in K\). |
\(JK_{k2}\) |
All possible starts of jobs \((j, k) \in JK\) that make period \(k2 \in K\) unavailable. |
DECISION VARIABLES
\(X_{jk}\). Binary. 1 if job \(j\) starts at time slot \(k\), 0 otherwise. \((j, k) \in JK\)
FORMULATION
Subject to:
Implementation using pulp and pytups¶
We import libraries and get input data.
"""
Example taken from the NPS course Mathematical Programming from Matt Carlyle in 2019.
"""
import pulp as pl
import pytups as pt
We then calculate intermediate sets and parameters. Note the use of pytups functions to filter and convert from tuple lists to dictionaries.
duration = {
1: 8,
2: 12,
3: 6,
4: 13,
5: 2,
6: 12,
7: 15,
8: 20,
9: 16,
10: 16,
11: 10,
12: 14,
13: 17,
14: 3,
15: 20,
16: 19,
17: 13,
18: 6,
19: 18,
20: 11,
}
priority = {
1: 4,
2: 5,
3: 6,
4: 8,
5: 4,
6: 2,
We now create the PuLP model and solve it.
9: 1,
10: 8,
11: 8,
12: 4,
13: 4,
14: 8,
15: 9,
16: 11,
17: 6,
18: 10,
19: 6,
20: 9,
}
duedate = {
1: 52,
2: 80,
3: 133,
4: 150,
5: 53,
6: 133,
7: 113,
We get the solution from the variable contents. Not how we also use pytups to extract the content from the variable.
10: 133,
11: 77,
12: 126,
13: 117,
14: 72,
15: 111,
16: 111,
17: 117,
18: 69,
Finally, we do some tests on the solution using pytups to guarantee the solution is feasible:
# Get intermediate paramters, sets.
C_max = sum(duration.values())
periods = range(C_max)
tasks = duration.keys()
# all legal combinations of task-period assignment
jk_all = pt.TupList((t, p) for t in tasks for p in periods)
The whole code is shown below:
"""
Example taken from the NPS course Mathematical Programming from Matt Carlyle in 2019.
"""
import pulp as pl
import pytups as pt
# Get input data
duration = {
1: 8,
2: 12,
3: 6,
4: 13,
5: 2,
6: 12,
7: 15,
8: 20,
9: 16,
10: 16,
11: 10,
12: 14,
13: 17,
14: 3,
15: 20,
16: 19,
17: 13,
18: 6,
19: 18,
20: 11,
}
priority = {
1: 4,
2: 5,
3: 6,
4: 8,
5: 4,
6: 2,
7: 5,
8: 6,
9: 1,
10: 8,
11: 8,
12: 4,
13: 4,
14: 8,
15: 9,
16: 11,
17: 6,
18: 10,
19: 6,
20: 9,
}
duedate = {
1: 52,
2: 80,
3: 133,
4: 150,
5: 53,
6: 133,
7: 113,
8: 107,
9: 75,
10: 133,
11: 77,
12: 126,
13: 117,
14: 72,
15: 111,
16: 111,
17: 117,
18: 69,
19: 115,
20: 68,
}
# Get intermediate paramters, sets.
C_max = sum(duration.values())
periods = range(C_max)
tasks = duration.keys()
# all legal combinations of task-period assignment
jk_all = pt.TupList((t, p) for t in tasks for p in periods)
# we filter the starts that are too late to be possible:
JK = jk_all.vfilter(lambda x: x[1] + duration[x[0]] <= C_max)
# we create a set of tasks that can start at time period k
K_j = JK.to_dict(result_col=1)
# all combinations (t, p, p2) such that I start a task j
# in time period k and is active in period k2
jkk2 = pt.TupList((j, k, k2) for j, k in JK for k2 in range(k, k + duration[j]))
# given a period k2, what starts affect make it unavailable:
JK_k2 = jkk2.to_dict(result_col=[0, 1])
# given a start (j, k), what periods k2 are made unavailable:
K2_jk = jkk2.to_dict(result_col=2)
# for each possible start: how late will the task be
t_jk = {(j, k): max(duration[j] + k - 1 - duedate[j], 0) * priority[j] for j, k in JK}
# model construction with PuLP
model = pl.LpProblem("Scheduling", pl.LpMinimize)
X = pl.LpVariable.dicts(
name="start", indexs=JK, lowBound=0, upBound=1, cat=pl.LpInteger
)
# objective function:
model += pl.lpSum(X[j, k] * t_jk[j, k] for j, k in JK)
# one and only one start per task
for j in tasks:
model += pl.lpSum(X[j, k] for k in K_j[j]) == 1
# only one task is active at any moment:
for k2 in periods:
model += pl.lpSum(X[j, k] for j, k in JK_k2[k2]) == 1
# solve model:
solver = pl.PULP_CBC_CMD(msg=True)
# solver = pl.CPLEX_CMD(msg=True)
model.solve(solver)
# get tasks starts
starts_out = pt.SuperDict(X).vapply(pl.value).clean().keys_l()
solution = [-1 for p in periods]
for j, k in starts_out:
for p2 in K2_jk[j, k]:
solution[p2] = j
print("solution is \n{}".format(solution))
# verify that all periods are filled:
# counting if there is any with a minus -1
pt.TupList(solution).vfilter(lambda v: v == -1)
# verify that each task t appears exactly duration[t] times
task_count = pt.SuperDict().fill_with_default(keys=duration.keys())
for period, task in enumerate(solution):
task_count[task] += 1
task_count.kvapply(lambda k, v: duration[k] - v).clean()