Using it with PuLP

Copied from pulp:

PuLP is a free open source software written in Python. It is used to describe optimisation problems as mathematical models. PuLP can then call any of numerous external LP solvers (CBC, GLPK, CPLEX, Gurobi etc) to solve this model and then use python commands to manipulate and display the solution.

Nomenclature

Variable declaration

A group of decision variables that share the same naming and significance. In PuLP they are all created within a single call to pulp.LpVariable.dicts() and they are all accessed by slicing the created python variable as a dictionary. An optimization problem can have one variable declaration but thousands of variables.

Variable domain

The complete space over which variables in a single variable declaration are created. The length of the domain is the number of variables in the declaration. Each variable declaration has its own domain and is indicated with the argument index when calling the function pulp.LpVariable.dicts(). It usually consists of some kind of iterator where all its elements should be unique.

When using pulp, I use the following guidelines:

  1. Use pulp.LpVariable.dicts() for variable declaration, unless there is a very good reason not to.

  2. Use a list of tuples to declare the variable domain.

  3. Use pytups to index parts of the domain for clean, pretty and fast constraint generation.

Example: Scheduling problem

Assume we are given a set of jobs \(j\) with processing times \(p_j\). For a single machine non-preemptive scheduling problem (which we assume here), a schedule of the jobs \(N\) is given by specifying a start time, \(Y_j\) , for each job. If job j starts at time \(Y_j\), it is processing during the time interval \([Y_j , Y_j + p_j ]\). A schedule is feasible if no two jobs have overlapping processing times.

This means that, for every pair of jobs \(i\) and \(j\), we have that either \(Y_j \geq Y_i +p_i\) , or \(Y_i \geq Y_j +p_j\).

Job \(j\) will complete at time \(C_j =(Y_j + p_j )\), and if it also has an associated due date, \(d_j\) , its tardiness, \(T_j\) , is defined to be:

\begin{eqnarray} T_j & = & (C_j - d_j)^+ \\ T_j & = & (Y_j + p_j - d_j)^+ \end{eqnarray}

If we are given a set of weights, \(w_j\) , which indicate a relative importance between individual jobs, we can calculate the total weighted tardiness of a given feasible schedule:

\begin{eqnarray} TWT & = & \sum_{j \in N} w_j T_j \\ TWT & = & \sum_{j \in N} w_j (Y_j + p_j - d_j)^+ \end{eqnarray}

The objective is to schedule all the jobs without overlapping them and minimizing the total weighted tardiness \(TWT\).

Mathematical formulation

The following is just one of many possible formulations to solve this problem.

SETS

\(j \in J\)

jobs

\(k \in K = \{1, ..., C_{max}\}\)

periods

DATA [UNITS]

\(p_j\)

processing time of job j [periods]

\(d_j\)

due date of job j [period]

\(w_j\)

weight or priority of job j [dimensionless]

DERIVED DATA

\(C_{max}\)

Size of planning horizon.

\(JK\)

Al possible starting combinations (j, k) such that job \(j \in J\) can start in period \(k \in K\).

\(K_j\)

Al possible starting periods for job \(j \in J\).

\(t_{jk}\)

Penalty for starting job \(j \in J\) in period \(k \in K\).

\(K2_{jk}\)

Periods that become unavailable by starting job \(j\) in period \(k \in K\).

\(JK_{k2}\)

All possible starts of jobs \((j, k) \in JK\) that make period \(k2 \in K\) unavailable.

\begin{eqnarray} &C_{max} &= &\sum_{j \in J} p_j \\ &JK &= &\{(j \in J, k \in K) \mid k + p_j \leq C_{max} \} \\ &K_j &= &\{k \in K \mid (j, k) \in JK \} & j \in J \\ &K2_{jk} &= &\{k \in K \mid k \leq k2 \leq k + p_j \} & (j, k) \in JK \\ &JK_{k2} &= &\{(j, k) \in JK \mid k_2 \in K2_{jk} \} & k_2 \in K \\ &t_{jk} &= &\max\{k + p_j -1 - d_j, 0\} \times w_j & (j, k) \in JK \\ \end{eqnarray}

DECISION VARIABLES

\(X_{jk}\). Binary. 1 if job \(j\) starts at time slot \(k\), 0 otherwise. \((j, k) \in JK\)

FORMULATION

\begin{eqnarray} \min \sum_{(j, k) \in JK} t_{jk} X_{jk} \\ \end{eqnarray}

Subject to:

The first set of constraints enforces that each job is scheduled only once. The second set of constraints guarantees that each period of time is used by only one job. \begin{eqnarray} & \sum_{k \in K_j} X_{jk} = 1 & j \in J \\ & \sum_{(j, k) \in JK_{k2}} X_{jk} = 1 & k2 \in K \\ \end{eqnarray}

Implementation using pulp and pytups

We import libraries and get input data.

import pulp as pl
import pytups as pt

# Get input data
duration = {1: 8, 2: 12, 3: 6, 4: 13, 5: 2, 6: 12, 7: 15, 8: 20, 9: 16, 10: 16, 11: 10, 12: 14, 13: 17, 14: 3, 15: 20, 16: 19, 17: 13, 18: 6, 19: 18, 20: 11}
priority = {1: 4, 2: 5, 3: 6, 4: 8, 5: 4, 6: 2, 7: 5, 8: 6, 9: 1, 10: 8, 11: 8, 12: 4, 13: 4, 14: 8, 15: 9, 16: 11, 17: 6, 18: 10, 19: 6, 20: 9}
duedate = {1: 52, 2: 80, 3: 133, 4: 150, 5: 53, 6: 133, 7: 113, 8: 107, 9: 75, 10: 133, 11: 77, 12: 126, 13: 117, 14: 72, 15: 111, 16: 111, 17: 117, 18: 69, 19: 115, 20: 68}

We then calculate intermediate sets and parameters. Note the use of pytups functions to filter and convert from tuple lists to dictionaries.

# Get intermediate paramters, sets.
C_max = sum(duration.values())
periods = range(C_max)
tasks = duration.keys()

# all legal combinations of task-period assignment
jk_all = pt.TupList((t, p) for t in tasks for p in periods)

# we filter the starts that are too late to be possible:
JK = jk_all.filter_list_f(lambda x: x[1] + duration[x[0]] <= C_max)

# we create a set of tasks that can start at time period k
K_j = JK.to_dict(result_col=1)

# all combinations (t, p, p2) such that I start a task j 
# in time period k and is active in period k2
jkk2 = pt.TupList((j, k, k2) for j, k in JK 
				  for k2 in range(k, k + duration[j]))

# given a period k2, what starts affect make it unavailable:
JK_k2 = jkk2.to_dict(result_col=[0, 1])

# given a start (j, k), what periods k2 are made unavailable:
K2_jk = jkk2.to_dict(result_col=2)

# for each possible start: how late will the task be
t_jk = {(j, k): max(duration[j] + k -1 - duedate[j], 0) * 
				priority[j] for j, k in JK}

We now create the PuLP model and solve it.

# model construction with PuLP
model = pl.LpProblem("Scheduling", pl.LpMinimize)
X = pl.LpVariable.dicts(name='start', indexs=JK,
                            lowBound=0, upBound=1, cat=pl.LpInteger)

# objective function:
model += pl.lpSum(X[j, k] * t_jk[j, k] for j, k in JK)

# one and only one start per task
for j in tasks:
    model += pl.lpSum(X[j, k] for k in K_j[j]) == 1

# only one task is active at any moment:
for k2 in periods:
    model += pl.lpSum(X[j, k] for j, k in JK_k2[k2]) == 1

# solve model:
solver = pl.PULP_CBC_CMD(msg=True)
# solver = pl.CPLEX_CMD(msg=True)
model.solve(solver)

We get the solution from the variable contents. Not how we also use pytups to extract the content from the variable.

# get tasks starts
starts_out = pt.SuperDict(X).vapply(pl.value).clean().keys_l()

solution = [-1 for p in periods]
for j, k in starts_out:
    for p2 in K2_jk[j, k]:
        solution[p2] = j

print('solution is \n{}'.format(solution))

Finally, we do some tests on the solution using pytups to guarantee the solution is feasible:

# verify that all periods are filled:
# counting if there is any with a minus -1
pt.TupList(solution).filter_list_f(lambda v: v == -1)

# verify that each task t appears exactly duration[t] times
task_count = pt.SuperDict().fill_with_default(keys=duration.keys())
for period, task in enumerate(solution):
    task_count[task] += 1
task_count.apply(lambda k, v: duration[k] - v).clean()

The whole code is shown below:

import pulp as pl
import pytups as pt

# Get input data
duration = {1: 8, 2: 12, 3: 6, 4: 13, 5: 2, 6: 12, 7: 15, 8: 20, 9: 16, 10: 16, 11: 10, 12: 14, 13: 17, 14: 3, 15: 20, 16: 19, 17: 13, 18: 6, 19: 18, 20: 11}
priority = {1: 4, 2: 5, 3: 6, 4: 8, 5: 4, 6: 2, 7: 5, 8: 6, 9: 1, 10: 8, 11: 8, 12: 4, 13: 4, 14: 8, 15: 9, 16: 11, 17: 6, 18: 10, 19: 6, 20: 9}
duedate = {1: 52, 2: 80, 3: 133, 4: 150, 5: 53, 6: 133, 7: 113, 8: 107, 9: 75, 10: 133, 11: 77, 12: 126, 13: 117, 14: 72, 15: 111, 16: 111, 17: 117, 18: 69, 19: 115, 20: 68}



# Get intermediate paramters, sets.
C_max = sum(duration.values())
periods = range(C_max)
tasks = duration.keys()

# all legal combinations of task-period assignment
jk_all = pt.TupList((t, p) for t in tasks for p in periods)

# we filter the starts that are too late to be possible:
JK = jk_all.filter_list_f(lambda x: x[1] + duration[x[0]] <= C_max)

# we create a set of tasks that can start at time period k
K_j = JK.to_dict(result_col=1)

# all combinations (t, p, p2) such that I start a task j 
# in time period k and is active in period k2
jkk2 = pt.TupList((j, k, k2) for j, k in JK 
				  for k2 in range(k, k + duration[j]))

# given a period k2, what starts affect make it unavailable:
JK_k2 = jkk2.to_dict(result_col=[0, 1])

# given a start (j, k), what periods k2 are made unavailable:
K2_jk = jkk2.to_dict(result_col=2)

# for each possible start: how late will the task be
t_jk = {(j, k): max(duration[j] + k -1 - duedate[j], 0) * 
				priority[j] for j, k in JK}



# model construction with PuLP
model = pl.LpProblem("Scheduling", pl.LpMinimize)
X = pl.LpVariable.dicts(name='start', indexs=JK,
                            lowBound=0, upBound=1, cat=pl.LpInteger)

# objective function:
model += pl.lpSum(X[j, k] * t_jk[j, k] for j, k in JK)

# one and only one start per task
for j in tasks:
    model += pl.lpSum(X[j, k] for k in K_j[j]) == 1

# only one task is active at any moment:
for k2 in periods:
    model += pl.lpSum(X[j, k] for j, k in JK_k2[k2]) == 1

# solve model:
solver = pl.PULP_CBC_CMD(msg=True)
# solver = pl.CPLEX_CMD(msg=True)
model.solve(solver)



# get tasks starts
starts_out = pt.SuperDict(X).vapply(pl.value).clean().keys_l()

solution = [-1 for p in periods]
for j, k in starts_out:
    for p2 in K2_jk[j, k]:
        solution[p2] = j

print('solution is \n{}'.format(solution))



# verify that all periods are filled:
# counting if there is any with a minus -1
pt.TupList(solution).filter_list_f(lambda v: v == -1)

# verify that each task t appears exactly duration[t] times
task_count = pt.SuperDict().fill_with_default(keys=duration.keys())
for period, task in enumerate(solution):
    task_count[task] += 1
task_count.apply(lambda k, v: duration[k] - v).clean()