Get notified of new posts
Timefold to solve the IHTC problem¶
I recently attended a webinar on open source solvers organized by Optimization4All community. I asked what could the community do to help the development of these solvers. Geoffey De Smet answered that for the solver he maintains, Timefold, what helped was spreading the word: writing about its usages and experiences, good or bad.
This is me writing about my (so far only) experience using Timefold in a reasonably large problem. I have two goals:
- spreading the word about the existence of Timefold and
- find out if the way I used the solver was the correct one.
At the end of the article I leave the github repository with the code in case someone wants to look at it and, even better, make a PR to fix / improve it.
The problem¶
The problem is the same I described in a previous blog post: The Integrated Healthcare Timetable Competition. You can check there a (more) complete description of the problem.
When I decided to join the IHTC'24 competition my humble goal was to use it as an excuse to try out new solvers. And I did: I tried the CPSAT solver and Timefold. The previous blog post already covers part of the CPSAT model I built.
For those who do not want to read the previous post with the description, I leave here a very brief summary of the IHTC'24 problem: we want to schedule patients surgeries (admission date + operating theater + room), and assign nurses to rooms. We need to take into account the following constraints: patient urgency, room capacity, patient compatibility, surgeon and operating theater capacity, nurse workload and patient-nurse compatibility, among others. The problem instances are sufficiently large that they require some kind of decomposition / heuristic to find good solutions in a reasonable time.
The direct link to the contest, including the winners, is here: https://ihtc2024.github.io/
Timefold¶
Timefold is an open source solver built in Java that uses a combination of metaheuristics to explore solutions. What's most special about it (in my opinion) is that it offers an Object Oriented Programming (OOP) API that does not require optimization knowledge to use.
Timefold has a Timetable example and Employee Scheduling in their documentation. So my intuition was that this Scheduling / Timetabling problem is a good fit for the solver.
Timefold python => SolverForge
At the time, I used the python API for the solver, which has limitations (in performance, most importantly). In recent months, they have deprecated it and it is no longer supported by the Timefold company but by the community. Having said that, the syntax, modelling and best practices (if any) still apply.
The modeling¶
As someone that is more used to having all the pre-processing out of (and agnostic from) the engine implementation, I found the formulation of a Timefold-based engine a bit too verbose and opinionated. I had to create an object for each relevant part of the problem (Nurse, Theater, Room, Occupant, Patient, Surgeon, OperatingTheater, etc.) and I had to fit (i.e., format) the data I already had into attributes for each object.
Having said that, this step was relatively straightforward, following the documentation and examples.
The variables¶
The variables where included in the Patient object (admission date, theater and room) and on the ShiftAssignment (nurse for each room).
I start with the ShiftAssignment, as it's more simple. I did not find intuitive the way to tell the solver that I had a restricted set of nurses to assign to a given room-shift.
Below, I had to create an attribute named nurse_options and then pass a reference to that attribute in the actual variable attribute (nurse):
from timefold.solver.domain import (
ValueRangeProvider,
PlanningVariable,
PlanningId,
planning_entity,
)
from typing import Annotated
from pydantic import Field
from .json_serialization import JsonDomainBase
@planning_entity
class ShiftAssignment(JsonDomainBase):
id: Annotated[str, PlanningId]
room: Annotated[Room, Field(default=None)]
shift: Annotated[int, Field(default=0)]
nurse_options: Annotated[
set[Nurse], Field(default_factory=set), ValueRangeProvider(id="nurse_options")
]
nurse: Annotated[
Nurse,
PlanningVariable(value_range_provider_refs=["nurse_options"]),
Field(default=None),
]
The Patient variables were a bit more tricky. One thing that I chose to do, which may or may not have been a good idea, was to add shadow variables inside the Patient that stores all the shifts and days the patient will stay in the room. These two variables are really based on the admission date as the duration of the stay is a parameter.
Here's the implementation for the Patient object with the two main variable attributes room and admission, and the two shadow variables stay and stay_shifts:
from timefold.solver.domain import (
ValueRangeProvider,
PlanningVariable,
PlanningId,
planning_entity,
DeepPlanningClone,
ShadowVariable,
PiggybackShadowVariable,
)
from typing import Annotated
from pydantic import Field
from .json_serialization import JsonDomainBase
class OccupantPatient(JsonDomainBase):
age_group: int
gender: str
length_of_stay: int
skill_level_required: list[int]
workload_produced: list[int]
@planning_entity
class Patient(OccupantPatient):
id: Annotated[str, PlanningId]
mandatory: bool
surgeon: Surgeon
surgery_duration: int
surgery_release_day: int
room_options: Annotated[set[Room], ValueRangeProvider(id="room_options")]
theater: Annotated[Theater | None, PlanningVariable, Field(default=None)]
# only admission is allows_unassigned=True
admission: Annotated[
int | None,
PlanningVariable(value_range_provider_refs=["admission_options"]),
Field(default=None),
]
room: Annotated[
Room | None,
PlanningVariable(
value_range_provider_refs=["room_options"], allows_unassigned=True
),
Field(default=None),
]
# TODO: we could use CountableValueRange?
admission_options: Annotated[set[int], ValueRangeProvider(id="admission_options")]
config: Configuration
# shadow variable with the patient complete stay days:
stay: Annotated[
list[int],
DeepPlanningClone,
ShadowVariable(
variable_listener_class=PatientUpdatingVariableListener,
source_variable_name="admission",
),
Field(default_factory=list),
]
# shadow variable with the patient complete stay shifts:
stay_shifts: Annotated[
list[list[int]],
DeepPlanningClone,
PiggybackShadowVariable(shadow_variable_name="stay"),
Field(default_factory=list),
]
I had a particular issue when generating a list of "days to stay" as a shadow variable of the admission date. Because I discovered, via very unhelpful errors, that I was not aloud to assign a new list to the shadow variable: I could only edit it. And so, I had to "clean" + "add elements" instead, every time I wanted to edit it. I never found this to be documented.
See below the variable listener that I had to implement to fill the shadow variables:
PatientUpdatingVariableListener definition
from timefold.solver.domain import VariableListener
from timefold.solver.score import ScoreDirector
class PatientUpdatingVariableListener(VariableListener):
def after_entity_added(self, score_director: ScoreDirector, patient):
self.update_stay(score_director, patient)
def after_variable_changed(self, score_director: ScoreDirector, patient):
self.update_stay(score_director, patient)
@staticmethod
def update_stay(score_director: ScoreDirector, patient) -> None:
score_director.before_variable_changed(patient, "stay")
score_director.before_variable_changed(patient, "stay_shifts")
# for some reason we cannot replace the contents of the attribute, only modify it
if patient.room is None:
patient.stay.clear()
patient.stay_shifts.clear()
else:
admission = patient.admission
patient.stay.clear()
patient.stay_shifts.clear()
end = min(patient.config.num_days, admission + patient.length_of_stay)
patient.stay.extend(range(admission, end))
patient.stay_shifts.extend(enumerate(range(admission * 3, end * 3)))
score_director.after_variable_changed(patient, "stay")
score_director.after_variable_changed(patient, "stay_shifts")
The constraints¶
While learning about constraints in the documentation, I learned that there are two ways of adding constraints and one is the "recommended one". And that was the one I followed. It involves using special iterators to map the lists of objects and apply filters, joins and transformations (very similar to SQL really) into getting the correct number of violations, and giving them a penalty.
Most of the constraints, once the syntax was clear, where straightforward to implement (if you know SQL that is). There were a few situations though that required reading and re-reading the documentation several times and just trying it out to make it work.
For example, understanding how the flattening of vectors worked. And how to carry several objects (tuples, for example) over to the next statement and how to extract them through lambdas. I think a few additional examples on cases like these ones would have helped.
Here is an example of a simple constraint that:
- Groups all patients and occupants by room and day of stay
- Gets the max and min age of each group
- Applies a soft penalty based on the difference
def age_groups(factory):
return (
factory.for_each(OccupantPatient)
.filter(lambda p: p.room is not None)
.map(lambda p: p.room, lambda p: p.age_group, lambda p: p.stay)
.flatten_last(lambda p: p)
.group_by(
lambda r, ag, d: (r, d),
ConstraintCollectors.max(lambda r, ag, d: ag),
ConstraintCollectors.min(lambda r, ag, d: ag),
)
.penalize(
HardSoftScore.ONE_SOFT, lambda tup, max_age, min_age: max_age - min_age
)
)
Here's a more complex example that re-uses code among more than one constraint.
The function mega_join_patient_shift just returns a large join between patients, the shifts they're staying and the shift-assignments in those shifts.
The function nurse_workload takes that large join and penalizes the excess workload of each nurse in each shift.
I'm not sure how Timefold caches these kind of expressions, and if it helps at all the solving task. In any case, this function was used in several places and helped with the code maintenance.
def mega_join_patient_shift(factory):
return (
factory.for_each(OccupantPatient)
.filter(lambda p: p.room is not None)
.map(lambda p: p, lambda p: p.stay_shifts)
.flatten_last(lambda p: p)
# p, (pos, s)
.join(
ShiftAssignment,
Joiners.equal(lambda p, s: p.room, lambda sa: sa.room),
Joiners.equal(lambda p, s: s[1], lambda sa: sa.shift),
)
# p, (pos, s), sa
)
def nurse_workload(factory):
return (
mega_join_patient_shift(factory)
# p, (pos, s), sa
.group_by(
lambda p, s, sa: (sa.nurse, s[1]),
ConstraintCollectors.sum(lambda p, s, sa: p.workload_produced[s[0]]),
)
# (n, s), sum
.map(lambda tup, _sum: _sum - tup[0].max_load[tup[1]])
# number
.filter(lambda v: v > 0)
.penalize(HardSoftScore.ONE_SOFT, lambda v: v)
.as_constraint(CONSTRAINTS.NURSE_WORKLOAD)
)
The penalties¶
The penalties part was easy to classify since the original problem had "hard" and "soft" constraints already and Timefold supports having both types. I did find a bit confusing to actually pass the weights to the engine though: I would have understood that I could do it as yet another attribute of a class in the model, because for me weights are really just any other parameter in a problem. Instead, I discovered that I had to pass it through a very different set of classes to the solving engine, as if weights are not really problem data.
Here's the model declaration with all its components (and weights):
from timefold.solver.domain import (
ValueRangeProvider,
planning_solution,
PlanningEntityCollectionProperty,
ProblemFactCollectionProperty,
PlanningScore,
ConstraintWeightOverrides,
)
from timefold.solver.score import (
HardSoftScore,
)
from typing import Annotated
from .json_serialization import JsonDomainBase, ScoreSerializer, ScoreValidator
from dataclasses import dataclass, field
# TODO: if I use pedantic (JsonDomainBase), it complains about ConstraintWeightOverrides
@dataclass
@planning_solution
class SurgerySchedule:
patients: Annotated[
list[Patient], PlanningEntityCollectionProperty, ValueRangeProvider
]
shiftAssignments: Annotated[
list[ShiftAssignment], PlanningEntityCollectionProperty, ValueRangeProvider
]
nurses: Annotated[list[Nurse], ProblemFactCollectionProperty, ValueRangeProvider]
rooms: Annotated[list[Room], ProblemFactCollectionProperty, ValueRangeProvider]
theaters: Annotated[
list[Theater], ProblemFactCollectionProperty, ValueRangeProvider
]
surgeons: Annotated[
list[Surgeon], ProblemFactCollectionProperty, ValueRangeProvider
]
occupants: Annotated[
list[Occupant], ProblemFactCollectionProperty, ValueRangeProvider
]
weight_overrides: ConstraintWeightOverrides
score: Annotated [
HardSoftScore | None,
PlanningScore,
ScoreSerializer,
ScoreValidator,
] = field(default=None)
The testing¶
As a result of some of the constraint and variable issues I had to solve, unit testing became necessary and I was actually pleasantly surprised of how easy it was to create unit tests of very simple problems by defining example classes.
Throughout the development, I extensively tested the engine to make sure I was actually considering the right feasible region. Here I made use of the unit tests too.
Here's a snippet of a single test for a constraint:
from timefold.solver.test import ConstraintVerifier
from ihtc2024.solver.timefold.constraints import room_capacity, define_constraints
from ihtc2024.solver.timefold.domain import Room, Patient, Surgeon, Theater, SurgerySchedule, ShiftAssignment
constraint_verifier = ConstraintVerifier.build(
define_constraints, SurgerySchedule, Patient, ShiftAssignment
)
class TestTimefold(unittest.TestCase):
def setUp(self):
self.room0 = Room(
id="r0",
capacity=[1, 1, 1],
min_max_age=[1, 1, 1, 1],
max_min_age=[3, 3, 3, 3],
)
self.surgeon1 = Surgeon(id="s1", maxSurgeryTime=[100])
self.theater1 = Theater(id="t1", availability=[100])
# (... also defines patient1 and patient2)
def test_room_capacity(self):
(
constraint_verifier.verify_that(room_capacity)
.given(
self.room0, self.patient1, self.patient2, self.surgeon1, self.theater1
)
.penalizes_by(1)
)
The performance¶
When I build the Timefold engine, I already had a CPSAT implementation of the problem that performed more or less well for testing instances and not so well for the competition instances (which were usually larger).
When testing Timefold, it did not performed well in the testing instances: it didn't find feasible solutions after 20 minutes (the time limit). Meaning, it always produced solutions that violated some hard constraint.
I tried passing feasible solutions to it and testing if it at least performed well at improving already feasible solutions. It did not do very well here either, unfortunately.
Further work¶
After I had checked everything and decided that it may just be the solver, I conceived trying the Kotlin API: it's not that different from the python one, and it is supposed to have the full efficiency of the java API. I never got around finishing the code, though.
I'm curious whether the underwhelming performance was related to:
- the python API inefficiency.
- the modeling of variables/ constraints not being "well done".
- the Timefold engine not being good enough to solve this problem (because of its nature, size or both).
Other issues¶
python multiprocessing and Timefold python
There was another thing that happened after dropping the timefold engine that drove me crazy for several hours.
I implemented an unrelated heuristic based on a graph-shortest-path approach as a way to getting good initial solutions for this problem. The heuristic had nothing to do with the timefold engine. And I was making use of the multiprocessing python library to speed up the creation of this very big graph.
And I started seeing the python child processes randomly going idle without finishing, showing any error or warning. I went nuts trying to find the cause. It ended up being some obscure incompatibility with the underlying java API that the timefold python package uses, apparently. Once I stop importing the package, everything started working as expected. I have no idea why this happened, but it did.
So I had to leave a comment in the solver catalog I had for this problem:
from typing import Union, Type
from ..core import Experiment
from .cp_sat import CpSAT
from .cp_sat_2step import CpSAT2Step
from .graph import Graph
from .graph_tw import GraphTW
from .graph_GRASP import GraphGRASP
from .graph_cpsat import GraphCP
# TODO: timefold imports jpype and it breaks the multiprocessing of python
# from .timefold import TimefoldPy
solvers = dict(
cpsat=CpSAT,
graph=Graph,
graph_tw=GraphTW,
cpsat2step=CpSAT2Step,
grahpGRASP=GraphGRASP,
graphCP=GraphCP,
)
testing being currently broken
I tried to run the tests as I was writing this article (many months after writing the code) and all of sudden they fail with a mysterious error, which I'd swear I did not have before. It would seem I'm missing some planning_entity decorator somewhere, but I'm unsure where:
java.lang.java.lang.IllegalStateException: java.lang.IllegalStateException: The entityClass (class ai.timefold.jpyinterpreter.types.wrappers.PythonObjectWrapper) has been specified as a planning entity in the configuration, but does not have a @PlanningEntity annotation.
Conclusions¶
I tried Timefold for IHTC competition problem. The OOP API is intuitive for simple problems, but it forces the expert user to spend time reading documentation in order to find a way to constraint combinations, add shadow variables and find the correct sequence of joins, filters and lambda functions to penalize constraints.
The repository with the code and the engine are available here in case anyone wants to try to find a way to make it work well/better. If someone wants to give Kotlin or Java a go with the contest data, that would also be great.