PuLP: past, present and future
I started using PuLP circa 2015 (already 10 years ago!). Looking through my personal and professional emails, I now realize that my first experience with this Python library was while following the Discrete Optimization course on Coursera (a great course, by the way, although it no longer seems to exist). It was also around that time that I joined the PuLP user mailing list. I've realized that the best way for me to find the motivation to learn a new technology is to subscribe to its user mailing list.
At the time, I was also starting my journey into Python and software development. Until then, and for many more years afterwards, I was an AIMMS user and developer. As such, I saw the world through the lens of an algebraic language: multi-dimensional parameters, indices, sets and many, many subsets. It was a very comfortable way of picturing problems, decisions, logic and data.
In 2017 I started my PhD in Operations Research and had to choose the tools I wanted to use. I went for open source (except for solvers), and PuLP was a natural fit since I already had some experience with it. By 2019 I had become so active (annoyingly so, perhaps) with questions, answers and change proposals that Stuart Mitchell (the maintainer at the time) asked if I was interested in becoming the new maintainer of the library. I told him I had no experience doing that, and he kindly offered to help for a while.
And that's how I became the maintainer, a role I've held for the last 6 years.
What changed when I joined
I think the most important thing I did for the project was to make it more developer-friendly.
PuLP is a very small project, which usually makes its code very easy to browse. But at the time, it was not easy for users to contribute new solvers or fixes to existing ones. For example, all solver APIs were mixed together in the same file, which made that file very hard to read and edit.
The other change that I think helped greatly was adding more tests and integrating with GitHub: a modern continuous integration flow meant we could test most solvers on most operating systems and Python versions every time we changed something.
PuLP now supports 14 solvers, almost double the number it supported before I joined.
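The idea behind testing every solver on each change can be sketched in a few lines: list the solver interfaces PuLP knows about with `pulp.listSolvers()`, keep the ones available on the current machine, and solve the same small LP with each. This is a simplified illustration, not PuLP's actual CI code; the model and the `solve_with` helper are hypothetical, and which solvers show up as available depends entirely on what is installed locally.

```python
import pulp


def build_tiny_lp() -> pulp.LpProblem:
    """Hypothetical test model with a hand-checkable optimum: x = 4, y = 0, objective 12."""
    prob = pulp.LpProblem("tiny", pulp.LpMaximize)
    x = pulp.LpVariable("x", lowBound=0)
    y = pulp.LpVariable("y", lowBound=0)
    prob += 3 * x + 2 * y               # objective
    prob += x + y <= 4, "capacity"
    prob += x + 3 * y <= 6, "secondary"
    return prob


def solve_with(solver_name: str) -> tuple[str, float]:
    """Hypothetical helper: solve a fresh copy of the tiny LP with one named solver."""
    prob = build_tiny_lp()
    prob.solve(pulp.getSolver(solver_name, msg=False))
    return pulp.LpStatus[prob.status], pulp.value(prob.objective)


all_solvers = pulp.listSolvers()                   # every interface this PuLP version ships
available = pulp.listSolvers(onlyAvailable=True)   # only those usable on this machine
print(f"{len(available)} of {len(all_solvers)} solver interfaces are available here")

for name in available:
    status, objective = solve_with(name)
    print(f"{name}: {status}, objective = {objective}")
```

A real CI setup would run a loop like this inside a test framework, across a matrix of operating systems and Python versions, so that a change breaking one solver interface is caught immediately.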
Finally, and this is both an ongoing challenge and proof that PuLP is now developer-friendly: we've added type hints to most of the repository. It's an ongoing challenge because some parts are still ignored by the type checker. And it's proof of developer-friendliness because most (if not all) of that work has been carried out by several new and motivated contributors to the repository.
What makes PuLP special
I remember Stuart once mentioning that PuLP's popularity was due to it including CBC (an open-source solver) "by default". I agree this may be the single most important reason why new users choose it: it "just works". It is now even possible to install alternative solvers (such as Gurobi, HiGHS and CPLEX) as optional dependencies when installing PuLP (e.g., pip install pulp[highspy]).
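A minimal sketch of what "it just works" means in practice: calling `prob.solve()` with no argument uses PuLP's default solver (normally the bundled CBC), and switching to an optional solver is a one-argument change. The sketch assumes HiGHS may have been installed through the optional dependency mentioned above and falls back to CBC when it is not; the one-variable model is made up for illustration.

```python
import pulp

prob = pulp.LpProblem("just_works", pulp.LpMinimize)
x = pulp.LpVariable("x", lowBound=0, cat="Integer")
prob += x              # objective: minimize x
prob += 2 * x >= 3     # forces x >= 1.5, so the integer optimum is x = 2

# No solver argument: PuLP uses its default solver (normally the bundled CBC).
prob.solve()
print(pulp.LpStatus[prob.status], x.value())

# Optional solver (assumption: HiGHS installed via highspy); otherwise keep CBC.
name = "HiGHS" if "HiGHS" in pulp.listSolvers(onlyAvailable=True) else "PULP_CBC_CMD"
prob.solve(pulp.getSolver(name, msg=False))
print(name, pulp.LpStatus[prob.status], x.value())
```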
Other reasons for its popularity include:
- it's very straightforward and not very verbose.
- it connects with most well-known solvers.
- it offers enough functionality for (in my estimate) 90% of use cases.
These strengths often compensate for its weaknesses: a relatively slow modeller and a lack of specialized solver functionality.
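To illustrate the point about verbosity, here is a small 0-1 knapsack written with plain Python dictionaries and `pulp.lpSum`. The item data and capacity are invented for the example, and the explicit `PULP_CBC_CMD(msg=False)` only silences the solver log.

```python
import pulp

# Invented data: item -> (value, weight)
items = {"tent": (10, 5), "stove": (6, 3), "camera": (8, 4), "book": (3, 1)}
capacity = 8

prob = pulp.LpProblem("knapsack", pulp.LpMaximize)
take = pulp.LpVariable.dicts("take", items, cat="Binary")  # one binary per item

prob += pulp.lpSum(v * take[i] for i, (v, w) in items.items())              # total value
prob += pulp.lpSum(w * take[i] for i, (v, w) in items.items()) <= capacity  # weight limit

prob.solve(pulp.PULP_CBC_CMD(msg=False))
chosen = [i for i in items if take[i].value() > 0.5]
print(pulp.LpStatus[prob.status], chosen, pulp.value(prob.objective))
```

The whole model, data included, fits in about ten lines, which is a large part of PuLP's appeal for small and medium problems.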
My take on the PuLP vs Pyomo debate
Most people who write mathematical models in Python choose either Pyomo or PuLP. There are many reasons to choose one over the other, and my impression is that Pyomo is more widely used professionally overall. Below, I will only comment on the differences in style:
In my experience, Pyomo is closer to how algebraic languages (e.g., GAMS and AIMMS) define mathematical models: you formally declare variables (\(x_{ij} \; \forall i \in I, j \in J_i\)), constraints over those variables (e.g., \(\sum_{j \in J_i} x_{ij} = 1 \; \forall i \in I\)) and then define their domains (e.g., the sets \(I\) and \(J_i\)). This approach is very intuitive if you come from academia, where you (1) formulate the model on paper, (2) write it in code, and (3) finally test it with data. It is more rigorous (and more verbose) because you need to formally declare every set, subset or combination of indices that you will use, just as you would in a scientific paper.
PuLP, on the other hand, is closer to Python-centered (or software-centered) modelling: variables and constraints are Python objects that can be created and edited with Python code. You (1) programmatically declare a list of variables [x_00, x_01, x_10, ...] and (2) then use code to declare the right constraints over those variables (e.g., x_00 + x_01 = 1). This approach can be a bit messier and less structured, but it is also simpler and more flexible. I prefer it, but I can see why others prefer the Pyomo style.
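A sketch of this programmatic style, reusing the x_00 + x_01 = 1 example: the variables are created in a loop from a plain list of allowed index pairs and stored in a dictionary, and the "one j per i" constraints are added with ordinary Python code instead of formally declared sets. The index pairs and costs are invented for illustration; in Pyomo one would instead declare the sets \(I\) and \(J_i\) first and index the variables and constraints over them.

```python
import pulp

# Invented data: allowed (i, j) pairs, i.e. j in J_i, and a cost for each pair.
pairs = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 1)]
cost = {(0, 0): 4, (0, 1): 2, (1, 0): 3, (1, 1): 5, (2, 1): 1}

prob = pulp.LpProblem("assignment", pulp.LpMinimize)

# (1) Programmatically create x_00, x_01, x_10, ... as ordinary Python objects.
x = {(i, j): pulp.LpVariable(f"x_{i}{j}", cat="Binary") for i, j in pairs}

prob += pulp.lpSum(cost[p] * x[p] for p in pairs)  # objective: total cost

# (2) Use plain code to add one "sum over j of x_ij == 1" constraint per i,
#     e.g. x_00 + x_01 == 1 for i = 0.
for i in sorted({i for i, _ in pairs}):
    prob += pulp.lpSum(x[i, j] for (k, j) in pairs if k == i) == 1, f"assign_{i}"

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print({p: v.value() for p, v in x.items()})
```

Nothing here is declared up front: the set of pairs is just a list, so filtering, extending or rebuilding it with regular Python is all it takes to change the model.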
To borrow an analogy from a better-known pair of Python packages: Pyomo is Django, while PuLP is Flask.
What to expect from PuLP in the future
PuLP needs to evolve with its user base, and that user base is very wide: from students who are learning mathematical modelling and Python to professionals like myself who have been using both for a decade or more.
For the first group, I imagine we should be working on better documentation, more examples and more guides, and maybe even re-opening the user mailing list. I'd love to hear from users with actual requests.
For the second group, I see a few possibilities:
- making the modeller faster by rewriting the core in Rust, as other Python libraries have done.
- adding more solvers, such as Google's CP-SAT and Glop.
- adding integrations with optimization clouds (e.g., NEOS, NextMV, Cornflow, Gurobi Cloud).