Decision science & standards

(Here I use the terms engine and solver interchangeably).

What makes a standard a standard is a mix of luck, popularity and actual usefulness. The optimization community has historically agreed on few standards, languages or libraries. As a result, there are many ways to code, run and deploy optimization engines, and people often re-invent the wheel.

A success story among existing standards is the pair of MPS and LP file formats for storing and communicating Mixed Integer Programming (MIP) problems, even if these formats are a bit outdated (they're several decades old!) and no one has agreed on a single way to tackle their limitations.

Another example is Constraint Programming (CP), where there are few widely used standards (MiniZinc is the one I know of).

Here's an example of an LP and MPS file that represent the same toy problem:

\* test_zero_constraint *\
Minimize
obj: x + 4 y + 9 z
Subject To
c1: x + y <= 5
c2: x + z >= 10
c3: - y + z = 7
c4: w >= 0
_dummy: __dummy = 0
c5: __dummy <= 0
Bounds
 x <= 4
 -1 <= y <= 1
End
*SENSE:Minimize
NAME          test_zero_constraint
ROWS
 N  obj
 L  c1
 G  c2
 E  c3
 G  c4
 L  c5
COLUMNS
    w         c4         1.000000000000e+00
    x         c1         1.000000000000e+00
    x         c2         1.000000000000e+00
    x         obj        1.000000000000e+00
    y         c1         1.000000000000e+00
    y         c3        -1.000000000000e+00
    y         obj        4.000000000000e+00
    z         c2         1.000000000000e+00
    z         c3         1.000000000000e+00
    z         obj        9.000000000000e+00
RHS
    RHS       c1         5.000000000000e+00
    RHS       c2         1.000000000000e+01
    RHS       c3         7.000000000000e+00
    RHS       c4         0.000000000000e+00
    RHS       c5         0.000000000000e+00
BOUNDS
 UP BND       x          4.000000000000e+00
 LO BND       y         -1.000000000000e+00
 UP BND       y          1.000000000000e+00
ENDATA
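The MPS layout above is rigid enough that a toy parser fits in a few lines. The sketch below reads the ROWS and COLUMNS sections into dictionaries; it's only an illustration of the format's regularity (real readers also handle RANGES, integer markers, multiple row/value pairs per line, free format, and so on):

```python
def parse_mps(text):
    """Toy parser for the ROWS and COLUMNS sections of an MPS file.

    Assumes one (row, coefficient) pair per COLUMNS line, as in the
    example above; real MPS readers handle much more.
    """
    section = None
    rows = {}     # row name -> sense: N (objective), L (<=), G (>=), E (=)
    columns = {}  # column name -> {row name: coefficient}
    for line in text.splitlines():
        if not line.strip() or line.startswith("*"):
            continue  # skip blank and comment lines
        if not line[0].isspace():  # section headers start in column 1
            section = line.split()[0]
            continue
        fields = line.split()
        if section == "ROWS":
            sense, name = fields
            rows[name] = sense
        elif section == "COLUMNS":
            col, row, coef = fields
            columns.setdefault(col, {})[row] = float(coef)
    return rows, columns

sample = """NAME          toy
ROWS
 N  obj
 L  c1
COLUMNS
    x         obj        1.0
    x         c1         1.0
    y         obj        4.0
ENDATA"""

rows, columns = parse_mps(sample)
print(rows)     # {'obj': 'N', 'c1': 'L'}
print(columns)  # {'x': {'obj': 1.0, 'c1': 1.0}, 'y': {'obj': 4.0}}
```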

Why (open) standards are useful

The Internet we know today exists because open standards were adopted from the beginning of its history. Common standards allowed code and applications to be compatible and to build on top of one another, promoting an explosion of apps.

In the case of decision science, more widely used standards would allow people to do just that: build on top of one another. For example, someone working on an optimization engine could more easily re-use code from another engine and extend it with new functionality. Another example: someone working on a planning application could easily (seamlessly?) switch from one engine to another to solve a particular problem without editing the logic of the tool itself (the integration, the storage or the graphical user interface).

python as a Rosetta stone

In the last decade python became the de-facto modelling language for optimization, even if it's rarely part of an engine's core codebase. There are many python libraries that build a MIP problem and communicate with MIP solvers; in fact, some would say too many: PuLP, Pyomo, python-mip, OR-Tools MathOpt. And MIP solvers, thanks to the MPS/LP formats, are among the most standardized. But in CP, heuristics and metaheuristics too, many engine implementations offer a python API.

Of course, there are many other modelling languages still based on other technologies, like GAMS, AIMMS and AMPL for MIP problems. And many engines do not have a python API yet: e.g., many CP solvers lack one, and Timefold recently announced it's dropping support for python.

Still, python offers the best overall set of functionality: data wrangling (polars, pandas, pytups), data I/O (PostgreSQL, JSON, SQLite, Excel, ...), graphical user interfaces (PySide, ...), a wide catalogue (PyPI), ease of deployment (Flask, FastAPI) and ease of learning. Among many, many other useful features (testing, Machine Learning, ...).

Everything has a python API, and that's why everyone wants to have a python API.

Below are two examples of how standardization could help (or already does) with the use of other tools in the ecosystem.

Example 1: Vehicle Routing

I've lately been testing specialized Vehicle Routing Problem (VRP) solvers from python to solve a routing-like problem.

During my research, I found several routing engines: ortools routing, pyvrp, nextRoute, vroom, vrpy, vrpsolverEasy, HGS-CVRP. I even tested a few.

While doing the benchmarking, I honestly missed a python modeller that connects to all of these VRP solvers through a common API and allows switching from one to the other. I really think someone should build that. And then maybe add some efficient implementations of more generic engines to the catalogue: for example, a nice column-generation implementation with Gurobi or HiGHS, and some models built on Hexaly or Timefold.

I was tempted to ask some LLM to write one for me, because the VRP is such a common problem: there's already a more-or-less standard file format, and the functionality available in each solver is well documented and easy to compare. Still, standardization is not at 100%, so this is not a trivial problem to tackle: not all engines offer exactly the same functionality, and many checks would be needed to make sure the user is aware of any simplification, or of functionality being plainly ignored.

Once built, it would look a bit like this:

from vrp_api_library import ORTOOLS_ROUTING, PYVRP, NEXT_ROUTE

# here you build your model with orders, depots, vehicles, time windows, cost and time matrices, etc.
# with the help of methods and objects inside the vrp_api_library package
my_model = ...

# you choose the right solver
my_solver = PYVRP(my_model)
# my_solver = NEXT_ROUTE(my_model)
# my_solver = ORTOOLS_ROUTING(my_model)

# then you solve the problem, passing the solve configuration,
# which could be generic or solver-dependent
status = my_solver.solve(timeLimit=60, msg=True)

# then you retrieve the solution if one is available:
if status == 1:
  my_solution = my_solver.get_routes()
  print(my_solution)
else:
  print("No feasible solution found after time limit")
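As for the compatibility checks mentioned earlier, a first version could be as simple as comparing the features a model uses against what each backend supports. A hypothetical sketch (both the feature sets and the function names below are made up for illustration, not the solvers' actual capabilities):

```python
# Made-up feature sets for illustration; NOT the solvers' real capabilities.
SOLVER_FEATURES = {
    "PYVRP": {"capacities", "time_windows", "multiple_depots"},
    "VROOM": {"capacities", "time_windows", "skills"},
    "ORTOOLS_ROUTING": {"capacities", "time_windows", "pickup_delivery"},
}

def check_model_support(solver_name, model_features):
    """Warn the user instead of silently ignoring part of the model."""
    missing = model_features - SOLVER_FEATURES[solver_name]
    if missing:
        raise ValueError(f"{solver_name} does not support: {sorted(missing)}")

check_model_support("VROOM", {"capacities", "skills"})  # ok, passes silently
# check_model_support("PYVRP", {"skills"})  # would raise ValueError
```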

Example 2: Progress logs

Progress logs are very useful both during the solve phase and after it. During the solve, because they're addictive to watch and can motivate a user to stop the search before the time limit is reached, or because they sometimes give enough information to stop it early (is there a feasible solution for this problem? is the engine faster with this configuration?).

After the solve, they're particularly useful for benchmarking the engine's performance. The most basic questions: how long does the engine take to find an initial solution? And how much longer until an acceptable one is found? Less basic ones: how good is the linear relaxation at the root node, and how much does the cut-adding phase contribute to overall performance? Is the solver spending its time on heuristics, on cuts, or on pre-processing?

Here are a few truncated examples of logs from a few different engines:

Set parameter TimeLimit to value 7200
Set parameter MIPGap to value 0.0
Set parameter Threads to value 1

Gurobi Optimizer version 7.0.0 build v7.0.0rc3 (linux64)
Copyright (c) 2016, Gurobi Optimization, Inc.

Read MPS format model from file instances/miplib2010/bab5.mps.gz
Reading time = 0.57 seconds
bab5: 4964 rows, 21600 columns, 155520 nonzeros
Optimize a model with 4964 rows, 21600 columns and 155520 nonzeros
Variable types: 0 continuous, 21600 integer (0 binary)
Coefficient statistics:
  Matrix range     [9e-02, 8e+00]
  Objective range  [2e+01, 4e+03]
  Bounds range     [1e+00, 1e+00]
  RHS range        [1e+00, 5e+01]
Presolve removed 226 rows and 202 columns
Presolve time: 0.27s
Presolved: 4738 rows, 21398 columns, 68369 nonzeros
Variable types: 0 continuous, 21398 integer (21398 binary)

Root relaxation: objective -1.164152e+05, 16508 iterations, 0.58 seconds

    Nodes    |    Current Node    |     Objective Bounds      |     Work
 Expl Unexpl |  Obj  Depth IntInf | Incumbent    BestBd   Gap | It/Node Time

     0     0 -116415.20    0   94          - -116415.20      -     -    1s
     0     0 -114009.28    0  279          - -114009.28      -     -    1s
     0     0 -114007.98    0  269          - -114007.98      -     -    1s
     0     0 -113171.75    0  295          - -113171.75      -     -    1s
     0     0 -113171.75    0  243          - -113171.75      -     -    1s
     0     0 -112661.95    0  253          - -112661.95      -     -    1s
     0     0 -112661.95    0  278          - -112661.95      -     -    1s
Starting CP-SAT solver v9.10.4067
Parameters: max_time_in_seconds: 30 log_search_progress: true relative_gap_limit: 0.01
Setting number of workers to 16

Initial optimization model '': (model_fingerprint: 0x1d316fc2ae4c02b1)
#Variables: 450 (#bools: 276 #ints: 6 in objective)
  - 342 Booleans in [0,1]
  - 12 in [0][10][20][30][40][50][60][70][80][90][100]
  - 6 in [0][10][20][30][40][100]
  - 6 in [0][80][100]
  - 6 in [0][100]
  - 6 in [0,1][34][67][100]
  - 12 in [0,6]
  - 18 in [0,7]
  - 6 in [0,35]
  - 6 in [0,36]
  - 6 in [0,100]
  - 12 in [21,57]
  - 12 in [22,57]
#kBoolOr: 30 (#literals: 72)
#kLinear1: 33 (#enforced: 12)
#kLinear2: 1'811
#kLinear3: 36
#kLinearN: 94 (#terms: 1'392)

Starting presolve at 0.00s
  3.26e-04s  0.00e+00d  [DetectDominanceRelations] 
  6.60e-03s  0.00e+00d  [PresolveToFixPoint] #num_loops=4 #num_dual_strengthening=3 
  2.69e-05s  0.00e+00d  [ExtractEncodingFromLinear] #potential_supersets=44 #potential_subsets=12 
[Symmetry] Graph for symmetry has 2'224 nodes and 5'046 arcs.
[Symmetry] Symmetry computation done. time: 0.000374304 dtime: 0.00068988
[Symmetry] #generators: 2, average support size: 12
[Symmetry] 12 orbits with sizes: 2,2,2,2,2,2,2,2,2,2,...
[Symmetry] Found orbitope of size 6 x 2
[SAT presolve] num removable Booleans: 0 / 309
[SAT presolve] num trivial clauses: 0
[SAT presolve] [0s] clauses:570 literals:1152 vars:303 one_side_vars:268 simple_definition:35 singleton_clauses:0
[SAT presolve] [3.0778e-05s] clauses:570 literals:1152 vars:303 one_side_vars:268 simple_definition:35 singleton_clauses:0
[SAT presolve] [4.6758e-05s] clauses:570 literals:1152 vars:303 one_side_vars:268 simple_definition:35 singleton_clauses:0
  1.10e-02s  9.68e-03d  [Probe] #probed=1'738 #new_bounds=12 #new_binary_clauses=1'111 
  2.34e-03s  0.00e+00d  [MaxClique] Merged 602(1374 literals) into 506(1960 literals) at_most_ones. 
  3.31e-04s  0.00e+00d  [DetectDominanceRelations] 
  1.89e-03s  0.00e+00d  [PresolveToFixPoint] #num_loops=2 #num_dual_strengthening=1 
  5.45e-04s  0.00e+00d  [ProcessAtMostOneAndLinear] 
  8.19e-04s  0.00e+00d  [DetectDuplicateConstraints] #without_enforcements=306 
  8.62e-05s  7.21e-06d  [DetectDominatedLinearConstraints] #relevant_constraints=114 #num_inclusions=42 
  1.94e-05s  0.00e+00d  [DetectDifferentVariables] 
  1.90e-04s  8.39e-06d  [ProcessSetPPC] #relevant_constraints=560 #num_inclusions=24 
  2.01e-05s  0.00e+00d  [FindAlmostIdenticalLinearConstraints] 
  3.11e-04s  3.99e-04d  [FindBigAtMostOneAndLinearOverlap] 
  9.85e-05s  1.96e-04d  [FindBigVerticalLinearOverlap] 
  1.61e-05s  5.11e-06d  [FindBigHorizontalLinearOverlap] #linears=42 
  2.88e-05s  1.80e-07d  [MergeClauses] 
  3.51e-04s  0.00e+00d  [DetectDominanceRelations] 
  1.79e-03s  0.00e+00d  [PresolveToFixPoint] #num_loops=1 #num_dual_strengthening=1 
  3.21e-04s  0.00e+00d  [DetectDominanceRelations] 
  1.74e-03s  0.00e+00d  [PresolveToFixPoint] #num_loops=1 #num_dual_strengthening=1 
Welcome to IBM(R) ILOG(R) CPLEX(R) Interactive Optimizer 12.7.1.0
  with Simplex, Mixed Integer & Barrier Optimizers
5725-A06 5725-A29 5724-Y48 5724-Y49 5724-Y54 5724-Y55 5655-Y21
Copyright IBM Corp. 1988, 2017.  All Rights Reserved.

Type 'help' for a list of available commands.
Type 'help' followed by a command name for more
information on commands.

CPLEX> CPLEX> Logfile 'cplex.log' closed.
CPLEX> New value for default parallel thread count: 1
CPLEX> New value for mixed integer optimality gap tolerance: 0
CPLEX> New value for type of clock used to measure time: 2
CPLEX> New value for time limit in seconds: 7200
CPLEX> New value for parallel optimization mode: 1
CPLEX> Selected objective sense:  MINIMIZE
Selected objective  name:  R5997
Selected RHS        name:  B
Selected bound      name:  BOUND
Problem 'instances/miplib2010/satellites1-25.mps.gz' read.
Read time = 0.02 sec. (3.44 ticks)
CPLEX> Problem is a minimization problem.
New sense ['max' or 'min']: No changes made.
CPLEX> CPXPARAM_TimeLimit                               7200
CPXPARAM_Threads                                 1
CPXPARAM_Parallel                                1
CPXPARAM_Read_APIEncoding                        "UTF-8"
CPXPARAM_MIP_Tolerances_MIPGap                   0
Tried aggregator 2 times.
MIP Presolve eliminated 930 rows and 846 columns.
MIP Presolve modified 3354 coefficients.
Aggregator did 408 substitutions.
Reduced MIP has 4658 rows, 7759 columns, and 50343 nonzeros.
Reduced MIP has 7716 binaries, 0 generals, 0 SOSs, and 0 indicators.
Presolve time = 0.04 sec. (49.97 ticks)
Found incumbent of value 3.000000 after 0.07 sec. (90.43 ticks)
Probing fixed 504 vars, tightened 0 bounds.
Probing time = 0.69 sec. (941.08 ticks)
Tried aggregator 2 times.
MIP Presolve eliminated 423 rows and 651 columns.
MIP Presolve modified 430 coefficients.
Aggregator did 21 substitutions.
Reduced MIP has 4214 rows, 7087 columns, and 45544 nonzeros.
Reduced MIP has 7044 binaries, 0 generals, 0 SOSs, and 0 indicators.
Presolve time = 0.04 sec. (54.77 ticks)
Probing time = 0.01 sec. (11.97 ticks)
Clique table members: 31378.
MIP emphasis: balance optimality and feasibility.
MIP search method: dynamic search.
Parallel mode: none, using 1 thread.
Root relaxation solution time = 0.59 sec. (960.37 ticks)

        Nodes                                         Cuts/
   Node  Left     Objective  IInf  Best Integer    Best Bound    ItCnt     Gap

*     0+    0                            3.0000     -224.0000              --- 
      0     0      -20.0000   652        3.0000      -20.0000     4571  766.67%
      0     0      -20.0000   701        3.0000      Cuts: 16     6139  766.67%
      0     0      -20.0000   445        3.0000      Cuts: 13     6906  766.67%
      0     0      -20.0000   666        3.0000      Cuts: 84     8943  766.67%
      0     2      -20.0000   598        3.0000      -20.0000     8943  766.67%
Welcome to the CBC MILP Solver 
Version: 2.9.8 
Build Date: Jun 10 2016 

command line - /home/beck/math/opt/miplib/miplib2010-1.1.0/bin/cbc -import instances/miplib2010/enlight14.mps.gz -sec 7200 -threads 1 -ratio 0.0 -timeMode elapsed -solve -solution /home/beck/math/opt/miplib/miplib2010-1.1.0/results/solutions/benchmark.cbc.enlight14.mps.sol (default strategy 1)
At line 1 NAME           enlight14
At line 2 ROWS
At line 200 COLUMNS
At line 1519 RHS
At line 1521 BOUNDS
At line 1914 ENDATA
Problem enlight14 has 196 rows, 392 columns and 1120 elements
Coin0008I enlight14 read with 0 errors
seconds was changed from 1e+100 to 7200
threads was changed from 0 to 1
ratioGap was changed from 0 to 0
Option for timeMode changed from cpu to elapsed
Continuous objective value is 0 - 0.00 seconds
Cgl0004I processed model has 196 rows, 392 columns (392 integer (199 of which binary)) and 1120 elements
Cutoff increment increased from 1e-05 to 0.9999
Cbc0038I Initial state - 2 integers unsatisfied sum - 1
Cbc0038I Solution found of 1
Cbc0038I Branch and bound needed to clear up 2 general integers
Cbc0038I Mini branch and bound could not fix general integers
Cbc0038I No solution found this major pass
Cbc0038I Before mini branch and bound, 390 integers at bound fixed and 0 continuous
Cbc0038I Mini branch and bound did not improve solution (0.00 seconds)
Cbc0038I After 0.00 seconds - Feasibility pump exiting - took 0.00 seconds
Cbc0031I 55 added rows had average density of 16.381818
Cbc0013I At root node, 55 cuts changed objective from 1 to 5.1670955 in 100 passes
Cbc0014I Cut generator 0 (Probing) - 1382 row cuts average 2.0 elements, 0 column cuts (1 active)  in 0.035 seconds - new frequency is 1
Cbc0014I Cut generator 1 (Gomory) - 1851 row cuts average 124.7 elements, 0 column cuts (0 active)  in 0.183 seconds - new frequency is 1
Cbc0014I Cut generator 2 (Knapsack) - 16 row cuts average 5.7 elements, 0 column cuts (0 active)  in 0.031 seconds - new frequency is -100
Cbc0014I Cut generator 3 (Clique) - 0 row cuts average 0.0 elements, 0 column cuts (0 active)  in 0.002 seconds - new frequency is -100
Cbc0014I Cut generator 4 (MixedIntegerRounding2) - 6 row cuts average 4.0 elements, 0 column cuts (0 active)  in 0.003 seconds - new frequency is -100
Cbc0014I Cut generator 5 (FlowCover) - 0 row cuts average 0.0 elements, 0 column cuts (0 active)  in 0.019 seconds - new frequency is -100
Cbc0014I Cut generator 6 (TwoMirCuts) - 494 row cuts average 6.8 elements, 0 column cuts (0 active)  in 0.005 seconds - new frequency is 1
Cbc0010I After 0 nodes, 1 on tree, 1e+50 best solution, best possible 5.1670955 (0.57 seconds)
Cbc0010I After 1000 nodes, 33 on tree, 1e+50 best solution, best possible 5.1670955 (10.55 seconds)
Cbc0010I After 2000 nodes, 20 on tree, 1e+50 best solution, best possible 5.1670955 (18.64 seconds)
Cbc0010I After 3000 nodes, 14 on tree, 1e+50 best solution, best possible 5.1670955 (27.18 seconds)
Cbc0010I After 4000 nodes, 19 on tree, 1e+50 best solution, best possible 5.1670955 (34.96 seconds)
Cbc0010I After 5000 nodes, 11 on tree, 1e+50 best solution, best possible 5.1670955 (41.29 seconds)
Cbc0010I After 6000 nodes, 8 on tree, 1e+50 best solution, best possible 5.1670955 (50.23 seconds)
Cbc0010I After 7000 nodes, 15 on tree, 1e+50 best solution, best possible 5.1670955 (59.69 seconds)
Cbc0010I After 8000 nodes, 26 on tree, 1e+50 best solution, best possible 5.1670955 (68.63 seconds)
Cbc0010I After 9000 nodes, 62 on tree, 1e+50 best solution, best possible 5.1670955 (77.13 seconds)
Cbc0010I After 10000 nodes, 76 on tree, 1e+50 best solution, best possible 5.1670955 (85.94 seconds)
Cbc0010I After 11000 nodes, 71 on tree, 1e+50 best solution, best possible 5.1670955 (94.21 seconds)
Cbc0010I After 12000 nodes, 531 on tree, 1e+50 best solution, best possible 20.678562 (113.61 seconds)

Many engines already offer an official log-parsing tool (see Gurobi logtools) or an unofficial one (see cpsat logutils). The issue is that these are not compatible with each other either.

In theory, an engine's progress log is quite standard. The minimum needed is elapsed time, number of iterations and the best solution found so far. Then, depending on the technique used, other relevant information can be added: some engines show some kind of lower bound (i.e., an estimate of the best possible solution yet to be found) or the "current" solution; branch-and-bound based engines show the number of nodes explored and left to explore.
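To give an idea of how little is needed for the basics, here's a minimal regex that pulls node count, best solution, bound and elapsed time out of the CBC `Cbc0010I` lines shown above (orloge does this, and much more, for each supported solver):

```python
import re

# Matches CBC progress lines like:
# Cbc0010I After 1000 nodes, 33 on tree, 1e+50 best solution,
#   best possible 5.1670955 (10.55 seconds)
CBC_PROGRESS = re.compile(
    r"Cbc0010I After (\d+) nodes, (\d+) on tree, (\S+) best solution, "
    r"best possible (\S+) \(([\d.]+) seconds\)"
)

def parse_cbc_progress(log_text):
    rows = []
    for nodes, left, best, bound, secs in CBC_PROGRESS.findall(log_text):
        rows.append({
            "nodes": int(nodes),
            "nodes_left": int(left),
            "best_solution": float(best),  # 1e+50 means "no incumbent yet"
            "best_bound": float(bound),
            "time_s": float(secs),
        })
    return rows

sample = (
    "Cbc0010I After 1000 nodes, 33 on tree, 1e+50 best solution, "
    "best possible 5.1670955 (10.55 seconds)"
)
print(parse_cbc_progress(sample))
```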

Still, I have yet to see two solvers that share a common log format. This time I did do something about it and created the orloge python package to parse output logs from different solvers. It's currently compatible with four solvers: CBC, GUROBI, CPLEX and CP-SAT, but it's fairly easy to add support for more: I'm looking for contributions to add SCIP and HiGHS, if anyone's interested.

Orloge parses the output of a file or stream and, among other things, produces a pandas table with the complete progress.

As can be expected, it offers a common interface:

import orloge as ol
ol.get_info_solver('tests/data/cbc298-app1-2.out', 'CBC')
# for GUROBI
# ol.get_info_solver('tests/data/gurobi700-app1-2.out', 'GUROBI')

It even lets you parse your own custom log format, using the same API:

import orloge as ol

# There are many more things that can be parsed (mostly for MIP solvers),
# but this gives an idea of the main things to extract

class MyLog(ol.LogFile):
    def __init__(self, path, **options):
        super().__init__(path, **options)
        self.name = "my_solver_log"

    def get_stats(self):
        # parse status, objective, bound, gap_rel
        return None, None, None, None

    def get_status_codes(self, status, objective):
        # parse status codes of the solver and solution
        return None, None

    def get_version(self):
        # parse solver version
        return ""

    def get_progress(self):
        # parse the progress and return a pandas DataFrame
        import pandas as pd

        return pd.DataFrame()

my_log = MyLog(path="PATH_TO_MY_LOG_FILE")
my_log.get_log_info()

# this is equivalent to running the previous method:
# ol.get_info_solver('tests/data/cbc298-app1-2.out', 'CBC')

Regardless of the solver, the progress table always has the same columns (some will be empty depending on the solver). For GUROBI, the output looks something like this:

   Node  NodesLeft  Objective  Depth  IInf  BestInteger  CutsBestBound   Gap  ItpNode  Time
0     0          0   -178.943      0   282          -41       -178.943  336%            4s
1     0          0   -171.917      0   268          -41       -171.917  319%           15s
2     0          0   -170.977      0   267          -41       -170.977  317%           15s
3     0          0    -169.27      0   275          -41        -169.27  313%           15s
4     0          0   -168.879      0   271          -41       -168.879  312%           15s

And something like this for CP-SAT (for a different problem):

   Time  CutsBestBound  BestInteger      Gap  NumVars  RemVars  NumCons  RemCons
0  0.71              1           17  94.1176
1  0.72              1           16  93.75
2  0.74              1           15  93.3333
3  1.3               8           15  46.6667
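Because the columns are the same regardless of the solver, benchmarking code only needs to be written once. For instance, the time to first incumbent can be read off any progress table the same way (the small table below is made up for illustration):

```python
import pandas as pd

# A made-up progress table in the common column layout described above
progress = pd.DataFrame({
    "Time": [1.0, 4.0, 15.0, 30.0],
    "BestInteger": [None, None, -41.0, -45.0],  # incumbent (empty until found)
    "CutsBestBound": [-178.9, -171.9, -170.9, -60.0],
})

# Time to first feasible solution: the first row where the incumbent is set
has_incumbent = progress["BestInteger"].notna()
time_to_first = progress.loc[has_incumbent, "Time"].iloc[0]
print(time_to_first)  # 15.0
```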

Conclusion

Combinatorial optimization would greatly benefit from more standardization.

Even when things are not 100% standard, with the right tools (modellers, log libraries) users get a single, consistent interface to many optimization engines.

Open source tools are particularly powerful in this regard, as they can easily be extended by the community (including the solver providers) with more functionality and fixes.
