Integer step size in scipy optimize minimize

The Nelder-Mead minimize method now lets you specify the initial simplex vertices, so you should be able to place them far apart; the simplex will then flop around, find the minimum, and converge once the simplex size drops below 1.

https://docs.scipy.org/doc/scipy/reference/optimize.minimize-neldermead.html#optimize-minimize-neldermead
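
For example (a minimal sketch; the quadratic loss and the vertex positions below are stand-ins for your real problem):

import numpy as np
from scipy.optimize import minimize

def loss(p):
    # Stand-in objective; replace with your real integer-parameter loss.
    x, y = p
    return (x - 7) ** 2 + (y - 13) ** 2

# Place the vertices far apart so the simplex explores a wide region first.
sim = np.array([[0.0, 0.0],
                [50.0, 0.0],
                [0.0, 50.0]])

res = minimize(loss, x0=[0.0, 0.0], method='Nelder-Mead',
               options={'initial_simplex': sim,
                        'xatol': 1.0})  # stop once the simplex shrinks below 1
print(np.round(res.x))  # snap the result to the integer grid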


Unfortunately, SciPy's built-in optimization tools don't easily allow for this. But never fear; it sounds like you have a convex problem, so you should be able to find a unique optimum, even if it won't be mathematically pretty.

Two options that I've implemented for different problems are creating a custom gradient descent algorithm, and using bisection on a series of univariate problems. If you're doing cross-validation in your tuning, your loss function unfortunately won't be smooth (because of noise from cross-validation on different datasets), but will be generally convex.

To implement gradient descent numerically (without an analytical expression for the gradient), choose a test point and, for each dimension, a second point offset by delta in that dimension. Evaluating your loss function at these points lets you numerically estimate a local subgradient. It is important that delta be large enough to step over the local minima created by cross-validation noise.
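
Here is a minimal sketch of that idea, probing each dimension separately with a large delta (the delta, learning rate, and iteration count are illustrative, not tuned):

import numpy as np

def numeric_gradient_descent(f, x0, delta=2.0, lr=0.5, n_iter=50):
    # delta is deliberately large so each probe steps over the small
    # local minima created by cross-validation noise.
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        f0 = f(x)
        grad = np.empty_like(x)
        for i in range(x.size):
            step = np.zeros_like(x)
            step[i] = delta
            grad[i] = (f(x + step) - f0) / delta  # forward difference
        x = x - lr * grad
    return x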

A slower but potentially more robust alternative is to implement bisection for each parameter you're testing. If you know that the problem is jointly convex in your two parameters (or n parameters), you can separate it into n univariate optimization problems and write a bisection algorithm that recursively homes in on the optimal parameters. This can handle some kinds of quasiconvexity (e.g. if your loss function takes a background noise value over part of its domain and is convex in another region), but it requires a good guess at the bounds for the initial iteration.
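
A minimal sketch of the per-parameter idea, using integer ternary search (a close cousin of bisection for unimodal functions); the bounds, sweep count, and helper names are illustrative:

def ternary_search_int(f, lo, hi):
    # Minimize a unimodal function over the integers in [lo, hi].
    while hi - lo > 2:
        m1 = lo + (hi - lo) // 3
        m2 = hi - (hi - lo) // 3
        if f(m1) <= f(m2):
            hi = m2 - 1
        else:
            lo = m1 + 1
    return min(range(lo, hi + 1), key=f)

def coordinate_search(f, bounds, n_sweeps=5):
    # Alternately minimize each integer parameter, holding the others fixed.
    x = [(lo + hi) // 2 for lo, hi in bounds]
    for _ in range(n_sweeps):
        for i, (lo, hi) in enumerate(bounds):
            x[i] = ternary_search_int(
                lambda v, i=i: f(x[:i] + [v] + x[i + 1:]), lo, hi)
    return x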

If you simply snap the requested x values to an integer grid without also setting xtol to match that grid size, you risk having the solver request two points within the same grid cell, receive the same output value, and conclude that it is at a minimum.

No easy answer, unfortunately.


Snap your floats x, y (a.k.a. winsize, threshold) to an integer grid inside your function, like this:

def func(x, y):
    x = int(round(x))                     # snap to the integer grid ... -1 0 1 ...
    y = int(round((y - 1) / 2)) * 2 + 1   # snap to the odd grid 1 3 5 ...
    ...

Then Nelder-Mead will see function values only on the grid, and should give you near-integer x, y.
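
If it helps, one way to hook this into minimize (the starting point is a placeholder; xatol is set near the grid spacing so the solver does not stall probing two points inside one grid cell):

from scipy.optimize import minimize

res = minimize(lambda p: func(p[0], p[1]), x0=[10.0, 5.0],
               method='Nelder-Mead',
               options={'xatol': 1.0})  # tolerance matched to the grid
x, y = round(res.x[0]), round((res.x[1] - 1) / 2) * 2 + 1  # final snap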

(If you'd care to post your code someplace, I'm looking for test cases for a Nelder-Mead with restarts.)


Assuming that the function to minimize is arbitrarily complex (nonlinear), this is a very hard problem in general. It cannot be guaranteed to be solved optimally unless you try every possible option. I do not know if there are any integer-constrained nonlinear optimizers (I somewhat doubt it), and I will assume you know that Nelder-Mead would work fine if it were a continuous function.

Edit: Considering the comment from @Dougal, I will just add here: set up a coarse + fine grid search first; if you then feel like trying whether your Nelder-Mead works (and converges faster), the points below may help...

But here are some points that may help:

  1. Considering how difficult the integer constraint makes things, maybe it would be an option to do some simple interpolation to help the optimizer. It should still converge to an integer solution. Of course this requires calculating extra points, but it might solve many other problems. (Even in linear integer programming it is common to solve the relaxed, unconstrained system first, AFAIK.)
  2. Nelder-Mead starts with N+1 points; these are hard-wired in scipy (at least in older versions) to (1 + 0.05) * x0[j] (for j in all dimensions, unless x0[j] is 0), which you will see in your first evaluation steps. Maybe they can be supplied in newer versions; otherwise you could just change/copy the scipy code (it is pure Python) and set them to something more reasonable. Or, if you feel that is simpler, scale all input variables down so that (1 + 0.05) * x0 is of a sensible size.
  3. Maybe you should cache all function evaluations, since with Nelder-Mead I would guess you will always run into duplicate evaluations (at least at the end); a minimal caching sketch follows after this list.
  4. You have to check how likely it is that Nelder-Mead will just shrink to a single value and give up because it keeps finding the same result.
  5. You should generally check whether your function is well behaved at all... This optimization is doomed if the function does not change smoothly over the parameter space, and even then it can easily run into local minima if you have any of those. (Since you cached all evaluations, see point 3, you could at least plot them and have a look at the error landscape without needing any extra evaluations.)
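
Regarding point 3, here is a minimal caching sketch, assuming an expensive objective; the name expensive_loss and the starting point are placeholders for your real setup:

from scipy.optimize import minimize

cache = {}

def cached_loss(p):
    # Key on the snapped integer point, so duplicate (and near-duplicate)
    # requests hit the cache instead of re-running the expensive evaluation.
    key = tuple(int(round(v)) for v in p)
    if key not in cache:
        cache[key] = expensive_loss(key)  # expensive_loss: your real objective
    return cache[key]

res = minimize(cached_loss, x0=[10.0, 5.0], method='Nelder-Mead')
# cache now also records every evaluated point, which you can plot to
# inspect the error landscape (point 5) without extra evaluations.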