Convex Sets and convex Functions

Marcel Angenvoort

2025-07-01

Convex Sets

Before we start with optimisation algorithms, we first need to introduce some concepts.

Convex Set

A set \(M\) is called convex if for any \(\mathbf{x}_1, \mathbf{x}_2 \in M\): \[ \theta \mathbf{x}_1 + (1-\theta) \mathbf{x}_2 \in M \qquad \forall \theta \in [0, 1] \]

In other words, a set is convex if any line connecting two of its points is also contained within the set.

convex set
source: wikimedia.org license: CC BY-SA 4.0

 

nonconvex set
source: wikimedia.org license: CC BY-SA 4.0
Figure 1: Examples for convex and nonconvex sets

The expression \(\theta \mathbf{x}_1 + (1-\theta) \mathbf{x}_2\) is called an convex-combination.

This can be generalized to more than two points. Let’s say we have multiple points \(x_1, \dotsc, x_n\), then we have \[ \sum_{i=1}^n \theta_i \mathbf{x}_i \in M \] where \[ \theta_i \ge 0 \quad \forall 1 \leq i \leq n, \quad \sum_{i=1}^n \theta_i = 1. \]

For anyone familiar with statistics, this will look very similar to the expectation of a probability distribution.

And indeed, this can be generalised to continuous functions: Suppose \(p(\mathbf{x})\) satisfies \(p(\mathbf{x}) \gt 0\) and \[ \int_M p(\mathbf{x}) \, \mathrm{d}\mathbf{x} = 1, \] then \[ \int_M p(\mathbf{x}) \mathbf{x} \, \mathrm{d}\mathbf{x} \in M. \]

In other words, the expectation value must be within the convex set.

Examples for convex sets:

  1. Simplices: \[ S = \left\{ \mathbf{x} \in \mathbb{R}^n: \exists \theta_1, \dotsc, \theta_n \in [0, 1], \, \textstyle\sum_{i=1}^n \displaystyle \theta_i = 1 : \mathbf{x} = \sum_{i=1}^n \theta_i \mathbf{v}_i \right\} \]
  2. Polyhedra \[ P = \Bigl\{ \mathbf{x} \in \mathbb{R}^n : \mathbf{a}_j^\mathrm{T} \mathbf{x} \leq b_j, \, \mathbf{c}_k^\mathrm{T} \mathbf{x} = d_k, \, j=1,\dotsc, m, \, k=1,\dotsc,p \Bigr\} \]

Simplex
author: Robert Webb
source: wikimedia.org url: Stella software

 

Polyhedron
author: Tilman Piesk
source: wikimedia.org license: CC BY-SA 4.0
Figure 2: Simplices and Polyhedra are examples for convex sets.

Operations that preserve Convexity

  • Intersection: If \(A\) and \(B\) are convex, then the intersection \[ A \cap B \] is also convex.
  • affine Transformations: If \(M\) is a convex set and \(f:\mathbb{R}^n \to \mathbb{R}^n\) is an affine transformation, i.e. \(f(\mathbf{x}) = A \mathbf{x} + \mathbf{b}\), then the image \[ f(M) \] is also convex.
  • The projection of a convex set onto some of it’s coordinates is convex: If \(S \subseteq \mathbb{R}^m \times \mathbb{R}^n\), then \[ T = \{ x_1 \in \mathbb{R}^m | (x_1, x_2) \in S \;\text{for some $x_2 \in \mathbb{R}^n$} \} \] is convex.(Boyd and Vandenberghe 2004)

As a simple corollary, we can see that the perspective function \(f: \mathbb{R}^n \times \mathbb{R}^+\), \[ f(\mathbf{x}, t) = \frac{\mathbf{x}}{t} \] preserves convexity.

Visually, we can interpret the perspective function as a pinhole camera.

The Separating Hyperplane Theorem

Separating Hyperplane Theorem

Suppose \(A\) and \(B\) are non-empty, disjoint sets, i.e. \(A \cap B = \emptyset\). Then there exists \(\mathbf{a} \ne \mathbf{0}\) and \(b\) such that \(\mathbf{a}^\mathrm{T} \mathbf{x} \leq b\) for all \(\mathbf{x} \in A\) and \(b \leq \mathbf{a}^\mathrm{T} \mathbf{x}\) for all \(\mathbf{x} \in B\).

Figure 3: Separating Hyperplane Theorem
author: Oleg Alexandrov
source: Wikimedia license: public domain (CC0)

Proof. See notes.

Note

This result can be generalized to the Hahn-Banach separation theorem.

Convex Functions

Convex functions

Let \(D\) be a convex set. A function \(f: D \to \mathbb{R}\) is called convex if \[ f\bigl(\theta \mathbf{x}_1 + (1-\theta) \mathbf{x}_2\bigr) \leq \theta f(\mathbf{x}_1) + (1-\theta) f(\mathbf{x}_2), \qquad \theta \in [0, 1] \]

Visually, this means that the graph between two points always lies below the secant line connecting them.

Code
using Plots
using LaTeXStrings
theme(:dark)
#default(size = (800, 600))  # set default plot size

# Define objective function f
f(x) = -0.01548523*x^4 + 0.09320228*x^3 + 0.21426876*x^2 + -0.83797152*x + 0.90722445

xmin = -0.654
xmax = 4.074
x = range( xmin, xmax, length=100)

x1 = 0.56
x2 = 3.284
theta = 0.3
x_comb = theta * x1 + (1 - theta) * x2

# Plot function f
plot(x, f, color=:darkblue, linewidth=4,
    legend=false, 
    xticks = ([x1, x2], ["x₁", "x₂"]),
    yticks = ([f(x1), f(x2)], ["f(x₁)", "f(x₂)"]))

# Draw the line segment connecting (x1, f(x1)) and (x2, f(x2))
plot!([x1, x2], [f(x1), f(x2)],
    lw = 1.5, ls = :dash, color = :green)

# Plot horizontal and vertical lines
plot!([xmin, x1, x1], [f(x1), f(x1), 0],
        lw = 1.5, ls = :dash, color = :gray )
plot!([x2, x2, xmin], [0, f(x2), f(x2)],
    lw = 1.5, ls = :dash, color = :gray )

# Plot points (x1, f(x1)) and (x2, f(x2))
scatter!([x1, x2, x_comb], [f(x1), f(x2), f(x_comb)],
        markersize = 6, color = :red)

# Annotations
annotate!(x_comb + 0.05, f(x_comb) - 0.05, text(L"f(\theta x_1 + (1-\theta) x_2)", :white, :left, :top, 15))
annotate!(x_comb + 0.05, theta * f(x1) + (1-theta) * f(x2), text(L"\theta f(x_1) + (1-\theta) f(x_2)", :white, :right, 15))
Figure 4: A convex function

Examples for convex function

Example:

Himmelblau-function

The Himmelblau function is a common test function for optimization problems. It is defined by:

\[ f(x,y) = (x^2 + y - 11)^2 + (x + y^2 - 7)^2 \]

This function has four identical local minima, one of which is \(f(3, 2) = 0\).

Code
using Plots
theme(:dark)

# Himmelblau function
f(x,y) = (x^2 + y - 11)^2 + (x + y^2 - 7)^2

x = range(-5, 5, length=100);
y = range(-5, 5, length=100);
z = @. f(x', y);

contour(x, y, z, levels=20, color=:turbo, cbar=false, lw=1)
scatter!([3.0, -2.805118, -3.779310, 3.584428], [2.0, 3.131312, -3.283186, -1.848126], legend=false)

title!("Contour Plot of the Himmelblau function")
xlabel!("x")
ylabel!("y")
Figure 5: Contour plot of the Himmelblau function

Solution

Calculate the gradient: \[ \begin{aligned} \nabla f &= \begin{bmatrix} 2(x^2 + y - 11) 2x + 2(x + y^2 - 7) \\ 2(x^2 + y - 11) + 2(x + y^2 - 7) 2y \end{bmatrix} \\ &= \begin{bmatrix} 4x^3 + 2y^2 + 4xy - 42x - 14 \\ 4 y^3 + 2x^2 + 4xy - 26y - 22 \end{bmatrix} \end{aligned} \]

Calculate the Hessian: \[ \begin{aligned} H &= \begin{bmatrix} 12x^2 + 4y - 42 & 4x + 4y \\ 4x + 4y & 12y^2 + 4x - 26 \end{bmatrix} \end{aligned} \]

Substitue \((3, 2)\) for \(x\) and \(y\): \[ H(3, 2) = \begin{bmatrix} 74 & 20 \\ 20 & 34 \end{bmatrix} \]

Determinte the characteristic polynomial: \[ \begin{aligned} \det (H - \lambda I) &= \begin{vmatrix} 74 - \lambda & 20 \\ 20 & 34 - \lambda \end{vmatrix} \\ &= (74 - \lambda) (34 - \lambda) - 20^2 \\ &= \lambda^2 - 108 \lambda + 2116 \\ &\overset{\text{!}}{=} 0 \end{aligned} \]

Solve for \(\lambda\) to obtain the Eigenvalues: \[ \begin{aligned} \lambda_1 = \frac{108}{2} - \sqrt{ \left( \frac{108}{2} \right)^2 - 2116} \approx 25.716 \\ \lambda_2 = \frac{108}{2} + \sqrt{ \left( \frac{108}{2} \right)^2 - 2116} \approx 82.284 \end{aligned} \]

All Eigenvalues are greater than 0, so \(f\) is in (3, 2) positive definite.

Alternatively, we can also solve this using the Symbolics package in Julia:

using Symbolics
using LinearAlgebra

# Himmelblau function
f(x, y) = (x^2 + y - 11)^2 + (x + y^2 - 7)^2

vars = @variables x, y
f_sym = f(x, y)

H = Symbolics.hessian(f_sym, vars)
H_at_point = substitute.(H, Ref(Dict( x=>3.0, y=>2.0) ))

isposdef(H_at_point)
import numpy as np
from sympy import symbols, hessian
from scipy import linalg

f = lambda x, y: (x**2 + y - 11)**2 + (x + y**2 - 7)**2  

x, y = symbols("x y")
f_sym = f(x, y)

H = hessian(f_sym, (x,y))
H_at_point = np.array( H.subs( {x:3, y:2} ) )

linalg.cholesky(H_at_point)
pkg load symbolic

% Himmelblau function
f = @(x, y) (x^2 + y - 11)^2 + (x + y^2 - 7)^2

syms x; syms y;

f_sym = f(x, y)
H_sym = hessian(f_sym)
H = function_handle(H_sym)
H_at_point = H(3, 2)

chol(H_at_point)
import autograd.numpy as np
from autograd import hessian
from scipy import linalg

f = lambda X: (X[0]**2 + X[1] - 11)**2 + (X[0] + X[1]**2 - 7)**2  

H = hessian(f)
point = np.array([3.0, 2.0])
H_at_point = H(point)

linalg.cholesky(H_at_point)

TODO:

  • definition conxex functions
  • examples: norm(x), affine fct. f(x) = c.T x + b, quadratic fct. x.T A x
  • sublevel sets of convex functions are convex sets (proof trivial)
  • Epigraphs: A function is convex if and only if it’s epigraph is convex.
  • Jennsen’s inequality
  • Operations that preserve convexity: Composition with affine mapping
  • Definition of convex conjugate of a function (generalization of Legendre transformation)
  • Examples for convex conjugate (see p. 92)
Boyd, Stephen, and Lieven Vandenberghe. 2004. Convex Optimization. Cambridge University Press. https://web.stanford.edu/~boyd/cvxbook/.