Piecewise Linear Functions in Neural Networks

June 5, 2026

A neural network (NN) represents a continuous piecewise linear (CPWL) function when all of its activation functions are themselves CPWL. If a network uses smooth activation functions such as Sigmoid, Tanh, Swish, or GELU, it represents a smooth, non-linear function instead. However, a large share of models in practice (including some LLMs and VLMs) rely on ReLU or its piecewise linear variants, such as Leaky ReLU.

The Mathematics of Composition

A standard feedforward neural network is essentially a series of alternating affine transformations (matrix multiplications and bias additions) and non-linear activation functions. Mathematically, layer $i$ computes:

$$h_i = \sigma(W_i h_{i-1} + b_i)$$

From real analysis: the composition of continuous piecewise linear functions is itself a continuous piecewise linear function. Affine maps are CPWL and so is ReLU, so stacking layers never creates true curves; it only adds more linear "hinges."
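The composition claim can be checked numerically. The sketch below is only an illustration under stated assumptions: the tiny 1-8-8-1 network, its random weights, and the helper names `relu` and `net` are all hypothetical. It samples the network on a uniform 1-D grid and measures second differences, which vanish on a linear piece and are non-zero only where a grid cell straddles a hinge.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tiny network: 1 input -> 8 hidden -> 8 hidden -> 1 output.
W1, b1 = rng.normal(size=(8, 1)), rng.normal(size=8)
W2, b2 = rng.normal(size=(8, 8)), rng.normal(size=8)
W3, b3 = rng.normal(size=(1, 8)), rng.normal(size=1)

def relu(z):
    return np.maximum(z, 0.0)

def net(x):
    """Evaluate the network on a 1-D array of scalar inputs."""
    h1 = relu(x[:, None] @ W1.T + b1)
    h2 = relu(h1 @ W2.T + b2)
    return (h2 @ W3.T + b3)[:, 0]

xs = np.linspace(-3.0, 3.0, 20001)
ys = net(xs)

# On a uniform grid the second difference of a linear piece is zero
# (up to rounding); it is non-zero only next to a hinge.
second_diff = np.abs(ys[2:] - 2.0 * ys[1:-1] + ys[:-2])
tol = 1e-9
kinks = int(np.sum(second_diff > tol))
fraction_flat = float(np.mean(second_diff <= tol))
print(f"grid points next to a hinge: {kinks}")
print(f"fraction of grid points on a linear piece: {fraction_flat:.4f}")
```

A smooth activation such as Tanh in place of `relu` would be expected to give non-zero second differences almost everywhere instead of at a handful of isolated points.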

The Geometry of Convex Polytopes

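Each neuron in the first hidden layer defines a hyperplane in the input space: the set of inputs where its pre-activation is exactly zero. This hyperplane acts as a "fold." Neurons in deeper layers create folds in the same way, but seen from the input space those boundaries are themselves piecewise linear (bent at earlier folds) rather than single flat hyperplanes.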

  • On one side of the hyperplane, the neuron's pre-activation is negative, the ReLU outputs zero, and the neuron is "off": it passes no signal and its gradient is zero.
  • On the other side, the neuron is active and passes the linear signal forward.
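As a small illustration of the two sides of a fold, the sketch below evaluates a single ReLU neuron on one point from each side of its hyperplane. The weights `w`, bias `b`, the helper `neuron`, and the two sample points are hypothetical values chosen only to make the arithmetic easy to follow.

```python
import numpy as np

w = np.array([1.0, -2.0])  # hypothetical weights; hyperplane is w @ x + b = 0
b = 0.5                    # hypothetical bias

def neuron(x):
    """Return (pre-activation, ReLU output, gradient of the output w.r.t. x)."""
    pre = float(w @ x + b)
    out = max(pre, 0.0)
    grad = w.copy() if pre > 0 else np.zeros_like(w)
    return pre, out, grad

x_off = np.array([0.0, 1.0])  # pre-activation = 0 - 2 + 0.5 = -1.5
x_on = np.array([2.0, 0.0])   # pre-activation = 2 + 0 + 0.5 = 2.5

for name, x in [("off side", x_off), ("on side", x_on)]:
    pre, out, grad = neuron(x)
    print(f"{name}: pre={pre}, out={out}, grad={grad}")
```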

When the network consists of thousands or millions of neurons, these folds intersect, partitioning the high-dimensional input space into a huge number of distinct, non-overlapping regions, each of which is a convex polytope. Inside any single polytope, the activation state of every neuron in the network is fixed (either ON or OFF). Because the non-linear "decisions" are locked in, the entire network collapses into a single affine map (one effective matrix multiplication plus a bias) for any input that falls within that region.
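One rough way to see this partition is to record the ON/OFF pattern of every neuron over a grid of inputs and count how many distinct patterns appear. The sketch below assumes a hypothetical 2-6-6 ReLU network with random weights; the function name `activation_pattern` and the grid resolution are illustrative choices, and a finite grid can miss very small regions, so the count is only a lower bound.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical network with 2 inputs and two hidden layers of 6 ReLU neurons.
W1, b1 = rng.normal(size=(6, 2)), rng.normal(size=6)
W2, b2 = rng.normal(size=(6, 6)), rng.normal(size=6)

def activation_pattern(X):
    """Return one row of 12 ON/OFF flags (as 0/1) per input row of X."""
    z1 = X @ W1.T + b1
    h1 = np.maximum(z1, 0.0)
    z2 = h1 @ W2.T + b2
    return np.concatenate([z1 > 0, z2 > 0], axis=1).astype(np.uint8)

# Sample the square [-3, 3] x [-3, 3] on a 200 x 200 grid.
g = np.linspace(-3.0, 3.0, 200)
X = np.stack(np.meshgrid(g, g), axis=-1).reshape(-1, 2)

patterns = activation_pattern(X)
n_regions = len(np.unique(patterns, axis=0))
print(f"distinct activation patterns seen on the grid: {n_regions}")
print(f"upper bound from 12 binary neurons: {2 ** 12}")
```

Each distinct pattern corresponds to one region in which the network behaves as a single affine map.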

For an input $x$ inside a specific polytope $P$, the network's output is exactly:

$$y = W_p x + b_p$$

where $W_p$ and $b_p$ are the effective weight matrix and bias for that specific region. The network changes its slope only when the input crosses a boundary into a neighboring polytope.
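The effective matrix and bias can be computed explicitly by replacing each ReLU with the fixed 0/1 mask it applies at a given input. The following is a minimal sketch assuming a hypothetical 3-5-4-2 network with random weights; `forward` and `local_affine_map` are illustrative names, not part of any library.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical network: 3 inputs -> 5 hidden -> 4 hidden -> 2 outputs.
W1, b1 = rng.normal(size=(5, 3)), rng.normal(size=5)
W2, b2 = rng.normal(size=(4, 5)), rng.normal(size=4)
W3, b3 = rng.normal(size=(2, 4)), rng.normal(size=2)

def forward(x):
    h1 = np.maximum(W1 @ x + b1, 0.0)
    h2 = np.maximum(W2 @ h1 + b2, 0.0)
    return W3 @ h2 + b3

def local_affine_map(x):
    """Return (W_p, b_p, pattern) for the polytope that contains x."""
    z1 = W1 @ x + b1
    d1 = (z1 > 0).astype(float)   # ON/OFF mask of layer 1
    z2 = W2 @ (d1 * z1) + b2
    d2 = (z2 > 0).astype(float)   # ON/OFF mask of layer 2
    # With the masks fixed, each layer is affine: h = A x + c.
    A1, c1 = d1[:, None] * W1, d1 * b1
    A2, c2 = d2[:, None] * (W2 @ A1), d2 * (W2 @ c1 + b2)
    return W3 @ A2, W3 @ c2 + b3, np.concatenate([d1, d2])

x = rng.normal(size=3)
W_p, b_p, pattern = local_affine_map(x)
print("max |network - affine map| at x:", np.abs(forward(x) - (W_p @ x + b_p)).max())

# A nearby point with the same pattern is handled by the same W_p and b_p.
x_near = x + 1e-6 * rng.normal(size=3)
same_region = np.array_equal(local_affine_map(x_near)[2], pattern)
print("nearby point in same polytope:", same_region)
```

Once an input crosses into a neighboring polytope, at least one mask entry flips and `local_affine_map` returns a different `W_p` and `b_p`.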

Implications for AI and Robustness

  • Universal approximation: Even though these networks are built from straight-line pieces, they can approximate any continuous function on a bounded domain to any desired accuracy given enough hinges (or neurons), just as you can approximate a smooth curve by drawing enough small straight line segments.
  • Adversarial vulnerabilities: The CPWL nature of networks is often cited as a major reason why adversarial examples are so effective. Because the network is locally linear, methods like the Fast Gradient Sign Method (FGSM) can exploit the linear slope within a polytope: for high-dimensional inputs, many tiny per-coordinate perturbations aligned with the sign of the gradient add up to a large change in the output, which can push an input across a decision boundary.
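The local-linearity argument behind FGSM can be made concrete with a small sketch. Everything here is an assumption for illustration: a hypothetical one-hidden-layer ReLU scorer with random weights, a logistic loss (for which the sign of the loss gradient is minus the label times the sign of the logit gradient), and a step size deliberately chosen so the perturbed input stays inside the original polytope. Under those assumptions the drop in the logit equals the step size times the L1 norm of the gradient, up to rounding.

```python
import numpy as np

rng = np.random.default_rng(3)
d, hidden = 100, 32

# Hypothetical one-hidden-layer ReLU scorer; the sign of the logit is the class.
W1 = rng.normal(size=(hidden, d)) / np.sqrt(d)
b1 = 0.1 * rng.normal(size=hidden)
w2 = rng.normal(size=hidden) / np.sqrt(hidden)

def pre_activations(x):
    return W1 @ x + b1

def logit(x):
    return float(w2 @ np.maximum(pre_activations(x), 0.0))

def input_gradient(x):
    """Gradient of the logit w.r.t. x, i.e. the local slope of the polytope."""
    mask = (pre_activations(x) > 0).astype(float)
    return (w2 * mask) @ W1

def largest_step_inside_polytope(x, direction):
    """Smallest positive eps at which some hidden neuron would flip."""
    z = pre_activations(x)
    dz = W1 @ direction
    with np.errstate(divide="ignore", invalid="ignore"):
        flip_at = -z / dz
    flip_at = flip_at[np.isfinite(flip_at) & (flip_at > 0)]
    return float(flip_at.min()) if flip_at.size else np.inf

x = rng.normal(size=d)
label = 1.0 if logit(x) > 0 else -1.0
grad = input_gradient(x)

# FGSM direction for a logistic loss: sign(grad of loss) = -label * sign(grad).
direction = -label * np.sign(grad)
# Assumption: halve the distance to the nearest fold so no neuron flips.
eps = 0.5 * min(largest_step_inside_polytope(x, direction), 1.0)
x_adv = x + eps * direction

predicted_drop = eps * np.abs(grad).sum()        # exact inside one polytope
actual_drop = label * (logit(x) - logit(x_adv))
print(f"eps={eps:.3e}, predicted drop={predicted_drop:.6e}, actual drop={actual_drop:.6e}")
```

A real FGSM attack uses a fixed `eps` and may cross several folds, so the linear prediction then becomes an approximation rather than an identity.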