QUANTUM SERIES #4
NISQ Algorithms & Barren Plateaus
Thesis-Driven Analysis

Why VQE/QAOA Don't Scale Yet
The Barren Plateau Problem

Published · 13 min read

THESIS: NISQ = Noisy Intermediate-Scale Quantum. VQE and QAOA are variational algorithms designed for NISQ, but they hit two hard walls: (1) barren plateaus kill gradient descent at scale; (2) SWAP overhead and noise kill circuit depth. Current NISQ advantage claims are premature. Fault-tolerance is still the unlock.


1. The NISQ Promise vs Reality

NISQ devices with 50–1000 physical qubits and ~10⁻³ error rates were supposed to deliver “quantum advantage” before fault-tolerance arrived. The flagship algorithms: VQE for chemistry, QAOA for optimization, VQLS for linear systems. All variational. All hybrid. All hitting walls.

INTUITION

Variational algorithms are quantum ML: try an ansatz circuit, measure a cost expectation value, use a classical optimizer to tune parameters. Like training a neural network, but the “network” is a parameterized qubit circuit and the “loss” is $\langle\psi(\theta)|H|\psi(\theta)\rangle$. It works when gradients exist and noise is low. At scale, neither condition holds.

Reality Check: NISQ Flagship Algorithms

Algorithm | Problem | Qubits Needed | Circuit Depth | NISQ Viable? | Primary Failure Mode
VQE | Chemistry ground state | 50–200 for FeMoco | 10³–10⁴ | No | Barren plateaus + noise kill accuracy before useful depth
QAOA | MaxCut, CSP | 50+ for real graphs | 2p ≈ 40 for p=20 | No | p=1 beaten by classical; p>5 noise dominates; no sweet spot
VQLS | Linear systems Ax=b | log N + overhead | Variable | No | Barren plateaus + conditioning issues; HHL advantage lost to dequantization
SO WHAT

Every NISQ “advantage” claim to date has been on toy problems with fewer than 20 qubits or contrived benchmarks. At 50+ qubits, where problems become industrially relevant, two exponentials fight you simultaneously: gradient variance scales as $2^{-n}$ and SWAP depth scales as $n^2$. Without QEC to suppress noise, both kill you before you reach useful depth.

2. Variational Quantum Algorithms 101

VQE and QAOA share the same hybrid structure: a parameterized quantum circuit plus a classical optimizer. The quantum computer evaluates a cost function $C(\theta) = \langle\psi(\theta)|H|\psi(\theta)\rangle$. The classical computer updates $\theta$ via gradient descent or gradient-free methods. The loop repeats until $C(\theta)$ converges.

The Hybrid Loop

[Diagram: the hybrid loop. Quantum CPU runs U(θ) and measures C(θ) = ⟨H⟩ via shots; classical CPU computes the gradient and updates θ ← θ − α∇C; iterate until C(θ) converges, or hits a barren plateau.]
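The hybrid loop above can be sketched in a few lines of plain Python. This is an illustrative stand-in, not real hardware: the `cost` function below uses the closed form ⟨ψ(θ)|Z|ψ(θ)⟩ = cos θ for a single RY rotation on |0⟩ in place of a shot-based estimate, and the gradient uses the parameter-shift rule, which is exact for RY gates.

```python
import math

def cost(theta):
    # <psi(theta)|Z|psi(theta)> for RY(theta)|0> is cos(theta):
    # a closed-form stand-in for the shot-estimated expectation value
    return math.cos(theta)

def parameter_shift_grad(theta):
    # Parameter-shift rule: dC/dtheta = (C(theta+pi/2) - C(theta-pi/2)) / 2
    return 0.5 * (cost(theta + math.pi / 2) - cost(theta - math.pi / 2))

theta, lr = 0.3, 0.5
for _ in range(200):                    # the classical optimizer loop
    theta -= lr * parameter_shift_grad(theta)

print(round(cost(theta), 6))  # converges to the minimum, -1.0 at theta = pi
```

The same loop structure carries over to VQE unchanged; only the cost evaluation moves onto the quantum device, which is exactly where shot noise and barren plateaus enter.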

Basic VQE for H2 in Qiskit

from qiskit_nature.second_q.drivers import PySCFDriver
from qiskit_nature.second_q.mappers import ParityMapper
from qiskit_algorithms import VQE
from qiskit_algorithms.optimizers import COBYLA
from qiskit.circuit.library import TwoLocal
from qiskit.primitives import Estimator

# H2 at 0.735 Angstrom bond length
driver = PySCFDriver(atom="H 0 0 0; H 0 0 0.735")
problem = driver.run()

# Parity mapping + two-qubit reduction: 4 spin orbitals -> 2 qubits
mapper = ParityMapper(num_particles=problem.num_particles)
qubit_op = mapper.map(problem.hamiltonian.second_q_op())

# Hardware-efficient ansatz: 2 qubits, depth 2
ansatz = TwoLocal(2, "ry", "cz", reps=2)

# VQE needs an Estimator primitive to evaluate the cost expectation
vqe = VQE(Estimator(), ansatz, COBYLA(maxiter=500))
result = vqe.compute_minimum_eigenvalue(qubit_op)
# Works for 2 qubits. Fails at 14+ (H2O).
REALITY CHECK

H2 works. 2 qubits, depth 4, exact answer in seconds. Scale to H2O (14 qubits) and the optimizer gets lost. Scale to FeMoco (200 qubits) and the landscape becomes exponentially flat. That flatness has a name: barren plateaus.

3. Barren Plateaus — The Gradients Vanish

McClean et al. 2018 [1] proved the central result: for a random parameterized quantum circuit with $n$ qubits, the variance of the gradient of the cost function decays as:

$$\text{Var}\!\left[\frac{\partial C}{\partial \theta_k}\right] \;\in\; \mathcal{O}\!\left(\frac{1}{2^n}\right)$$

Gradient variance decays exponentially in the number of qubits. At n = 50, you need ~2⁵⁰ ≈ 10¹⁵ shots to resolve the gradient direction.
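A back-of-envelope sketch of that shot count, assuming the typical gradient magnitude scales as √Var ≈ 2^(−n/2) and shot noise falls as 1/√shots (the function name is ours, for illustration):

```python
import math

def shots_to_resolve_gradient(n_qubits):
    """Shots needed so shot noise (~1/sqrt(shots)) drops below the
    typical gradient magnitude (~sqrt(Var) ~ 2^(-n/2))."""
    grad_scale = 2.0 ** (-n_qubits / 2)      # sqrt of Var ~ 2^-n
    return math.ceil(1.0 / grad_scale ** 2)  # shots ~ 2^n

for n in (10, 30, 50):
    print(n, shots_to_resolve_gradient(n))
# n=50 -> 2^50 ≈ 1.1e15 shots: years of wall-clock time even at MHz sampling rates
```

The exponent is the whole story: every extra qubit doubles the sampling bill before the optimizer can take a single informed step.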

[Figure: gradient variance vs number of qubits, log scale. Var[∂C/∂θ] ~ 2⁻ⁿ appears as a straight line on the log axis, falling from 10⁰ at n = 0 to ~10⁻¹⁵ at n = 50, crossing the shot-noise floor along the way.]

Why This Happens

1. Haar Random Unitaries

Deep random circuits converge to 2-designs, scrambling information across the Hilbert space. The average gradient over the unitary group is exactly zero. Random initialization lands in the plateau.

2. Entanglement-Induced Concentration

Global cost functions on highly entangled states exhibit concentration of measure: the landscape is flat almost everywhere. The phenomenon is the quantum analogue of vanishing gradients in deep networks — but without batch norm or residual connections to save you.

3. Expressibility vs Trainability Trade-off

Holmes et al. 2022 [5] showed that the expressibility of an ansatz is directly tied to gradient magnitudes: more expressive ansätze have exponentially smaller gradients. Hardware-efficient ansätze are too expressive. You can't have an ansatz that is both rich enough to represent the answer and trainable enough to find it.
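The concentration effect in (2) can be illustrated numerically without any quantum circuitry: the components of random unit vectors in a 2ⁿ-dimensional space have variance ~1/2ⁿ. This toy uses real-valued Gaussian states rather than genuine Haar-random unitaries, which is enough to show the scaling (all names are ours, for illustration):

```python
import math, random

def random_unit_vector(dim, rng):
    # Normalizing a Gaussian vector gives a uniformly random direction
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def overlap_variance(n_qubits, samples=2000, seed=0):
    """Variance of the overlap <e_0|psi> for random real states in
    dimension 2^n: it concentrates as ~1/2^n."""
    rng = random.Random(seed)
    dim = 2 ** n_qubits
    vals = [random_unit_vector(dim, rng)[0] for _ in range(samples)]
    mean = sum(vals) / samples
    return sum((v - mean) ** 2 for v in vals) / samples

for n in (2, 4, 6, 8):
    print(n, overlap_variance(n))  # roughly 1/4, 1/16, 1/64, 1/256
```

Any fixed observable sees the same concentration, so a cost function built from such overlaps is flat almost everywhere, which is the plateau.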

INTUITION

Optimizing a 50-qubit VQE is like searching for a golf hole on a course the size of Earth, where the landscape looks perfectly flat from every vantage point. Random initialization drops you in Kansas. Your gradient reads zero. You have no directional signal. You’d need to sample the entire planet to detect any slope at all.

4. SWAP Overhead Reality

Most NISQ chips have linear or 2D-grid qubit connectivity. Chemistry algorithms require all-to-all entanglement. To entangle non-adjacent qubits on a linear chain, you must shuttle quantum information through intermediate qubits using SWAP gates. For $n$ qubits in a line, connecting opposite ends requires $\mathcal{O}(n)$ SWAPs. Achieving full all-to-all connectivity: $\mathcal{O}(n^2)$ SWAPs.
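The O(n²) figure can be checked with a brick-wall (odd-even transposition) SWAP network, the standard trick for all-to-all interactions on a line: after n rounds of nearest-neighbor SWAPs, every pair of logical qubits has been adjacent exactly once, at a total cost of n(n−1)/2 SWAPs. A minimal sketch (function name is ours):

```python
def swap_network_schedule(n):
    """Odd-even transposition network on a linear chain: after n rounds
    of nearest-neighbour SWAPs, every pair of logical qubits has been
    adjacent exactly once. Total SWAPs: n*(n-1)/2 = O(n^2)."""
    order = list(range(n))          # logical qubit sitting at each chain site
    met = set()
    total_swaps = 0
    for rnd in range(n):
        start = rnd % 2             # alternate even/odd brick layers
        for i in range(start, n - 1, 2):
            a, b = order[i], order[i + 1]
            met.add(frozenset((a, b)))      # this pair is now adjacent
            order[i], order[i + 1] = b, a   # SWAP them past each other
            total_swaps += 1
    return total_swaps, len(met)

swaps, pairs = swap_network_schedule(8)
print(swaps, pairs)  # 28 SWAPs; all 28 = 8*7/2 pairs made adjacent
```

The network is depth-optimal for a line, but the point stands: every one of those SWAPs is three CNOTs of extra noise that an all-to-all device would not pay.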

Connectivity Impact on Circuit Depth

[Diagram: connectivity impact on circuit depth. Ideal all-to-all device: depth 1, no SWAPs needed. NISQ nearest-neighbor device: 2 SWAP gates per non-adjacent interaction, depth O(n). SWAP fidelity cost at a typical 2-qubit gate error of ~10⁻³: after 100 SWAPs, 0.999¹⁰⁰ ≈ 0.90; after 1000 SWAPs, ≈ 0.37.]
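That fidelity arithmetic generalizes to a simple depth budget, assuming independent multiplicative gate errors (F_circ = f^d); the helper name is ours, for illustration:

```python
import math

def max_gates_at_fidelity(gate_fidelity, target_fidelity):
    """How many gates fit before circuit fidelity drops below target,
    assuming independent multiplicative gate errors: F_circ = f^d."""
    return math.floor(math.log(target_fidelity) / math.log(gate_fidelity))

f = 0.999                          # typical NISQ 2-qubit gate fidelity
print(round(f ** 100, 2))          # ≈ 0.9  after 100 SWAPs
print(round(f ** 1000, 2))         # ≈ 0.37 after 1000 SWAPs
print(max_gates_at_fidelity(f, 0.5))  # ~692 gates before fidelity halves
```

A 10³–10⁴-gate VQE chemistry circuit therefore needs gate errors well below 10⁻⁴ before the output is anything but noise, which no current NISQ device delivers.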

H2 VQE: Ideal vs Realistic Hardware

  • Ideal device (all-to-all), F = 0.99: 4 qubits, depth 4, no SWAP gates required
  • Linear-chain device, F = 0.62: +48 SWAP gates inserted for connectivity routing
  • Linear-chain + realistic noise, F = 0.51: T1/T2 decoherence + gate error ~10⁻³

SO WHAT

Barren plateaus mean you cannot find the cost function minimum. SWAP overhead means that even if you found it, the circuit cannot run deep enough to evaluate it accurately. These are independent exponential barriers that compound. This is precisely why NISQ VQE has not simulated FeMoco. From series #3: NISQ fails because it has no QEC. These are the concrete symptoms.

5. VQE Chemistry Case Study

The flagship “killer app” for VQE is nitrogen fixation: simulating the iron–molybdenum cofactor (FeMoco) of nitrogenase to design better catalysts for fertilizer production. Here is the requirement gap in precise terms.

Molecule | Exact E (Ha) | VQE NISQ Best | Error | Status
H2 | -1.137 | -1.136 | ~10⁻³ Ha | Demonstrated
LiH | -7.882 | -7.880 | ~10⁻³ Ha | Demonstrated
H2O | -75.49 | -75.21 | ~10⁻¹ Ha | Struggles
FeMoco | unknown | n/a | n/a | Unreachable on NISQ

FeMoco Requirements (Fault-Tolerant QPE)

  • Logical qubits: ~200 (active orbital space)
  • T-gate depth: ~10¹⁰ for quantum phase estimation
  • Chemical accuracy: 1.6 mHa required

Current NISQ VQE Reality

  • Physical qubits: 12 demonstrated for chemistry, 50 max on device
  • Practical depth: ~100 gates before noise dominates
  • Accuracy achieved: ~10⁻² Ha best case (10× too large)
REALITY CHECK

The gap: ~10 orders of magnitude in circuit depth, 4× in qubit count, 10× in accuracy. VQE cannot bridge this without error correction. The variational approach was a clever attempt to circumvent QEC requirements. Barren plateaus deliver the verdict: you cannot dodge the exponential.

6. QAOA MaxCut Case Study

QAOA [7] alternates cost and mixer Hamiltonians for $p$ layers. Theory: as $p \to \infty$, QAOA approximates adiabatic evolution and converges to the optimum. Practice: noise kills fidelity at $p \approx 5$, which is well below the $p > \log n$ threshold needed to beat classical algorithms.

Approximation Ratio vs Circuit Depth p

[Figure: approximation ratio vs QAOA depth p. The ideal (noiseless) curve climbs from 0.69 at p=1 toward 1.0; the Goemans–Williamson classical baseline sits at 0.878; the noisy NISQ curve peaks near p ≈ 3 and then degrades, never crossing the classical line.]

The QAOA Myths

Myth: “QAOA has advantage at p=1”

For MaxCut, p=1 QAOA achieves approximation ratio ~0.69. The Goemans–Williamson classical algorithm achieves 0.878 and runs in milliseconds on a laptop. QAOA p=1 is strictly and significantly worse.

Myth: “Just increase p”

Bravyi et al. 2020 [2] showed QAOA needs $p > \log n$ to beat GW. For $n=1000$ this requires $p > 10$, i.e., circuit depth $> 20$. But noise limits NISQ QAOA to $p \approx 5$. Noisy QAOA actually degrades in approximation quality past p ≈ 3–5, as noise accumulates faster than the signal improves.
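The mismatch is easy to tabulate. A sketch under stated assumptions: the Bravyi et al. bound read as p > log₂ n, and a noisy depth budget of p ≈ 5 taken from the discussion above (both constants are rough, and the names are ours):

```python
import math

NOISE_DEPTH_LIMIT_P = 5   # rough NISQ figure: layers before noise dominates

def min_p_to_beat_gw(n):
    """Smallest integer p satisfying p > log2(n), per the obstruction."""
    return math.floor(math.log2(n)) + 1

for n in (100, 1000, 10000):
    p_needed = min_p_to_beat_gw(n)
    feasible = p_needed <= NOISE_DEPTH_LIMIT_P
    print(n, p_needed, feasible)
# p needed: 7, 10, 14 respectively; all exceed the p ≈ 5 noise budget
```

At every problem size worth solving, the depth you need sits above the depth you can afford, which is the "no sweet spot" conclusion in numbers.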

Reality

QAOA is a heuristic with a beautiful theoretical convergence guarantee in the noiseless limit. On NISQ hardware, noise makes it a worse heuristic than readily available classical approximation algorithms. Its current value is as a hardware benchmark, not a problem solver.

SO WHAT

Do not invest in QAOA expecting commercial optimization advantage on NISQ. p=1 loses to classical. p>5 loses to noise. The noisy approximation ratio turns downward before you reach the depth needed to compete. There is no sweet spot without error correction.

7. Mitigation That Doesn’t Work Yet

The field has tried hard to fix barren plateaus: layer-wise training, smart initialization, local cost functions, error mitigation. They help for $n < 20$. At $n=50$, the landscape is still flat.

Layer-wise Training

Train ansatz layers sequentially to avoid barren plateaus in early layers.

FAILS at scale: Later layers still hit barren plateaus once depth exceeds O(log n).

Parameter Initialization

Identity-block initialization (Cerezo et al.) keeps gradients non-zero at the start of training.

FAILS at scale: Classical optimizer drives parameters into barren plateau regions during training.

Local Cost Functions

Measure local observables rather than global Hamiltonian to reduce entanglement-induced flatness.

PARTIALLY HELPS: Delays the plateau but does not eliminate it beyond depth O(poly(n)).

Error Mitigation: Not a Free Lunch

Zero-Noise Extrapolation (ZNE)

Run the circuit at artificially amplified noise scales (1×, 3×, 5×) and extrapolate to zero noise.

Cost: Exponential sampling overhead. Variance of the extrapolated estimate grows faster than the bias shrinks. Breaks down at moderate error rates.
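A minimal sketch of the extrapolation step, with a toy exponential-decay noise model standing in for hardware (the ideal value and decay rate are made up for illustration):

```python
import math

def richardson_zero_noise(scales, values):
    """Lagrange-polynomial extrapolation of expectation values measured
    at amplified noise scales down to the zero-noise point lambda = 0."""
    est = 0.0
    for i, (si, vi) in enumerate(zip(scales, values)):
        w = 1.0
        for j, sj in enumerate(scales):
            if j != i:
                w *= (0.0 - sj) / (si - sj)   # Lagrange basis weight at 0
        est += w * vi
    return est

# Toy noise model: E(lambda) = -exp(-0.1 * lambda), ideal value -1.0
scales = [1.0, 3.0, 5.0]                 # 1x, 3x, 5x noise amplification
noisy = [-math.exp(-0.1 * s) for s in scales]
est = richardson_zero_noise(scales, noisy)
print(round(est, 3))  # -0.998, close to the ideal -1.0
```

Note the extrapolation weights at λ = 0 are (1.875, −1.25, 0.375): their magnitudes sum to 3.5, so shot noise on each measured point is amplified in the estimate; that is the sampling-overhead cost in miniature.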
Virtual Distillation (VD)

Entangle k independent copies of the noisy state; measure a joint observable to project out errors.

Cost: k-fold qubit overhead. k=2 doubles your qubit requirement. At n=50 this means n=100 physical qubits — with the same gate error rate.
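A toy calculation of the error suppression, using a purely diagonal (classical) noisy state so everything reduces to arithmetic, illustrates why k = 2 copies help (the function name is ours, for illustration):

```python
def vd_expectation(p, k=2):
    """Virtual-distillation estimate of <Z> for the diagonal mixture
    rho = p|0><0| + (1-p)|1><1|: computes tr(rho^k Z) / tr(rho^k)."""
    q = 1.0 - p
    return (p ** k - q ** k) / (p ** k + q ** k)

p = 0.9                          # state has a 10% error component
raw = vd_expectation(p, k=1)     # plain noisy expectation: 0.8
vd = vd_expectation(p, k=2)      # two-copy distilled estimate
print(raw, round(vd, 4))         # 0.8 vs 0.9756 (ideal is 1.0)
```

With 10% state error, the raw ⟨Z⟩ is off by 0.2 while the two-copy estimate is off by ~0.024: roughly quadratic suppression, paid for with double the qubits.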
INTUITION

Error mitigation trades quantum resources for classical post-processing. ZNE needs ~10× more shots. VD needs 2× more qubits. To suppress 10⁻³ physical error to 10⁻⁶ effective error, you pay roughly 1000× overhead in shots or qubits. At that overhead, you have essentially reconstructed a bad quantum error correction code. Just do proper QEC.

8. What Might Actually Work on NISQ

NISQ is not useless. It is a research instrument, not a production computer. Here is where it genuinely has a role:

Small Molecules with Symmetry

H2, LiH, BeH2 using symmetry-adapted ansätze. Physical symmetry constraints reduce parameter count and delay barren plateaus.

STATUS: Scientifically demonstrated as valid benchmarks; not commercially useful yet.

Quantum Kernels (<20 Features)

Shallow circuits as kernel functions in SVMs. Classical overhead is low; avoids deep VQA training loops entirely.

STATUS: No proven advantage yet, but theoretically plausible for specific data geometries.

QEC Co-design & Error-Detection Codes

Test distance-3 surface code fragments on real hardware. Learn QEC in practice before fault-tolerance arrives. Google and IBM are doing this now.

STATUS: The most strategically valuable NISQ use case. Building the fault-tolerant stack.

Analog Simulation

Directly emulate condensed-matter Hamiltonians without gate-based compilation. No discretized gates, no SWAP overhead, no barren plateau from variational ansatz.

STATUS: Best current NISQ use case for actual physics insight.
SO WHAT

Be precise: NISQ is a testbed for co-designing qubits, gates, and error correction codes — not a computer that solves production problems. Use it to benchmark gate fidelity, test QEC fragments, and explore analog simulation. Do not bet your roadmap on VQE delivering industrial chemistry results by 2027. That bet loses to barren plateaus every time.

9. Implications for Roadmaps

Don’t Do This

  • Bet company strategy on NISQ advantage
  • Promise customers VQE chemistry results in 2 years
  • Claim QAOA beats classical on real industry data
  • Dismiss or ignore barren plateau literature

Do This Instead

  • Use NISQ devices for QEC co-design
  • Benchmark gate fidelity improvements with VQE as probe
  • Explore analog simulation for condensed matter
  • Build toolchains targeting the fault-tolerant era

Security Implications

  • NISQ cannot break RSA — requires ~10⁶ logical qubits
  • NISQ cannot run long proofs or zk-SNARKs
  • Post-quantum cryptography migration timeline unchanged
  • Wait for logical qubits before re-evaluating security posture

The Bottom Line

Series #3 established that NISQ fails because it lacks QEC. This post showed how it fails mechanistically: barren plateaus remove trainability by exponential gradient suppression; SWAP overhead and gate noise destroy circuit fidelity before useful depth is reached. The scaling laws are exponential in the wrong direction on both axes.

Fault-tolerance is still the unlock. Logical qubits operating at 10⁻⁶ error rates change every calculation in this post. Until then, treat NISQ as a research instrument for building the fault-tolerant stack — not as the destination itself.

10. References

  [1] McClean, J. R., Boixo, S., Smelyanskiy, V. N., Babbush, R., & Neven, H. (2018). Barren plateaus in quantum neural network training landscapes. Nature Communications 9, 4812.
  [2] Bravyi, S., Kliesch, A., Koenig, R., & Tang, E. (2020). Obstacles to variational quantum optimization from symmetry protection. Physical Review Letters 125, 260505.
  [3] Google AI Quantum & collaborators (2020). Hartree-Fock on a superconducting qubit quantum computer. Science 369, 1084–1089.
  [4] Cerezo, M., Arrasmith, A., Babbush, R., et al. (2021). Variational quantum algorithms. Nature Reviews Physics 3, 625–644.
  [5] Holmes, Z., Sharma, K., Cerezo, M., & Coles, P. J. (2022). Connecting ansatz expressibility to gradient magnitudes and barren plateaus. PRX Quantum 3, 010313.
  [6] Preskill, J. (2018). Quantum computing in the NISQ era and beyond. Quantum 2, 79. [The paper that named NISQ.]
  [7] Farhi, E., Goldstone, J., & Gutmann, S. (2014). A quantum approximate optimization algorithm. arXiv:1411.4028. [Original QAOA paper.]
  [8] Peruzzo, A., et al. (2014). A variational eigenvalue solver on a photonic chip. Nature Communications 5, 4213. [Original VQE paper.]
  [9] Kandala, A., et al. (2017). Hardware-efficient variational quantum eigensolver for small molecules and quantum magnets. Nature 549, 242–246.

Quantum Series #4 — Manish KL. Skeptical, thesis-driven analysis of NISQ algorithm limitations. Part of a series connecting NISQ failure modes to the necessity of fault-tolerance.

Series navigation: ← #3: Why NISQ Fails Without QEC | #5: Surface Codes & Logical Qubits →

Key takeaway: barren plateaus + SWAP overhead = no NISQ advantage for VQE/QAOA at scale. Use NISQ to co-design and benchmark the fault-tolerant stack. The unlock is logical qubits at 10⁻⁶ error rates.

© 2026 Manish KL — Quantum Series. Educational content. Not investment advice.