THESIS: NISQ = Noisy Intermediate-Scale Quantum. VQE and QAOA are variational algorithms designed for NISQ, but they hit two hard walls: (1) barren plateaus kill gradient descent at scale, (2) SWAP overhead and noise kills circuit depth. Current NISQ advantage claims are premature. Fault-tolerance is still the unlock.
NISQ devices with 50–1000 physical qubits and ~$10^{-3}$ error rates were supposed to deliver “quantum advantage” before fault-tolerance arrived. The flagship algorithms: VQE for chemistry, QAOA for optimization, VQLS for linear systems. All variational. All hybrid. All hitting walls.
Variational algorithms are quantum ML: try an ansatz circuit, measure a cost expectation value, use a classical optimizer to tune parameters. Like training a neural network, but the “network” is a parameterized qubit circuit and the “loss” is $\langle\psi(\theta)|H|\psi(\theta)\rangle$. It works when gradients exist and noise is low. At scale, neither condition holds.
| Algorithm | Problem | Qubits Needed | Circuit Depth | NISQ Viable? | Primary Failure Mode |
|---|---|---|---|---|---|
| VQE | Chemistry ground state | 50–200 for FeMoco | $10^3$–$10^4$ | No | Barren plateaus + noise kill accuracy before useful depth |
| QAOA | MaxCut, CSP | 50+ for real graphs | $2p$ (≈40 at $p=20$) | No | $p=1$ beaten by classical; $p>5$ noise dominates; no sweet spot |
| VQLS | Linear systems $Ax=b$ | $\log N$ + overhead | Variable | No | Barren plateaus + conditioning issues; HHL advantage lost to dequantization |
Every NISQ “advantage” claim to date has been on toy problems with fewer than 20 qubits or contrived benchmarks. At 50+ qubits, where problems become industrially relevant, two scaling barriers fight you simultaneously: gradient variance shrinks as $2^{-n}$ and SWAP routing adds $\mathcal{O}(n^2)$ depth. Without QEC to suppress noise, both kill you before you reach useful depth.
VQE and QAOA share the same hybrid structure: a parameterized quantum circuit plus a classical optimizer. The quantum computer evaluates a cost function $C(\theta) = \langle\psi(\theta)|H|\psi(\theta)\rangle$. The classical computer updates $\theta$ via gradient descent or gradient-free methods. The loop repeats until $C(\theta)$ converges.
```python
from qiskit_nature.second_q.drivers import PySCFDriver
from qiskit_nature.second_q.mappers import ParityMapper
from qiskit_algorithms import VQE
from qiskit_algorithms.optimizers import COBYLA
from qiskit.circuit.library import TwoLocal
from qiskit.primitives import Estimator

# H2 at 0.735 Angstrom bond length
driver = PySCFDriver(atom="H 0 0 0; H 0 0 0.735")
problem = driver.run()

# Map the fermionic Hamiltonian to qubits; the parity mapping with
# two-qubit reduction brings H2/STO-3G down to 2 qubits
mapper = ParityMapper(num_particles=problem.num_particles)
hamiltonian = mapper.map(problem.hamiltonian.second_q_op())

# Hardware-efficient ansatz: 2 qubits, 2 entangling repetitions
ansatz = TwoLocal(hamiltonian.num_qubits, "ry", "cz", reps=2)
vqe = VQE(Estimator(), ansatz, optimizer=COBYLA(maxiter=500))
result = vqe.compute_minimum_eigenvalue(hamiltonian)
# Works for 2 qubits. Fails at 14+ (H2O).
```
H2 works. 2 qubits, depth 4, exact answer in seconds. Scale to H2O (14 qubits) and the optimizer gets lost. Scale to FeMoco (200 qubits) and the landscape becomes exponentially flat. That flatness has a name: barren plateaus.
McClean et al. 2018 [1] proved the central result: for a random parameterized quantum circuit with $n$ qubits, the variance of the gradient of the cost function decays as:

$$\operatorname{Var}\left[\partial_{\theta_k} C(\theta)\right] \in \mathcal{O}\!\left(2^{-n}\right)$$

Gradient variance decays exponentially in the number of qubits. At $n=50$, you need $\sim 2^{50} \approx 10^{15}$ shots to resolve the gradient direction.
Deep random circuits converge to 2-designs, scrambling information across the Hilbert space. The average gradient over the unitary group is exactly zero. Random initialization lands in the plateau.
Global cost functions on highly entangled states exhibit concentration of measure: the landscape is flat almost everywhere. The phenomenon is the quantum analogue of vanishing gradients in deep networks — but without batch norm or residual connections to save you.
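The decay is easy to observe numerically. Below is a minimal statevector sketch in plain NumPy (no quantum SDK) that estimates the variance of one parameter-shift gradient component of $\langle Z_0\rangle$ over randomly initialized layered circuits. The specific circuit family (random Pauli rotations plus nearest-neighbor CZ entanglers, depth $2n$) and the sample count are illustrative assumptions, not taken from the papers cited here.

```python
import numpy as np

rng = np.random.default_rng(7)

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def apply_1q(state, gate, q, n):
    """Apply a single-qubit gate to qubit q of an n-qubit statevector."""
    st = np.tensordot(gate, state.reshape([2] * n), axes=([1], [q]))
    return np.moveaxis(st, 0, q).reshape(-1)

def apply_cz(state, q1, q2, n):
    """Flip the sign of amplitudes where qubits q1 and q2 are both |1>."""
    st = state.reshape([2] * n).copy()
    idx = [slice(None)] * n
    idx[q1], idx[q2] = 1, 1
    st[tuple(idx)] *= -1
    return st.reshape(-1)

def cost(thetas, paulis, n, layers):
    """C(theta) = <psi(theta)|Z_0|psi(theta)> for a random layered ansatz."""
    state = np.zeros(2 ** n, dtype=complex)
    state[0] = 1.0
    k = 0
    for _ in range(layers):
        for q in range(n):
            gate = np.cos(thetas[k] / 2) * I2 - 1j * np.sin(thetas[k] / 2) * paulis[k]
            state = apply_1q(state, gate, q, n)
            k += 1
        for q in range(n - 1):
            state = apply_cz(state, q, q + 1, n)
    p = np.abs(state.reshape([2] * n)) ** 2
    return float(p[0].sum() - p[1].sum())  # <Z> on qubit 0

def grad_sample(n, layers):
    """Parameter-shift gradient w.r.t. the first parameter of a random circuit."""
    nparams = layers * n
    paulis = [(X, Y, Z)[i] for i in rng.integers(0, 3, nparams)]
    thetas = rng.uniform(0, 2 * np.pi, nparams)
    tp, tm = thetas.copy(), thetas.copy()
    tp[0] += np.pi / 2
    tm[0] -= np.pi / 2
    return 0.5 * (cost(tp, paulis, n, layers) - cost(tm, paulis, n, layers))

# sample gradient variance at increasing qubit counts, depth growing with n
variances = {n: np.var([grad_sample(n, 2 * n) for _ in range(200)])
             for n in (2, 4, 6, 8)}
for n, v in variances.items():
    print(f"n={n}: Var[dC/dtheta_0] ~ {v:.4f}")
```

Even at these tiny sizes, the sampled variance should fall steeply as qubits are added, consistent with the exponential suppression discussed above.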
Holmes et al. 2022 [5] showed that the expressibility of an ansatz is directly tied to its gradient magnitudes: more expressive ansätze have exponentially smaller gradients. Hardware-efficient ansätze are too expressive. An ansatz cannot be both rich enough to represent the answer and trainable enough to find it.
Optimizing a 50-qubit VQE is like searching for a golf hole on a course the size of Earth, where the landscape looks perfectly flat from every vantage point. Random initialization drops you in Kansas. Your gradient reads zero. You have no directional signal. You’d need to sample the entire planet to detect any slope at all.
Most NISQ chips have linear or 2D-grid qubit connectivity. Chemistry algorithms require all-to-all entanglement. To entangle non-adjacent qubits on a linear chain, you must shuttle quantum information through intermediate qubits using SWAP gates. For $n$ qubits in a line, connecting opposite ends requires $\mathcal{O}(n)$ SWAPs. Achieving full all-to-all connectivity: $\mathcal{O}(n^2)$ SWAPs.
- Logical circuit: 4 qubits, depth 4, no SWAP gates required
- After routing for hardware connectivity: +48 SWAP gates inserted
- Every added gate exposed to T1/T2 decoherence and ~$10^{-3}$ gate error
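A toy routing sketch makes the linear-chain cost concrete. The helper below is a hypothetical naive router, not a real transpiler pass: it shuttles the control of a single CX next to its target with a SWAP chain, then undoes the swaps to restore qubit placement.

```python
def route_cx_linear(control, target):
    """Naive SWAP routing of one CX on a linear chain:
    shuttle the control next to the target, apply CX, swap back."""
    swaps = []
    q = control
    step = 1 if target > control else -1
    while abs(q - target) > 1:
        swaps.append(("swap", q, q + step))
        q += step
    # forward swaps + CX + mirrored swaps to restore the layout
    return swaps + [("cx", q, target)] + swaps[::-1]

ops = route_cx_linear(0, 9)  # opposite ends of a 10-qubit chain
n_swaps = sum(1 for g in ops if g[0] == "swap")
print(n_swaps)  # 16 SWAPs for one long-range CX
```

Each distant pair pays this $\mathcal{O}(n)$ toll per interaction, which is how dense interaction graphs accumulate the quadratic SWAP overhead cited above.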
Barren plateaus mean you cannot find the cost function minimum. SWAP overhead means that even if you found it, the circuit cannot run deep enough to evaluate it accurately. These are independent exponential barriers that compound. This is precisely why NISQ VQE has not simulated FeMoco. From series #3: NISQ fails because it has no QEC. These are the concrete symptoms.
The flagship “killer app” for VQE is nitrogen fixation: simulating the iron–molybdenum cofactor (FeMoco) of nitrogenase to design better catalysts for fertilizer production. Here is the requirement gap in precise terms.
| Molecule | Exact E (Ha) | Best NISQ VQE (Ha) | Error (Ha) | Status |
|---|---|---|---|---|
| H2 | -1.137 | -1.136 | ~$10^{-3}$ | Demonstrated |
| LiH | -7.882 | -7.880 | ~$10^{-3}$ | Demonstrated |
| H2O | -75.49 | -75.21 | ~$10^{-1}$ | Struggles |
| FeMoco | unknown | — | — | Unreachable on NISQ |
The gap: ~10 orders of magnitude in circuit depth, 4× in qubit count, 10× in accuracy. VQE cannot bridge this without error correction. The variational approach was a clever attempt to circumvent QEC requirements. Barren plateaus deliver the verdict: you cannot dodge the exponential.
QAOA [7] alternates cost and mixer Hamiltonians for $p$ layers. Theory: as $p \to \infty$, QAOA approximates adiabatic evolution and converges to the optimum. Practice: noise kills fidelity at $p \approx 5$, which is well below the $p > \log n$ threshold needed to beat classical algorithms.
For MaxCut, p=1 QAOA achieves approximation ratio ~0.69. The Goemans–Williamson classical algorithm achieves 0.878 and runs in milliseconds on a laptop. QAOA p=1 is strictly and significantly worse.
Bravyi et al. 2020 [2] showed QAOA needs $p > \log n$ to beat GW. For $n=1000$ this requires $p > 10$, depth $> 20$. But NISQ coherence dies at depth ~5 due to noise. Noisy QAOA actually degrades in approximation quality past p≈3–5 as noise accumulates faster than the signal improves.
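To see the $p=1$ ceiling concretely, here is a minimal exact-statevector QAOA sketch on a 4-node ring. The instance is an illustrative toy, not a benchmark, and a ring is not the 3-regular class behind the 0.69 figure, so its optimum differs; the point is that even a noiseless grid search over the two $p=1$ angles plateaus well below the true MaxCut.

```python
import numpy as np

n = 4
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]  # 4-node ring (toy instance)

# diagonal MaxCut cost: number of cut edges for each basis bitstring
cut = np.array([sum((b >> i & 1) != (b >> j & 1) for i, j in edges)
                for b in range(2 ** n)], dtype=float)

def qaoa_p1(gamma, beta):
    """<C> after one QAOA layer: e^{-i beta B} e^{-i gamma C} |+>^n."""
    state = np.full(2 ** n, 2 ** (-n / 2), dtype=complex)
    state *= np.exp(-1j * gamma * cut)           # cost layer (diagonal phase)
    c, s = np.cos(beta), -1j * np.sin(beta)      # mixer: e^{-i beta X} per qubit
    for q in range(n):
        st = state.reshape([2] * n)
        a0, a1 = np.take(st, 0, axis=q), np.take(st, 1, axis=q)
        state = np.stack([c * a0 + s * a1, s * a0 + c * a1], axis=q).reshape(-1)
    return float(np.sum(cut * np.abs(state) ** 2))

# coarse grid search over the two p=1 angles
best = max(qaoa_p1(g, b) for g in np.linspace(0, np.pi, 41)
                         for b in np.linspace(0, np.pi, 41))
print(f"best p=1 <C> = {best:.3f}, ratio = {best / cut.max():.3f}")
```

Everything here is noiseless; on hardware, each extra layer adds gate error faster than it adds approximation quality, which is the degradation past $p \approx 3$–$5$ described above.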
QAOA is a heuristic with a beautiful theoretical convergence guarantee in the noiseless limit. On NISQ hardware, noise makes it a worse heuristic than readily available classical approximation algorithms. Its current value is as a hardware benchmark, not a problem solver.
Do not invest in QAOA expecting commercial optimization advantage on NISQ. p=1 loses to classical. p>5 loses to noise. The noisy approximation ratio turns downward before you reach the depth needed to compete. There is no sweet spot without error correction.
The field has tried hard to fix barren plateaus: layer-wise training, smart initialization, local cost functions, error mitigation. They help for $n < 20$. At $n=50$, the landscape is still flat.
- Layer-wise training: train ansatz layers sequentially to avoid barren plateaus in early layers.
- Smart initialization: identity-block initialization (Cerezo et al.) keeps gradients non-zero at the start of training.
- Local cost functions: measure local observables rather than the global Hamiltonian to reduce entanglement-induced flatness.
- Zero-noise extrapolation (ZNE): run the circuit at artificially amplified noise scales (1×, 3×, 5×) and extrapolate to zero noise.
- Virtual distillation (VD): entangle $k$ independent copies of the noisy state; measure a joint observable to project out errors.
Error mitigation trades quantum resources for classical post-processing. ZNE needs ~10× more shots. VD needs 2× more qubits. To suppress $10^{-3}$ physical error to $10^{-6}$ effective error, you pay roughly 1000× overhead in shots or qubits. At that overhead, you have essentially reconstructed a bad quantum error correction code. Just do proper QEC.
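The classical half of ZNE is just an extrapolation fit. The sketch below uses made-up expectation-value readings at amplified noise scales (all numbers are hypothetical) to show that post-processing step:

```python
import numpy as np

# hypothetical noisy <H> readings at amplified noise scales (made up)
scales = np.array([1.0, 3.0, 5.0])
values = np.array([0.842, 0.617, 0.455])

# linear zero-noise extrapolation: fit <H>(lambda), read off lambda = 0
slope, intercept = np.polyfit(scales, values, deg=1)
zne_estimate = intercept  # value of the linear fit at zero noise
print(round(zne_estimate, 3))  # here, larger than any measured value
```

Each amplified-noise point needs its own full budget of shots, which is where the shot-count multiplier quoted above comes from.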
NISQ is not useless. It is a research instrument, not a production computer. Here is where it genuinely has a role:
- Small-molecule chemistry: H2, LiH, BeH2 using symmetry-adapted ansätze. Physical symmetry constraints reduce parameter count and delay barren plateaus.
- Quantum kernel methods: shallow circuits as kernel functions in SVMs. Classical overhead is low; avoids deep VQA training loops entirely.
- QEC experiments: test distance-3 surface code fragments on real hardware. Learn QEC in practice before fault-tolerance arrives. Google and IBM are doing this now.
- Analog quantum simulation: directly emulate condensed-matter Hamiltonians without gate-based compilation. No discretized gates, no SWAP overhead, no barren plateaus from a variational ansatz.
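As an illustration of the kernel use case, here is a deliberately tiny fidelity-kernel sketch. The single-qubit RY feature map and the data points are arbitrary assumptions; the point is that the quantum part stays shallow and the output is an ordinary Gram matrix a classical SVM can consume.

```python
import numpy as np

def feature_state(x):
    """Toy 1-qubit feature map: |psi(x)> = RY(x)|0>."""
    return np.array([np.cos(x / 2), np.sin(x / 2)])

def fidelity_kernel(x, y):
    """k(x, y) = |<psi(x)|psi(y)>|^2; for this map, cos^2((x - y)/2)."""
    return float(abs(feature_state(x) @ feature_state(y)) ** 2)

xs = np.array([0.1, 0.9, 2.0])  # arbitrary example data
gram = np.array([[fidelity_kernel(a, b) for b in xs] for a in xs])
print(np.round(gram, 3))  # symmetric, ones on the diagonal
```

On hardware the overlaps are estimated from shallow-circuit measurements, and the Gram matrix is handed to a classical solver (e.g. scikit-learn's `SVC(kernel="precomputed")`), keeping quantum depth constant regardless of dataset size.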
Be precise: NISQ is a testbed for co-designing qubits, gates, and error correction codes — not a computer that solves production problems. Use it to benchmark gate fidelity, test QEC fragments, and explore analog simulation. Do not bet your roadmap on VQE delivering industrial chemistry results by 2027. That bet loses to barren plateaus every time.
Series #3 established that NISQ fails because it lacks QEC. This post showed how it fails mechanistically: barren plateaus remove trainability by exponential gradient suppression; SWAP overhead and gate noise destroy circuit fidelity before useful depth is reached. The scaling laws are exponential in the wrong direction on both axes.
Fault-tolerance is still the unlock. Logical qubits operating at $10^{-6}$ error rates change every calculation in this post. Until then, treat NISQ as a research instrument for building the fault-tolerant stack — not as the destination itself.