Why do deep neural networks generalise so well?

Status: Queued

Classical statistical learning theory suggests that a model with many more parameters than training examples should overfit badly, because it has enough capacity to fit the training set exactly, noise included. Deep networks routinely have orders of magnitude more parameters than training points, yet they still generalise to unseen data.
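To give a sense of scale, the sketch below counts the trainable parameters of a small fully connected network and compares the total with a training-set size. The layer widths and the figure of 60,000 training examples are illustrative assumptions, not numbers taken from any source cited here, and `mlp_param_count` is a hypothetical helper.

```python
def mlp_param_count(layer_sizes):
    """Count weights and biases in a fully connected network.

    layer_sizes lists the width of every layer, input first, output last.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out  # weight matrix between the two layers
        total += n_out         # one bias per output unit
    return total


# Hypothetical setup: 784 inputs, two hidden layers of 2048 units, 10 outputs.
n_params = mlp_param_count([784, 2048, 2048, 10])
n_train = 60_000  # assumed training-set size, for illustration only

ratio = n_params / n_train
print(f"{n_params:,} parameters for {n_train:,} examples (ratio {ratio:.0f}x)")
```

Even this modest network has roughly a hundred parameters per training example under these assumptions; large modern models widen the gap much further.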

Proposed explanations and related lines of work include implicit regularisation by stochastic gradient descent, the double descent phenomenon, neural tangent kernel theory and feature learning. No single theory yet predicts generalisation performance from architecture alone.
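Implicit regularisation is easiest to see in a setting far simpler than a deep network. The sketch below is an assumption-laden toy, not a model of deep learning: it uses a linear model with three parameters and only two training examples, and plain full-batch gradient descent rather than stochastic gradient descent. Infinitely many weight vectors fit the data exactly, yet gradient descent started from zero converges to the one with the smallest Euclidean norm. All names and numbers are hypothetical.

```python
# Two training examples, three parameters: the system is underdetermined.
X = [[1.0, 2.0, 0.0],
     [0.0, 1.0, 1.0]]
y = [1.0, 2.0]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gradient_descent(X, y, steps=2000, lr=0.05):
    """Minimise 0.5 * sum of squared residuals, starting from w = 0."""
    w = [0.0] * len(X[0])
    for _ in range(steps):
        residuals = [dot(w, x) - t for x, t in zip(X, y)]
        # Gradient of the loss with respect to w[j] is sum_i r_i * X[i][j].
        grads = [sum(r * x[j] for r, x in zip(residuals, X))
                 for j in range(len(w))]
        w = [wj - lr * gj for wj, gj in zip(w, grads)]
    return w


def min_norm_solution(X, y):
    """Closed form w = X^T (X X^T)^{-1} y, written out for two rows."""
    a, b, d = dot(X[0], X[0]), dot(X[0], X[1]), dot(X[1], X[1])
    det = a * d - b * b
    alpha0 = (d * y[0] - b * y[1]) / det
    alpha1 = (a * y[1] - b * y[0]) / det
    return [alpha0 * X[0][j] + alpha1 * X[1][j] for j in range(len(X[0]))]


w_gd = gradient_descent(X, y)
w_min = min_norm_solution(X, y)

# Another exact fit: add a vector from the null space of X to w_min.
w_other = [wi + ni for wi, ni in zip(w_min, [2.0, -1.0, 1.0])]

print("gradient descent:", w_gd)
print("minimum norm:    ", w_min)
print("norms:", dot(w_min, w_min) ** 0.5, "<", dot(w_other, w_other) ** 0.5)
```

Nothing in the loss function prefers the small-norm solution; the preference comes from the optimiser and its starting point. Whether and how a comparable bias operates in deep, non-linear networks is part of the open question.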

Sources

Wikipedia: Generalization (machine learning)

Runs

No runs yet — this question is queued.