720. SGD Convergence Speed
easy

Why does SGD often converge faster in practice than batch gradient descent for large datasets?
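
One way to explore the question is to compare the two methods at an equal budget of per-example gradient evaluations. Batch gradient descent spends a full pass over the data on every update, while SGD updates after each example, so SGD takes far more steps for the same cost. The sketch below is illustrative only. The synthetic 1-D regression data, the learning rate of 0.1, the 5-epoch budget, and the helper names (`make_data`, `half_mse`, `batch_gd`, `sgd`) are all assumptions chosen for the demonstration and do not come from the question itself.

```python
import random


def make_data(n, rng):
    """Synthetic 1-D regression data (an assumption): y = 3x + Gaussian noise."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [3.0 * x + rng.gauss(0.0, 0.1) for x in xs]
    return xs, ys


def half_mse(w, xs, ys):
    """Half mean squared error of the model y_hat = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * len(xs))


def batch_gd(xs, ys, lr, epochs):
    """Batch gradient descent: one update per full pass over the data."""
    w = 0.0
    n = len(xs)
    for _ in range(epochs):
        grad = sum((w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= lr * grad
    return w


def sgd(xs, ys, lr, epochs, rng):
    """SGD: one update per example, visiting examples in shuffled order."""
    w = 0.0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            w -= lr * (w * xs[i] - ys[i]) * xs[i]
    return w


rng = random.Random(0)  # fixed seed so the run is repeatable
xs, ys = make_data(1000, rng)

# Same learning rate and same budget: 5 passes = 5000 per-example gradients.
w_batch = batch_gd(xs, ys, lr=0.1, epochs=5)      # 5 updates in total
w_sgd = sgd(xs, ys, lr=0.1, epochs=5, rng=rng)    # 5000 updates in total

batch_loss = half_mse(w_batch, xs, ys)
sgd_loss = half_mse(w_sgd, xs, ys)
print(f"batch GD: w = {w_batch:.3f}, loss = {batch_loss:.4f}")
print(f"SGD:      w = {w_sgd:.3f}, loss = {sgd_loss:.4f}")
```

In this toy setting SGD ends much closer to the true slope of 3 after the same number of gradient evaluations, because each noisy per-example gradient still points roughly in the right direction and the many cheap updates add up. A larger learning rate would let batch gradient descent catch up on a problem this simple, so the comparison should be read as an illustration of update frequency, not as a general benchmark.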