748. Statistical Significance of Performance Differences
hard

Two models are compared using the same train/test split. Model A achieves 84% test accuracy and Model B achieves 85%. A data scientist concludes Model B is better. What is missing from this analysis?