> 2021-07-29 Information Digest

### Week 6 Diagnosing Bias vs. Variance

##### Diagnosing Bias vs. Variance

**High bias (underfitting)**: both $J_{train}(\Theta)$ and $J_{CV}(\Theta)$ will be high. Also, $J_{CV}(\Theta) \approx J_{train}(\Theta)$.

**High variance (overfitting)**: $J_{train}(\Theta)$ will be low and $J_{CV}(\Theta)$ will be much greater than $J_{train}(\Theta)$.

This is summarized in the figure below:

![img](https://raw.githubusercontent.com/Phalacrocorax/memo-image-host/master/PicGo/I4dRkz_pEeeHpAqQsW8qwg_bed7efdd48c13e8f75624c817fb39684_fixed.png)

###### Follow-up test

![image-20210729081701325](https://raw.githubusercontent.com/Phalacrocorax/memo-image-host/master/PicGo/image-20210729081701325.png)

##### Regularization and Bias/Variance

![img](https://raw.githubusercontent.com/Phalacrocorax/memo-image-host/master/PicGo/3XyCytntEeataRJ74fuL6g_3b6c06d065d24e0bf8d557e59027e87a_Screenshot-2017-01-13-16.09.36.png)

In the figure above, we see that as $\lambda$ increases, our fit becomes more rigid. On the other hand, as $\lambda$ approaches 0, we tend to overfit the data. So how do we choose our parameter $\lambda$ to get it 'just right'?

In order to choose the model and the regularization term $\lambda$, we need to:

1. Create a list of lambdas (i.e. $\lambda \in \{0, 0.01, 0.02, 0.04, 0.08, 0.16, 0.32, 0.64, 1.28, 2.56, 5.12, 10.24\}$);
2. Create a set of models with different degrees or any other variants.
3. Iterate through the $\lambda$s and for each $\lambda$ go through all the models to learn some $\Theta$.
4. Compute the cross validation error using the learned $\Theta$ (computed with $\lambda$) on $J_{CV}(\Theta)$ **without** regularization, i.e. with $\lambda = 0$.
5. Select the best combo that produces the lowest error on the cross validation set.
6. Using the best combo $\Theta$ and $\lambda$, apply it on $J_{test}(\Theta)$ to see if it has a good generalization of the problem.
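The selection procedure above can be sketched in a few lines of numpy. This is a minimal illustration, not the course's Octave code: `fit_ridge` and `cost` are hypothetical helpers, the data is synthetic, and only one model (linear) is shown, so step 2's loop over model degrees is omitted. Note that the cost used for CV and test evaluation carries no regularization term, as step 4 requires.

```python
import numpy as np

def fit_ridge(X, y, lam):
    # Regularized normal equation: theta = (X^T X + lam*I)^{-1} X^T y,
    # leaving the intercept term unregularized (hypothetical helper).
    reg = lam * np.eye(X.shape[1])
    reg[0, 0] = 0.0  # do not penalize the bias term
    return np.linalg.solve(X.T @ X + reg, X.T @ y)

def cost(X, y, theta):
    # Unregularized squared-error cost J(theta), used for J_cv and J_test.
    residual = X @ theta - y
    return residual @ residual / (2 * len(y))

# Synthetic near-linear data, split into train / CV / test.
rng = np.random.default_rng(0)
X = np.c_[np.ones(100), rng.normal(size=(100, 1))]
y = 2 * X[:, 1] + rng.normal(scale=0.1, size=100)
X_train, y_train = X[:60], y[:60]
X_cv, y_cv = X[60:80], y[60:80]
X_test, y_test = X[80:], y[80:]

# Step 1: the list of lambdas from the notes above.
lambdas = [0, 0.01, 0.02, 0.04, 0.08, 0.16, 0.32, 0.64, 1.28, 2.56, 5.12, 10.24]
# Step 3: learn a Theta for each lambda; step 4: score it on the CV set.
thetas = [fit_ridge(X_train, y_train, lam) for lam in lambdas]
cv_errors = [cost(X_cv, y_cv, th) for th in thetas]
# Step 5: pick the lambda with the lowest CV error.
best = int(np.argmin(cv_errors))
print("best lambda:", lambdas[best])
# Step 6: report generalization error on the held-out test set.
print("test error:", cost(X_test, y_test, thetas[best]))
```

Because the CV error is measured without the penalty term, it compares all the candidate $\Theta$s on equal footing; the test set is touched only once, at the very end.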
![image-20210729084256589](https://raw.githubusercontent.com/Phalacrocorax/memo-image-host/master/PicGo/image-20210729084256589.png)

##### Learning Curves

**Experiencing high bias:**

**Low training set size**: causes $J_{train}(\Theta)$ to be low and $J_{CV}(\Theta)$ to be high.

**Large training set size**: causes both $J_{train}(\Theta)$ and $J_{CV}(\Theta)$ to be high with $J_{train}(\Theta) \approx J_{CV}(\Theta)$.

![img](https://raw.githubusercontent.com/Phalacrocorax/memo-image-host/master/PicGo/bpAOvt9uEeaQlg5FcsXQDA_ecad653e01ee824b231ff8b5df7208d9_2-am.png)

**Experiencing high variance:**

**Low training set size**: $J_{train}(\Theta)$ will be low and $J_{CV}(\Theta)$ will be high.

**Large training set size**: $J_{train}(\Theta)$ increases with training set size and $J_{CV}(\Theta)$ continues to decrease without leveling off. Also, $J_{train}(\Theta) < J_{CV}(\Theta)$ but the difference between them remains significant.
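A learning curve is produced by training on the first $m$ examples for increasing $m$, then evaluating $J_{train}$ on those $m$ examples and $J_{CV}$ on the full cross-validation set. The sketch below assumes synthetic data and plain least-squares fitting (no plotting, just the two error columns); `fit_linear` and `cost` are hypothetical helpers, not course code.

```python
import numpy as np

def fit_linear(X, y):
    # Ordinary least squares via numpy's lstsq (illustrative only).
    return np.linalg.lstsq(X, y, rcond=None)[0]

def cost(X, y, theta):
    # Squared-error cost J(theta).
    residual = X @ theta - y
    return residual @ residual / (2 * len(y))

# Synthetic linear data: 100 training examples, 20 CV examples.
rng = np.random.default_rng(1)
X = np.c_[np.ones(120), rng.normal(size=(120, 1))]
y = 3 * X[:, 1] + rng.normal(scale=0.5, size=120)
X_train, y_train = X[:100], y[:100]
X_cv, y_cv = X[100:], y[100:]

# Learning curve: train on the first m examples only,
# but always evaluate J_cv on the entire CV set.
for m in range(2, 101, 7):
    theta = fit_linear(X_train[:m], y_train[:m])
    j_train = cost(X_train[:m], y_train[:m], theta)
    j_cv = cost(X_cv, y_cv, theta)
    print(f"m={m:3d}  J_train={j_train:.3f}  J_cv={j_cv:.3f}")
```

With a well-specified model like this one, the printed columns converge toward each other as $m$ grows; a persistent gap between them at large $m$ is the high-variance signature described above, and a high shared plateau is the high-bias signature.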