Why does PCA feature reduction make accuracy worse?
I'm trying to estimate how much PCA feature reduction can improve classification accuracy across different ML methods. I'm using the digits dataset available in scikit-learn. I first measure accuracy with all 64 available features, then use PCA to reduce them to 63 features, and accuracy drops dramatically:
64 | 0.966 +- 0.008
63 | 0.132 +- 0.0116619037897
64 | 0.96 +- 0.0
63 | 0.54 +- 0.0
64 | 0.974 +- 0.008
63 | 0.12 +- 0.022803508502
64 | 0.802 +- 0.0172046505341
63 | 0.11 +- 0.0126491106407
All calculations were repeated 5 times to get statistics. Before using PCA (64 features), scores were quite good in all cases. Afterwards, for all tested methods apart from SVM, accuracy was practically random (there are 10 classes). I would understand if accuracy dropped a little because we lose some
I've created a notebook that almost replicates your drop in accuracy.
I think the most likely error is refitting PCA. If you fit PCA on the train set, fit the classifier, and then run it on principal components obtained by fitting PCA again on the test set, you give the classifier the wrong parameter space: the classifier uses the train set principal components as coordinates, but you then run it on the test set PCs.
2017-09-18 02:02:50
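A minimal sketch of the train/test mismatch described in the answer above, assuming the scikit-learn digits dataset from the question. The classifier (`KNeighborsClassifier`), the split size, and the random seed are illustrative choices, not taken from the original notebook. It scores the same classifier twice: once on test data projected with the PCA fitted on the train set, and once on test data projected with a PCA refitted on the test set.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Fit PCA on the train set only and train the classifier in that space.
pca = PCA(n_components=63).fit(X_train)
clf = KNeighborsClassifier(n_neighbors=3).fit(pca.transform(X_train), y_train)

# Consistent: project the test set with the PCA fitted on the train set.
acc_shared = clf.score(pca.transform(X_test), y_test)

# Mismatched: refit PCA on the test set, so the axes no longer line up
# with the coordinates the classifier was trained on.
X_test_refit = PCA(n_components=63).fit_transform(X_test)
acc_refit = clf.score(X_test_refit, y_test)

print(f"PCA fitted on train, applied to test: {acc_shared:.3f}")
print(f"PCA refitted on test:                 {acc_refit:.3f}")
```

Wrapping the two steps with `sklearn.pipeline.make_pipeline(PCA(n_components=63), clf)` is one way to avoid the mismatch, since the pipeline fits PCA once during `fit` and reuses it for prediction.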
I think you are testing the wrong hypothesis.
In general, applying PCA before building a model will NOT help the model perform better (in terms of accuracy)!
This is because PCA does not take the response variable / prediction target into account. PCA treats features with large variance as important, but a feature with large variance can have nothing to do with the prediction target.
This means that after PCA you can end up with a lot of useless features and can eliminate useful ones.
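The point about variance versus relevance can be illustrated with a small synthetic sketch. Everything here is an assumption made for illustration (the two features, their scales, the sample size, and the choice of `LogisticRegression`); it is not the demo from the linked answer. One feature has large variance and is unrelated to the label, the other has small variance and almost determines it, and PCA with one component keeps the direction of largest variance.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
n = 1000
y = rng.integers(0, 2, size=n)

# Hypothetical features: "noise" has large variance but no link to y,
# "signal" has small variance but nearly determines y.
noise = rng.normal(0.0, 10.0, size=n)
signal = y + rng.normal(0.0, 0.1, size=n)
X = np.column_stack([noise, signal])

# Classifier on the raw features.
acc_raw = cross_val_score(LogisticRegression(), X, y, cv=5).mean()

# Same classifier after keeping only the first principal component,
# which follows the high-variance noise feature.
pca_clf = make_pipeline(PCA(n_components=1), LogisticRegression())
acc_pca = cross_val_score(pca_clf, X, y, cv=5).mean()

print(f"raw features:   {acc_raw:.3f}")
print(f"1 PCA component: {acc_pca:.3f}")
```

In this constructed case the retained component carries almost none of the label information, so the reduced model has little to work with. Real datasets are rarely this extreme, but the mechanism is the same.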
Please check my answer here for details and a demo:
How to decide between PCA and logistic regression?
2017-09-18 02:03:29