Why does PCA feature reduction make accuracy worse?
I'm trying to estimate how much feature reduction with PCA can help increase classification accuracy across different ML methods, using the digits dataset available in scikit-learn. I first check accuracy with all 64 features, then use PCA to reduce to 63 features, and accuracy drops dramatically:
```
features | accuracy (mean ± std)
   64    | 0.966 ± 0.008
   63    | 0.132 ± 0.012
   64    | 0.960 ± 0.000
   63    | 0.540 ± 0.000
   64    | 0.974 ± 0.008
   63    | 0.120 ± 0.023
   64    | 0.802 ± 0.017
   63    | 0.110 ± 0.013
```
All calculations were repeated 5 times to collect statistics. Before applying PCA (64 features), scores were quite good in all cases. Afterwards, for all tested methods apart from SVM, accuracy was practically random (there are 10 classes). I would understand a small drop in accuracy, since we lose some information by discarding a component, but not a collapse like this.
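For reference, here is a minimal sketch of how such a comparison can be set up so that PCA is fitted only on the training folds; the classifier choice (logistic regression) is illustrative, not necessarily the one used in the question:

```python
# Compare CV accuracy on digits with all 64 features vs. 63 PCA components.
# Putting PCA inside a Pipeline ensures it is refit on each training fold
# and only *applied* (transform) to the corresponding validation fold.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

X, y = load_digits(return_X_y=True)

# Baseline: all 64 pixel features.
base = LogisticRegression(max_iter=5000)
base_scores = cross_val_score(base, X, y, cv=5)

# PCA down to 63 components, fitted inside the pipeline.
pca_clf = make_pipeline(PCA(n_components=63), LogisticRegression(max_iter=5000))
pca_scores = cross_val_score(pca_clf, X, y, cv=5)

print(f"64 features: {base_scores.mean():.3f} +- {base_scores.std():.3f}")
print(f"63 PCs:      {pca_scores.mean():.3f} +- {pca_scores.std():.3f}")
```

Done this way, dropping a single component should change accuracy only marginally, since 63 of 64 components retain almost all of the variance.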
I've created a notebook that almost replicates your drop in accuracy.
I think the most likely error is refitting PCA. If you fit PCA on the train set, fit the classifier on those components, and then run the classifier on principal components obtained by fitting PCA again on the test set, you are feeding the classifier the wrong parameter space: it learned to use the train-set principal components as coordinates, but the test-set PCs form a different basis.
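A small sketch of the mismatch, assuming scikit-learn's digits dataset and an illustrative logistic-regression classifier:

```python
# Correct usage transforms the test set with the PCA fitted on the
# train set. Fitting a *second* PCA on the test set yields components
# in a different (reordered / sign-flipped) basis, so the classifier's
# learned coordinates no longer line up with its inputs.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

pca = PCA(n_components=63).fit(X_tr)
clf = LogisticRegression(max_iter=5000).fit(pca.transform(X_tr), y_tr)

# Correct: reuse the PCA fitted on the training set.
acc_ok = clf.score(pca.transform(X_te), y_te)

# Incorrect: refit PCA on the test set, producing a different basis.
pca_bad = PCA(n_components=63).fit(X_te)
acc_bad = clf.score(pca_bad.transform(X_te), y_te)

print(f"shared PCA basis: {acc_ok:.3f}")
print(f"refit on test:    {acc_bad:.3f}")
```

With the refitted PCA the accuracy typically collapses toward chance, which matches the near-random scores reported in the question.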
I think you are testing the wrong hypothesis.
In general, applying PCA before building a model will NOT make the model perform better (in terms of accuracy)!
This is because PCA is an algorithm that does not take the response variable / prediction target into account. PCA treats features with large variance as important, but a feature with large variance can have nothing to do with the prediction target.
This means you can end up producing a lot of useless features and eliminating useful ones with PCA.
Please check my answer here for details and some demo.
How to decide between PCA and logistic regression?
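The point about PCA ignoring the target can be shown with a toy example (synthetic data, illustrative only): a pure-noise feature with large variance dominates a small-variance feature that fully determines the class.

```python
# PCA ranks directions by variance, not by predictive usefulness.
# Feature 0: high-variance noise, no relation to y.
# Feature 1: low-variance, but completely determines the class.
# PCA with n_components=1 keeps the useless noise direction.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
n = 1000
y = rng.integers(0, 2, n)
noise = rng.normal(scale=10.0, size=n)       # high variance, no signal
signal = y + rng.normal(scale=0.1, size=n)   # low variance, all signal
X = np.column_stack([noise, signal])

pca = PCA(n_components=1).fit(X)
# The single kept component points almost entirely along the noise axis.
print(pca.components_[0])
```

Any classifier trained on that single component would be no better than guessing, even though the original data was perfectly separable.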