Why does PCA feature reduction make accuracy worse?
I'm trying to estimate how much feature reduction with PCA can improve classification accuracy for different ML methods. I'm using the digits dataset available in scikit-learn. I first measure accuracy with all 64 features, then use PCA to reduce them to 63 features, and accuracy drops dramatically:
64 | 0.966 +- 0.008
63 | 0.132 +- 0.0116619037897
64 | 0.96 +- 0.0
63 | 0.54 +- 0.0
64 | 0.974 +- 0.008
63 | 0.12 +- 0.022803508502
64 | 0.802 +- 0.0172046505341
63 | 0.11 +- 0.0126491106407
All calculations were repeated 5 times to get statistics. Before using PCA (64 features), scores were quite good in all cases. Afterwards, for all tested methods apart from SVM, accuracy was practically random (there are 10 classes). I would understand if accuracy dropped a little because we lose some
I've created a notebook that almost replicates your drop in accuracy.
I think the most likely error is refitting PCA on the test set. If you fit PCA on the training set, fit the classifier on the resulting components, and then run it on principal components computed separately from the test set, the classifier is given the wrong coordinate system: it was trained on coordinates defined by the training-set principal components, but it is evaluated on coordinates defined by the test-set principal components.
2017-09-18 02:02:50
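To make the suspected mistake concrete, here is a minimal sketch. It is not the asker's code or the notebook mentioned above. The classifier (`KNeighborsClassifier`), the 75/25 split, and the variable names are assumptions chosen for illustration. PCA is fitted once on the training set and reused for the test set, and that score is compared with the score obtained when a second PCA is fitted on the test set.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Digits dataset from the question: 64 pixel features, 10 classes.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Fit PCA on the training set only, then train the classifier
# on the training-set principal components.
pca = PCA(n_components=63).fit(X_train)
clf = KNeighborsClassifier().fit(pca.transform(X_train), y_train)

# Correct: project the test set with the SAME fitted PCA.
acc_shared = clf.score(pca.transform(X_test), y_test)

# Suspected mistake: fit a second PCA on the test set, so the test
# coordinates no longer match the ones the classifier was trained on.
acc_refit = clf.score(PCA(n_components=63).fit_transform(X_test), y_test)

print(f"shared PCA:  {acc_shared:.3f}")
print(f"refit on test: {acc_refit:.3f}")
```

With the shared PCA, the score should stay close to the 64-feature score, because dropping one of 64 components barely changes the data. How far the refit score falls depends on the classifier and the split. Wrapping `PCA` and the classifier in `sklearn.pipeline.Pipeline` is one way to avoid refitting by accident.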
I think the hypothesis you are trying to verify is wrong.
In general, applying PCA before building a model will NOT help to make the model perform better (in terms of accuracy)!
This is because PCA does not take the response variable (the prediction target) into account. PCA treats features with large variance as important, but a feature with large variance can have nothing to do with the prediction target.
This means that PCA can produce a lot of useless features and eliminate useful ones.
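A small synthetic sketch of this effect follows. The data, the feature names, and the choice of `LogisticRegression` are all assumptions made for illustration, not part of the linked answer. One feature has large variance but is unrelated to the label. The other has small variance and fully determines it. Keeping only the first principal component keeps mostly the noisy direction.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000

# Hypothetical data: a high-variance feature unrelated to the label,
# and a low-variance feature that determines the label.
noise = rng.normal(scale=10.0, size=n)
signal = rng.normal(scale=1.0, size=n)
X = np.column_stack([noise, signal])
y = (signal > 0).astype(int)

X_train, X_test = X[:1500], X[1500:]
y_train, y_test = y[:1500], y[1500:]

# Baseline: both original features.
acc_full = LogisticRegression().fit(X_train, y_train).score(X_test, y_test)

# PCA keeps the direction of largest variance, which here is the noise.
pca = PCA(n_components=1).fit(X_train)
clf = LogisticRegression().fit(pca.transform(X_train), y_train)
acc_pca = clf.score(pca.transform(X_test), y_test)

print(f"both features:       {acc_full:.3f}")
print(f"first PC only:       {acc_pca:.3f}")
print(f"first PC direction:  {pca.components_[0]}")
```

The printed component direction should point almost entirely along the noisy feature, so the classifier trained on it has little information about the label.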
Please check my answer here for details and a demo:
How to decide between PCA and logistic regression?
2017-09-18 02:03:29