Why does PCA feature reduction make accuracy so much worse?
I'm trying to estimate how much feature reduction with PCA can help increase classification accuracy across different ML methods, using the digits dataset available in scikit-learn. I first measure accuracy on all 64 features, then use PCA to reduce to 63 features, and accuracy drops dramatically:
features | accuracy (mean ± std)
64 | 0.966 ± 0.008
63 | 0.132 ± 0.012
64 | 0.96 ± 0.0
63 | 0.54 ± 0.0
64 | 0.974 ± 0.008
63 | 0.12 ± 0.023
64 | 0.802 ± 0.017
63 | 0.11 ± 0.013
All calculations were repeated 5 times to get statistics. Before applying PCA (64 features) the scores were quite good in all cases. Afterwards, for every tested method apart from SVM, the results were practically random (there are 10 classes, so ~0.1 accuracy). I would understand the accuracy dropping a little, because we lose some information with the discarded component, but not a collapse to chance level.
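A minimal sketch of the comparison described above (an assumed setup, since the asker's code isn't shown): score a classifier on the raw 64 digits features and on 63 PCA components, with PCA placed inside a pipeline so it is refit on each training fold only.

```python
# Compare accuracy on raw features vs. 63 PCA components (hypothetical sketch).
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)  # 1797 samples, 64 features

# Baseline: all 64 features.
raw = cross_val_score(SVC(), X, y, cv=5).mean()

# PCA inside the pipeline: fit on each training fold, applied to each test fold.
pca = cross_val_score(make_pipeline(PCA(n_components=63), SVC()), X, y, cv=5).mean()

print(f"64 features: {raw:.3f}, 63 PCA components: {pca:.3f}")
```

Done this way, the two scores come out close to each other; there is no dramatic drop from removing a single component.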
I've created a notebook that almost replicates your drop in accuracy.
I think the most likely error is refitting PCA on the test set: if you fit PCA on the train set, fit the classifier on those components, and then run it on principal components obtained by refitting PCA on the test set, the classifier is operating in the wrong parameter space. It learned coordinates defined by the train-set components, and the refit test-set components live in a different coordinate system.
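To illustrate the fix (a sketch with an assumed train/test split, not the asker's actual notebook): fit PCA once on the training data and reuse that same fitted transform on the test data, rather than calling `fit_transform` on the test set.

```python
# Correct PCA usage: fit on train only, transform (not refit) the test set.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

pca = PCA(n_components=63)
X_train_pc = pca.fit_transform(X_train)  # learn components from train set
X_test_pc = pca.transform(X_test)        # reuse the SAME components

clf = SVC().fit(X_train_pc, y_train)
score = clf.score(X_test_pc, y_test)
# By contrast, pca.fit_transform(X_test) would place the test points in a
# different coordinate system than the one the classifier was trained in.
```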
I think the hypothesis you're trying to verify is wrong.
In general, applying PCA before building a model will NOT help the model perform better (in terms of accuracy)!
This is because PCA is an algorithm that does not take the response variable / prediction target into account. PCA treats features with large variance as important, but a feature with large variance can have nothing to do with the prediction target.
This means PCA can produce a lot of useless features and eliminate useful ones.
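A toy example of this failure mode (synthetic data, chosen to make the point): the feature that predicts the label has small variance, while an irrelevant noise feature has large variance, so the first principal component is dominated by the noise.

```python
# PCA keeps the high-variance noise feature, not the informative one.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=500)                       # binary target
informative = y + 0.05 * rng.standard_normal(500)      # low variance, predicts y
noise = 10.0 * rng.standard_normal(500)                # high variance, useless
X = np.column_stack([informative, noise])

pca = PCA(n_components=1).fit(X)
weights = np.abs(pca.components_[0])
print(weights)  # the noise column gets almost all the weight
```

Reducing this dataset to one principal component would throw away nearly all of the predictive signal, even though the discarded feature explains the target almost perfectly.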
Please check my answer here for details and a demo:
How to decide between PCA and logistic regression?