# K-means: why reduce dimensions first?

I'm a bit confused about the usefulness of reducing dimensions before doing a k-means clustering.

Suppose you want to apply k-means to a set of points $(x_i)$ in a high-dimensional space. You want to minimize the cost function $\sum_i \|x_i-c_i\|^2$, where $c_i$ is the center of the cluster that $x_i$ belongs to.

You have basically two methods:

A: run k-means (Lloyd's algorithm) directly on $(x_i)$

B: reduce the number of dimensions with some dimensionality reduction method (such as SVD/PCA), and then apply k-means to the reduced points
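To make the two options concrete, here is a minimal sketch using only NumPy. Everything in it is an assumption made for illustration: the synthetic Gaussian data, the helper names `lloyd`, `pca_reduce` and `cost`, the random initialization, and the choice of `k - 1` reduced dimensions. The key point is that both partitions are scored with the same cost, computed in the original space, so the two numbers are comparable.

```python
import numpy as np

def lloyd(X, k, n_iter=50, seed=0):
    """Plain Lloyd iterations from a random initialization; returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared distance from every point to every center, shape (n, k).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old center if a cluster becomes empty
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

def pca_reduce(X, d):
    """Project the centered data onto its top d right singular vectors."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:d].T

def cost(X, labels, k):
    """k-means cost of a partition, measured in the space of X."""
    total = 0.0
    for j in range(k):
        members = X[labels == j]
        if len(members):
            total += ((members - members.mean(axis=0)) ** 2).sum()
    return total

# Hypothetical test data: 3 Gaussian blobs in 50 dimensions.
rng = np.random.default_rng(0)
k, dim, n_per = 3, 50, 100
true_centers = rng.normal(scale=5.0, size=(k, dim))
X = np.vstack([c + rng.normal(size=(n_per, dim)) for c in true_centers])

# Method A: cluster the original points.
labels_a, _ = lloyd(X, k)

# Method B: reduce first, then cluster the reduced points.
Z = pca_reduce(X, d=k - 1)
labels_b, _ = lloyd(Z, k)

# Score both partitions with the ORIGINAL cost function.
cost_a = cost(X, labels_a, k)
cost_b = cost(X, labels_b, k)
print(f"A (direct):      {cost_a:.1f}")
print(f"B (PCA + Lloyd): {cost_b:.1f}")
```

Running this with several values of `seed` gives a rough feel for how much each method depends on its initialization, though a toy data set like this one says little about real high-dimensional data.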

On the one hand, A is unlikely to find the global minimum, although replications (restarts from different initializations) help it get closer. It can also be computationally expensive, since it handles high-dimensional vectors across many replications.

On the other hand, B is more likely to get close to the global minimum of the reduced problem (or even reach it) with fewer replications. But the minimum of the reduced problem is not the minimum of the original one (it is, however, known to be close to it). There is of course an