Pandas categorical variable encoding for regression (one-hot encoding vs. dummy encoding)
Pandas has a method called get_dummies() that creates a dummy encoding of a categorical variable. Scikit-learn also has a OneHotEncoder, which needs to be used along with a LabelEncoder. What are the pros and cons of each? Also, both yield dummy encoding (k dummy variables for k levels of a categorical variable) rather than one-hot encoding (k-1 dummy variables); how can one get rid of the extra category? And how much of a problem does this dummy encoding create in regression models (collinearity issues, a.k.a. the dummy variable trap)?
One advantage of get_dummies is that it can operate on values other than integers (so you don't need the LabelEncoder) and returns a DataFrame with the categories as column names. Also, you can conveniently drop one redundant category using drop_first=True.
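A minimal sketch of both behaviors, using a hypothetical single-column DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})

# Full dummy encoding: one column per category, named after the category
full = pd.get_dummies(df["color"])

# k-1 encoding: drop the first (alphabetical) category to avoid redundancy
reduced = pd.get_dummies(df["color"], drop_first=True)
```

Here `full` has columns `blue`, `green`, `red`, while `reduced` keeps only `green` and `red`; a row of all zeros then implicitly means `blue`.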
One advantage of scikit-learn's OneHotEncoder lies in the scikit-learn API. OHE gives you a transformer which you can apply to your training and test set separately, provided you specify the total number of categories. This doesn't work with get_dummies, for example when the training set is missing categories that are present in the test set.
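A sketch of that fit-on-train, transform-on-test workflow. Note this assumes a recent scikit-learn, where OneHotEncoder accepts string values directly (no LabelEncoder needed) and lets you fix the category list up front via the `categories` parameter; the data here is made up:

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

train = np.array([["red"], ["green"]])
test = np.array([["blue"], ["red"]])  # "blue" never appears in the training set

# Fixing the full category list up front makes the train and test
# encodings consistent even when one split is missing a category.
enc = OneHotEncoder(categories=[["blue", "green", "red"]])
enc.fit(train)
X_test = enc.transform(test).toarray()
```

Each row of `X_test` has three columns in the order `blue`, `green`, `red`, even though the encoder was fit on a set containing only two of them.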
You can still delete categories by simply deleting columns from the resulting NumPy array (e.g. using n_values_ or feature_indices_ to see which columns correspond to which feature). Some models are unaffected by the redundant column, for example tree-based models, and L1 regularization can often set redundant features to zero (see Lasso regression).
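As an alternative to deleting columns by hand, newer scikit-learn versions let the encoder drop one column per feature itself via the `drop` parameter (a sketch, assuming scikit-learn ≥ 0.21):

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

X = np.array([["red"], ["green"], ["blue"]])

# drop="first" removes one redundant column per feature (k-1 encoding),
# which sidesteps the dummy variable trap in linear models.
X_enc = OneHotEncoder(drop="first").fit_transform(X).toarray()
```

With three categories, `X_enc` has only two columns; the dropped (first alphabetical) category is represented by a row of zeros.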