Description
- Apply PCA to the independent variables of the iris dataset. Hint: use the prcomp function from the stats package. Load the data and keep only the first 4 variables.
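idf <- iris[, -5]       # drop the Species column, keeping the four numeric measurements
model <- prcomp(idf)    # prcomp defaults: center = TRUE, scale. = FALSE (centered but unscaled PCA)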
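As an illustrative cross-check (not part of the original exercise), the sketch below compares prcomp with an eigendecomposition of the sample covariance matrix. The component variances (sdev^2) should equal the eigenvalues, and the loadings should equal the eigenvectors up to the sign of each column.

```r
# Cross-check prcomp against eigen() on the covariance matrix (built-in iris data)
idf <- iris[, -5]                 # four numeric measurements
model <- prcomp(idf)              # centered, unscaled PCA (computed via SVD)

ev <- eigen(cov(idf))             # eigenvalues are returned in decreasing order

# Component variances equal the covariance eigenvalues
all.equal(model$sdev^2, ev$values)

# Loadings equal the eigenvectors up to the sign of each column
all.equal(abs(unname(model$rotation)), abs(ev$vectors))
```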
- Plot the proportion of variance explained by each component. How many components would you choose to capture most of the variability in the dataset?
plot(model)    # scree plot: bar heights are the component variances (sdev^2), not proportions
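Because plot(model) shows variances rather than proportions, a minimal sketch for plotting the proportion of variance explained (PVE) directly may help. It also plots the cumulative PVE, which is what the "how many components" question usually hinges on. The variable name pve is illustrative.

```r
idf <- iris[, -5]
model <- prcomp(idf)

# Proportion of variance explained by each component
pve <- model$sdev^2 / sum(model$sdev^2)
round(pve, 4)

barplot(pve, names.arg = colnames(model$rotation),
        ylab = "Proportion of variance explained", ylim = c(0, 1))

# Cumulative proportion: read off how many components reach a chosen level
plot(cumsum(pve), type = "b", xlab = "Number of components",
     ylab = "Cumulative proportion of variance", ylim = c(0, 1))
```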
- Apply normalization (centering and scaling) to your data and recalculate the PCA on the normalized data. What is the effect of normalization on the PCA results? How many components carry substantial explanatory power now?
model_2 <- prcomp(idf, center = TRUE, scale. = TRUE)   # center, then scale each variable to unit variance
summary(model)
## Importance of components:
##                           PC1     PC2    PC3     PC4
## Standard deviation     2.0563 0.49262 0.2797 0.15439
## Proportion of Variance 0.9246 0.05307 0.0171 0.00521
## Cumulative Proportion  0.9246 0.97769 0.9948 1.00000
summary(model_2)
## Importance of components:
##                           PC1    PC2     PC3     PC4
## Standard deviation     1.7084 0.9560 0.38309 0.14393
## Proportion of Variance 0.7296 0.2285 0.03669 0.00518
## Cumulative Proportion  0.7296 0.9581 0.99482 1.00000
plot(model_2)
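One way to see why normalization changes the results: on the original scale the four variables have very different variances (Petal.Length has by far the largest), so the unscaled PCA is dominated by the high-variance variables. Scaling gives every variable unit variance, which makes the scaled PCA equivalent to an eigendecomposition of the correlation matrix. The helper n_for and the 90% threshold below are hypothetical, illustrative choices, not part of the assignment.

```r
idf <- iris[, -5]

# Variances on the original scale: the unscaled PCA is driven by the largest ones
round(apply(idf, 2, var), 3)

model   <- prcomp(idf)                                # centered only
model_2 <- prcomp(idf, center = TRUE, scale. = TRUE)  # centered and scaled

# Scaled PCA = PCA of the correlation matrix; its variances sum to 4 (one per variable)
all.equal(model_2$sdev^2, eigen(cor(idf))$values)
sum(model_2$sdev^2)

# Hypothetical helper: smallest number of components reaching a cumulative threshold
n_for <- function(m, threshold = 0.90) {
  pve <- m$sdev^2 / sum(m$sdev^2)
  which(cumsum(pve) >= threshold)[1]
}
c(unscaled = n_for(model), scaled = n_for(model_2))
```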
- Boxplot the original dataset and the transformed one. What do you observe?
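A sketch of the comparison, assuming "transformed" means the PC scores stored in model$x (a boxplot of the scaled data from scale(idf) would be another reasonable reading). The scores are centered at zero, and their spread shrinks from PC1 to PC4 because each score column's standard deviation equals the corresponding entry of model$sdev.

```r
idf <- iris[, -5]
model <- prcomp(idf)                    # unscaled PCA, as in the first step

op <- par(mfrow = c(1, 2))              # two panels side by side
boxplot(idf, main = "Original variables", las = 2)
boxplot(model$x, main = "PC scores", las = 2)
par(op)                                 # restore the previous graphics settings

# Each score column has mean ~0 and standard deviation equal to model$sdev
round(colMeans(model$x), 10)
apply(model$x, 2, sd)
```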
- Visualize the first two components of your PCA. Hint: use biplot.
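A minimal sketch using biplot from the stats package on the scaled PCA (using model_2 is an assumption; the same call works for model). In the biplot, observations appear as their row labels and variables as loading arrows. A plain scatter of the first two score columns, colored by species, is added as a complement.

```r
idf <- iris[, -5]
model_2 <- prcomp(idf, center = TRUE, scale. = TRUE)

# Biplot of PC1 vs PC2: observation scores plus variable loading arrows
biplot(model_2, choices = 1:2, cex = 0.6)

# Scores on the first two PCs, colored by species
species <- as.integer(iris$Species)
plot(model_2$x[, 1:2], col = species, pch = 19,
     main = "First two principal components")
legend("topright", legend = levels(iris$Species), col = 1:3, pch = 19)
```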
- Compare the correlations among the original variables with the correlations among the PCs.
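A sketch of the comparison, again assuming the scaled model: the original measurements are strongly correlated (the two petal variables in particular), while the PC scores are uncorrelated by construction, so their correlation matrix is the identity up to floating-point error.

```r
idf <- iris[, -5]
model_2 <- prcomp(idf, center = TRUE, scale. = TRUE)

round(cor(idf), 2)         # correlations among the original variables
round(cor(model_2$x), 2)   # correlations among the PC scores
```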