Commit e61bb6b1 authored by Klaas Winter

Added standardisation to the data before k-means is applied

Currently the data is not standardised before k-means is applied.
Because k-means clusters on Euclidean distance, variables with larger
scales dominate the result when the variables clustered on do not share
the same scale. Standardising each variable with scale() should fix this.
parent cdedf721
...
@@ -24,6 +24,9 @@ cluster.kmeans <- function(rows = c(), vars, identifier, centers = NA, iter.max=
   if (is.na(centers)) centers <- floor(sqrt(nrow(clusterData)))
+  # Make sure that the data is standardised
+  clusterData <- scale(clusterData)
   clusters <- as.factor(stats::kmeans(clusterData, centers, iter.max)$cluster)
   clusters <- data.frame(row = clusterRows, cluster = clusters)
...
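
To illustrate why the added scale() call matters, here is a minimal sketch on made-up data. The matrix `toy` and its column names are hypothetical stand-ins for clusterData and do not come from the repository. It compares how much each variable contributes to the squared Euclidean distance between two rows, before and after standardisation.

```r
# Hypothetical toy data (not from the repository): two variables on very
# different scales, standing in for clusterData.
toy <- cbind(income = c(20000, 52000, 31000, 47000),
             age    = c(25, 61, 33, 58))

# scale() centres each column to mean 0 and divides it by its standard
# deviation, which is what the commit applies before stats::kmeans().
standardised <- scale(toy)

# Per-variable share of the squared Euclidean distance between rows 1 and 2.
d_raw <- (toy[1, ] - toy[2, ])^2
d_std <- (standardised[1, ] - standardised[2, ])^2
share_raw <- d_raw / sum(d_raw)
share_std <- d_std / sum(d_std)

print(share_raw)  # income accounts for nearly all of the raw distance
print(share_std)  # after scaling, both variables contribute comparably
```

One point that may be worth checking in the real function: scale() returns NaN for a column with zero variance, so a constant variable in clusterData would presumably need to be dropped or handled before clustering.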