A simple example of cluster analysis of alcohol preferences by country in R

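Hi, Habr! Today I want to share a small example of how to carry out a cluster analysis. You will not find neural networks or other fashionable techniques here. The example can serve as a starting point for a small but complete cluster analysis of other data. If you are interested, read on.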

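A caveat up front: this article does not claim academic completeness, uniqueness of its results, or full coverage of the topic. It aims to demonstrate the basic steps of a classical cluster analysis that can be used for a simple but meaningful study (possibly ahead of a more detailed one). Corrections, comments, and additions on the substance are welcome.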

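The data is a sample of recorded per capita alcohol consumption in 2010, broken down by beverage type (beer, wine, spirits, other) as a percentage of total per capita consumption. The data also contains the average daily intake per capita (in grams of pure alcohol) and the total (recorded + unrecorded) per capita consumption among drinkers only (in litres of pure alcohol).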

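Each country is also conditionally assigned to one of three geographic regions: east, center, and west. For various reasons this division is quite arbitrary and debatable, but we will work with what we have. Data source: Global status report on alcohol and health 2014, pp. 289-364.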

(Drawn by hand, so there may be mistakes, but I think the general idea is clear.)

Preliminary analysis

Load the libraries used.

library(rgl)
library(heplots)
library(MVN)
library(klaR)
library('Morpho')
library(caret)
library(mclust)
library(ggplot2)
library(GGally)
library(plyr)
library(psych)
library(GPArotation)
library(ggpubr)

Load the data and look at the first rows.

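# Read the data
data <- read.table("alcohol_data.csv", header = TRUE, sep = ",")
# Use the first column (country names) as row names
rownames(data) <- make.names(data[, 1], unique = TRUE)
# Drop the country-name column, then drop rows with missing values
data <- data[, -1]
data <- na.omit(data)
# Show the first rows
head(data)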

	Beer	Wine	Spirit	Other	Total	Average_daily	Group
Albania	31.8	19.8	48.4	0.0	13.0	27.5	center
Armenia	9.7	5.3	84.9	0.0	8.3	17.9	east
Austria	50.4	35.5	14.0	0.0	13.8	29.6	center
Azerbaijan	28.7	7.6	63.3	0.0	5.2	11.1	east
Belarus	17.3	5.2	46.6	30.9	22.1	48.0	east
Belgium	49.2	36.3	14.4	0.1	12.8	27.7	center
...	...	...	...	...	...	...	...
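
Since the four beverage columns are percentages of the same total, each row should add up to roughly 100. The sketch below checks this on the six rows printed above. The data frame `shares`, the variable names, and the tolerance of 1 percentage point are assumptions made for illustration, not part of the original analysis.

```r
# Six rows copied from the head(data) output above
shares <- data.frame(
  Beer   = c(31.8,  9.7, 50.4, 28.7, 17.3, 49.2),
  Wine   = c(19.8,  5.3, 35.5,  7.6,  5.2, 36.3),
  Spirit = c(48.4, 84.9, 14.0, 63.3, 46.6, 14.4),
  Other  = c( 0.0,  0.0,  0.0,  0.0, 30.9,  0.1),
  row.names = c("Albania", "Armenia", "Austria",
                "Azerbaijan", "Belarus", "Belgium")
)

# Sum of the four shares for every country
share_sum <- rowSums(shares)
print(share_sum)

# Countries whose shares deviate from 100 by more than the assumed tolerance
tolerance <- 1
suspicious <- names(share_sum)[abs(share_sum - 100) > tolerance]
print(suspicious)
```

On the full data set the same check can be run as `rowSums(data[, c("Beer", "Wine", "Spirit", "Other")])`.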

summary(data)


Next, look at the three main shares (Beer, Wine, Spirit) in a 3D scatterplot, colored by region.

options(rgl.useNULL=TRUE)
open3d()
mfrow3d(2,2)
levelColors <- c('west'='blue', 'east'='red', 'center'='yellow')
plot3d(data$Beer, data$Wine, data$Spirit, xlab="Beer", ylab="Wine", zlab="Spirit", col = levelColors[data$Group], size=3)

widget <- rglwidget()
widget

Now look at pairwise scatterplots, correlations, and densities, colored by region.

ggpairs(
  data,
  # refer to the column by name; data$Group inside aes() is fragile
  mapping = ggplot2::aes(color = Group),
  upper = list(continuous = wrap("cor", alpha = 0.5), combo = "box"),
  lower = list(continuous = wrap("points", alpha = 0.3), combo = wrap("dot", alpha = 0.4)),
  diag = list(continuous = wrap("densityDiag", alpha = 0.5)),
  title = "Alcohol"
)

Average_daily and Total are strongly correlated, so we drop Average_daily.

data <- data[, -6]
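
Column 6, removed here, is Average_daily. A quick way to see how much it overlaps with Total is their correlation. The sketch below uses only the six rows printed by `head(data)` earlier; the vector names are chosen for illustration.

```r
# Total and Average_daily for the six countries shown by head(data)
total         <- c(13.0,  8.3, 13.8,  5.2, 22.1, 12.8)
average_daily <- c(27.5, 17.9, 29.6, 11.1, 48.0, 27.7)

# Pearson correlation between the two columns
r <- cor(total, average_daily)
print(r)

# The ratio is close to constant, which is why one of the columns is redundant
print(round(average_daily / total, 2))
```

On the full data set the equivalent call, before the column is dropped, would be `cor(data$Total, data$Average_daily)`.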

Look at the extreme observations, starting with the highest wine share.

data[data$Wine>60,]

	Beer	Wine	Spirit	Other	Total	Group
Italy	23	65.6	11.5	0	9.9	west


data[data$Spirit>70,]
data[data$Spirit<10,]

	Beer	Wine	Spirit	Other	Total	Group
Armenia	9.7	5.3	84.9	0	8.3	east

	Beer	Wine	Spirit	Other	Total	Group
Slovenia	44.5	46.9	8.6	0	17.2	west


split(data[,1:5],data$Group)

$center

	Beer	Wine	Spirit	Other	Total
Albania	31.8	19.8	48.4	0.0	13.0
Austria	50.4	35.5	14.0	0.0	13.8
Belgium	49.2	36.3	14.4	0.1	12.8
Bosnia.and.Herzegovina	73.3	9.7	17.0	0.0	12.3
Cyprus	40.9	24.7	33.7	0.7	10.8
Czech.Republic	53.5	20.5	26.0	0.0	14.6
Denmark	37.7	48.2	14.1	0.0	12.9
Finland	46.0	17.5	24.0	12.6	18.1
Germany	53.6	27.8	18.6	0.0	14.7
Hungary	36.3	29.4	34.3	0.0	16.3
Iceland	61.8	21.2	16.5	0.5	10.4
Ireland	48.1	26.1	18.7	7.7	14.7
Malta	39.4	32.7	27.2	0.7	11.5
Netherlands	46.8	36.4	16.9	0.0	11.2
Norway	44.2	34.7	19.0	2.1	9.0
Poland	55.1	9.3	35.5	0.0	24.2
Romania	50.0	28.9	21.1	0.0	21.3
Serbia	51.5	23.9	24.6	0.0	19.0
Sweden	37.0	46.6	15.1	1.4	13.3
Switzerland	31.8	49.4	17.6	1.2	12.1
Turkey	63.6	8.6	27.9	0.0	17.3
UK	36.9	33.8	21.8	7.5	13.8

$east

	Beer	Wine	Spirit	Other	Total
Armenia	9.7	5.3	84.9	0.0	8.3
Azerbaijan	28.7	7.6	63.3	0.0	5.2
Belarus	17.3	5.2	46.6	30.9	22.1
Bulgaria	39.3	16.5	44.1	0.1	16.9
Estonia	41.2	11.1	36.8	10.9	15.7
Georgia	17.0	49.8	33.2	0.1	21.2
Israel	44.0	6.2	49.5	0.3	5.4
Latvia	46.9	10.7	37.0	5.4	18.1
Lithuania	46.5	7.8	34.1	11.6	23.6
Republic.of.Moldova	30.4	5.1	64.5	0.0	25.4
Russian.Federation	37.6	11.4	51.0	0.0	22.3
Slovakia	30.1	18.3	46.2	5.5	19.8
Ukraine	40.5	9.0	48.0	2.6	20.3

$west

	Beer	Wine	Spirit	Other	Total
Croatia	39.5	44.8	15.4	0.2	15.1
France	18.8	56.4	23.1	1.7	12.9
Greece	28.1	47.3	24.2	0.4	15.6
Italy	23.0	65.6	11.5	0.0	9.9
Luxembourg	36.2	42.8	21.0	0.0	12.7
Portugal	30.8	55.5	10.9	2.8	22.6
Slovenia	44.5	46.9	8.6	0.0	17.2
Spain	49.7	20.1	28.2	1.8	16.4
Republic.of.Macedonia	47.4	39.9	12.6	0.0	11.7

ggpairs(
  data,
  mapping = ggplot2::aes(color = Group),
  # histogram on the diagonal; options are passed through wrap()
  diag = list(continuous = wrap("barDiag", alpha = 0.4))
)

For the clustering itself, only the Beer, Wine, and Spirit shares are kept. Other and Total are dropped, and the region label is stored separately for coloring the plots later.

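# Store the region label separately (as a factor) and keep only the three shares.
# Selecting columns by name is an assumption about the intended result;
# the positional indices pointed at Total rather than Group.
data.group <- factor(data$Group)
data <- data[, c("Beer", "Wine", "Spirit")]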

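To choose the number of clusters, use the elbow method: for each number of clusters k, compute the total within-cluster sum of squares W(K) and look for the point where the curve bends.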

library(factoextra)
fviz_nbclust(data, kmeans, method = "wss") +
  labs(subtitle = "Elbow method") +
  geom_vline(xintercept = 4, linetype = 2)
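
To make the quantity behind the elbow plot concrete, the sketch below computes W(K) by hand with base R `kmeans`. It uses a 12-country subset copied from the tables above (four countries per region); the subset, the seed, and the names `x`, `k_values`, and `wss` are assumptions for illustration, so the numbers will differ from the full-data plot.

```r
# 12 countries copied from the tables above (Beer, Wine, Spirit shares)
x <- data.frame(
  Beer   = c(50.4, 53.6, 55.1, 63.6,  9.7, 28.7, 30.4, 37.6, 18.8, 23.0, 30.8, 28.1),
  Wine   = c(35.5, 27.8,  9.3,  8.6,  5.3,  7.6,  5.1, 11.4, 56.4, 65.6, 55.5, 47.3),
  Spirit = c(14.0, 18.6, 35.5, 27.9, 84.9, 63.3, 64.5, 51.0, 23.1, 11.5, 10.9, 24.2),
  row.names = c("Austria", "Germany", "Poland", "Turkey",
                "Armenia", "Azerbaijan", "Republic.of.Moldova", "Russian.Federation",
                "France", "Italy", "Portugal", "Greece")
)

set.seed(1)  # k-means starts from random centers
k_values <- 1:6

# W(K): total within-cluster sum of squares for each number of clusters
wss <- sapply(k_values, function(k) {
  kmeans(x, centers = k, nstart = 25)$tot.withinss
})

print(data.frame(k = k_values, W = round(wss, 1)))
```

For k = 1 the value equals the total sum of squares around the overall mean; the "elbow" is the k after which further drops in W(K) become small.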

data.dist <- dist(data)
hc <- hclust(data.dist, method = "ward.D2")
plot(hc, cex = 0.7)


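# Colors for the three regions, in factor-level order: center, east, west
colors <- c('green', 'red', 'blue')
hcd <- as.dendrogram(hc)
clusMember <- cutree(hc, 4)
# Integer region code per country, in the row order of data
group.codes <- as.integer(factor(data.group))
# Color each leaf label by the region of its country
colLab <- function(n) {
    if (is.leaf(n)) {
        a <- attributes(n)
        # look the leaf up by its label (the country row name)
        labCol <- colors[group.codes[match(a$label, rownames(data))]]
        attr(n, "nodePar") <- c(a$nodePar, lab.col = labCol)
    }
    n
}
clusDendro <- dendrapply(hcd, colLab)
plot(clusDendro, main = "Cool Dendrogram", type = "triangle")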

rect.hclust(hc, k = 4)

Now cut the same tree into 3 clusters instead of 4.

plot(clusDendro, main = "Cool Dendrogram", type = "triangle")
data.hclas_group <- factor(cutree(hc, k = 3))

rect.hclust(hc, k = 3)
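
Once the tree is cut, a cross-tabulation against the geographic regions shows how far the clusters follow the east/center/west split. The sketch below does this on a 12-country subset copied from the tables above; the subset and the names `x`, `region`, `hc_demo`, `cluster`, and `tab` are assumptions for illustration, not the full analysis.

```r
# 12 countries copied from the tables above, four per region
x <- data.frame(
  Beer   = c(50.4, 53.6, 55.1, 63.6,  9.7, 28.7, 30.4, 37.6, 18.8, 23.0, 30.8, 28.1),
  Wine   = c(35.5, 27.8,  9.3,  8.6,  5.3,  7.6,  5.1, 11.4, 56.4, 65.6, 55.5, 47.3),
  Spirit = c(14.0, 18.6, 35.5, 27.9, 84.9, 63.3, 64.5, 51.0, 23.1, 11.5, 10.9, 24.2),
  row.names = c("Austria", "Germany", "Poland", "Turkey",
                "Armenia", "Azerbaijan", "Republic.of.Moldova", "Russian.Federation",
                "France", "Italy", "Portugal", "Greece")
)
region <- factor(rep(c("center", "east", "west"), each = 4))

# Ward clustering on Euclidean distances, cut into 3 clusters
hc_demo <- hclust(dist(x), method = "ward.D2")
cluster <- factor(cutree(hc_demo, k = 3))

# Rows: geographic region, columns: cluster found by hclust
tab <- table(region = region, cluster = cluster)
print(tab)
```

With the article's objects the same table would be `table(data.group, data.hclas_group)`.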

Next, project the data onto the principal components and color the points by cluster.

library(FactoMineR)
res.pca <- PCA(data, scale.unit = TRUE, graph = FALSE)
# col.ind takes the grouping factor; palette supplies the colors
fviz_pca_biplot(res.pca,
                col.ind = data.hclas_group, palette = "jco",
                label = "var",
                ellipse.level = 0.8,
                addEllipses = TRUE,
                col.var = "black",
                legend.title = "groups")
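
To see what the biplot axes represent, it can help to look at the share of variance carried by each component. The sketch below uses base R `prcomp` on standardized variables as a rough analogue of the PCA call above, not the FactoMineR implementation itself. The 12-country subset and the names `x`, `pca`, and `var_explained` are assumptions for illustration.

```r
# 12 countries copied from the tables above (Beer, Wine, Spirit shares)
x <- data.frame(
  Beer   = c(50.4, 53.6, 55.1, 63.6,  9.7, 28.7, 30.4, 37.6, 18.8, 23.0, 30.8, 28.1),
  Wine   = c(35.5, 27.8,  9.3,  8.6,  5.3,  7.6,  5.1, 11.4, 56.4, 65.6, 55.5, 47.3),
  Spirit = c(14.0, 18.6, 35.5, 27.9, 84.9, 63.3, 64.5, 51.0, 23.1, 11.5, 10.9, 24.2),
  row.names = c("Austria", "Germany", "Poland", "Turkey",
                "Armenia", "Azerbaijan", "Republic.of.Moldova", "Russian.Federation",
                "France", "Italy", "Portugal", "Greece")
)

# PCA on centered and scaled variables
pca <- prcomp(x, center = TRUE, scale. = TRUE)

# Proportion of variance explained by each component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_explained, 3))

# Loadings: how each original variable contributes to each component
print(round(pca$rotation, 2))
```

Because the three shares nearly sum to a constant for most countries, it is worth checking how little variance is left for the last component before reading too much into it.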

Now cluster the same data with k-means, using k-means++ initialization of the centers.

library(flexclust)
data.kk <- kcca(data, k=3, family=kccaFamily("kmeans"),
control=list(initcent="kmeanspp"))
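
For readers unfamiliar with k-means++, the idea is to pick the first center at random and each further center with probability proportional to its squared distance from the nearest center already chosen. The sketch below is a minimal base R illustration of that seeding rule, not the flexclust implementation. The function `kmeanspp_seed`, the 12-country subset, and the seed are assumptions for illustration.

```r
# 12 countries copied from the tables above (Beer, Wine, Spirit shares)
x <- data.frame(
  Beer   = c(50.4, 53.6, 55.1, 63.6,  9.7, 28.7, 30.4, 37.6, 18.8, 23.0, 30.8, 28.1),
  Wine   = c(35.5, 27.8,  9.3,  8.6,  5.3,  7.6,  5.1, 11.4, 56.4, 65.6, 55.5, 47.3),
  Spirit = c(14.0, 18.6, 35.5, 27.9, 84.9, 63.3, 64.5, 51.0, 23.1, 11.5, 10.9, 24.2),
  row.names = c("Austria", "Germany", "Poland", "Turkey",
                "Armenia", "Azerbaijan", "Republic.of.Moldova", "Russian.Federation",
                "France", "Italy", "Portugal", "Greece")
)

# Hypothetical helper: returns the row indices of k seed points
kmeanspp_seed <- function(x, k) {
  x <- as.matrix(x)
  n <- nrow(x)
  chosen <- integer(k)
  chosen[1] <- sample.int(n, 1)  # first center: uniform at random
  # squared distance from every point to its nearest chosen center
  d2 <- rowSums(sweep(x, 2, x[chosen[1], ])^2)
  for (i in seq_len(k)[-1]) {
    # next center: probability proportional to squared distance
    chosen[i] <- sample.int(n, 1, prob = d2)
    d2 <- pmin(d2, rowSums(sweep(x, 2, x[chosen[i], ])^2))
  }
  chosen
}

set.seed(42)
seeds <- kmeanspp_seed(x, k = 3)
print(rownames(x)[seeds])

# Run ordinary k-means starting from the selected seed points
fit <- kmeans(x, centers = as.matrix(x)[seeds, ])
print(fit$cluster)
```

Points already chosen have zero distance to themselves, so they cannot be drawn again, which tends to spread the starting centers across the data.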

fviz_pca_biplot(res.pca,
                col.ind = as.factor(data.kk@cluster), palette = "jco",
                label = "var",
                ellipse.level = 0.8,
                addEllipses = TRUE,
                col.var = "black", repel = TRUE,
                legend.title = "clusters")
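
One simple way to compare the hierarchical and k-means partitions numerically is the Rand index: the fraction of country pairs that both partitions treat the same way (together in both, or apart in both). The sketch below implements it in base R on a 12-country subset. The function `rand_index`, the subset, and the seed are assumptions for illustration; the mclust package loaded earlier also provides `adjustedRandIndex()` as a chance-corrected variant.

```r
# 12 countries copied from the tables above (Beer, Wine, Spirit shares)
x <- data.frame(
  Beer   = c(50.4, 53.6, 55.1, 63.6,  9.7, 28.7, 30.4, 37.6, 18.8, 23.0, 30.8, 28.1),
  Wine   = c(35.5, 27.8,  9.3,  8.6,  5.3,  7.6,  5.1, 11.4, 56.4, 65.6, 55.5, 47.3),
  Spirit = c(14.0, 18.6, 35.5, 27.9, 84.9, 63.3, 64.5, 51.0, 23.1, 11.5, 10.9, 24.2),
  row.names = c("Austria", "Germany", "Poland", "Turkey",
                "Armenia", "Azerbaijan", "Republic.of.Moldova", "Russian.Federation",
                "France", "Italy", "Portugal", "Greece")
)

# Hypothetical helper: share of pairs on which two partitions agree
rand_index <- function(a, b) {
  stopifnot(length(a) == length(b))
  same_a <- outer(a, a, "==")   # TRUE where a pair shares a cluster in a
  same_b <- outer(b, b, "==")   # TRUE where a pair shares a cluster in b
  pairs <- upper.tri(same_a)    # count each unordered pair once
  mean(same_a[pairs] == same_b[pairs])
}

# Partition 1: Ward hierarchical clustering cut into 3 clusters
hc_cl <- cutree(hclust(dist(x), method = "ward.D2"), k = 3)

# Partition 2: k-means with 3 clusters
set.seed(1)
km_cl <- kmeans(x, centers = 3, nstart = 25)$cluster

ri <- rand_index(hc_cl, km_cl)
print(ri)
```

A value of 1 means the two partitions are identical up to relabeling of the clusters. With the article's objects the call would be `rand_index(as.integer(data.hclas_group), data.kk@cluster)`.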


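As a next step, one could cluster under the assumptions of a model-based approach using information criteria (described here), or try a classical discriminant analysis on this data set. If this article proves useful, I plan to publish a follow-up.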
