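This article shows how to rank rows within groups in R and how to extract the maximum of each group. It was inspired by an article on grouped ranking with pandas from the WeChat official account "world of statistics".
Main text
We construct Chinese (语文) and math (数学) scores for students in four classes. Goal: extract the student with the highest Chinese score in each class.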
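library(dplyr)
# No random seed is set, so your numbers will differ from the output shown below
rawdata <-
  data.frame(
    class.name = sample(paste("class", 1:4), 100, replace = TRUE), # assign 100 students to 4 classes at random
    student = sample(paste("student", 1:100), replace = FALSE),     # student IDs 1-100, each used exactly once
    语文 = runif(100, min = 60, max = 100),                          # Chinese score, uniform on [60, 100]
    数学 = runif(100, min = 60, max = 100),                          # math score, uniform on [60, 100]
    stringsAsFactors = FALSE                                         # keep text columns as character (default since R 4.0)
  )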
head(rawdata)
# class.name student 语文 数学
#1 class 3 student 43 70.07040 82.16286
#2 class 1 student 89 82.55545 91.78050
#3 class 3 student 25 72.34546 74.85217
#4 class 3 student 48 91.03874 76.56048
#5 class 1 student 96 78.05422 68.67384
#6 class 3 student 51 64.35143 89.77850
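Because class.name is drawn with replace = TRUE, the four classes do not get 25 students each; their sizes vary from run to run. The sketch below repeats the two sampling calls with a seed so the result is reproducible (the seed value 2024 is an arbitrary assumption, not from the original), then checks the class sizes and that every student ID is unique:

```r
# Seeded version of the two sampling calls above; the seed value is arbitrary
set.seed(2024)
class.name <- sample(paste("class", 1:4), 100, replace = TRUE)
student    <- sample(paste("student", 1:100), replace = FALSE)

table(class.name)       # class sizes are random, not necessarily 25 each
anyDuplicated(student)  # 0 means every student ID appears exactly once
```

Calling set.seed() before building rawdata in the same way makes the whole example reproducible.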
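Extract the student with the highest Chinese score in each class:
rawdata %>%
  group_by(class.name) %>%             # one group per class
  mutate(rank = dense_rank(-语文)) %>%  # rank Chinese scores in descending order within each class
  filter(rank == 1)                    # keep each class's top scorer(s)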
# class.name student 语文 数学 rank
# <chr> <chr> <dbl> <dbl> <int>
#1 class 2 student 93 97.4 77.0 1
#2 class 4 student 9 99.5 63.7 1
#3 class 1 student 44 99.7 60.4 1
#4 class 3 student 6 99.4 60.7 1
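First, group_by() splits the data by class; next, dense_rank() ranks the Chinese scores within each class (the score is negated so that the highest score gets rank 1); finally, filter() keeps the rows with rank 1, i.e., the student with the highest Chinese score in each class. Because dense_rank() gives tied values the same rank, a class whose top score is shared by several students returns all of them.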
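The choice of ranking function only matters when scores tie. Here is a small sketch on a made-up score vector (values chosen purely for illustration) comparing dense_rank() with dplyr's min_rank() and row_number(). For a numeric column, desc(x) is equivalent to the -x used above:

```r
library(dplyr)

# Made-up scores with a tie for the top value (95 appears twice)
x <- c(90, 95, 95, 80)

dense_rank(desc(x))  # ties share a rank and the next rank is not skipped
# [1] 2 1 1 3
min_rank(desc(x))    # ties share a rank and the next rank is skipped
# [1] 3 1 1 4
row_number(desc(x))  # ties broken by position, so every rank is unique
# [1] 3 1 2 4
```

If you need exactly one student per class even when scores tie, swap dense_rank() for row_number() in the pipeline above. The tie is then broken by row order.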
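Rank all students within their class (only the first 10 of the 100 rows are printed):
rawdata %>%
  group_by(class.name) %>%
  mutate(rank = dense_rank(-语文)) %>%
  arrange(class.name, rank)            # sort by class, then by rank within the class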
# class.name student 语文 数学 rank
# <chr> <chr> <dbl> <dbl> <int>
# 1 class 1 student 44 99.7 60.4 1
# 2 class 1 student 98 98.9 82.5 2
# 3 class 1 student 21 96.2 71.9 3
# 4 class 1 student 13 95.9 76.0 4
# 5 class 1 student 91 95.8 62.5 5
# 6 class 1 student 50 95.8 68.8 6
# 7 class 1 student 33 95.2 63.0 7
# 8 class 1 student 31 94.0 97.2 8
# 9 class 1 student 35 91.2 89.5 9
#10 class 1 student 11 89.8 70.6 10
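The same ideas extend in two directions. Since dplyr 1.0.0, slice_max() directly selects the rows with the largest value in each group, and the rank column from the pipeline above gives the top N per group with filter(rank <= N). The following sketch uses a small hypothetical data set, with the Chinese-score column renamed to the ASCII name chinese so the code runs in any locale:

```r
library(dplyr)

# Hypothetical toy data: class 1 has a tie for the top Chinese score
scores <- data.frame(
  class.name = c("class 1", "class 1", "class 1", "class 1", "class 2", "class 2", "class 2"),
  student    = c("A", "B", "C", "D", "E", "F", "G"),
  chinese    = c(95, 95, 80, 70, 88, 70, 60),
  stringsAsFactors = FALSE
)

# Top scorer(s) per class; with_ties = TRUE (the default) keeps both A and B
top1 <- scores %>%
  group_by(class.name) %>%
  slice_max(chinese, n = 1, with_ties = TRUE) %>%
  ungroup()

# Top 2 ranks per class using the rank-column pattern from the article
top2 <- scores %>%
  group_by(class.name) %>%
  mutate(rank = dense_rank(desc(chinese))) %>%
  filter(rank <= 2) %>%
  ungroup()
```

Because dense_rank() does not skip ranks after a tie, top2 keeps A, B and C from class 1 (ranks 1, 1, 2), so the result has more than two rows per class whenever ties occur.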
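This article is reposted from 日常分享的小懒猫. If you believe it infringes your rights, please email contact@modb.pro with supporting evidence; once the claim is verified, 墨天轮 (modb.pro) will promptly remove the content.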




