Outstanding Student R Study Notes (3): Tidyverse and Its Applications

2026-05-28 04:16:22

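This post is largely reproduced from teacher 小洁 of 生信技能树. I am not very familiar with this area myself and am mostly imitating the original, so I made almost no changes.

0. Introduction to Tidyverse

Tidyverse is a small ecosystem of packages that makes R code more elegant. These notes cover three of them: stringr, dplyr, and tidyr. stringr handles strings, dplyr manipulates data frames, and tidyr reshapes the layout of data frames. Tidyverse code reads much like English, so the purpose of a call is usually clear at a glance. Top marks for that!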

1. stringr

    library(stringr)
    # Build the example data
    title <- c("A375 cells 24h Control rep1",
               "A375 cells 24h Control rep2",
               "A375 cells 24h Control rep3",
               "A375 cells 24h Vemurafenib rep1",
               "A375 cells 24h Vemurafenib rep2",
               "A375 cells 24h Vemurafenib rep3")
    title

 [1] "A375 cells 24h Control rep1"     "A375 cells 24h Control rep2"
 [3] "A375 cells 24h Control rep3"     "A375 cells 24h Vemurafenib rep1"
 [5] "A375 cells 24h Vemurafenib rep2" "A375 cells 24h Vemurafenib rep3"

1.1 Splitting

    # Split each element on spaces and simplify the result into a matrix
    str_split(title, " ", simplify = TRUE)

      [,1]   [,2]    [,3]  [,4]          [,5]
 [1,] "A375" "cells" "24h" "Control"     "rep1"
 [2,] "A375" "cells" "24h" "Control"     "rep2"
 [3,] "A375" "cells" "24h" "Control"     "rep3"
 [4,] "A375" "cells" "24h" "Vemurafenib" "rep1"
 [5,] "A375" "cells" "24h" "Vemurafenib" "rep2"
 [6,] "A375" "cells" "24h" "Vemurafenib" "rep3"

    # Extract only the 4th piece of each string (str_split_i() needs stringr 1.5.0 or later)
    str_split_i(title, " ", 4)

 [1] "Control"     "Control"     "Control"     "Vemurafenib" "Vemurafenib"
 [6] "Vemurafenib"
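To make the difference between the list and matrix results concrete, here is a small sketch. The variable names `pieces` and `group` are my own and not from the original post, and the example uses only two of the titles to stay short.

```r
library(stringr)

title <- c("A375 cells 24h Control rep1",
           "A375 cells 24h Vemurafenib rep1")

# Without simplify = TRUE, str_split() returns a list with one
# character vector per input string
pieces <- str_split(title, " ")
length(pieces)      # number of list elements, one per title
pieces[[2]][4]      # 4th piece of the 2nd title

# With simplify = TRUE the result is a character matrix, so a whole
# column can be taken at once, for example as a group label
group <- str_split(title, " ", simplify = TRUE)[, 4]
group
```

The matrix form is convenient when every string splits into the same number of pieces, as these titles do.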

1.2 Locating and detecting content

    # Split each element on spaces
    title_words <- str_split(title, " ")
    # Take the first element of title_words as the example data for what follows
    title2 <- title_words[[1]]
    title2

 [1] "A375"    "cells"   "24h"     "Control" "rep1"

    # Check whether each element contains "h"
    str_detect(title2, "h")

 [1] FALSE FALSE  TRUE FALSE FALSE
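The section heading mentions locating as well as detecting, so here is a hedged sketch of both. It assumes the same example vectors as above; `is_control`, `control_titles`, `n_control`, `hit_index`, and `hit_pos` are illustrative names of my own.

```r
library(stringr)

title <- c("A375 cells 24h Control rep1",
           "A375 cells 24h Control rep2",
           "A375 cells 24h Control rep3",
           "A375 cells 24h Vemurafenib rep1",
           "A375 cells 24h Vemurafenib rep2",
           "A375 cells 24h Vemurafenib rep3")
title2 <- c("A375", "cells", "24h", "Control", "rep1")

# str_detect() returns a logical vector, so it can subset the original vector
is_control <- str_detect(title, "Control")
control_titles <- title[is_control]
control_titles

# sum() of a logical vector counts the TRUE values
n_control <- sum(is_control)

# str_which() gives the positions of the matching elements
hit_index <- str_which(title2, "h")

# str_locate() gives the start and end position of the first match
# inside each string (NA where there is no match)
hit_pos <- str_locate(title2, "h")
hit_pos
```

Subsetting with the logical vector is probably the most common use in practice, for instance to pick out the control samples.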

1.3 Replacing and removing

    # Replace the first 'o' in each word with 'A'
    str_replace(title2, "o", "A")

 [1] "A375"    "cells"   "24h"     "CAntrol" "rep1"

    # Remove the first 'o' in each word
    str_remove(title2, "o")

 [1] "A375"   "cells"  "24h"    "Cntrol" "rep1"

    # Remove every 'o' in each word
    str_remove_all(title2, "o")

 [1] "A375"  "cells" "24h"   "Cntrl" "rep1"
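The first-match versus all-matches distinction also applies to replacement, and stringr patterns are regular expressions by default. A minimal sketch follows; the input words and the variable names are made up for illustration.

```r
library(stringr)

words <- c("Control", "protocol")

# str_replace() changes only the first match in each string
first_only <- str_replace(words, "o", "0")

# str_replace_all() changes every match
every_match <- str_replace_all(words, "o", "0")

# Patterns are regular expressions by default, so "." means "any character".
# Wrapping the pattern in fixed() matches it literally instead.
as_regex   <- str_remove_all("rep.1", ".")
as_literal <- str_remove_all("rep.1", fixed("."))

first_only
every_match
as_regex
as_literal
```

If a pattern contains characters such as `.`, `(`, or `+`, it is worth checking whether the literal or the regular-expression meaning is intended.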

2. dplyr

    library(dplyr)
    test <- iris[c(1:2, 51:52, 101:102), ]

2.1 Common functions

1. mutate(): add a new column

    mutate(test, new = Sepal.Length * Sepal.Width)

     Sepal.Length Sepal.Width Petal.Length Petal.Width    Species   new
 1            5.1         3.5          1.4         0.2     setosa 17.85
 2            4.9         3.0          1.4         0.2     setosa 14.70
 51           7.0         3.2          4.7         1.4 versicolor 22.40
 52           6.4         3.2          4.5         1.5 versicolor 20.48
 101          6.3         3.3          6.0         2.5  virginica 20.79
 102          5.8         2.7          5.1         1.9  virginica 15.66
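mutate() can also create several columns in one call, and a later column may refer to one defined earlier in the same call. A small sketch, where the column names `sepal_area` and `is_large` and the threshold of 20 are arbitrary choices of mine:

```r
library(dplyr)

test <- iris[c(1:2, 51:52, 101:102), ]

result <- mutate(test,
                 sepal_area = Sepal.Length * Sepal.Width,  # new column
                 is_large   = sepal_area > 20)             # uses the column defined just above
result

# mutate() returns a new data frame; the original is left unchanged
ncol(test)
ncol(result)
```

To keep the new columns, the result has to be assigned to a name, as done here with `result`.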

2. arrange(): sort rows

    # Sort by Sepal.Length in ascending order
    arrange(test, Sepal.Length)

   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
 1          4.9         3.0          1.4         0.2     setosa
 2          5.1         3.5          1.4         0.2     setosa
 3          5.8         2.7          5.1         1.9  virginica
 4          6.3         3.3          6.0         2.5  virginica
 5          6.4         3.2          4.5         1.5 versicolor
 6          7.0         3.2          4.7         1.4 versicolor

    # Sort by Sepal.Length in descending order
    arrange(test, desc(Sepal.Length))

   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
 1          7.0         3.2          4.7         1.4 versicolor
 2          6.4         3.2          4.5         1.5 versicolor
 3          6.3         3.3          6.0         2.5  virginica
 4          5.8         2.7          5.1         1.9  virginica
 5          5.1         3.5          1.4         0.2     setosa
 6          4.9         3.0          1.4         0.2     setosa
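arrange() accepts more than one sorting column; later columns break ties in earlier ones. The sketch below relies on the fact that two rows in this subset share Sepal.Width 3.2. The name `by_width` is my own.

```r
library(dplyr)

test <- iris[c(1:2, 51:52, 101:102), ]

# Sort by Sepal.Width first; rows with equal Sepal.Width are then
# ordered by Sepal.Length
by_width <- arrange(test, Sepal.Width, Sepal.Length)
by_width[, c("Sepal.Width", "Sepal.Length")]
```

Wrapping any of the columns in desc() reverses the order for that column only.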

3. distinct(): remove duplicates

    # Deduplicate the whole data frame by the Species column,
    # keeping the first row found for each species
    distinct(test, Species, .keep_all = TRUE)

   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
 1          5.1         3.5          1.4         0.2     setosa
 2          7.0         3.2          4.7         1.4 versicolor
 3          6.3         3.3          6.0         2.5  virginica
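For comparison, here is a short sketch of what happens when `.keep_all` is left at its default of FALSE: only the columns named in the call are returned. The names `species_only` and `first_rows` are illustrative.

```r
library(dplyr)

test <- iris[c(1:2, 51:52, 101:102), ]

# Default: only the deduplicated Species column comes back
species_only <- distinct(test, Species)
species_only

# .keep_all = TRUE: all columns are kept, using the first row per species
first_rows <- distinct(test, Species, .keep_all = TRUE)

dim(species_only)
dim(first_rows)
```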

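2.2 The pipe operator %>%

The pipe takes the result of the expression on its left and passes it as the first argument to the function on its right, so that a series of data-processing steps can be chained together. The keyboard shortcut in RStudio is Ctrl + Shift + M.

The following two lines of code have the same effect: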

    head(test,2)

   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
 1          5.1         3.5          1.4         0.2  setosa
 2          4.9         3.0          1.4         0.2  setosa

    test %>% head(2)

   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
 1          5.1         3.5          1.4         0.2  setosa
 2          4.9         3.0          1.4         0.2  setosa

Pipes can also be chained several times in a row, which avoids deeply nested function calls.

    # Group by Species first, then compute the mean Sepal.Length of each group
    test %>%
      group_by(Species) %>%
      summarise(sepal_len_mean = mean(Sepal.Length))

 # A tibble: 3 × 2
   Species    sepal_len_mean
   <fct>               <dbl>
 1 setosa               5
 2 versicolor           6.7
 3 virginica            6.05
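To show what "nesting" looks like without the pipe, the sketch below writes the same three steps both ways. The filter threshold of 5 and the names `nested` and `piped` are my own choices for illustration.

```r
library(dplyr)

test <- iris[c(1:2, 51:52, 101:102), ]

# Nested form: has to be read from the inside out
nested <- summarise(group_by(filter(test, Sepal.Length > 5), Species), n = n())

# Piped form: reads top to bottom in the order the steps run
piped <- test %>%
  filter(Sepal.Length > 5) %>%
  group_by(Species) %>%
  summarise(n = n())

piped

# Both forms count the same rows per species
identical(nested$n, piped$n)
```

The two versions do the same work; the piped one is simply easier to read and to extend with another step.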

3. tidyr

3.1 Original data

    library(tidyr)
    mat <- matrix(c(1, 4, 7, 10,
                    2, 5, 0.8, 11,
                    0.3, 6, 9, 12),
                  nrow = 4,
                  dimnames = list(paste0("gene", 1:4),
                                  paste0("sample", 1:3)))
    test <- as.data.frame(mat)
    # Convert the row names into a column
    test <- tibble::rownames_to_column(test, "geneid")
    test

   geneid sample1 sample2 sample3
 1  gene1       1     2.0     0.3
 2  gene2       4     5.0     6.0
 3  gene3       7     0.8     9.0
 4  gene4      10    11.0    12.0

3.2 Wide to long

pivot_longer converts wide-format data to long format, so that the sample names land in one column and the expression values in another. This layout is well suited to plotting with ggplot2.

    test1 <- pivot_longer(data = test,
                          cols = -geneid,
                          names_to = "sample_nm",
                          values_to = "exp")
    head(test1)

 # A tibble: 6 × 3
   geneid sample_nm   exp
   <chr>  <chr>     <dbl>
 1 gene1  sample1     1
 2 gene1  sample2     2
 3 gene1  sample3     0.3
 4 gene2  sample1     4
 5 gene2  sample2     5
 6 gene2  sample3     6
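One way to see why the long layout is handy: once all values sit in a single column, a per-sample summary becomes a one-liner. This is only a sketch; it rebuilds the same table with data.frame() to stay self-contained, selects the columns with starts_with("sample") instead of -geneid, and uses names of my own (`long`, `sample_means`).

```r
library(tidyr)

# Same values as the example table, built directly as a data frame
test <- data.frame(geneid  = paste0("gene", 1:4),
                   sample1 = c(1, 4, 7, 10),
                   sample2 = c(2, 5, 0.8, 11),
                   sample3 = c(0.3, 6, 9, 12))

# 4 genes x 3 samples become 12 rows, one per gene/sample pair
long <- pivot_longer(test,
                     cols = starts_with("sample"),
                     names_to = "sample_nm",
                     values_to = "exp")
nrow(long)

# With one value column, a per-sample mean needs only the grouping column
sample_means <- tapply(long$exp, long$sample_nm, mean)
sample_means
```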

3.3 Long to wide

pivot_wider converts long-format data back to wide format. Wide-format data is more intuitive to read and is well suited to drawing heatmaps.

    test2 <- pivot_wider(data = test1,
                         names_from = sample_nm,
                         values_from = exp)
    test2

 # A tibble: 4 × 4
   geneid sample1 sample2 sample3
   <chr>    <dbl>   <dbl>   <dbl>
 1 gene1        1     2       0.3
 2 gene2        4     5       6
 3 gene3        7     0.8     9
 4 gene4       10    11      12
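Since pivot_wider undoes pivot_longer here, a round trip should give back the starting values. The sketch below checks that under the assumption that the table is rebuilt with data.frame() as shown; the names `long` and `wide` are illustrative.

```r
library(tidyr)

test <- data.frame(geneid  = paste0("gene", 1:4),
                   sample1 = c(1, 4, 7, 10),
                   sample2 = c(2, 5, 0.8, 11),
                   sample3 = c(0.3, 6, 9, 12))

# Wide -> long -> wide
long <- pivot_longer(test, cols = -geneid,
                     names_to = "sample_nm", values_to = "exp")
wide <- pivot_wider(long, names_from = sample_nm, values_from = exp)

# Column names and values should match the starting table
identical(names(wide), names(test))
identical(wide$sample2, test$sample2)
```

Note that the result of pivot_wider is a tibble, so it prints differently from the original data frame even when the values agree.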

