重複を削除してデータ抽出 R

library("dplyr")
distinct(重複判定する項目(群), .keep_all = FALSE)

distinct()は重複判定するキー項目(群)を与えると、キー項目が重複するデータは初出を残し、二回目以降を削除します
　.keep_all = FALSE　とすると、キー項目だけが戻り値となります。
　.keep_all = TRUE　とすると、キーでない項目も含めて全項目が戻り値となります。

例

#----- COVID-19診断日 + 2week > 入院日
#----- 入院日 + 1week > COVID-19診断日
#----- 793 pts
diag_date <- res2 %>% 
  group_by(患者ID) %>% 
  summarise(diag_date = min(有効開始日)) %>% 
  distinct(患者ID, diag_date) 

ここで、患者IDとdiag_dateが取り出せるのは、患者IDとdiag_dateをgroup_byとsummarizeで使用したから。

summarize の中身

# A summary applied to ungrouped tbl returns a single row
mtcars %>%
  summarise(mean = mean(disp), n = n())
#>       mean  n
#> 1 230.7219 32

# Usually, you'll want to group first
mtcars %>%
  group_by(cyl) %>%
  summarise(mean = mean(disp), n = n())
#> # A tibble: 3 × 3
#>     cyl  mean     n
#>   <dbl> <dbl> <int>
#> 1     4  105.    11
#> 2     6  183.     7
#> 3     8  353.    14

# dplyr 1.0.0 allows to summarise to more than one value:
mtcars %>%
   group_by(cyl) %>%
   summarise(qs = quantile(disp, c(0.25, 0.75)), prob = c(0.25, 0.75))
#> `summarise()` has grouped output by 'cyl'. You can override using the
#> `.groups` argument.
#> # A tibble: 6 × 3
#> # Groups:   cyl [3]
#>     cyl    qs  prob
#>   <dbl> <dbl> <dbl>
#> 1     4  78.8  0.25
#> 2     4 121.   0.75
#> 3     6 160    0.25
#> 4     6 196.   0.75
#> 5     8 302.   0.25
#> 6     8 390    0.75

# You use a data frame to create multiple columns so you can wrap
# this up into a function:
my_quantile <- function(x, probs) {
  tibble(x = quantile(x, probs), probs = probs)
}
mtcars %>%
  group_by(cyl) %>%
  summarise(my_quantile(disp, c(0.25, 0.75)))
#> `summarise()` has grouped output by 'cyl'. You can override using the
#> `.groups` argument.
#> # A tibble: 6 × 3
#> # Groups:   cyl [3]
#>     cyl     x probs
#>   <dbl> <dbl> <dbl>
#> 1     4  78.8  0.25
#> 2     4 121.   0.75
#> 3     6 160    0.25
#> 4     6 196.   0.75
#> 5     8 302.   0.25
#> 6     8 390    0.75

# Each summary call removes one grouping level (since that group
# is now just a single row)
mtcars %>%
  group_by(cyl, vs) %>%
  summarise(cyl_n = n()) %>%
  group_vars()
#> `summarise()` has grouped output by 'cyl'. You can override using the
#> `.groups` argument.
#> [1] "cyl"