Sunday, 11 August 2013

R : create a subset of a data.frame by removing specific rows

I am trying to split a "mother" data.frame into three data.frames.
The mother data.frame, called dfrm, has several variables, including "id"
(identification), "time" (three time points), a numerical variable "Ht",
and a factor "fac" with 3 levels derived from Ht. I created 2 data.frames,
dfrm2 and dfrm3, using the ddply function, selecting the subjects that
have a given level of the "fac" variable AT EACH OF THE THREE TIME POINTS:
# 50 subjects, each measured at three time points
id <- rep(1:50, 3)
time <- factor(rep(c("day1", "day2", "day3"), c(50, 50, 50)),
               levels = c("day1", "day2", "day3"), ordered = TRUE)
Ht <- rnorm(150, mean = 30, sd = 3)
A <- rnorm(150, mean = 7, sd = 10)
# data.frame() keeps "time" as a factor; cbind() would reduce it to integer codes
df <- data.frame(id, time, Ht, A)
head(df)
# classify Ht into three levels
fac <- factor(cut(df$Ht, breaks = c(1, 30, 35, 100),
                  labels = c("<30%", "<35%", ">35%"), include.lowest = TRUE))
dfrm <- data.frame(df, fac)
library(plyr)
dfrm2 <- ddply(dfrm, "id", function(x) if(all(x$fac=="<30%")) x else NULL)
nrow(dfrm2)
[1] 18
dfrm3 <- ddply(dfrm, "id", function(x) if(all(x$fac=="<35%")) x else NULL)
nrow(dfrm3)
[1] 6
I would like to create a third data.frame containing all the rows that
were not selected in dfrm2 or dfrm3, but so far I have not been successful.
I think the idea would be to tell R to keep only the rows of the mother
dfrm whose "id" has not been selected yet. Can someone help me with this?
Thank you very much in advance! Regards, and good day to all.
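
One possible way to do the "remove by id" idea described above is a negated %in% test on the "id" column. What follows is only a minimal sketch under a few assumptions, not a definitive answer: set.seed() is added so the run is repeatable, the per-id selection is redone with base R's tapply() instead of ddply so the example needs no extra package, and the name dfrm4 for the third data.frame is hypothetical.

```r
# Rebuild a small version of the mother data.frame.
# set.seed() is an addition for reproducibility; it is not in the original post.
set.seed(1)
id   <- rep(1:50, 3)
time <- factor(rep(c("day1", "day2", "day3"), each = 50),
               levels = c("day1", "day2", "day3"), ordered = TRUE)
Ht   <- rnorm(150, mean = 30, sd = 3)
dfrm <- data.frame(id, time, Ht)
dfrm$fac <- cut(dfrm$Ht, breaks = c(1, 30, 35, 100),
                labels = c("<30%", "<35%", ">35%"), include.lowest = TRUE)

# Base-R stand-in for the ddply calls: for each id, TRUE when all of its
# rows have the given level of fac.
all_low <- tapply(dfrm$fac == "<30%", dfrm$id, all)
all_mid <- tapply(dfrm$fac == "<35%", dfrm$id, all)
ids2 <- as.integer(names(all_low)[all_low])
ids3 <- as.integer(names(all_mid)[all_mid])

dfrm2 <- dfrm[dfrm$id %in% ids2, ]
dfrm3 <- dfrm[dfrm$id %in% ids3, ]

# Third data.frame (hypothetical name dfrm4): every row whose id appears
# in neither dfrm2 nor dfrm3.
dfrm4 <- dfrm[!(dfrm$id %in% c(dfrm2$id, dfrm3$id)), ]

# The three pieces should add up to the mother data.frame.
nrow(dfrm2) + nrow(dfrm3) + nrow(dfrm4) == nrow(dfrm)
```

With dfrm2 and dfrm3 already built by ddply as in the question, only the dfrm4 line should be needed, since it relies on nothing but the "id" columns of the two existing subsets.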
