Rate is not a probability
- model time? e.g., log T = Z/linear predictor –> censoring/ missing data problem. the out come time-to-event is not always observed due to censoring/competing risk.
- Model rate: the aggregated data: known the total number of years and the total number of events
- why rate works? the rate param captures the instantaneous occurrence of the outcome at any given time.(count of success = 每个瞬时时间点上的发生数量,乘上观测总时间)
- while risk param is the probability of an event occurring within a specific time period.
- risk-rate connection:
- In probability theory and statistics, the Poisson distribution is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time if these events occur with a known constant mean rate(?) and independently of the time since the last event.
- Poisson model
- The likelihood: D ~ Poisson(lambda*Y), where log(lambda) = something (the log-linear model), then parametrize log(lambda) in terms of alpha and beta. substitute the model into the PDF to get the likelihood.
2024-06-09更新
- 傻傻分不清到底哪个是constant –> rate is constant(instantaneous occurrence of the outcome at any given time). 我们把总时间Y划分为N个小bin,每个小bin为一个单位unit h, that is, Y = Nh. h足够小的时候,每一个小bin只有一个事件发生,则每个小bin都是一个Bern(pi),–> 每个小bin的expected number = pi(Bern dist property). mu = pi(risk) (1),
- 在pois dist,我们有 mu = lambda(rate)*h(2)
- 这样结合(1)和 (2),我们有了rate-risk connection: mu = pi(risk)=lambda(rate)h
- 然后我们就可以算survival probability啦:survival up to time T = (1 - risk)^N [T时间内一共有N个bin]= (1-lambda*h)^N –> approx. exp(-lambda*T)其中,lambda*T 是cumulative hazard. surv prob(T) = (1-lambda*h)^N = Aexp(-lambda*T)
Write the likelihood
写likelihood的时候,我们的结局变量outcome包含了 event status and time, 所以我们写的是每个人的一个joint pdf: prob of (event happened/censored at time t),并且是marginal的,并不是conditional。
RV 有哪些呢: (i)censoring dist; (2) latent event time dist;
但是我们的数据是observed data, 我们的模型是wrt latent event time的, 所以我们需要一些假设们来得到一些identification.
(A1): indep censoring :censoring time and latent event time indep given covariates
(A2):non-informative censoring: censoring not involve theta(param of interest),所以在likelihood里面可以把它当作constant,不用放进likelihood里边写出general case的likelihood之后呢,可以看到需要代入 lambda(t) (hazard rate) 和survival function(t), 这就非常的简单~~以pois为例子:lambda(t) = lambda(constant); S(t) 和 hazard function 是有一个公式联系起来的,S(T) = exp(-lambda(t)*T).带进去就行。
constant hazard model写likelihood可以用几种方式来写,可见cheatsheet
immortal time bias
KM survival curve calculation
一生总要计算一次的KM curve重要总结:
#1: 先写表格: 1)表格第一列为时间段,左闭右开,每个time interval 的下界为事件发生的时间点们,比如对于control arm,在observed time= 1.5的时候就没有事件发生,survival curve不会下降,就没有必要写time interval [1.25,1.5),而是直接写[1.25,2),左闭右开可以这样理解,比如treatment arm的第一个death发生在observed time=1, time interval则是[0,1),在这个区间里,我们依然认为这个人是at-risk,另外还有一种情况是有人在observed time = 1的时候删失了,那这个人还是at-risk,因为我们的区间是不包含1的,至少在time=1之前这个人是at-risk 所以用这样的区间就比较直观; 2) 然后censor(c),deathD(d),at-risk(r)人数,1-d/r, survival则可以计算为(1-d/r)的累乘
#2: 然后是画图,我们可以把删失用+号表示
#3: 找risk set的以后,我们也可以通过在follow up图上画一个vertical line,交点的个数就是# of pts at-risk at time t.