1 Introduction
It is of great importance in epidemiology and public health to identify and quantify causal effects of a factor on the risks of diseases and death. The ultimate aim of such research is to decrease such risks by modifying the factors that can be modified. Factors that can be modified, although not always easily and completely, may be treatments, exposures, lifestyle. Modification will be realized through an intervention. It will be possible to compute the effects of interventions if, and only if, the causal effects have been correctly estimated. Having understood the difficulty of establishing it, epidemiologist have long be reluctant to speak of causality, although Hill (1965) stated a list of common sense criteria which are still useful. More recently, statisticians developed formalisms for causality; two broad approaches can be distinguished: the “potential outcome” and the “dynamic” approaches.
The use of potential outcomes has first been proposed by Jerzy Neyman in 1923 (SplawaNeyman, 1990) but was formulated in modern notation much later by Rubin (1974). Since then, causal inference based on potential outcomes have been developed in a series of papers and has become the dominant school of causal inference in biostatistics and has also been influential in other fields such as econometrics (Heckman, 2005). This theory was reviewed by Holland (1986) and Rubin (2005) among others. However, the potential outcome approach has been criticized (Dawid, 2000; Commenges, 2019a).
Another approach relies on developing dynamical models. This is in line with the definition of Granger (1969) working with time series and has been given a more powerful formalism by the works of Schweder (1970) and Aalen (1989), further developed by Arjas and Parner (2004) and Didelez (2008). Of course, the use of a dynamical model is not sufficient for causal interpretation but it is possible to formalize the assumption needed, particularly through the concept of “system” elaborated in Commenges and GégoutPetit (2009), in Chapter 9 of Commenges and JacqminGadda (2015) and in Commenges (2019b); and this was called “the stochastic system approach to causality”.
One issue that has retained much attention recently is that the factor of interest, that will be called “treatment” in the following, may be dynamic and maybe influenced by factors linked to the outcome. Robins and coworkers used the potential outcomes approach to these complex problems (Robins et al., 2000; Robins and Hernán, 2009). It was shown that these problems can also be tackled through the dynamic approach (Arjas, 2012; Prague et al., 2017). Finally, when the causal effect has been estimated, one can compute the effects of interventions, and one can further try to find the best interventions. These interventions can be personalized and adaptive, leading to the search of optimal treatment regime. This topic has attracted much attention recently (Murphy, 2003; Chakraborty and Murphy, 2014; Saarela et al., 2015; Shen et al., 2017; Hager et al., 2018).
One aim of this paper is to show that within the stochastic system approach to causality, the effects of interventions can be estimated from observational studies without resorting to counterfactuals or potential outcomes; this can be done either in a direct way or through the use of marginal structural models. Thus the effects of any complex interventions can be estimated in a way similar to Pearl (2000)
but with the important difference that our approach is based on stochastic processes rather than random variables. The other aim is to show that optimal and suboptimal strategies can be designed and estimated. Our approach incorporates realistic modeling of important features, acknowledging that biological processes live in continuous time while observations are made at discrete times and with errors; see
Aalen et al. (2016) for the importance of continuous time.In Section 2 we recall the main features of the stochastic system approach to causality and we introduce a lead example that will be used to illustrate the theory. In Section 3, we define the effects of interventions, we see how to compute them and in Section 4 we examine some ways of designing strategies. Since finding optimal strategies is in general impractical, we propose parametric families of strategies. Then in Section 5, we examine the inference issue, showing that we can estimate the physical law directly and then compute the effect of any intervention. Once we can compute effects for any strategy, we can search for optimal strategies. In Section 6, we present a simulation study illustrating the choice of a strategy for containing the HIV load below a certain level while limiting the treatment burden. We then conclude.
2 The context
2.1 Recall of the Stochastic System Approach to Causality
In the framework of the stochastic system approach to causality, we consider a (voluntarily simple) system , where are counting processes or diffusion processes, is the outcome of interest, a treatment, an observed potential confounding factor; is really a confounding factor in an observational study; note that that the property of “confounding” depends on both the system and the law (Commenges, 2019b). Influences between processes are defined via “local independence” (Aalen, 1987), direct influence being the contrary of local independence. Given a system represented by a multivariate stochastic process which may include both counting and diffusion processes, a criterion of local independence is defined in terms of measurability of processes involved in the DoobMeyer representation. Commenges and GégoutPetit (2009) denoted the local independence by WCLI (weak local conditional independence) because they also defined a criterion of strong local independence (SCLI); when WCLI does not hold, there is direct influence, when SCLI does not hold while WCLI holds, there is indirect influence. In short, if a component of the stochastic process does not appear in the compensator of the DoobMeyer decomposition of we say that is WCLI of . A system is called “perfect” for if we cannot find a process which influences in the augmented system . A system is “NUC” (no unmeasured confounder) for the effect of on (in short for ) if there does not exist a process such that and ; see Commenges (2019b).
In this paper, we will assume that both systems and are NUC for . These assumptions allow interpreting influences as causal influences; in , will be treated as a random effect (see Section 2.2.2). If is perfect, conditional and marginal effects of on can be estimated (Commenges and GégoutPetit, 2009); if is NUC for , conditional effect of on with respect to but marginal with respect to other factors influencing , can be estimated (Commenges and JacqminGadda, 2015; Commenges, 2019b). To the system is associated the filtration , where is the sigmafield generated by , noted . We use the notation , and similarly for the other processes. Similarly, to the system is associated the filtration .
2.2 Physical law and treatment regime
2.2.1 Physical law
As in Commenges and GégoutPetit (2015), it is important to define the “physical law”, allowing for heterogeneity between subjects; here it is necessary to restore the subscript “i”. We assume that the DoobMeyer decomposition of in can be written:
(1) 
where is a function which does not depend on and the martingales are orthogonal. The conjunction of and the law of the martingales define the physical law of .
2.2.2 Observation
We must acknowledge that generally we do not completely observe for all . For instance if represents an event, its observation may be rightcensored and if is a quantitative marker, the observation is made at discrete times and with a measurement error. We denote by the observation of up to time time . Acknowledging this fact is important for both inference and for defining the treatment regime. We assume that both and are exactly observed. Thus the observation up to time is the sigmafield . The family of these sigmafields is the filtration . is not observed and can be considered as a random effect.
A more complex observation scheme arise if the visittimes are also random; their law may be described by a counting process (jumping at visit times), the compensator of which could depend on the past of . If we restrict to cases where it depends only on past values of , this mechanism is ignorable for inference (Commenges, 2019b). Adapting visittimes may lead to efficient strategies (Villain et al., 2019), but we do not develop this possibility in this paper.
2.2.3 Treatment regime
Often, there are visits at which is observed and the treatment can be changed just after the observation and is unchanged between visits. So, is a process which may change at times and which may be influenced by through the observations . Rather than writing a DoobMeyer decomposition in for the process , we write the distribution of .Taking the case where the treatment takes a binary value we have:
(2) 
The treatment regime is defined by the function . Note that it may depend on . Treatment regimes may vary between subjects because different doctors may have applied different protocols. In observational studies, the treatment is given preferentially when the observed marker indicates a degradation of the health of the patient: if we wish to estimate the effect of by naive methods we are confronted to “dynamic confounding”, or “indication bias”. Moreover, we do not generally know the treatment regime. On the other hand, we may wish to find an optimal treatment regime (or strategy) for future patients; in that case, there is no reason that depend on . That is, the strategy specified by does not depend on although the decision depends on all the information we have on subject at time .
2.3 Lead toy example
We shall use a lead toy example to make all the developments more concrete. Let be a quantitative marker which may represent the log of the viral load of HIV infected subjects, may represent infection by intravenous druguse and an antiretroviral treatment. Assume that, under the true probability, has a DoobMeyer decomposition in which in differential form is:
(3) 
with a normality assumption for , and where are true values; should be negative for the treatment to be efficient, and the are standard Brownian motions. Heterogeneity is explained by the variable but there can also be unexplained heterogeneity that can be represented by random effects on the baseline slope () and on the initial condition .
Assuming for simplicity that the number of visittimes is the same for all subjects, the observation are:
(4) 
with the
independent normal variables with zero mean and variance
.In order to generate data for a simulation study (see Section 6), we will have to specify the treatment regime. We will consider the case where the probability of being treated at depends on the observation and on the factor :
(5) 
If takes a large value, the treatment once given cannot be removed, the case considered by Hernán et al. (2002). There are, however, active research for the possibility of intermittent treatment (Lau et al., 2019). The influence graph of this system in the observed probability measure (Commenges and GégoutPetit, 2009) is depicted in Figure 1 (a).
2.3.1 Interventions, experiments, strategies
How can we use the knowledge that we can obtain about the physical law of ? The scientific context is that we assume that the compensator of represents a “physical law” which is stable (cannot be changed), while the treatment regime, that is the law of , can be manipulated. Thus, we assume that we can devise strategies which define a probability measure for which the compensator of is unchanged but the law of can be controlled, subject to the constraint that must remain adapted to the filtration . That is, the dynamics of may only depend on past observed values of (that is of past values of ). Thus, assuming the observations times fixed, a strategy is specified by functions such that:
(6) 
In most cases there is not need to make depend on . The terms “intervention”, “policy”, “strategy” are often used for naming a control which may depend on past information. In this paper, we use the general term of “intervention” to mean any type of control of the law of , and we reserve the term “strategy” for adaptive interventions, that is, when the law of is made dependent of the past observations of and . Thus the influence graph under a strategy is generally the same as that in an observational study; the difference is that, for a strategy, the dynamics of is known.
A simple randomized experiment is a particular case of intervention, where the dynamics of is made independent of both and (the simplest one is when takes a randomly chosen constant value); in that case the influence graph is as in Figure 1b, analogous to the application of Pearl’s “dooperator” (Pearl, 2000). We assume that we cannot manipulate which has its own independent dynamics; however we could still consider situations where would have a different dynamics (or a different distribution in the case where is a simple random variable), for instance if we apply the model to a different population.
3 Risk functions
In this section (Section 3), we drop the index for simplicity of notation, and we work with rather than .
3.1 Assessing whether there is a causal effect
The first conclusion that we can derive from the knowledge of the compensator of is that if , then influences . In our example this can of course be seen easily by looking at Equation (3). If , influences (), and if the system is NUC for , this influence is causal. This is a conclusion of explanatory nature.
3.2 Definition of effects as risk functions
3.2.1 Generic effects
If there is a causal influence of on we wish to quantify it. The effect of a treatment regime can be defined as a contrast between risk functions or utility functions of this regime and a reference regime. A risk function is the expectation of a loss function; minus a loss function is a utility function and its expectation is the expected utility; the two formulations are equivalent; here we adopt the formulation in terms of risk function. For instance, if our loss function is , taking the difference of risk functions as the contrast, we can define an effect as
where in intervention “int1” the treatment is given during the whole period while in intervention “int0”, the treatment is never given. This kind of effect is often estimated in simple randomized trials where there is a “treatment arm” and a “placebo arm”. However, a much more general definition is possible, and this will be useful for designing strategies (see Section 4).
3.2.2 Effects of strategies: risk functions
A general loss function is : in general , the loss function reflects a compromise between improving the trajectory of and reducing the burden of the therapeutic protocol (treatment dose and number of visits). Risk functions of an intervention are expectations of the loss function in the probability defined by the intervention “int”.

The marginal risk is ;

The conditional risks:
It is clear that . Note that even if the loss function does not depend on , marginal and conditional risks are in general different. Also, optimizing for all leads to optimizing .
Examples of simple loss functions are and , in which case the risk is and , respectively. More complicated loss functions could include values of the whole trajectory of and a cost of treatment (both financial and related to toxicity and quality of life).
A rather general additive risk function for the marker and the treatment is:
(7) 
Particular cases are obtained by taking , and or :
(8) 
or
(9) 
Two interventions can be compared by contrasting their risk functions; it is of course interesting to find the best strategies (see Section 4).
3.3 Examples
In our toy example, because of the linearity of the compensator in (3), the marginal and conditional (additive) generic effects are the same. For instance the additive contrast (3.2.1) is . Beware that this does not mean that we can estimate these effects with a marginal regression model.
On the other hand, conditional and marginal effects are in general different for strategies. Consider the viral load containment strategy. It can be seen that , where
is the cdf of the standard normal distribution. Thus the
, and hence the risk function, are nonlinear functions of the random effect and of , leading to different conditional and marginal effects. Note also that the adaptive strategies give in general different treatment regimes for different subjects, leading to “personalized medicine”; this will be developed in Section 5.3.4 Designing strategies
4.1 Optimal strategies
Knowledge of the physical law of can be used to design adaptive interventions, that is, strategies. Once the risk function is chosen one may try to find the best strategy, that is one which minimizes this risk. This is a special case of optimal control. Optimal control has been well developed, in particular in automatics (Bertsekas et al., 1995); an application for optimizing antiretroviral treatment has been given by Kirschner et al. (1997).
However, there are several difficulties for a realistic application of optimal control. The first difficulty is that the treatment strategies are restricted in practice. One of the most important restriction is that the treatment can be changed only at discrete visittimes. The second difficulty is that optimal control may be computationally intractable if there are an even moderately large number of visittimes. A third difficulty is that the “physical law” is unknown in practice, so it must be updated at each visit time: see Section 5. Thus, optimal control is most often impractical in this context.
4.2 Optimal parametric strategies
A possibility is to restrict the space of strategies to parametric forms. For instance a parametric family of strategies is:
(10) 
Note that strategies do not need to be random and deterministic strategies make more sense for clinical use. A parametric family of deterministic strategies is:
(11) 
If we are able to compute for all strategies belonging to a parametric class, we can use a minimization algorithm to choose the best strategy in that class.
4.3 Predictionbased strategies
A practical approach to adaptive regimes is to possibly change the treatment at visittimes in a way to optimize a risk function defined at a not too far future and not depending on future treatment, and updating the risk function at each visittime (rather than defining it once for all). In our example, we can compute the distribution of given which allows computing any risk function depending on this distribution and of the treatment given at . In our example where is the viral load, a possible risk function is based on , if it is admitted that under a certain level, the infection is controlled; for instance one speaks of virological failure for HIV infected patients if the viral load is above the detection limit of 50 copies per ml. The rule could then be:
(13) 
where is a small number, say : in words, if it is not very likely that without treatment the viral load will remain under the threshold , then give the treatment. We shall call this strategy “the viral load containment strategy”.
A more realistic example was given in Prague et al. (2012) where the dose of antiviral treatment can be changed at each visittime to optimize a risk function defined at the next visittime; see Section 5.
Another type of strategies is to optimize the next visittime. An example is given in the work of Villain et al. (2019) where the next visit time for IL7 injection is decided as a function of previous observations. More generally, we can define both the updating of the treatment and the next visit time.
4.3.1 Combining predictionbased and parametric strategies
It is possible to include predictions into parametric families of strategies, leading possibly to better strategies. For instance we could consider the deterministic parametric strategy obtained from Equation (13) by parameterizing :
(14) 
4.4 Computation of risk functions for a parametric strategy
For a nonrandom treatment regime the risk is easy to compute. For strategies, however, it is difficult, or impossible, to analytically solve the stochastic differential equations defined by the system, the observation and the strategy. The risk can still be computed by simulation. The conditional risk must be computed for a particular patient for whom we observe the value of , say ; here again we drop the subscript . The strategy is specified by a parameter such as in Equations (12) or (14). So, at time we observe and , and we wish to compute the conditional risk . For system and observation equations (4) and (5) the algorithm can be as follows: For to :

at time we apply the strategy to choose the value of ;

Conditional on has the distribution ; draw a value from this distribution;

generate a value for by drawing from and go to step 1
This is done for instance times and then one can compute an approximate version of risk (9) by . We call this algorithm of simulation for known parameters the SKP algorithm.
5 Inference
5.1 Estimation of the law
If we have a wellspecified model for the law of , and if we have enough observations from an observational study, we can estimate this law, for instance by maximum likelihood. In particular we can estimate the compensator of , , in the filtration . This can be done without modeling the law of and if these processes are observed (Commenges and GégoutPetit, 2015). Then the estimates of the densities can be plugged in the formulas for the risk to obtain estimates of the risks of different interventions. A Bayesian approach would allow taking into account the uncertainty about the parameters by integrating them out.
As an example, dynamical models have been proposed by Prague et al. (2017) and marginal effects computed by simulation.
5.2 Observe the marginal law directly by experimentation
The influence graph of a simple experiment is displayed in Figure 1 (b); it is similar to the “do operator” of Pearl (2000) and this approach has been formalized by Arjas (2012). In this experimental setting, the law of the three variables is given by a probability measure ; this is a particular type of intervention where does not influence , that is . If we have observations from an experiment for different values of , we can estimate the marginal distribution of given , and hence that of , given . The simplest experiment uses fixed values of : , and subjects have value , subjects have value ; the values are attributed by randomization, which ensures that they are not influenced by any confounding factor. However, nothing prevents doing an experiment for comparing two (or more) adaptive strategies; this allows directly estimating the marginal risk for the interventions tried.
The great advantage of experiments is that we observe directly the marginal law of given . Thus we are free from the NUC assumption. However, there are several important limitations (feasibility, selected subjects, short followup, noncompliance), which implies that we still need to analyze observational data. The other advantage is that it is not necessary to estimate the physical law if we are only interested in the marginal risk.
5.3 Estimate the physical law then make estimation for any intervention
In our example, assuming that the true values for subject are given by Equation (3), a well specified model for the observations may be:
(15) 
with , , , , , and with and
normal random effects with standard deviations
and , respectively. Parameters for this mixed linear model can be estimated by maximum likelihood from observational data, as is conventional. Note that the estimation uses a probability conditional on so that there is no need to model the dynamics of . Once we have estimated the parameters of the physical law of , we can estimate the conditional and marginal effects of any intervention.5.4 Optimizing strategies when the parameters are unknown
The simplest way is to plugin the maximum likelihood estimates into the computation of the risk by the SKP algorithm.
A more elaborate approach is to take the uncertainty about parameters into account via a Bayesian approach. This has been done for predictionbased strategies by Prague et al. (2012) and Villain et al. (2019). A way to extend the optimal parametric strategies so as to take parameter uncertainty into account is to include this uncertainty in the computation of the risk. If there were no random effect the algorithm can be a variation of the SKP algorithm where we include in the simulation a drawing of the parameter values from their posterior distribution; we call it simulation from posterior distribution of parameters (SPDP).
Moreover, there is the possibility of taking into account the cumulative information that we collect on the random parameters. The threshold that specifies the decision rule should then change at each visit time where more information is available. As a consequence we cannot estimate the risk for each fixed value of and then optimize once for all, but we have to optimize at each visit time. This leads not simply to a change of algorithm but to a change of strategy. We call this strategy “dynamic threshold decision rule (DTDR).
The algorithm can be as follows: For to :

compute the posterior distribution of the random parameters based on observations up to ;

at time optimize using the SPDP algorithm; let the optimum value;

apply decision rule with the threshold to define ;

observe and go to step 1;
The posterior distribution can be computed by an MCMC algorithm as in Prague et al. (2012) or by a Laplace approximation (Rue et al., 2009)
or simply by using the asymptotic distribution of the penalized maximum likelihood estimators which is justified by the Bernsteinvon Mises Theorem
(van der Vaart et al., 2004) and was used by Prague et al. (2013).6 Illustration
6.1 Simulated data
We simulated observations of samples of N subjects from a system obeying a dynamics similar to Equation (3) except that we had both a continuous variable and a discrete variable :
(16) 
with , and . and were, respectively, continuous and binary observed covariates. We set , , , , and . The parameters and depend on the setting of simulation and on whether we assume the individual parameters known or estimated. The observation equation was
(17) 
with , and . A wellspecified model for the observations is similar to Equation (15), except for the addition of the variable . Figure 2 shows the data generated for subjects under three scenarios: treatment never initiated, treatment initiated when the observed Z is higher than 0, treatment initiated right after baseline for all patients.
We simulated data as they could be found in an observational study of HIV infected patient. In real life, treatment is initiated depending on multiple factors including patients covariates and observed biomarker values, and once initiated, the treatment is most of the time not interrupted. For the simulation, we used a logistic regression such that for patient
and visit numbers , . The visittimes were taken as so that the the last visit time was . This model leads to 66.7% of patientstime under treatment.6.2 Estimation of parameters from simulated data
The parameters, including the causal parameter , can be estimated from a real data set by maximum likelihood (ML). We estimated the parameters of our simulated model using the R package covBM which allows to find ML estimates for a mixed linear model with Brownian motion (Stirrup, 2017) using the function lmeBM. Code is available in supplementary material. We performed 1000 replicated estimations on datasets including N=1000 patients. Table 1 summarizes the results and shows that there is essentially no bias in estimation of the parameters of the model, empirical and estimated standard deviation are overall similar and coverage are kept near their nominal value of 95% (in accordance with the ML theory for well specified models). Overall, we confirm that the causal parameter can be estimated by maximum likelihood without needing a treatment attribution model assuming model for (16) and (17) is well specified.
Parameters  Bias  Sd.  Bootstrap Sd.  coverage 

0.0015  0.0352  0.0357  94.1%  
0.0012  0.0490  0.0500  94.8%  
0.0006  0.0266  0.0272  94.8%  
0.0003  0.0547  0.0559  94.9%  
0.0011  0.0486  0.0539  94.5%  
0.0004    0.0339  94.3%  
0.0003    0.0373  94.9%  
0.0008    0.0286  95.0%  
0.0045    0.0466  95.1% 
6.3 Simulation of optimal strategy for a given individual assuming that his parameters are known
In this section we examine strategies where treatment can be temporarily interrupted, and we evaluate them by an approximate version of the risk function (9) with . We will examine parametric strategies given by Equation (12). The parameters of the model are supposed known: here, we set the values to the one used for the simulations. We investigate the optimal strategy for 4 different patients: , , , . We assumed the parameters for these patients perfectly known so that . Table 2 summarizes the results for multiple values of . We observe that the larger , the higher the threshold for treatment initiation. The overall cost is an increasing function of . Finally, we observe as expected that for any given the overall cost is higher for individual with steeper increase in their biomarker , see comparison for patient versus . The strategy is such that for lower the optimal strategy is to treat a lot of patient. When the treatment burden gets higher, i.e. is larger, the strategy progressively tends to “never treat”.
Cost Y  Cost Trt  % Treated  Cost Y  Cost Trt  % Treated  
0.000  9.058  0.015  0.000  0.015  0.558  11.565  0.052  0.000  0.058  0.758 
0.100  4.040  0.016  0.038  0.054  0.422  5.681  0.053  0.062  0.128  0.624 
0.200  3.131  0.019  0.072  0.090  0.397  4.304  0.058  0.118  0.195  0.590 
0.300  2.715  0.021  0.104  0.126  0.387  3.934  0.060  0.174  0.261  0.581 
0.400  2.708  0.022  0.139  0.161  0.387  3.637  0.063  0.230  0.325  0.574 
0.500  2.362  0.024  0.169  0.193  0.376  3.584  0.064  0.286  0.389  0.572 
0.600  2.056  0.028  0.198  0.226  0.367  3.590  0.063  0.344  0.452  0.573 
0.700  1.875  0.032  0.227  0.259  0.361  3.384  0.066  0.397  0.515  0.568 
0.800  1.560  0.039  0.254  0.292  0.352  3.037  0.072  0.447  0.576  0.559 
0.900  1.405  0.043  0.281  0.324  0.347  3.039  0.072  0.503  0.639  0.559 
1.000  1.273  0.047  0.308  0.355  0.342  2.703  0.079  0.552  0.701  0.552 
1.500  1.148  0.051  0.458  0.509  0.339  38.308  0.737  0.000  0.819  0.000 
3.000  31.284  0.546  0.000  0.546  0.000  38.308  0.737  0.000  0.819  0.000 
0.000  7.508  0.010  0.000  0.010  0.450  11.565  0.037  0.000  0.037  0.781 
0.100  3.313  0.012  0.030  0.042  0.336  4.154  0.040  0.052  0.092  0.583 
0.200  3.015  0.013  0.059  0.072  0.328  3.561  0.043  0.102  0.145  0.566 
0.300  2.762  0.015  0.087  0.101  0.321  3.427  0.043  0.152  0.195  0.562 
0.400  2.175  0.018  0.110  0.129  0.307  3.306  0.044  0.201  0.246  0.560 
0.500  1.972  0.020  0.135  0.155  0.301  3.388  0.044  0.253  0.296  0.561 
0.600  1.903  0.021  0.161  0.182  0.299  3.036  0.048  0.298  0.346  0.551 
0.700  1.712  0.023  0.186  0.209  0.295  2.973  0.050  0.346  0.395  0.549 
0.800  1.434  0.028  0.207  0.235  0.288  2.812  0.052  0.393  0.445  0.546 
0  0.260  0.272  2.548  0.057  0.436  0.493  0.538  
1.000  0.872  0.040  0.244  0.284  0.271  2.517  0.057  0.484  0.541  0.538 
1.500  0.264  0.064  0.343  0.407  0.254  38.433  0.697  0.000  0.697  0.000 
3.000  31.284  0.446  0.000  0.446  0.000  38.433  0.697  0.000  0.697  0.000 
6.4 Simulation of optimal strategy for a given individual assuming that his individual parameters are unknown
In this section we assume that we know the population parameters of the model, but that the parameters with random effects are only known with uncertainty; that is we know that and come from a normal distribution with known expectation and standard deviations and , respectively. Thus this is an illustration of the SPDP algorithm. Table 3 summarizes the results for multiple values of . It is very similar to results obtained with the SKP algorithm presented in table 2. Figure 3 shows for patient 4 the risk function for multiple values of according to the value of . We see that the overall cost to optimize has a local and a global minimum for average values of . As expected, the cost in is an increasing function of the threshold and the cost in treatment and the percentage of individual under treatment is a decreasing function of the threshold . Figure 4 shows for the four patients (, , , ) the threshold value for treatment initiation, the multiple risks and the percentage of individual under treatment, according to various values of omega. We notice that the overall risk is a continuous function of suggesting that there is a threshold in from which it is better not to treat at all rather. We find again that optimal is higher for individuals with steeper increase in their biomarker , see comparison for patient versus .
Cost Y  Cost Trt  % Treated  Cost Y  Cost Trt  % Treated  
0.000  11.565  0.023  0.000  0.023  0.621  9.058  0.070  0.000  0.070  0.770 
0.100  3.900  0.026  0.037  0.063  0.416  5.374  0.071  0.061  0.132  0.675 
0.200  3.573  0.027  0.073  0.100  0.406  4.085  0.076  0.115  0.191  0.640 
0.300  2.918  0.031  0.105  0.136  0.389  3.582  0.082  0.169  0.251  0.625 
0.400  2.872  0.031  0.139  0.170  0.387  3.901  0.077  0.229  0.306  0.635 
0.500  2.371  0.037  0.168  0.205  0.373  3.765  0.079  0.284  0.363  0.630 
0.600  1.992  0.041  0.196  0.237  0.363  3.566  0.082  0.337  0.420  0.625 
0.700  1.997  0.041  0.229  0.270  0.363  3.563  0.082  0.394  0.476  0.625 
0.800  1.917  0.043  0.260  0.303  0.361  2.568  0.098  0.431  0.530  0.599 
0.900  1.559  0.050  0.284  0.334  0.351  2.578  0.098  0.485  0.584  0.599 
1.000  1.478  0.052  0.314  0.366  0.349  2.486  0.101  0.537  0.638  0.597 
1.500  35.042  0.516  0.000  0.516  0.000  42.851  0.717  0.000  0.717  0.000 
3.000  35.042  0.516  0.000  0.516  0.000  42.851  0.717  0.000  0.717  0.000 
0.000  7.508  0.016  0.000  0.016  0.452  11.565  0.049  0.000  0.049  0.770 
0.100  3.260  0.018  0.030  0.049  0.335  4.721  0.051  0.053  0.105  0.591 
0.200  2.861  0.020  0.059  0.079  0.325  3.950  0.054  0.103  0.157  0.571 
0.300  2.693  0.021  0.087  0.108  0.322  3.646  0.056  0.152  0.208  0.563 
0.400  2.263  0.025  0.112  0.136  0.310  3.038  0.062  0.196  0.258  0.545 
0.500  2.052  0.027  0.137  0.164  0.304  3.152  0.061  0.247  0.307  0.549 
0.600  1.966  0.028  0.163  0.191  0.301  2.848  0.066  0.291  0.357  0.539 
0.700  1.291  0.040  0.178  0.218  0.283  2.844  0.066  0.339  0.405  0.539 
0.800  1.296  0.040  0.204  0.243  0.283  2.757  0.068  0.386  0.454  0.537 
0.900  1.084  0.044  0.224  0.268  0.277  2.417  0.074  0.427  0.502  0.527 
1.000  1.087  0.044  0.249  0.293  0.277  2.419  0.074  0.475  0.549  0.527 
1.500  0.652  0.054  0.360  0.414  0.267  38.035  0.672  0.000  0.672  0.000 
3.000  31.284  0.437  0.000  0.437  0.000  38.035  0.672  0.000  0.673  0.000 
6.5 Adaptive optimal
Now that we validated the properties of both the estimation and the optimal control strategy, it is possible to combine both so that we propose a strategy as described in the DTDR strategy. In our toyexample example it is easy to compute the likelihood of , which can be used for computing the posterior distribution of the parameters or for approximating it. We note that being obtained by summing normal variables has itself a normal distribution. So, we have only to compute its expectation and variance matrix. The expectations of the (the components of ) are easily computed from Equation (15). The computation of the variance (given the parameters) is also easy because we identify independent variables which are and the , and so, the variances are obtained by summing the variances of these variables, that is: ; the covariance between and comes from the covariance of the Brownian which is .
It is possible to neglect the information brought by the incoming observation of a particular subject on the fixed parameters , so that we can update only the distribution of the random parameters, . In that case, we can write . The priors for and , denoted and respectively, come from the observation of a sample, The likelihood of observation of a subject up to visit , involves but we neglect the variability due to the uncertainty of in this likelihood and make the computation at . The expectation of can be written so that we have ; thanks to the linearity in of the expectation, the posterior of
is normal (conjugate prior), so that we have just to examine the argument of the exponential part of the density to determine its expectation and variance which specify the posterior.
Applying the formulas of Appendix 8.2 to (with ) we find the expectation and variance of the posterior of :
7 Conclusion
We have shown how effects of simple and adaptive interventions could be defined and computed within the stochastic system approach to causality. More over we have shown that it is possible to design optimal parametric strategies. This was illustrated in a simulation study showing that the physical law could be learned from an observational study and that it would be possible to control an intermittent antiretroviral treatment in an adaptive way, based on a dynamical model. The simulation for the adaptive optimal strategy remains to do.
References
 Aalen et al. (2016) Aalen, O., K. Røysland, J. Gran, R. Kouyos, and T. Lange (2016). Can we believe the DAGs? A comment on the relationship between causal DAGs and mechanisms. Statistical methods in medical research 25(5), 2294–2314.
 Aalen (1987) Aalen, O. O. (1987). Dynamic modelling and causality. Scandinavian Actuarial Journal (34), 177–190.

Aalen (1989)
Aalen, O. O. (1989).
A linear regression model for the analysis of life times.
Statistics in medicine 8(8), 907–925.  Arjas (2012) Arjas, E. (2012). Causal inference from observational data: A Bayesian predictive approach. In C. Berzuini, P. Dawid, and L. Bernadinelli (Eds.), Causality: Statistical perspectives and applications. Wiley.
 Arjas and Parner (2004) Arjas, E. and J. Parner (2004). Causal reasoning from longitudinal data. Scandinavian Journal of Statistics 31(2), 171–187.
 Bertsekas et al. (1995) Bertsekas, D. P., D. P. Bertsekas, D. P. Bertsekas, and D. P. Bertsekas (1995). Dynamic programming and optimal control, Volume 1. Athena scientific Belmont, MA.
 Chakraborty and Murphy (2014) Chakraborty, B. and S. A. Murphy (2014). Dynamic treatment regimes. Annual review of statistics and its application 1, 447–464.
 Commenges (2019a) Commenges, D. (2019a). Causality without potential outcomes and the dynamic approach. arXiv preprint arXiv:1905.01195.
 Commenges (2019b) Commenges, D. (2019b). Dealing with death when studying disease or physiological marker: the stochastic system approach to causality. Lifetime Data Analysis 25, 381–405.
 Commenges and GégoutPetit (2009) Commenges, D. and A. GégoutPetit (2009, June). A general dynamical statistical model with causal interpretation. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 71(3), 719–736.
 Commenges and GégoutPetit (2015) Commenges, D. and A. GégoutPetit (2015). The stochastic system approach for estimating dynamic treatments effect. Lifetime data analysis 21(4), 561–578.
 Commenges and JacqminGadda (2015) Commenges, D. and H. JacqminGadda (2015). Dynamical Biostatistical Models, Volume 86. CRC Press.
 Dawid (2000) Dawid, A. P. (2000). Causal inference without counterfactuals. Journal of the American Statistical Association 95(188), 407–424.
 Didelez (2008) Didelez, V. (2008, January). Graphical models for marked point processes based on local independence. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 70(1), 245–264.
 Granger (1969) Granger, C. W. J. (1969, August). Investigating causal relations by econometric models and crossspectral methods. Econometrica 37(3), 424.
 Hager et al. (2018) Hager, R., A. A. Tsiatis, and M. Davidian (2018). Optimal twostage dynamic treatment regimes from a classification perspective with censored survival data. Biometrics 74(4), 1180–1192.
 Heckman (2005) Heckman, J. J. (2005). The scientific model of causality. Sociological methodology 35(1), 1–97.
 Hernán et al. (2002) Hernán, M. A., B. A. Brumback, and J. M. Robins (2002, June). Estimating the causal effect of zidovudine on CD4 count with a marginal structural model for repeated measures. Statistics in Medicine 21(12), 1689–709.
 Hill (1965) Hill, B. (1965). The Environment and Disease: Association or Causation? Proceedings of the Royal Society of Medicine 58, 295–300.
 Holland (1986) Holland, P. W. (1986). Statistics and causal inference. Journal of the American statistical Association 81(396), 945–960.
 Kirschner et al. (1997) Kirschner, D., S. Lenhart, and S. Serbin (1997). Optimal control of the chemotherapy of HIV. Journal of mathematical biology 35(7), 775–792.
 Lau et al. (2019) Lau, J. S., M. Z. Smith, S. R. Lewin, and J. H. McMahon (2019). Clinical trials of antiretroviral treatment interruption in hivinfected individuals. Aids 33(5), 773–791.
 Murphy (2003) Murphy, S. A. (2003). Optimal dynamic treatment regimes. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 65(2), 331–355.
 Pearl (2000) Pearl, J. (2000). Causality: Models, reasoning, and inference. Cambridge Univ Press.
 Prague et al. (2012) Prague, M., D. Commenges, J. Drylewicz, and R. Thiébaut (2012). Treatment monitoring of HIVinfected patients based on mechanistic models. Biometrics 68, 902–911.
 Prague et al. (2017) Prague, M., D. Commenges, J. M. Gran, B. Ledergerber, J. Young, H. Furrer, and R. Thiébaut (2017). Dynamic models for estimating the effect of HAART on CD4 in observational studies: Application to the Aquitaine Cohort and the Swiss HIV Cohort Study. Biometrics 73, 294–304.

Prague et al. (2013)
Prague, M., D. Commenges, J. Guedj, J. Drylewicz, and R. Thiébaut (2013).
NIMROD: A program for inference via normal approximation of the posterior in models with random effects based on ordinary differential equations.
Computer Methods and Programs in Biomedicine 111, 447 – 458.  Robins and Hernán (2009) Robins, J. M. and M. A. Hernán (2009). Estimation of the causal effects of timevarying exposures. Longitudinal data analysis, 553–599.
 Robins et al. (2000) Robins, J. M., M. A. Hernán, and B. Brumback (2000, September). Marginal structural models and causal inference in epidemiology. Epidemiology (Cambridge, Mass.) 11(5), 550–60.
 Rubin (1974) Rubin, D. B. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of educational Psychology 66(5), 688.
 Rubin (2005) Rubin, D. B. (2005). Causal inference using potential outcomes. Journal of the American Statistical Association 100(469), 322–331.

Rue
et al. (2009)
Rue, H., S. Martino, and N. Chopin (2009).
Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations.
Journal of the royal statistical society: Series B (statistical methodology) 71(2), 319–392.  Saarela et al. (2015) Saarela, O., E. Arjas, D. A. Stephens, and E. E. Moodie (2015). Predictive bayesian inference and dynamic treatment regimes. Biometrical Journal 57(6), 941–958.
 Schweder (1970) Schweder, T. (1970). Composable markov processes. Journal of applied probability 7(2), 400–410.
 Shen et al. (2017) Shen, J., L. Wang, and J. M. Taylor (2017). Estimation of the optimal regime in treatment of prostate cancer recurrence from observational data using flexible weighting models. Biometrics 73(2), 635–645.

SplawaNeyman (1990)
SplawaNeyman, J. (1990).
On the application of probability theory to agricultural experiments. Essay on principles. Section 9.
Statistical Science 5(4), 472–480.  Stirrup (2017) Stirrup, O. (2017). CovBM: incorporating Brownian motion components into ’nlme’ models. CRAN vignette.
 van der Vaart et al. (2004) van der Vaart, H., D. S. Postma, W. Timens, and N. H. T. ten Hacken (2004, August). Acute effects of cigarette smoke on inflammation and oxidative stress: a review. Thorax 59(8), 713–21.
 Villain et al. (2019) Villain, L., D. Commenges, C. Pasin, M. Prague, and R. Thiébaut (2019). Adaptive protocols based on predictions from a mechanistic model of the effect of IL7 on CD4 counts. Statistics in Medicine 38, 221–235.
8 Appendix
8.1 Computing effects of strategies: an analytical formula
We derive an analytical formula in the case where the visittimes are fixed and the loss function depends only on values of at visittimes. For simplicity we remove the factor . To compute any risk function we must compute the marginal distribution of under the intervention. This can be done by recurrence: knowing the marginal distribution of , compute the marginal distribution of . For making this computation, we use:

the physical law of , ,

the strategy .

the observation equation .
Theorem The recurrence equation is:
(18) 
That is, one goes from the marginal distribution at to that at by multiplying by the product of the conditional densities of observation, physical law and strategy.
Proof: We have that (because depends only on . Then (because the physical law does not involve the observation ). Finally, we have that (because the strategy does not depend on the true values given the observed values ).
When all the distributions are Gaussian, the computation can be done analytically.
For a general loss function at horizon , the associated risk is:
Except in the Gaussian case and with linear loss function, this computation will be intractable if there are many visit times; in that case, one must resort to simulation.
8.2 Conjugate normal priors
Let the prior for the parameter vector
of dimension be and the likelihood for the dimensional observation be , where is a matrix. Then the posterior is normal, and by identification of the quadratic forms we find thatand
Comments
There are no comments yet.