|
View:
New views
8 Messages
—
Rating Filter:
Alert me
|
| < Prev | 1 - 2 | Next > |
|
|
Re: [R-sig-ME] lme nesting/interaction adviceI'm entering this discussion late so I may be discussing issues that
have already been addressed. As I understand it, Federico, you began by describing a model for data in which two factors have a fixed set of levels and one factor has an extensible, or "random", set of levels and you wanted to fit a model that you described as y ~ effect1 * effect2 * effect3 The problem is that this specification is not complete. An interaction of factors with fixed levels and a factor with random levels can mean, in the lmer specification, lmer(y ~ effect1 * effect2 + (1| effect3) + (1|effect1:effect2:effect3), ...) or lmer(y ~ effect1 * effect2 + (effect1*effect2 | effect3), ...) or other variations. When you specify a random effect or an random interaction term you must, either explicitly or implicitly, specify the form of the variance-covariance matrix associated with those random effects. The "advantage" that other software may provide for you is that it chooses the model for you but that, of course, means that you only have the one choice. If you can describe how many variance components you think should be estimated in your model and what they would represent then I think it will be easier to describe how to fit the model. ______________________________________________ R-help@... mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. |
|
|
Re: [R-sig-ME] lme nesting/interaction adviceOn 12 May 2008, at 17:09, Douglas Bates wrote:
> I'm entering this discussion late so I may be discussing issues that > have already been addressed. > > As I understand it, Federico, you began by describing a model for data > in which two factors have a fixed set of levels and one factor has an > extensible, or "random", set of levels and you wanted to fit a model > that you described as > > y ~ effect1 * effect2 * effect3 > > The problem is that this specification is not complete. My apologies for that, I thought that the above formula was the shorthand for what I would call the 'full' model, i.e. the single factors and the 2 and 3 ways interactions. > An > interaction of factors with fixed levels and a factor with random > levels can mean, in the lmer specification, > > lmer(y ~ effect1 * effect2 + (1| effect3) + (1| > effect1:effect2:effect3), ...) > > or > > lmer(y ~ effect1 * effect2 + (effect1*effect2 | effect3), ...) > > or other variations. When you specify a random effect or an random > interaction term you must, either explicitly or implicitly, specify > the form of the variance-covariance matrix associated with those > random effects. I'll play around with this and see what I can get. > > The "advantage" that other software may provide for you is that it > chooses the model for you but that, of course, means that you only > have the one choice. I'm more than happy to stick to R, and to put more legwork into my models > > If you can describe how many variance components you think should be > estimated in your model and what they would represent then I think it > will be easier to describe how to fit the model. I'll work on that. Incidentally, what/where is the most comprehensive and up to date documentation for lme4? the pdfs coming with the package? I suspect knowing which are the right docs will help a lot in keeping me within the boundaries of civility and prevent me from annoying anyone (which is not something I sent forth to do on purpose). Best regards, Federico -- Federico C. F. Calboli Department of Epidemiology and Public Health Imperial College, St. Mary's Campus Norfolk Place, London W2 1PG Tel +44 (0)20 75941602 Fax +44 (0)20 75943193 f.calboli [.a.t] imperial.ac.uk f.calboli [.a.t] gmail.com ______________________________________________ R-help@... mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. |
|
|
Re: [R-sig-ME] lme nesting/interaction adviceOn Mon, May 12, 2008 at 11:22 AM, Federico Calboli
<f.calboli@...> wrote: > On 12 May 2008, at 17:09, Douglas Bates wrote: > >> I'm entering this discussion late so I may be discussing issues that >> have already been addressed. >> >> As I understand it, Federico, you began by describing a model for data >> in which two factors have a fixed set of levels and one factor has an >> extensible, or "random", set of levels and you wanted to fit a model >> that you described as >> >> y ~ effect1 * effect2 * effect3 >> >> The problem is that this specification is not complete. > > My apologies for that, I thought that the above formula was the shorthand > for what I would call the 'full' model, i.e. the single factors and the 2 > and 3 ways interactions. As I indicated, the trick is that the interaction of a fixed factor and a random factor can be defined in more than one way. It sounds as if what you want is lmer(y ~ factor1 * factor2 + (1|factor3) + (1|factor1:factor3) + (1|factor2:factor3) + (1|factor1:factor2:factor3), ...) but I'm not sure. >> An interaction of factors with fixed levels and a factor with random >> levels can mean, in the lmer specification, >> >> lmer(y ~ effect1 * effect2 + (1| effect3) + (1|effect1:effect2:effect3), >> ...) >> >> or >> >> lmer(y ~ effect1 * effect2 + (effect1*effect2 | effect3), ...) >> >> or other variations. When you specify a random effect or an random >> interaction term you must, either explicitly or implicitly, specify >> the form of the variance-covariance matrix associated with those >> random effects. > > I'll play around with this and see what I can get. >> >> The "advantage" that other software may provide for you is that it >> chooses the model for you but that, of course, means that you only >> have the one choice. > > I'm more than happy to stick to R, and to put more legwork into my models >> >> If you can describe how many variance components you think should be >> estimated in your model and what they would represent then I think it >> will be easier to describe how to fit the model. > > I'll work on that. Incidentally, what/where is the most comprehensive and up > to date documentation for lme4? the pdfs coming with the package? I suspect > knowing which are the right docs will help a lot in keeping me within the > boundaries of civility and prevent me from annoying anyone (which is not > something I sent forth to do on purpose). Documentation for lme4 is pretty sketchy at present. I hope to remedy that during our summer break. ______________________________________________ R-help@... mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. |
|
|
Re: [R-sig-ME] lme nesting/interaction adviceOn 13/05/2008, at 4:09 AM, Douglas Bates wrote: > I'm entering this discussion late so I may be discussing issues that > have already been addressed. > > As I understand it, Federico, you began by describing a model for data > in which two factors have a fixed set of levels and one factor has an > extensible, or "random", set of levels and you wanted to fit a model > that you described as > > y ~ effect1 * effect2 * effect3 > > The problem is that this specification is not complete. At *last* (as Owl said to Rabbit) we're getting somewhere!!! I always knew that there was some basic fundamental point about this business that I (and I believe many others) were simply missing. But I could not for the life of me get anyone to explain to me what that point was. Or to put it another way, I was never able to frame a question that would illuminate just what it was that I wasn't getting. I now may be at a stage where I can start asking the right questions. > An interaction of factors with fixed levels and a factor with random > levels can mean, in the lmer specification, > > lmer(y ~ effect1 * effect2 + (1| effect3) + (1| > effect1:effect2:effect3), ...) > > or > > lmer(y ~ effect1 * effect2 + (effect1*effect2 | effect3), ...) > > or other variations. When you specify a random effect or an random > interaction term you must, either explicitly or implicitly, specify > the form of the variance-covariance matrix associated with those > random effects. > > The "advantage" that other software may provide for you is that it > chooses the model for you but that, of course, means that you only > have the one choice. Now may I start asking what I hope are questions that will lift the fog a bit? Let us for specificity consider a three-way model with two fixed effects and one random effect from the good old Rothamstead style agricultural experiment context: Suppose we have a number of species/breeds of wheat (say) and a number of fertilizers. These are fixed effects. And we have a number of fields (blocks) --- a random effect. Each breed-fertilizer combination is applied a number of times in each field. We ***assume*** that that the field or block effect is homogeneous throughout. This may or may not be a ``good'' assumption, but it's not completely ridiculous and would often be made in practice. And probably *was* made at Rothamstead. The response would be something like yield in bushels per acre. The way that I would write the ``full'' model for this setting, in mathematical style is: Y_ijkl = mu + alpha_i + beta_j + (alpha.beta)_ij + C_k + (alpha.C)_ik + (beta.C)_jk + (alpha.beta.C)_ijk + E_ijkl The alpha_i and beta_j are parameters corresponding to breed and fertilizer respectively; the C_k are random effects corresponding to fields or blocks. Any effect ``involving'' C is also random. The assumptions made by the Package-Which-Must-Not-Be-Named are (I think) that C_k ~ N(0,sigma_C^2) (alpha.C)_ik ~ N(0,sigma_aC^2) (beta.C)jk ~ N(0,sigma_bC^2) (alpha.beta.C)_ijk ~ N(0,sigma_abC^2) E_ijkl ~ N(0,sigma^2) and these random variables are *all independent*. Ahhhhhhhh ... perhaps I'm on the way to answering my own question. Is it this assumption of ``all independent'' which is questionable? It seemed innocent enough when I first learned about this stuff, lo these many years ago. But .... mmmmmaybe not! To start with: What would be the lmer syntax to fit the foregoing (possibly naive) model? I am sorry, but I really cannot get my head around the syntax of lmer model specification, and I've tried. I really have. Hard. I know I must be starting from the wrong place, but I haven't a clue as to what the *right* place to start from is. And if I'm in that boat, I will wager Euros to pretzels that there are others in it. I know that I'm not the brightest bulb in the chandelier, but I'm not the dullest either. Having got there: Presuming that I'm more-or-less on the right track in my foregoing conjecture that it's the over-simple dependence structure that is the problem with what's delivered by the Package-Which-Must- Not-Be-Named, how might one go about being less simple-minded? I.e. what might be some more realistic dependence structures, and how would one specify these in lmer? And how would one assess whether the assumed dependence structure gives a reasonable fit to the data? > If you can describe how many variance components you think should be > estimated in your model and what they would represent then I think it > will be easier to describe how to fit the model. How does this fit in with my conjecture (above) about what I've been missing all these years? Does it fit? How many variance components are there in the ``naive'' model? It looks like 5 to me ... but maybe I'm totally out to lunch in what I think I'm understanding at this stage. (And besides --- there are three sorts of statistician; those who can count, and those who can't.) Thank you for your indulgence. cheers, Rolf Turner ###################################################################### Attention:\ This e-mail message is privileged and confid...{{dropped:9}} ______________________________________________ R-help@... mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. |
|
|
Re: [R-sig-ME] lme nesting/interaction adviceOn Mon, May 12, 2008 at 5:39 PM, Rolf Turner <r.turner@...> wrote:
> > On 13/05/2008, at 4:09 AM, Douglas Bates wrote: > >> I'm entering this discussion late so I may be discussing issues that >> have already been addressed. >> >> As I understand it, Federico, you began by describing a model for data >> in which two factors have a fixed set of levels and one factor has an >> extensible, or "random", set of levels and you wanted to fit a model >> that you described as >> >> y ~ effect1 * effect2 * effect3 >> >> The problem is that this specification is not complete. > > At *last* (as Owl said to Rabbit) we're getting somewhere!!! > > I always knew that there was some basic fundamental point > about this business that I (and I believe many others) were > simply missing. But I could not for the life of me get anyone > to explain to me what that point was. Or to put it another > way, I was never able to frame a question that would illuminate > just what it was that I wasn't getting. > > I now may be at a stage where I can start asking the right > questions. > >> An interaction of factors with fixed levels and a factor with random >> levels can mean, in the lmer specification, >> >> lmer(y ~ effect1 * effect2 + (1| effect3) + (1|effect1:effect2:effect3), >> ...) >> >> or >> >> lmer(y ~ effect1 * effect2 + (effect1*effect2 | effect3), ...) >> >> or other variations. When you specify a random effect or an random >> interaction term you must, either explicitly or implicitly, specify >> the form of the variance-covariance matrix associated with those >> random effects. >> >> The "advantage" that other software may provide for you is that it >> chooses the model for you but that, of course, means that you only >> have the one choice. > > Now may I start asking what I hope are questions that will lift > the fog a bit? > > Let us for specificity consider a three-way model with two > fixed effects and one random effect from the good old Rothamstead > style > agricultural experiment context: Suppose we have a number of > species/breeds of wheat (say) and a number of fertilizers. > These are fixed effects. And we have a number of fields (blocks) > --- a random effect. Each breed-fertilizer combination is > applied a number of times in each field. We ***assume*** that > that the field or block effect is homogeneous throughout. This > may or may not be a ``good'' assumption, but it's not completely > ridiculous and would often be made in practice. And probably > *was* made at Rothamstead. The response would be something like > yield in bushels per acre. > > The way that I would write the ``full'' model for this setting, > in mathematical style is: > > Y_ijkl = mu + alpha_i + beta_j + (alpha.beta)_ij + C_k + (alpha.C)_ik > + (beta.C)_jk + (alpha.beta.C)_ijk + E_ijkl > > The alpha_i and beta_j are parameters corresponding to breed and > fertilizer > respectively; the C_k are random effects corresponding to fields or > blocks. > Any effect ``involving'' C is also random. > > The assumptions made by the Package-Which-Must-Not-Be-Named are (I > think) > that > > C_k ~ N(0,sigma_C^2) > (alpha.C)_ik ~ N(0,sigma_aC^2) > (beta.C)jk ~ N(0,sigma_bC^2) > (alpha.beta.C)_ijk ~ N(0,sigma_abC^2) > E_ijkl ~ N(0,sigma^2) > > and these random variables are *all independent*. > > Ahhhhhhhh ... perhaps I'm on the way to answering my own question. > Is > it this assumption of ``all independent'' which is questionable? It > seemed innocent enough when I first learned about this stuff, lo > these > many years ago. But .... mmmmmaybe not! > > To start with: What would be the lmer syntax to fit the foregoing > (possibly naive) model? I am sorry, but I really cannot get my head > around the syntax of lmer model specification, and I've tried. I > really have. Hard. I know I must be starting from the wrong place, > but I haven't a clue as to what the *right* place to start from is. > And if I'm in that boat, I will wager Euros to pretzels that there > are others in it. I know that I'm not the brightest bulb in the > chandelier, but I'm not the dullest either. specification can be an extremely confusing area. Let's consider a set of models for the Machines data reproduced (from Milliken and Johnson) in Pinheiro and Bates and available in the MEMSS package. > library(lme4) Loading required package: Matrix Loading required package: lattice Attaching package: 'Matrix' The following object(s) are masked from package:stats : xtabs > data("Machines", package = "MEMSS") > str(Machines) 'data.frame': 54 obs. of 3 variables: $ Worker : Factor w/ 6 levels "1","2","3","4",..: 1 1 1 2 2 2 3 3 3 4 ... $ Machine: Factor w/ 3 levels "A","B","C": 1 1 1 1 1 1 1 1 1 1 ... $ score : num 52 52.8 53.1 51.8 52.8 53.1 60 60.2 58.4 51.1 ... We consider the Machine factor to have a fixed set of levels in that we only consider these three machines. The levels of the Worker factor represent a sample from the set of potential operators. As you might imagine from this description, I now think of the distinction between "fixed" and "random" as being associated with the factor, not necessarily the "effects". If you plot these data > dotplot(reorder(Worker, score) ~ score, Machines, groups = Machine, type = c("g", "p", "a"), xlab = "Efficiency score", ylab = "Worker", auto.key = list(columns = 3, lines = TRUE)) (resulting PDF enclosed) you will see evidence of an interaction. That is, some workers have noticeably different score patterns on the three machines than do others. Worker 6 on machine B is the most striking example. One way to model this interaction is to say that there is a random effect for each worker and a separate random effect for each worker/machine combination. If the random effects for the worker/machine combinations are assumed to be independent with constant variance then one expresses the model as > print(fm1 <- lmer(score ~ Machine + (1|Worker) + (1|Machine:Worker), Machines), corr = FALSE) Linear mixed model fit by REML Formula: score ~ Machine + (1 | Worker) + (1 | Machine:Worker) Data: Machines AIC BIC logLik deviance REMLdev 227.7 239.6 -107.8 225.5 215.7 Random effects: Groups Name Variance Std.Dev. Machine:Worker (Intercept) 13.90963 3.72956 Worker (Intercept) 22.85528 4.78072 Residual 0.92464 0.96158 Number of obs: 54, groups: Machine:Worker, 18; Worker, 6 Fixed effects: Estimate Std. Error t value (Intercept) 52.356 2.486 21.062 MachineB 7.967 2.177 3.659 MachineC 13.917 2.177 6.393 An equivalent formulation is > print(fm1 <- lmer(score ~ Machine + (1|Worker/Machine), Machines), corr = FALSE) Linear mixed model fit by REML Formula: score ~ Machine + (1 | Worker/Machine) Data: Machines AIC BIC logLik deviance REMLdev 227.7 239.6 -107.8 225.5 215.7 Random effects: Groups Name Variance Std.Dev. Machine:Worker (Intercept) 13.90963 3.72956 Worker (Intercept) 22.85528 4.78072 Residual 0.92464 0.96158 Number of obs: 54, groups: Machine:Worker, 18; Worker, 6 Fixed effects: Estimate Std. Error t value (Intercept) 52.356 2.486 21.062 MachineB 7.967 2.177 3.659 MachineC 13.917 2.177 6.393 The expression (1|Worker/Machine) is just "syntactic sugar". It is expanded to (1|Worker) + (1|Machine:Worker) before the model matrices are created. If you want to start with the formula and see what that means for the model then use these rules: - a term including the '|' operator is a random effects term - if the left-hand side of the '|' operator is 1 then the random effects are scalar random effects, one for each level of the factor on the right of the '|' - random effects associated with different terms are independent - random effects associated with different levels of the factor within a term are independent - the variance of the random effects within the same term is constant However, there is another mixed-effects model that could make sense for these data. Suppose I consider the variations associated with each worker as a vector of length 3 (Machines A, B and C) with a symmetric, positive semidefinite 3 by 3 variance-covariance matrix. I fit that model as > print(fm2 <- lmer(score ~ Machine + (Machine|Worker), Machines), corr = FALSE) Linear mixed model fit by REML Formula: score ~ Machine + (Machine | Worker) Data: Machines AIC BIC logLik deviance REMLdev 228.3 248.2 -104.2 216.6 208.3 Random effects: Groups Name Variance Std.Dev. Corr Worker (Intercept) 16.64051 4.07928 MachineB 34.54670 5.87764 0.484 MachineC 13.61398 3.68971 -0.365 0.297 Residual 0.92463 0.96158 Number of obs: 54, groups: Worker, 6 Fixed effects: Estimate Std. Error t value (Intercept) 52.356 1.681 31.151 MachineB 7.967 2.421 3.291 MachineC 13.917 1.540 9.037 It may be more meaningful to write it as > print(fm3 <- lmer(score ~ Machine + (0+Machine|Worker), Machines), corr = FALSE) Linear mixed model fit by REML Formula: score ~ Machine + (0 + Machine | Worker) Data: Machines AIC BIC logLik deviance REMLdev 228.3 248.2 -104.2 216.6 208.3 Random effects: Groups Name Variance Std.Dev. Corr Worker MachineA 16.64097 4.07933 MachineB 74.39557 8.62529 0.803 MachineC 19.26646 4.38936 0.623 0.771 Residual 0.92463 0.96158 Number of obs: 54, groups: Worker, 6 Fixed effects: Estimate Std. Error t value (Intercept) 52.356 1.681 31.150 MachineB 7.967 2.421 3.291 MachineC 13.917 1.540 9.037 Now we are fitting 3 variances and 3 covariances for the random effects instead of the previous models which had two variances. The difference in the models is exactly what made you pause - the simpler model assumes that, conditional on the random effect for the worker, the worker/machine random effects are independent and have constant variance. In the more general models the worker/machine interactions are allowed to be correlated within worker. It is more common to allow this kind of correlation within subject in models for longitudinal data (the Laird-Ware formulation) where each subject has a random effect for the intercept and a random effect for the slope with respect to time and these can be correlated. However, this type of representation can make sense with a factor on the left hand side of the '|' operator, like the Machine factor here. If that factor has a large number of levels then the model quickly becomes unwieldy because the number of variance-covariance parameters to estimate is quadratic in the number of levels of the factor on the lhs. I hope this helps. > Having got there: Presuming that I'm more-or-less on the right track > in my foregoing conjecture that it's the over-simple dependence > structure > that is the problem with what's delivered by the > Package-Which-Must-Not-Be-Named, > how might one go about being less simple-minded? I.e. what might be > some > more realistic dependence structures, and how would one specify these > in lmer? > And how would one assess whether the assumed dependence structure > gives > a reasonable fit to the data? > >> If you can describe how many variance components you think should be >> estimated in your model and what they would represent then I think it >> will be easier to describe how to fit the model. > > How does this fit in with my conjecture (above) about what I've been > missing all these years? Does it fit? How many variance components > are there in the ``naive'' model? It looks like 5 to me ... but > maybe > I'm totally out to lunch in what I think I'm understanding at this > stage. > (And besides --- there are three sorts of statistician; those who can > count, > and those who can't.) > > Thank you for your indulgence. > > cheers, > > Rolf Turner > > ###################################################################### > Attention:This e-mail message is privileged and confidential. If you are not > theintended recipient please delete the message and notify the sender.Any > views or opinions presented are solely those of the author. > > This e-mail has been scanned and cleared by > MailMarshalwww.marshalsoftware.com > ###################################################################### > ______________________________________________ R-help@... mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. |
|
|
Re: [R-sig-ME] lme nesting/interaction adviceThanks very much for your long, detailed, patient, and lucid response to my cri de coeur. That helps a *great* deal. I'm not sure that I have a solid understanding of the issues yet --- I never am! --- but I think I'm getting there. I'll need to chew over the posting a bit more and try some examples before I feel comfortable. One point that I'd like to spell out very explicitly, to make sure that I'm starting from the right square: The model that you start with in the Machines/Workers example is >> fm1 <- lmer(score ~ Machine + (1|Worker) + (1|Machine:Worker), >> Machines) My understanding is that this is the ``only'' model that could be fitted by the Package-Which-Must-Not-Be-Named. I.e. this package *could not* fit the second, more complex model: >> fm2 <- lmer(score ~ Machine + (Machine|Worker), Machines) (At least not directly.) Can you (or someone) confirm that I've got that right? It seems to me to be the key to why I've had such trouble in the past in grappling with mixed models in R. I.e. I've been thinking like the Package-Which-Must-Not-Be-Named --- that the simple, everything- independent model was the only possible model. Thanks again. cheers, Rolf ###################################################################### Attention:\ This e-mail message is privileged and confid...{{dropped:9}} ______________________________________________ R-help@... mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. |
|
|
Re: [R-sig-ME] lme nesting/interaction adviceHi Rolf,
On Tue, May 13, 2008 at 1:59 PM, Rolf Turner <r.turner@...> wrote: < in response to Doug Bates' useful tutorial...> > Thanks very much for your long, detailed, patient, and lucid > response to my cri de coeur. That helps a *great* deal. > Hear Hear! <snip> > One point that I'd like to spell out very explicitly, to make sure > that I'm starting from the right square: > > The model that you start with in the Machines/Workers example is > > > > > > > > fm1 <- lmer(score ~ Machine + (1|Worker) + (1|Machine:Worker), Machines) > > > > > > > > My understanding is that this is the ``only'' model that could be fitted > by the Package-Which-Must-Not-Be-Named. I.e. this package *could not* fit > the second, more complex model: > > > > > > > > fm2 <- lmer(score ~ Machine + (Machine|Worker), Machines) > > > > > > > (At least not directly.) Can you (or someone) confirm that I've got that > right? Compare: ## R > m4 Linear mixed model fit by REML Formula: score ~ Machine + (0 + Machine | Worker) Data: Machines AIC BIC logLik deviance REMLdev 228.3 248.2 -104.2 216.6 208.3 Random effects: Groups Name Variance Std.Dev. Corr Worker MachineA 16.64098 4.07934 MachineB 74.39558 8.62529 0.803 MachineC 19.26646 4.38936 0.623 0.771 Residual 0.92463 0.96158 Number of obs: 54, groups: Worker, 6 ## "The-Package" proc mixed data = machines; class worker machine; model score = machine / solution; random machine / subject = worker type = un; run; Covariance Parameter Estimates Cov Parm Subject Estimate UN(1,1) Worker 16.6405 UN(2,1) Worker 28.2447 UN(2,2) Worker 74.3956 UN(3,1) Worker 11.1465 UN(3,2) Worker 29.1841 UN(3,3) Worker 19.2675 Residual 0.9246 The two outputs report essentially the same thing. Note that e.g, UN(2,1) = 28.2477 approx= .803*4.07934*8.62529 (And, as usual, the fixed effects estimates match too once the contrasts and 'types' of SS for an ANOVA table are set up) UN is short for 'unstructured' - a term Doug has pointed out is not particularly fitting because the covariance matrix is symmetric positive definite. > > It seems to me to be the key to why I've had such trouble in the past > in grappling with mixed models in R. I.e. I've been thinking like > the Package-Which-Must-Not-Be-Named --- that the simple, > everything-independent > model was the only possible model. > Although this may well not apply to you, another area of confusion arises not so much from differences between stats packages but by differences between methods. I'm not an expert in the estimation methods but, as I understand it, classic texts describe fitting mixed models in terms of ANOVA in the OLS framework, calculating method of moments estimators for the variances of the random effects by equating observed and expected mean squares (I believe using aov and lm with an 'Error' term would fall into this category, and proc anova and proc glm would also). Starting in the 90's these methods started falling out of fashion in favor of ML/REML/GLS methods (likelihood based), which offer more flexibility in structuring both the error and random effects covariance matrices, will not produce negative variance estimates, and have other nice properties that someone more 'mathy' than me could explain. Tools like lme, lmer, proc mixed and proc glimmix fall into this category. hoping this helps, Kingsford Jones > Thanks again. > > cheers, > > Rolf > > > > ###################################################################### > Attention:\ This e-mail message is privileged and confid...{{dropped:9}} > > ______________________________________________ > R-help@... mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > ______________________________________________ R-help@... mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. |
|
|
Re: [R-sig-ME] lme nesting/interaction adviceI've been looking back over this discussion.
Another model that one can fit using lme is: > lme(score~Machine, random=list(Worker=pdIdent(~0+Machine)), + weights=varIdent(form=~1|Machine), data=Machines) Linear mixed-effects model fit by REML Data: Machines Log-restricted-likelihood: -108.9547 Fixed: score ~ Machine (Intercept) MachineB MachineC 52.355556 7.966667 13.916667 Random effects: Formula: ~0 + Machine | Worker Structure: Multiple of an Identity MachineA MachineB MachineC Residual StdDev: 6.06209 6.06209 6.06209 1.148581 Variance function: Structure: Different standard deviations per stratum Formula: ~1 | Machine Parameter estimates: A B C 1.0000000 0.8713263 0.5859709 Number of Observations: 54 Number of Groups: 6 This insists (I think) that conditional on the random effect for the worker, the worker/machine random effects be independent, but allows them to have different variances. I am wondering whether it is possible to fit such a model using lmer(). [In this example the large estimated correlations suggest that it is not a sensible model.] John Maindonald email: john.maindonald@... phone : +61 2 (6125)3473 fax : +61 2(6125)5549 Centre for Mathematics & Its Applications, Room 1194, John Dedman Mathematical Sciences Building (Building 27) Australian National University, Canberra ACT 0200. On 14 May 2008, at 2:49 AM, Douglas Bates wrote: > On Mon, May 12, 2008 at 5:39 PM, Rolf Turner > <r.turner@...> wrote: >> >> On 13/05/2008, at 4:09 AM, Douglas Bates wrote: >> >>> I'm entering this discussion late so I may be discussing issues that >>> have already been addressed. >>> >>> As I understand it, Federico, you began by describing a model for >>> data >>> in which two factors have a fixed set of levels and one factor has >>> an >>> extensible, or "random", set of levels and you wanted to fit a model >>> that you described as >>> >>> y ~ effect1 * effect2 * effect3 >>> >>> The problem is that this specification is not complete. >> >> At *last* (as Owl said to Rabbit) we're getting somewhere!!! >> >> I always knew that there was some basic fundamental point >> about this business that I (and I believe many others) were >> simply missing. But I could not for the life of me get anyone >> to explain to me what that point was. Or to put it another >> way, I was never able to frame a question that would illuminate >> just what it was that I wasn't getting. >> >> I now may be at a stage where I can start asking the right >> questions. >> >>> An interaction of factors with fixed levels and a factor with random >>> levels can mean, in the lmer specification, >>> >>> lmer(y ~ effect1 * effect2 + (1| effect3) + (1| >>> effect1:effect2:effect3), >>> ...) >>> >>> or >>> >>> lmer(y ~ effect1 * effect2 + (effect1*effect2 | effect3), ...) >>> >>> or other variations. When you specify a random effect or an random >>> interaction term you must, either explicitly or implicitly, specify >>> the form of the variance-covariance matrix associated with those >>> random effects. >>> >>> The "advantage" that other software may provide for you is that it >>> chooses the model for you but that, of course, means that you only >>> have the one choice. >> >> Now may I start asking what I hope are questions that will lift >> the fog a bit? >> >> Let us for specificity consider a three-way model with two >> fixed effects and one random effect from the good old >> Rothamstead >> style >> agricultural experiment context: Suppose we have a number of >> species/breeds of wheat (say) and a number of fertilizers. >> These are fixed effects. And we have a number of fields >> (blocks) >> --- a random effect. Each breed-fertilizer combination is >> applied a number of times in each field. We ***assume*** that >> that the field or block effect is homogeneous throughout. This >> may or may not be a ``good'' assumption, but it's not completely >> ridiculous and would often be made in practice. And probably >> *was* made at Rothamstead. The response would be something like >> yield in bushels per acre. >> >> The way that I would write the ``full'' model for this setting, >> in mathematical style is: >> >> Y_ijkl = mu + alpha_i + beta_j + (alpha.beta)_ij + C_k + >> (alpha.C)_ik >> + (beta.C)_jk + (alpha.beta.C)_ijk + E_ijkl >> >> The alpha_i and beta_j are parameters corresponding to breed and >> fertilizer >> respectively; the C_k are random effects corresponding to >> fields or >> blocks. >> Any effect ``involving'' C is also random. >> >> The assumptions made by the Package-Which-Must-Not-Be-Named >> are (I >> think) >> that >> >> C_k ~ N(0,sigma_C^2) >> (alpha.C)_ik ~ N(0,sigma_aC^2) >> (beta.C)jk ~ N(0,sigma_bC^2) >> (alpha.beta.C)_ijk ~ N(0,sigma_abC^2) >> E_ijkl ~ N(0,sigma^2) >> >> and these random variables are *all independent*. >> >> Ahhhhhhhh ... perhaps I'm on the way to answering my own >> question. >> Is >> it this assumption of ``all independent'' which is >> questionable? It >> seemed innocent enough when I first learned about this stuff, lo >> these >> many years ago. But .... mmmmmaybe not! >> >> To start with: What would be the lmer syntax to fit the >> foregoing >> (possibly naive) model? I am sorry, but I really cannot get >> my head >> around the syntax of lmer model specification, and I've >> tried. I >> really have. Hard. I know I must be starting from the wrong >> place, >> but I haven't a clue as to what the *right* place to start >> from is. >> And if I'm in that boat, I will wager Euros to pretzels that >> there >> are others in it. I know that I'm not the brightest bulb in the >> chandelier, but I'm not the dullest either. > > Thanks for the questions, Rolf. I completely agree that mixed model > specification can be an extremely confusing area. > > Let's consider a set of models for the Machines data reproduced (from > Milliken and Johnson) in Pinheiro and Bates and available in the MEMSS > package. > >> library(lme4) > Loading required package: Matrix > Loading required package: lattice > > Attaching package: 'Matrix' > > > The following object(s) are masked from package:stats : > > xtabs >> data("Machines", package = "MEMSS") >> str(Machines) > 'data.frame': 54 obs. of 3 variables: > $ Worker : Factor w/ 6 levels "1","2","3","4",..: 1 1 1 2 2 2 3 3 3 > 4 ... > $ Machine: Factor w/ 3 levels "A","B","C": 1 1 1 1 1 1 1 1 1 1 ... > $ score : num 52 52.8 53.1 51.8 52.8 53.1 60 60.2 58.4 51.1 ... > > We consider the Machine factor to have a fixed set of levels in that > we only consider these three machines. The levels of the Worker > factor represent a sample from the set of potential operators. As you > might imagine from this description, I now think of the distinction > between "fixed" and "random" as being associated with the factor, not > necessarily the "effects". > > If you plot these data >> dotplot(reorder(Worker, score) ~ score, Machines, groups = Machine, >> type = c("g", "p", "a"), xlab = "Efficiency score", ylab = >> "Worker", auto.key = list(columns = 3, lines = TRUE)) > > (resulting PDF enclosed) you will see evidence of an interaction. > That is, some workers have noticeably different score patterns on the > three machines than do others. Worker 6 on machine B is the most > striking example. > > One way to model this interaction is to say that there is a random > effect for each worker and a separate random effect for each > worker/machine combination. If the random effects for the > worker/machine combinations are assumed to be independent with > constant variance then one expresses the model as > >> print(fm1 <- lmer(score ~ Machine + (1|Worker) + (1| >> Machine:Worker), Machines), corr = FALSE) > Linear mixed model fit by REML > Formula: score ~ Machine + (1 | Worker) + (1 | Machine:Worker) > Data: Machines > AIC BIC logLik deviance REMLdev > 227.7 239.6 -107.8 225.5 215.7 > Random effects: > Groups Name Variance Std.Dev. > Machine:Worker (Intercept) 13.90963 3.72956 > Worker (Intercept) 22.85528 4.78072 > Residual 0.92464 0.96158 > Number of obs: 54, groups: Machine:Worker, 18; Worker, 6 > > Fixed effects: > Estimate Std. Error t value > (Intercept) 52.356 2.486 21.062 > MachineB 7.967 2.177 3.659 > MachineC 13.917 2.177 6.393 > > An equivalent formulation is >> print(fm1 <- lmer(score ~ Machine + (1|Worker/Machine), Machines), >> corr = FALSE) > Linear mixed model fit by REML > Formula: score ~ Machine + (1 | Worker/Machine) > Data: Machines > AIC BIC logLik deviance REMLdev > 227.7 239.6 -107.8 225.5 215.7 > Random effects: > Groups Name Variance Std.Dev. > Machine:Worker (Intercept) 13.90963 3.72956 > Worker (Intercept) 22.85528 4.78072 > Residual 0.92464 0.96158 > Number of obs: 54, groups: Machine:Worker, 18; Worker, 6 > > Fixed effects: > Estimate Std. Error t value > (Intercept) 52.356 2.486 21.062 > MachineB 7.967 2.177 3.659 > MachineC 13.917 2.177 6.393 > > The expression (1|Worker/Machine) is just "syntactic sugar". It is > expanded to (1|Worker) + (1|Machine:Worker) before the model matrices > are created. > > If you want to start with the formula and see what that means for the > model then use these rules: > > - a term including the '|' operator is a random effects term > - if the left-hand side of the '|' operator is 1 then the random > effects are scalar random effects, one for each level of the factor on > the right of the '|' > - random effects associated with different terms are independent > - random effects associated with different levels of the factor within > a term are independent > - the variance of the random effects within the same term is constant > > However, there is another mixed-effects model that could make sense > for these data. Suppose I consider the variations associated with > each worker as a vector of length 3 (Machines A, B and C) with a > symmetric, positive semidefinite 3 by 3 variance-covariance matrix. I > fit that model as > >> print(fm2 <- lmer(score ~ Machine + (Machine|Worker), Machines), >> corr = FALSE) > Linear mixed model fit by REML > Formula: score ~ Machine + (Machine | Worker) > Data: Machines > AIC BIC logLik deviance REMLdev > 228.3 248.2 -104.2 216.6 208.3 > Random effects: > Groups Name Variance Std.Dev. Corr > Worker (Intercept) 16.64051 4.07928 > MachineB 34.54670 5.87764 0.484 > MachineC 13.61398 3.68971 -0.365 0.297 > Residual 0.92463 0.96158 > Number of obs: 54, groups: Worker, 6 > > Fixed effects: > Estimate Std. Error t value > (Intercept) 52.356 1.681 31.151 > MachineB 7.967 2.421 3.291 > MachineC 13.917 1.540 9.037 > > It may be more meaningful to write it as > >> print(fm3 <- lmer(score ~ Machine + (0+Machine|Worker), Machines), >> corr = FALSE) > Linear mixed model fit by REML > Formula: score ~ Machine + (0 + Machine | Worker) > Data: Machines > AIC BIC logLik deviance REMLdev > 228.3 248.2 -104.2 216.6 208.3 > Random effects: > Groups Name Variance Std.Dev. Corr > Worker MachineA 16.64097 4.07933 > MachineB 74.39557 8.62529 0.803 > MachineC 19.26646 4.38936 0.623 0.771 > Residual 0.92463 0.96158 > Number of obs: 54, groups: Worker, 6 > > Fixed effects: > Estimate Std. Error t value > (Intercept) 52.356 1.681 31.150 > MachineB 7.967 2.421 3.291 > MachineC 13.917 1.540 9.037 > > Now we are fitting 3 variances and 3 covariances for the random > effects instead of the previous models which had two variances. The > difference in the models is exactly what made you pause - the simpler > model assumes that, conditional on the random effect for the worker, > the worker/machine random effects are independent and have constant > variance. In the more general models the worker/machine interactions > are allowed to be correlated within worker. > > It is more common to allow this kind of correlation within subject in > models for longitudinal data (the Laird-Ware formulation) where each > subject has a random effect for the intercept and a random effect for > the slope with respect to time and these can be correlated. However, > this type of representation can make sense with a factor on the left > hand side of the '|' operator, like the Machine factor here. If that > factor has a large number of levels then the model quickly becomes > unwieldy because the number of variance-covariance parameters to > estimate is quadratic in the number of levels of the factor on the > lhs. > > I hope this helps. > >> Having got there: Presuming that I'm more-or-less on the >> right track >> in my foregoing conjecture that it's the over-simple dependence >> structure >> that is the problem with what's delivered by the >> Package-Which-Must-Not-Be-Named, >> how might one go about being less simple-minded? I.e. what >> might be >> some >> more realistic dependence structures, and how would one >> specify these >> in lmer? >> And how would one assess whether the assumed dependence >> structure >> gives >> a reasonable fit to the data? >> >>> If you can describe how many variance components you think should be >>> estimated in your model and what they would represent then I think >>> it >>> will be easier to describe how to fit the model. >> >> How does this fit in with my conjecture (above) about what >> I've been >> missing all these years? Does it fit? How many variance >> components >> are there in the ``naive'' model? It looks like 5 to me ... but >> maybe >> I'm totally out to lunch in what I think I'm understanding at >> this >> stage. >> (And besides --- there are three sorts of statistician; those >> who can >> count, >> and those who can't.) >> >> Thank you for your indulgence. >> >> cheers, >> >> Rolf Turner >> >> ###################################################################### >> Attention:This e-mail message is privileged and confidential. If >> you are not >> theintended recipient please delete the message and notify the >> sender.Any >> views or opinions presented are solely those of the author. >> >> This e-mail has been scanned and cleared by >> MailMarshalwww.marshalsoftware.com >> ###################################################################### >> > <Machines.pdf>_______________________________________________ > R-sig-mixed-models@... mailing list > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models ______________________________________________ R-help@... mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. |
| < Prev | 1 - 2 | Next > |
| Free Forum Powered by Nabble | Forum Help |