lme nesting/interaction advice

View: New views
8 Messages — Rating Filter:   Alert me  
< Prev | 1 - 2 | Next >

Re: [R-sig-ME] lme nesting/interaction advice

by Douglas Bates-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

I'm entering this discussion late so I may be discussing issues that
have already been addressed.

As I understand it, Federico, you began by describing a model for data
in which two factors have a fixed set of levels and one factor has an
extensible, or "random", set of levels and you wanted to fit a model
that you described as

y ~ effect1 * effect2 * effect3

The problem is that this specification is not complete.  An
interaction of factors with fixed levels and a factor with random
levels can mean, in the lmer specification,

lmer(y ~ effect1 * effect2 + (1| effect3) + (1|effect1:effect2:effect3), ...)

or

lmer(y ~ effect1 * effect2 + (effect1*effect2 | effect3), ...)

or other variations.  When you specify a random effect or an random
interaction term you must, either explicitly or implicitly, specify
the form of the variance-covariance matrix associated with those
random effects.

The "advantage" that other software may provide for you is that it
chooses the model for you but that, of course, means that you only
have the one choice.

If you can describe how many variance components you think should be
estimated in your model and what they would represent then I think it
will be easier to describe how to fit the model.

______________________________________________
R-help@... mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R-sig-ME] lme nesting/interaction advice

by Federico Calboli :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On 12 May 2008, at 17:09, Douglas Bates wrote:

> I'm entering this discussion late so I may be discussing issues that
> have already been addressed.
>
> As I understand it, Federico, you began by describing a model for data
> in which two factors have a fixed set of levels and one factor has an
> extensible, or "random", set of levels and you wanted to fit a model
> that you described as
>
> y ~ effect1 * effect2 * effect3
>
> The problem is that this specification is not complete.

My apologies for that, I thought that the above formula was the  
shorthand for what I would call the 'full' model, i.e. the single  
factors and the 2 and 3 ways interactions.

> An
> interaction of factors with fixed levels and a factor with random
> levels can mean, in the lmer specification,
>
> lmer(y ~ effect1 * effect2 + (1| effect3) + (1|
> effect1:effect2:effect3), ...)
>
> or
>
> lmer(y ~ effect1 * effect2 + (effect1*effect2 | effect3), ...)
>
> or other variations.  When you specify a random effect or an random
> interaction term you must, either explicitly or implicitly, specify
> the form of the variance-covariance matrix associated with those
> random effects.

I'll play around with this and see what I can get.
>
> The "advantage" that other software may provide for you is that it
> chooses the model for you but that, of course, means that you only
> have the one choice.

I'm more than happy to stick to R, and to put more legwork into my  
models
>
> If you can describe how many variance components you think should be
> estimated in your model and what they would represent then I think it
> will be easier to describe how to fit the model.

I'll work on that. Incidentally, what/where is the most comprehensive  
and up to date documentation for lme4? the pdfs coming with the  
package? I suspect knowing which are the right docs will help a lot  
in keeping me within the boundaries of civility and prevent me from  
annoying anyone (which is not something I sent forth to do on purpose).

Best regards,

Federico

--
Federico C. F. Calboli
Department of Epidemiology and Public Health
Imperial College, St. Mary's Campus
Norfolk Place, London W2 1PG

Tel +44 (0)20 75941602   Fax +44 (0)20 75943193

f.calboli [.a.t] imperial.ac.uk
f.calboli [.a.t] gmail.com

______________________________________________
R-help@... mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R-sig-ME] lme nesting/interaction advice

by Douglas Bates-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Mon, May 12, 2008 at 11:22 AM, Federico Calboli
<f.calboli@...> wrote:

> On 12 May 2008, at 17:09, Douglas Bates wrote:
>
>> I'm entering this discussion late so I may be discussing issues that
>> have already been addressed.
>>
>> As I understand it, Federico, you began by describing a model for data
>> in which two factors have a fixed set of levels and one factor has an
>> extensible, or "random", set of levels and you wanted to fit a model
>> that you described as
>>
>> y ~ effect1 * effect2 * effect3
>>
>> The problem is that this specification is not complete.
>
> My apologies for that, I thought that the above formula was the shorthand
> for what I would call the 'full' model, i.e. the single factors and the 2
> and 3 ways interactions.

As I indicated, the trick is that the interaction of a fixed factor
and a random factor can be defined in more than one way.

It sounds as if what you want is

lmer(y ~ factor1 * factor2 + (1|factor3) + (1|factor1:factor3) +
(1|factor2:factor3) + (1|factor1:factor2:factor3), ...)

but I'm not sure.

>> An interaction of factors with fixed levels and a factor with random
>> levels can mean, in the lmer specification,
>>
>> lmer(y ~ effect1 * effect2 + (1| effect3) + (1|effect1:effect2:effect3),
>> ...)
>>
>> or
>>
>> lmer(y ~ effect1 * effect2 + (effect1*effect2 | effect3), ...)
>>
>> or other variations.  When you specify a random effect or an random
>> interaction term you must, either explicitly or implicitly, specify
>> the form of the variance-covariance matrix associated with those
>> random effects.
>
> I'll play around with this and see what I can get.
>>
>> The "advantage" that other software may provide for you is that it
>> chooses the model for you but that, of course, means that you only
>> have the one choice.
>
> I'm more than happy to stick to R, and to put more legwork into my models
>>
>> If you can describe how many variance components you think should be
>> estimated in your model and what they would represent then I think it
>> will be easier to describe how to fit the model.
>
> I'll work on that. Incidentally, what/where is the most comprehensive and up
> to date documentation for lme4? the pdfs coming with the package? I suspect
> knowing which are the right docs will help a lot in keeping me within the
> boundaries of civility and prevent me from annoying anyone (which is not
> something I sent forth to do on purpose).

Documentation for lme4 is pretty sketchy at present.  I hope to remedy
that during our summer break.

______________________________________________
R-help@... mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R-sig-ME] lme nesting/interaction advice

by Rolf Turner-3 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message


On 13/05/2008, at 4:09 AM, Douglas Bates wrote:

> I'm entering this discussion late so I may be discussing issues that
> have already been addressed.
>
> As I understand it, Federico, you began by describing a model for data
> in which two factors have a fixed set of levels and one factor has an
> extensible, or "random", set of levels and you wanted to fit a model
> that you described as
>
> y ~ effect1 * effect2 * effect3
>
> The problem is that this specification is not complete.

        At *last* (as Owl said to Rabbit) we're getting somewhere!!!

        I always knew that there was some basic fundamental point
        about this business that I (and I believe many others) were
        simply missing.  But I could not for the life of me get anyone
        to explain to me what that point was.  Or to put it another
        way, I was never able to frame a question that would illuminate
        just what it was that I wasn't getting.

        I now may be at a stage where I can start asking the right
        questions.

> An interaction of factors with fixed levels and a factor with random
> levels can mean, in the lmer specification,
>
> lmer(y ~ effect1 * effect2 + (1| effect3) + (1|
> effect1:effect2:effect3), ...)
>
> or
>
> lmer(y ~ effect1 * effect2 + (effect1*effect2 | effect3), ...)
>
> or other variations.  When you specify a random effect or an random
> interaction term you must, either explicitly or implicitly, specify
> the form of the variance-covariance matrix associated with those
> random effects.
>
> The "advantage" that other software may provide for you is that it
> chooses the model for you but that, of course, means that you only
> have the one choice.

        Now may I start asking what I hope are questions that will lift
        the fog a bit?

        Let us for specificity consider a three-way model with two
        fixed effects and one random effect from the good old Rothamstead style
        agricultural experiment context:  Suppose we have a number of
        species/breeds of wheat (say) and a number of fertilizers.
        These are fixed effects.  And we have a number of fields (blocks)
        --- a random effect.  Each breed-fertilizer combination is
        applied a number of times in each field.  We ***assume*** that
        that the field or block effect is homogeneous throughout.  This
        may or may not be a ``good'' assumption, but it's not completely
        ridiculous and would often be made in practice.  And probably
        *was* made at Rothamstead.  The response would be something like
        yield in bushels per acre.

        The way that I would write the ``full'' model for this setting,
        in mathematical style is:

        Y_ijkl = mu + alpha_i + beta_j + (alpha.beta)_ij + C_k + (alpha.C)_ik
                     + (beta.C)_jk + (alpha.beta.C)_ijk + E_ijkl

        The alpha_i and beta_j are parameters corresponding to breed and  
fertilizer
        respectively; the C_k are random effects corresponding to fields or  
blocks.
        Any effect ``involving'' C is also random.

        The assumptions made by the Package-Which-Must-Not-Be-Named are (I  
think)
        that

                C_k ~ N(0,sigma_C^2)
                (alpha.C)_ik ~ N(0,sigma_aC^2)
                (beta.C)jk ~ N(0,sigma_bC^2)
                (alpha.beta.C)_ijk ~ N(0,sigma_abC^2)
                E_ijkl ~ N(0,sigma^2)

        and these random variables are *all independent*.

        Ahhhhhhhh ... perhaps I'm on the way to answering my own question.  Is
        it this assumption of ``all independent'' which is questionable?  It
        seemed innocent enough when I first learned about this stuff, lo these
        many years ago.  But .... mmmmmaybe not!

        To start with:  What would be the lmer syntax to fit the foregoing
        (possibly naive) model?  I am sorry, but I really cannot get my head
        around the syntax of lmer model specification, and I've tried.  I
        really have.  Hard.  I know I must be starting from the wrong place,
        but I haven't a clue as to what the *right* place to start from is.
        And if I'm in that boat, I will wager Euros to pretzels that there
        are others in it.  I know that I'm not the brightest bulb in the
        chandelier, but I'm not the dullest either.

        Having got there:  Presuming that I'm more-or-less on the right track
        in my foregoing conjecture that it's the over-simple dependence  
structure
        that is the problem with what's delivered by the Package-Which-Must-
Not-Be-Named,
        how might one go about being less simple-minded?  I.e. what might be  
some
        more realistic dependence structures, and how would one specify  
these in lmer?
        And how would one assess whether the assumed dependence structure gives
        a reasonable fit to the data?

> If you can describe how many variance components you think should be
> estimated in your model and what they would represent then I think it
> will be easier to describe how to fit the model.

        How does this fit in with my conjecture (above) about what I've been
        missing all these years?  Does it fit?  How many variance components
        are there in the ``naive'' model?  It looks like 5 to me ... but maybe
        I'm totally out to lunch in what I think I'm understanding at this  
stage.
        (And besides --- there are three sorts of statistician; those who  
can count,
        and those who can't.)

        Thank you for your indulgence.

                 cheers,

                        Rolf Turner

######################################################################
Attention:\ This e-mail message is privileged and confid...{{dropped:9}}

______________________________________________
R-help@... mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R-sig-ME] lme nesting/interaction advice

by Douglas Bates-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Mon, May 12, 2008 at 5:39 PM, Rolf Turner <r.turner@...> wrote:

>
> On 13/05/2008, at 4:09 AM, Douglas Bates wrote:
>
>> I'm entering this discussion late so I may be discussing issues that
>> have already been addressed.
>>
>> As I understand it, Federico, you began by describing a model for data
>> in which two factors have a fixed set of levels and one factor has an
>> extensible, or "random", set of levels and you wanted to fit a model
>> that you described as
>>
>> y ~ effect1 * effect2 * effect3
>>
>> The problem is that this specification is not complete.
>
>        At *last* (as Owl said to Rabbit) we're getting somewhere!!!
>
>        I always knew that there was some basic fundamental point
>        about this business that I (and I believe many others) were
>        simply missing.  But I could not for the life of me get anyone
>        to explain to me what that point was.  Or to put it another
>        way, I was never able to frame a question that would illuminate
>        just what it was that I wasn't getting.
>
>        I now may be at a stage where I can start asking the right
>        questions.
>
>> An interaction of factors with fixed levels and a factor with random
>> levels can mean, in the lmer specification,
>>
>> lmer(y ~ effect1 * effect2 + (1| effect3) + (1|effect1:effect2:effect3),
>> ...)
>>
>> or
>>
>> lmer(y ~ effect1 * effect2 + (effect1*effect2 | effect3), ...)
>>
>> or other variations.  When you specify a random effect or an random
>> interaction term you must, either explicitly or implicitly, specify
>> the form of the variance-covariance matrix associated with those
>> random effects.
>>
>> The "advantage" that other software may provide for you is that it
>> chooses the model for you but that, of course, means that you only
>> have the one choice.
>
>        Now may I start asking what I hope are questions that will lift
>        the fog a bit?
>
>        Let us for specificity consider a three-way model with two
>        fixed effects and one random effect from the good old Rothamstead
> style
>        agricultural experiment context:  Suppose we have a number of
>        species/breeds of wheat (say) and a number of fertilizers.
>        These are fixed effects.  And we have a number of fields (blocks)
>        --- a random effect.  Each breed-fertilizer combination is
>        applied a number of times in each field.  We ***assume*** that
>        that the field or block effect is homogeneous throughout.  This
>        may or may not be a ``good'' assumption, but it's not completely
>        ridiculous and would often be made in practice.  And probably
>        *was* made at Rothamstead.  The response would be something like
>        yield in bushels per acre.
>
>        The way that I would write the ``full'' model for this setting,
>        in mathematical style is:
>
>        Y_ijkl = mu + alpha_i + beta_j + (alpha.beta)_ij + C_k + (alpha.C)_ik
>                    + (beta.C)_jk + (alpha.beta.C)_ijk + E_ijkl
>
>        The alpha_i and beta_j are parameters corresponding to breed and
> fertilizer
>        respectively; the C_k are random effects corresponding to fields or
> blocks.
>        Any effect ``involving'' C is also random.
>
>        The assumptions made by the Package-Which-Must-Not-Be-Named are (I
> think)
>        that
>
>                C_k ~ N(0,sigma_C^2)
>                (alpha.C)_ik ~ N(0,sigma_aC^2)
>                (beta.C)jk ~ N(0,sigma_bC^2)
>                (alpha.beta.C)_ijk ~ N(0,sigma_abC^2)
>                E_ijkl ~ N(0,sigma^2)
>
>        and these random variables are *all independent*.
>
>        Ahhhhhhhh ... perhaps I'm on the way to answering my own question.
>  Is
>        it this assumption of ``all independent'' which is questionable?  It
>        seemed innocent enough when I first learned about this stuff, lo
> these
>        many years ago.  But .... mmmmmaybe not!
>
>        To start with:  What would be the lmer syntax to fit the foregoing
>        (possibly naive) model?  I am sorry, but I really cannot get my head
>        around the syntax of lmer model specification, and I've tried.  I
>        really have.  Hard.  I know I must be starting from the wrong place,
>        but I haven't a clue as to what the *right* place to start from is.
>        And if I'm in that boat, I will wager Euros to pretzels that there
>        are others in it.  I know that I'm not the brightest bulb in the
>        chandelier, but I'm not the dullest either.
Thanks for the questions, Rolf.  I completely agree that mixed model
specification can be an extremely confusing area.

Let's consider a set of models for the Machines data reproduced (from
Milliken and Johnson) in Pinheiro and Bates and available in the MEMSS
package.

> library(lme4)
Loading required package: Matrix
Loading required package: lattice

Attaching package: 'Matrix'


        The following object(s) are masked from package:stats :

         xtabs
> data("Machines", package = "MEMSS")
> str(Machines)
'data.frame': 54 obs. of  3 variables:
 $ Worker : Factor w/ 6 levels "1","2","3","4",..: 1 1 1 2 2 2 3 3 3 4 ...
 $ Machine: Factor w/ 3 levels "A","B","C": 1 1 1 1 1 1 1 1 1 1 ...
 $ score  : num  52 52.8 53.1 51.8 52.8 53.1 60 60.2 58.4 51.1 ...

We consider the Machine factor to have a fixed set of levels in that
we only consider these three machines.  The levels of the Worker
factor represent a sample from the set of potential operators.  As you
might imagine from this description, I now think of the distinction
between "fixed" and "random" as being associated with the factor, not
necessarily the "effects".

If you plot these data
> dotplot(reorder(Worker, score) ~ score, Machines, groups = Machine, type = c("g", "p", "a"), xlab = "Efficiency score", ylab = "Worker", auto.key = list(columns = 3, lines = TRUE))

(resulting PDF enclosed) you will see evidence of an interaction.
That is, some workers have noticeably different score patterns on the
three machines than do others.  Worker 6 on machine B is the most
striking example.

One way to model this interaction is to say that there is a random
effect for each worker and a separate random effect for each
worker/machine combination.  If the random effects for the
worker/machine combinations are assumed to be independent with
constant variance then one expresses the model as

> print(fm1 <- lmer(score ~ Machine + (1|Worker) + (1|Machine:Worker), Machines), corr = FALSE)
Linear mixed model fit by REML
Formula: score ~ Machine + (1 | Worker) + (1 | Machine:Worker)
   Data: Machines
   AIC   BIC logLik deviance REMLdev
 227.7 239.6 -107.8    225.5   215.7
Random effects:
 Groups         Name        Variance Std.Dev.
 Machine:Worker (Intercept) 13.90963 3.72956
 Worker         (Intercept) 22.85528 4.78072
 Residual                    0.92464 0.96158
Number of obs: 54, groups: Machine:Worker, 18; Worker, 6

Fixed effects:
            Estimate Std. Error t value
(Intercept)   52.356      2.486  21.062
MachineB       7.967      2.177   3.659
MachineC      13.917      2.177   6.393

An equivalent formulation is
> print(fm1 <- lmer(score ~ Machine + (1|Worker/Machine), Machines), corr = FALSE)
Linear mixed model fit by REML
Formula: score ~ Machine + (1 | Worker/Machine)
   Data: Machines
   AIC   BIC logLik deviance REMLdev
 227.7 239.6 -107.8    225.5   215.7
Random effects:
 Groups         Name        Variance Std.Dev.
 Machine:Worker (Intercept) 13.90963 3.72956
 Worker         (Intercept) 22.85528 4.78072
 Residual                    0.92464 0.96158
Number of obs: 54, groups: Machine:Worker, 18; Worker, 6

Fixed effects:
            Estimate Std. Error t value
(Intercept)   52.356      2.486  21.062
MachineB       7.967      2.177   3.659
MachineC      13.917      2.177   6.393

The expression (1|Worker/Machine) is just "syntactic sugar".  It is
expanded to (1|Worker) + (1|Machine:Worker) before the model matrices
are created.

If you want to start with the formula and see what that means for the
model then use these rules:

- a term including the '|' operator is a random effects term
- if the left-hand side of the '|' operator is 1 then the random
effects are scalar random effects, one for each level of the factor on
the right of the '|'
- random effects associated with different terms are independent
- random effects associated with different levels of the factor within
a term are independent
- the variance of the random effects within the same term is constant

However, there is another mixed-effects model that could make sense
for these data.  Suppose I consider the variations associated with
each worker as a vector of length 3 (Machines A, B and C) with a
symmetric, positive semidefinite 3 by 3 variance-covariance matrix.  I
fit that model as

> print(fm2 <- lmer(score ~ Machine + (Machine|Worker), Machines), corr = FALSE)
Linear mixed model fit by REML
Formula: score ~ Machine + (Machine | Worker)
   Data: Machines
   AIC   BIC logLik deviance REMLdev
 228.3 248.2 -104.2    216.6   208.3
Random effects:
 Groups   Name        Variance Std.Dev. Corr
 Worker   (Intercept) 16.64051 4.07928
          MachineB    34.54670 5.87764   0.484
          MachineC    13.61398 3.68971  -0.365  0.297
 Residual              0.92463 0.96158
Number of obs: 54, groups: Worker, 6

Fixed effects:
            Estimate Std. Error t value
(Intercept)   52.356      1.681  31.151
MachineB       7.967      2.421   3.291
MachineC      13.917      1.540   9.037

It may be more meaningful to write it as

> print(fm3 <- lmer(score ~ Machine + (0+Machine|Worker), Machines), corr = FALSE)
Linear mixed model fit by REML
Formula: score ~ Machine + (0 + Machine | Worker)
   Data: Machines
   AIC   BIC logLik deviance REMLdev
 228.3 248.2 -104.2    216.6   208.3
Random effects:
 Groups   Name     Variance Std.Dev. Corr
 Worker   MachineA 16.64097 4.07933
          MachineB 74.39557 8.62529  0.803
          MachineC 19.26646 4.38936  0.623 0.771
 Residual           0.92463 0.96158
Number of obs: 54, groups: Worker, 6

Fixed effects:
            Estimate Std. Error t value
(Intercept)   52.356      1.681  31.150
MachineB       7.967      2.421   3.291
MachineC      13.917      1.540   9.037

Now we are fitting 3 variances and 3 covariances for the random
effects instead of the previous models which had two variances.  The
difference in the models is exactly what made you pause - the simpler
model assumes that, conditional on the random effect for the worker,
the worker/machine random effects are independent and have constant
variance.  In the more general models the worker/machine interactions
are allowed to be correlated within worker.

It is more common to allow this kind of correlation within subject in
models for longitudinal data (the Laird-Ware formulation) where each
subject has a random effect for the intercept and a random effect for
the slope with respect to time and these can be correlated.  However,
this type of representation can make sense with a factor on the left
hand side of the '|' operator, like the Machine factor here.  If that
factor has a large number of levels then the model quickly becomes
unwieldy because the number of variance-covariance parameters to
estimate is quadratic in the number of levels of the factor on the
lhs.

I hope this helps.

>        Having got there:  Presuming that I'm more-or-less on the right track
>        in my foregoing conjecture that it's the over-simple dependence
> structure
>        that is the problem with what's delivered by the
> Package-Which-Must-Not-Be-Named,
>        how might one go about being less simple-minded?  I.e. what might be
> some
>        more realistic dependence structures, and how would one specify these
> in lmer?
>        And how would one assess whether the assumed dependence structure
> gives
>        a reasonable fit to the data?
>
>> If you can describe how many variance components you think should be
>> estimated in your model and what they would represent then I think it
>> will be easier to describe how to fit the model.
>
>        How does this fit in with my conjecture (above) about what I've been
>        missing all these years?  Does it fit?  How many variance components
>        are there in the ``naive'' model?  It looks like 5 to me ... but
> maybe
>        I'm totally out to lunch in what I think I'm understanding at this
> stage.
>        (And besides --- there are three sorts of statistician; those who can
> count,
>        and those who can't.)
>
>        Thank you for your indulgence.
>
>                 cheers,
>
>                        Rolf Turner
>
> ######################################################################
> Attention:This e-mail message is privileged and confidential. If you are not
> theintended recipient please delete the message and notify the sender.Any
> views or opinions presented are solely those of the author.
>
> This e-mail has been scanned and cleared by
> MailMarshalwww.marshalsoftware.com
> ######################################################################
>


______________________________________________
R-help@... mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Machines.pdf (51K) Download Attachment

Re: [R-sig-ME] lme nesting/interaction advice

by Rolf Turner-3 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message


Thanks very much for your long, detailed, patient, and lucid
response to my cri de coeur.  That helps a *great* deal.

I'm not sure that I have a solid understanding of the issues yet
--- I never am! --- but I think I'm getting there.  I'll need to
chew over the posting a bit more and try some examples before
I feel comfortable.

One point that I'd like to spell out very explicitly, to make sure
that I'm starting from the right square:

The model that you start with in the Machines/Workers example is

>> fm1 <- lmer(score ~ Machine + (1|Worker) + (1|Machine:Worker),  
>> Machines)


My understanding is that this is the ``only'' model that could be fitted
by the Package-Which-Must-Not-Be-Named.  I.e. this package *could  
not* fit
the second, more complex model:

>> fm2 <- lmer(score ~ Machine + (Machine|Worker), Machines)

(At least not directly.)  Can you (or someone) confirm that I've got  
that
right?

It seems to me to be the key to why I've had such trouble in the past
in grappling with mixed models in R.  I.e. I've been thinking like
the Package-Which-Must-Not-Be-Named --- that the simple, everything-
independent
model was the only possible model.

Thanks again.

        cheers,

                Rolf

######################################################################
Attention:\ This e-mail message is privileged and confid...{{dropped:9}}

______________________________________________
R-help@... mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R-sig-ME] lme nesting/interaction advice

by Kingsford Jones :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hi Rolf,

On Tue, May 13, 2008 at 1:59 PM, Rolf Turner <r.turner@...> wrote:

< in response to Doug Bates' useful tutorial...>

>  Thanks very much for your long, detailed, patient, and lucid
>  response to my cri de coeur.  That helps a *great* deal.
>

Hear Hear!

<snip>

>  One point that I'd like to spell out very explicitly, to make sure
>  that I'm starting from the right square:
>
>  The model that you start with in the Machines/Workers example is
>
>
>
> >
> > > fm1 <- lmer(score ~ Machine + (1|Worker) + (1|Machine:Worker), Machines)
> > >
> >
>
>
>  My understanding is that this is the ``only'' model that could be fitted
>  by the Package-Which-Must-Not-Be-Named.  I.e. this package *could not* fit
>  the second, more complex model:
>
>
>
> >
> > > fm2 <- lmer(score ~ Machine + (Machine|Worker), Machines)
> > >
> >
>
>  (At least not directly.)  Can you (or someone) confirm that I've got that
>  right?

Compare:

## R
> m4
Linear mixed model fit by REML
Formula: score ~ Machine + (0 + Machine | Worker)
   Data: Machines
   AIC   BIC logLik deviance REMLdev
 228.3 248.2 -104.2    216.6   208.3
Random effects:
 Groups   Name     Variance Std.Dev. Corr
 Worker   MachineA 16.64098 4.07934
          MachineB 74.39558 8.62529  0.803
          MachineC 19.26646 4.38936  0.623 0.771
 Residual           0.92463 0.96158
Number of obs: 54, groups: Worker, 6

## "The-Package"

proc mixed data = machines;
class worker machine;
model score = machine  / solution;
random machine / subject = worker type = un;
run;

 Covariance Parameter Estimates

Cov Parm     Subject    Estimate

UN(1,1)      Worker      16.6405
UN(2,1)      Worker      28.2447
UN(2,2)      Worker      74.3956
UN(3,1)      Worker      11.1465
UN(3,2)      Worker      29.1841
UN(3,3)      Worker      19.2675
Residual                  0.9246


The two outputs report essentially the same thing.
Note that e.g, UN(2,1) = 28.2477 approx= .803*4.07934*8.62529
(And, as usual, the fixed effects estimates match too once the
contrasts and 'types' of SS for an ANOVA table are set up)

UN is short for 'unstructured' - a term Doug has pointed out is not
particularly fitting because the covariance matrix is symmetric
positive definite.

>
>  It seems to me to be the key to why I've had such trouble in the past
>  in grappling with mixed models in R.  I.e. I've been thinking like
>  the Package-Which-Must-Not-Be-Named --- that the simple,
> everything-independent
>  model was the only possible model.
>

Although this may well not apply to you, another area of confusion
arises not so much from differences between stats packages but by
differences between methods. I'm not an expert in the estimation
methods but, as I understand it, classic texts describe fitting mixed
models in terms of ANOVA in the OLS framework, calculating method of
moments estimators for the variances of the random effects by equating
observed and expected mean squares (I believe using aov and lm with an
'Error' term would fall into this category, and proc anova and proc
glm would also).  Starting in the 90's these methods started falling
out of fashion in favor of ML/REML/GLS methods (likelihood based),
which offer more flexibility in structuring both the error and random
effects covariance matrices, will not produce negative variance
estimates, and have other nice properties that someone more 'mathy'
than me could explain.  Tools like lme, lmer, proc mixed and proc
glimmix fall into this category.

hoping this helps,

Kingsford Jones







>  Thanks again.
>
>         cheers,
>
>                 Rolf
>
>
>
>  ######################################################################
>  Attention:\ This e-mail message is privileged and confid...{{dropped:9}}
>
>  ______________________________________________
>  R-help@... mailing list
>  https://stat.ethz.ch/mailman/listinfo/r-help
>  PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
>  and provide commented, minimal, self-contained, reproducible code.
>

______________________________________________
R-help@... mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R-sig-ME] lme nesting/interaction advice

by John Maindonald :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

I've been looking back over this discussion.
Another model that one can fit using lme is:

 > lme(score~Machine, random=list(Worker=pdIdent(~0+Machine)),
+        weights=varIdent(form=~1|Machine), data=Machines)
Linear mixed-effects model fit by REML
  Data: Machines
  Log-restricted-likelihood: -108.9547
  Fixed: score ~ Machine
(Intercept)    MachineB    MachineC
  52.355556    7.966667   13.916667

Random effects:
Formula: ~0 + Machine | Worker
Structure: Multiple of an Identity
        MachineA MachineB MachineC Residual
StdDev:  6.06209  6.06209  6.06209 1.148581

Variance function:
Structure: Different standard deviations per stratum
Formula: ~1 | Machine
Parameter estimates:
        A         B         C
1.0000000 0.8713263 0.5859709
Number of Observations: 54
Number of Groups: 6


This insists (I think) that conditional on the random effect for the  
worker,
the worker/machine random effects be independent,
but allows them to have different variances.  I am wondering whether
it is possible to fit such a model using lmer().

[In this example the large estimated correlations suggest that it is not
a sensible model.]

John Maindonald             email: john.maindonald@...
phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.


On 14 May 2008, at 2:49 AM, Douglas Bates wrote:

> On Mon, May 12, 2008 at 5:39 PM, Rolf Turner  
> <r.turner@...> wrote:
>>
>> On 13/05/2008, at 4:09 AM, Douglas Bates wrote:
>>
>>> I'm entering this discussion late so I may be discussing issues that
>>> have already been addressed.
>>>
>>> As I understand it, Federico, you began by describing a model for  
>>> data
>>> in which two factors have a fixed set of levels and one factor has  
>>> an
>>> extensible, or "random", set of levels and you wanted to fit a model
>>> that you described as
>>>
>>> y ~ effect1 * effect2 * effect3
>>>
>>> The problem is that this specification is not complete.
>>
>>      At *last* (as Owl said to Rabbit) we're getting somewhere!!!
>>
>>      I always knew that there was some basic fundamental point
>>      about this business that I (and I believe many others) were
>>      simply missing.  But I could not for the life of me get anyone
>>      to explain to me what that point was.  Or to put it another
>>      way, I was never able to frame a question that would illuminate
>>      just what it was that I wasn't getting.
>>
>>      I now may be at a stage where I can start asking the right
>>      questions.
>>
>>> An interaction of factors with fixed levels and a factor with random
>>> levels can mean, in the lmer specification,
>>>
>>> lmer(y ~ effect1 * effect2 + (1| effect3) + (1|
>>> effect1:effect2:effect3),
>>> ...)
>>>
>>> or
>>>
>>> lmer(y ~ effect1 * effect2 + (effect1*effect2 | effect3), ...)
>>>
>>> or other variations.  When you specify a random effect or an random
>>> interaction term you must, either explicitly or implicitly, specify
>>> the form of the variance-covariance matrix associated with those
>>> random effects.
>>>
>>> The "advantage" that other software may provide for you is that it
>>> chooses the model for you but that, of course, means that you only
>>> have the one choice.
>>
>>      Now may I start asking what I hope are questions that will lift
>>      the fog a bit?
>>
>>      Let us for specificity consider a three-way model with two
>>      fixed effects and one random effect from the good old  
>> Rothamstead
>> style
>>      agricultural experiment context:  Suppose we have a number of
>>      species/breeds of wheat (say) and a number of fertilizers.
>>      These are fixed effects.  And we have a number of fields  
>> (blocks)
>>      --- a random effect.  Each breed-fertilizer combination is
>>      applied a number of times in each field.  We ***assume*** that
>>      that the field or block effect is homogeneous throughout.  This
>>      may or may not be a ``good'' assumption, but it's not completely
>>      ridiculous and would often be made in practice.  And probably
>>      *was* made at Rothamstead.  The response would be something like
>>      yield in bushels per acre.
>>
>>      The way that I would write the ``full'' model for this setting,
>>      in mathematical style is:
>>
>>      Y_ijkl = mu + alpha_i + beta_j + (alpha.beta)_ij + C_k +  
>> (alpha.C)_ik
>>                  + (beta.C)_jk + (alpha.beta.C)_ijk + E_ijkl
>>
>>      The alpha_i and beta_j are parameters corresponding to breed and
>> fertilizer
>>      respectively; the C_k are random effects corresponding to  
>> fields or
>> blocks.
>>      Any effect ``involving'' C is also random.
>>
>>      The assumptions made by the Package-Which-Must-Not-Be-Named  
>> are (I
>> think)
>>      that
>>
>>              C_k ~ N(0,sigma_C^2)
>>              (alpha.C)_ik ~ N(0,sigma_aC^2)
>>              (beta.C)jk ~ N(0,sigma_bC^2)
>>              (alpha.beta.C)_ijk ~ N(0,sigma_abC^2)
>>              E_ijkl ~ N(0,sigma^2)
>>
>>      and these random variables are *all independent*.
>>
>>      Ahhhhhhhh ... perhaps I'm on the way to answering my own  
>> question.
>> Is
>>      it this assumption of ``all independent'' which is  
>> questionable?  It
>>      seemed innocent enough when I first learned about this stuff, lo
>> these
>>      many years ago.  But .... mmmmmaybe not!
>>
>>      To start with:  What would be the lmer syntax to fit the  
>> foregoing
>>      (possibly naive) model?  I am sorry, but I really cannot get  
>> my head
>>      around the syntax of lmer model specification, and I've  
>> tried.  I
>>      really have.  Hard.  I know I must be starting from the wrong  
>> place,
>>      but I haven't a clue as to what the *right* place to start  
>> from is.
>>      And if I'm in that boat, I will wager Euros to pretzels that  
>> there
>>      are others in it.  I know that I'm not the brightest bulb in the
>>      chandelier, but I'm not the dullest either.
>
> Thanks for the questions, Rolf.  I completely agree that mixed model
> specification can be an extremely confusing area.
>
> Let's consider a set of models for the Machines data reproduced (from
> Milliken and Johnson) in Pinheiro and Bates and available in the MEMSS
> package.
>
>> library(lme4)
> Loading required package: Matrix
> Loading required package: lattice
>
> Attaching package: 'Matrix'
>
>
> The following object(s) are masked from package:stats :
>
> xtabs
>> data("Machines", package = "MEMSS")
>> str(Machines)
> 'data.frame': 54 obs. of  3 variables:
> $ Worker : Factor w/ 6 levels "1","2","3","4",..: 1 1 1 2 2 2 3 3 3  
> 4 ...
> $ Machine: Factor w/ 3 levels "A","B","C": 1 1 1 1 1 1 1 1 1 1 ...
> $ score  : num  52 52.8 53.1 51.8 52.8 53.1 60 60.2 58.4 51.1 ...
>
> We consider the Machine factor to have a fixed set of levels in that
> we only consider these three machines.  The levels of the Worker
> factor represent a sample from the set of potential operators.  As you
> might imagine from this description, I now think of the distinction
> between "fixed" and "random" as being associated with the factor, not
> necessarily the "effects".
>
> If you plot these data
>> dotplot(reorder(Worker, score) ~ score, Machines, groups = Machine,  
>> type = c("g", "p", "a"), xlab = "Efficiency score", ylab =  
>> "Worker", auto.key = list(columns = 3, lines = TRUE))
>
> (resulting PDF enclosed) you will see evidence of an interaction.
> That is, some workers have noticeably different score patterns on the
> three machines than do others.  Worker 6 on machine B is the most
> striking example.
>
> One way to model this interaction is to say that there is a random
> effect for each worker and a separate random effect for each
> worker/machine combination.  If the random effects for the
> worker/machine combinations are assumed to be independent with
> constant variance then one expresses the model as
>
>> print(fm1 <- lmer(score ~ Machine + (1|Worker) + (1|
>> Machine:Worker), Machines), corr = FALSE)
> Linear mixed model fit by REML
> Formula: score ~ Machine + (1 | Worker) + (1 | Machine:Worker)
>  Data: Machines
>  AIC   BIC logLik deviance REMLdev
> 227.7 239.6 -107.8    225.5   215.7
> Random effects:
> Groups         Name        Variance Std.Dev.
> Machine:Worker (Intercept) 13.90963 3.72956
> Worker         (Intercept) 22.85528 4.78072
> Residual                    0.92464 0.96158
> Number of obs: 54, groups: Machine:Worker, 18; Worker, 6
>
> Fixed effects:
>           Estimate Std. Error t value
> (Intercept)   52.356      2.486  21.062
> MachineB       7.967      2.177   3.659
> MachineC      13.917      2.177   6.393
>
> An equivalent formulation is
>> print(fm1 <- lmer(score ~ Machine + (1|Worker/Machine), Machines),  
>> corr = FALSE)
> Linear mixed model fit by REML
> Formula: score ~ Machine + (1 | Worker/Machine)
>  Data: Machines
>  AIC   BIC logLik deviance REMLdev
> 227.7 239.6 -107.8    225.5   215.7
> Random effects:
> Groups         Name        Variance Std.Dev.
> Machine:Worker (Intercept) 13.90963 3.72956
> Worker         (Intercept) 22.85528 4.78072
> Residual                    0.92464 0.96158
> Number of obs: 54, groups: Machine:Worker, 18; Worker, 6
>
> Fixed effects:
>           Estimate Std. Error t value
> (Intercept)   52.356      2.486  21.062
> MachineB       7.967      2.177   3.659
> MachineC      13.917      2.177   6.393
>
> The expression (1|Worker/Machine) is just "syntactic sugar".  It is
> expanded to (1|Worker) + (1|Machine:Worker) before the model matrices
> are created.
>
> If you want to start with the formula and see what that means for the
> model then use these rules:
>
> - a term including the '|' operator is a random effects term
> - if the left-hand side of the '|' operator is 1 then the random
> effects are scalar random effects, one for each level of the factor on
> the right of the '|'
> - random effects associated with different terms are independent
> - random effects associated with different levels of the factor within
> a term are independent
> - the variance of the random effects within the same term is constant
>
> However, there is another mixed-effects model that could make sense
> for these data.  Suppose I consider the variations associated with
> each worker as a vector of length 3 (Machines A, B and C) with a
> symmetric, positive semidefinite 3 by 3 variance-covariance matrix.  I
> fit that model as
>
>> print(fm2 <- lmer(score ~ Machine + (Machine|Worker), Machines),  
>> corr = FALSE)
> Linear mixed model fit by REML
> Formula: score ~ Machine + (Machine | Worker)
>  Data: Machines
>  AIC   BIC logLik deviance REMLdev
> 228.3 248.2 -104.2    216.6   208.3
> Random effects:
> Groups   Name        Variance Std.Dev. Corr
> Worker   (Intercept) 16.64051 4.07928
>         MachineB    34.54670 5.87764   0.484
>         MachineC    13.61398 3.68971  -0.365  0.297
> Residual              0.92463 0.96158
> Number of obs: 54, groups: Worker, 6
>
> Fixed effects:
>           Estimate Std. Error t value
> (Intercept)   52.356      1.681  31.151
> MachineB       7.967      2.421   3.291
> MachineC      13.917      1.540   9.037
>
> It may be more meaningful to write it as
>
>> print(fm3 <- lmer(score ~ Machine + (0+Machine|Worker), Machines),  
>> corr = FALSE)
> Linear mixed model fit by REML
> Formula: score ~ Machine + (0 + Machine | Worker)
>  Data: Machines
>  AIC   BIC logLik deviance REMLdev
> 228.3 248.2 -104.2    216.6   208.3
> Random effects:
> Groups   Name     Variance Std.Dev. Corr
> Worker   MachineA 16.64097 4.07933
>         MachineB 74.39557 8.62529  0.803
>         MachineC 19.26646 4.38936  0.623 0.771
> Residual           0.92463 0.96158
> Number of obs: 54, groups: Worker, 6
>
> Fixed effects:
>           Estimate Std. Error t value
> (Intercept)   52.356      1.681  31.150
> MachineB       7.967      2.421   3.291
> MachineC      13.917      1.540   9.037
>
> Now we are fitting 3 variances and 3 covariances for the random
> effects instead of the previous models which had two variances.  The
> difference in the models is exactly what made you pause - the simpler
> model assumes that, conditional on the random effect for the worker,
> the worker/machine random effects are independent and have constant
> variance.  In the more general models the worker/machine interactions
> are allowed to be correlated within worker.
>
> It is more common to allow this kind of correlation within subject in
> models for longitudinal data (the Laird-Ware formulation) where each
> subject has a random effect for the intercept and a random effect for
> the slope with respect to time and these can be correlated.  However,
> this type of representation can make sense with a factor on the left
> hand side of the '|' operator, like the Machine factor here.  If that
> factor has a large number of levels then the model quickly becomes
> unwieldy because the number of variance-covariance parameters to
> estimate is quadratic in the number of levels of the factor on the
> lhs.
>
> I hope this helps.
>
>>      Having got there:  Presuming that I'm more-or-less on the  
>> right track
>>      in my foregoing conjecture that it's the over-simple dependence
>> structure
>>      that is the problem with what's delivered by the
>> Package-Which-Must-Not-Be-Named,
>>      how might one go about being less simple-minded?  I.e. what  
>> might be
>> some
>>      more realistic dependence structures, and how would one  
>> specify these
>> in lmer?
>>      And how would one assess whether the assumed dependence  
>> structure
>> gives
>>      a reasonable fit to the data?
>>
>>> If you can describe how many variance components you think should be
>>> estimated in your model and what they would represent then I think  
>>> it
>>> will be easier to describe how to fit the model.
>>
>>      How does this fit in with my conjecture (above) about what  
>> I've been
>>      missing all these years?  Does it fit?  How many variance  
>> components
>>      are there in the ``naive'' model?  It looks like 5 to me ... but
>> maybe
>>      I'm totally out to lunch in what I think I'm understanding at  
>> this
>> stage.
>>      (And besides --- there are three sorts of statistician; those  
>> who can
>> count,
>>      and those who can't.)
>>
>>      Thank you for your indulgence.
>>
>>               cheers,
>>
>>                      Rolf Turner
>>
>> ######################################################################
>> Attention:This e-mail message is privileged and confidential. If  
>> you are not
>> theintended recipient please delete the message and notify the  
>> sender.Any
>> views or opinions presented are solely those of the author.
>>
>> This e-mail has been scanned and cleared by
>> MailMarshalwww.marshalsoftware.com
>> ######################################################################
>>
> <Machines.pdf>_______________________________________________
> R-sig-mixed-models@... mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

______________________________________________
R-help@... mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
< Prev | 1 - 2 | Next >