matching predictors and dummies


by Jeroen Ooms

I am trying to make a little web interface for the lm() function. It calculates both ANOVA F-tests and parameter estimates and returns them in a nice table. However, I have a problem matching the ANOVA predictors with the regression coefficients. For numeric predictors there is no problem: the coefficients have the same names as the predictors. However, when a factor is specified as an IV, lm() automatically converts it to dummy variables, which (of course) have different names than the original predictor. The lm model that is returned contains a separate parameter for every dummy variable.

Then when you use anova(lm.model), the function seems to know which of the parameters are dummies of one and the same factor, and takes these together in the ANOVA test. The anova() function returns the variance explained by the original factor, that is, by all of its dummies together. It no longer shows the separate dummy variables. Of course, this is exactly what you want in an analysis of variance.

My question is: where in the lm or glm object is it stored which of the parameters are dummies of the same factor? The only thing I could think of was lm.model$xlevels; however, manipulating these names in the lm model did not confuse anova() at all, so I guess there is a better way.

An additional question: is it possible to specify the names of the dummy variables that lm/glm creates when a factor is specified as an IV?

Re: matching predictors and dummies

by Charles C. Berry

On Fri, 11 Jul 2008, Jeroen Ooms wrote:

>
> I am trying to make a little web interface for the lm() function. It
> calculates both anova F-tests and parameters and returns it in a nice table.
> However, I have a problem with matching the Anova predictors with the
> regression coefficients: For numeric predictors there is no problem: the
> coefficients have the same names as the predictors. However, when a factor
> IV is specified, lm() automatically converts this factor to dummy variables,
> which (of course) have different names than the original predictor. The lm
> model that is returned contains a separate parameter for every dummy
> variable.
>
> Then when you use anova(lm.model) the function seems to know which of the
> parameters are dummies of one and the same factor, and takes these together
> in the anova-test. The anova() function returns the variance explained by
> the original factor, which are all dummies. It does not show the separate
> dummy variables anymore. Of course, this is exactly what you want in an
> analysis of variance.
>
> My question is: where in the lm or glm object is stored which of the
> parameters are dummies of the same factor? The only thing I could think of
> was using lm.model$xlevels, however manipulating these names in the lm-model
> did not confuse anova() at all, so I guess there is a better way.

See
  ?terms
  ?terms.object

and run

  example( terms.object )

or something like

  terms( lm( Ozone ~ Temp + factor(Month), airquality ) )
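As a rough sketch of what the terms object holds, the attributes below can be inspected directly. The variable names fit, tt and term_labels are illustrative only; airquality is the dataset from the example above and ships with R.

```r
# Fit the model from the example above (airquality ships with base R)
fit <- lm(Ozone ~ Temp + factor(Month), data = airquality)
tt <- terms(fit)

# One label per model term: the factor appears once, not once per dummy
term_labels <- attr(tt, "term.labels")
print(term_labels)

# Variables-by-terms incidence matrix
print(attr(tt, "factors"))

# Class of each variable in the model frame (numeric, factor, ...)
print(attr(tt, "dataClasses"))
```

Note that these attributes describe terms, not individual coefficients, so on their own they do not say which dummy belongs to which factor.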

>
> An additional question is if it is possible to specify the names of the
> dummy variables that lm/glm creates when a factor is specified as IV?

I'm guessing a custom contrast function would do this. Have a look at

  ?contrasts
  page( contr.treatment, 'print' )

Or just hack the names attribute of the relevant pieces in the object
returned by lm/glm.
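A minimal sketch of the custom-contrast idea, assuming that the coefficient names are built from the term name followed by the column names of the factor's contrast matrix. The data frame d, the matrix ctr and the ".is." naming scheme are made up for illustration.

```r
# Work on a copy of airquality with Month stored as a factor
d <- airquality
d$Month <- factor(d$Month)

# Start from the default treatment contrasts and rename the columns;
# the column names become the suffixes of the dummy coefficient names.
ctr <- contr.treatment(levels(d$Month))
colnames(ctr) <- paste0(".is.", colnames(ctr))
contrasts(d$Month) <- ctr

fit <- lm(Ozone ~ Temp + Month, data = d)
coef_names <- names(coef(fit))
print(coef_names)
```

The same matrix could probably also be passed through the contrasts argument of lm(), for example contrasts = list(Month = ctr), instead of being attached to the factor.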

You do know to use str(), right?


HTH,

Chuck


> View this message in context: http://www.nabble.com/matching-predictors-and-dummies-tp18405023p18405023.html
> Sent from the R devel mailing list archive at Nabble.com.
>
> ______________________________________________
> R-devel@... mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

Charles C. Berry                            (858) 534-2098
                                             Dept of Family/Preventive Medicine
E mailto:cberry@...            UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901


Re: matching predictors and dummies

by Jeroen Ooms

Charles C. Berry wrote:
> See
>   ?terms
>   ?terms.object

I am sorry, but I cannot figure out how to find out which coefficients belong to which predictors using model$terms. If I do attributes(model$terms) I get a nice list which contains the original factors and some more information, and from model$xlevels I can see the levels of these factors. However, it does not say anywhere which of the model$coefficients belong to which factor. The only thing I can imagine is assuming that the dummy names are simply the concatenation of the factor name and the level name, which seems to be the default behavior. But preferably I would not want my program to rely on this assumption. How could I use model$terms to extract which coefficients belong to which factor, the way anova() does it?

Re: matching predictors and dummies

by Mark Difford

Hi Jeroen,

>> How could I use model$terms to extract which coefficients belong to which factor,
>> the way anova() does it?

There may be a simpler ("canned") way to do it, but why don't you debug anova.lm to see how it does it?

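## list the available anova methods
methods("anova")
## print the source of the lm method; in recent versions of R it is not
## exported, so typing anova.lm on its own may fail
getAnywhere("anova.lm")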

HTH, Mark.
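
For what it is worth, reading the anova.lm source suggests that the grouping comes from the "assign" attribute of the model matrix. The sketch below is one possible way to use it, not necessarily how anova() does it internally. The names fit, mm, asgn, labels and coef_term are illustrative.

```r
fit <- lm(Ozone ~ Temp + factor(Month), data = airquality)

# The design matrix carries an "assign" attribute: one integer per column,
# 0 for the intercept and k for the k-th entry of term.labels.
mm <- model.matrix(fit)
asgn <- attr(mm, "assign")

# Translate the integer codes into term labels, named by coefficient
labels <- c("(Intercept)", attr(terms(fit), "term.labels"))
coef_term <- setNames(labels[asgn + 1L], colnames(mm))
print(coef_term)

# Group the coefficient names by the term they belong to
print(split(names(coef_term), coef_term))
```

This avoids parsing the dummy names, so it should keep working when custom contrasts change how the coefficients are named.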
