Coefficients of Logistic Regression from bootstrap - how to get them?

Coefficients of Logistic Regression from bootstrap - how to get them?

by Michal Figurski

Hello all,

I am trying to optimize my logistic regression model using the bootstrap.
I previously used SAS for this kind of task, but I am now
switching to R.

My data frame consists of 5 columns and has 109 rows. Each row is a
single record composed of the following values: Subject_name, numeric1,
numeric2, numeric3 and outcome (yes or no). All three numerics are used
to predict outcome using LR.

In SAS I wrote a macro that split the dataset, ran LR on one half of
the data and made predictions on the second half. It collected the
equation coefficients from each bootstrap iteration. Afterwards I took
the medians of these coefficients across all iterations and used them
as an optimal model - it really worked well!

Now I want to do the same in R. I tried the 'validate' and
'calibrate' functions from package "Design", and I also experimented
with the function 'sm.binomial.bootstrap' from package "sm". I also tried
the function 'boot' from package "boot", though without success - in my
case it randomly selected _columns_ from my data frame, while I wanted
it to select _rows_.

The main point here, though, is the optimized LR equation. I would
appreciate any help on how to extract the LR equation coefficients from
any of these bootstrap functions, in the same form as given by 'glm' or
'lrm'.
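
A minimal sketch of row resampling with 'boot', under stated assumptions: the data are simulated to mimic the layout described above (109 rows, three numeric predictors, a yes/no outcome), and the values, the seed and the helper name coef_fun are made up for illustration. 'boot' hands the statistic function a vector of row indices, so the function has to subset with data[idx, ]. Indexing a data frame as data[idx] selects columns instead, which may be (this is a guess) what went wrong in the attempt described. In the result, t0 holds the coefficients from the fit to the full data and t holds one row of coefficients per resample.

```r
library(boot)  # recommended package, normally installed with R

set.seed(1)
# Simulated stand-in for the data described in the post (values are made up).
n <- 109
dat <- data.frame(
  Subject_name = paste0("S", seq_len(n)),
  numeric1 = rnorm(n),
  numeric2 = rnorm(n),
  numeric3 = rnorm(n)
)
lin <- -0.5 + 1.0 * dat$numeric1 - 0.8 * dat$numeric2 + 0.3 * dat$numeric3
dat$outcome <- factor(ifelse(runif(n) < plogis(lin), "yes", "no"),
                      levels = c("no", "yes"))

# Hypothetical helper: refit the model on the resampled ROWS and
# return the coefficients in the same form as coef(glm(...)).
coef_fun <- function(data, idx) {
  fit <- glm(outcome ~ numeric1 + numeric2 + numeric3,
             data = data[idx, ], family = binomial)
  coef(fit)
}

b <- boot(dat, statistic = coef_fun, R = 200)

b$t0        # coefficients from the fit to the full data (the MLE)
dim(b$t)    # 200 x 4: one row of coefficients per bootstrap resample

# Per-coefficient medians across resamples, as asked for in the post.
boot_medians <- apply(b$t, 2, median)
names(boot_medians) <- names(b$t0)
boot_medians
```

Whether the medians are preferable to b$t0 is a separate question from how to compute them; the sketch only shows where each set of numbers lives in the 'boot' object.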

Many thanks in advance!

--
Michal J. Figurski
HUP, Pathology & Laboratory Medicine
Xenobiotics Toxicokinetics Research Laboratory
3400 Spruce St. 7 Maloney
Philadelphia, PA 19104
tel. (215) 662-3413

______________________________________________
R-help@... mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Coefficients of Logistic Regression from bootstrap - how to get them?

by Frank E Harrell Jr

Michal Figurski wrote:

> Hello all,
>
> I am trying to optimize my logistic regression model using the bootstrap.
> I previously used SAS for this kind of task, but I am now
> switching to R.
>
> My data frame consists of 5 columns and has 109 rows. Each row is a
> single record composed of the following values: Subject_name, numeric1,
> numeric2, numeric3 and outcome (yes or no). All three numerics are used
> to predict outcome using LR.
>
> In SAS I wrote a macro that split the dataset, ran LR on one half of
> the data and made predictions on the second half. It collected the
> equation coefficients from each bootstrap iteration. Afterwards I took
> the medians of these coefficients across all iterations and used them
> as an optimal model - it really worked well!

Why not use maximum likelihood estimation, i.e., the coefficients from
the original fit? How does the bootstrap improve on that?

>
> Now I want to do the same in R. I tried the 'validate' and
> 'calibrate' functions from package "Design", and I also experimented
> with the function 'sm.binomial.bootstrap' from package "sm". I also tried
> the function 'boot' from package "boot", though without success - in my
> case it randomly selected _columns_ from my data frame, while I wanted
> it to select _rows_.

The validate and calibrate functions in Design do resample the rows.

Resampling is mainly used to get a nearly unbiased estimate of the model
performance, i.e., to correct for overfitting.
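
To illustrate what "correcting for overfitting" can look like, here is a minimal base-R sketch of an optimism-corrected performance estimate. It is not the code used by 'validate'; the simulated data, the variable names, the choice of AUC as the performance measure and the helper names auc and fit_model are all assumptions made for the example.

```r
set.seed(2)
# Simulated data (made up): 109 rows, three predictors, binary outcome.
n <- 109
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
dat$y <- rbinom(n, 1, plogis(-0.5 + dat$x1 - 0.8 * dat$x2 + 0.3 * dat$x3))

# Area under the ROC curve via the rank (Mann-Whitney) formula.
auc <- function(y, p) {
  r  <- rank(p)
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

fit_model <- function(d) glm(y ~ x1 + x2 + x3, data = d, family = binomial)

# Apparent performance: model fitted and evaluated on the same data.
full_fit <- fit_model(dat)
apparent <- auc(dat$y, predict(full_fit, type = "response"))

# Optimism: how much better a model looks on the resample it was fitted to
# than on the original data, averaged over bootstrap resamples of the rows.
B <- 200
optimism <- replicate(B, {
  boot_dat <- dat[sample(n, replace = TRUE), ]
  boot_fit <- fit_model(boot_dat)
  auc_boot <- auc(boot_dat$y, predict(boot_fit, type = "response"))
  auc_orig <- auc(dat$y, predict(boot_fit, newdata = dat, type = "response"))
  auc_boot - auc_orig
})

corrected <- apparent - mean(optimism)
c(apparent = apparent, optimism = mean(optimism), corrected = corrected)
```

Note that the coefficients one would report are still coef(full_fit); in this scheme the resampling only adjusts the estimate of how well that model performs.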

Frank Harrell

>
> The main point here, though, is the optimized LR equation. I would
> appreciate any help on how to extract the LR equation coefficients from
> any of these bootstrap functions, in the same form as given by 'glm' or
> 'lrm'.
>
> Many thanks in advance!
>


--
Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University

Re: Coefficients of Logistic Regression from bootstrap - how to get them?

by Michal Figurski

Frank,

"How does bootstrap improve on that?"

I don't know, but I have an idea. Since the data in my set are just a
small sample from a big population, if I use my whole dataset to
obtain maximum likelihood estimates, these estimates may be best for
this dataset but far from ideal for the whole population.

I used the bootstrap to virtually increase the size of my dataset, which
should result in estimates closer to those from the population - isn't
that the purpose of the bootstrap?

When I use such median coefficients on another dataset (another sample
from the population), the predictions are better than with the maximum
likelihood estimates. I have already tested that and it worked!

I am not a statistician and I don't have a feel for what "overfitting"
is, but it may be just another word for the same idea.

Nevertheless, I would still like to know how I can get the coefficients
for the model that gives the "nearly unbiased estimates". I would
greatly appreciate your help.

Re: Coefficients of Logistic Regression from bootstrap - how to get them?

by Doran, Harold

> I used the bootstrap to virtually increase the size of my
> dataset, which should result in estimates closer to those
> from the population - isn't that the purpose of the bootstrap?

No, not really. The bootstrap is a resampling method for variance
estimation. It is often used when there is not an easy way, or a closed
form expression, for estimating the sampling variance of a statistic.
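
A small sketch of that use, assuming a made-up skewed sample and the sample median as the statistic (a case where no simple closed-form standard error is available). The names x, B and se_median are illustrative only.

```r
set.seed(3)
x <- rexp(109)   # made-up sample; any numeric vector would do

# Resample the data with replacement and recompute the statistic each time.
B <- 2000
boot_medians <- replicate(B, median(sample(x, replace = TRUE)))

# The spread of the resampled statistics estimates its sampling variability.
se_median <- sd(boot_medians)

# The point estimate is still the median of the original sample.
c(estimate = median(x), boot_se = se_median)
```

Here the bootstrap supplies the standard error, while the estimate itself comes from the original data.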

Re: Coefficients of Logistic Regression from bootstrap - how to get them?

by 刘杰-2

Hi Doran,

Maybe I am wrong, but I think the bootstrap is a general resampling method
that can be used for different purposes. Usually it works well when you do
not have a representative sample set (for example, with a limited number of
samples). Therefore, I agree with Michal.

P.S. Overfitting, in my opinion, describes the situation where you have a
model that is quite specific to the training dataset but does not generalize
to new samples.

Thanks,

--Jerry

Re: Coefficients of Logistic Regression from bootstrap - how to get them?

by Ted.Harding-2

There is one aspect for which bootstrap or re-sampling is useful,
which is not provided by maximum likelihood estimation (and the
usual MLE estimates of SEs of the coefficients).

Namely, the SEs of the coefficients are conditional on the
values of the covariates in the sample. The only random variation
that is considered in producing the SEs in standard regression is
that of the response variable, as implied by the model being fitted.

Hence the MLE will tell you about the uncertainty in the coefficients
due to random response, but with only the exact covariate values which
are present in the sample.

In practice, as has been indicated by other responses, the data are
from a population in which the covariates vary and not all have been
observed, and there is interest in assessing the uncertainty about
the "population coefficients" due to this.

An indication of this (with somewhat uncertain reliability) can be
obtained by a bootstrap procedure, on the basis that sampling from
the sample will have some resemblance to sampling from the population.
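
The comparison described here can be sketched as follows: model-based standard errors from the fitted 'glm' next to standard errors obtained by resampling whole rows (response and covariates together). The simulated data and all variable names are assumptions for illustration, and on any particular dataset the two columns may or may not differ noticeably.

```r
set.seed(4)
# Simulated data (made up): 109 rows, three covariates, binary response.
n <- 109
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
dat$y <- rbinom(n, 1, plogis(-0.5 + dat$x1 - 0.8 * dat$x2 + 0.3 * dat$x3))

# Model-based SEs: conditional on the covariate values actually observed.
fit <- glm(y ~ x1 + x2 + x3, data = dat, family = binomial)
model_se <- sqrt(diag(vcov(fit)))

# Case-resampling bootstrap: draw whole rows with replacement, so the
# covariate values vary from resample to resample as well.
B <- 500
boot_coefs <- t(replicate(B, {
  idx <- sample(n, replace = TRUE)
  coef(glm(y ~ x1 + x2 + x3, data = dat[idx, ], family = binomial))
}))
boot_se <- apply(boot_coefs, 2, sd)

round(cbind(estimate = coef(fit), model_se, boot_se), 3)
```

The estimate column is the ordinary maximum likelihood fit; only the uncertainty attached to it changes between the two SE columns.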

Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding@...>
Fax-to-email: +44 (0)870 094 0861
Date: 21-Jul-08                                       Time: 21:11:10
------------------------------ XFMail ------------------------------

Re: Coefficients of Logistic Regression from bootstrap - how to get them?

by Doran, Harold

Well, here is a good source: Wikipedia.
 
http://en.wikipedia.org/wiki/Bootstrapping_(statistics)


Re: Coefficients of Logistic Regression from bootstrap - how to get them?

by Frank E Harrell Jr

Michal Figurski wrote:
> Frank,
>
> "How does bootstrap improve on that?"
>
> I don't know, but I have an idea. Since the data in my set are just a
> small sample from a big population, if I use my whole dataset to
> obtain maximum likelihood estimates, these estimates may be best for
> this dataset but far from ideal for the whole population.

The bootstrap, being a resampling procedure from your sample, has the
same issues about the population as MLEs.

>
> I used the bootstrap to virtually increase the size of my dataset, which
> should result in estimates closer to those from the population - isn't
> that the purpose of the bootstrap?

No

>
> When I use such median coefficients on another dataset (another sample
> from the population), the predictions are better than with the maximum
> likelihood estimates. I have already tested that and it worked!

Then your testing procedure is probably not valid.

>
> I am not a statistician and I don't have a feel for what "overfitting"
> is, but it may be just another word for the same idea.
>
> Nevertheless, I would still like to know how I can get the coefficients
> for the model that gives the "nearly unbiased estimates". I would
> greatly appreciate your help.

More info in my book Regression Modeling Strategies.

Frank

Re: Coefficients of Logistic Regression from bootstrap - how to get them?

by Michal Figurski

Dear all,

I don't want to argue with anybody about words or about what bootstrap
is suitable for - I know too little for that.

All I need is help to get the *equation coefficients* optimized by
bootstrap - either by one of the functions or by simple median.

Please help,

Re: Coefficients of Logistic Regression from bootstrap - how to get them?

by Doran, Harold

I think the answer has been given to you. If you want to continue to
ignore that advice and use bootstrap for point estimates rather than the
properties of those estimates (which is what bootstrap is for) then you
are on your own.

Re: Coefficients of Logistic Regression from bootstrap - how to get them?

by Michal Figurski

Hmm...

It sounds like ideology to me. I was asking for technical help. I know
what I want to do, just don't know how to do it in R. I'll go back to
SAS then. Thank you.

--
Michal J. Figurski

Doran, Harold wrote:

> I think the answer has been given to you. If you want to continue to
> ignore that advice and use bootstrap for point estimates rather than the
> properties of those estimates (which is what bootstrap is for) then you
> are on your own.
>
>> -----Original Message-----
>> From: r-help-bounces@...
>> [mailto:r-help-bounces@...] On Behalf Of Michal Figurski
>> Sent: Tuesday, July 22, 2008 9:52 AM
>> To: r-help@...
>> Subject: Re: [R] Coefficients of Logistic Regression from
>> bootstrap - how to get them?
>>
>> Dear all,
>>
>> I don't want to argue with anybody about words or about what
>> bootstrap is suitable for - I know too little for that.
>>
>> All I need is help to get the *equation coefficients*
>> optimized by bootstrap - either by one of the functions or by
>> simple median.
>>
>> Please help,
>>
>> --
>> Michal J. Figurski
>> HUP, Pathology & Laboratory Medicine
>> Xenobiotics Toxicokinetics Research Laboratory 3400 Spruce
>> St. 7 Maloney Philadelphia, PA 19104 tel. (215) 662-3413
>>
>> Frank E Harrell Jr wrote:
>>> Michal Figurski wrote:
>>>> Frank,
>>>>
>>>> "How does bootstrap improve on that?"
>>>>
>>>> I don't know, but I have an idea. Since the data in my set
>> are just a
>>>> small sample of a big population, then if I use my whole
>> dataset to
>>>> obtain max likelihood estimates, these estimates may be
>> best for this
>>>> dataset, but far from ideal for the whole population.
>>> The bootstrap, being a resampling procedure from your
>> sample, has the
>>> same issues about the population as MLEs.
>>>
>>>> I used bootstrap to virtually increase the size of my dataset, it
>>>> should result in estimates more close to that from the population -
>>>> isn't it the purpose of bootstrap?
>>> No
>>>
>>>> When I use such median coefficients on another dataset (another
>>>> sample from population), the predictions are better, than using max
>>>> likelihood estimates. I have already tested that and it worked!
>>> Then your testing procedure is probably not valid.
>>>
>>>> I am not a statistician and I don't feel what "overfitting" is, but
>>>> it may be just another word for the same idea.
>>>>
>>>> Nevertheless, I would still like to know how can I get the
>>>> coefficients for the model that gives the "nearly unbiased estimates".
>>>> I greatly appreciate your help.
>>> More info in my book Regression Modeling Strategies.
>>>
>>> Frank
>>>
>>>> --
>>>> Michal J. Figurski
>>>> HUP, Pathology & Laboratory Medicine
>>>> Xenobiotics Toxicokinetics Research Laboratory
>>>> 3400 Spruce St. 7 Maloney
>>>> Philadelphia, PA 19104
>>>> tel. (215) 662-3413
>>>>
>>>> Frank E Harrell Jr wrote:
>>>>> Michal Figurski wrote:
>>>>>> Hello all,
>>>>>>
>>>>>> I am trying to optimize my logistic regression model by using
>>>>>> bootstrap. I was previously using SAS for this kind of tasks, but I
>>>>>> am now switching to R.
>>>>>>
>>>>>> My data frame consists of 5 columns and has 109 rows. Each row is a
>>>>>> single record composed of the following values: Subject_name,
>>>>>> numeric1, numeric2, numeric3 and outcome (yes or no). All three
>>>>>> numerics are used to predict outcome using LR.
>>>>>>
>>>>>> In SAS I have written a macro, that was splitting the dataset,
>>>>>> running LR on one half of data and making predictions on second
>>>>>> half. Then it was collecting the equation coefficients from each
>>>>>> iteration of bootstrap. Later I was just taking medians of these
>>>>>> coefficients from all iterations, and used them as an optimal model
>>>>>> - it really worked well!
>>>>> Why not use maximum likelihood estimation, i.e., the coefficients
>>>>> from the original fit.  How does the bootstrap improve on that?
>>>>>
>>>>>> Now I want to do the same in R. I tried to use the 'validate' or
>>>>>> 'calibrate' functions from package "Design", and I also
>>>>>> experimented with function 'sm.binomial.bootstrap' from package
>>>>>> "sm". I tried also the function 'boot' from package "boot", though
>>>>>> without success - in my case it randomly selected _columns_ from my
>>>>>> data frame, while I wanted it to select _rows_.
>>>>> validate and calibrate in Design do resampling on the rows
>>>>>
>>>>> Resampling is mainly used to get a nearly unbiased estimate of the
>>>>> model performance, i.e., to correct for overfitting.
>>>>>
>>>>> Frank Harrell
>>>>>
>>>>>> Though the main point here is the optimized LR equation. I would
>>>>>> appreciate any help on how to extract the LR equation coefficients
>>>>>> from any of these bootstrap functions, in the same form as given by
>>>>>> 'glm' or 'lrm'.
>>>>>>
>>>>>> Many thanks in advance!
>>>>>>
>>>>>
>>>


Re: Coefficients of Logistic Regression from bootstrap - how to get them?

by Doran, Harold

Probably a good idea for you. The R help list is useful for both
programming AND statistical advice for those who want it.
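
For readers who arrive with the same programming question, the mechanics can be sketched in base R. This is only a sketch under stated assumptions: the data frame `dat` is simulated to stand in for the 109-row data described in the thread, the names `form`, `B`, `boot_coefs`, `boot_median` and `boot_se` are made up for illustration, and rows are resampled with replacement rather than split in half as in the SAS macro. It collects the coefficients from each refit and prints their medians next to the full-data maximum likelihood estimates, together with the bootstrap standard errors, which are the kind of property of the estimates that the replies say the bootstrap is meant for.

```r
# Hypothetical stand-in for the 109-row data frame described in the thread;
# the simulated values and true coefficients below are assumptions.
set.seed(42)
n <- 109
dat <- data.frame(
  Subject_name = paste0("S", seq_len(n)),
  numeric1 = rnorm(n),
  numeric2 = rnorm(n),
  numeric3 = rnorm(n)
)
lin <- -0.3 + 0.8 * dat$numeric1 - 0.5 * dat$numeric2 + 0.3 * dat$numeric3
dat$outcome <- factor(ifelse(runif(n) < plogis(lin), "yes", "no"),
                      levels = c("no", "yes"))

form <- outcome ~ numeric1 + numeric2 + numeric3

# Maximum likelihood fit on the full data set (the "original fit").
full_fit <- glm(form, data = dat, family = binomial)

# Bootstrap: resample ROWS with replacement, refit, keep the coefficients.
B <- 200
boot_coefs <- t(replicate(B, {
  idx <- sample(nrow(dat), replace = TRUE)  # row indices, not columns
  coef(glm(form, data = dat[idx, ], family = binomial))
}))

# boot_coefs has one row per replicate and one column per coefficient.
boot_median <- apply(boot_coefs, 2, median)  # medians of the coefficients
boot_se     <- apply(boot_coefs, 2, sd)      # bootstrap standard errors

print(round(cbind(mle = coef(full_fit), boot_median, boot_se), 3))
```

The same idea works with 'boot' from package "boot": the statistic function receives the data and a vector of indices, and the rows are selected with `data[indices, ]`. Writing `data[indices]` on a data frame selects columns instead, which may be what happened in the original attempt. The per-replicate coefficients are then stored in the `t` component of the returned object.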

 
