Reading multiple tables from file

by Gerrit Draisma

Dear R-users,
I have output files having a variable number of tables
in the following format:
-------------
1
Pietje
I1 I2 Value
1  1  0.11
1  2  0.12
2  1  0.21

2
Jantje
I1 I2 I3 Value
1  1  1  0.111
3  3  3  0.333
...
-------------

Would there be an easy way
of turning this into (a list of) data.frames
with names Pietje, Jantje
and variables I1,I2,...Value?
(I1,I2 are string or categorical,
Value is double).

I used a sed script to extract the tables
from the file into separate files,
but would rather be able to process
the output files directly.

Thanks,
Gerrit.

--
Gerrit Draisma
Department of Public Health
Erasmus MC, University Medical Center Rotterdam
Room AE-103
P.O. Box 2040 3000 CA  Rotterdam The Netherlands
Phone: +31 10 7043124 Fax: +31 10 010-7038474
http://mgzlx4.erasmusmc.nl/pwp/?gdraisma

______________________________________________
R-help@... mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Reading multiple tables from file

by jholtman

Will this do it for you?

> x <- readLines(textConnection("1
+ Pietje
+ I1 I2 Value
+ 1  1  0.11
+ 1  2  0.12
+ 2  1  0.21
+
+ 2
+ Jantje
+ I1 I2 I3 Value
+ 1  1  1  0.111
+ 3  3  3  0.333"))
> closeAllConnections()
> start <- grep("^[[:digit:]]+$", x)
> mark <- vector('integer', length(x))
> mark[start] <- 1
> # determine limits of each table
> mark <- cumsum(mark)
> # split the data for reading
> df <- lapply(split(x, mark), function(.data){
+     con <- textConnection(.data)
+     on.exit(close(con))  # do not leave the connection open
+     .input <- read.table(con, skip=2, header=TRUE)
+     attr(.input, 'name') <- .data[2]  # save the name
+     .input
+ })
> # rename the list
> names(df) <- sapply(df, attr, 'name')
> df
$Pietje
  I1 I2 Value
1  1  1  0.11
2  1  2  0.12
3  2  1  0.21

$Jantje
  I1 I2 I3 Value
1  1  1  1 0.111
2  3  3  3 0.333
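
The same idea can be applied directly to an output file instead of an inline string. The following is only a sketch under a few assumptions that are not in the original posts: the helper name read_tables is made up for illustration, every table starts with a line holding nothing but its number, the value column is always called Value, and R is 3.2.0 or later (for trimws). It also converts the index columns to factors, since the question asks for I1, I2, ... to be categorical.

```r
# Hypothetical helper (not from the thread): read all tables in `path`
# into a named list of data frames.
read_tables <- function(path) {
  x <- trimws(readLines(path))
  x <- x[nzchar(x)]                          # drop blank separator lines
  # A line consisting only of digits starts a new table.
  grp <- cumsum(grepl("^[[:digit:]]+$", x))
  keep <- grp > 0                            # ignore anything before the first table
  blocks <- split(x[keep], grp[keep])
  tabs <- lapply(blocks, function(b) {
    # b[1] is the table number, b[2] the table name, b[3] the header row.
    d <- read.table(text = b[-(1:2)], header = TRUE,
                    colClasses = "character")
    idx <- names(d) != "Value"               # assumed name of the numeric column
    d[idx] <- lapply(d[idx], factor)         # index columns become categorical
    d$Value <- as.numeric(d$Value)
    d
  })
  names(tabs) <- vapply(blocks, function(b) b[2], character(1),
                        USE.NAMES = FALSE)
  tabs
}

# Usage with the sample data from the question, written to a temporary file.
tmp <- tempfile()
writeLines(c("1", "Pietje", "I1 I2 Value",
             "1  1  0.11", "1  2  0.12", "2  1  0.21",
             "",
             "2", "Jantje", "I1 I2 I3 Value",
             "1  1  1  0.111", "3  3  3  0.333"), tmp)
tabs <- read_tables(tmp)
str(tabs$Jantje)
```

Reading every column as character first and converting afterwards keeps the index columns from being silently read as integers. If the literal "..." or the dashed rules from the example really occur in the file, they would have to be filtered out before the split.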



On Thu, May 8, 2008 at 7:40 AM, G. Draisma <g.draisma@...> wrote:

> [...]

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?


Re: Reading multiple tables from file

by Gerrit Draisma

Dear Mr Holtman,
Thanks very much for your help.
This is what I was looking for.
Gerrit.

on 2008-05-08 20:31 jim holtman said the following:

> [...]
