|
View:
New views
7 Messages
—
Rating Filter:
Alert me
|
|
|
question on zoo data manipulationHi Zoo-experts,
I am working on the data-set below. Ticker Date BrokerName Acc_Yr Measure lag XXX 20080320 BRK1 200806 2.2 0 XXX 20080320 BRK1 200906 2.5 0 XXX 20080320 BRK2 200806 2.3 0 XXX 20080320 BRK2 200906 2.8 0 XXX 20080320 BRK3 200806 3.3 0 XXX 20080218 BRK1 200806 2.2 1 XXX 20080218 BRK1 200906 2.5 1 XXX 20080218 BRK2 200806 2.4 1 XXX 20080218 BRK2 200906 2.8 1 Using zoo object, Is there a quicker/efficient way of manipulating the data as per following criteria? 1) For any given date/lag - compute mean of column "measure" grouped by different broker & different accounting year? so the output data-set should look like: Ticker Date Mean Measure Acc_Yr Lag XXX 20080320 2.6 200806 0 2) For any lag >= 1, calculate returns on aggregate "measure" constrained on "intersection" of broker-name across lag 0 & lag 1 (so BRK3 should drop out) ? i.e: the intermediate data-set should look like Ticker Date Mean Measure Acc_Yr Lag XXX 20080320 2.25 200806 0 XXX 20080318 2.3 200806 1 Note that for 200806, the mean changes from 2.6 as measured above to 2.25 (since BRK3 is dropped in calculation. The final data-set should then be: Ticker Date Pct_Change Acc_Yr Lag XXX 20080218 0.02 200806 1 -------------------- I can accomplish the results using a combination of tapply & subsetting the data-set for each lag but I thought this kind of data-structure is ideal for zoo manipulation, hence the help request. Thanks in Advance. Manoj _______________________________________________ R-SIG-Finance@... mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-finance -- Subscriber-posting only. -- If you want to post, subscribe first. |
|
|
Re: question on zoo data manipulationOn Mon, 14 Apr 2008, Manoj wrote:
> Hi Zoo-experts, > I am working on the data-set below. > > Ticker Date BrokerName Acc_Yr Measure lag > XXX 20080320 BRK1 200806 2.2 0 > XXX 20080320 BRK1 200906 2.5 0 > XXX 20080320 BRK2 200806 2.3 0 > XXX 20080320 BRK2 200906 2.8 0 > XXX 20080320 BRK3 200806 3.3 0 > XXX 20080218 BRK1 200806 2.2 1 > XXX 20080218 BRK1 200906 2.5 1 > XXX 20080218 BRK2 200806 2.4 1 > XXX 20080218 BRK2 200906 2.8 1 The data is not really a straightforward time series but has more structure, like a panel data set. Hence, I wouldn't try to represent it in zoo in its un-aggregated form. Instead I would put it into a data.frame using appropriate classes for the colums, e.g., "Date" for the Date and "factor" for the BrokerName etc. Then I would use the aggregate() method for data.frames to accomplish the aggregation you look for. You can then collect various aggregations of your data in a zoo object (if you've got unique Dates after aggregation). hth, Z > > > Using zoo object, Is there a quicker/efficient way of manipulating the > data as per following criteria? > > 1) For any given date/lag - compute mean of column "measure" grouped > by different broker & different accounting year? > so the output data-set should look like: > > Ticker Date Mean Measure Acc_Yr Lag > XXX 20080320 2.6 200806 0 > > 2) For any lag >= 1, calculate returns on aggregate "measure" > constrained on "intersection" of broker-name across lag 0 & lag 1 (so > BRK3 should drop out) ? > > i.e: the intermediate data-set should look like > > Ticker Date Mean Measure Acc_Yr Lag > XXX 20080320 2.25 200806 0 > XXX 20080318 2.3 200806 1 > > > Note that for 200806, the mean changes from 2.6 as measured above to > 2.25 (since BRK3 is dropped in calculation. The final data-set should > then be: > > Ticker Date Pct_Change Acc_Yr Lag > XXX 20080218 0.02 200806 1 > > -------------------- > > I can accomplish the results using a combination of tapply & > subsetting the data-set for each lag but I thought this kind of > data-structure is ideal for zoo manipulation, hence the help request. > > Thanks in Advance. > > Manoj > > _______________________________________________ > R-SIG-Finance@... mailing list > https://stat.ethz.ch/mailman/listinfo/r-sig-finance > -- Subscriber-posting only. > -- If you want to post, subscribe first. > > _______________________________________________ R-SIG-Finance@... mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-finance -- Subscriber-posting only. -- If you want to post, subscribe first. |
|
|
Re: question on zoo data manipulationIs lag always 0 or 1?
On Mon, Apr 14, 2008 at 8:30 AM, Manoj <manojsw@...> wrote: > Hi Zoo-experts, > I am working on the data-set below. > > Ticker Date BrokerName Acc_Yr Measure lag > XXX 20080320 BRK1 200806 2.2 0 > XXX 20080320 BRK1 200906 2.5 0 > XXX 20080320 BRK2 200806 2.3 0 > XXX 20080320 BRK2 200906 2.8 0 > XXX 20080320 BRK3 200806 3.3 0 > XXX 20080218 BRK1 200806 2.2 1 > XXX 20080218 BRK1 200906 2.5 1 > XXX 20080218 BRK2 200806 2.4 1 > XXX 20080218 BRK2 200906 2.8 1 > > > > Using zoo object, Is there a quicker/efficient way of manipulating the > data as per following criteria? > > 1) For any given date/lag - compute mean of column "measure" grouped > by different broker & different accounting year? > so the output data-set should look like: > > Ticker Date Mean Measure Acc_Yr Lag > XXX 20080320 2.6 200806 0 > > 2) For any lag >= 1, calculate returns on aggregate "measure" > constrained on "intersection" of broker-name across lag 0 & lag 1 (so > BRK3 should drop out) ? > > i.e: the intermediate data-set should look like > > Ticker Date Mean Measure Acc_Yr Lag > XXX 20080320 2.25 200806 0 > XXX 20080318 2.3 200806 1 > > > Note that for 200806, the mean changes from 2.6 as measured above to > 2.25 (since BRK3 is dropped in calculation. The final data-set should > then be: > > Ticker Date Pct_Change Acc_Yr Lag > XXX 20080218 0.02 200806 1 > > -------------------- > > I can accomplish the results using a combination of tapply & > subsetting the data-set for each lag but I thought this kind of > data-structure is ideal for zoo manipulation, hence the help request. > > Thanks in Advance. > > Manoj > > _______________________________________________ > R-SIG-Finance@... mailing list > https://stat.ethz.ch/mailman/listinfo/r-sig-finance > -- Subscriber-posting only. > -- If you want to post, subscribe first. > _______________________________________________ R-SIG-Finance@... mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-finance -- Subscriber-posting only. -- If you want to post, subscribe first. |
|
|
Re: question on zoo data manipulationThanks for your suggestion Achim.
Gabor, No lag takes a value range of 0 ~ 12 - and is tied to date. 20080320 take 0, a one month lag take the value of 20080218 and so on. Please let me know if you need more info. Many thanks. Manoj On 4/15/08, Gabor Grothendieck <ggrothendieck@...> wrote: > Is lag always 0 or 1? > > On Mon, Apr 14, 2008 at 8:30 AM, Manoj <manojsw@...> wrote: > > Hi Zoo-experts, > > I am working on the data-set below. > > > > Ticker Date BrokerName Acc_Yr Measure lag > > XXX 20080320 BRK1 200806 2.2 0 > > XXX 20080320 BRK1 200906 2.5 0 > > XXX 20080320 BRK2 200806 2.3 0 > > XXX 20080320 BRK2 200906 2.8 0 > > XXX 20080320 BRK3 200806 3.3 0 > > XXX 20080218 BRK1 200806 2.2 1 > > XXX 20080218 BRK1 200906 2.5 1 > > XXX 20080218 BRK2 200806 2.4 1 > > XXX 20080218 BRK2 200906 2.8 1 > > > > > > > > Using zoo object, Is there a quicker/efficient way of manipulating the > > data as per following criteria? > > > > 1) For any given date/lag - compute mean of column "measure" grouped > > by different broker & different accounting year? > > so the output data-set should look like: > > > > Ticker Date Mean Measure Acc_Yr Lag > > XXX 20080320 2.6 200806 0 > > > > 2) For any lag >= 1, calculate returns on aggregate "measure" > > constrained on "intersection" of broker-name across lag 0 & lag 1 (so > > BRK3 should drop out) ? > > > > i.e: the intermediate data-set should look like > > > > Ticker Date Mean Measure Acc_Yr Lag > > XXX 20080320 2.25 200806 0 > > XXX 20080318 2.3 200806 1 > > > > > > Note that for 200806, the mean changes from 2.6 as measured above to > > 2.25 (since BRK3 is dropped in calculation. The final data-set should > > then be: > > > > Ticker Date Pct_Change Acc_Yr Lag > > XXX 20080218 0.02 200806 1 > > > > -------------------- > > > > I can accomplish the results using a combination of tapply & > > subsetting the data-set for each lag but I thought this kind of > > data-structure is ideal for zoo manipulation, hence the help request. > > > > Thanks in Advance. > > > > Manoj > > > > _______________________________________________ > > R-SIG-Finance@... mailing list > > https://stat.ethz.ch/mailman/listinfo/r-sig-finance > > -- Subscriber-posting only. > > -- If you want to post, subscribe first. > > > _______________________________________________ R-SIG-Finance@... mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-finance -- Subscriber-posting only. -- If you want to post, subscribe first. |
|
|
Re: question on zoo data manipulationThis does not really use zoo in any significant way since it
does not become a time series until the last line of f but here is a solution that does use zoo in that one last line. We use by to split up the data frame with a function f. In f, br intersects the lag0 and lag1 brokers and then we subset x according to those lines having a broker in br. We then take the means, convert the series to zoo and perform diff.zoo on it. There are some aspects of the problem that were not defined such as whether to use lag 1 to compare lag 3 if there is no lag 2 and we did that here but that could be changed by using the lag as the time index. We have also dumped out the data frame, DF, using dput to make it easier to reproduce the solution. DF <- structure(list(Ticker = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "XXX", class = "factor"), Date = c(20080320L, 20080320L, 20080320L, 20080320L, 20080320L, 20080218L, 20080218L, 20080218L, 20080218L), BrokerName = structure(c(1L, 1L, 2L, 2L, 3L, 1L, 1L, 2L, 2L), .Label = c("BRK1", "BRK2", "BRK3"), class = "factor"), Acc_Yr = c(200806L, 200906L, 200806L, 200906L, 200806L, 200806L, 200906L, 200806L, 200906L), Measure = c(2.2, 2.5, 2.3, 2.8, 3.3, 2.2, 2.5, 2.4, 2.8), lag = c(0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L)), .Names = c("Ticker", "Date", "BrokerName", "Acc_Yr", "Measure", "lag"), class = "data.frame", row.names = c(NA, -9L)) library(zoo) f <- function(x) { br <- intersect(x[x$lag == 0, "BrokerName"], x[x$lag == 1, "BrokerName"]) sb <- subset(x, BrokerName %in% br) ag <- aggregate(sb["Measure"], sb[c(1, 2, 4, 6)], mean) transform(tail(ag, -1), Measure = coredata(diff(zoo(ag$Measure), arithmetic = FALSE) - 1)) } do.call("rbind", by(DF, DF[c(1, 4)], f)) On Mon, Apr 14, 2008 at 8:30 AM, Manoj <manojsw@...> wrote: > Hi Zoo-experts, > I am working on the data-set below. > > Ticker Date BrokerName Acc_Yr Measure lag > XXX 20080320 BRK1 200806 2.2 0 > XXX 20080320 BRK1 200906 2.5 0 > XXX 20080320 BRK2 200806 2.3 0 > XXX 20080320 BRK2 200906 2.8 0 > XXX 20080320 BRK3 200806 3.3 0 > XXX 20080218 BRK1 200806 2.2 1 > XXX 20080218 BRK1 200906 2.5 1 > XXX 20080218 BRK2 200806 2.4 1 > XXX 20080218 BRK2 200906 2.8 1 > > > > Using zoo object, Is there a quicker/efficient way of manipulating the > data as per following criteria? > > 1) For any given date/lag - compute mean of column "measure" grouped > by different broker & different accounting year? > so the output data-set should look like: > > Ticker Date Mean Measure Acc_Yr Lag > XXX 20080320 2.6 200806 0 > > 2) For any lag >= 1, calculate returns on aggregate "measure" > constrained on "intersection" of broker-name across lag 0 & lag 1 (so > BRK3 should drop out) ? > > i.e: the intermediate data-set should look like > > Ticker Date Mean Measure Acc_Yr Lag > XXX 20080320 2.25 200806 0 > XXX 20080318 2.3 200806 1 > > > Note that for 200806, the mean changes from 2.6 as measured above to > 2.25 (since BRK3 is dropped in calculation. The final data-set should > then be: > > Ticker Date Pct_Change Acc_Yr Lag > XXX 20080218 0.02 200806 1 > > -------------------- > > I can accomplish the results using a combination of tapply & > subsetting the data-set for each lag but I thought this kind of > data-structure is ideal for zoo manipulation, hence the help request. > > Thanks in Advance. > > Manoj > > _______________________________________________ > R-SIG-Finance@... mailing list > https://stat.ethz.ch/mailman/listinfo/r-sig-finance > -- Subscriber-posting only. > -- If you want to post, subscribe first. > _______________________________________________ R-SIG-Finance@... mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-finance -- Subscriber-posting only. -- If you want to post, subscribe first. |
|
|
Re: question on zoo data manipulationI thought about this one a bit more and have two
additional solutions. One uses zoo more intensively by forming the "time" index out of the merge keys. This relies on the fact that zoo can use any class with certain methods, not just the usual time/date classes. However, I think that the best wayof thinking about this problem is from an SQL viewpoint since its basically just a three way self merge followed by an aggregation and the entire thing can be done in a single SQL statement (although it spans several lines). Solution 1 is our prior minimally zoo solution, solution 2 is the much more zoo-ish solution and solution 3 uses sqldf to implement it in SQL. DF <- structure(list(Ticker = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "XXX", class = "factor"), Date = c(20080320L, 20080320L, 20080320L, 20080320L, 20080320L, 20080218L, 20080218L, 20080218L, 20080218L), BrokerName = structure(c(1L, 1L, 2L, 2L, 3L, 1L, 1L, 2L, 2L), .Label = c("BRK1", "BRK2", "BRK3"), class = "factor"), Acc_Yr = c(200806L, 200906L, 200806L, 200906L, 200806L, 200806L, 200906L, 200806L, 200906L), Measure = c(2.2, 2.5, 2.3, 2.8, 3.3, 2.2, 2.5, 2.4, 2.8), lag = c(0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L)), .Names = c("Ticker", "Date", "BrokerName", "Acc_Yr", "Measure", "lag"), class = "data.frame", row.names = c(NA, -9L)) # zoo solution 1 library(zoo) f <- function(x) { br <- intersect(x[x$lag == 0, "BrokerName"], x[x$lag == 1, "BrokerName"]) sb <- subset(x, BrokerName %in% br) ag <- aggregate(sb["Measure"], sb[c(1, 2, 4, 6)], mean) transform(tail(ag, -1), Measure = coredata(diff(zoo(ag$Measure), arithmetic = FALSE) - 1)) } do.call("rbind", by(DF, DF[c(1, 4)], f)) # zoo solution 2 library(zoo) z <- zoo(DF$Measure, apply(DF[c(1, 3, 4, 6)], 1, paste, collapse = ":")) zl <- zoo(DF$Measure, apply(transform(DF[c(1, 3, 4, 6)], lag = lag+1), 1, paste, collapse = ":")) zm <- merge(z, zl, all = FALSE) z01 <- zm[sub(":[0-9]*$", ":1", time(zm)) %in% time(zm)] transform(aggregate(z01, sub(":[^:]*", "", time(z01)), mean), Change = z/zl-1) # solution 3 - sqldf library(sqldf) sqldf("select Ticker, Date__1, Acc_Yr, lag, avg(Measure)/avg(Mprev)-1 Change from (select y.*, x.Measure Mprev from DF x, DF y, DF z where x.Ticker = y.Ticker and x.Acc_Yr = y.Acc_Yr and x.BrokerName = y.BrokerName and x.lag = y.lag - 1 and x.Ticker = z.Ticker and x.Acc_Yr = z.Acc_Yr and x.BrokerName = z.BrokerName and z.lag = 1) group by Ticker, Acc_Yr, lag") On Tue, Apr 15, 2008 at 8:56 AM, Gabor Grothendieck <ggrothendieck@...> wrote: > This does not really use zoo in any significant way since it > does not become a time series until the last line of f but > here is a solution that does use zoo in that one last line. > > We use by to split up the data frame with a function f. > In f, br intersects the lag0 and lag1 brokers and then we > subset x according to those lines having a broker in br. > We then take the means, convert the series to zoo and > perform diff.zoo on it. > > There are some aspects of the problem that were not defined > such as whether to use lag 1 to compare lag 3 if there is no > lag 2 and we did that here but that could be changed by using > the lag as the time index. > > We have also dumped out the data frame, DF, using dput to make > it easier to reproduce the solution. > > DF <- structure(list(Ticker = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, > 1L, 1L), .Label = "XXX", class = "factor"), Date = c(20080320L, > 20080320L, 20080320L, 20080320L, 20080320L, 20080218L, 20080218L, > 20080218L, 20080218L), BrokerName = structure(c(1L, 1L, 2L, 2L, > 3L, 1L, 1L, 2L, 2L), .Label = c("BRK1", "BRK2", "BRK3"), class = "factor"), > Acc_Yr = c(200806L, 200906L, 200806L, 200906L, 200806L, 200806L, > 200906L, 200806L, 200906L), Measure = c(2.2, 2.5, 2.3, 2.8, > 3.3, 2.2, 2.5, 2.4, 2.8), lag = c(0L, 0L, 0L, 0L, 0L, 1L, > 1L, 1L, 1L)), .Names = c("Ticker", "Date", "BrokerName", > "Acc_Yr", "Measure", "lag"), class = "data.frame", row.names = c(NA, > -9L)) > > library(zoo) > > f <- function(x) { > br <- intersect(x[x$lag == 0, "BrokerName"], x[x$lag == 1, "BrokerName"]) > sb <- subset(x, BrokerName %in% br) > ag <- aggregate(sb["Measure"], sb[c(1, 2, 4, 6)], mean) > transform(tail(ag, -1), Measure = > coredata(diff(zoo(ag$Measure), arithmetic = FALSE) - 1)) > } > do.call("rbind", by(DF, DF[c(1, 4)], f)) > > > On Mon, Apr 14, 2008 at 8:30 AM, Manoj <manojsw@...> wrote: > > > Hi Zoo-experts, > > I am working on the data-set below. > > > > Ticker Date BrokerName Acc_Yr Measure lag > > XXX 20080320 BRK1 200806 2.2 0 > > XXX 20080320 BRK1 200906 2.5 0 > > XXX 20080320 BRK2 200806 2.3 0 > > XXX 20080320 BRK2 200906 2.8 0 > > XXX 20080320 BRK3 200806 3.3 0 > > XXX 20080218 BRK1 200806 2.2 1 > > XXX 20080218 BRK1 200906 2.5 1 > > XXX 20080218 BRK2 200806 2.4 1 > > XXX 20080218 BRK2 200906 2.8 1 > > > > > > > > Using zoo object, Is there a quicker/efficient way of manipulating the > > data as per following criteria? > > > > 1) For any given date/lag - compute mean of column "measure" grouped > > by different broker & different accounting year? > > so the output data-set should look like: > > > > Ticker Date Mean Measure Acc_Yr Lag > > XXX 20080320 2.6 200806 0 > > > > 2) For any lag >= 1, calculate returns on aggregate "measure" > > constrained on "intersection" of broker-name across lag 0 & lag 1 (so > > BRK3 should drop out) ? > > > > i.e: the intermediate data-set should look like > > > > Ticker Date Mean Measure Acc_Yr Lag > > XXX 20080320 2.25 200806 0 > > XXX 20080318 2.3 200806 1 > > > > > > Note that for 200806, the mean changes from 2.6 as measured above to > > 2.25 (since BRK3 is dropped in calculation. The final data-set should > > then be: > > > > Ticker Date Pct_Change Acc_Yr Lag > > XXX 20080218 0.02 200806 1 > > > > -------------------- > > > > I can accomplish the results using a combination of tapply & > > subsetting the data-set for each lag but I thought this kind of > > data-structure is ideal for zoo manipulation, hence the help request. > > > > Thanks in Advance. > > > > Manoj > > > > _______________________________________________ > > R-SIG-Finance@... mailing list > > https://stat.ethz.ch/mailman/listinfo/r-sig-finance > > -- Subscriber-posting only. > > -- If you want to post, subscribe first. > > > _______________________________________________ R-SIG-Finance@... mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-finance -- Subscriber-posting only. -- If you want to post, subscribe first. |
|
|
Re: question on zoo data manipulationThanks a lot Gabor!
I was playing around with sqldf yesterday and thought of the same only to find a solution on those lines - much apppreciated! Cheers Manoj On 4/17/08, Gabor Grothendieck <ggrothendieck@...> wrote: > I thought about this one a bit more and have two > additional solutions. One uses zoo more intensively > by forming the "time" index out of the merge keys. > This relies on the fact that zoo can use any class > with certain methods, not just the usual time/date > classes. However, I think that the best wayof thinking > about this problem is from an SQL viewpoint since its > basically just a three way self merge followed by > an aggregation and the entire thing can be done > in a single SQL statement (although it spans several > lines). Solution 1 is our prior minimally zoo solution, > solution 2 is the much more zoo-ish solution and > solution 3 uses sqldf to implement it in SQL. > > DF <- structure(list(Ticker = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, > 1L, 1L), .Label = "XXX", class = "factor"), Date = c(20080320L, > 20080320L, 20080320L, 20080320L, 20080320L, 20080218L, 20080218L, > 20080218L, 20080218L), BrokerName = structure(c(1L, 1L, 2L, 2L, > 3L, 1L, 1L, 2L, 2L), .Label = c("BRK1", "BRK2", "BRK3"), class = "factor"), > Acc_Yr = c(200806L, 200906L, 200806L, 200906L, 200806L, 200806L, > 200906L, 200806L, 200906L), Measure = c(2.2, 2.5, 2.3, 2.8, > 3.3, 2.2, 2.5, 2.4, 2.8), lag = c(0L, 0L, 0L, 0L, 0L, 1L, > 1L, 1L, 1L)), .Names = c("Ticker", "Date", "BrokerName", > "Acc_Yr", "Measure", "lag"), class = "data.frame", row.names = c(NA, > -9L)) > > > # zoo solution 1 > > library(zoo) > > f <- function(x) { > br <- intersect(x[x$lag == 0, "BrokerName"], x[x$lag == 1, "BrokerName"]) > sb <- subset(x, BrokerName %in% br) > ag <- aggregate(sb["Measure"], sb[c(1, 2, 4, 6)], mean) > transform(tail(ag, -1), Measure = > coredata(diff(zoo(ag$Measure), arithmetic = FALSE) - 1)) > } > do.call("rbind", by(DF, DF[c(1, 4)], f)) > > # zoo solution 2 > > library(zoo) > > z <- zoo(DF$Measure, apply(DF[c(1, 3, 4, 6)], 1, paste, collapse = ":")) > > zl <- zoo(DF$Measure, > apply(transform(DF[c(1, 3, 4, 6)], lag = lag+1), 1, paste, collapse = ":")) > > zm <- merge(z, zl, all = FALSE) > > z01 <- zm[sub(":[0-9]*$", ":1", time(zm)) %in% time(zm)] > > transform(aggregate(z01, sub(":[^:]*", "", time(z01)), mean), Change = z/zl-1) > > # solution 3 - sqldf > > library(sqldf) > sqldf("select Ticker, Date__1, Acc_Yr, lag, avg(Measure)/avg(Mprev)-1 Change > from (select y.*, x.Measure Mprev from DF x, DF y, DF z > where x.Ticker = y.Ticker and x.Acc_Yr = y.Acc_Yr > and x.BrokerName = y.BrokerName and x.lag = y.lag - 1 > and x.Ticker = z.Ticker and x.Acc_Yr = z.Acc_Yr > and x.BrokerName = z.BrokerName and z.lag = 1) > group by Ticker, Acc_Yr, lag") > > > > On Tue, Apr 15, 2008 at 8:56 AM, Gabor Grothendieck > <ggrothendieck@...> wrote: > > This does not really use zoo in any significant way since it > > does not become a time series until the last line of f but > > here is a solution that does use zoo in that one last line. > > > > We use by to split up the data frame with a function f. > > In f, br intersects the lag0 and lag1 brokers and then we > > subset x according to those lines having a broker in br. > > We then take the means, convert the series to zoo and > > perform diff.zoo on it. > > > > There are some aspects of the problem that were not defined > > such as whether to use lag 1 to compare lag 3 if there is no > > lag 2 and we did that here but that could be changed by using > > the lag as the time index. > > > > We have also dumped out the data frame, DF, using dput to make > > it easier to reproduce the solution. > > > > DF <- structure(list(Ticker = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, > > 1L, 1L), .Label = "XXX", class = "factor"), Date = c(20080320L, > > 20080320L, 20080320L, 20080320L, 20080320L, 20080218L, 20080218L, > > 20080218L, 20080218L), BrokerName = structure(c(1L, 1L, 2L, 2L, > > 3L, 1L, 1L, 2L, 2L), .Label = c("BRK1", "BRK2", "BRK3"), class = "factor"), > > Acc_Yr = c(200806L, 200906L, 200806L, 200906L, 200806L, 200806L, > > 200906L, 200806L, 200906L), Measure = c(2.2, 2.5, 2.3, 2.8, > > 3.3, 2.2, 2.5, 2.4, 2.8), lag = c(0L, 0L, 0L, 0L, 0L, 1L, > > 1L, 1L, 1L)), .Names = c("Ticker", "Date", "BrokerName", > > "Acc_Yr", "Measure", "lag"), class = "data.frame", row.names = c(NA, > > -9L)) > > > > library(zoo) > > > > f <- function(x) { > > br <- intersect(x[x$lag == 0, "BrokerName"], x[x$lag == 1, "BrokerName"]) > > sb <- subset(x, BrokerName %in% br) > > ag <- aggregate(sb["Measure"], sb[c(1, 2, 4, 6)], mean) > > transform(tail(ag, -1), Measure = > > coredata(diff(zoo(ag$Measure), arithmetic = FALSE) - 1)) > > } > > do.call("rbind", by(DF, DF[c(1, 4)], f)) > > > > > > On Mon, Apr 14, 2008 at 8:30 AM, Manoj <manojsw@...> wrote: > > > > > Hi Zoo-experts, > > > I am working on the data-set below. > > > > > > Ticker Date BrokerName Acc_Yr Measure lag > > > XXX 20080320 BRK1 200806 2.2 0 > > > XXX 20080320 BRK1 200906 2.5 0 > > > XXX 20080320 BRK2 200806 2.3 0 > > > XXX 20080320 BRK2 200906 2.8 0 > > > XXX 20080320 BRK3 200806 3.3 0 > > > XXX 20080218 BRK1 200806 2.2 1 > > > XXX 20080218 BRK1 200906 2.5 1 > > > XXX 20080218 BRK2 200806 2.4 1 > > > XXX 20080218 BRK2 200906 2.8 1 > > > > > > > > > > > > Using zoo object, Is there a quicker/efficient way of manipulating the > > > data as per following criteria? > > > > > > 1) For any given date/lag - compute mean of column "measure" grouped > > > by different broker & different accounting year? > > > so the output data-set should look like: > > > > > > Ticker Date Mean Measure Acc_Yr Lag > > > XXX 20080320 2.6 200806 0 > > > > > > 2) For any lag >= 1, calculate returns on aggregate "measure" > > > constrained on "intersection" of broker-name across lag 0 & lag 1 (so > > > BRK3 should drop out) ? > > > > > > i.e: the intermediate data-set should look like > > > > > > Ticker Date Mean Measure Acc_Yr Lag > > > XXX 20080320 2.25 200806 0 > > > XXX 20080318 2.3 200806 1 > > > > > > > > > Note that for 200806, the mean changes from 2.6 as measured above to > > > 2.25 (since BRK3 is dropped in calculation. The final data-set should > > > then be: > > > > > > Ticker Date Pct_Change Acc_Yr Lag > > > XXX 20080218 0.02 200806 1 > > > > > > -------------------- > > > > > > I can accomplish the results using a combination of tapply & > > > subsetting the data-set for each lag but I thought this kind of > > > data-structure is ideal for zoo manipulation, hence the help request. > > > > > > Thanks in Advance. > > > > > > Manoj > > > > > > _______________________________________________ > > > R-SIG-Finance@... mailing list > > > https://stat.ethz.ch/mailman/listinfo/r-sig-finance > > > -- Subscriber-posting only. > > > -- If you want to post, subscribe first. > > > > > > _______________________________________________ R-SIG-Finance@... mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-finance -- Subscriber-posting only. -- If you want to post, subscribe first. |
| Free Forum Powered by Nabble | Forum Help |