-
Interactive vizualisation of parametric curves with GGobi and Cranvas
2015-12-23
SourceThis article shows an example of interactive data visualization in R using GGobi and its successor Cranvas. A very interesting feature of these two visualization tools is their ability to link two different datasets, and this is such an example I provide here.
Firstly I am going to fit some parametric curves to a set of data, and then I will show how to create interactive and linked plots of these curves on one graphic and of their parameters on another graphic.
Loading data and fitting the curves
Assume for example we have some data with an input variable in the first column and each other column, corresponding to an experiment, contains the values of a measurement made for each value of the input:
library(dplyr) # I use the dplyr format for a better printing dat0 <- read.csv("./data/GGobiFPLdat0.csv") %>% tbl_df print(dat0, n=5) ## Source: local data frame [12 x 31] ## ## concentration M01 M02 M03 M04 M05 M06 M07 M08 ## (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) ## 1 0.20125000 2.554 2.605 2.784 2.839 2.865 2.926 3.009 3.225 ## 2 0.10062500 2.098 2.059 2.191 2.292 2.361 2.295 2.430 2.464 ## 3 0.05031250 1.392 1.409 1.477 1.444 1.717 1.478 1.697 1.720 ## 4 0.02515625 0.833 0.843 0.939 0.943 0.905 0.893 1.045 1.052 ## 5 0.01257813 0.473 0.453 0.504 0.500 0.572 0.482 0.613 0.592 ## .. ... ... ... ... ... ... ... ... ... ## Variables not shown: M09 (dbl), M10 (dbl), M11 (dbl), M12 (dbl), M13 ## (dbl), M14 (dbl), M15 (dbl), M16 (dbl), M17 (dbl), M18 (dbl), M19 (dbl), ## M20 (dbl), M21 (dbl), M22 (dbl), M23 (dbl), M24 (dbl), M25 (dbl), M26 ## (dbl), M27 (dbl), M28 (dbl), M29 (dbl), M30 (dbl)
In this example we fit a four-parameter logistic curve for each of the outcome columns:
# first melt the data library(tidyr) ldat0 <- dat0 %>% gather(experiment, value, -concentration) # fit library(nlme) fit <- nlsList(value~SSfpl(log(concentration), phi1, phi2, phi3, phi4) | experiment, data=ldat0, na.action=na.omit) Params <- coef(fit) Params %>% glimpse(width=70) ## Observations: 30 ## Variables: 4 ## $ phi1 (dbl) 0.022468669, 0.011273365, 0.009305653, 0.018190190, ... ## $ phi2 (dbl) 3.279374, 3.420839, 3.838880, 3.900045, 3.487184, 4.... ## $ phi3 (dbl) -2.737833, -2.659785, -2.551294, -2.553739, -2.90006... ## $ phi4 (dbl) 0.8823794, 0.9093874, 0.9731057, 0.9425553, 0.841277...
Recall that the four-parameter logistic curve is defined by \[ y = \phi_1 + \frac{\phi_2-\phi_1}{1+\exp\left(\dfrac{\phi_3-x}{\phi_4}\right)}. \] It can be shown on the figure below, taken from the book Mixed Effects Models in S and SPlus by Pinheiro and Bates.
Some softwares such as SAS and SoftmaxPro use a different parametrization of the four-parameter logistic function: instead of \(\phi_1\), \(\phi_2\), \(\phi_3\), \(\phi_4\), they use four parameters denoted by \(A\), \(B\), \(C\), \(D\) and the relations between these two different parameterizations are \[ A=\phi_1, \qquad \frac{1}{B}=\phi_4, \qquad \log C = \phi_3, \qquad D=\phi_2. \] If you prefer this parametrization, then you could do
SSfpl(log(concentration), A, D, logC, inverseB)
in thenlsList
function above.Now, another dataset provides some information about the experiments, such as the date, and we merge it with the fitted parameters:
( dat1 <- read.csv("./data/GGobiFPLdat1.csv", colClasses = c(date="Date")) %>% tbl_df ) ## Source: local data frame [30 x 2] ## ## experiment date ## (fctr) (date) ## 1 M01 6-03-20 ## 2 M02 6-03-20 ## 3 M03 6-03-20 ## 4 M04 6-03-20 ## 5 M05 6-03-20 ## 6 M06 6-03-20 ## 7 M07 13-03-20 ## 8 M08 13-03-20 ## 9 M09 13-03-20 ## 10 M10 13-03-20 ## .. ... ... ( Params <- merge(dat1, Params, by.x="experiment", by.y="row.names") %>% tbl_dt ) ## Source: local data table [30 x 6] ## ## experiment date phi1 phi2 phi3 phi4 ## (fctr) (date) (dbl) (dbl) (dbl) (dbl) ## 1 M01 6-03-20 0.022468669 3.279374 -2.737833 0.8823794 ## 2 M02 6-03-20 0.011273365 3.420839 -2.659785 0.9093874 ## 3 M03 6-03-20 0.009305653 3.838880 -2.551294 0.9731057 ## 4 M04 6-03-20 0.018190190 3.900045 -2.553739 0.9425553 ## 5 M05 6-03-20 0.027261568 3.487184 -2.900066 0.8412771 ## 6 M06 6-03-20 0.015027827 4.016394 -2.518598 0.9164116 ## 7 M07 13-03-20 0.013061392 3.922050 -2.734026 0.9434379 ## 8 M08 13-03-20 0.004771255 4.658362 -2.424616 1.0198936 ## 9 M09 13-03-20 0.008700668 4.304961 -2.611473 0.9498171 ## 10 M10 13-03-20 0.020487158 4.147146 -2.764154 0.9110256 ## .. ... ... ... ... ... ...
Note that I transformed the dataset to a local data table by using
tbl_dt
. This is nice for making the dataset I will use to plot the fitted curves:# four-parameter logistic function fpl <- function(x,phi1,phi2,phi3,phi4){ phi1+(phi2-phi1)/(1+exp((phi3-x)/phi4)) } x <- with(dat0, seq(min(log(concentration)), max(log(concentration)), length.out=25)) Curves <- Params[, c(.SD, list(x=x, y=fpl(x,phi1,phi2,phi3,phi4))), by="experiment"] library(ggplot2); library(scales) ggplot(Curves, aes(x=x, y=y, color=experiment)) + geom_line()
Using GGobi
As we can see, the legend does not allow to identify a curve: there are too many. This is an opportunity to use GGobi with the help of the rggobi package:
library(rggobi) # put x and y in the first and second column ggdata1 <- data.frame(Curves)[, taRifx::shift(seq_along(Curves), -2)] g <- ggobi_longitudinal(ggdata1, id=experiment)
The above line of code opens GGobi. We firstly select
Brush
in theInteraction
menu to get the graphics at left below. Then we selectIdentify
in the same menu and get the graphics at right.Now, let???s look at the parameters in function of the date:
lParams <- Params %>% gather(parameter, value, -experiment, -date) ggplot(lParams, aes(x=date, y=value)) + geom_point() + facet_grid(parameter~., scales="free_y") + scale_x_date(labels = date_format("%m-%Y"))
And let???s plot each parameter vs each other on a scatter matrix:
library(GGally) ggpairs(Params[, list(phi1,phi2,phi3,phi4)])
Similarly, we would like to know which experiment corresponds to one point, or even which curve. So we add the
Params
dataset to GGobi and we open a new display:ggdata2 <- data.frame(Params) # GGobi does not handle the Date format - convert in integer ggdata2 <- transform(ggdata2, ndate=as.integer(date)) g$ggdata2 <- ggdata2 display(g["ggdata2"], vars=list(X="ndate", Y="phi2"))
And now this becomes really fun. We can create several graphics from the two datasets and they are linked to each other:
Using Cranvas
Now let???s do the same with Cranvas instead of GGobi, except that scatter matrices are not implemented yet in Cranvas.
library(cranvas) qdata1 <- qdata(data.frame(Curves)) qdata2 <- qdata(data.frame(Params)) qtime(x, y, data=qdata1, hdiv=experiment, main="Fitted curves") qscatter(date, phi2, data=qdata2, main="phi2 vs date") # link the two datasets id <- link_cat(qdata1, "experiment", qdata2, "experiment")
Contrary to GGobi, there is no standalone Cranvas software. Everything is run from R with the cranvas package, but some keyboard keys act on the graphics, for example to zoom or to change the size of points (see
?qtime
,?qscatter
, etc).