-
The `kantorovich` package
2016-01-11
SourceI have just released a first version of the
kantorovich
package on Github. It is based on the code of my post Using R to compute the Kantorovich distance.This package has two main features:
- It computes the extreme joinings of two probability measures \(\mu\) and \(\nu\) on a finite set;
- It computes the Kantorovich distance between these two measures, for a given distance on their finite state space.
With the help of the
rccd
andgmp
packages, thekantorovich
package can return the exact values of the extreme joinings and of the Kantorovich distance.Quick example
As an example, take \(\mu\) and \(\nu\) the uniform probability measures on a finite set having three elements.
mu <- nu <- c(1/3, 1/3, 1/3)
The
ejoinings
function returns the extreme joinings of \(\mu\) and \(\nu\). In this case these are the \(6!\) permutation matrices:library(kantorovich) ejoinings(mu, nu) ## Message: You should enter mu and nu in rational with the gmp package. ## [[1]] ## 1 2 3 ## 1 0.3333333 0.0000000 0.0000000 ## 2 0.0000000 0.0000000 0.3333333 ## 3 0.0000000 0.3333333 0.0000000 ## ## [[2]] ## 1 2 3 ## 1 0.3333333 0.0000000 0.0000000 ## 2 0.0000000 0.3333333 0.0000000 ## 3 0.0000000 0.0000000 0.3333333 ## ## [[3]] ## 1 2 3 ## 1 0.0000000 0.3333333 0.0000000 ## 2 0.0000000 0.0000000 0.3333333 ## 3 0.3333333 0.0000000 0.0000000 ## ## [[4]] ## 1 2 3 ## 1 0.0000000 0.3333333 0.0000000 ## 2 0.3333333 0.0000000 0.0000000 ## 3 0.0000000 0.0000000 0.3333333 ## ## [[5]] ## 1 2 3 ## 1 0.0000000 0.0000000 0.3333333 ## 2 0.0000000 0.3333333 0.0000000 ## 3 0.3333333 0.0000000 0.0000000 ## ## [[6]] ## 1 2 3 ## 1 0.0000000 0.0000000 0.3333333 ## 2 0.3333333 0.0000000 0.0000000 ## 3 0.0000000 0.3333333 0.0000000
Since
mu
andnu
were unnamed, the vector namesc(1,2,3)
has been automatically assigned to them. The Kantorovich distance between \(\mu\) and \(\nu\) is relative to a given distance on the state space of \(\mu\) and \(\nu\), represented by their vector names. By default, thekantorovich
package takes the discrete \(0\mathrm{-}1\) distance. Obviously the Kantorovich distance is \(0\) on this example, because \(\mu=\nu\).kantorovich(mu, nu) ## Message: You should enter mu and nu in rational with the gmp package. ## [1] 0
Note the message returned by both the
ejoinings
and thekantorovich
functions. In order to get exact results, use rational numbers with thegmp
package:library(gmp) mu <- nu <- as.bigq(c(1,1,1), c(3,3,3)) # shorter: as.bigq(c(1,1,1), 3) ejoinings(mu, nu) ## [[1]] ## 1 2 3 ## 1 "1/3" "0" "0" ## 2 "0" "0" "1/3" ## 3 "0" "1/3" "0" ## ## [[2]] ## 1 2 3 ## 1 "1/3" "0" "0" ## 2 "0" "1/3" "0" ## 3 "0" "0" "1/3" ## ## [[3]] ## 1 2 3 ## 1 "0" "1/3" "0" ## 2 "0" "0" "1/3" ## 3 "1/3" "0" "0" ## ## [[4]] ## 1 2 3 ## 1 "0" "1/3" "0" ## 2 "1/3" "0" "0" ## 3 "0" "0" "1/3" ## ## [[5]] ## 1 2 3 ## 1 "0" "0" "1/3" ## 2 "0" "1/3" "0" ## 3 "1/3" "0" "0" ## ## [[6]] ## 1 2 3 ## 1 "0" "0" "1/3" ## 2 "1/3" "0" "0" ## 3 "0" "1/3" "0"
User-specified distance
Let us try an example with a user-specified distance. Let’s say that the state space of \(\mu\) and \(\nu\) is \(\{a, b, c\}\), and then we use
c("a","b","c")
as the vector names.mu <- as.bigq(c(1,2,4), 7) nu <- as.bigq(c(3,1,5), 9) names(mu) <- names(nu) <- c("a", "b", "c")
Define distance as a matrix
The distance can be specified as a matrix.
Assume the distance \(\rho\) is given by \(\rho(a,b)=1\), \(\rho(a,c)=2\) and \(\rho(b,c)=4\). The
bigq
matrices offered by thegmp
package do not handle dimension names. But, in our example, the distance \(\rho\) takes only integer values, therefore one can use a numerical matrix:M <- matrix( c( c(0, 1, 2), c(1, 0, 4), c(2, 4, 0) ), byrow = TRUE, nrow = 3, dimnames = list(c("a","b","c"), c("a","b","c"))) kantorovich(mu, nu, dist=M) ## Big Rational ('bigq') : ## [1] 13/63
If the distance takes rational values, one can proceed as before with a character matrix:
M <- matrix( c( c("0", "3/13", "2/13"), c("1/13", "0", "4/13"), c("2/13", "4/13", "0") ), byrow = TRUE, nrow = 3, dimnames = list(c("a","b","c"), c("a","b","c"))) kantorovich(mu, nu, dist=M) ## Big Rational ('bigq') : ## [1] 1/63
Define distance as a function
One can enter the distance as a function. In such an example, this does not sound convenient:
rho <- function(x,y){ if(x==y) { return(0) } else { if(x=="a" && y=="b") return(1) if(x=="a" && y=="c") return(2) if(x=="b" && y=="c") return(4) return(rho(y,x)) } } kantorovich(mu, nu, dist=rho) ## Big Rational ('bigq') : ## [1] 13/63
Using a function could be more convenient in the case when the names are numbers:
names(mu) <- names(nu) <- 1:3
But one has to be aware that there are in character mode:
names(mu) ## [1] "1" "2" "3"
Thus, one can define a distance function as follows, for example with \(\rho(x,y)=\frac{x-y}{x+y}\):
rho <- function(x,y){ x <- as.numeric(x); y <- as.numeric(y) return(as.bigq(x-y, x+y)) } kantorovich(mu, nu, dist=rho) ## Big Rational ('bigq') : ## [1] 107/1890
A non-square example
The
kantorovich
package also handles the case whenmu
andnu
have different lengths, such as this example:mu <- as.bigq(c(1,2,4), 7) nu <- as.bigq(c(3,1), 4) names(mu) <- c("a", "b", "c") names(nu) <- c("b", "c") ejoinings(mu, nu) ## Caution: some names of mu and/or nu were missing or not compatible - automatic change ## [[1]] ## b c ## a "1/7" "0" ## b "1/28" "1/4" ## c "4/7" "0" ## ## [[2]] ## b c ## a "1/7" "0" ## b "2/7" "0" ## c "9/28" "1/4" ## ## [[3]] ## b c ## a "0" "1/7" ## b "5/28" "3/28" ## c "4/7" "0" ## ## [[4]] ## b c ## a "0" "1/7" ## b "2/7" "0" ## c "13/28" "3/28" kantorovich(mu, nu) ## Caution: some names of mu and/or nu were missing or not compatible - automatic change ## Big Rational ('bigq') : ## [1] 13/28