Information-theoretic functions

by Joseph Wakeling

Hello all,

For my own private work with Octave I have prepared a set of
information-theoretic functions which I thought I would offer to the
community.

From browsing the archives I recognise that another user recently
contributed a similar set of functions, but the .tar.gz attachment was
not available, so I could not compare.  I would very much like to see
those if possible.

My own functions include one which I think was not in the earlier
bundle, to calculate the information gain ratio or uncertainty coefficient.

One problem, which I'm not sure how to get round, is in the main
information entropy function: it requires vectors at input, but at
present only works with row or column vectors, i.e. not with vectors
where the active dimension is > 2.

The functions are:

        infoentr(x,y)
        # if one input, calculates info entropy of sequence x.
        # if two inputs, calculates joint entropy of sequences x and y.

        condentr(x,y)
        # calculates entropy of x conditional on y

        mutualinfo(x,y)
        # calculates the mutual information of two sequences x and y.
        # note that this is symmetric in its inputs. :-)

        infogain(x,y)
        # calculates the information gain ratio of x conditional on y.

I hope these are relevant and would welcome comments on what if anything
needs to be done to bring them up to scratch as serious Octave
functions.  I suspect the existing contributed bundle is far superior,
but I thought I'd give people the opportunity to review these things.

Best wishes,

    -- Joe


function H = infoentr(x,y)

# If just one input, calculates Shannon Information Entropy
# of the sequence x:
#      H(X) = \sum_{x \in X} p(x) log2(1/p(x))
#
# If two inputs, calculates joint entropy of the concurrent
# sequences x and y:
#    H(X,Y) = \sum_{x \in X, y \in Y} p(x,y) log2(1/p(x,y))

if(nargin<1 || nargin>2)
        usage("infoentr(x,y)")
endif

if(nargin==2)
        if((rows(x)~=rows(y)) || (columns(x)~=columns(y)))
                error("Arguments do not have same dimension.")
        endif
endif

# We check that first argument is a vector, and
# if necessary convert to row vector.
if(columns(x)==1)
        x = x';   # semicolon added so the transposed vector is not echoed
elseif(rows(x)~=1)
        error("First argument is not a vector.");
endif


if(nargin==1)
        X = create_set(x);
        Nx = length(X);
       
        # Calculate probability Pr(x)
        for i=1:Nx
                Pr(i) = sum(x==X(i));
        endfor
        if(sum(Pr) ~= length(x))
                fprintf(stdout,"Sum is wrong.\n");
        endif
        Pr = Pr/length(x);
       
        # Calculate Shannon information content h(x) = log2(1/Pr(x))
        h = log2(1 ./ Pr);
        h(find(h==Inf)) = 0;
        H = sum(Pr .* h);
else
        # Ensure that the second argument is a vector, and
        # if necessary convert to row vector.  Actually
        # this is probably taken care of by the check on
        # dimension agreement and the check on x above. :-)
        if(columns(y)==1)
                y = y';   # semicolon added so the transposed vector is not echoed
        elseif(rows(y)~=1)
                error("Second argument is not a vector.");
        endif
       
        X = create_set(x);
        Y = create_set(y);
        Nx = length(X);
        Ny = length(Y);
       
        # Calculate joint probability Pr(x,y)
        for i=1:Nx
                for j=1:Ny
                        Pr(i,j) = (x==X(i))*(y==Y(j))';
                endfor
        endfor
        if sum(sum(Pr)) ~= length(x)
                fprintf(stdout,"Sum is wrong.\n");
        endif
        Pr = Pr/length(x);
       
        # Calculate Shannon information content h(x,y) = log2(1/Pr(x,y))
        h = log2(1 ./ Pr);
        h(find(h==Inf)) = 0;
       
        H = sum(sum(Pr .* h));
endif
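
Editorial sketch, not part of the posted code: the limitation mentioned in the message (inputs whose only non-singleton dimension is the third or higher, such as a 1x1xN array) can probably be avoided by flattening the input with x(:) before counting symbols. The lines below assume that approach and use unique and accumarray in place of create_set and the counting loops. The variable names (xv, idx, jdx, p, pj) are illustrative only.

```octave
# A 1x1x8 "vector": the active dimension is the third one.
x = reshape ([1 2 2 3 3 3 3 1], 1, 1, []);
y = reshape ([1 1 2 2 1 1 2 2], 1, 1, []);

# Flatten to column vectors so the orientation of the input no longer matters.
xv = x(:);
yv = y(:);

# Single-sequence entropy: idx(k) is the position of xv(k) in the symbol set.
[~, ~, idx] = unique (xv);
p = accumarray (idx(:), 1) / numel (xv);    # relative frequencies, all > 0
H = -sum (p .* log2 (p));

# Joint entropy: treat each distinct (x,y) pair as one symbol.
[~, ~, jdx] = unique ([xv, yv], "rows");
pj = accumarray (jdx(:), 1) / numel (xv);
Hxy = -sum (pj .* log2 (pj));

printf ("H(X) = %g bits, H(X,Y) = %g bits\n", H, Hxy);
```

Because only symbols that actually occur are counted, no probability is zero and the special handling of Inf in the information content is not needed in this variant.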

function Hcond = condentr(x,y)

# Calculates information entropy of the sequence x
# conditional on the sequence y:
#      H(X|Y) = H(X,Y) - H(Y)

if nargin!=2
        usage("condentr(x,y)")
endif

Hcond = infoentr(x,y) - infoentr(y);

function I = mutualinfo(x,y)

# Calculates mutual information of the sequences x and y:
#      I(X;Y) = H(X) - H(X|Y) = H(Y) - H(Y|X) = I(Y;X)

if nargin!=2
        usage("mutualinfo(x,y)")
endif

I = infoentr(x) - condentr(x,y);

function IGR = infogain(x,y)

# Gives the information gain ratio (also known as the
# `uncertainty coefficient') of the sequence x
# conditional on y:
#        I(X|Y) = I(X;Y)/H(X)

if nargin!=2
        usage("infogain(x,y)")
endif

IGR = mutualinfo(x,y)/infoentr(x);

# Could also do
# IGR = 1 - condentr(x,y)/infoentr(x);
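
Editorial sketch, not part of the posted code: the identities quoted in the comments above (H(X|Y) = H(X,Y) - H(Y), the symmetry of I(X;Y), and the ratio I(X;Y)/H(X)) can be checked numerically on a small pair of sequences. The helper H_counts and the sample data are hypothetical and stand in for the posted functions so that the block runs on its own.

```octave
x = [1 1 2 2 3 3 3 3];
y = [1 1 1 2 2 2 2 2];

# Hypothetical helper: entropy in bits of a vector of positive counts.
H_counts = @(c) -sum ((c / sum (c)) .* log2 (c / sum (c)));

# Symbol indices for x, for y, and for the (x,y) pairs.
[~, ~, ix]  = unique (x(:));
[~, ~, iy]  = unique (y(:));
[~, ~, ixy] = unique ([x(:), y(:)], "rows");

Hx  = H_counts (accumarray (ix(:), 1));
Hy  = H_counts (accumarray (iy(:), 1));
Hxy = H_counts (accumarray (ixy(:), 1));

Hx_given_y = Hxy - Hy;          # H(X|Y) = H(X,Y) - H(Y)
Hy_given_x = Hxy - Hx;          # H(Y|X) = H(X,Y) - H(X)
I_xy = Hx - Hx_given_y;         # I(X;Y)
I_yx = Hy - Hy_given_x;         # I(Y;X), should agree with I(X;Y)

# The gain ratio divides by H(X), so it is undefined for a constant x.
if (Hx > 0)
  U = I_xy / Hx;
else
  U = NaN;
endif

printf ("I(X;Y) = %g, I(Y;X) = %g, ratio = %g\n", I_xy, I_yx, U);
```

The guard on Hx is a design choice of this sketch: the posted infogain divides by infoentr(x) directly, which gives 0/0 when x takes a single value.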

_______________________________________________
Octave-sources mailing list
Octave-sources@...
https://www.cae.wisc.edu/mailman/listinfo/octave-sources

Re: Information-theoretic functions

by David Bateman

Joseph Wakeling wrote:

> Hello all,
>
> For my own private work with Octave I have prepared a set of
> information-theoretic functions which I thought I would offer to the
> community.
>
> From browsing the archives I recognise that another user recently
> contributed a similar set of functions, but the .tar.gz attachment was
> not available, so I could not compare.  I would very much like to see
> those if possible.
>
> My own functions include one which I think was not in the earlier
> bundle, to calculate the information gain ratio or uncertainty coefficient.
>
> One problem, which I'm not sure how to get round, is in the main
> information entropy function: it requires vectors at input, but at
> present only works with row or column vectors, i.e. not with vectors
> where the active dimension is > 2.
>  
You should check with Muthu Annamalai whether these overlap with his
information theory functions in octave-forge, and then get them
committed to octave-forge. Perhaps Muthu can do the commit for you.

D.

--
David Bateman                                David.Bateman@...
Motorola Labs - Paris                        +33 1 69 35 48 04 (Ph)
Parc Les Algorithmes, Commune de St Aubin    +33 6 72 01 06 33 (Mob)
91193 Gif-Sur-Yvette FRANCE                  +33 1 69 35 77 01 (Fax)

