Wednesday, June 27, 2012

Retrieving Named Arrays in Statistics::R

This is mostly a note to myself, but it may help anyone else who uses Statistics::R from CPAN to link Perl scripts to the statistics engine R. I often need to retrieve named arrays (vectors whose elements have names) from R for use in programs, and I haven't found an easy way to do that with the built-in functions. So here's the code:

sub get_hash { # custom method that returns a named vector as
               # a hash keyed by the element names.
               # Define it inside package Statistics::R so that
               # $R->get_hash(...) can find it.
   my ($self, $varname) = @_;
   # cat() prints the elements separated by single spaces
   my $values_str = $self->run(qq{cat($varname)});
   my $keys_str = $self->run(qq{cat(names($varname))});
   # split ' ' also skips leading and repeated whitespace
   my @values = split(' ', $values_str);
   my @keys = split(' ', $keys_str);
   my %hash;
   @hash{@keys} = @values; # hash slice pairs each name with its value
   return \%hash;
}
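
The pairing step can be tried without R. This is only a sketch: pair_names_and_values is a hypothetical helper (not part of Statistics::R), and the two strings are made up to stand in for what cat() would print. It also adds a count check, since a name containing a space would split into two keys and silently shift the pairs.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Statistics::R): pair up the two
# space-separated strings that cat() prints for names(x) and for x.
sub pair_names_and_values {
    my ($keys_str, $values_str) = @_;
    my @keys   = split ' ', $keys_str;
    my @values = split ' ', $values_str;

    # Refuse to build the hash if the counts do not match.
    die "got " . scalar(@keys) . " names but " . scalar(@values) . " values\n"
        unless @keys == @values;

    my %hash;
    @hash{@keys} = @values;   # hash slice: first name gets first value, etc.
    return \%hash;
}

# Made-up strings standing in for R output.
my $counts = pair_names_and_values('age height weight', '12 10 11');
print "$_ => $counts->{$_}\n" for sort keys %$counts;
```

The same check could go into get_hash itself if the names in your data might contain spaces.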

Here's an example, using a matrix:
#take the number of rows in the matrix minus the column sums of blanks
$R->send(qq{n2= nrow(cols2.mat) - colSums(is.na(cols2.mat))});

#now we have a vector (with variable names) that has the number of non-blanks

#retrieve it from R into perl 
$n2 = $R->get_hash('n2');
 
In the example, the return value is a reference to a hash. You get the values out with something like:
$val_for_variable = $n2->{$name_of_variable};
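
To go through every variable rather than look up one, something like this should work. The hash contents here are made up so the snippet runs without R; in practice $n2 would come from get_hash.

```perl
use strict;
use warnings;

# Made-up stand-in for the hash reference that get_hash returns.
my $n2 = { age => 40, height => 38, weight => 41 };

# Single lookup with the arrow form (same result as ${$n2}{height}).
my $one = $n2->{height};

# Walk every variable in name order and format one line per variable.
my @lines = map { "$_: $n2->{$_}" } sort keys %$n2;
print "$_\n" for @lines;
```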

2 comments:

  1. Hi,

    Do you have any more examples of using Statistics::R?

    1. I use it quite a lot, mostly for data mining. There's a post on it here: http://highered.blogspot.com/2013/01/finding-meaning-in-data-part-i.html