Hi Ted and other SenseClusters folks,

I've updated the svdpackout.pl file so that I can extract the component U, S, and V' (V-transpose) matrices rather than only being able to extract the reconstructed matrix or the rows-only.

I've attached my version to this message. Feel free to use it as you see fit.

Here's a brief changelog:
* Added feature to output the component U, S, and V' matrices.
* Added a new command-line option "--output" with three options:
        reconstruct - reconstructs the rank-k matrix (default)
        rowonly - same as --rowonly
        components - output U, S, V' matrices to U.txt, S.txt, VT.txt
* Added a new command-line option "--negatives":
        Allows negative values; otherwise all negative values are set
        to 0 (except in component output).
* New options maintain backward compatibility
* Updated the documentation.
* Passes all tests (testA1-A4,B1-B2)

Hope this helps you -- it helped my students!

As an aside -- and I'd be happy to post this to the main newsgroup if you'd rather -- what is the purpose of the "rowonly" feature? Why do you multiply U by the sqrt of the S values? Is there some theoretical reason to do this?

Thanks!

-Rich

-- Richard Wicentowski
Assistant Professor
Computer Science Department
Swarthmore College
(610) 690-5643

#!/usr/bin/perl -w



=head1 NAME



svdpackout.pl



=head1 SYNOPSIS



Reconstructs a matrix from its singular values and singular vectors created 

by SVDPack.



=head1 USGAE



svdpackout.pl [OPTIONS] lav2 lao2



=head1 INPUT



=head2 Required Arguments:



=head3 lav2 



Binary output file created by SVDPack's las2



=head3 lao2



ASCII output file created by SVDPack's las2



=head2 Optional Arguments:



=head4 --rowonly 



Only the row vectors are reconstructed. By default, svdpackout

reconstructs entire matrix. 



=head4 --negatives



Allows negative values; otherwise all negative values are set to 0 (except in component output).



=head4 --output OUTPUT



Specifies the output of this program:

  reconstruct - reconstructs the rank-k matrix (default)

  rowonly - same as --rowonly

  components - output the U, S, and V matrices



=head4 --format FORM



Specifies numeric format for representing output matrix values. 

Following formats are supported with --format :



iN - Output matrix will contain integer values each occupying N spaces



fM.N - Output matrix will contain real values each occupying total M spaces

of which last N digits show fractional part. M spaces for each entry include 

the decimal point and +/- sign if any.



Default format value is f16.10.



=head3 Other Options :



=head4 --help



Displays this message.



=head4 --version



Displays the version information.



=head1 OUTPUT



svdpackout displays a matrix reconstructed from the Singular Triplets

created by SVD. By default, entire matrix (product of left and right

signular vectors and singular values) is reconstructed. When --rowonly

is ON, or when --output rowonly is set, only the reduced row vectors

are built. When --output components is set, the three component

matrices, U, S, and V, are output separately.



=head1 SYSTEM REQUIREMENTS



SVDPACK - http://netlib.org/svdpack/



PDL - http://search.cpan.org/dist/PDL/



=head1 AUTHOR



Amruta Purandare, Ted Pedersen.

University of Minnesota at Duluth.



=head1 COPYRIGHT



Copyright (c) 2002-2005,



Amruta Purandare, University of Pittsburgh.

[EMAIL PROTECTED]



Ted Pedersen, University of Minnesota, Duluth.

[EMAIL PROTECTED]



This program is free software; you can redistribute it and/or modify it under

the terms of the GNU General Public License as published by the Free Software

Foundation; either version 2 of the License, or (at your option) any later

version.



This program is distributed in the hope that it will be useful, but WITHOUT

ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS

FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.



You should have received a copy of the GNU General Public License along with

this program; if not, write to



The Free Software Foundation, Inc.,

59 Temple Place - Suite 330,

Boston, MA  02111-1307, USA.



=cut



###############################################################################



#                               THE CODE STARTS HERE



# $0 contains the program name along with the 

# complete path. Extract just the program

# name and use in error messages

$0=~s/.*\/(.+)/$1/;



use PDL;

use PDL::NiceSlice;

use PDL::Primitive;



###############################################################################



#                           ================================

#                            COMMAND LINE OPTIONS AND USAGE

#                           ================================



# command line options

use Getopt::Long;

GetOptions ("help","version","format=s","output=s","rowonly","negatives");

# show help option

if (defined $opt_help) {

  $opt_help=1;

  &showhelp();

  exit;

}



# show version information

if (defined $opt_version) {

  $opt_version=1;

  &showversion();

  exit;

}



#output options

if (defined $opt_output) {

  if ((defined $opt_rowonly) && ($opt_output !~ /rowonly/)) {

    printf ("Incompatible options: --rowonly --output=%s\n", $opt_output);

    &showhelp();

    exit;

  }

  if (($opt_output ne "reconstruct") && 

      ($opt_output ne "rowonly") && 

      ($opt_output ne "components")) {

    printf ("Invalid output format: %s\n", $opt_output);

    &showhelp();

    exit;

  }

} elsif (defined $opt_rowonly) {

  $opt_output="rowonly";

  undef $opt_rowonly;

} else {

  $opt_output="reconstruct";

}







# formatting matrix values

if (defined $opt_format) {

  # integer format

  if ($opt_format=~/^i(\d+)$/) {

    $format="%$1d";

    $lower_format="-";

    while (length($lower_format)<($1-1)) {

      $lower_format.="9";

    }

    if ($lower_format eq "-") {

      $lower_format="0";

    }

    $upper_format="";

    while (length($upper_format)<($1-1)) {

      $upper_format.="9";

    }

  }

  # floating point format

  elsif ($opt_format=~/^f(\d+)\.(\d+)$/) {

    $format="%$1.$2f";

    $lower_format="-";

    while (length($lower_format)<($1-$2-2)) {

      $lower_format.="9";

    }

    $lower_format.=".";

    while (length($lower_format)<($1-1)) {

      $lower_format.="9";

    }



    $upper_format="";

    while (length($upper_format)<($1-$2-2)) {

      $upper_format.="9";

    }

    $upper_format.=".";

    while (length($upper_format)<($1-1)) {

      $upper_format.="9";

    }

  } else {

    print STDERR "ERROR($0):

	Wrong format value --format=$opt_format.\n";

    exit;

  }

}

# default

else {

  #	$format="%8.3f";

  $format="%16.10f";

  $lower_format="-999.9999999999";

  $upper_format="9999.9999999999";

}



# show minimal usage message if no arguments

if ($#ARGV<1) {

  &showminimal();

  exit;

}



#############################################################################



#                       ================================

#                          INITIALIZATION AND INPUT

#                       ================================



# accept lav2 file name

$lav2=$ARGV[0];

if (!-e $lav2) {

  print STDERR "ERROR($0):

        lav2 file <$lav2> doesn't exist...\n";

  exit;

}

open(LAV2,$lav2) || die "Error($0):

        Error(code=$!) in opening lav2 file <$lav2>.\n";



#accept lao2 file name

$lao2=$ARGV[1];

if (!-e $lao2) {

  print STDERR "ERROR($0):

        lao2 file <$lao2> doesn't exist...\n";

  exit;

}

open(LAO2,$lao2) || die "Error($0):

        Error(code=$!) in opening lao2 file <$lao2>.\n";



##############################################################################



#			========================

#			      CODE SECTION

#			========================



# -------------------

# Reading file lao2 

# -------------------

while (<LAO2>) {

  # e-pairs = K, reduction factor specified by the user

  if (/MAX. NO. OF EIGENPAIRS\s*=\s*(\d+)/) {

    $k=$1;

  }

  # rows

  if (/NO\. OF TERMS\s*\(ROWS\)\s*=\s*(\d+)/) {

    $rows=$1;

    next;

  }

  # cols

  if (/NO\. OF DOCUMENTS\s*\(COLS\)\s*=\s*(\d+)/) {

    $cols=$1;

    next;

  }

  # nsig

  if (/NSIG\s*=\s*(\d+)/) {

    $nsig=$1;

    next;

  }

  # after this the S-values follow

  if (/COMPUTED S-VALUES/) {

    last;

  }

}



# check: valid values of rows, cols, k, nsig 

# are obtained from lao2

if (!defined $rows) {

  print STDERR "ERROR($0):

	#ROWS not found in lao2 file <$lao2>.\n";

    exit;

}

if (!defined $cols) {

  print STDERR "ERROR($0):

        #COLS not found in lao2 file <$lao2>.\n";

    exit;

}

if (!defined $nsig) {

  print STDERR "ERROR($0):

        NSIG value not found in lao2 file <$lao2>.\n";

  exit;

}

if (!defined $k) {

  print STDERR "ERROR($0):

        NO. OF EIGENPAIRS not found in lao2 file <$lao2>.\n";

  exit;

}



# one line is blank in lao2 after "COMPUTED S-VALUES..." line

<LAO2>;



$d=zeroes($nsig);

# reading singular values



# singular values are reverse ordered in lao2 

# such that the most significant/highest value 

# occurs last 

for ($i=$nsig-1 ; $i>=0 ; $i--) {

  $line=<LAO2>;

  if (!defined $line) {

    print STDERR "ERROR($0):

	lao2 file <$lao2> doesn't have $nsig S-values.\n";

    exit;

  }

  $line=~s/^\s+//;

  # line containing S-value contains 

  # ellipses, index of S-value, S-value, RES NORMS

  ($ellipses,$ind,$sval,@rest)=split(/\s+/,$line);

  # we only need the S-value

  undef $ellipses;

  undef $ind;

  undef @rest;

  set($d,$i,$sval);

}



# taking minimum of #nsig and k

$k=($k<$nsig) ? $k : $nsig;



# --------------------

# Reading file lav2

# --------------------



# binary file needs to specify

# binmode

binmode LAV2;



# lav2 contains some header information

# before the actual S-vectors



# the header length is length(pack(2l+d))

$longsize=length(pack("l",()));

$doubsize=length(pack("d",()));



# right S-vectors start after header

$vstart=(2*$longsize) + $doubsize;



# left S-vectors start after right S-vectors

$ustart=$vstart+($cols*$nsig*$doubsize);



# printing rows cols on first line 

if ($opt_output eq "reconstruct") {

  print "$rows $cols\n";

} elsif ($opt_output eq "rowonly") {

  print "$rows $k\n";

} elsif ($opt_output eq "components") {

  

}

#output components

if ($opt_output eq "components") {

  print STDERR "Writing to U.txt, S.txt and VT.txt\n";



  open (U, ">U.txt") or die "$!: U.txt\n";

  print U "$rows $k\n";

  for ($i=0;$i<$rows;$i++) {

    for ($m=$nsig-1;$m>=$nsig-$k;$m--) {

      $index=$ustart+((($m*$rows)+$i)*$doubsize);

      seek(LAV2,$index,0);

      read(LAV2,$value,$doubsize);

      if (!defined $value) {

	print STDERR "ERROR($0):

	lav2 file <$lav2> doesn't have sufficient S-vectors.\n";

	exit;

      }

      #unpacking

      $value=unpack("d",$value);

      printf U ($format, $value);

    }

    print U "\n";

  }

  close(U);



  open (S, ">S.txt") or die "$!: S.txt\n";

  print S "$k $k\n";

  for ($i=0 ; $i<$k ; $i++) {

    for ($j=0; $j<$i ; $j++) { printf S ($format, 0); }

    printf S ($format, $d->at($i));

    for ($j=$i+1; $j<$k ; $j++) { printf S ($format, 0); }

    print S "\n";

  }

  close(S);



  open (VT, ">VT.txt") or die "$!: VT.txt\n";

  print VT "$k $cols\n";

  for ($j=0;$j<$cols;$j++) {

    for ($m=$nsig-1;$m>=$nsig-$k;$m--) {

      $index=$vstart+((($m*$cols)+$j)*$doubsize);

      seek(LAV2,$index,0);

      read(LAV2,$value,$doubsize);

      if (!defined $value) {

	print STDERR "ERROR($0):

        lav2 file <$lav2> doesn't have sufficient S-vectors.\n";

	exit;

      }

      # unpacking

      $value = unpack("d", $value);

      printf VT ($format, $value);

    }

    print VT "\n";

  }

  close(VT);



  exit;

} 

# reconstruct full matrix

elsif ($opt_output eq "reconstruct") {

  $u=zeroes($k);

  $v=zeroes($k);

  for ($i=0;$i<$rows;$i++) {

    $u->inplace->zeroes;

    for ($m=$nsig-1;$m>=$nsig-$k;$m--) {

      $index=$ustart+((($m*$rows)+$i)*$doubsize);

      seek(LAV2,$index,0);

      read(LAV2,$value,$doubsize);

      if (!defined $value) {

	print STDERR "ERROR($0):

	lav2 file <$lav2> doesn't have sufficient S-vectors.\n";

	exit;

      }

      #unpacking

      $value=unpack("d",$value);

      set($u,$nsig-1-$m,$value);

    }

    for ($j=0;$j<$cols;$j++) {

      $v->inplace->zeroes;

      for ($m=$nsig-1;$m>=$nsig-$k;$m--) {

	$index=$vstart+((($m*$cols)+$j)*$doubsize);

	seek(LAV2,$index,0);

	read(LAV2,$value,$doubsize);

	if (!defined $value) {

	  print STDERR "ERROR($0):

        lav2 file <$lav2> doesn't have sufficient S-vectors.\n";

	  exit;

	}

        # unpacking

	$value = unpack("d", $value);

	set($v,$nsig-1-$m,$value);

      }

      $recon=$u * $d(0:$k-1) x transpose $v;

      $number=sprintf($format,$recon->sclr);

      # replacing -0.0+ to 0

      if ((!defined $opt_negatives) && ($number=~/\-(0.)?0+/)) { $number=0; }

      if ($number<$lower_format) {

	print STDERR "ERROR($0):

        Floating point underflow.

        Value <$number> can't be represented with format $format.\n";

	exit 1;

      }

      if ($number>$upper_format) {

	print STDERR "ERROR($0):

        Floating point overflow.

        Value <$number> can't be represented with format $format.\n";

	exit 1;

      }

      printf($format,$number);

    }

    print "\n";

  }

}

# recontruct only row vectors 

elsif ($opt_output eq "rowonly") {

  for ($i=0;$i<$rows;$i++) {	

    for ($j=$nsig-1;$j>=$nsig-$k;$j--) {

      # computing index of each value from beginning of file

      $index = $ustart+((($j*$rows)+$i)*$doubsize);

      seek(LAV2,$index,0);

      read(LAV2,$value,$doubsize);

      if (!defined $value) {

	print STDERR "ERROR($0):

        lav2 file <$lav2> doesn't have sufficient left S-vectors.\n";

	exit;

      }

      # unpacking

      $value = unpack("d", $value);

      $recon = $value * sqrt at($d,$nsig-1-$j);

      $number=sprintf($format,$recon);

      # replacing -0.0+ to 0

      if ((!defined $opt_negatives) && ($number=~/\-(0.)?0+/)) { $number=0; }

      if ($number<$lower_format) {

	print STDERR "ERROR($0):

        Floating point underflow.

        Value <$number> can't be represented with format $format.\n";

	exit 1;

      }

      if ($number>$upper_format) {

	print STDERR "ERROR($0):

        Floating point overflow.

        Value <$number> can't be represented with format $format.\n";

	exit 1;

      }

      printf($format,$number);

    }

    print "\n";

  }	

}



##############################################################################



#                      ==========================

#                          SUBROUTINE SECTION

#                      ==========================



#-----------------------------------------------------------------------------

#show minimal usage message

sub showminimal()

  {

    print "Usage: svdpackout.pl [OPTIONS] LAV2 LAO2";

    print "\nTYPE svdpackout.pl --help for help\n";

  }



#-----------------------------------------------------------------------------

#show help

sub showhelp()

  {

    print "Usage:  svdpackout.pl [OPTIONS] LAV2 LAO2 



Reconstructs a rank-k matrix from output files LAV2 and LAO2 created by las2 

program of SVDPack.



LAV2

	Binary output file created by las2

	

LAO2

	ASCII output file created by las2



OPTIONS:



--rowonly 

	Reconstructs only row vectors.



--negatives

        Allows negative values; otherwise all negative values are set

        to 0 (except in component output).



--output OUTPUT

        reconstruct - reconstructs the rank-k matrix (default)

	rowonly - same as --rowonly

        components - output U, S, VT matrices to U.txt, S.txt, VT.txt



--format FORM

	Specifies the format for displaying output matrix. Default is f16.10.



--help

        Displays this message.

--version

        Displays the version information.



Type 'perldoc svdpackout.pl' to view detailed documentation of svdpackout.\n";

  }



#------------------------------------------------------------------------------

#version information

sub showversion()

  {

    print "svdpackout.pl      -       Version 0.04\n";

    print "Reconstructs a rank-k matrix from output of SVDPack.\n";

    print "Copyright (c) 2002-2005, Amruta Purandare & Ted Pedersen.\n";

    print "Date of Last Update:     06/02/2004\n";

  }



#############################################################################



-------------------------------------------------------------------------

Take Surveys. Earn Cash. Influence the Future of IT

Join SourceForge.net's Techsay panel and you'll get the chance to share your

opinions on IT & business topics through brief surveys - and earn cash

http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
_______________________________________________

senseclusters-developers mailing list

[email protected]

https://lists.sourceforge.net/lists/listinfo/senseclusters-developers

-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys - and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
_______________________________________________
senseclusters-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/senseclusters-users

Reply via email to