Hi Ted and other SenseClusters folks,
I've updated the svdpackout.pl file so that I can extract the
component U, S, and V' (V-transpose) matrices rather than only being
able to extract the reconstructed matrix or the rows-only.
I've attached my version to this message. Feel free to use it as you
see fit.
Here's a brief changelog:
* Added feature to output the component U, S, and V' matrices.
* Added a new command-line option "--output" with three options:
reconstruct - reconstructs the rank-k matrix (default)
rowonly - same as --rowonly
components - output U, S, V' matrices to U.txt, S.txt, VT.txt
* Added a new command-line option "--negatives":
Allows negative values; otherwise all negative values are set
to 0 (except in component output).
* New options maintain backward compatibility
* Updated the documentation.
* Passes all tests (testA1-A4,B1-B2)
Hope this helps you -- it helped my students!
As an aside -- and I'd be happy to post this to the main newsgroup if
you'd rather -- what is the purpose of the "rowonly" feature? Why do
you multiply U by the sqrt of the S values? Is there some
theoretical reason to do this?
Thanks!
-Rich
--
Richard Wicentowski
Assistant Professor
Computer Science Department
Swarthmore College
(610) 690-5643
#!/usr/bin/perl -w
=head1 NAME
svdpackout.pl
=head1 SYNOPSIS
Reconstructs a matrix from its singular values and singular vectors created
by SVDPack.
=head1 USGAE
svdpackout.pl [OPTIONS] lav2 lao2
=head1 INPUT
=head2 Required Arguments:
=head3 lav2
Binary output file created by SVDPack's las2
=head3 lao2
ASCII output file created by SVDPack's las2
=head2 Optional Arguments:
=head4 --rowonly
Only the row vectors are reconstructed. By default, svdpackout
reconstructs entire matrix.
=head4 --negatives
Allows negative values; otherwise all negative values are set to 0 (except in component output).
=head4 --output OUTPUT
Specifies the output of this program:
reconstruct - reconstructs the rank-k matrix (default)
rowonly - same as --rowonly
components - output the U, S, and V matrices
=head4 --format FORM
Specifies numeric format for representing output matrix values.
Following formats are supported with --format :
iN - Output matrix will contain integer values each occupying N spaces
fM.N - Output matrix will contain real values each occupying total M spaces
of which last N digits show fractional part. M spaces for each entry include
the decimal point and +/- sign if any.
Default format value is f16.10.
=head3 Other Options :
=head4 --help
Displays this message.
=head4 --version
Displays the version information.
=head1 OUTPUT
svdpackout displays a matrix reconstructed from the Singular Triplets
created by SVD. By default, entire matrix (product of left and right
signular vectors and singular values) is reconstructed. When --rowonly
is ON, or when --output rowonly is set, only the reduced row vectors
are built. When --output components is set, the three component
matrices, U, S, and V, are output separately.
=head1 SYSTEM REQUIREMENTS
SVDPACK - http://netlib.org/svdpack/
PDL - http://search.cpan.org/dist/PDL/
=head1 AUTHOR
Amruta Purandare, Ted Pedersen.
University of Minnesota at Duluth.
=head1 COPYRIGHT
Copyright (c) 2002-2005,
Amruta Purandare, University of Pittsburgh.
[EMAIL PROTECTED]
Ted Pedersen, University of Minnesota, Duluth.
[EMAIL PROTECTED]
This program is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
Foundation; either version 2 of the License, or (at your option) any later
version.
This program is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with
this program; if not, write to
The Free Software Foundation, Inc.,
59 Temple Place - Suite 330,
Boston, MA 02111-1307, USA.
=cut
###############################################################################
# THE CODE STARTS HERE
# $0 contains the program name along with the
# complete path. Extract just the program
# name and use in error messages
$0=~s/.*\/(.+)/$1/;
use PDL;
use PDL::NiceSlice;
use PDL::Primitive;
###############################################################################
# ================================
# COMMAND LINE OPTIONS AND USAGE
# ================================
# command line options
use Getopt::Long;
GetOptions ("help","version","format=s","output=s","rowonly","negatives");
# show help option
if (defined $opt_help) {
$opt_help=1;
&showhelp();
exit;
}
# show version information
if (defined $opt_version) {
$opt_version=1;
&showversion();
exit;
}
#output options
if (defined $opt_output) {
if ((defined $opt_rowonly) && ($opt_output !~ /rowonly/)) {
printf ("Incompatible options: --rowonly --output=%s\n", $opt_output);
&showhelp();
exit;
}
if (($opt_output ne "reconstruct") &&
($opt_output ne "rowonly") &&
($opt_output ne "components")) {
printf ("Invalid output format: %s\n", $opt_output);
&showhelp();
exit;
}
} elsif (defined $opt_rowonly) {
$opt_output="rowonly";
undef $opt_rowonly;
} else {
$opt_output="reconstruct";
}
# formatting matrix values
if (defined $opt_format) {
# integer format
if ($opt_format=~/^i(\d+)$/) {
$format="%$1d";
$lower_format="-";
while (length($lower_format)<($1-1)) {
$lower_format.="9";
}
if ($lower_format eq "-") {
$lower_format="0";
}
$upper_format="";
while (length($upper_format)<($1-1)) {
$upper_format.="9";
}
}
# floating point format
elsif ($opt_format=~/^f(\d+)\.(\d+)$/) {
$format="%$1.$2f";
$lower_format="-";
while (length($lower_format)<($1-$2-2)) {
$lower_format.="9";
}
$lower_format.=".";
while (length($lower_format)<($1-1)) {
$lower_format.="9";
}
$upper_format="";
while (length($upper_format)<($1-$2-2)) {
$upper_format.="9";
}
$upper_format.=".";
while (length($upper_format)<($1-1)) {
$upper_format.="9";
}
} else {
print STDERR "ERROR($0):
Wrong format value --format=$opt_format.\n";
exit;
}
}
# default
else {
# $format="%8.3f";
$format="%16.10f";
$lower_format="-999.9999999999";
$upper_format="9999.9999999999";
}
# show minimal usage message if no arguments
if ($#ARGV<1) {
&showminimal();
exit;
}
#############################################################################
# ================================
# INITIALIZATION AND INPUT
# ================================
# accept lav2 file name
$lav2=$ARGV[0];
if (!-e $lav2) {
print STDERR "ERROR($0):
lav2 file <$lav2> doesn't exist...\n";
exit;
}
open(LAV2,$lav2) || die "Error($0):
Error(code=$!) in opening lav2 file <$lav2>.\n";
#accept lao2 file name
$lao2=$ARGV[1];
if (!-e $lao2) {
print STDERR "ERROR($0):
lao2 file <$lao2> doesn't exist...\n";
exit;
}
open(LAO2,$lao2) || die "Error($0):
Error(code=$!) in opening lao2 file <$lao2>.\n";
##############################################################################
# ========================
# CODE SECTION
# ========================
# -------------------
# Reading file lao2
# -------------------
while (<LAO2>) {
# e-pairs = K, reduction factor specified by the user
if (/MAX. NO. OF EIGENPAIRS\s*=\s*(\d+)/) {
$k=$1;
}
# rows
if (/NO\. OF TERMS\s*\(ROWS\)\s*=\s*(\d+)/) {
$rows=$1;
next;
}
# cols
if (/NO\. OF DOCUMENTS\s*\(COLS\)\s*=\s*(\d+)/) {
$cols=$1;
next;
}
# nsig
if (/NSIG\s*=\s*(\d+)/) {
$nsig=$1;
next;
}
# after this the S-values follow
if (/COMPUTED S-VALUES/) {
last;
}
}
# check: valid values of rows, cols, k, nsig
# are obtained from lao2
if (!defined $rows) {
print STDERR "ERROR($0):
#ROWS not found in lao2 file <$lao2>.\n";
exit;
}
if (!defined $cols) {
print STDERR "ERROR($0):
#COLS not found in lao2 file <$lao2>.\n";
exit;
}
if (!defined $nsig) {
print STDERR "ERROR($0):
NSIG value not found in lao2 file <$lao2>.\n";
exit;
}
if (!defined $k) {
print STDERR "ERROR($0):
NO. OF EIGENPAIRS not found in lao2 file <$lao2>.\n";
exit;
}
# one line is blank in lao2 after "COMPUTED S-VALUES..." line
<LAO2>;
$d=zeroes($nsig);
# reading singular values
# singular values are reverse ordered in lao2
# such that the most significant/highest value
# occurs last
for ($i=$nsig-1 ; $i>=0 ; $i--) {
$line=<LAO2>;
if (!defined $line) {
print STDERR "ERROR($0):
lao2 file <$lao2> doesn't have $nsig S-values.\n";
exit;
}
$line=~s/^\s+//;
# line containing S-value contains
# ellipses, index of S-value, S-value, RES NORMS
($ellipses,$ind,$sval,@rest)=split(/\s+/,$line);
# we only need the S-value
undef $ellipses;
undef $ind;
undef @rest;
set($d,$i,$sval);
}
# taking minimum of #nsig and k
$k=($k<$nsig) ? $k : $nsig;
# --------------------
# Reading file lav2
# --------------------
# binary file needs to specify
# binmode
binmode LAV2;
# lav2 contains some header information
# before the actual S-vectors
# the header length is length(pack(2l+d))
$longsize=length(pack("l",()));
$doubsize=length(pack("d",()));
# right S-vectors start after header
$vstart=(2*$longsize) + $doubsize;
# left S-vectors start after right S-vectors
$ustart=$vstart+($cols*$nsig*$doubsize);
# printing rows cols on first line
if ($opt_output eq "reconstruct") {
print "$rows $cols\n";
} elsif ($opt_output eq "rowonly") {
print "$rows $k\n";
} elsif ($opt_output eq "components") {
}
#output components
if ($opt_output eq "components") {
print STDERR "Writing to U.txt, S.txt and VT.txt\n";
open (U, ">U.txt") or die "$!: U.txt\n";
print U "$rows $k\n";
for ($i=0;$i<$rows;$i++) {
for ($m=$nsig-1;$m>=$nsig-$k;$m--) {
$index=$ustart+((($m*$rows)+$i)*$doubsize);
seek(LAV2,$index,0);
read(LAV2,$value,$doubsize);
if (!defined $value) {
print STDERR "ERROR($0):
lav2 file <$lav2> doesn't have sufficient S-vectors.\n";
exit;
}
#unpacking
$value=unpack("d",$value);
printf U ($format, $value);
}
print U "\n";
}
close(U);
open (S, ">S.txt") or die "$!: S.txt\n";
print S "$k $k\n";
for ($i=0 ; $i<$k ; $i++) {
for ($j=0; $j<$i ; $j++) { printf S ($format, 0); }
printf S ($format, $d->at($i));
for ($j=$i+1; $j<$k ; $j++) { printf S ($format, 0); }
print S "\n";
}
close(S);
open (VT, ">VT.txt") or die "$!: VT.txt\n";
print VT "$k $cols\n";
for ($j=0;$j<$cols;$j++) {
for ($m=$nsig-1;$m>=$nsig-$k;$m--) {
$index=$vstart+((($m*$cols)+$j)*$doubsize);
seek(LAV2,$index,0);
read(LAV2,$value,$doubsize);
if (!defined $value) {
print STDERR "ERROR($0):
lav2 file <$lav2> doesn't have sufficient S-vectors.\n";
exit;
}
# unpacking
$value = unpack("d", $value);
printf VT ($format, $value);
}
print VT "\n";
}
close(VT);
exit;
}
# reconstruct full matrix
elsif ($opt_output eq "reconstruct") {
$u=zeroes($k);
$v=zeroes($k);
for ($i=0;$i<$rows;$i++) {
$u->inplace->zeroes;
for ($m=$nsig-1;$m>=$nsig-$k;$m--) {
$index=$ustart+((($m*$rows)+$i)*$doubsize);
seek(LAV2,$index,0);
read(LAV2,$value,$doubsize);
if (!defined $value) {
print STDERR "ERROR($0):
lav2 file <$lav2> doesn't have sufficient S-vectors.\n";
exit;
}
#unpacking
$value=unpack("d",$value);
set($u,$nsig-1-$m,$value);
}
for ($j=0;$j<$cols;$j++) {
$v->inplace->zeroes;
for ($m=$nsig-1;$m>=$nsig-$k;$m--) {
$index=$vstart+((($m*$cols)+$j)*$doubsize);
seek(LAV2,$index,0);
read(LAV2,$value,$doubsize);
if (!defined $value) {
print STDERR "ERROR($0):
lav2 file <$lav2> doesn't have sufficient S-vectors.\n";
exit;
}
# unpacking
$value = unpack("d", $value);
set($v,$nsig-1-$m,$value);
}
$recon=$u * $d(0:$k-1) x transpose $v;
$number=sprintf($format,$recon->sclr);
# replacing -0.0+ to 0
if ((!defined $opt_negatives) && ($number=~/\-(0.)?0+/)) { $number=0; }
if ($number<$lower_format) {
print STDERR "ERROR($0):
Floating point underflow.
Value <$number> can't be represented with format $format.\n";
exit 1;
}
if ($number>$upper_format) {
print STDERR "ERROR($0):
Floating point overflow.
Value <$number> can't be represented with format $format.\n";
exit 1;
}
printf($format,$number);
}
print "\n";
}
}
# recontruct only row vectors
elsif ($opt_output eq "rowonly") {
for ($i=0;$i<$rows;$i++) {
for ($j=$nsig-1;$j>=$nsig-$k;$j--) {
# computing index of each value from beginning of file
$index = $ustart+((($j*$rows)+$i)*$doubsize);
seek(LAV2,$index,0);
read(LAV2,$value,$doubsize);
if (!defined $value) {
print STDERR "ERROR($0):
lav2 file <$lav2> doesn't have sufficient left S-vectors.\n";
exit;
}
# unpacking
$value = unpack("d", $value);
$recon = $value * sqrt at($d,$nsig-1-$j);
$number=sprintf($format,$recon);
# replacing -0.0+ to 0
if ((!defined $opt_negatives) && ($number=~/\-(0.)?0+/)) { $number=0; }
if ($number<$lower_format) {
print STDERR "ERROR($0):
Floating point underflow.
Value <$number> can't be represented with format $format.\n";
exit 1;
}
if ($number>$upper_format) {
print STDERR "ERROR($0):
Floating point overflow.
Value <$number> can't be represented with format $format.\n";
exit 1;
}
printf($format,$number);
}
print "\n";
}
}
##############################################################################
# ==========================
# SUBROUTINE SECTION
# ==========================
#-----------------------------------------------------------------------------
#show minimal usage message
sub showminimal()
{
print "Usage: svdpackout.pl [OPTIONS] LAV2 LAO2";
print "\nTYPE svdpackout.pl --help for help\n";
}
#-----------------------------------------------------------------------------
#show help
sub showhelp()
{
print "Usage: svdpackout.pl [OPTIONS] LAV2 LAO2
Reconstructs a rank-k matrix from output files LAV2 and LAO2 created by las2
program of SVDPack.
LAV2
Binary output file created by las2
LAO2
ASCII output file created by las2
OPTIONS:
--rowonly
Reconstructs only row vectors.
--negatives
Allows negative values; otherwise all negative values are set
to 0 (except in component output).
--output OUTPUT
reconstruct - reconstructs the rank-k matrix (default)
rowonly - same as --rowonly
components - output U, S, VT matrices to U.txt, S.txt, VT.txt
--format FORM
Specifies the format for displaying output matrix. Default is f16.10.
--help
Displays this message.
--version
Displays the version information.
Type 'perldoc svdpackout.pl' to view detailed documentation of svdpackout.\n";
}
#------------------------------------------------------------------------------
#version information
sub showversion()
{
print "svdpackout.pl - Version 0.04\n";
print "Reconstructs a rank-k matrix from output of SVDPack.\n";
print "Copyright (c) 2002-2005, Amruta Purandare & Ted Pedersen.\n";
print "Date of Last Update: 06/02/2004\n";
}
#############################################################################
-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys - and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
_______________________________________________
senseclusters-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/senseclusters-developers
-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys - and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
_______________________________________________
senseclusters-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/senseclusters-users