So, essentially:
- A file updated in-place doesn't warrant a new GUID?
When Mark was talking about 'hash' he was actually referring to an md5
checksum. I think you're talking about some other kind of hash.
Another important point, for our specific
case, is that our files are 'non-versionable' as far as I understand this.
Meaning that in 99% of the cases they are not .exe or .dll, but .py, .txt, .xml
etc.
Now that I've expanded this a bit, let me
go full length.
In our case, a directory of .py files (a 'package') would be considered a
'component', and my previous understanding was that an updated file would
warrant a new GUID, since it's essentially a 'new version' of the 'package'.
Is that a bad assumption?
--
Sidnei da Silva
Enfold Systems http://enfoldsystems.com
Fax +1 832 201 8856 Office +1 713 942 2377 Ext 214
From: [EMAIL PROTECTED] on behalf of Derek Cicerone
Sent: Wed 17/5/2006 17:08
To: 'James Carter'
Cc: 'Mark Hammond'; WiX-devs@lists.sourceforge.net
Subject: RE: [WiX-devs] Re: [WiX-users] Extending Heat
I’m not sure if a moved
file is all that interesting to be honest. Moved files don’t need to keep
the same component guid because it’s essentially a different resource at that
point because it’s being installed to a different location. I think the
only thing that’s interesting is where a file will be installed (not necessarily
where it came from). If you make assumptions about a file being installed
to the same location it came from, then this could work. The hash doesn’t
seem to really help because it doesn’t account for in-place file updates.
As far as Windows Installer is concerned, a file is identified by where it
installs – it could completely change versions or content, but if it’s the same
path, it’s the same file.
Derek
From: James Carter [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, May 17, 2006 2:14 AM
To: [EMAIL PROTECTED]
Cc: Mark Hammond; WiX-devs@lists.sourceforge.net
Subject: Re: [WiX-devs] Re: [WiX-users] Extending Heat
Mark, definitely what you are proposing is
beyond what my design is intended to handle. I'm interested in hearing more of
your thoughts on the subject.
Derek, I was wondering if you could explain
why you don't think hashing would be good enough to tell if a file matched a
previously shipped file? A scenario I can think of is if a file is kept the same
but moved to a different directory and different component. If that happened,
then you would have a false positive by using just the hash. You would need to
take into account the original location of the file as well. I think that would
do it. Thoughts?
On 5/16/06, Derek Cicerone <[EMAIL PROTECTED]> wrote:
Inline
The overall message here is that a component catalog can solve a very narrow
set of problems and it's a good first step, but other logic is needed in
order to be able to safely generate setup every time (and I'm not sure it's
entirely possible). The catalog can protect against component rule
violations, but it cannot keep you from unintentionally shipping a file,
probably won't handle components with non-file resources (like websites),
and it would be tricky to come up with an algorithm for identifying that a
file matched a previously shipped file (hashing isn't necessarily good
enough).
-----Original Message-----
From: Mark Hammond [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, May 16, 2006 5:29 PM
To: [EMAIL PROTECTED]; 'James Carter'; WiX-devs@lists.sourceforge.net
Subject: RE: [WiX-devs] Re: [WiX-users] Extending Heat
Hi James and Derek,

We are also playing with the design of a component library - it would be
great if we can put our efforts towards a single tool the entire community
can use.

There are a couple of things I'm not clear about - but I fear some of them
are simply a lack of understanding about MSI/WiX.
> - Nothing should ever be removed from the database. New components and
> files should only be added as they are shipped. The database needs to
> remember anything which you've shipped to ensure there are no collisions.
Can you please elaborate on this requirement? Why is a history necessary?
If a component changes and its GUID is accordingly changed, what collisions
are possible?

[DerekC] Very good question - this gets to the core of why a component
catalog is necessary in the first place. The idea of a component catalog is
to capture _anything_ which shipped to a customer to ensure that there will
be no collisions for any customers (regardless of which version of your
product they are installing). So anything which you consider to have
"shipped" needs to be stored in the component catalog. While you are
developing a setup, it's perfectly fine to trash the component rules because
there is no assumption that the install won't trash a machine while it's
being developed. But once you consider something "shipped" - it needs to be
supported. Just to add more info here: if you consider creating a build to
be "shipping" because you support build-to-build upgrades or something like
that, then the component catalog should be updated for every build. If you
only ship once a year, then that's when the component catalog needs to be
updated.
The way I see it, the primary role of a catalog can be reduced to 2 tasks:

* Check if the filesystem currently, exactly matches the previous
  'expansion' of a component.
* If the file-system does not exactly match, generate new GUIDs and take a
  new snapshot of the current 'expansion'.
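
A minimal sketch of those two tasks, assuming the 'hash' is an md5 checksum
of each file's contents and that a catalog entry is a plain dict with
hypothetical "guid" and "files" keys (neither the entry layout nor the
function names come from an existing tool). Note that it regenerates the
GUID on any change, which is exactly the behaviour the reply below questions
for files updated in place.

```python
import hashlib
import os
import uuid


def snapshot(root):
    """Return {relative path: md5 hex digest} for every file under root."""
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Store '/'-separated relative paths so snapshots compare equal
            # regardless of the platform that produced them.
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            with open(full, "rb") as handle:
                result[rel] = hashlib.md5(handle.read()).hexdigest()
    return result


def refresh(entry, root):
    """Task 1: reuse the entry if the file system still matches it.
    Task 2: otherwise issue a new GUID and take a new snapshot."""
    current = snapshot(root)
    if entry is not None and entry["files"] == current:
        return entry
    return {"guid": str(uuid.uuid4()).upper(), "files": current}
```

Calling refresh(None, path) creates the first entry; calling it again with
that entry returns the same object until a file under path changes.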
So I'm missing how the history is relevant to the tool (I see how the
history is useful to the developer - hence I propose the database format be
XML, suitable for use with their existing source control system).

[DerekC] The points above focus on the developer-related stability which is
desired from a component catalog: stable names and guids over time. However,
this is not very important in terms of component rule violations, which can
ruin a customer's machine. Another issue to think about here is that guids
need to be specific to a resource, but not necessarily a particular version
of a file. This is a little complicated, so let me explain: if you ship a
file A to a directory X and then at some later point in time the file is
updated to A', it needs to keep the same component guid to ensure MSI can
properly handle the upgrade. This is why merely hashing files is not
enough - it would actually miss the case of files being updated in place.
What you really need to do is track files in terms of where they will be
installed on the user's machine. Since components are tracked
per-directory, you don't need to worry about the user being able to
re-target a directory.
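
A minimal sketch of tracking by install location, assuming one component per
target directory and an in-memory store. The ComponentCatalog class, its
guid_for method and the case-insensitive key are all hypothetical
illustrations, not part of WiX or Windows Installer. File contents never
enter the lookup, so A and A' in directory X resolve to the same guid.

```python
import uuid


class ComponentCatalog:
    """Append-only map from install directory to component GUID."""

    def __init__(self):
        self._guids = {}

    def guid_for(self, install_dir):
        # Assumption: directories that differ only in case, slash style or
        # a trailing separator are the same install location.
        key = install_dir.replace("/", "\\").rstrip("\\").lower()
        if key not in self._guids:
            # A location never seen before gets a fresh GUID; existing
            # entries are never removed or reassigned.
            self._guids[key] = str(uuid.uuid4()).upper()
        return self._guids[key]
```

A moved file simply shows up under a different directory key and so gets a
different guid, which matches treating it as a different resource.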
> - Components and files should be added to the database from the msi files
> just prior to being shipped. This ensures that the guids and identifiers
> which you care about in terms of the component rules are the ones you are
> tracking.
I had pictured the database being updated as the MSI is being built, rather
than post build. I see that from a process POV, people may find it useful
to be done post-build - but am I missing a reason it can't be done at build
time, with the updated component database checked into source-control at the
same time?

I'm not sure of any details of the design James has come up with, but for
the sake of discussion, I include our current thinking on the authoring
(note that our interest is for an open source project with many thousands of
files).
A 'component specification' describes components in terms of wildcard
specifications. For example, below are two specifications (which are
maintained in their own XML file):
<Shadow xmlns="http://schemas.enfoldsystems.com/wix/shadow/catalog">
  <!-- The component definitions -->
  <!-- ComponentGroup provides a logical "grouping" of components
       automatically. Child nodes specify the source of files
       (eg Manifest allows wildcards to be specified, but one can imagine
       other techniques) -->
  <ComponentGroup Id="Zope">
    <Manifest style="component_per_directory">
      <![CDATA[
      # here is a 'manifest' specifying the files to collect.
      recursive-include $(env.BUILDPATH)/Products/*.py
      recursive-exclude .svn CVS
      ]]>
    </Manifest>
  </ComponentGroup>
  <ComponentGroup Id="Python">
    ...
  </ComponentGroup>
</Shadow>
Note that the above is only the *specification* - the tool will need to
process this specification and maintain the database with the hash values,
file stat info etc for the expanded specification. Even though the
resulting database is not designed to be human edited, I picture this
database also being XML to make it friendly to source control.
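
One step of that processing can be sketched as follows, under the assumption
that "component_per_directory" means every directory containing collected
files becomes one component. The wildcard expansion itself is taken as
already done, paths are assumed to be '/'-separated, and the function name
and the excluded directory names passed as defaults are illustrative only.

```python
import posixpath
from collections import defaultdict


def group_by_directory(paths, excluded_dirs=(".svn", "CVS")):
    """Group '/'-separated file paths into one file list per directory."""
    components = defaultdict(list)
    for path in paths:
        directory, name = posixpath.split(path)
        # Skip anything that lives inside an excluded directory.
        if any(part in excluded_dirs for part in directory.split("/")):
            continue
        components[directory].append(name)
    # Sort file names so the result is stable between runs, which keeps the
    # stored database friendly to source control diffs.
    return {d: sorted(names) for d, names in components.items()}
```

Each key of the returned dict would then correspond to one component in the
stored database.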
Below is an example WiX source file that references the catalog:

<Wix xmlns="http://schemas.microsoft.com/wix/2003/01/wi"
     xmlns:se='http://schemas.enfoldsystems.com/wix/shadow/expansion'>
  <!-- the 'se' namespace is (ab)used for the component catalog expansion. -->
  ...
  <Feature Id="ProductFeature" Level="1" Title="Core Product">
    <!-- 'include' ComponentRef nodes for the catalog items -->
    <se:ComponentRefs Id="Python"/>
    <se:ComponentRefs Id="Zope"/>
  </Feature>
  ...
  <!-- 'include' the component definitions for the catalog items -->
  <se:ComponentDefs Id='Python'/>
  <se:ComponentDefs Id='Zope'/>
</Wix>
The extension 'simply' (ha!:) processes
the tree, with anything in the 'se' namespace substituting information from
the catalog. Our design does get a little more complex when
thinking about how to add additional Wix elements to a file in the component
specification (eg, shortcut to a .exe), but the above describes the gist of
it.
I'd appreciate all comments - including 'you are mad!' :) Am I making this
too complex for its own good? Is there something obvious I have overlooked?
Is your idea (James) anything like this?

[DerekC] The design here is way beyond a component catalog and well into
automatically generated setup. I think it's a fine thing to strive towards,
but there are some inherent dangers in doing setup this way. The most
visible problem you'll hit in this scenario is that new files will be added
as long as they fall within the specifications - so you need to worry about
accidentally shipping a file. This concern could be mitigated by having
checks against a known good file list (perhaps against the source control
server to ensure all the files were intentionally checked in if they are
under source control). But it's still risky.
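
A minimal sketch of such a check, assuming both the collected files and the
known good list are available as lists of relative paths. The function name
is hypothetical, and comparing paths case-insensitively is an assumption
made here because the files are destined for Windows.

```python
def unapproved_files(collected, approved):
    """Return collected paths that do not appear in the approved list."""

    def norm(path):
        # Compare paths without regard to slash style or letter case.
        return path.replace("\\", "/").lower()

    approved_keys = {norm(p) for p in approved}
    return sorted(p for p in collected if norm(p) not in approved_keys)
```

A build could refuse to continue whenever the returned list is non-empty,
leaving it to a person to approve each new file.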
Beyond that concern (and it's a big one for some groups because it could
cause security problems), the component catalog may be able to handle the
majority of the other cases to ensure that guids and identifiers stay
reasonable over time. Some sticky areas might be in handling changes to
SelfReg registry keys that are automatically extracted and other scenarios
like that in which the content of the component is more complicated than
being merely a file.
Cheers,
Mark.