So, essentially:
 
- A file updated in-place doesn't warrant a new GUID?
 
When Mark was talking about 'hash' he was actually referring to an MD5 checksum. I think you're talking about some other kind of hash.
 
Another important point, for our specific case, is that our files are 'non-versionable' as far as I understand the term: in 99% of cases they are not .exe or .dll files, but .py, .txt, .xml, etc.
 
Now that I've expanded this a bit, let me go full length.
 
In our case, a directory of .py files (a 'package') would be considered a 'component', and my previous understanding was that an updated file would warrant a new GUID, since it's essentially a 'new version' of the 'package'. Is that a bad assumption?
--
Sidnei da Silva
Enfold Systems                http://enfoldsystems.com
Fax +1 832 201 8856     Office +1 713 942 2377 Ext 214


From: [EMAIL PROTECTED] on behalf of Derek Cicerone
Sent: Wed 17/5/2006 17:08
To: 'James Carter'
Cc: 'Mark Hammond'; WiX-devs@lists.sourceforge.net
Subject: RE: [WiX-devs] Re: [WiX-users] Extending Heat

I’m not sure a moved file is all that interesting, to be honest.  Moved files don’t need to keep the same component guid: once a file is installed to a different location, it’s essentially a different resource.  I think the only thing that matters is where a file will be installed (not necessarily where it came from).  If you assume a file is installed to the same location it came from, then this could work.  The hash doesn’t really seem to help because it doesn’t account for in-place file updates.  As far as Windows Installer is concerned, a file is identified by where it installs; it could completely change version or content, but if it’s the same path, it’s the same file.

 

Derek

 


From: James Carter [mailto:[EMAIL PROTECTED]
Sent: Wednesday, May 17, 2006 2:14 AM
To: [EMAIL PROTECTED]
Cc: Mark Hammond; WiX-devs@lists.sourceforge.net
Subject: Re: [WiX-devs] Re: [WiX-users] Extending Heat

 

Mark, what you are proposing is definitely beyond what my design is intended to handle. I'm interested in hearing more of your thoughts on the subject.

Derek, could you explain why you don't think hashing would be good enough to tell whether a file matches a previously shipped file? One scenario I can think of is a file that stays the same but is moved to a different directory and a different component. In that case, using just the hash would give a false positive. You would need to take the file's original location into account as well. I think that would do it. Thoughts?

On 5/16/06, Derek Cicerone <[EMAIL PROTECTED]> wrote:

Inline

The overall message here is that a component catalog can solve a very narrow
set of problems.  It's a good first step, but other logic is needed in order
to safely generate setup every time (and I'm not sure it's entirely
possible).  The catalog can protect against component rule violations, but
it cannot keep you from unintentionally shipping a file, probably won't
handle components with non-file resources (like websites), and it would be
tricky to come up with an algorithm for identifying that a file matched a
previously shipped file (hashing isn't necessarily good enough).

-----Original Message-----
From: Mark Hammond [mailto: [EMAIL PROTECTED]]
Sent: Tuesday, May 16, 2006 5:29 PM
To: [EMAIL PROTECTED]; 'James Carter';
WiX-devs@lists.sourceforge.net
Subject: RE: [WiX-devs] Re: [WiX-users] Extending Heat

Hi James and Derek,
  We are also playing with the design of a component library - it would be
great if we could put our efforts towards a single tool the entire community
can use.

There are a couple of things I'm not clear about - but I fear some of them
are simply a lack of understanding about MSI/WiX.

> -          Nothing should ever be removed from the database.  New components
> and files should only be added as they are shipped.  The database needs
> to remember anything which you've shipped to ensure there are no collisions.

Can you please elaborate on this requirement?  Why is a history necessary?
If a component changes and its GUID is accordingly changed, what collisions
are possible?
[DerekC] Very good question - this gets to the core of why a component
catalog is necessary in the first place.  The idea of a component catalog is
to capture _anything_ which shipped to a customer to ensure that there will
be no collisions for any customers (regardless of which version of your
product they are installing).  So anything which you consider to have
"shipped" needs to be stored in the component catalog.  While you are
developing a setup, it's perfectly fine to trash the component rules because
nobody assumes an install under development is safe to run on a real
machine.  But once you consider something "shipped", it needs to be
supported.  To add more info here: if you consider creating a build to be
"shipping" because you support build-to-build upgrades or something like
that, then the component catalog should be updated for every build.  If you
only ship once a year, then that's when the component catalog needs to be
updated.
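A minimal sketch of the append-only idea, in Python (the language of the project under discussion). The `ShippedCatalog` class, its in-memory dict, and the install-directory strings are hypothetical stand-ins for whatever a real catalog would store. The only rules enforced here are that entries are never removed and that a GUID, once shipped for one install directory, is never reused for another.

```python
from dataclasses import dataclass, field


@dataclass
class ShippedCatalog:
    """Hypothetical append-only record of every component that has shipped.

    Entries are never deleted; each shipped GUID stays bound to the
    install directory it was first shipped to.
    """
    # component GUID -> install directory it shipped to
    shipped: dict[str, str] = field(default_factory=dict)

    def record_release(self, components: dict[str, str]) -> list[str]:
        """Record a release (GUID -> install dir) and return any collisions.

        If any collision is found, nothing from this release is recorded.
        """
        collisions = []
        for guid, install_dir in components.items():
            previous = self.shipped.get(guid)
            if previous is not None and previous != install_dir:
                collisions.append(
                    f"{guid} shipped to {previous!r}, now reused for {install_dir!r}"
                )
        if not collisions:
            # Append only: existing entries are never removed or rewritten.
            for guid, install_dir in components.items():
                self.shipped.setdefault(guid, install_dir)
        return collisions
```

Because old releases are never dropped, a GUID reused by any later build is caught, whichever earlier version a customer might have installed.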

The way I see it, the primary role of a catalog can be reduced to 2 tasks:

* Check whether the filesystem currently matches, exactly, the previous
'expansion' of a component.
* If the filesystem does not exactly match, generate new GUIDs and take a
new snapshot of the current 'expansion'.
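The two tasks above could be sketched roughly as follows. This assumes one component per directory (the `component_per_directory` style) and MD5 digests as the 'hash'. The `snapshot` and `refresh_component` names and the `{"guid": ..., "files": ...}` record shape are hypothetical.

```python
import hashlib
import uuid
from pathlib import Path


def snapshot(component_dir: Path) -> dict[str, str]:
    """Map each file directly in component_dir to its MD5 digest."""
    result = {}
    for path in sorted(component_dir.iterdir()):
        if path.is_file():
            result[path.name] = hashlib.md5(path.read_bytes()).hexdigest()
    return result


def refresh_component(component_dir: Path, entry: dict) -> dict:
    """Task 1: compare the filesystem with the stored expansion.
    Task 2: on any difference, take a new snapshot and issue a new GUID.

    `entry` is a hypothetical catalog record, or {} if none exists yet.
    """
    current = snapshot(component_dir)
    if entry and entry.get("files") == current:
        return entry  # exact match: keep the existing GUID
    return {"guid": str(uuid.uuid4()).upper(), "files": current}
```

Because the GUID here follows file content, any in-place edit to a file produces a new GUID.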

So I'm missing how the history is relevant to the tool (I see how the
history is useful to the developer - hence I propose the database format be
XML, suitable for use with their existing source control system).
[DerekC] The points above focus on the developer-related stability which is
desired from a component catalog: stable names and guids over time.
However, that stability is not very important when it comes to component
rule violations, which can ruin a customer's machine.  Another issue to
think about here is that guids need to be specific to a resource, but not
necessarily to a particular version of a file.  This is a little
complicated, so let me explain: if you ship a file A to a directory X and
then at some later point in time the file is updated to A', it needs to keep
the same component guid to ensure MSI can properly handle the upgrade.  This
is why merely hashing files is not enough - it would actually miss the case
of files being updated in place.  What you really need to do is track files
in terms of where they will be installed on the user's machine.  Since
components are tracked per-directory, you don't need to worry about the
user being able to re-target a directory.
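A sketch of tracking components by install location rather than by content, under assumed conventions: the catalog is a plain dict, install directories are written as hypothetical strings like `[INSTALLDIR]Products\Foo`, and paths are compared case-insensitively as on Windows. `component_guid` is a hypothetical helper, not part of WiX.

```python
import uuid


def component_guid(catalog: dict[str, str], install_dir: str) -> str:
    """Return the GUID for an install directory, allocating one on first use.

    The key is where the files install on the target machine, not their
    content, so an in-place update (A -> A') keeps the same GUID.
    """
    # Assumed normalization: backslashes, no trailing separator, lower case.
    key = install_dir.replace("/", "\\").rstrip("\\").lower()
    if key not in catalog:
        catalog[key] = "{" + str(uuid.uuid4()).upper() + "}"
    return catalog[key]
```

Updating a file's content never touches the key, so A and A' in directory X share one GUID. A file moved to another directory gets a new one.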

> -          Components and files should be added to the database from the msi
> files just prior to being shipped.  This ensures that the guids and
> identifiers which you care about in terms of the component rules are the ones
> you are tracking.

I had pictured the database being updated as the MSI is being built, rather
than post-build.  I see that from a process POV, people may find it useful
to do this post-build - but am I missing a reason it can't be done at build
time, with the updated component database checked into source control at the
same time?

I don't know the details of the design James has come up with, but for
the sake of discussion, here is our current thinking on the authoring
(note that our interest is in an open source project with many thousands of
files).

A 'component specification' describes components in terms of wildcard
specifications.  For example, below are 2 specifications (maintained in
their own XML file):

<Shadow xmlns="http://schemas.enfoldsystems.com/wix/shadow/catalog">
    <!-- The component definitions -->
    <!-- ComponentGroup provides a logical "grouping" of components
         automatically.  Child nodes specify the source of files (e.g.
         Manifest allows wildcards to be specified, but one can imagine
         other techniques)
    -->
    <ComponentGroup Id="Zope">
        <Manifest style="component_per_directory">
<![CDATA[
            # here is a 'manifest' specifying the files to collect.
            recursive-include $(env.BUILDPATH)/Products/*.py
            recursive-exclude .svn CVS
]]>
        </Manifest>
    </ComponentGroup>
    <ComponentGroup Id="Python">
...
    </ComponentGroup>
</Shadow>

Note that the above is only the *specification* - the tool will need to
process this specification and maintain the database with the hash values,
file stat info, etc. for the expanded specification.  Even though the
resulting database is not designed to be human-edited, I picture this
database also being XML to make it friendly to source control.
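One way the tool might expand a `Manifest` and record hash and stat info, sketched with Python's standard `pathlib`, `hashlib`, and `xml.etree.ElementTree`. The `<File Path=... Md5=... Size=... Mtime=...>` layout is a made-up database format for illustration. The exclusion set only approximates the `recursive-exclude .svn CVS` line by skipping any path component named `.svn` or `CVS`.

```python
import hashlib
import xml.etree.ElementTree as ET
from pathlib import Path

EXCLUDED_DIRS = {".svn", "CVS"}  # approximates "recursive-exclude .svn CVS"


def expand(root: Path, pattern: str) -> list[Path]:
    """Recursively collect files under root matching pattern, skipping excluded dirs."""
    return sorted(
        p for p in root.rglob(pattern)
        if p.is_file() and not EXCLUDED_DIRS.intersection(p.relative_to(root).parts)
    )


def build_database(group_id: str, root: Path, pattern: str) -> ET.Element:
    """Build a hypothetical catalog entry: one <File> per expanded file."""
    group = ET.Element("ComponentGroup", Id=group_id)
    for path in expand(root, pattern):
        st = path.stat()
        ET.SubElement(
            group, "File",
            Path=path.relative_to(root).as_posix(),
            Md5=hashlib.md5(path.read_bytes()).hexdigest(),
            Size=str(st.st_size),
            Mtime=str(int(st.st_mtime)),
        )
    return group
```

`ET.tostring(build_database(...), encoding="unicode")` gives XML text that can be checked into source control alongside the specification.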

Below is an example WiX source file that references the catalog:

<Wix xmlns="http://schemas.microsoft.com/wix/2003/01/wi"
     xmlns:se="http://schemas.enfoldsystems.com/wix/shadow/expansion">
     <!-- the 'se' namespace is (ab)used for the component catalog
          expansion. -->
...
        <Feature Id="ProductFeature" Level="1" Title="Core Product">
            <!-- 'include' ComponentRef nodes for the catalog items -->
            <se:ComponentRefs Id="Python"/>
            <se:ComponentRefs Id="Zope"/>
        </Feature>
...
    <!-- 'include' the component definitions for the catalog items -->
    <se:ComponentDefs Id='Python'/>
    <se:ComponentDefs Id='Zope'/>
</Wix>

The extension 'simply' (ha! :) processes the tree, with anything in the 'se'
namespace substituting information from the catalog.  Our design does get a
little more complex when thinking about how to add additional WiX elements
to a file in the component specification (e.g., a shortcut to an .exe), but
the above describes the gist of it.
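A rough sketch of the substitution step for `se:ComponentRefs`, using Python's `xml.etree.ElementTree`. The `catalog` dict (group Id to component Ids) stands in for the real catalog lookup and is an assumption. `se:ComponentDefs` would need similar handling that inserts full `Component` elements instead.

```python
import xml.etree.ElementTree as ET

WIX_NS = "http://schemas.microsoft.com/wix/2003/01/wi"
SE_NS = "http://schemas.enfoldsystems.com/wix/shadow/expansion"


def expand_component_refs(wix_root: ET.Element, catalog: dict[str, list[str]]) -> None:
    """Replace each <se:ComponentRefs Id="G"/> in place with one
    <ComponentRef> per component in group G.

    `catalog` maps group Id -> component Ids (hypothetical lookup).
    """
    # Snapshot the tree first so the loop does not visit inserted elements.
    for parent in list(wix_root.iter()):
        for child in list(parent):
            if child.tag == f"{{{SE_NS}}}ComponentRefs":
                index = list(parent).index(child)
                parent.remove(child)
                for offset, comp_id in enumerate(catalog[child.get("Id")]):
                    ref = ET.Element(f"{{{WIX_NS}}}ComponentRef", Id=comp_id)
                    parent.insert(index + offset, ref)
```

After expansion, the `Feature` element holds ordinary WiX `ComponentRef` children, so the rest of the toolchain never sees the `se` namespace.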

I'd appreciate all comments - including 'you are mad!' :)  Am I making this
too complex for its own good?  Is there something obvious I have overlooked?
Is your idea (James) anything like this?
[DerekC] The design here is way beyond a component catalog and well into
automatically generated setup.  I think it's a fine thing to strive towards,
but there are some inherent dangers in doing setup this way.  The most
visible problem you'll hit in this scenario is that new files will be added
as long as they fall within the specifications - so you need to worry about
accidentally shipping a file.  This concern could be mitigated by checking
against a known-good file list (perhaps against the source control server,
to ensure all the files were intentionally checked in if they are under
source control).  But it's still risky.
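A minimal sketch of the known-good-list check. It assumes both lists are plain relative paths, for example one produced from source control with `svn list -R` (which marks directories with a trailing `/`). The `audit_file_list` helper is hypothetical.

```python
def audit_file_list(
    expanded: set[str], known_good: set[str]
) -> tuple[list[str], list[str]]:
    """Compare files picked up by the wildcard spec against a known-good list.

    Returns (unexpected, missing), both sorted.
    """
    def normalize(paths: set[str]) -> set[str]:
        # Use forward slashes and drop directory entries such as "Products/".
        return {p.replace("\\", "/") for p in paths if not p.endswith(("/", "\\"))}

    exp, good = normalize(expanded), normalize(known_good)
    unexpected = sorted(exp - good)  # would ship, but was never checked in
    missing = sorted(good - exp)     # checked in, but not matched by the spec
    return unexpected, missing
```

A build script could refuse to produce the MSI whenever `unexpected` is non-empty, which turns the accidental-shipping risk into a build failure.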

Beyond that concern (and it's a big one for some groups because it could
cause security problems), the component catalog may be able to handle the
majority of the other cases to ensure that guids and identifiers stay
reasonable over time.  Some sticky areas might be handling changes to
SelfReg registry keys that are automatically extracted, and other scenarios
like that in which the content of the component is more complicated than
just a file.

Cheers,

Mark.

 
