"Rishabh Gupta" <[EMAIL PROTECTED]> wrote in a4eje9$ip8$[EMAIL PROTECTED]:">news:a4eje9$ip8$[EMAIL PROTECTED]:
> Hi All,
> I'm a research student at the Department Of Electronics, University
> Of York, UK. I'm working on a project related to music analysis and
> classification. I am at the stage where I perform some analysis on
> music files (currently only in MIDI format) and extract about 500
> variables that are related to music properties like pitch, rhythm,
> polyphony and volume. I am performing basic analysis like mean and
> standard deviation, but I also perform more elaborate analysis like
> measuring the complexity of melody and rhythm.
>
> The aim is that the variables obtained can be used to perform a number
> of different operations.
> - The variables can be used to classify / categorise each piece of
> music, on its own, in terms of some meta classifier (e.g. rock, pop,
> classical).
> - The variables can be used to perform a comparison between two
> files. A variable from one music file can be compared to the
> equivalent variable in the other music file. By comparing all the
> variables in one file with the equivalent variables in the other
> file, an overall similarity measurement can be obtained.
>
> The next stage is to test the ability of the variables obtained
> to perform the classification / comparison. I need to identify
> variables that are redundant (redundant in the sense of 'they do not
> provide any information' and 'they provide the same information as
> another variable') so that they can be removed, and I need to identify
> variables that are distinguishing (provide the most information).
>
> My Basic Questions Are:
> - What are the best statistical techniques / methods that should be
> applied here? E.g. I have looked at Principal Component Analysis; this
> would be a good method to remove the redundant variables and hence
> reduce the amount of data that needs to be processed. Can anyone
> suggest any other sensible statistical analysis methods?
> - What are the ideal tools / software to perform the clustering /
> classification? I have access to SPSS software but I have never used
> it before and am not really sure how to apply it or whether it is any
> good when dealing with 100s of variables.
>
> So far I have been analysing each variable on its own 'by eye' by
> plotting the mean and sd for all music files. However, this approach
> is not feasible in the long term since I am dealing with such a large
> number of variables. In addition, by looking at each variable on its
> own, I do not find clusters / patterns that are only visible through
> multivariate analysis. If anyone can recommend a better approach I
> would greatly appreciate it.
>
> Any help or suggestion that can be offered will be greatly appreciated.
>
> Many Thanks!
>
> Rishabh Gupta

In SPSS, Factor Analysis would help you reduce your many variables down
to a smaller number of broader, more general ones. Cluster Analysis, in
turn, will let you see how your variables group themselves. The results
might look like the following:

Factor 1: (percussiveness)
    volume of drums
    number of drum types
    drum melodies...

Factor 2: (happiness)
    minor modes
    speed
    pitch...

Factor 3: (memorableness)
    melodic structure
    folk music precursor....

The cluster analysis would be similar, but would place the variables on
a branching tree showing, say, that speed and pitch are closer to each
other than drum type and folk precursor are.

It would be interesting to see how this works. I wonder if you could
calculate some kind of fractal dimension for the music too?

Doug H
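
For anyone who wants to try the variable-clustering idea outside SPSS, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions, not what SPSS does internally: it treats 1 - |r| (r being the Pearson correlation) as the distance between two variables, merges variables by average linkage, and stops at an arbitrary cut-off of 0.5. The variable names and all the numbers are made up for the example. Constant variables ("provide no information") are set aside first, and variables that end up in the same cluster are candidates for "provide the same information".

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def cluster_variables(data, threshold=0.5):
    """Average-linkage agglomerative clustering of variables.

    data maps a variable name to its list of values (one value per file).
    The distance between two variables is 1 - |r|. Merging stops once the
    closest pair of clusters is farther apart than `threshold`.
    Returns (clusters, merges); merges records the branching tree as
    (cluster_a, cluster_b, distance) tuples in merge order.
    """
    names = list(data)
    dist = {frozenset(p): 1 - abs(pearson(data[p[0]], data[p[1]]))
            for p in combinations(names, 2)}
    clusters = [(name,) for name in names]
    merges = []

    def linkage(a, b):
        # Mean pairwise distance between the members of two clusters.
        total = sum(dist[frozenset((i, j))] for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda p: linkage(*p))
        d = linkage(a, b)
        if d > threshold:
            break
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
        merges.append((a, b, d))
    return clusters, merges


# Hypothetical variables measured on five music files (invented values).
features = {
    "mean_pitch":   [60.0, 62.0, 64.0, 66.0, 68.0],
    "median_pitch": [61.0, 63.0, 65.0, 67.0, 69.0],  # tracks mean_pitch exactly
    "drum_volume":  [5.0, 1.0, 4.0, 2.0, 3.0],
    "drum_count":   [10.0, 2.0, 8.0, 4.0, 6.0],      # tracks drum_volume exactly
    "track_flag":   [1.0, 1.0, 1.0, 1.0, 1.0],       # constant: no information
}

# Variables with no variation carry no information; set them aside first.
constant = [name for name, v in features.items() if max(v) == min(v)]
informative = {n: v for n, v in features.items() if n not in constant}

clusters, merges = cluster_variables(informative)
print("constant variables:", constant)
for group in clusters:
    print("cluster:", group)
```

With real data you would have one list entry per music file for each of the ~500 variables. The brute-force search for the closest pair is slow at that scale, so a dedicated statistics package is the more practical route; the sketch is only meant to show what the branching tree is built from.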