@ Igor Tandetnik: You are right, of course. I come from R, and it is quite likely that my task would be accomplished more easily using that language.
@ John Stanton: Thank you for your suggestion, I will keep it in mind when comes time to generate the ID entry. @ John Machin: You are also right. Re-reading my post, I realize that my problem was very poorly defined. Still, you managed to understand pretty much exactly what I need. Your trade statistics example was quite on point. In essence, I want to generate blank entries for all possible "directed dyad-year" observations. To answer your question, I plan to use this table as a "master-table" into which I would merge observations from various sources. This explains why there is no data at the moment. >>> what's the point of having another (redundant) column? Some of the statistical packages that I use need to identify clusters of data (in this case directed dyads), but can only do so by reference to a single variable in the dataset, not two. Your python-like example is also quite helpful. As I understand it, you basically implement Igor's suggestion of running loops on the vectors. I should be able to do that quite easily. Most importantly though, it appears that I need to revise my understanding of the division of labour between sqlite and R. Thanks a lot for your help, and have a great week! Vincent On Mon, Jun 1, 2009 at 7:55 PM, John Machin <[email protected]> wrote: > On 2/06/2009 8:07 AM, Vincent Arel wrote: > > Hi everyone, > > > > I'm very, very new to SQLite, and would appreciate any help I can get. > > Unless I'm very very confused, this has very little to do with SQL at > all (let alone SQLite) apart from using an INSERT statement to dispose > of the final product. > > You might be better off asking on a forum related to whatever language > you are using. > > Your problem description is a tad ambiguous, so let's see if we can get > that improved before we send you off elsewhere. > > Let's guess that you ultimately want to record two-way trade statistics > for all possible1 country pairs for all possible2 years, for some > definition of possible1 and possible2. > > > > > I have 3 long vectors that look like this: > > {"ALB","CAN", "DZA",...} > > {"ALB","CAN", "DZA",...} > > {"1961","1962", "1963",...} > > > > And I want to create a table that looks like this: > > > > ID Var1 Var2 Var3 > > Using meaningful names is strongly suggested ... e.g. from_country, > to_country, trade_year > > > > 1 ALB CAN 1961 > > 1 ALB CAN 1962 > > 1 ALB CAN 1963 > > 2 ALB DZA 1961 > > 2 ALB DZA 1962 > > 2 ALB DZA 1963 > > 3 CAN ALB 1961 > > 3 CAN ALB 1962 > > 3 CAN ALB 1963 > > 4 CAN DZA 1961 > > 4 CAN DZA 1962 > > 4 CAN DZA 1963 > > 5 DZA ALB 1961 > > 5 DZA ALB 1962 > > 5 DZA ALB 1963 > > 6 DZA CAN 1961 > > 6 DZA CAN 1962 > > 6 DZA CAN 1963 > > > > In short, I need to include every possible pair of Var1/Var2 values > (where > > Var1/Var2 != Var2/Var1. > > Huh? Var1/Var2 == Var2/Var1 iff var1 == var2 ... if you mean "where var1 > != var2" why not say so? If you mean something else (possible since you > later mention /also/ dropping rows where var1 == var2) then please explain. > > > I want to keep permutations.) > > What does that mean? > > > For each of these > > pairs, I need to create separate rows for each different value of Var3. > > What if there are gaps in your third vector? Should they be filled in? > > > I also need to drop rows where Var1 == Var2. > > See above. > > > Finally, I would like to generate > > a unique ID number for each Var1/Var2 pair. > > Aha! A vague connection with SQL :-) Why bother? You require the (var1, > var2) tuple to be unique anyway ... what's the point of having another > (redundant) column? > > > Of course, I do not expect a ready-made answer. However, if some of you > > could tell me where I should start looking for a solution to my problem, > or > > if you have any conceptual programming hints that could help me produce > the > > needed table, I would be extremely grateful. > > Expressing the "hints" in an expressive language like Python, and > assuming the widest definitions of "possible", but basing it on your data: > > var1 = ["ALB", "CAN", "DZA", ...] > var2 = ["ALB", "CAN", "DZA", ...] > var3 = ["1961", "1962", "1963",...] > # get a list of unique country codes, in sorted order > countries = list(set(var1 + var2)) > countries.sort() > # convert years to integer, find range > var3int = [int(y) for y in var3] > firsty = min(var3int) > lasty = max(var3int) > year_range = range(firsty, lasty + 1) > # do the business > id = 0 > ncountries = len(countries) > for i in range(ncountries - 1): > for j in range(i + 1, ncountries): > assert countries[i] != countries[j] > id += 1 > for year in year_range: > print id, countries[i], countries[j], year > > BTW, this is all keys ... where's the data? > > HTH, > John > _______________________________________________ > sqlite-users mailing list > [email protected] > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > -- Vincent Arel-Bundock _______________________________________________ sqlite-users mailing list [email protected] http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

