Re: [sqlite] Populating dyadic dataset

Vincent Arel Mon, 01 Jun 2009 17:17:50 -0700

@ Igor Tandetnik: You are right, of course. I come from R, and it is quite
likely that my task would be accomplished more easily using that language.


@ John Stanton: Thank you for your suggestion, I will keep it in mind when
comes time to generate the ID entry.

@ John Machin: You are also right. Re-reading my post, I realize that my
problem was very poorly defined. Still, you managed to understand pretty
much exactly what I need.

Your trade statistics example was quite on point. In essence, I want to
generate blank entries for all possible "directed dyad-year" observations.
To answer your question, I plan to use this table as a "master-table" into
which I would merge observations from various sources. This explains why
there is no data at the moment.

>>> what's the point of having another (redundant) column?
Some of the statistical packages that I use need to identify clusters of
data (in this case directed dyads), but can only do so by reference to a
single variable in the dataset, not two.

Your python-like example is also quite helpful. As I understand it, you
basically implement Igor's suggestion of running loops on the vectors. I
should be able to do that quite easily. Most importantly though, it appears
that I need to revise my understanding of the division of labour between
sqlite and R.

Thanks a lot for your help, and have a great week!

Vincent


On Mon, Jun 1, 2009 at 7:55 PM, John Machin <[email protected]> wrote:

> On 2/06/2009 8:07 AM, Vincent Arel wrote:
> > Hi everyone,
> >
> > I'm very, very new to SQLite, and would appreciate any help I can get.
>
> Unless I'm very very confused, this has very little to do with SQL at
> all (let alone SQLite) apart from using an INSERT statement to dispose
> of the final product.
>
> You might be better off asking on a forum related to whatever language
> you are using.
>
> Your problem description is a tad ambiguous, so let's see if we can get
> that improved before we send you off elsewhere.
>
> Let's guess that you ultimately want to record two-way trade statistics
> for all possible1 country pairs for all possible2 years, for some
> definition of possible1 and possible2.
>
> >
> > I have 3 long vectors that look like this:
> > {"ALB","CAN", "DZA",...}
> > {"ALB","CAN", "DZA",...}
> > {"1961","1962", "1963",...}
> >
> > And I want to create a table that looks like this:
> >
> > ID    Var1    Var2    Var3
>
> Using meaningful names is strongly suggested ... e.g. from_country,
> to_country, trade_year
>
>
> > 1    ALB    CAN    1961
> > 1    ALB    CAN    1962
> > 1    ALB    CAN    1963
> > 2    ALB    DZA    1961
> > 2    ALB    DZA    1962
> > 2    ALB    DZA    1963
> > 3    CAN    ALB    1961
> > 3    CAN    ALB    1962
> > 3    CAN    ALB    1963
> > 4    CAN    DZA    1961
> > 4    CAN    DZA    1962
> > 4    CAN    DZA    1963
> > 5    DZA    ALB    1961
> > 5    DZA    ALB    1962
> > 5    DZA    ALB    1963
> > 6    DZA    CAN    1961
> > 6    DZA    CAN    1962
> > 6    DZA    CAN    1963
> >
> > In short, I need to include every possible pair of Var1/Var2 values
> (where
> > Var1/Var2 != Var2/Var1.
>
> Huh? Var1/Var2 == Var2/Var1 iff var1 == var2 ... if you mean "where var1
> != var2" why not say so? If you mean something else (possible since you
> later mention /also/ dropping rows where var1 == var2) then please explain.
>
> > I want to keep permutations.)
>
> What does that mean?
>
> > For each of these
> > pairs, I need to create separate rows for each different value of Var3.
>
> What if there are gaps in your third vector? Should they be filled in?
>
> > I also need to drop rows where Var1 == Var2.
>
> See above.
>
>  > Finally, I would like to generate
> > a unique ID number for each Var1/Var2 pair.
>
> Aha! A vague connection with SQL :-) Why bother? You require the (var1,
> var2) tuple to be unique anyway ... what's the point of having another
> (redundant) column?
>
> > Of course, I do not expect a ready-made answer. However, if some of you
> > could tell me where I should start looking for a solution to my problem,
> or
> > if you have any conceptual programming hints that could help me produce
> the
> > needed table, I would be extremely grateful.
>
> Expressing the "hints" in an expressive language like Python, and
> assuming the widest definitions of "possible", but basing it on your data:
>
> var1 = ["ALB", "CAN", "DZA", ...]
> var2 = ["ALB", "CAN", "DZA", ...]
> var3 = ["1961", "1962", "1963",...]
> # get a list of unique country codes, in sorted order
> countries = list(set(var1 + var2))
> countries.sort()
> # convert years to integer, find range
> var3int = [int(y) for y in var3]
> firsty = min(var3int)
> lasty = max(var3int)
> year_range = range(firsty, lasty + 1)
> # do the business
> id = 0
> ncountries = len(countries)
> for i in range(ncountries - 1):
>     for j in range(i + 1, ncountries):
>        assert countries[i] != countries[j]
>         id += 1
>         for year in year_range:
>             print id, countries[i], countries[j], year
>
> BTW, this is all keys ... where's the data?
>
> HTH,
> John
> _______________________________________________
> sqlite-users mailing list
> [email protected]
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>



-- 
Vincent Arel-Bundock
_______________________________________________
sqlite-users mailing list
[email protected]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] Populating dyadic dataset

Reply via email to