How it represents data internally is very important, depending on the real
goal:
http://en.wikipedia.org/wiki/Column-oriented_DBMS
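In base R you can already see the distinction: a data.frame stores its data column by column. The sketch below contrasts that with a hypothetical row-wise layout (a list of per-record lists, invented here purely for illustration) when summing a single column. It shows only the access pattern. It says nothing about the performance of any particular DBMS.

```r
# A data.frame is column-oriented: internally it is a list with one
# equal-length vector per column.
df <- data.frame(id = 1:4, x = c(2.5, 3.5, 4.5, 1.5))
is.list(df)    # TRUE
length(df)     # 2, one element per column

# Hypothetical row-oriented layout, built only for comparison: one small
# list per record, the way a row store keeps each tuple's fields together.
rows <- lapply(seq_len(nrow(df)), function(i) list(id = df$id[i], x = df$x[i]))

# Column layout: aggregating x reads a single numeric vector.
col_total <- sum(df$x)

# Row layout: every record has to be visited to extract its x field.
row_total <- sum(vapply(rows, function(r) r$x, numeric(1)))
```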
Gabor Grothendieck ggrothendi...@gmail.com wrote in message
news:971536df1001271710o4ea62333l7f1230b860114...@mail.gmail.com...
How it represents data internally
It's only important internally. Externally it's undesirable that the
user has to get involved in it. The idea of making software easy to
write and use is to hide the implementation and focus on the problem.
That is why we use high-level languages, object orientation, etc.
On Thu, Jan 28, 2010 at
Are you claiming that SQL is that utopia? SQL is a row store. It cannot
give the user the benefits of a column store.
For example, why does SQL take 113 seconds in the example in this thread:
http://tolstoy.newcastle.edu.au/R/e9/help/10/01/1872.html
but data.table takes 5 seconds to get the
I think one would only be concerned about such internals if one were
primarily interested in performance; otherwise, one would be more
interested in ease of specification, and part of that ease is having the
specification independent of the implementation, separating implementation
from specification activities.
I'm talking about ease of use too. The first line of the Details section in
?[.data.table says:
Builds on base R functionality to reduce 2 types of time:
1. programming time (easier to write, read, debug and maintain)
2. compute time
Once again, I am merely saying that the
Regarding the explanation of where the time goes, it might be parsing
the statement or the development of the query plan. The SQL statement
for the more complex query is obviously much longer, and its generated
query plan involves 95 lines of byte code vs 19 lines of generated
code for the simpler
I have a table (contact) with several fields, and its PK is an
auto-increment field. I'm bulk loading data into this table from files which,
if successful, will be about 3.5 million rows (approx. 16,000 rows per file).
However, I have a linking table (an_contact) to resolve an m:m
relationship between
Hi Nathan,
On Wed, Jan 27, 2010 at 8:56 AM, Matthew Dowle mdo...@mdowle.plus.com wrote:
How many columns are there, and of what type? As Olga asked too, it
would be useful to know more about what you're really trying to do.
3.5m rows is not actually that many rows, even for 32-bit R. It depends
sqldf("select * from BOD order by Time desc limit 3")
Exactly. SQL requires the use of ORDER BY. It knows the order, but the
table itself isn't ordered. That's not good, but it might be fine, depending
on what the real goal is.
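Here is a minimal sketch of the difference. It assumes the data.table package is installed. setkey() sorts a data.table's rows by reference and records the sort column as its key, so the rows really are stored in that order and you can take the latest three times by position. The sqldf() call above, by contrast, has to restate the ordering with ORDER BY every time, because a SQL table has no guaranteed row order.

```r
library(data.table)

# BOD ships with R's datasets package (columns Time and demand).
DT <- as.data.table(BOD)

# Sort DT by Time in place and mark Time as its key.
setkey(DT, Time)
key(DT)  # "Time"

# The rows are now physically ordered by Time, so the three latest
# observations are simply the last three rows.
last3 <- tail(DT, 3)

# Reverse them to match the row order of the sqldf() query (Time descending).
last3[nrow(last3):1]
```

The reversal is positional only. Once the table is keyed, no sort expression is needed to recover the order.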
Gabor Grothendieck ggrothendi...@gmail.com wrote in message
How it represents data internally should not be important as long as
you can do what you want. SQL is declarative, so you just specify what
you want rather than how to get it; invisibly to the user, it
automatically draws up a query plan and then uses that plan to get the
result.
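sqldf uses SQLite by default, and SQLite lets you see that hidden plan. Prefix a statement with EXPLAIN and SQLite returns the bytecode program it compiled the statement into, one row per instruction, instead of the query result. The sketch below compares a simple statement with a more complex one. Both queries are illustrative only, and the exact instruction counts depend on the SQLite version.

```r
library(sqldf)  # uses SQLite (via RSQLite) by default

simple  <- "select * from BOD"
complex <- "select Time, demand from BOD where demand > 10 order by Time desc limit 3"

# EXPLAIN makes SQLite return its bytecode program (columns addr, opcode,
# p1..p5, comment) instead of running the query.
simple_plan  <- sqldf(paste("explain", simple))
complex_plan <- sqldf(paste("explain", complex))

# Number of bytecode instructions generated for each statement
c(simple = nrow(simple_plan), complex = nrow(complex_plan))
```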
On Wed, Jan