Sounds interesting! I've got something like
~~~
for irow in eachrow(df)
    # do stuff on irow
    # when done, copy to master data
    push!(master, array(irow))
end
~~~
The point is that I have to keep df unchanged for the next iteration, so
I'm forced to copy the data somewhere else.
can you give m
So you have a stream that produces rows one after another? I feel like this
might be a place for an abstraction like collect.
-- John
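A `collect`-style version of that idea could gather the rows first and build the DataFrame in one call. This is a minimal sketch assuming a current DataFrames.jl release (1.x) rather than the 2014 API used in this thread; `produce_row` is a hypothetical stand-in for whatever stream yields the rows.

~~~julia
using DataFrames

# Hypothetical stand-in for a stream that yields one row at a time.
produce_row(i) = (a = string("row", i), x = i / 10)

# Collect-style: gather every row as a NamedTuple first...
rows = [produce_row(i) for i in 1:3]

# ...then build the DataFrame in a single call; a Vector of NamedTuples
# is accepted directly by the DataFrame constructor.
master = DataFrame(rows)
~~~

Building once at the end avoids growing the table row by row, at the cost of holding the intermediate rows in memory.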
On Sep 12, 2014, at 3:51 PM, Florian Oswald wrote:
It's literally just a single row. I have a master_df where I collect the
data. Then there is a smaller groups_df on which I repeatedly perform some
operations on rows, and after each operation on a single row I want to copy
it to master_df with
~~~
push!(master_df, array(current_row_of_groups_df))
~~~
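A minimal sketch of that workflow, assuming current DataFrames.jl (1.x), where `array` no longer exists and `push!` accepts a `NamedTuple` row. The `* 10` step is a hypothetical placeholder for the real per-row operation. `copy` on a `DataFrameRow` returns an independent `NamedTuple`, so groups_df stays unchanged for the next iteration.

~~~julia
using DataFrames

master_df = DataFrame(a = String[], x = Float64[])
groups_df = DataFrame(a = ["hi", "there"], x = [1.0, 2.0])

for row in eachrow(groups_df)
    # `copy` turns the DataFrameRow view into an independent NamedTuple,
    # so the operation below never touches groups_df.
    r = copy(row)
    r = merge(r, (x = r.x * 10,))   # hypothetical per-row operation
    push!(master_df, r)              # push! matches NamedTuple fields to columns by name
end
~~~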
Well, "slow" might be a little unfair. Are you transferring only a subset of
rows from the other DataFrame? If so, this might be a good approach. If you're
copying the whole thing row by row, it seems a lot slower than appending the
whole DataFrame at once.
-- John
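The two cases John distinguishes might look like this, again assuming current DataFrames.jl (1.x). The `x .> 0.4` filter is only an illustrative way of choosing a subset.

~~~julia
using DataFrames

master = DataFrame(a = ["m"], x = [0.0])
other  = DataFrame(a = ["hi", "there", "you"], x = [0.1, 0.9, 0.5])

# Copying the whole thing: append the entire table in one call.
append!(master, other)

# Transferring only a subset: select the rows first, then append once.
subset_rows = other[other.x .> 0.4, :]
append!(master, subset_rows)
~~~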
On Sep 12, 2014, at 3:42 PM, Florian Oswald wrote:
Oh, I didn't know it's slow. Yes, in my case it's a way of transferring a
row from one df to another. What's a better way of doing this?
On 12 September 2014 22:39, John Myles White wrote:
What does that mean? A DataFrameRow can't be easily created without reference
to an existing DataFrame, so this seems like it's either a mechanism for
transferring rows from one DataFrame to another very slowly or a mechanism for
inserting duplicate rows.
-- John
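John's point that a `DataFrameRow` only exists relative to a DataFrame can be seen directly in current DataFrames.jl (assumed 1.x): indexing one row with `:` gives a view whose parent is the original table, and writes through the view land in that table.

~~~julia
using DataFrames

df = DataFrame(a = ["hi", "there"], x = [0.1, 0.2])

# A single row index plus `:` yields a DataFrameRow, i.e. a view into df,
# not a standalone row object.
dfr = df[1, :]

# Writing through the view modifies the parent DataFrame in place.
dfr[:x] = 0.5
~~~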
On Sep 12, 2014, at 3:37 PM,
I'll submit a PR for `Base.append!(adf::AbstractDataFrame, dfr::DataFrameRow)`
unless you tell me that's useless.
On 12 September 2014 22:31, Florian Oswald wrote:
Leah: yeah, that works. But I think I almost prefer my previous solution.
Instead of this
~~~
push!(df2, [v for (_, v) in e])
~~~
that:
~~~
push!(df2, array(e))
~~~
I'm not sure about the performance implications, though.
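For comparison, in current DataFrames.jl (assumed 1.x) `array` is gone, and the two options would be written roughly as below. The pair-based comprehension goes through `pairs(e)` explicitly and is pushed positionally; pushing the row itself matches columns by name.

~~~julia
using DataFrames

df  = DataFrame(a = ["hi", "there"], x = [0.1, 0.2])
df2 = DataFrame(a = ["oh", "yeah"], x = [0.3, 0.4])

e = df[1, :]

# Values extracted from (name => value) pairs; push! assigns them to the
# columns of df2 by position.
push!(df2, [v for (_, v) in pairs(e)])

# Pushing the DataFrameRow itself; columns are matched by name.
push!(df2, e)
~~~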
On 12 September 2014 22:18, Gray Calhoun wrote:
Oh, I wasn't thinking of that. Good point. A mutating OrderedDict constructor
would allow reuse, but isn't as generic.
Doing a convert(OrderedDict, DataFrameRow) seems like it's going to be a much
worse performance hit than copying everything into a specific OrderedDict
that's reused, because you're going to allocate memory for a new OrderedDict
object on every iteration.
-- John
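The trade-off John describes (a fresh dictionary per row versus one reused buffer) can be sketched like this, assuming current DataFrames.jl (1.x). A plain `Dict` is used to keep the sketch dependency-light; an `OrderedDict`, which today lives in the OrderedCollections.jl package, would additionally preserve column order.

~~~julia
using DataFrames

df = DataFrame(a = ["hi", "there"], x = [0.1, 0.2])

# Allocating: a new Dict object is built for every row.
fresh = [Dict{Symbol,Any}(pairs(r)) for r in eachrow(df)]

# Reusing: one buffer is overwritten on each iteration, so no new Dict
# object is allocated per row.
buf = Dict{Symbol,Any}()
seen = String[]
for r in eachrow(df)
    for (k, v) in pairs(r)
        buf[k] = v
    end
    push!(seen, buf[:a])   # consume buf before the next iteration overwrites it
end
~~~

The reused buffer is only valid until the next iteration, which is exactly the loss of genericity Gray mentions.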
On Sep 12, 2014, at 2:44 PM,
Probably not in most, you're right.
Can't you get generic code as long as a method to convert to OrderedDict is
supplied, though?
When you don't need anything more specific, convert the dataframe row to an
OrderedDict, then either work with that object or convert it into a more
appropriate int
I'm not sure that losing zero-copy semantics is actually a big performance hit
in most pipelines.
I think much more important is that you can't write generic code right now
because the abstractions aren't linked in any way. The rows you fetch from a
database using DBI aren't mutable, whereas th
It seems like standardizing on `convert` would be a natural approach when
one needs to go from one to the other. I don't know the DBI semantics, but
~~~
myrow = convert(Dict, mydataframerow)
myrow2 = convert(OrderedDict, mydataframerow)
~~~
etc. is transparent and lets different data storage object
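A generic row-to-dictionary conversion can be sketched without touching `Base.convert`; adding `convert` methods for types you don't own (here `Dict` and `DataFrameRow`) is type piracy. `rowdict` below is a hypothetical helper, assuming current DataFrames.jl (1.x): it works for any row-like object that supports `pairs`, such as a `DataFrameRow` or a `NamedTuple` standing in for a row from some other source.

~~~julia
using DataFrames

# Hypothetical helper: accepts any row-like object that supports `pairs`.
rowdict(r) = Dict{Symbol,Any}(k => v for (k, v) in pairs(r))

df = DataFrame(a = ["hi", "there"], x = [0.5, 0.25])

d1 = rowdict(df[1, :])             # from a DataFrameRow
d2 = rowdict((a = "hi", x = 0.5))  # from a NamedTuple
~~~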
We really need to standardize on a single type that reflects a single row of a
tabular data structure that gets used both by DBI and by DataFrames.
DataFrameRow is really nice because it's a zero-copy operation for DataFrames,
but we can't provide zero-copy semantics when pulling rows out of a d
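The zero-copy versus copying distinction is easy to observe in current DataFrames.jl (assumed 1.x): a `DataFrameRow` reflects later changes to its table, while `copy` of it produces a `NamedTuple` snapshot, which is closer to what a database row fetched into memory behaves like.

~~~julia
using DataFrames

df = DataFrame(a = ["hi", "there"], x = [0.1, 0.2])

view_row = df[1, :]        # DataFrameRow: zero-copy view into df
snapshot = copy(view_row)  # NamedTuple: an independent copy of the values

# Mutate the underlying table after both were taken.
df[1, :x] = 9.0
~~~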
Oh, I didn't realize that. So, `eachrow(df)` is giving you
`[(:a,"hi"),(:x,0.703943)]` when you need `["hi",0.703943]` to use `push!`.
~~~
julia> df = DataFrame(a=["hi","there"],x = rand(2))
2x2 DataFrame
|---|-|--|
| Row # | a | x|
| 1 | "hi"| 0.703943 |
~~~
Yeah, I wasn't very clear in that example. I really need to append one row
at a time.
On 12 September 2014 14:50, Leah Hanson wrote:
Have you tried `append!(df2, df)`?
~~~
julia> using DataFrames
julia> df = DataFrame(a=["hi","there"],x = rand(2))
2x2 DataFrame
|---|-|--|
| Row # | a | x|
| 1 | "hi"| 0.862957 |
| 2 | "there" | 0.101378 |
julia> df2 = DataFrame(a=["oh","yeah"],x
~~~
I'm trying to do this:
~~~
using DataFrames
df = DataFrame(a=["hi","there"], x = rand(2))
df2 = DataFrame(a=["oh","yeah"], x = rand(2))
for e in eachrow(df)
    append!(df2, e)
end
~~~
which gives
~~~
ERROR: `append!` has no method matching append!(::DataFrame, ::DataFrameRow{DataFrame})
 in anonymous at no file:2
~~~
or
juli
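For the original question, a sketch assuming current DataFrames.jl (1.x), where `push!` accepts a `DataFrameRow` directly: `append!` is the whole-table operation, and `push!` is its single-row counterpart.

~~~julia
using DataFrames

df  = DataFrame(a = ["hi", "there"], x = rand(2))
df2 = DataFrame(a = ["oh", "yeah"], x = rand(2))

# Add the rows of df to df2 one at a time; columns are matched by name.
for e in eachrow(df)
    push!(df2, e)
end
~~~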