Raymond Hettinger wrote:
[Scott David Daniels]
def most_frequent(arr, N): ...
In Py2.4 and later, see heapq.nlargest().
I should have remembered this one
In Py3.1, see collections.Counter(data).most_common(n)
This one is from Py3.2, I think.
--Scott David Daniels
scott.dani...@acm.org
--
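A minimal sketch of how the two suggestions above fit together, assuming data is any iterable of hashable (and mutually comparable) values; the sample list is illustrative only:

    import heapq
    from collections import Counter

    data = [3, 1, 3, 2, 3, 2, 2, 3]         # illustrative sample, not the poster's data
    counts = Counter(data)                   # Counter({3: 4, 2: 3, 1: 1})

    # Py2.4+: pick the N largest (count, value) pairs without a full sort.
    top2 = heapq.nlargest(2, ((cnt, val) for val, cnt in counts.items()))

    # Py2.7/3.1+: Counter does the same job directly.
    top2_direct = counts.most_common(2)      # [(3, 4), (2, 3)]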
[Scott David Daniels]
> def most_frequent(arr, N):
> '''Return the top N (freq, val) elements in arr'''
> counted = frequency(arr) # get an iterator for freq-val pairs
> heap = []
> # First, just fill up the array with the first N distinct
> for i in range(N):
> tr
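The quoted function is cut off above; a rough sketch of the same min-heap idea, with collections.Counter standing in for the poster's frequency() helper (an assumption, since that helper is not shown):

    import heapq
    from collections import Counter

    def most_frequent(arr, N):
        '''Return the top N (freq, val) pairs in arr (sketch, not the original code).'''
        heap = []
        for val, freq in Counter(arr).items():
            if len(heap) < N:
                heapq.heappush(heap, (freq, val))     # fill the heap with the first N distinct
            elif freq > heap[0][0]:
                heapq.heapreplace(heap, (freq, val))  # evict the current minimum
        return sorted(heap, reverse=True)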
"mclovin" wrote in message
news:c5332c9b-2348-4194-bfa0-d70c77107...@x3g2000yqa.googlegroups.com...
> Currently I need to find the most common elements in thousands of
> arrays within one large array (around 2 million instances with ~70k
> unique elements)
>
> so I set up a dictionary to handle
Peter Otten wrote:
Scott David Daniels wrote:
Scott David Daniels wrote:
t = timeit.Timer('sum(part[:-1]==part[1:])',
'from __main__ import part')
What happens if you calculate the sum in numpy? Try
t = timeit.Timer('(part[:-1]==part[1:]).sum()',
Scott David Daniels wrote:
> Scott David Daniels wrote:
> t = timeit.Timer('sum(part[:-1]==part[1:])',
> 'from __main__ import part')
What happens if you calculate the sum in numpy? Try
t = timeit.Timer('(part[:-1]==part[1:]).sum()',
'from __main__ im
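The point of the two timings: the builtin sum() pulls each element of the boolean comparison array out into Python one at a time, while ndarray.sum() does the whole reduction in C. A self-contained version of the comparison (the test array here is made up):

    import timeit
    import numpy as np

    part = np.sort(np.random.randint(0, 1000, size=100000))

    t_builtin = timeit.Timer('sum(part[:-1] == part[1:])',
                             'from __main__ import part')
    t_numpy = timeit.Timer('(part[:-1] == part[1:]).sum()',
                           'from __main__ import part')

    print(t_builtin.timeit(number=10))   # element-by-element in Python
    print(t_numpy.timeit(number=10))     # vectorised, typically far faster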
On Sun, 05 Jul 2009 17:30:58 -0700, Scott David Daniels wrote:
> Summary: when dealing with numpy (or any bulk <-> individual values
> transitions), try several ways that you think are equivalent and
> _measure_.
This advice is *much* more general than numpy -- it applies to any
optimization ex
Scott David Daniels wrote:
... Here's a heuristic replacement for my previous frequency code:
I've tried to mark where you could fudge numbers if the run time
is at all close.
Boy, I cannot let go. I did a bit of a test checking the cost against the
calculated number of discovered samples, and found af
On Sat, 04 Jul 2009 07:19:48 -0700, Scott David Daniels wrote:
> Actually the next step is to maintain a min-heap as you run down the
> sorted array. Something like:
Not bad.
I did some tests on it, using the following sample data:
arr = np.array([xrange(i, i+7000) for i in xrange(143)] +
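The quoted code is cut off; a sketch of the "min-heap as you run down the sorted array" idea (the test data below is loosely adapted from the sample above, and the pure-Python loop is slow but shows the shape):

    import heapq
    import numpy as np
    from itertools import groupby

    def top_n_from_sorted(flat_sorted, N):
        '''Top N (count, value) pairs from an already-sorted 1-D array (sketch).'''
        heap = []
        for val, run in groupby(flat_sorted):
            cnt = sum(1 for _ in run)                  # length of this run of equal values
            if len(heap) < N:
                heapq.heappush(heap, (cnt, val))
            elif cnt > heap[0][0]:
                heapq.heapreplace(heap, (cnt, val))    # drop the smallest count kept so far
        return sorted(heap, reverse=True)

    arr = np.sort(np.concatenate([np.arange(i, i + 7000) for i in range(143)]))
    print(top_n_from_sorted(arr, 5))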
On Sat, 04 Jul 2009 15:06:29 -0700, mclovin wrote:
> like I said I need to do this 480,000 times so to get this done
> realistically I need to analyse about 5 a second. It appears that the
> average matrix size contains about 15 million elements.
Have you considered recording the element counts a
On 7/4/2009 12:33 AM mclovin said...
Currently I need to find the most common elements in thousands of
arrays within one large array (around 2 million instances with ~70k
unique elements)
so I set up a dictionary to handle the counting so when I am
iterating I
up the count on the corro
mclovin wrote:
On Jul 4, 3:29 pm, MRAB wrote:
mclovin wrote:
[snip]
like I said I need to do this 480,000 times so to get this done
realistically I need to analyse about 5 a second. It appears that the
average matrix size contains about 15 million elements.
I threaded my program using your c
mclovin wrote:
On Jul 4, 12:51 pm, Scott David Daniels wrote:
mclovin wrote:
OK then. I will try some of the strategies here but I guess things
aren't looking too good. I need to run this over a dataset that someone
pickled. I need to run this 480,000 times so you can see my
frustration. So it
On Jul 4, 3:29 pm, MRAB wrote:
> mclovin wrote:
>
> [snip]
>
> > like I said I need to do this 480,000 times so to get this done
> > realistically I need to analyse about 5 a second. It appears that the
> > average matrix size contains about 15 million elements.
>
> > I threaded my program using y
mclovin wrote:
[snip]
like I said I need to do this 480,000 times so to get this done
realistically I need to analyse about 5 a second. It appears that the
average matrix size contains about 15 million elements.
I threaded my program using your code and I did about 1,000 in an hour
so it is stil
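(For scale: 480,000 runs at the hoped-for 5 per second is 96,000 seconds, roughly 27 hours; at the measured 1,000 per hour it is 480 hours, roughly 20 days.)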
On Jul 4, 12:51 pm, Scott David Daniels wrote:
> mclovin wrote:
> > OK then. I will try some of the strategies here but I guess things
> > aren't looking too good. I need to run this over a dataset that someone
> > pickled. I need to run this 480,000 times so you can see my
> > frustration. So it d
mclovin wrote:
OK then. I will try some of the strategies here but I guess things
aren't looking too good. I need to run this over a dataset that someone
pickled. I need to run this 480,000 times so you can see my
frustration. So it doesn't need to be "real time" but it would be nice
if it was done s
mclovin wrote:
> OK then. I will try some of the strategies here but I guess things
> aren't looking too good. I need to run this over a dataset that someone
> pickled. I need to run this 480,000 times so you can see my
> frustration. So it doesn't need to be "real time" but it would be nice
> if it wa
2009/7/4 Steven D'Aprano :
> On Sat, 04 Jul 2009 13:42:06 +, Steven D'Aprano wrote:
>
>> On Sat, 04 Jul 2009 10:55:44 +0100, Vilya Harvey wrote:
>>
>>> 2009/7/4 Andre Engels :
On Sat, Jul 4, 2009 at 9:33 AM, mclovin wrote:
> Currently I need to find the most common elements in thousand
OK then. I will try some of the strategies here but I guess things
aren't looking too good. I need to run this over a dataset that someone
pickled. I need to run this 480,000 times so you can see my
frustration. So it doesn't need to be "real time" but it would be nice
if it was done sorting this month
On Sat, 04 Jul 2009 13:42:06 +, Steven D'Aprano wrote:
> On Sat, 04 Jul 2009 10:55:44 +0100, Vilya Harvey wrote:
>
>> 2009/7/4 Andre Engels :
>>> On Sat, Jul 4, 2009 at 9:33 AM, mclovin wrote:
Currently I need to find the most common elements in thousands of
arrays within one large
Vilya Harvey wrote:
2009/7/4 Andre Engels :
On Sat, Jul 4, 2009 at 9:33 AM, mclovin wrote:
Currently I need to find the most common elements in thousands of
arrays within one large array (around 2 million instances with ~70k
unique elements)...
Try flattening the arrays into a single large ar
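If the arrays are rows of one big 2-D array, "flattening" is just a reshape; if they are separate objects, concatenate builds the single large array (the sizes here are made up):

    import numpy as np

    big = np.random.randint(0, 70000, size=(143, 7000))   # rows = individual arrays
    flat = big.ravel()                                     # 1-D view, no copy for contiguous data

    separate = [np.random.randint(0, 70000, size=7000) for _ in range(143)]
    flat2 = np.concatenate(separate)                       # copies everything into one array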
On Sat, 04 Jul 2009 10:55:44 +0100, Vilya Harvey wrote:
> 2009/7/4 Andre Engels :
>> On Sat, Jul 4, 2009 at 9:33 AM, mclovin wrote:
>>> Currently I need to find the most common elements in thousands of
>>> arrays within one large array (around 2 million instances with ~70k
>>> unique elements)
..
You can join all your arrays into a single big array with concatenate.
>>> import numpy as np
>>> a = np.concatenate(array_of_arrays)
Then count the number of occurrences of each unique element using this trick
with searchsorted. This should be pretty fast.
>>> a.sort()
>>> unique_a = np.unique(
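The snippet above is cut off; the usual form of the searchsorted counting trick (a guess at the continuation, not the original post, and array_of_arrays is a made-up stand-in) is:

    import numpy as np

    array_of_arrays = [np.random.randint(0, 70000, size=2000) for _ in range(50)]

    a = np.concatenate(array_of_arrays)
    a.sort()

    unique_a = np.unique(a)
    # For each unique value, right insertion point minus left insertion point
    # in the sorted array is its number of occurrences.
    counts = (np.searchsorted(a, unique_a, side='right')
              - np.searchsorted(a, unique_a, side='left'))

    N = 10
    top = np.argsort(counts)[-N:][::-1]        # indices of the N most common values
    print(list(zip(unique_a[top], counts[top])))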
2009/7/4 Andre Engels :
> On Sat, Jul 4, 2009 at 9:33 AM, mclovin wrote:
>> Currently I need to find the most common elements in thousands of
>> arrays within one large array (around 2 million instances with ~70k
>> unique elements)
>>
>> so I set up a dictionary to handle the counting so when I a
On Sat, Jul 4, 2009 at 9:33 AM, mclovin wrote:
> Currently I need to find the most common elements in thousands of
> arrays within one large array (around 2 million instances with ~70k
> unique elements)
>
> so I set up a dictionary to handle the counting so when I am
> iterating I up the count o
On Sat, Jul 4, 2009 at 12:33 AM, mclovin wrote:
> Currently I need to find the most common elements in thousands of
> arrays within one large array (around 2 million instances with ~70k
> unique elements)
>
> so I set up a dictionary to handle the counting so when I am
> iterating I up the count
Currently I need to find the most common elements in thousands of
arrays within one large array (around 2 million instances with ~70k
unique elements)
so I set up a dictionary to handle the counting so when I am
iterating I up the count on the corresponding dictionary element. I
then iterate thr
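A bare-bones version of the dictionary approach described in the post, with a small made-up dataset standing in for the real arrays:

    import numpy as np

    # Made-up stand-in: a 2-D array whose rows are the individual arrays.
    large_array = np.random.randint(0, 70000, size=(100, 2000))

    counts = {}
    for row in large_array:
        for element in row:
            # Up the count on the corresponding dictionary entry.
            counts[element] = counts.get(element, 0) + 1

    # The five most common elements and their counts.
    top5 = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:5]
    print(top5)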