Pro tip: I've seen both push-based systems and pull-based systems at
work. The push-based systems tend to break whenever the thing that
you're pushing to has problems. Pull-based systems tend to be much
more reliable in my experience.
You have described a push-based system. I would therefore
Ben Tilly emitted:
[...]
If you disregard this tip,
Queuing systems aren't really new or 'technofrippery'. In-memory FIFO
queues are ridiculously fast compared to transaction-safe RDBMSes for this
simple purpose. Databases incur a lot of overhead for wonderful things
that don't aid this cause.
This isn't magic; sometimes it's just the right tool
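The speed claim above is easy to see in code. A minimal sketch in Python (the thread's crawler is Perl; this is only an illustration of the data structure, and the URLs are made up): an in-memory FIFO gives O(1) enqueue/dequeue with none of a database's transaction or durability machinery, and correspondingly none of its crash safety.

```python
from collections import deque

# A minimal in-memory FIFO work queue: O(1) append/popleft,
# no transaction or durability overhead -- and no crash safety.
work = deque()

# Producer side: enqueue URLs to fetch.
for url in ("http://example.com/a", "http://example.com/b"):
    work.append(url)

# Consumer side: take items in arrival order.
taken = []
while work:
    taken.append(work.popleft())

print(taken)  # URLs come back out in the order they went in
```

The trade-off is exactly the one the thread is circling: everything in the deque is lost if the process dies, which is why durable brokers exist at all.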
On Fri, Apr 5, 2013 at 12:04 PM, John Redford eire...@hotmail.com wrote:
Ben Tilly emitted:
[...]
Anthony Caravello writes:
[...]
No one said queuing
Ben Tilly expands:
Your writing is FUD.
Are you reading something into what I wrote that wasn't there?
Because I'm pretty sure that what I wrote isn't FUD.
It was. Ask anyone. I'm not your English tutor.
I bow to you. I've been on this list for a long time, and I figured my 20
years of development and engineering experience might be of assistance;
for the first time, I offered it. From now on, you should answer all the
questions.
-unsubscribe
On Apr 5, 2013 6:05 PM, John Redford
On Apr 3, 2013, at 10:34 AM, David Larochelle da...@larochelle.name wrote:
Currently, the driver process periodically queries a database to get a
list of URLs to crawl. It then stores these URLs to be downloaded in a
complex in-memory structure and pipes them to separate processes that do the actual
Thanks for all the feedback. I left out a lot of details about the system
because I didn't want to complicate things.
The purpose of the system is to comprehensively study online media. We need
the system to run 24 hours a day to download news articles from media sources
such as the New York Times. We
This sounds like a perfect fit for a queuing service like RabbitMQ.
Logstash uses Redis lists for this as it's simple to set up and pretty
reliable, but there are many such applications available. The queues
would allow multiple backend processes to check for and take items as they
became
On Thu, Apr 04, 2013 at 04:21:54PM -0400, David Larochelle wrote:
My hope is to split the engine process into two pieces that run in
parallel: one to query the database and another to send downloads to
fetchers. This way it won't matter how long the db query takes, as long as
we can get URLs
David Larochelle wrote:
[...]
We're using PostgreSQL 8.4 and running on Ubuntu. Almost all data is
stored in the database. The system contains a list of media sources with
associated RSS feeds. We have a downloads table that has all of the URLs
that we want to download or have downloaded in
I'm trying to optimize a database-driven web crawler and I was wondering if
anyone could offer any recommendations for interprocess communications.
Currently, the driver process periodically queries a database to get a
list of URLs to crawl. It then stores these URLs to be downloaded in a
not_speaking_for_the_firm
-Original Message-
From: Boston-pm [mailto:boston-pm-bounces+william.ricker=fmr@mail.pm.org]
On Behalf Of David Larochelle
Sent: Wednesday, April 03, 2013 10:34 AM
To: Boston Perl Mongers
Subject: [Boston.pm] Passing large complex data structures between process
I'm trying
On Wed, Apr 03, 2013 at 10:34:17AM -0400, David Larochelle wrote:
[...]
Another option that I've used in similar situations:
1. have a process hit the database and generate a Storable file of the data
2. have multiple crawlers execute and thaw the Storable data into memory
3. do what you need to do with the data, pushing it back to the database when
necessary.
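The three steps above use Perl's Storable (freeze/thaw); the same snapshot pattern can be sketched with Python's `pickle` for illustration (the file name and data shape here are invented, and a real crawler would of course do step 3 against the database):

```python
import os
import pickle
import tempfile

# Step 1: one process snapshots the database query result to disk.
urls = {"pending": ["http://example.com/a", "http://example.com/b"]}
path = os.path.join(tempfile.mkdtemp(), "urls.pickle")
with open(path, "wb") as fh:
    pickle.dump(urls, fh)

# Step 2: each crawler process thaws the snapshot into memory.
with open(path, "rb") as fh:
    thawed = pickle.load(fh)

# Step 3: work on the data; results would be pushed back to the
# database when necessary.
print(thawed["pending"][0])
```

One snapshot serving many readers avoids every crawler re-running the expensive query, at the cost of the readers seeing slightly stale data between snapshots.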
Instead of