Re: [Toolserver-l] Wpipe

Dr. Trigon Sun, 22 Jul 2012 02:51:40 -0700

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

As someone who always struggles with SQL queries (I am no expert at
all), this sounds quite promising to me. Maybe one day there will be a
comprehensive set of queries feeding into pipes. Then I would simply
have to pick up a pipe and connect it further. (I like this flavour
of UNIX philosophy ;)


Greetings
DrTrigon

On 19.07.2012 22:34, Magnus Manske wrote:
> On Thu, Jul 19, 2012 at 8:06 PM, Platonides <platoni...@gmail.com>
> wrote:
>> I'm not convinced about its utility. What tools would need
>> combining? If I just need the results of a SQL query, it may be
>> easier for me than using this system. Maybe a better interface
>> would help.
>> 
>> The use case I find more interesting is taking a tool which
>> outputs a list of pages and providing that as input to another tool.
>> Some page/user to work with seems to be the most common input.
>> Maybe we should just standardize the input parameters and let some
>> tools chain to another, simply using a special format parameter.
>> 
>> For instance I usually use names like: art, lang, project such
>> as: $_REQUEST += array('art'=>'', 'lang'=>'en',
>> 'project'=>'wikipedia' );
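
A minimal sketch of the shared-parameter idea above, assuming every tool accepts the same keys. The keys art, lang and project come from the quoted message; the function name tool_params, the 'format' key name and its 'html' default are assumptions made for illustration, not an existing Toolserver convention.

```php
<?php
// Hypothetical helper: merge request input with shared defaults.
// 'format' and its 'html' default are assumptions made for this sketch.
function tool_params(array $request): array
{
    $defaults = [
        'art'     => '',
        'lang'    => 'en',
        'project' => 'wikipedia',
        'format'  => 'html',
    ];
    // The array union operator keeps values already present in $request
    // and only fills in keys that are missing.
    $params = $request + $defaults;
    // Drop keys that are not part of the shared convention.
    return array_intersect_key($params, $defaults);
}

$params = tool_params(['art' => 'Berlin', 'lang' => 'de', 'debug' => '1']);
```

A tool would call this with $_REQUEST; a chaining tool could then branch on $params['format'] to emit a machine-readable list instead of a web page.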
> 
> I believe we mean the same thing; maybe I didn't describe the
> asset thing very well.
> 
> It's not for a "single page run" of some tool; one reason I chose
> my CatScan rewrite as a demo is that it can run for a long time
> (two-digit number of minutes), and generate a vast list of results
> (tens of thousands of pages), depending on the query. The idea is
> that (a) you're not "blocking" while waiting for that to finish,
> before you can do something else; (b) you can access the results of
> the run again, maybe if the subsequent tool fails, or you want to
> try a different filter or subset, or a different subsequent tool
> altogether; (c) you can define new data sources, maybe a tool where
> you just paste in page titles, or another tool that gets the newest
> 1,000 articles, or 1,000 random articles, or the last 1,000
> articles you edited, or /insert crazy idea here/, and all
> subsequent tools will just run with it.
> 
> And you can chain tools together via a single number; no file path 
> that the other guy doesn't have access to, no sql query that runs
> for a few minutes every time (that is, /if/ your tool can be
> reduced to that...), no massive paste orgy, no loss of meta-data
> between tools.
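
To make the "single number" idea concrete for myself, here is a rough sketch of what an asset store might look like. Everything in it is an assumption: the class name AssetStore, the one-JSON-file-per-ID layout and the field names are invented for illustration, since the actual Wpipe storage is not described in this thread.

```php
<?php
// Hypothetical asset store: each run reserves a numeric ID up front and
// writes its result list plus metadata under that ID when it finishes.
class AssetStore
{
    private $dir;

    public function __construct(string $dir)
    {
        $this->dir = $dir;
        if (!is_dir($dir)) {
            mkdir($dir, 0711, true);
        }
    }

    private function path(int $id): string
    {
        return $this->dir . '/' . $id . '.json';
    }

    // Reserve the next free ID and mark it as still running.
    // Not safe against concurrent callers; a real store would need locking.
    public function reserve(): int
    {
        $id = 1;
        while (file_exists($this->path($id))) {
            $id++;
        }
        file_put_contents($this->path($id), json_encode(['status' => 'running']));
        return $id;
    }

    // Store the finished result together with its metadata.
    public function finish(int $id, array $pages, array $meta): void
    {
        $asset = ['status' => 'done', 'meta' => $meta, 'pages' => $pages];
        file_put_contents($this->path($id), json_encode($asset));
    }

    // Any later tool only needs the ID to get status, metadata and pages.
    public function load(int $id): array
    {
        return json_decode(file_get_contents($this->path($id)), true);
    }
}

$store = new AssetStore(sys_get_temp_dir() . '/wpipe-sketch-' . getmypid());
$id = $store->reserve();
$store->finish($id, ['Berlin', 'Hamburg'], ['lang' => 'de', 'project' => 'wikipedia']);
$asset = $store->load($id);
```

The point of the design is that the second tool never sees a file path or an SQL query, only $id, and the metadata travels with the page list.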
> 
> I also envision longer chains: Give me all articles that are in
> both these two category trees; remove the ones that have images
> (except template symbol icons, if possible); remove the ones that
> have language links; remove the ones that had an edit less than a
> month old; render that as wikitext. There's a subject-specific
> "needs work" finder from simple components. UNIX philosophy at its
> finest :-)
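
The longer chain described above could be sketched as a list of small filter steps applied in order. This is only an illustration: the function name pipe_pages and the record fields (has_image, langlinks, last_edit_days) are invented here, and real tools would look these properties up in the database rather than carry them in an array.

```php
<?php
// Each step is a predicate: it returns true for pages that stay in the list.
function pipe_pages(array $pages, array $steps): array
{
    foreach ($steps as $step) {
        // array_values() re-indexes the list after each filter.
        $pages = array_values(array_filter($pages, $step));
    }
    return $pages;
}

// Made-up sample records standing in for the output of a category tool.
$pages = [
    ['title' => 'A', 'has_image' => false, 'langlinks' => 0, 'last_edit_days' => 90],
    ['title' => 'B', 'has_image' => true,  'langlinks' => 0, 'last_edit_days' => 200],
    ['title' => 'C', 'has_image' => false, 'langlinks' => 3, 'last_edit_days' => 45],
    ['title' => 'D', 'has_image' => false, 'langlinks' => 0, 'last_edit_days' => 5],
];

$needsWork = pipe_pages($pages, [
    function ($p) { return !$p['has_image']; },           // remove pages with images
    function ($p) { return $p['langlinks'] === 0; },      // remove pages with language links
    function ($p) { return $p['last_edit_days'] >= 30; }, // remove recently edited pages
]);

// Render the remaining titles as a wikitext list.
$wikitext = implode("\n", array_map(function ($p) {
    return '* [[' . $p['title'] . ']]';
}, $needsWork));
```

Swapping, adding or removing a step changes the finder without touching the other components.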
> 
> 
>> Magnus Manske wrote:
>>> Right now, a tool is started by this page via "nohup &"; that
>>> could change to the job submission system, if that's possible
>>> from the web servers, but right now it seems overly complicated
>>> (runtime estimation? memory estimation? sql server access?
>>> whatnot) The web page then returns the reserved output asset
>>> ID, while the actual tool is running; another tool could thus
>>> be "watching" asynchronously, by pulling the status every few
>>> seconds.
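
The "watching" part could be as small as the loop below. It is a sketch under assumptions: wait_for_asset is a made-up name, and $getStatus stands in for whatever call returns an asset's status, since no such API is specified in this thread.

```php
<?php
// Poll a status callback until it reports 'done' or we give up.
function wait_for_asset(callable $getStatus, int $intervalSeconds = 5, int $maxPolls = 120): bool
{
    for ($i = 0; $i < $maxPolls; $i++) {
        if ($getStatus() === 'done') {
            return true;
        }
        sleep($intervalSeconds);
    }
    return false; // still not done after $maxPolls attempts
}

// Demo with a fake status source that finishes on the third poll.
$calls = 0;
$fakeStatus = function () use (&$calls) {
    $calls++;
    return $calls >= 3 ? 'done' : 'running';
};
$finished = wait_for_asset($fakeStatus, 0, 10);
```

The cap on the number of polls matters for long-running tools: the watcher gives up after a bounded time instead of hanging on a run that died.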
>> 
>> Yes, it can be called. I use it in a script for scheduling a
>> cleanup of the created temporary files.
>> 
>> The relevant code:
>>> $dt = new DateTime( "now", new DateTimeZone( "UTC" ) );
>>> $tmpdir = dirname( __FILE__ ) . "/tmp";
>>> @mkdir( $tmpdir, 0711 );
>>> $shell = "mktemp -d --tmpdir=" . escapeshellarg($tmpdir)
>>>     . " catdown.XXXXXXXX";
>>> 
>>> $tmpdir2 = trim( `$shell` );
>>> // Program the folder destruction
>>> // Note that qsub is 'slow' to return, so we perform it in the background
>>> $dt->add( new DateInterval( "PT1H" ) );
>>> exec( "SGE_ROOT=/sge/GE qsub -a " . $dt->format("YmdHi.s") . " -wd "
>>>     . escapeshellarg( $tmpdir ) . " -j y -b y /bin/rm -r "
>>>     . escapeshellarg( $tmpdir2 ) . " 2>&1 &" );
> 
> 
> Thanks, that looks interesting. I'll play with it, though I still
> face the problem of estimating resource requirements for a tool by
> a generic wrapper. /Shudder/
> 
> Cheers, Magnus
> 
> _______________________________________________ Toolserver-l
> mailing list (Toolserver-l@lists.wikimedia.org) 
> https://lists.wikimedia.org/mailman/listinfo/toolserver-l Posting
> guidelines for this list:
> https://wiki.toolserver.org/view/Mailing_list_etiquette
> 


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAlALzSAACgkQAXWvBxzBrDD+TgCfWRF59s6oaGaRANJW+NscTix3
Jl8AoOIoaqBPwV/NWw4TeIZhqvj14/Qx
=t1Fk
-----END PGP SIGNATURE-----
