On Thu, Jul 19, 2012 at 8:06 PM, Platonides <platoni...@gmail.com> wrote:
> I'm not convinced about its utility. What tools would need combining?
> If I just need the results of a SQL query, it may be easier for me than
> using this system. Maybe a better interface would help.
>
> The use case I find more interesting is taking a tool which outputs a
> list of pages and feeding that list as input to another tool. A page or
> user to work with seems to be the most common input.
> Maybe we should just standardize the input parameters and let some tools
> chain to another, simply using a special format parameter.
>
> For instance I usually use names like: art, lang, project such as:
>  $_REQUEST += array('art'=>'', 'lang'=>'en', 'project'=>'wikipedia' );

I believe we mean the same thing; maybe I didn't describe the asset
thing very well.

It's not for a "single page run" of some tool; one reason I chose my
CatScan rewrite as a demo is that it can run for a long time (two-digit
number of minutes), and generate a vast list of results (tens of
thousands of pages), depending on the query. The idea is that (a)
you're not "blocking" while waiting for that to finish, before you can
do something else; (b) you can access the results of the run again,
maybe if the subsequent tool fails, or you want to try a different
filter or subset, or a different subsequent tool altogether; (c) you
can define new data sources, maybe a tool where you just paste in page
titles, or another tool that gets the newest 1,000 articles, or 1,000
random articles, or the last 1,000 articles you edited, or /insert
crazy idea here/, and all subsequent tools will just run with it.

And you can chain tools together via a single number; no file path
that the other guy doesn't have access to, no sql query that runs for
a few minutes every time (that is, /if/ your tool can be reduced to
that...), no massive paste orgy, no loss of meta-data between tools.

I also envision longer chains: Give me all articles that are in both
these two category trees; remove the ones that have images (except
template symbol icons, if possible); remove the ones that have
language links; remove the ones that had an edit less than a month
old; render that as wikitext. There's a subject-specific "needs work"
finder from simple components. UNIX philosophy at its finest :-)
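
The chain above could look roughly like the following sketch, assuming
each page travels as a dict with "title", "images", "langlinks" and
"last_edit" keys. That record layout and all function names are
assumptions made for illustration; the exception for template symbol
icons is not modelled.

```python
from datetime import datetime, timedelta


def in_both(tree_a, tree_b):
    """Keep pages of tree_a whose title also occurs in tree_b."""
    titles_b = {page["title"] for page in tree_b}
    return [page for page in tree_a if page["title"] in titles_b]


def without_images(pages):
    """Drop pages that have at least one image."""
    return [page for page in pages if not page["images"]]


def without_language_links(pages):
    """Drop pages that have at least one language link."""
    return [page for page in pages if not page["langlinks"]]


def untouched_since(cutoff):
    """Build a step that drops pages edited at or after the cutoff."""
    def step(pages):
        return [page for page in pages if page["last_edit"] < cutoff]
    return step


def as_wikitext(pages):
    """Render the remaining pages as a wikitext bullet list."""
    return "\n".join("* [[{}]]".format(page["title"]) for page in pages)


def run_chain(pages, steps):
    """Feed the output of each step into the next one."""
    for step in steps:
        pages = step(pages)
    return pages
```

Each step takes a list and returns a list, so steps can be reordered,
dropped, or swapped without the others noticing, and the page metadata
survives from one step to the next.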


> Magnus Manske wrote:
>> Right now, a tool is started by this page via "nohup &"; that could
>> change to the job submission system, if that's possible from the web
>> servers, but right now it seems overly complicated (runtime
>> estimation? memory estimation? sql server access? whatnot)
>> The web page then returns the reserved output asset ID, while the
>> actual tool is running; another tool could thus be "watching"
>> asynchronously, by pulling the status every few seconds.
>
> Yes, it can be called. I use it in a script for scheduling a cleanup of
> the created temporary files.
>
> The relevant code:
>> $dt = new DateTime( "now", new DateTimeZone( "UTC" ) );
>> $tmpdir = dirname( __FILE__ ) . "/tmp";
>> @mkdir( $tmpdir, 0711 );
>> $shell = "mktemp -d --tmpdir=" . escapeshellarg($tmpdir) . " catdown.XXXXXXXX";
>>
>> $tmpdir2 = trim( `$shell` );
>> // Program the folder destruction
>> // Note that qsub is 'slow' to return, so we perform it in the background
>> $dt->add( new DateInterval( "PT1H" ) );
>> // stdout must be redirected too, or exec() waits for qsub to finish
>> exec( "SGE_ROOT=/sge/GE qsub -a " . $dt->format("YmdHi.s") . " -wd " .
>>     escapeshellarg( $tmpdir ) . " -j y -b y /bin/rm -r " .
>>     escapeshellarg( $tmpdir2 ) . " > /dev/null 2>&1 &" );


Thanks, that looks interesting. I'll play with it, though I still face
the problem of estimating resource requirements for a tool by a
generic wrapper. /Shudder/

Cheers,
Magnus

_______________________________________________
Toolserver-l mailing list (Toolserver-l@lists.wikimedia.org)
https://lists.wikimedia.org/mailman/listinfo/toolserver-l
Posting guidelines for this list: 
https://wiki.toolserver.org/view/Mailing_list_etiquette
