-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

To me, as someone always struggling with SQL queries (since I am not
an expert at all), this sounds promising. Maybe one day there will be
a comprehensive set of queries feeding into pipes. Then I would simply
have to pick up a pipe and connect it further. (I like this flavour of
UNIX philosophy ;)
Greetings
DrTrigon

On 19.07.2012 22:34, Magnus Manske wrote:
> On Thu, Jul 19, 2012 at 8:06 PM, Platonides <platoni...@gmail.com>
> wrote:
>> I'm not convinced about its utility. What tools would need
>> combining? If I just need the results of a SQL query, it may be
>> easier for me than using this system. Maybe a better interface
>> would help.
>>
>> The use case I find more interesting is taking a tool which
>> outputs a list of pages and providing that as input to another
>> tool. Some page/user to work with seems to be the most common
>> input. Maybe we should just standardize the input parameters and
>> let some tools chain to another, simply using a special format
>> parameter.
>>
>> For instance I usually use names like: art, lang, project such
>> as:
>> $_REQUEST += array('art'=>'', 'lang'=>'en',
>>     'project'=>'wikipedia' );
>
> I believe we mean the same thing; maybe I didn't describe the
> asset thing very well.
>
> It's not for a "single page run" of some tool; one reason I chose
> my CatScan rewrite as demo is that it can run for a long time
> (two-digit number of minutes), and generate a vast list of results
> (tens of thousands of pages), depending on the query. The idea is
> that (a) you're not "blocking" while waiting for that to finish,
> before you can do something else; (b) you can access the results of
> the run again, maybe if the subsequent tool fails, or you want to
> try a different filter or subset, or a different subsequent tool
> altogether; (c) you can define new data sources, maybe a tool where
> you just paste in page titles, or another tool that gets the newest
> 1,000 articles, or 1,000 random articles, or the last 1,000
> articles you edited, or /insert crazy idea here/, and all
> subsequent tools will just run with it.
>
> And you can chain tools together via a single number; no file path
> that the other guy doesn't have access to, no SQL query that runs
> for a few minutes every time (that is, /if/ your tool can be
> reduced to that...), no massive paste orgy, no loss of meta-data
> between tools.
>
> I also envision longer chains: Give me all articles that are in
> both these two category trees; remove the ones that have images
> (except template symbol icons, if possible); remove the ones that
> have language links; remove the ones that had an edit less than a
> month old; render that as wikitext. There's a subject-specific
> "needs work" finder from simple components. UNIX philosophy at its
> finest :-)
>
>
>> Magnus Manske wrote:
>>> Right now, a tool is started by this page via "nohup &"; that
>>> could change to the job submission system, if that's possible
>>> from the web servers, but right now it seems overly complicated
>>> (runtime estimation? memory estimation? SQL server access?
>>> whatnot) The web page then returns the reserved output asset
>>> ID, while the actual tool is running; another tool could thus
>>> be "watching" asynchronously, by polling the status every few
>>> seconds.
>>
>> Yes, it can be called. I use it in a script for scheduling a
>> cleanup of the created temporary files.
>>
>> The relevant code:
>>> $dt = new DateTime( "now", new DateTimeZone( "UTC" ) );
>>> $tmpdir = dirname( __FILE__ ) . "/tmp";
>>> @mkdir( $tmpdir, 0711 );
>>> $shell = "mktemp -d --tmpdir=" . escapeshellarg($tmpdir)
>>>     . " catdown.XXXXXXXX";
>>>
>>> $tmpdir2 = trim( `$shell` );
>>> // Program the folder destruction
>>> // Note that qsub is 'slow' to return, so we perform it in the
>>> // background
>>> $dt->add( new DateInterval( "PT1H" ) );
>>> exec( "SGE_ROOT=/sge/GE qsub -a " . $dt->format("YmdHi.s")
>>>     . " -wd " . escapeshellarg( $tmpdir )
>>>     . " -j y -b y /bin/rm -r " . escapeshellarg( $tmpdir2 )
>>>     . " 2>&1 &" );
>
>
> Thanks, that looks interesting.
> I'll play with it, though I still
> face the problem of estimating resource requirements for a tool by
> a generic wrapper. /Shudder/
>
> Cheers, Magnus
>
> _______________________________________________
> Toolserver-l mailing list (Toolserver-l@lists.wikimedia.org)
> https://lists.wikimedia.org/mailman/listinfo/toolserver-l
> Posting guidelines for this list:
> https://wiki.toolserver.org/view/Mailing_list_etiquette
>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAlALzSAACgkQAXWvBxzBrDD+TgCfWRF59s6oaGaRANJW+NscTix3
Jl8AoOIoaqBPwV/NWw4TeIZhqvj14/Qx
=t1Fk
-----END PGP SIGNATURE-----