The scholar, focusing on the mathematics, admonishes the "pragmatic
idealist":
>> That is not a problem. That is an algorithm whose first step is not even
>> clear: a Google search on {stats "view all sites"} returns a "normal
>> response", a list of 224,000 pages from different websites.
Magic Banana politely requested:
>> please express the actual problem clearly. [paraphrasing] in less than ten
>> paragraphs.
The following scheme worked well with a list of about a million visitors'
hostnames.
(1) Collect Recent Visitor data with a Google search on {stats "view all
sites"}.
Hmmm. We seem both to be writing at once ...
Magic Banana is saying:
Quoting amenex:
> I want to guard against double-counting, as with 01j01.txt or 01j02.txt vs
> 02j01.txt, and that requires some heavy-duty concentration.
>> "My" solution (since my first post in this thread) joins one
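One way to guard against that double-counting is to enumerate each unordered pair of files exactly once. The following is a minimal bash sketch with three illustrative two-column files (the real inputs are the much larger hostname lists); `01j02` gets computed, but `02j01` and the self-join `01j01` never do.

```shell
# Sketch: process each unordered pair of files exactly once, so the pair
# (01,02) is computed as 01j02 only, never also as 02j01 or 01j01.
# The three input files and their contents are illustrative.
set -e
dir=$(mktemp -d)
cd "$dir"
printf 'a.example 1\nb.example 2\n' > 01.txt
printf 'b.example 3\nc.example 4\n' > 02.txt
printf 'c.example 5\na.example 6\n' > 03.txt

pairs=0
for f in 01.txt 02.txt 03.txt; do
  for g in 01.txt 02.txt 03.txt; do
    # The lexicographic test keeps only f < g, halving the work and
    # ruling out self-joins like 01j01.
    if [ "$f" \< "$g" ]; then
      pairs=$((pairs + 1))
      sort "$f" > f.sorted
      sort "$g" > g.sorted
      # join requires both inputs sorted on the join field (field 1 here).
      join f.sorted g.sorted > "${f%.txt}j${g%.txt}.txt"
    fi
  done
done
echo "$pairs"  # prints 3: pairs 01j02, 01j03, 02j03
```

Three files yield three unordered pairs; with the thread's 44 files the same loop yields 44*43/2 = 946 pairs instead of 1,892 ordered ones.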
Magic Banana may be re-stating my objective differently than I have been
stating it:
>> Also, if, in my previous post, I have understood what you wanted to do
>> with Leafpad and 43 manual executions
>> (i.e., "join every file with the union of all other files"), here is a
>> slightly modified ...
Magic Banana requested clarification of my future plans:
In reply to what I said:
> Repeating it for the other 43 combinations should now be a breeze, as I can
> switch the file names around with Leafpad.
>> I am not sure I understand what you want to do (join every file with the
>> union of ...
Magic Banana pointed back to the earlier proposal:
>> Here is what I actually proposed (where filenames are appended *before*
>> the join):
>> https://trisquel.info/forum/grep-consumes-all-my-ram-and-swap-ding-big-job#comment-142474
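The linked post proposes appending filenames *before* joining. A minimal sketch of that idea, using awk's built-in FILENAME variable on illustrative data: tag every line with its source file, then one sort over the tagged stream replaces many per-pair runs.

```shell
# Sketch of tagging each line with its source file *before* any join,
# so one sorted stream carries the provenance of every record.
# (Illustrative two-column data; the real lists hold hostnames and counts.)
set -e
dir=$(mktemp -d); cd "$dir"
printf 'a.example 1\nb.example 2\n' > f1.txt
printf 'b.example 3\nc.example 4\n' > f2.txt

# awk's FILENAME variable names the file the current line came from.
awk '{print $1, $2, FILENAME}' f1.txt f2.txt | sort -k1,1 > tagged.txt
cat tagged.txt
```

Hostnames shared between files now sit on adjacent lines of tagged.txt, with the third column telling which file each count came from.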
When I collapsed the script into a one-liner, it would start ... but nothing
ensued for ...
Using awk to execute the series of join commands described above produces
only syntax errors.
Here are those commands:
> join -1 2 -2 2 join -1 2 -2 2 join -1 2 -2 2 join -1 2 -2 2 join -1 2
> -2 2 join -1 2 -2 2 join -1 2 -2 2 join -1 2 -2 2 join -1 2 -2 2 join -1
> 2 -2 2 join -1 ...
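The syntax errors are expected: `join` accepts exactly two file operands, so stringing "join -1 2 -2 2" fragments together on one command line hands the later words to the first `join` as bogus operands. Chaining has to feed each join's output into the next. A bash sketch with three illustrative count-hostname files:

```shell
# Sketch: 'join' takes exactly two inputs, so repeated "join -1 2 -2 2"
# text in one command is a syntax error. Chain instead by piping the
# previous join's output ('-' = stdin) into the next join.
set -e
dir=$(mktemp -d); cd "$dir"
printf '1 a.example\n2 b.example\n' > f1.txt
printf '3 b.example\n4 c.example\n' > f2.txt
printf '5 b.example\n6 d.example\n' > f3.txt

# Each input must be sorted on its join field (field 2 here).
sort -k2,2 f1.txt > s1; sort -k2,2 f2.txt > s2; sort -k2,2 f3.txt > s3

# The first join matches field 2 of both files; its output moves the key
# into field 1, so the next join in the chain uses -1 1 against -2 2.
join -1 2 -2 2 s1 s2 | join -1 1 -2 2 - s3 > all.txt
cat all.txt
```

Note the field numbers shift after the first join: `join` always emits the key first, so every later stage joins on field 1 of the accumulated stream.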
Taking Magic Banana's cue, I applied the join command in round-robin fashion:
> HNs.Ed.tropic.ssec.wisc.edu.txt join -1 2 -2 2 join -1 2 -2 2 join -1 2
> -2 2 join -1 2 -2 2 join -1 2 -2 2 time cat `ls ...
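A round-robin pass can be written as a loop that keeps a running result and joins it with each remaining file in turn, rather than one giant command line. A bash sketch on illustrative data:

```shell
# Sketch of a round-robin join: seed an accumulator with the first file,
# then join it against each remaining file in turn. Illustrative data.
set -e
dir=$(mktemp -d); cd "$dir"
printf '1 a.example\n2 b.example\n' > f1.txt
printf '3 a.example\n4 b.example\n' > f2.txt
printf '5 a.example\n6 c.example\n' > f3.txt

sort -k2,2 f1.txt > acc            # accumulator, sorted on field 2
first=yes
for f in f2.txt f3.txt; do
  sort -k2,2 "$f" > next
  if [ "$first" = yes ]; then
    join -1 2 -2 2 acc next > tmp  # the key is field 2 on the first pass
    first=no
  else
    join -1 1 -2 2 acc next > tmp  # join output keeps the key in field 1
  fi
  mv tmp acc
done
cat acc                            # hostnames common to all three files
```

After the loop, acc holds one line per hostname present in every file, with the counts from each file appended in order.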
Followup:
Starting with the setup script:
> time awk < HNs.bst.lt.txt '{print $2}' > HNusage/HNs.bst.lt/temp
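The setup step above extracts the second column into a working file. A self-contained sketch of the same step, with a sort added so later join/grep passes get the sorted input they need (the filenames mimic the ones above but the contents are made up):

```shell
# Sketch of the setup step: pull the second column (the hostname) out of
# a two-column list and sort it once, so downstream join/grep passes run
# on sorted input. File and directory names mimic the real ones.
set -e
dir=$(mktemp -d); cd "$dir"
printf '3 host-a.bst.lt\n1 host-c.bst.lt\n2 host-b.bst.lt\n' > HNs.bst.lt.txt
mkdir -p HNusage/HNs.bst.lt
awk '{print $2}' HNs.bst.lt.txt | sort > HNusage/HNs.bst.lt/temp
head -n 1 HNusage/HNs.bst.lt/temp  # prints host-a.bst.lt
```

Sorting once here is what lets every later pass over the file stay fast.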
When the grep command acts on a group of files that have been sorted, the
script runs much more quickly and uses far less RAM, with no need of swap
support:
> time grep ...
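The truncated command aside, the speedup is consistent with fixed-string matching on sorted data. A sketch (illustrative hostnames): `grep -F -f` treats the pattern file as literal strings, avoiding regex machinery, and once both sides are sorted, `comm` computes the same intersection even more cheaply.

```shell
# Sketch: literal-string matching with a pattern file, plus the 'comm'
# alternative that sorted input makes possible. Illustrative data.
set -e
dir=$(mktemp -d); cd "$dir"
printf 'host-a.example\nhost-b.example\n' > wanted.txt
printf 'host-b.example\nhost-c.example\nhost-a.example\n' > visitors.txt

# -F: fixed strings, -x: whole-line matches, -f: patterns from a file.
grep -F -x -f wanted.txt visitors.txt | sort > hits.txt

# With both sides sorted, comm -12 yields the same intersection.
sort wanted.txt > w.s; sort visitors.txt > v.s
comm -12 w.s v.s > hits2.txt
cmp hits.txt hits2.txt && echo same  # prints same
```

For million-line hostname lists, `comm` on pre-sorted files is usually the lightest option of all, since it streams both inputs once.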
Magic Banana wonders:
> However, your input looks wrong: on line 674 of HNs.bst_.lt_.txt, the
> second column only contains the character 0... and your 'grep' selects
> (among others) all the lines that include this character. I assume you
> want whole domain matches.
I checked the original ... 10 matches.