Had to run and missed a couple of important items. One is that you can
calculate the likelihood that a link is missing. (It's similar to
Google's PageRank.) If the likelihood turns out to be too small you
simply don't report anything. You also can skip reporting if you don't
have any intervening searches …
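A minimal sketch of that thresholding idea, assuming per-article view
totals and search-referral counts are already available; the function
names, the 0.05 cutoff and the 100-view floor are illustrative guesses,
not anything specified above:

    # Hypothetical sketch, not from the thread: score an article by the
    # fraction of its traffic arriving via external search rather than
    # internal links, and report nothing when the evidence is too weak.

    def missing_link_score(search_hits: int, total_hits: int) -> float:
        """High search fraction hints at missing inbound links."""
        if total_hits == 0:
            return 0.0
        return search_hits / total_hits

    def report_if_likely(article: str, search_hits: int, total_hits: int,
                         threshold: float = 0.05) -> None:
        score = missing_link_score(search_hits, total_hits)
        if score < threshold or total_hits < 100:
            return  # likelihood too small: don't report anything
        print(f"{article}: {score:.2%} of {total_hits} views via search")

    report_if_likely("Example_article", search_hits=420, total_hits=1000)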
I tried to convince myself to stay out of this thread, but this was
somewhat interesting. ;)
I'm not quite sure this will work out for every case, but my rough idea
is like this:
Imagine a user trying to get an answer about some kind of problem. He
searches with Google and lands on the most o…
John at Darkstar wrote:
> If someone wants to work on this I have some ideas to make something
> useful out of this log, but I'm a bit short on time. Basically it's two
> ideas that are really useful; one is to figure out which articles are
> most interesting to show in a portal and the other is how to detect
> articles with missing links …
Some articles are only very seldom referred to, and those can be used to
uniquely identify a machine. Then there are all those who do something
that goes into public logs. The latter are very difficult to obfuscate,
but the first one is possible to solve by setting a time frame long
enough that sufficient …
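A rough illustration of that time-frame idea, assuming per-article
counts are released only once they clear a fixed anonymity floor; the
k=10 value and the data shapes are invented for the example:

    # Illustrative sketch: withhold a per-article count until it clears
    # an anonymity floor, so rarely-read articles stay suppressed until
    # the reporting window is long enough. k=10 is an assumed value.
    from collections import Counter

    K_ANONYMITY_FLOOR = 10

    def releasable_counts(hits_in_window: Counter) -> dict:
        """Keep only counts large enough that no one machine stands out."""
        return {article: n for article, n in hits_in_window.items()
                if n >= K_ANONYMITY_FLOOR}

    window = Counter({"Popular_article": 5200, "Obscure_article": 3})
    print(releasable_counts(window))  # Obscure_article is suppressed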
If someone wants to work on this I have some ideas to make something
useful out of this log, but I'm a bit short on time. Basically it's two
ideas that are really useful; one is to figure out which articles are
most interesting to show in a portal and the other is how to detect
articles with missing links …
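The first idea could look roughly like this, assuming an input stream
of tab-separated "article<TAB>count" lines, which is a guessed format
rather than a documented one:

    # Sketch: aggregate per-article view counts and pick the biggest
    # ones as candidates for a portal. Input format is an assumption.
    import sys
    from heapq import nlargest

    def top_articles(lines, n=10):
        counts = {}
        for line in lines:
            article, _, count = line.rstrip("\n").rpartition("\t")
            if article and count.isdigit():
                counts[article] = counts.get(article, 0) + int(count)
        return nlargest(n, counts.items(), key=lambda kv: kv[1])

    if __name__ == "__main__":
        for article, views in top_articles(sys.stdin):
            print(f"{views:>10}  {article}")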
On Fri, Jun 5, 2009 at 9:20 PM, Gregory Maxwell wrote:
> On Fri, Jun 5, 2009 at 10:13 PM, Robert Rohde wrote:
> There is a lot of private data in user agents ("MSIE 4.123; WINNT 4.0;
> bouncing_ferret_toolbar_1.23 drunken_monkey_downloader_2.34" may be
> uniquely identifying). There is even private …
Scrubbing log files to make the data private is hard work. You'd be
impressed by what researchers have been able to do - taking purportedly
anonymous data and using it to identify users en masse by correlating it
with publicly available data from other sites such as Amazon, Facebook and
Netflix. Ma…
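For illustration, this is roughly the kind of naive scrubbing the
message warns about, collapsing a detailed user-agent string into a
coarse browser family; the family list is an assumption, and the point
above is that such scrubbing alone does not guarantee anonymity:

    # Naive user-agent generalization; the thread's warning is that this
    # alone is insufficient, since correlation attacks can still
    # re-identify users. The family list is an assumed set of buckets.
    import re

    FAMILIES = ["MSIE", "Firefox", "Safari", "Opera"]

    def generalize_user_agent(ua: str) -> str:
        """Collapse a detailed UA string into a coarse browser family."""
        for family in FAMILIES:
            if re.search(re.escape(family), ua, re.IGNORECASE):
                return family
        return "Other"

    print(generalize_user_agent(
        "MSIE 4.123; WINNT 4.0; bouncing_ferret_toolbar_1.23"))  # MSIE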
On Fri, Jun 5, 2009 at 10:13 PM, Robert Rohde wrote:
> On Fri, Jun 5, 2009 at 6:38 PM, Tim Starling wrote:
>> Peter Gervai wrote:
>>> Is there a possibility to write a code which process raw squid data?
>>> Who do I have to bribe? :-/
>>
>> Yes, it's possible. You just need to write a script that accepts a
>> log stream on stdin …
On Fri, Jun 5, 2009 at 6:38 PM, Tim Starling wrote:
> Peter Gervai wrote:
>> Is there a possibility to write a code which process raw squid data?
>> Who do I have to bribe? :-/
>
> Yes, it's possible. You just need to write a script that accepts a log
> stream on stdin and builds the aggregate data from it …
Peter Gervai wrote:
> Hello,
>
> I see I've created quite a stir, but so far nothing really
> useful popped up. :-(
>
> But I see that one from Neil:
>> Yes, modifying the http://stats.grok.se/ systems looks like the way to go.
>
> For me it doesn't really seem to be, since it seems to be using an
> extremely dumbed-down version …
Peter Gervai wrote:
> Is there a possibility to write a code which process raw squid data?
> Who do I have to bribe? :-/
Yes, it's possible. You just need to write a script that accepts a log
stream on stdin and builds the aggregate data from it. If you want
access to IP addresses, it needs to run …
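A minimal sketch of such a filter, assuming the request URL sits in the
seventh whitespace-separated field as in common squid access-log
layouts; the thread doesn't confirm the exact format, so URL_FIELD is
an assumption:

    #!/usr/bin/env python
    # Read a squid-style log stream on stdin and aggregate hit counts
    # per requested URL. Adjust URL_FIELD if the log layout differs.
    import sys
    from collections import Counter

    URL_FIELD = 6  # assumed 0-based index of the request URL

    def aggregate(stream):
        """Count hits per requested URL from a log stream."""
        counts = Counter()
        for line in stream:
            fields = line.split()
            if len(fields) > URL_FIELD:
                counts[fields[URL_FIELD]] += 1
        return counts

    if __name__ == "__main__":
        for url, hits in aggregate(sys.stdin).most_common(25):
            print(f"{hits:>8}  {url}")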
Hello,
I see I've created quite a stir, but so far nothing really
useful popped up. :-(
But I see that one from Neil:
> Yes, modifying the http://stats.grok.se/ systems looks like the way to go.
For me it doesn't really seem to be, since it seems to be using an
extremely dumbed-down version …