On Sun, Jan 31, 2010 at 8:46 AM, Thaths <[email protected]> wrote:
> On Sun, Jan 31, 2010 at 7:09 AM, Udhay Shankar N <[email protected]> wrote:
>> Speaking for myself, most news I get is from online sources (I do read
>> the ToI for entertainment value while having breakfast). A site like
>> google news makes it quite easy to see if a story has been repeated
>> across multiple sites.
> Google News does help get multiple points of view through its
> clustering mechanism.

I found the following analysis of recent coverage of the Google-China
thing interesting:

http://www.niemanlab.org/2010/02/the-googlechina-hacking-case-how-many-news-outlets-do-the-original-reporting-on-a-big-story/

We often talk about the new news ecosystem — the network of
traditional outlets, new startups, nonprofits, and individuals who are
creating and filtering the news. But how is the work of reporting
divvied up among the members of that ecosystem?

To try to build a datapoint on that question, I chose a single big
story and read every single version listed on Google News to see who
was doing the work. Out of the 121 distinct versions of last week’s
story about tracing Google’s recent attackers to two schools in China,
13 (11 percent) included at least some original reporting. And just
seven organizations (six percent) really got the full story
independently.

But as usual, things are a little more subtle than that. I chose the
Google-China story because it’s complex, international, sensitive, and
important. It’s the sort of big story that requires substantial
investigative effort, perhaps including inside sources and
foreign-language reporting. Call it a stress test for our reporting
infrastructure, a real-life worst case.

The New York Times broke the story last Thursday, writing that unnamed
sources involved in the investigation of last year’s hacking of a
number of American companies had traced the attacks to a prestigious
technical university and a vocational college in mainland China. The
article included comment from representatives of the schools and,
while it had a San Francisco dateline, credited contributions from
Shanghai staff. Immediately, the story was everywhere. Just about
every major American newspaper and all the wires covered it.

When I started investigating the issue on Monday morning, Google News
showed 800 different reports. But how many of these reports actually
brought new information to light? By default, Google does not display
duplicate copies of syndicated (or stolen) content, which cut the total
to 121 unique pieces of copy. I read each one, and
several hours later, I had a spreadsheet recording the sourcing for
each story. I also recorded the country of publication, the dateline
or contributor location if noted, and the primary publishing medium of
each outlet (paper, online, radio, etc.). An excerpt of this data is
reproduced in the table below.

Here’s what I found:

— Out of 121 unique stories, 13 (11 percent) contained some amount of
original reporting. I counted a story as containing original reporting
if it included at least an original quote. From there, things get
fuzzy. Several reports, especially the more technical ones, also
brought in information from obscure blogs. In some sense they didn’t
publish anything new, but I can’t help feeling that these outlets were
doing something worthwhile even so. Meanwhile, many newsrooms
diligently called up the Chinese schools to hear exactly the same
denial, which may not be adding much value.

— Only seven stories (six percent) were primarily based on original
reporting. These were produced by The New York Times, The Washington
Post, the Wall Street Journal, The Guardian, Tech News World,
Bloomberg, Xinhua (China), and the Global Times (China).

— Of the 13 stories with original reporting, eight were produced by
outlets that primarily publish on paper, four were produced by wire
services, and one was produced by a primarily online outlet. For this
story, the news really does come from newspapers.

— 14 reports (12 percent) were produced by Chinese outlets, had a
China dateline, or mentioned the assistance of staff in China. For a
story about China, that seems awfully low to me. Perhaps this has to
do with cutbacks of foreign correspondents?

— Nine reports (7 percent) mentioned no source at all. Five more were
partially unsourced. Given the ease of hyperlinks, this frightens me.

— Google News tended to rank solid original stories fairly high in its
list. Google says they rank stories based on criteria such as the
reputation of a source, number of references by other articles, and
the headline clickthrough rate — though they won’t reveal exactly how
it’s done. The spreadsheet and table below list stories in the order
that Google News ranked them.

— Google’s story-clustering algorithm included three unrelated stories
and missed at least one original report. The three extraneous stories
were about Google and China, but not about the recent trace. The
exclusion of the Financial Times’ excellent piece is a disappointment
— perhaps this has something to do with their paywall? Maybe I’m
biased because, as a computer scientist, I appreciate the difficulty
of the problem — but I actually think this means that Google News
works remarkably well, for a completely unsupervised algorithm that
crawls billions of pages to find millions of stories in dozens of
languages.

— What were those other 100 reporters doing? When I think of how much
human effort went into re-writing those hundred other unique stories
that contained no original reporting, I cringe. That’s a huge amount
of journalistic effort that could have gone into reporting other
deserving stories. Why are we doing this? What are the legal,
technical, economic and cultural barriers to simply linking to the
best version of each story and moving on?

— The punchline is that no English-language outlet picked up the
original reporting of Chinese-language Qilu Evening News, which was
even helpfully translated by Hong Kong blogger Roland Soong. A Chinese
reporter visited one of the schools in question and advanced the story
by clarifying that serious hackers were unlikely to have been trained
in the vocational computer classes offered there. Soong told me that
Lanxiang Vocational School is well known in China for its cheesy
late-night commercials and low-quality schooling, more of an
educational chop shop for cooks and mechanics than the training ground
for military hackers that the Times claims.
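
The percentages quoted in the list above can be re-derived from the raw
counts. The following is only a minimal sketch: the counts are the ones
stated in the article, while the dictionary labels and the share() helper
are names made up here for illustration, and rounding to the nearest whole
percent is an assumption about how the figures were produced.

```python
# Counts as stated in the article; TOTAL is the number of unique stories.
TOTAL = 121
counts = {
    "some original reporting": 13,
    "primarily original reporting": 7,
    "China outlet, dateline, or staff": 14,
    "no source at all": 9,
}

# Medium breakdown of the 13 stories with original reporting.
by_medium = {"paper": 8, "wire": 4, "online": 1}


def share(count, total=TOTAL):
    """Return count as a percentage of total, rounded to a whole number."""
    return round(100 * count / total)


# Hypothetical helper output: label -> rounded percentage.
shares = {label: share(n) for label, n in counts.items()}

for label, pct in shares.items():
    print(f"{label}: {counts[label]} of {TOTAL} ({pct} percent)")

# Unique stories that contained no original reporting at all.
no_original = TOTAL - counts["some original reporting"]
print(f"no original reporting: {no_original} of {TOTAL}")
```

Under these assumptions the rounded shares match the figures in the
article, and the stories without any original reporting number 108, which
is the "hundred other" stories the author mentions.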

Tracing one story doesn’t prove anything conclusive beyond that one
story, of course. And using Google News as a filter doesn’t truly
represent the new news ecosystem: It excludes lots of smaller blogs
and other outlets. Soong said Google News told him that his site is
not eligible for inclusion in their results because they don’t include
small blogs written by a single author. This seems like an arbitrary
distinction, but it’s hard to imagine what defensible choice Google
could make in an era where the definition of a news source is so up
for grabs.

-- 
"Marge, you being a cop makes you the man! Which makes me the woman... and
I have no interest in that, besides wearing the occasional underwear, which
as we discussed is strictly a comfort thing." -- Homer J. Simpson
Sudhakar Chandra                                    Slacker Without Borders
