Engine Fuel Or Engine Fool?

Sydney Morning Herald

Tuesday November 6, 2001

Nicole Manktelow

The struggle to find quality amid the quantity still plagues search engines, writes Nicole Manktelow.

In a labyrinth of new and constantly changing content, search engines may be the best hope for finding our way through the Internet, but these tools are constantly subject to deception and manipulation.

Searching is considered to be the second most popular activity on the Internet, behind email.

And just as email users call junk messages "spam" and their purveyors "spammers", the search industry uses the same terms for those who deluge search systems with their listings, use sophisticated cloaking techniques, or create numerous dummy pages in a bid to fool the technology.

"Search engine companies are deploying considerable resources countering these tactics," said David Hawking, of CSIRO's Electronic Content Technologies group.

Many of these tactics exploit a search engine's traditional quest to access the most content and perform the broadest possible online search.

As the content on the Web continues to grow at an astronomical rate, the search industry pushes on, arguing that quality and relevance matter considerably more than volume.

But psychologically at least, size still counts in search engine comparisons.

It is little coincidence that the top-rated Google (www.google.com) also claims the maximum number of indexed pages. In fact, Google is Australia's most popular place to search, with 1.9 million Australians, or one in four local Internet users, accessing the engine in September, according to research from Jupiter Media Metrix.

"There is in excess of 2 billion 'visible' Web pages, and that's conservative," Danny Sullivan, editor of industry news site SearchEngineWatch.com, said. "Google gathers the most of it, 1.1 billion pages actually visited, with an extra half billion that it knows about via link analysis."

Exact details of Google's PageRank search methodology are a secret, but in general link analysis involves gathering the links from within Web pages and comparing the destinations with the anchor text used to describe them.

For example, most links to www.theage.com.au would presumably be described in the anchor text as "The Age's Web site".

While using anchor text is an effective way of extending the reach of any search system, Hawking warned that these methods were also subject to exploitation.

"Say we decide to create a Web page that links to John Howard's Web site but the anchor text includes unflattering descriptions particularly if it is something uncommon such as 'rotten varmint'... a search for 'rotten varmint' could mean his page would come up in the results list," Hawking explained.
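The anchor-text idea Hawking describes can be sketched in a few lines of Python. This is a minimal illustration only, not Google's method, which the article notes is secret. The class and function names and the pm.example address are hypothetical stand-ins. The sketch gathers each link's destination together with the words used to describe it, then looks up a phrase to see which pages it points at.

```python
from collections import defaultdict
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collects (href, anchor text) pairs from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the <a> tag currently open, if any
        self._text = []      # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def build_anchor_index(pages):
    """Map each lower-cased anchor phrase to the set of URLs it points at."""
    index = defaultdict(set)
    for html in pages:
        collector = AnchorCollector()
        collector.feed(html)
        for href, text in collector.links:
            if text:
                index[text.lower()].add(href)
    return index


# Two made-up pages: one ordinary link, one with an unflattering description.
pages = [
    '<p>Read <a href="http://www.theage.com.au">The Age\'s Web site</a>.</p>',
    '<p>A <a href="http://pm.example/">rotten varmint</a> lives here.</p>',
]
index = build_anchor_index(pages)
print(index["rotten varmint"])
```

Because the index trusts whatever words the linking page chooses, a single page using an uncommon phrase is enough to attach that phrase to someone else's site, which is the weakness Hawking points to.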

A key ploy of search engine spammers is to create Web pages for the sole purpose of publishing links to their main site. They hope the links from these pages will be added to the catch when search engines go fishing, making the main site appear more popular to a search engine, Hawking explained.
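A rough Python sketch shows why dummy pages can inflate apparent popularity. It counts inbound links two ways: every link, and only distinct linking hosts. Grouping by host is an assumed countermeasure chosen for illustration, not something the article says any engine does, and all the addresses are invented.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def inbound_counts(links):
    """Count raw inbound links and distinct linking hosts per target URL."""
    raw = defaultdict(int)
    hosts = defaultdict(set)
    for source, target in links:
        raw[target] += 1
        hosts[target].add(urlsplit(source).hostname)
    return dict(raw), {target: len(h) for target, h in hosts.items()}


# Hypothetical link graph: three dummy pages on one host boost a shop,
# while two independent sites link to a newspaper.
links = [
    ("http://farm.example/page1.html", "http://shop.example/"),
    ("http://farm.example/page2.html", "http://shop.example/"),
    ("http://farm.example/page3.html", "http://shop.example/"),
    ("http://news.example/story.html", "http://paper.example/"),
    ("http://blog.example/post.html", "http://paper.example/"),
]
raw, by_host = inbound_counts(links)
print(raw)
print(by_host)
```

On raw counts the shop looks more popular than the newspaper. Counting each linking host once reverses the order, although a spammer who spreads dummy pages across many hosts would defeat this simple check too.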

Meanwhile, cloaking, a technique that has many legitimate applications in allowing Web sites to deliver different versions to different viewers, can also be used to deceive search engines.

"At the moment the query 'Osama bin Laden' is quite popular. If you had an adult content site, you might insert CGI script to detect who is visiting the site. If the request is from Google or another search engine, their request would fetch a page about Osama, so the search engine records the information and that's what gets indexed," Hawking said. "But when people from elsewhere attempt to access the page, they see the page of naked women or whatever," he said.
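The mechanism Hawking outlines comes down to a branch on who appears to be asking. The Python sketch below is a simplified illustration under stated assumptions: the crawler name tokens, function names and page bodies are all hypothetical, and it keys only on the User-Agent string, whereas a real script could use other signals such as the visitor's network address. It also shows one naive check a search engine might run, requesting the same page under two identities and comparing the results.

```python
CRAWLER_TOKENS = ("googlebot", "slurp")  # assumed, illustrative crawler names


def cloaked_page(user_agent):
    """Return a different page body depending on who appears to be asking."""
    ua = user_agent.lower()
    if any(token in ua for token in CRAWLER_TOKENS):
        return "<h1>News about a popular query</h1>"
    return "<h1>Something else entirely</h1>"


def honest_page(user_agent):
    """Return the same body regardless of the visitor."""
    return "<h1>Same page for everyone</h1>"


def looks_cloaked(fetch, crawler_ua, browser_ua):
    """Flag a page whose body differs between a crawler and a browser request."""
    return fetch(crawler_ua) != fetch(browser_ua)


print(looks_cloaked(cloaked_page, "Googlebot/2.1", "Mozilla/4.0"))
print(looks_cloaked(honest_page, "Googlebot/2.1", "Mozilla/4.0"))
```

A plain comparison like this would also flag the legitimate uses the article mentions, such as sites that tailor versions to different viewers, so on its own it could only be a first filter.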

While most site owners are unaware of the legitimate things that they should do to enhance their visibility, there is only a "relatively small group of people who go beyond the legitimate and are overly aggressive with search engines", Sullivan said.

"Unfortunately, this small group can generate a huge amount of work for search engines, which then have to devote resources to filter out this spam," he said.

Newer search engines such as Wisenut (www.wisenut.com) and Teoma (www.teoma.com), unable to match Google's bulk, are focusing on their own proprietary search technologies and claim greater accuracy.

Human judgment is, however, still the last line of defence for some.

"Google employs two people to test the engine and try to defeat it," Hawking said.

Meanwhile, Internet guide Yahoo has relied on human editors for the bulk of its service. The local outpost, like its other international offices, has a department of "surfers" to make sure sites are listed in appropriate categories.

"Like everybody, we are targets for spammers," said Lynne Hughes, surfing manager for Yahoo Australia and New Zealand.

"Our software can tell if a site is in the directory already. Sometimes people will attempt to list a site with a slightly different URL, but when the surfer goes there they can see this," she said.
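Hughes does not say how Yahoo's software works, and in her account a human surfer catches the disguised submissions. As a hedged sketch of how software might catch some of them, the Python below reduces each address to a rough key so that trivially different URLs collide. The normalisation rules, function names and example addresses are assumptions made for illustration.

```python
from urllib.parse import urlsplit


def canonical_key(url):
    """Reduce a URL to a rough key so near-identical submissions collide."""
    parts = urlsplit(url.strip())
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]                      # treat www.site and site alike
    path = parts.path.rstrip("/")
    if path.endswith("/index.html"):
        path = path[: -len("/index.html")]   # treat /index.html as the folder
    return host + path


def is_duplicate(url, listed_keys):
    """Report whether a submitted URL matches a site already in the directory."""
    return canonical_key(url) in listed_keys


# Hypothetical directory holding one listed site.
listed = {canonical_key("http://www.example.com/")}

print(is_duplicate("http://EXAMPLE.com/index.html", listed))
print(is_duplicate("http://other.example/", listed))
```

Rules like these only catch cosmetic variations. The same content published under an unrelated address would pass, which is where a person visiting the site still helps.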

Unlike its search engine cousins, Yahoo's philosophy is to list what its staff and users believe to be "the best of the Web, not everything on the Web", Hughes said.

"Size isn't everything," agreed Sullivan. "Having a good collection of important pages and good relevancy to rank those pages is generally better."

And it may be a moot point. Experts argue that search engines will never get beyond accessing a fraction of the material published online, and would have difficulty managing the information if they did.

"There's a lot of debate about how big the Web is and how much you have to index," Hawking explained. "There are some sites, such as online calendars, which can generate an infinite number of pages. And there is masses of stuff publishers do not allow to be indexed."
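Hawking's calendar example is easy to reproduce. In the Python sketch below, a made-up calendar site links every month to the next, so following links never runs out of pages. The crawler stops only because it is given a page budget. The site, its address format and the budget are hypothetical, and a fixed cap is just one simple way a crawler might protect itself.

```python
from collections import deque


def calendar_links(url):
    """Hypothetical site: every month page links to the following month."""
    year, month = map(int, url.rsplit("/", 2)[-2:])
    month += 1
    if month > 12:
        year, month = year + 1, 1
    return [f"http://cal.example/{year}/{month}"]


def crawl(start, get_links, max_pages):
    """Breadth-first crawl that stops after max_pages distinct URLs."""
    seen = {start}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        for link in get_links(url):
            if link not in seen and len(seen) < max_pages:
                seen.add(link)
                queue.append(link)
    return seen


visited = crawl("http://cal.example/2001/11", calendar_links, max_pages=5)
print(sorted(visited))
```

Without the budget the loop would never finish, which is one reason the size of the Web is so hard to pin down.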

The CSIRO is another participant in the race to build a better search engine, having co-founded the P@noptic search engine with the Australian University's Co-operative Research Centre.

P@noptic is used by Sydney University's Department of Industry, Science and Resources. "If we get it working on 10 sites we might consider commercialising it," Hawking said.

nicole@auscape.net.au

© 2001 Sydney Morning Herald
