Networked information flows in East Asia:
a pilot study on research uses
of the Altavista search engine

by
Dr T. Matthew Ciolek,
Research School of Pacific and Asian Studies,
Australian National University, Canberra ACT 0200, Australia
tmciolek@coombs.anu.edu.au

Document created: 10 Jun 2000. Last revised: 21 May 2001

0. Abstract

Electronic information residing on web servers is neither uniformly distributed across the countries of the world, nor is it uniformly linked to. This methodological note uncovers a series of clear-cut patterns of geographic preference/avoidance amongst the hypertext connections in web resources located in China, Hong Kong, Japan, Korea and Taiwan. It also shows how the Altavista search engine can be productively used to reveal hitherto hidden patterns in networked information flows.

1. Introduction

The Altavista search engine (www.altavista.com) is one of the biggest (Bharat and Broder 1998; Compaq Co. 2000), most powerful and most widely known WWW search engines. In addition to being very fast and fairly comprehensive, Altavista also offers a free, though relatively little known online tool suitable for a systematic study of the informational relationships between various networked entities. Such entities can be as specific as individual web pages and subdirectories on a web server, or they can be more general and amorphous, and subsume whole web sites, computers, networks, and even countries.

The Altavista tool in question consists of two simultaneously issued commands, both of which are typed into the search engine's query box. In their most generic form, these commands are:

link:A -host:B

This expression means: within the realm of all web pages known at this moment to Altavista's database, give me the number (and also display a detailed list) of all the pages which contain a hypertext link to computer address A. At the same time, exclude from the report all links from pages residing on computer address B.
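As a minimal sketch, the generic query can be assembled programmatically when many addresses have to be checked. The helper name `build_link_query` is hypothetical; it only composes the query text and does not contact any search engine.

```python
def build_link_query(target, excluded_host):
    """Compose the generic Altavista-style query: pages that link to
    `target`, minus the pages residing on `excluded_host`."""
    if not target or not excluded_host:
        raise ValueError("both addresses must be non-empty")
    return f"link:{target} -host:{excluded_host}"


# Setting both arguments to the same address leaves out a site's links to itself.
print(build_link_query("www.loc.gov", "www.loc.gov"))  # link:www.loc.gov -host:www.loc.gov
```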

Altavista's "link:argument -host:argument" command can take a number of specific forms:

2. Practical uses of the "link: -host:" command

The most common practical application of Altavista's "link: -host:" command is a query concerning details of web pages which carry a hypertext link to a particular online resource. An answer to such a question can be important in two situations. Firstly, it measures, in objective terms, the degree of "online presence" exercised by the document, online resource or institution in question. It also makes it possible to compare two or more web resources.

For instance, three Altavista queries:

generate (in early June 2000) the following results:

These results reveal that if the online visibility, or "presence", of the US Library of Congress (www.loc.gov) is taken as the standard unit of measurement (i.e. one (1) "Loc"), then in June 2000 the RSPAS' seminal Asia-Pacific Studies web server, the Coombsweb (coombs.anu.edu.au), was worth about 0.58 Loc, while the National Library of Australia (www.nla.gov.au) was worth about 0.30 Loc.
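The "Loc" yardstick is a simple ratio: each site's link count divided by the link count of the Library of Congress. Because the raw June 2000 counts are not reproduced in this note, the sketch below uses invented round numbers that merely reproduce the reported ratios; the function name `presence_in_locs` is hypothetical.

```python
def presence_in_locs(link_counts, reference="www.loc.gov"):
    """Express each site's inbound-link count as a multiple of the
    reference site's count (the reference itself becomes 1.0)."""
    ref = link_counts[reference]
    if ref <= 0:
        raise ValueError("the reference site needs a positive link count")
    return {site: count / ref for site, count in link_counts.items()}


# Hypothetical round numbers, chosen only to illustrate the arithmetic.
counts = {"www.loc.gov": 1000, "coombs.anu.edu.au": 580, "www.nla.gov.au": 300}
for site, locs in presence_in_locs(counts).items():
    print(f"{site:20s} {locs:.2f} Loc")
```

Any other well-linked site could serve as the reference; the Library of Congress is simply the unit chosen in the text.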

Secondly, Altavista's "link: -host:" command can be used to determine exactly who is linking to a web page under our jurisdiction, so that the people pointing at our resource can be notified about changes to our site's computer address or to the policies governing access to its contents.

However, in addition to these two everyday uses of Altavista's command, there is a third one which, from a research point of view, is the most exciting.

3. Research uses of the "link: -host:" command

Cyberspace, defined here as the total body of publicly accessible, web-based information, can be visualised as a constellation of interlinked nodes.

These informational nodes can be as small and precise as individually addressed paragraphs of a document (reached through '#'-type anchor addresses within a particular web page), or whole documents, or bundles of documents in the form of a thematic subdirectory or site. On the other hand, these nodes can be fairly broad and refer to the body of digital information published on a particular machine, on a particular network, or at addresses belonging to a particular country. In the remainder of this methodological note I shall focus on the last case, the country-level nodes. (For a handy list of the Internet two-character country codes see RIPE Network Coordination Centre (1997).)
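The nesting of nodes, from a '#' paragraph anchor up to a country, can be read directly off a web address. The sketch below assumes full URLs that include a scheme such as `http://`; the function name `node_levels` and the example path are hypothetical. It treats only two-letter alphabetic top-level domains as countries, so hosts under generic domains such as .com or .gov (and numeric IP addresses) get no country.

```python
from typing import Optional
from urllib.parse import urlsplit


def node_levels(url):
    """Split a full URL into nested node levels, from the most specific
    (a '#' paragraph anchor) to the broadest (a country code)."""
    parts = urlsplit(url)
    host = parts.hostname or ""  # lower-cased, port stripped
    directory = parts.path.rsplit("/", 1)[0] + "/"
    tld = host.rstrip(".").rsplit(".", 1)[-1]
    # Only two-letter alphabetic top-level domains are treated as countries.
    country: Optional[str] = tld if len(tld) == 2 and tld.isalpha() else None
    return {
        "fragment": parts.fragment or None,
        "document": parts.path or "/",
        "directory": directory,
        "host": host,
        "country": country,
    }


print(node_levels("http://coombs.anu.edu.au/papers/note.html#sec3"))
```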

Nodes of cyberspace are connected by a series of explicit (published on public web pages) and implicit (stored in individual users' web browsers as 'bookmarks') hypertext links. By definition, only public material is open to direct public inquiry. Therefore, for the sake of simplicity, privately stored bookmarks will be disregarded for the rest of this note. Links can be one-directional (e.g. A => B but not vice versa, which means that node B is oblivious or indifferent to the online resources offered by node A) or bi-directional (e.g. A => B and B => A). When many such links connect the informational contents of nodes in two or more areas, the linkages are either balanced (e.g. places A and B make a similar number of links pointing at information residing in B and A respectively) or unbalanced (e.g. place A makes more links to place B than vice versa).

Countries (as well as networks, institutions and individual web sites) which have more links to them than from them can be considered information exporters. Conversely, countries which monitor the world at large for publicly accessible documents and snippets of factual data more intensively than they themselves are monitored can be considered information importers.
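The exporter/importer distinction reduces to comparing, for each place, the links it receives from others with the links it makes to others. The following is a minimal sketch with hypothetical place names and counts; it uses a strict comparison, whereas a real study might prefer a tolerance band for "balanced" pairs.

```python
from collections import defaultdict


def classify_places(links):
    """Label each place an information 'exporter', 'importer' or 'balanced'.

    `links` maps (source, target) pairs to the number of hypertext links
    found on pages in `source` that point at resources in `target`.
    Self-links (source == target) are ignored.
    """
    inbound = defaultdict(int)   # links made *to* a place by others
    outbound = defaultdict(int)  # links a place makes to others
    for (src, dst), count in links.items():
        if src == dst:
            continue
        outbound[src] += count
        inbound[dst] += count
    labels = {}
    for place in sorted(set(inbound) | set(outbound)):
        if inbound[place] > outbound[place]:
            labels[place] = "exporter"
        elif inbound[place] < outbound[place]:
            labels[place] = "importer"
        else:
            labels[place] = "balanced"
    return labels


# Hypothetical counts: A links to B 120 times, B links to A only 40 times.
print(classify_places({("A", "B"): 120, ("B", "A"): 40}))  # {'A': 'importer', 'B': 'exporter'}
```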

Armed with this image, we can now proceed with an exploratory analysis of informational relationships between a sample of countries in Asia.

According to current statistics, in January 2000 there were over 72 million networked hosts in the world (Internet Software Consortium 2000) and over 9.5 million web sites (Zakon 2000). These are respectable figures and indicate that our preliminary investigation will be based on ample data.

4. A study of informational relationships

For the purposes of this pilot study five East Asian countries were looked at: China (Internet code .cn); Hong Kong (.hk); Japan (.jp); South Korea (.kr); and Taiwan (.tw). The investigation was carried out in a sequence of steps:

Step 1:

Determine the overall number of web links pointing to online resources resident in each of these five countries:

Step 2a:

Determine the overall number of web links pointing at China's online information from web sites resident in each of the five countries:

Step 2b:

Determine the overall number of web links pointing at Hong Kong's online information from web sites resident in each of the five countries:

Step 2c:

Determine the overall number of web links pointing at Japan's online information from web sites resident in each of the five countries:

Step 2d:

Determine the overall number of web links pointing at South Korea's online information from web sites resident in each of the five countries:

Step 2e:

Determine the overall number of web links pointing at Taiwan's online information from web sites resident in each of the five countries:

Step 3:

Compile numeric values for all 25 ordered pairs of linkages (5 x 5, including each country's links to itself) existing between the five countries under investigation.

Step 4:

Convert raw values into percentages of each source country's row total.
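Steps 3 and 4 can be sketched as follows, assuming the raw counts from Steps 1 and 2 are already at hand. The country list uses the Internet codes named above; the function names are hypothetical, and the small matrix in the usage line is invented purely to show the arithmetic.

```python
from itertools import product

COUNTRIES = ["cn", "hk", "jp", "kr", "tw"]


def all_pairs(codes):
    """Step 3: every ordered (source, target) pair, self-pairs included,
    giving len(codes) ** 2 linkages in total."""
    return list(product(codes, repeat=2))


def row_percentages(matrix):
    """Step 4: express each row of raw counts as percentages of that row's total."""
    result = []
    for row in matrix:
        total = sum(row)
        result.append([100.0 * value / total if total else 0.0 for value in row])
    return result


print(len(all_pairs(COUNTRIES)))          # 25
print(row_percentages([[1, 3], [2, 2]]))  # [[25.0, 75.0], [50.0, 50.0]]
```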

5. Calculations - General Picture

Steps 1-4 yielded a 5x5 matrix with the absolute numbers of web links (as well as their percentage values) which originate and terminate in the five East Asian countries (see Table 1).

TABLE 1
GENERAL VIEW: numbers of links (rows = links from; columns = links to)
Links from      China      HK         Japan      Sth Korea  Taiwan     TOTAL
China           742,043    19,246     4,586      983        19,321     786,179
HK              9,341      260,854    2,365      1,052      7,317      280,929
Japan           9,219      7,377      1,031,258  3,785      6,743      1,058,382
South Korea     7,412      6,323      7,472      1,131,148  5,444      1,157,799
Taiwan          9,571      10,773     6,964      786        1,063,693  1,091,787

GENERAL VIEW: % of each row total, rounded (rows = links from; columns = links to)
Links from      China      HK         Japan      Sth Korea  Taiwan     TOTAL
China           94%        2%         1%         0%         2%         100%
HK              3%         93%        1%         0%         3%         100%
Japan           1%         1%         97%        0%         1%         100%
South Korea     1%         1%         1%         98%        0%         100%
Taiwan          1%         1%         1%         0%         97%        100%

The data show that in all five investigated countries the overwhelming majority of web links were directed towards internal resources. The most inward-looking country in the studied sample was South Korea (98% of South Korean links terminating in the five studied countries were self-orientated links), while the least inward-looking was Hong Kong (93% of Hong Kong's links to the five studied countries were self-orientated).
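The self-orientation percentages can be recomputed from the raw counts in Table 1 as a check. This sketch simply divides each diagonal cell by its row total; the dictionary keeps the table's row order, so the n-th row's n-th entry is that place's count of links to itself.

```python
# Rows of Table 1: links from each place to (China, HK, Japan, Sth Korea, Taiwan).
TABLE_1 = {
    "China":       [742_043, 19_246, 4_586, 983, 19_321],
    "Hong Kong":   [9_341, 260_854, 2_365, 1_052, 7_317],
    "Japan":       [9_219, 7_377, 1_031_258, 3_785, 6_743],
    "South Korea": [7_412, 6_323, 7_472, 1_131_148, 5_444],
    "Taiwan":      [9_571, 10_773, 6_964, 786, 1_063_693],
}


def self_share(table):
    """Percentage of each row's links that point back at the same place."""
    return {name: 100.0 * row[i] / sum(row) for i, (name, row) in enumerate(table.items())}


for name, pct in sorted(self_share(TABLE_1).items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:12s} {pct:5.1f}%")
```

At one decimal place Japan and Taiwan are practically tied (both about 97.4%), which the rounded percentages in Table 1 also suggest.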

6. Calculations - Detailed Picture

A closer look (Table 2) at the residual numbers of links directed towards other countries in the region reveals additional patterns:

TABLE 2
DETAILED VIEW: numbers of links, self-links excluded (rows = links from; columns = links to)
Links from      China      HK         Japan      Sth Korea  Taiwan     TOTAL
China           -          19,246     4,586      983        19,321     44,136
HK              9,341      -          2,365      1,052      7,317      20,075
Japan           9,219      7,377      -          3,785      6,743      27,124
South Korea     7,412      6,323      7,472      -          5,444      26,651
Taiwan          9,571      10,773     6,964      786        -          28,094

DETAILED VIEW: % of each row total, rounded (rows = links from; columns = links to)
Links from      China      HK         Japan      Sth Korea  Taiwan     TOTAL
China           -          44%        10%        2%         44%        100%
HK              47%        -          12%        5%         36%        100%
Japan           34%        27%        -          14%        25%        100%
South Korea     28%        24%        28%        -          20%        100%
Taiwan          34%        38%        25%        3%         -          100%
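Applying the exporter/importer definition from Section 3 to the counts of Table 2 is a matter of comparing each column sum (links received from the other four places) with the corresponding row sum (links made to them). The sketch below does that arithmetic only; it is confined to the five-place sample and ignores links to and from the rest of the world, so its labels describe intra-regional balances, not global ones. The names `TABLE_2` and `link_balance` are hypothetical.

```python
PLACES = ["China", "Hong Kong", "Japan", "South Korea", "Taiwan"]

# Counts from Table 2 (rows = links from, columns = links to); None marks
# the excluded self-links on the diagonal.
TABLE_2 = [
    [None, 19_246, 4_586, 983, 19_321],
    [9_341, None, 2_365, 1_052, 7_317],
    [9_219, 7_377, None, 3_785, 6_743],
    [7_412, 6_323, 7_472, None, 5_444],
    [9_571, 10_773, 6_964, 786, None],
]


def link_balance(matrix, names):
    """Return {name: (inbound, outbound, inbound - outbound)}, skipping the diagonal."""
    n = len(names)
    balance = {}
    for i, name in enumerate(names):
        outbound = sum(matrix[i][j] for j in range(n) if j != i)
        inbound = sum(matrix[j][i] for j in range(n) if j != i)
        balance[name] = (inbound, outbound, inbound - outbound)
    return balance


for name, (inbound, outbound, net) in link_balance(TABLE_2, PLACES).items():
    role = "exporter" if net > 0 else "importer" if net < 0 else "balanced"
    print(f"{name:12s} in={inbound:>6,} out={outbound:>6,} net={net:>+7,} {role}")
```

Because every link counted as outbound for one place is inbound for another, the net values across the five places always sum to zero.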

Table 2 shows that at the time of this study (early June 2000)

7. Concluding notes

The above patterns are incomplete and rough because they take into account the interactions and informational dependencies between only five places. A clearer and fuller picture would emerge if a larger sample of Asian countries, if not all of them, were investigated by means of the technique described here.

A word of warning, though. The more ambitious a study, the more complicated and time-consuming it becomes. This pilot investigation looked at five (5) countries and therefore had to analyse all 25 relationships between them. A study of, say, 10 countries would necessitate an analysis of 100 relationships, whereas the full picture of patterns of networked information on the continent of Asia would involve no less than an analysis of 49x49, or 2401, relationships between countries ranging from Afghanistan and Armenia to Vietnam and Yemen (i.e. the Middle East: 'bh','ir','iq','il','jo','kw','lb','om','qa','sa','sy','tr','ae','ye'; Caucasus: 'am','az','ge'; Central Asia: 'kz','kg','mn','tj','tm','uz'; South Asia: 'af','bd','bt','in','mv','np','pk','lk'; South East Asia: 'bn','mm','kh','tp','id','la','my','ph','sg','th','vn'; and finally East Asia: 'cn','hk','jp','kp','kr','mo','tw'). Since the data collection for this pilot study (25 relationships) was completed in a single day, it can be extrapolated that an individual researcher would need about 96 days (2401 / 25) to collect Altavista data on the patterns of informational interaction amongst all 49 Asian countries. Therefore, more ambitious research into these matters implies a division of labour and a cheerful collaboration between scholars.
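The arithmetic behind this estimate can be made explicit. The sketch assumes, as the text does, that the pilot's pace of 25 country pairs per day carries over unchanged to larger studies; `study_workload` is a hypothetical helper name.

```python
def study_workload(n_countries, pairs_per_day=25):
    """Ordered country pairs to query (self-pairs included) and the working
    days needed at the pilot study's pace of `pairs_per_day`."""
    relationships = n_countries ** 2
    return relationships, relationships / pairs_per_day


for n in (5, 10, 49):
    pairs, days = study_workload(n)
    print(f"{n:2d} countries: {pairs:5d} relationships, about {days:.0f} days")
```

Because the number of relationships grows with the square of the number of countries, doubling the sample roughly quadruples the work.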

Since its creation in 1969 the Internet has always been known as a convenient source of information about established facts, documents, people and institutions. In that mode the Internet functions as a congenial tool for rapid provision of missing information about the already identified topics (Ciolek 2000).

However, as the above methodological note shows, Altavista's powerful and swift "link: -host:" facility can be easily adapted for researching information interdependencies between networked entities. This, in turn, suggests that the Internet can also be viewed as a tool for the creation of brand new information, information whose very existence researchers were not aware of beforehand. In that mode the Internet ceases to function as a massive reference aid and instead becomes a powerful and handy research tool, akin to an X-ray machine or a theodolite.

8. Acknowledgements

My thanks are due to Olaf Ciolek for his useful comments on the first draft of this paper.

9. References



Maintainer: Dr T.Matthew Ciolek (tmciolek@ciolek.com)

Copyright (c) 2000 by T.Matthew Ciolek. All rights reserved. This Web page may be freely linked to other Web pages. Contents may not be republished, altered or plagiarized.


URL http://www.ciolek.com/PAPERS/easian-info-flows.html
