Networked information flows in East Asia:
a pilot study on research uses
of the Altavista search engine
by
Dr T. Matthew Ciolek,
Research School of Pacific and Asian Studies,
Australian National University, Canberra ACT 0200, Australia
tmciolek@coombs.anu.edu.au
Document created: 10 Jun 2000. Last revised: 21 May 2001
0. Abstract
Electronic information residing on web servers is neither uniformly
distributed across the countries of the world, nor is it uniformly
linked to. This methodological note uncovers a series of clear-cut
patterns of geographic preference/avoidance amongst the hypertext
connections in web resources located in China, Hong Kong, Japan,
Korea and Taiwan. It also shows how the Altavista search engine can be
productively used to reveal hitherto hidden patterns in networked
information flows.
1. Introduction
The Altavista search engine (www.altavista.com) is one of the biggest
(Bharat and Broder 1998; Compaq Co. 2000), most powerful and most
widely known WWW search engines. In addition to being very fast and
fairly comprehensive, Altavista also offers a free, though relatively
little-known, online tool suitable for a systematic study of the
informational relationships between various networked entities. Such
entities can be as specific as individual web pages and subdirectories
on a web server, or they can be more general and amorphous, and
subsume whole web sites, computers, networks, and even countries.
The Altavista tool in question consists of two simultaneously issued
commands, both of which are directed to the search engine's query box.
These commands are, in their most generic form:
link:computer-address-A -host:computer-address-B
This expression means: within the realm of all web pages known at this moment
to Altavista's database, give me a number (and also display a detailed list) of all the
pages which contain a hypertext link to computer address A. At the same time,
eliminate from the report data about links from pages residing on
computer address B.
Altavista's "link:argument -host:argument" command can take a number of specific forms:
- for instance,
link:machineA.organizationA.networkcodeA.countrycodeA/subdirectory-or-file-A1
-host:machineB.organizationB.networkcodeB.countrycodeB/subdirectory-or-file-B1
(e.g. link:coombs.anu.edu.au/ASAA/ -host:www.gu.edu.au/school/ais/asaapubs/eastasia.html)
locates all references to web page A1, minus any references
originating from web page B1.
- or,
link:machineA.organizationA.networkcodeA.countrycodeA
-host:machineA.organizationA.networkcodeA.countrycodeA
(e.g. link:www.nla.gov.au -host:www.nla.gov.au)
locates all references to computer address A, minus references
originating from that very same address, i.e. minus any self-references.
- or,
link:.networkcodeA.countrycodeA
-host:.networkcodeB.countrycodeA
(e.g. link:.gov.au -host:.edu.au)
locates all references to the government's web sites (".gov") in
Australia, minus any references originating from university servers (".edu" web sites
in that country).
- or,
link:machineA.organizationA.networkcodeA.countrycodeA/subdirectory-or-file-A1
-host:.countrycodeB
(e.g. link:kaladarshan.arts.ohio-state.edu/exhib/kaney/pgs/kaneintr.html -host:.jp)
locates all references to a page on Japanese calligraphy ("/kaneintr.html") at
the Huntington Archive of Buddhist and Related Art, Ohio State
University, USA, minus any references to the page which originate from Japan (".jp").
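Every form above uses the same two-part string with different arguments, so the queries can be generated mechanically. The following is a minimal Python sketch. The function name link_host_query is hypothetical. The code only assembles query strings and does not contact Altavista or any other search engine.

```python
from typing import Optional


def link_host_query(link_target: str, excluded_host: Optional[str] = None) -> str:
    """Compose an Altavista-style "link:A -host:B" query string.

    link_target   -- page, directory, host, network or country suffix whose
                     inbound links are to be counted
    excluded_host -- optional address whose pages are to be left out
    """
    query = f"link:{link_target}"
    if excluded_host:
        query += f" -host:{excluded_host}"
    return query


# The four example forms given in the text.
examples = [
    link_host_query("coombs.anu.edu.au/ASAA/",
                    "www.gu.edu.au/school/ais/asaapubs/eastasia.html"),  # page minus page
    link_host_query("www.nla.gov.au", "www.nla.gov.au"),  # minus self-references
    link_host_query(".gov.au", ".edu.au"),                # network-level query
    link_host_query("kaladarshan.arts.ohio-state.edu/exhib/kaney/pgs/kaneintr.html",
                    ".jp"),                               # country-level exclusion
]
for q in examples:
    print(q)
```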
2. Practical uses of the "link: -host:" command
The most common practical application of Altavista's "link: -host:" command is
a query concerning details of web pages which carry a hypertext link to a particular
online resource. An answer to such a query can be important in two situations.
Firstly, it measures in quantitative terms the degree of "online presence" exercised by
the document, online resource or institution in question. It also enables
comparisons between two or more web resources.
For instance, three Altavista queries:
link:www.loc.gov -host:www.loc.gov
link:coombs.anu.edu.au -host:coombs.anu.edu.au
link:www.nla.gov.au -host:www.nla.gov.au
generate (in early June 2000) the following results:
www.loc.gov - about 53,921 pages found
coombs.anu.edu.au - about 31,431 pages found
www.nla.gov.au - about 16,276 pages found
which reveals that if the online visibility, or "presence" of the
US Library of Congress (www.loc.gov) is taken as the standard unit of
measurement, (i.e. one (1) "Loc"), then the RSPAS' seminal Asia-Pacific Studies
Web server, the Coombsweb (coombs.anu.edu.au) in June 2000 was worth about
0.58 Loc, while the National Library of Australia (www.nla.gov.au) was, at that
time, worth about 0.30 Loc.
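The "Loc" is a ratio of inbound-link counts, each taken with self-references excluded. The sketch below reproduces that arithmetic from the June 2000 figures quoted above. The names in_locs and REFERENCE are illustrative and are not part of any existing tool.

```python
from typing import Dict

# Inbound-link counts from "link:X -host:X" queries, early June 2000.
inbound_links: Dict[str, int] = {
    "www.loc.gov": 53_921,
    "coombs.anu.edu.au": 31_431,
    "www.nla.gov.au": 16_276,
}

REFERENCE = "www.loc.gov"  # 1 Loc = online presence of the Library of Congress


def in_locs(site: str, counts: Dict[str, int], reference: str = REFERENCE) -> float:
    """Express a site's inbound-link count as a multiple of the reference site's."""
    return counts[site] / counts[reference]


for site in inbound_links:
    print(f"{site}: {in_locs(site, inbound_links):.2f} Loc")
```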
Secondly, Altavista's command "link: -host:" can be used to
determine exactly who is making a link to a web page under our jurisdiction,
so that the person pointing at our resource can be notified about changes to
our site's computer address or to policies governing access to its contents.
However, in addition to these two everyday uses of Altavista's
commands, there is also a third one, which from a research point of
view is the most exciting.
3. Research uses of the "link: -host:" command
Cyberspace, defined here as the total body of web-based, publicly
accessible information, can be visualised as a constellation of
interlinked nodes.
These informational nodes can be as small and as precise as individually
addressed paragraphs of a document (i.e. "#"-type anchor addresses
within a particular web page), or the documents themselves, or bundles
of documents in the form of a thematic subdirectory or site. At the other
extreme, these nodes can be fairly broad and refer to a body of digital
information published on a particular machine, on a particular network,
or at addresses belonging to a particular country. In the remainder of
this methodological note I shall focus on the last case, the country-level
nodes. (For a handy list of the Internet two-character country codes see
RIPE Network Coordination Centre (1997).)
Nodes of cyberspace are connected by a series of explicit (published on public web
pages) and implicit (stored in people's individual web browsers as 'bookmarks')
hypertext links. By definition, only public material is open to
direct public inquiry. Therefore, for the sake of simplicity, privately stored
bookmarks will be disregarded for the rest of this note. Links can be
one-directional (e.g. A => B, but not vice versa, which means that node B is oblivious
or indifferent to the online resources offered by node A) or bi-directional (e.g.
A => B; B => A). When a number of such links are established to span the
informational contents of nodes in two or more areas, these linkages are
likely to be either balanced (e.g. a similar number of links established in
places A and B point at the information residing in places B and A
respectively) or unbalanced (e.g. place A makes more links to place B than
vice versa).
Countries (as well as networks, institutions, and individual web sites) which
have more links to them than from them, can be considered as
information exporters. Conversely, countries which monitor the world at
large for publicly accessible documents and snippets of factual data
more intensively than they themselves are monitored, can be considered as
information importers.
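The exporter/importer distinction comes down to comparing a place's inbound and outbound link counts, with self-links ignored. Below is a minimal sketch. It assumes the link counts are held in a nested dictionary. The place names and figures are hypothetical and are not data from this study. The function name classify is illustrative.

```python
from typing import Dict

# links[a][b] = number of pages in place a that link to place b.
# HYPOTHETICAL figures for three made-up places, for illustration only.
links: Dict[str, Dict[str, int]] = {
    "A": {"B": 120, "C": 30},
    "B": {"A": 40, "C": 10},
    "C": {"A": 30, "B": 10},
}


def classify(place: str, matrix: Dict[str, Dict[str, int]]) -> str:
    """Label a place as an information exporter, importer, or balanced."""
    # Links made by `place` to other places (self-links ignored).
    outbound = sum(n for dst, n in matrix[place].items() if dst != place)
    # Links made by other places to `place`.
    inbound = sum(row.get(place, 0) for src, row in matrix.items() if src != place)
    if inbound > outbound:
        return "exporter"
    if outbound > inbound:
        return "importer"
    return "balanced"


for p in links:
    print(p, classify(p, links))
```

This sketch treats only exact equality as "balanced", which is a simplification. The text's "similar number of links" would, in practice, call for a tolerance band, and its width is left open here.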
Armed with this image, we can now proceed with an exploratory analysis
of informational relationships between a
sample of countries in Asia.
According to current statistics, in January 2000 there were over 72 million
networked hosts in the world (Internet Software Consortium 2000) and over 9.5 million
web sites (Zakon 2000). These are respectable figures and indicate that our
preliminary investigation will be based on ample data.
4. A study of informational relationships
For the purposes of this pilot study five East Asian countries were
looked at: China (Internet code .cn); Hong Kong (.hk); Japan (.jp);
South Korea (.kr); and Taiwan (.tw). The investigation was carried out in a sequence of steps:
Step 1:
Determine the overall number of web links pointing to
online resources resident in each of these five countries:
- QUERY link:.cn RESULT about 1,581,438 pages found pointing to China web resources
- QUERY link:.hk RESULT about 674,511 pages found pointing to Hong Kong
- QUERY link:.jp RESULT about 2,153,516 pages found pointing to Japan
- QUERY link:.kr RESULT about 1,840,565 pages found pointing to South Korea
- QUERY link:.tw RESULT about 1,617,220 pages found pointing to Taiwan
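The queries needed for Steps 1 and 2 follow a fixed pattern. Step 1 uses one "link:" query per country. Step 2 uses one "link: -host:" query for every (target, source) pair, and this includes each country paired with itself. A sketch of that query plan follows. It assumes the five two-letter codes used in this study. The helper name query_plan is hypothetical.

```python
from itertools import product
from typing import List, Tuple

COUNTRIES = ["cn", "hk", "jp", "kr", "tw"]  # ccTLDs used in this pilot study


def query_plan(codes: List[str]) -> List[Tuple[int, str]]:
    """Return (step number, query string) pairs for Steps 1 and 2."""
    # Step 1: one query per target country.
    plan = [(1, f"link:.{target}") for target in codes]
    # Step 2: every target paired with every source, the target itself included.
    plan += [(2, f"link:.{target} -host:.{source}")
             for target, source in product(codes, repeat=2)]
    return plan


plan = query_plan(COUNTRIES)
print(len(plan))  # 5 + 25 = 30 queries
for step, query in plan[:7]:
    print(step, query)
```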
Step 2a:
Determine the overall number of web links pointing at China's
online information from web sites resident in each of the five countries:
- QUERY link:.cn -host:.cn RESULT about 839,395 pages found.
(CALCULATION 1,581,438 - 839,395 = 742,043) i.e. there are 742,043 China-based pages with links pointing to
China's web
resources.
- QUERY link:.cn -host:.hk RESULT about 1,572,097 pages found
(CALCULATION 1,581,438 - 1,572,097 = 9,341) i.e. there are 9,341 HK-based pages with links pointing to China's web
resources.
- QUERY link:.cn -host:.jp RESULT about 1,572,219 pages found
(CALCULATION 1,581,438 - 1,572,219 = 9,219) i.e. there are 9,219 Japan-based pages with links pointing to China's web
resources.
- QUERY link:.cn -host:.kr RESULT about 1,574,026 pages found
(CALCULATION 1,581,438 - 1,574,026 = 7,412) i.e. there are 7,412 South Korea-based pages with links
pointing to China's web
resources.
- QUERY link:.cn -host:.tw RESULT about 1,571,867 pages found
(CALCULATION 1,581,438 - 1,571,867 = 9,571) i.e. there are 9,571 Taiwan-based pages with links pointing to China's web
resources.
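Each Step 2 figure is a difference. It is the Step 1 total for the target country minus the count that remains once one source country is excluded with "-host:". The sketch below uses the Step 2a figures for China. The helper name pages_from is hypothetical. The subtraction assumes both queries were answered from the same state of the index. Altavista reported every count as "about" N pages, so the differences should be read as approximate.

```python
# Step 1 total for China ("link:.cn") and the Step 2a results
# ("link:.cn -host:.XX"), as reported in early June 2000.
TOTAL_CN = 1_581_438
remaining_after_excluding = {
    "cn": 839_395,
    "hk": 1_572_097,
    "jp": 1_572_219,
    "kr": 1_574_026,
    "tw": 1_571_867,
}


def pages_from(total: int, remaining: int) -> int:
    """Pages in the excluded source country that link to the target country."""
    return total - remaining


for source, remaining in remaining_after_excluding.items():
    print(f".{source} -> .cn: {pages_from(TOTAL_CN, remaining):,}")
```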
Step 2b:
Determine the overall number of web links pointing at Hong Kong's
online information from web sites resident in each of the five countries:
- QUERY link:.hk -host:.hk RESULT about 413,657 pages found
(CALCULATION 674,511 - 413,657 = 260,854) i.e. there are 260,854 HK-based pages with links pointing to Hong Kong's web
resources.
- QUERY link:.hk -host:.cn RESULT about 655,265 pages found
(CALCULATION 674,511 - 655,265 = 19,246) i.e. there are 19,246 China-based pages with links pointing to Hong Kong's web
resources.
- QUERY link:.hk -host:.jp RESULT about 667,134 pages found
(CALCULATION 674,511 - 667,134 = 7,377) i.e. there are 7,377 Japan-based pages with links pointing to Hong Kong's web
resources.
- QUERY link:.hk -host:.kr RESULT about 668,188 pages found
(CALCULATION 674,511 - 668,188 = 6,323) i.e. there are 6,323 South Korea-based pages with links pointing to Hong Kong's web
resources.
- QUERY link:.hk -host:.tw RESULT about 663,738 pages found
(CALCULATION 674,511 - 663,738 = 10,773) i.e. there are 10,773 Taiwan-based pages with links pointing to Hong Kong's web
resources.
Step 2c:
Determine the overall number of web links pointing at Japan's
online information from web sites resident in each of the five countries:
- QUERY link:.jp -host:.jp RESULT about 1,122,258 pages found
(CALCULATION 2,153,516 - 1,122,258 = 1,031,258) i.e. there are 1,031,258 Japan-based pages with links pointing to Japan's web
resources.
- QUERY link:.jp -host:.cn RESULT about 2,148,930 pages found
(CALCULATION 2,153,516 - 2,148,930 = 4,586) i.e. there are 4,586 China-based pages with links pointing to Japan's web
resources.
- QUERY link:.jp -host:.hk RESULT about 2,151,151 pages found
(CALCULATION 2,153,516 - 2,151,151 = 2,365) i.e. there are 2,365 HK-based pages with links pointing to Japan's web
resources.
- QUERY link:.jp -host:.kr RESULT about 2,146,044 pages found
(CALCULATION 2,153,516 - 2,146,044 = 7,472) i.e. there are 7,472 South Korea-based pages with links pointing to Japan's web
resources.
- QUERY link:.jp -host:.tw RESULT about 2,146,552 pages found
(CALCULATION 2,153,516 - 2,146,552 = 6,964) i.e. there are 6,964 Taiwan-based pages with links pointing to Japan's web
resources.
Step 2d:
Determine the overall number of web links pointing at South Korea's
online information from web sites resident in each of the five countries:
- QUERY link:.kr -host:.kr RESULT about 709,417 pages found
(CALCULATION 1,840,565 - 709,417 = 1,131,148) i.e. there are 1,131,148 South Korea-based pages with links pointing to South Korea's web
resources.
- QUERY link:.kr -host:.cn RESULT about 1,839,582 pages found
(CALCULATION 1,840,565 - 1,839,582 = 983) i.e. there are 983 China-based pages with links pointing to South Korea's web
resources.
- QUERY link:.kr -host:.hk RESULT about 1,839,513 pages found
(CALCULATION 1,840,565 - 1,839,513 = 1,052) i.e. there are 1,052 HK-based pages with links pointing to South Korea's web
resources.
- QUERY link:.kr -host:.jp RESULT about 1,836,780 pages found
(CALCULATION 1,840,565 - 1,836,780 = 3,785) i.e. there are 3,785 Japan-based pages with links pointing to South Korea's web
resources.
- QUERY link:.kr -host:.tw RESULT about 1,839,779 pages found
(CALCULATION 1,840,565 - 1,839,779 = 786) i.e. there are 786 Taiwan-based pages with links pointing to South Korea's web
resources.
Step 2e:
Determine the overall number of web links pointing at Taiwan's
online information from web sites resident in each of the five countries:
- QUERY link:.tw -host:.tw RESULT about 553,527 pages found
(CALCULATION 1,617,220 - 553,527 = 1,063,693) i.e. there are 1,063,693 Taiwan-based pages with links pointing to Taiwan's web
resources.
- QUERY link:.tw -host:.cn RESULT about 1,597,899 pages found
(CALCULATION 1,617,220 - 1,597,899 = 19,321) i.e. there are 19,321 China-based pages with links pointing to Taiwan's web
resources.
- QUERY link:.tw -host:.hk RESULT about 1,609,903 pages found
(CALCULATION 1,617,220 - 1,609,903 = 7,317) i.e. there are 7,317 HK-based pages with links pointing to Taiwan's web
resources.
- QUERY link:.tw -host:.jp RESULT about 1,610,477 pages found
(CALCULATION 1,617,220 - 1,610,477 = 6,743) i.e. there are 6,743 Japan-based pages with links pointing to Taiwan's web
resources.
- QUERY link:.tw -host:.kr RESULT about 1,611,776 pages found
(CALCULATION 1,617,220 - 1,611,776 = 5,444) i.e. there are 5,444 South Korea-based pages with links pointing to Taiwan's web
resources.
Step 3:
Compile numeric values for all 25 ordered pairs of linkages (including the
five self-links) existing between the five countries under investigation.
Step 4:
Convert raw values into percentages of the total number of links originating in each country.
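Steps 3 and 4 amount to two operations. First, the Step 2 differences are arranged into a matrix, with rows for the countries making the links and columns for the countries linked to. Second, each cell is divided by its row total. Below is a Python sketch that uses the figures derived above. The names row_totals and row_percentages are illustrative. The include_self switch produces a view that leaves out self-links.

```python
from typing import Dict

Matrix = Dict[str, Dict[str, int]]

# links[src][dst]: pages in country src that link to country dst,
# taken from the differences computed in Steps 2a-2e.
links: Matrix = {
    "cn": {"cn": 742_043, "hk": 19_246, "jp": 4_586, "kr": 983, "tw": 19_321},
    "hk": {"cn": 9_341, "hk": 260_854, "jp": 2_365, "kr": 1_052, "tw": 7_317},
    "jp": {"cn": 9_219, "hk": 7_377, "jp": 1_031_258, "kr": 3_785, "tw": 6_743},
    "kr": {"cn": 7_412, "hk": 6_323, "jp": 7_472, "kr": 1_131_148, "tw": 5_444},
    "tw": {"cn": 9_571, "hk": 10_773, "jp": 6_964, "kr": 786, "tw": 1_063_693},
}


def row_totals(matrix: Matrix, include_self: bool = True) -> Dict[str, int]:
    """Sum each row, optionally leaving out the self-link (diagonal) cell."""
    return {src: sum(n for dst, n in row.items() if include_self or dst != src)
            for src, row in matrix.items()}


def row_percentages(matrix: Matrix, include_self: bool = True) -> Dict[str, Dict[str, int]]:
    """Express each cell as a whole-number percentage of its row total."""
    totals = row_totals(matrix, include_self)
    return {src: {dst: round(100 * n / totals[src])
                  for dst, n in row.items() if include_self or dst != src}
            for src, row in matrix.items()}


print(row_totals(links))                                 # row totals, self-links included
print(row_percentages(links)["kr"])                      # South Korea row, self-links included
print(row_percentages(links, include_self=False)["cn"])  # China row, self-links omitted
```

Rounded cells need not sum to exactly 100. For example, the China row with self-links included rounds to 94 + 2 + 1 + 0 + 2 = 99.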
5. Calculations - General Picture
Steps 1-4 yielded a 5x5 matrix with absolute numbers of web links (as
well as with their percentage values) which originate and terminate at
five East Asian countries (see Table 1).
TABLE 1
GENERAL VIEW: numbers
Links from \ Links to |   China |      HK |     Japan | Sth Korea |    Taiwan |     TOTAL
China                 | 742,043 |  19,246 |     4,586 |       983 |    19,321 |   786,179
HK                    |   9,341 | 260,854 |     2,365 |     1,052 |     7,317 |   280,929
Japan                 |   9,219 |   7,377 | 1,031,258 |     3,785 |     6,743 | 1,058,382
South Korea           |   7,412 |   6,323 |     7,472 | 1,131,148 |     5,444 | 1,157,799
Taiwan                |   9,571 |  10,773 |     6,964 |       786 | 1,063,693 | 1,091,787
GENERAL VIEW: %
Links from \ Links to |   China |      HK |     Japan | Sth Korea |    Taiwan |     TOTAL
China                 |     94% |      2% |        1% |        0% |        2% |      100%
HK                    |      3% |     93% |        1% |        0% |        3% |      100%
Japan                 |      1% |      1% |       97% |        0% |        1% |      100%
South Korea           |      1% |      1% |        1% |       98% |        0% |      100%
Taiwan                |      1% |      1% |        1% |        0% |       97% |      100%
The data show that in all five investigated countries the
overwhelming majority of web links were directed towards internal
resources. The most inward-looking country in the studied sample was
South Korea (98% of South Korean links terminating in the five studied countries were
self-orientated links), while the least inward-looking was Hong
Kong (93% of Hong Kong's links to the five studied countries were self-orientated ones).
6. Calculations - Detailed Picture
A closer look (Table 2) at the residual numbers of links directed
towards other countries in the region reveals additional patterns:
TABLE 2
DETAILED VIEW: numbers
Links from \ Links to |   China |      HK |     Japan | Sth Korea |    Taiwan |     TOTAL
China                 |       - |  19,246 |     4,586 |       983 |    19,321 |    44,136
HK                    |   9,341 |       - |     2,365 |     1,052 |     7,317 |    20,075
Japan                 |   9,219 |   7,377 |         - |     3,785 |     6,743 |    27,124
South Korea           |   7,412 |   6,323 |     7,472 |         - |     5,444 |    26,651
Taiwan                |   9,571 |  10,773 |     6,964 |       786 |         - |    28,094
DETAILED VIEW: %
Links from \ Links to |   China |      HK |     Japan | Sth Korea |    Taiwan |     TOTAL
China                 |       - |     44% |       10% |        2% |       44% |      100%
HK                    |     47% |       - |       12% |        5% |       36% |      100%
Japan                 |     34% |     27% |         - |       14% |       25% |      100%
South Korea           |     28% |     24% |       28% |         - |       20% |      100%
Taiwan                |     34% |     38% |       25% |        3% |         - |      100%
Table 2 shows that at the time of this study (early June 2000):
- China's most important sources of online information were Hong Kong (44%) and Taiwan (44%). Its least important
source was South Korea (2%).
- Hong Kong's most important source of online information was China (47%). Its least important source was South Korea (5%).
- Japan's most important source of online information was China (34%). Its least important source was South Korea (14%).
- South Korea's most important sources of online information were China (28%) and Japan (28%). Its
least important source was Taiwan (20%).
- Taiwan's most important source of online information was Hong Kong (38%). Its least important source was South Korea (3%).
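This reading of Table 2 can be reproduced by taking the largest and smallest off-diagonal percentages in each row. The sketch below works from the rounded percentages. That is why China's tie (Hong Kong and Taiwan, both 44%) and South Korea's tie (China and Japan, both 28%) survive, even though the underlying raw counts differ slightly (e.g. 19,246 vs 19,321). The function name extremes is hypothetical.

```python
from typing import Dict, List, Tuple

# Table 2, detailed view in %: row = country making the links,
# column = country linked to (self-links omitted).
detailed: Dict[str, Dict[str, int]] = {
    "China": {"Hong Kong": 44, "Japan": 10, "South Korea": 2, "Taiwan": 44},
    "Hong Kong": {"China": 47, "Japan": 12, "South Korea": 5, "Taiwan": 36},
    "Japan": {"China": 34, "Hong Kong": 27, "South Korea": 14, "Taiwan": 25},
    "South Korea": {"China": 28, "Hong Kong": 24, "Japan": 28, "Taiwan": 20},
    "Taiwan": {"China": 34, "Hong Kong": 38, "Japan": 25, "South Korea": 3},
}


def extremes(row: Dict[str, int]) -> Tuple[List[str], List[str]]:
    """Return (most important, least important) sources, keeping ties."""
    top, bottom = max(row.values()), min(row.values())
    most = [country for country, pct in row.items() if pct == top]
    least = [country for country, pct in row.items() if pct == bottom]
    return most, least


for country, row in detailed.items():
    most, least = extremes(row)
    print(f"{country}: most {' & '.join(most)}, least {' & '.join(least)}")
```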
7. Concluding notes
The above patterns are incomplete and rough because they only take into
account interactions and informational dependencies between five places. A clearer and more complete picture
would emerge if a larger sample of Asian countries, if
not all of them, were investigated by means of the
technique described here.
A word of warning, though. The more ambitious a study, the more
complicated and time-consuming it becomes. This pilot investigation
has looked at details of five (5) countries and therefore had to
analyse all 25 relationships between variables. A study of, say, 10
countries will necessitate an analysis of 100 relationships, whereas
the full picture of patterns of networked information on the continent
of Asia will involve no less than an analysis of 49x49 or 2401
relationships between countries ranging from Afghanistan and Armenia
to Vietnam and Yemen (i.e. the Middle East:
'bh','ir','iq','il','jo','kw','lb','om','qa','sa','sy','tr','ae','ye';
Caucasus: 'am','az','ge'; Central Asia: 'kz','kg','mn','tj','tm','uz';
South Asia: 'af','bd','bt','in','mv','np','pk','lk'; South East Asia:
'bn','mm','kh','tp','id','la','my','ph','sg','th','vn', and finally
East Asia: 'cn','hk','jp','kp','kr','mo','tw').
Since the data collection for this pilot study was completed in a
single day, it can be extrapolated that an individual researcher would
be able to collect Altavista data for patterns of informational
interaction amongst all 49 Asian countries in about 96 days. Therefore,
more ambitious research into these matters necessarily implies a
division of labour, and a cheerful collaboration between scholars.
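The workload estimate rests on two assumptions. The first is that n countries give n x n ordered relationships, self-links included, as in Table 1. The second is that the pilot's rate of 25 relationships per day holds at a larger scale. The sketch below shows that arithmetic. The function names are hypothetical.

```python
PILOT_COUNTRIES = 5
PILOT_DAYS = 1  # the pilot's data collection took a single day


def relationships(n: int) -> int:
    """Ordered country pairs, each country's self-links included."""
    return n * n


def estimated_days(n: int) -> float:
    """Days of data collection, assuming the pilot's per-day rate holds."""
    per_day = relationships(PILOT_COUNTRIES) / PILOT_DAYS  # 25 relationships per day
    return relationships(n) / per_day


print(relationships(10))          # 100
print(relationships(49))          # 2401
print(round(estimated_days(49)))  # 96
```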
Since its creation in 1969 the Internet has always been known as a
convenient source of information about established facts, documents, people
and institutions. In that mode the Internet functions as a congenial tool for
rapid provision of missing information about already identified
topics (Ciolek 2000).
However, as the above methodological note shows, Altavista's powerful
and swift "link: -host:" facility can be easily adapted for
researching the topic of information interdependencies between
networked entities. This, in turn, suggests that the Internet can also
be viewed as a tool for the creation of brand new information, information
whose very existence researchers were not aware of beforehand. In that
mode the Internet ceases to function as a massive reference aid, and - instead -
becomes a powerful and handy research tool, akin to an X-ray machine
or a theodolite.
8. Acknowledgements
My thanks are due to Olaf Ciolek for his useful comments on the first draft of this paper.
9. References
Maintainer: Dr T.Matthew Ciolek (tmciolek@ciolek.com)
Copyright (c) 2000 by T.Matthew Ciolek. All rights reserved. This Web page may be freely linked
to other Web pages. Contents may not be republished, altered or plagiarized.
URL http://www.ciolek.com/PAPERS/easian-info-flows.html