[This document is a part of the Asia Web Watch: a Register of Statistical Data (est. 1 Oct 1997)]

Exploring the Digital Annapurna:
On Monitoring and Mapping of Asian Cyberspace

Dr T. Matthew Ciolek,
Research School of Pacific and Asian Studies,
Australian National University, Canberra ACT 0200, Australia
tmciolek@coombs.anu.edu.au
http://www.ciolek.com/PEOPLE/ciolek-tm.html

To be presented at the
International Convention of Asia Scholars (ICAS),
Noordwijkerhout, Leiden, The Netherlands,
25-28 June 1998

Document created: 15 Jun 1998. Last revised: 16 Jul 2009

0. Abstract

This paper describes uses of the Altavista WWW search engine for gathering statistical data about online resources related to Asia. It also offers an analysis of the size, geographical differences and growth rates of Asian cyberspace between February 1997 and June 1998.

1. Introduction

This is the fourth paper in a series dedicated to the description and analysis of the Internet and Asian Cyberspace. At present very few people collect and publish statistics about the Internet (Network Wizards 1998, Zakon 1998), and practically no one monitors networked developments pertaining specifically to Asia. Therefore, this series of papers, despite numerous shortcomings, provides a much-needed glimpse of this vast and uncharted territory. The three previous papers looked at the following topics:

The overall aim of this series is three-fold: (i) to gather, process and publish available statistical data about the Internet and Asia-related resources; (ii) to identify, test and refine suitable methodologies for such research; and (iii) to discover possible patterns in the collected materials.

There are two reasons for which these investigations should be conducted.

First, any analysis of contemporary Asia needs to take into account data on Asia's informational infrastructure. Simply put, we cannot reflect adequately on political and social change without reference to the ubiquitous electronic media, telecommunications and Net-based interactions.

Second, the Internet, as a global system of online information resources, continuously cannibalises and overwrites its own files and records. It not only updates and modifies but also obliterates information about itself. Its palimpsest nature means that in order to understand the overall context of our electronic activities we have to either carefully archive and preserve all major networked publications (Cunningham 1997, Kahle & Bruce 1996) or, more directly, develop for it a set of reliable measures and statistical summaries.

This short study will continue the exploration of the second option. It will focus on two of the aforementioned topics: (i) uses of search engines, and (ii) analysis of the size and characteristics of Asian cyberspace.

2. An Exploratory Mapping of Asian Cyberspace

The term 'Asian Cyberspace' is used here to denote the global body of Asia-related information available in WWW, as well as FTP and Gopher formats. For example, any publicly accessible document mounted online anywhere in the world which refers to places, personalities, events or developments anywhere in Asian countries is, by definition, a part of Asian cyberspace (Ciolek 1997a). The term 'Asia' is cast here very widely and applies to countries and territories situated between the Arctic, Pacific and Indian Oceans, the Red Sea, the Mediterranean, the Black Sea, the Caucasus, the Caspian Sea and the Ural Mountains.

The size, organization and geographical distribution, as well as other characteristics of the informational realm that is Asian cyberspace can be investigated by means of systematic queries posed to a suitably comprehensive and adequately speedy WWW search engine.

Admittedly, the use of WWW search engines has its fundamental problems (Campbell 1997, Koch 1998). These electronic tools cover only a fraction of the entire WWW (Lawrence & Giles 1998), and they tend to favour English-language documents. Also, they do not publish data on the criteria used for inclusion of web addresses into their logbooks, and their internal structure is subject to undocumented modifications. Moreover, their contents are constantly updated, expanded and contracted without any apparent schedule. In other words, search engines constitute volatile, unpredictable and frustrating systems to work with. However, they are the quickest and handiest tools for uncovering details of the world's electronic resources on any conceivable topic.

Of the tens of hundreds of such databases (Ciolek 1998c) the largest and fastest one is Altavista (Digital Corporation 1998a, 1998b). For that reason it was chosen as the primary tool for my investigations.

Launched in December 1995, Altavista is a free, continuously updated full-text database of public access hypertext documents. In June 1998 it contained details of 140 million online documents, that is, of approximately 30-50% of the world's cyberspace (Bharat & Broder 1998). To maintain its status as the leading search engine "AltaVista continuously crawls and indexes the Web - indexing up to 10 million pages daily" (Digital 1998b). The frequency with which Web pages are accessed and their contents soaked up and digested by the engine is quite impressive. An analysis performed in March 1998 suggests that about 37% of Altavista's contents were no more than three months old, and 50% no more than six months old (Ciolek 1998a, Table 007).

Data presented in the remainder of this paper have been collected via the following procedure:

This procedure was adopted for the sake of simplicity and speed with which data could be collected as a part of a one-person operation. Certainly, the method can be improved upon. The aim of this series of studies is to provide a point of departure towards more ambitious and more reliable statistical inquiries.

3. Asian Cyberspace Statistics: The Initial Findings

Results of our five repeated measurements are listed in Table 1.
Table 1
The volume of WWW information about Asian countries, 
as recorded by Altavista database since Feb 97
-----------------------------------------------------------------------------
Country              Feb 97      Sep 97      Dec 97      Mar 98       Jun 98
-----------------------------------------------------------------------------
MIDDLE EAST        www pages   www pages   www pages   www pages    www pages
Afghanistan          30,000      39,000      10,000      53,000       70,000
Bahrain              30,000      36,000      10,000      55,000       64,000
Iran                 60,000      85,000      21,000     127,000      167,000
Iraq                 20,000      37,000      11,000     105,000      137,000
Israel              500,000     584,000     113,000     578,000      671,000
Jordan               20,000      23,000      70,000     415,000      460,000 
Kurdistan             5,000       3,000       1,000       7,000        8,000 
Kuwait               30,000      38,000      11,000      94,000      105,000 
Lebanon              20,000      90,000      24,000     148,000      168,000
Oman                 20,000      27,000       6,000      55,000       60,000 
Palestine            20,000      41,000      10,000      61,000       70,000
Qatar                10,000      19,000       5,000      46,000       53,000 
Saudi Arabia         30,000      30,000      82,000      98,000      127,000
Syria                20,000      35,000       8,000      68,000       88,000 
Turkey              104,000     224,000      53,000     275,000      361,000 
United Arab Emirates 10,000      15,000      41,000      51,000       70,000 
Yemen                10,000      14,000       4,000      37,000       50,000
TOTAL               939,000   1,340,000     480,000   2,273,000    2,729,000 
-----------------------------------------------------------------------------
CAUCASUS
Armenia              30,000      40,000      10,000      46,000       60,000
Azerbaijan           10,000      24,000       6,000      33,000       43,000 
Chechnya              6,000       9,000       2,000      12,000       13,000 
Georgia               4,000       5,000       2,000       3,000        2,000
TOTAL                50,000      78,000      20,000      94,000      118,000 
-----------------------------------------------------------------------------
CENTRAL ASIA
Kazakhstan           20,000      17,000       6,000      41,000       54,000 
Kyrgyzstan            7,000       8,000       2,000      20,000       29,000
Tajikistan            7,000       8,000       2,000      20,000       26,000
Turkmenistan          7,000       8,000       2,000      25,000       35,000
Uzbekistan           10,000      13,000       4,000      35,000       48,000
TOTAL                51,000      54,000      16,000     141,000      192,000 
-----------------------------------------------------------------------------
SOUTH ASIA
Bangladesh           50,000      61,000      17,000      74,000       96,000
Bhutan               10,000      17,000       5,000      27,000       36,000 
India               404,000     628,000     175,000     636,000      776,000
Kashmir               8,000      10,000       4,000      19,000       24,000
Ladakh                  n/a         n/a         n/a       3,000        4,000
Maldives             10,000      12,000       3,000      24,000       34,000 
Nepal                60,000      62,000      15,000      76,000       99,000
Pakistan             60,000      83,000      31,000     150,000      196,000 
Sikkim                  n/a         n/a         n/a       4,000        5,000
Sri Lanka            30,000      27,000      80,000      95,000      120,000
TOTAL               632,000     900,000     330,000   1,108,000    1,390,000 
-----------------------------------------------------------------------------
SOUTH EAST ASIA
Brunei               20,000      30,000       8,000      43,000       56,000
Burma                30,000      43,000      12,000      47,000       58,000 
Cambodia             30,000      39,100      13,000      62,000       80,000 
East Timor            4,000       4,000      14,000      16,000       21,000 
Indonesia           202,000     240,000      57,000     267,000      328,000 
Laos                 30,000      25,200       7,000      40,000       53,000 
Malaysia            202,000     265,000      63,000     270,000      335,000 
Philippines         100,000     152,000      41,000     215,000      258,000
Singapore           403,620     591,000     158,000     437,000      536,000 
Thailand            202,000     270,000      63,000     272,000      328,000
Vietnam             104,000     202,000      46,000     238,000      293,000
TOTAL             1,327,000   1,861,000     482,000   1,907,000    2,346,000
-----------------------------------------------------------------------------
EAST ASIA                                                        
China               710,000     970,000     231,000     845,000    1,110,000 
E.Turkistan             n/a         n/a         n/a         981          425
Hong Kong           202,000     175,000     424,000     464,000      559,000 
Japan             1,003,000   2,252,000   1,422,000   1,280,000    1,557,000 
Korea (North)        10,000      10,000      33,000      39,000       50,000 
Korea (South)        20,000      27,000      74,000      90,000      116,000 
Macau                20,000      21,000       5,000      29,000       45,000
Mongolia             10,000      22,000       6,000      42,000       57,000 
Siberia              10,000      19,000       4,000      32,000       39,000
Taiwan              202,000     294,000      69,000     332,000      406,000 
Tibet                39,000      44,000      10,000      49,000       62,000 
TOTAL             2,226,000   3,834,000   2,278,000   3,203,000    4,001,000 
-----------------------------------------------------------------------------
ASIA TOTAL        5,225,000   8,067,000   3,606,000   8,726,000   10,776,000 
-----------------------------------------------------------------------------

Source: Ciolek 1998a, Table 003

These data lead to a number of conclusions. Firstly, we can see that the volume of online information about the Asian countries is already large.

To illustrate: in June 1998 it amounted to about 10.8 million Web pages. Since an average Web document is about 15 KB long (Ciolek 1998a, Tables 008 & 009), we can calculate that the total volume of Asia-related information in June 1998 was about 162 GB. Had this information been printed, it would have generated over 40.5 million A4 pages. If these pages were stacked in a single column of paper, their cumulative height (assuming 5 sheets per mm) would be approximately 8.1 km. In other words, the current volume of online information pertaining to Asia is as tall as the Himalayan peak Annapurna (The Times Atlas of the World 1994).
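
The arithmetic behind this comparison can be retraced in a few lines. The sketch below is only an illustration. The figure of 4 KB of text per printed A4 page is not stated in the paper; it is inferred here from the paper's own numbers (162 GB spread over 40.5 million pages). Decimal units, single-sided printing and all variable names are likewise assumptions.

```python
# Retrace the estimate of the size of Asian cyberspace in June 1998.
web_pages = 10_800_000     # about 10.8 million pages (Table 1, rounded)
kb_per_web_page = 15       # average document size (Ciolek 1998a)
kb_per_a4_page = 4         # ASSUMPTION: inferred from 162 GB / 40.5 million pages
sheets_per_mm = 5          # stacking density given in the text

total_kb = web_pages * kb_per_web_page
total_gb = total_kb / 1_000_000                    # decimal units assumed
a4_pages = total_kb / kb_per_a4_page               # one printed page per sheet
stack_km = a4_pages / sheets_per_mm / 1_000_000    # millimetres to kilometres

print(f"{total_gb:.0f} GB, {a4_pages / 1e6:.1f} million A4 pages, {stack_km:.1f} km")
```

With a different assumed page density the height of the paper column changes proportionally, so the comparison with Annapurna should be read as an order-of-magnitude illustration.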

Moreover, this mountain of electronic information continues to grow. This point will be looked at in some detail later in the paper.

Another two observations which stem from Table 1 pertain to the method of the data collection.

The third data column in Table 1 (Dec 97) shows that in December 1997 there was a large drop in the number of Web pages dealing with most of the studied Asian countries. This decline seems to have been caused by the December 1997 reorganisation (Bharat & Broder 1998) of Altavista's database operations. This implies that data collected on the Internet should always be analysed in the context of a series of related observations.
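
One simple way to spot such an artefact in a series of measurements is to flag any date whose value falls far below both of its neighbours. The sketch below applies this idea to the ASIA TOTAL row of Table 1. The function name and the 0.5 cut-off are illustrative assumptions, not part of the procedure used in this study.

```python
# ASIA TOTAL row of Table 1 (number of web pages per measurement date).
asia_total = {
    "Feb 97": 5_225_000,
    "Sep 97": 8_067_000,
    "Dec 97": 3_606_000,
    "Mar 98": 8_726_000,
    "Jun 98": 10_776_000,
}

def suspicious_dips(series, threshold=0.5):
    """Return the dates whose value is below `threshold` times BOTH neighbours.

    The 0.5 cut-off is an arbitrary illustrative choice.
    """
    dates = list(series)
    values = list(series.values())
    flagged = []
    for i in range(1, len(values) - 1):
        below_previous = values[i] < threshold * values[i - 1]
        below_next = values[i] < threshold * values[i + 1]
        if below_previous and below_next:
            flagged.append(dates[i])
    return flagged

print(suspicious_dips(asia_total))
```

A check of this kind only works when observations exist on both sides of the suspect date, which is one more argument for collecting data as a series.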

Also, Table 1 raises another issue. The numbers of web pages about country names which have homonyms suggest that these cases need to be treated with caution.

For instance, the number of web pages dealing with the Kingdom of Jordan appears to be overestimated because the keyword is also a homonym for a number of other geographic locations in Australia and North America as well as for an Anglo-Saxon surname. Conversely, the data for Georgia are underestimated. In our study they are derived from a search combining the keywords 'Georgia' and 'Republic'. This is because the name of the Caucasian country is also the name of a state in the US as well as a popular feminine name. All these homonyms tend to greatly inflate the overall figures for 'Georgia' and are very difficult to separate from each other. Similarly, while the use of capitalised keywords such as 'China' and 'Turkey' focuses Altavista searches on materials dealing primarily with the geographical entities, there is a possibility that they occasionally point to pages dealing with 'China and Porcelain' as well as 'Turkey and Ham Dishes'. The best technique for elimination of these unwelcome side-effects is not known at the moment. Perhaps some calculations on the strength of the correlations between frequencies of occurrence of terms 'Jordan/Jordanian', 'Turkey/Turkish' and so forth could be used here.

3. Asian Cyberspace Statistics: The Detailed Findings

The collected data also throw light on patterns displayed at the regional level (see Table 2).
Table 2
The volume of WWW information about Asian regions, 
as recorded by Altavista database since Feb 97
--------------------------------------------------------------------
Region               Feb 97    Sep 97    Dec 97    Mar 98    Jun 98
--------------------------------------------------------------------
MIDDLE EAST             18%       17%       13%       26%       25%
CAUCASUS                 1%        1%      0.5%        1%        1%
CENTRAL ASIA             1%        1%      0.5%        2%        2%
SOUTH ASIA              12%       11%        9%       13%       13%
SOUTH EAST ASIA         25%       23%       13%       22%       22%
EAST ASIA               43%       47%       63%       37%       37%
--------------------------------------------------------------------
ASIA TOTAL             100%      100%       99%      101%      100%
--------------------------------------------------------------------

Source: Table 1

Table 2 points to a consistency with which information about countries of Asia is produced world-wide. While the overall volume of web pages dealing with countries of the six geographical regions listed in Table 2 steadily increases, the regions' shares of that volume remain fairly stable.
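
The percentages in Table 2 follow directly from the regional totals in Table 1. As a minimal sketch, the calculation for the first and the last measurement can be reproduced as shown below. The dictionary layout and the function name are illustrative choices only.

```python
# Regional totals from Table 1, in web pages, for Feb 97 and Jun 98.
totals = {
    "MIDDLE EAST":     {"Feb 97":   939_000, "Jun 98": 2_729_000},
    "CAUCASUS":        {"Feb 97":    50_000, "Jun 98":   118_000},
    "CENTRAL ASIA":    {"Feb 97":    51_000, "Jun 98":   192_000},
    "SOUTH ASIA":      {"Feb 97":   632_000, "Jun 98": 1_390_000},
    "SOUTH EAST ASIA": {"Feb 97": 1_327_000, "Jun 98": 2_346_000},
    "EAST ASIA":       {"Feb 97": 2_226_000, "Jun 98": 4_001_000},
}
asia_total = {"Feb 97": 5_225_000, "Jun 98": 10_776_000}

def share(region, date):
    """Return the region's share of the Asia total on a given date, in percent."""
    return 100 * totals[region][date] / asia_total[date]

for region in totals:
    feb97 = share(region, "Feb 97")
    jun98 = share(region, "Jun 98")
    print(f"{region:<16} {feb97:5.1f}% {jun98:5.1f}%")
```

Because Table 2 reports rounded values, the columns there do not always add up to exactly 100%.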

The growth rates in the Web-based information, however, are not constant across the regions (Table 3). The lower monthly percentage rates are characteristic of the online information on South East Asia and East Asia. Higher and more energetic rates are displayed by the pages dealing with countries of the Middle East and Central Asia.
Table 3
The volume and growth of WWW information about Asian regions, 
as recorded by Altavista database since Feb 97
-----------------------------------------------------------------------
Region                  Feb 97        Jun 98            Growth
                      www pages     www pages   Feb97/Jun98   %/month
-----------------------------------------------------------------------
MIDDLE EAST             939,000     2,729,000       191%        11.9
CAUCASUS                 50,000       118,000       136%         8.5
CENTRAL ASIA             51,000       192,000       276%        17.2
SOUTH ASIA              632,000     1,390,000       120%         7.5
SOUTH EAST ASIA       1,327,000     2,346,000        77%         4.8
EAST ASIA             2,226,000     4,001,000        80%         5.0
-----------------------------------------------------------------------
ASIA TOTAL            5,225,000    10,776,000       106%         6.6
-----------------------------------------------------------------------

Source: Table 1
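
The two growth columns of Table 3 can be recomputed from the page counts. The figures are consistent with a simple average, that is, the total percentage growth divided by the 16 months between February 1997 and June 1998. The sketch below follows that reading. The function and variable names are illustrative, and the last digit may differ slightly from the table because of rounding.

```python
MONTHS = 16  # Feb 97 to Jun 98

def growth(first, last, months=MONTHS):
    """Return (total growth in %, simple average growth in % per month)."""
    total_pct = 100 * (last - first) / first
    return total_pct, total_pct / months

# Page counts for Feb 97 and Jun 98, taken from Table 3.
regions = {
    "MIDDLE EAST":     (939_000, 2_729_000),
    "CAUCASUS":        (50_000, 118_000),
    "CENTRAL ASIA":    (51_000, 192_000),
    "SOUTH ASIA":      (632_000, 1_390_000),
    "SOUTH EAST ASIA": (1_327_000, 2_346_000),
    "EAST ASIA":       (2_226_000, 4_001_000),
    "ASIA TOTAL":      (5_225_000, 10_776_000),
}

for name, (feb97, jun98) in regions.items():
    total_pct, per_month = growth(feb97, jun98)
    print(f"{name:<16} {total_pct:5.0f}% {per_month:5.1f}")
```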

A question can also be asked (see Tables 4 and 5) about the largest and the smallest amounts of online information pertaining to Asian countries.
Table 4  
Ten countries discussed by the largest number of web 
pages as recorded by Altavista database since Feb 97
---------------------------------------------------
Country              Feb 97          Jun 98
---------------------------------------------------
Japan              1,003,000      1,557,000 
China               710,000       1,110,000 
India               404,000         776,000
Israel              500,000         671,000
Hong Kong           202,000         559,000 
Singapore           403,620         536,000 
Jordan               20,000         460,000 
Taiwan              202,000         406,000 
Turkey              104,000         361,000 
Indonesia           202,000         328,000
---------------------------------------------------
ASIA TOTAL        5,225,000      10,776,000
---------------------------------------------------

Source: Table 1

Clearly, Japan is the Internet's draw card. In June 1998 it attracted over 1.5 million pages, or 14.4% of Asia-related online information. The runner-up is China, with 1.1 million pages (10.3%). Other countries play a role too, with some of them, like Jordan, not necessarily meriting their high placement (see discussion above).

Table 5 provides data on the opposite phenomenon, namely, relative online obscurity.
Table 5  
Ten countries discussed by the smallest number of web 
pages as recorded by Altavista database since Feb 97
---------------------------------------------------
Country              Feb 97          Jun 98
---------------------------------------------------
E.Turkistan             981*            425
Ladakh                3,000*          4,000
Sikkim                4,000*          5,000
Kurdistan             5,000           8,000 
Chechnya              6,000          13,000 
East Timor            4,000          21,000 
Kashmir               8,000          24,000
Tajikistan            7,000          26,000
Maldives             10,000          34,000
Turkmenistan          7,000          35,000
---------------------------------------------------
ASIA TOTAL        5,225,000      10,776,000
---------------------------------------------------

Source: Table 1
* Data from Mar 1998

Table 5 indicates that subsidiary or geographically isolated areas are less often discussed on the Web. Also, territories with 'tricky' or inconsistent spellings (e.g. Ladakh, Chechnya, Tajikistan) may have been underrepresented in our sample because their names are not standardised.

Information is also available (see Table 6) about the growth rates for the cyberspace dealing with individual Asian countries.
Table 6
The growth in the volume of WWW information about Asian 
countries, as recorded by Altavista database since Feb 97
--------------------------------------------------------
Country            % growth/month*
--------------------------------------------------------
Singapore                2.0
Israel                   2.1
Japan                    3.4
China                    3.5
Tibet                    3.7
Kurdistan                3.7
Indonesia                3.8
Thailand                 3.8
Malaysia                 4.0
Nepal                    4.0
Laos                     4.7
Bangladesh               5.7
India                    5.7
Burma                    5.8
Armenia                  6.2**
Taiwan                   6.2**
Bahrain                  7.0
Chechnya                 7.3
Macau                    7.8
Afghanistan              8.3
Sikkim                   8.3***
Philippines              9.8
Cambodia                10.4
Kazakhstan              10.6
Hong Kong               11.0
Ladakh                  11.0***
Iran                    11.1
Brunei                  11.2
Vietnam                 11.3
Kashmir                 12.5
Oman                    12.5
Pakistan                14.1
Maldives                15.0
Turkey                  15.4
Kuwait                  15.6
Palestine               15.6
Bhutan                  16.2
Tajikistan              16.9
Siberia                 18.1
Sri Lanka               18.7
Kyrgyzstan              19.6
Saudi Arabia            20.1
Azerbaijan              20.6
Syria                   21.2
Uzbekistan              23.7
Korea (North)           25.0
Turkmenistan            25.0
Yemen                   25.0
East Timor              26.5
Qatar                   26.8
Mongolia                29.3
Korea (South)           30.0
Iraq                    36.5
United Arab Emirates    37.5
Lebanon                 46.2
Jordan                    ??
Georgia                   ??
E.Turkistan            -19.0
--------------------------------------------------------
ASIA TOTAL               6.6*
--------------------------------------------------------

Source: Table 1
* averaged over 16 months, Feb 97 - Jun 98
** a growth rate of 8.3% per month means that the volume doubles over a 12-month period
*** average for 3 months, Mar - Jun 98
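
The footnote on doubling is consistent with rates calculated as simple averages: 8.3% per month over 12 months adds up to 100%. For comparison, the sketch below also computes the constant month-on-month (compound) rate that yields the same doubling. The compound rate is not used anywhere in this paper and is shown only to make the distinction explicit. The function names are illustrative.

```python
def simple_monthly_rate(first, last, months):
    """Total percentage growth divided evenly over the months."""
    return 100 * (last - first) / first / months

def compound_monthly_rate(first, last, months):
    """Constant month-on-month percentage rate giving the same overall growth.

    Not used in the paper; shown only for comparison.
    """
    return 100 * ((last / first) ** (1 / months) - 1)

# A doubling of the volume over 12 months.
print(f"simple:   {simple_monthly_rate(1, 2, 12):.1f}% per month")
print(f"compound: {compound_monthly_rate(1, 2, 12):.1f}% per month")

# Armenia in Table 1: 30,000 pages (Feb 97) to 60,000 pages (Jun 98), 16 months.
print(f"Armenia:  {simple_monthly_rate(30_000, 60_000, 16):.2f}% per month")
```

The simple average is easier to compute by hand, but it overstates the month-on-month pace for fast-growing countries, which matters when rates in Table 6 are compared with growth figures published elsewhere.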

Table 6 shows that the least dynamic sets of information are those about Singapore, Israel, Japan and China. Meanwhile, rapid expansion is shown by documents on Saudi Arabia, Azerbaijan, Syria, Uzbekistan, Korea (North), Turkmenistan, Yemen, East Timor, Qatar, Mongolia, Korea (South), Iraq, United Arab Emirates, and Lebanon.

Calculations (not presented here) show that there is no direct relationship whatsoever between the simple volume of existing information and the rate at which such information is placed online. The reasons behind these differential production rates appear to be more complex and possibly involve a combination of factors. These could include, in addition to the volume of existing information, the length of experience with and intensity of use of the Internet by residents of a given country, the country's role in global affairs, and its current media 'sexiness' and newsworthiness.

4. Conclusions

The main implications of this study can be summarised as follows:

It is obvious that dependable data on the Internet and Asia-related information resources cannot be satisfactorily collected and interpreted by one or two people, regardless of how hard and fast they might work. These studies should be carried out as a long-term, systematic, international and collaborative effort. Just as with demography, linguistics or economics, these responsibilities need to be shared by research teams whose operations are sponsored by major educational, government and business bodies at both national and international levels.

5. Acknowledgments

I am grateful to Monika Ciolek for her valuable advice on the earlier version of this paper.

6. About the Author

Dr T. Matthew Ciolek, a social scientist, heads the Internet Publications Bureau, Research School of Pacific and Asian Studies, The Australian National University, Canberra, Australia. Since December 1991 he has been responsible for making the RSPAS' electronic research materials available to the Internet community via FTP-, WAIS-, Gopher-, Web- and email-based technologies, and is one of the world's pioneers in electronic communication regarding the Asia-Pacific region. His work and contact details can be found online at http://www.ciolek.com/PEOPLE/ciolek-tm.html

7. References

[The great volatility of online information means that some of the URLs listed below may change by the time this article is printed. The date in round brackets indicates the version of the document in question. For current pointers please consult the online copy of this paper at http://www.ciolek.com/PAPERS/leiden-98.html]

Maintainer: Dr T. Matthew Ciolek (tmciolek@ciolek.com)

Copyright (c) 1998 by T. Matthew Ciolek. All rights reserved. This Web page may be freely linked to other Web pages. Contents may not be republished, altered or plagiarized.

URL http://www.ciolek.com/PAPERS/leiden-98.html
