Asia Web Watch
Methodology
Editor:
Dr T.Matthew Ciolek
tmciolek@ciolek.com
Est.: 1 Oct 1997. Last updated: 1 Mar 2002
Sources of Data and Methodological Issues
This site makes use of data derived from a number of sources.
Internet Statistics
Statistical information on the growth of the Internet as a whole, as
well as the growth of one of its major components, the WWW. These
interlocking sets of figures were
originally collected and published by Network Wizards (1997) and Zakon
(1997) as well as by Gray (1996).
Altavista Statistics
Statistics obtained from a series of systematic English keyword searches directed at the Altavista
database (Digital Corporation, 1997). Altavista is the world's second
largest database of Web documents. Assuming that an average
Web server currently publishes approximately 49.5 documents, one can estimate
from Table 1 that the entire universe of Web-based information
consists of some 59.4 million online documents or pages. Armed with
this information we can see that, while the Excite system appears to be
more complete (84% coverage of the world's web resources), it appears
to 'know' about fewer documents on Asia than does the smaller (52%
coverage) Altavista system.
Since this study is focused on Asia-related online documents, the Altavista
database has been selected as the chief source of intelligence.
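The arithmetic behind these estimates can be sketched as follows. This is a minimal illustration only: the server count of 1.2 million is not quoted from Table 1 but back-calculated from the two figures given above (59.4 million documents at 49.5 documents per server), and the function names are hypothetical.

```python
# Average number of documents per web server, as assumed in the text.
DOCS_PER_SERVER = 49.5


def estimate_total_documents(server_count, docs_per_server=DOCS_PER_SERVER):
    """Estimate the size of the whole Web from a count of web servers."""
    return server_count * docs_per_server


def documents_covered(total_documents, coverage):
    """Number of documents implied by a search engine's coverage share."""
    return total_documents * coverage


# ASSUMPTION: 1,200,000 servers is implied by 59.4 million / 49.5,
# not read directly from Table 1.
servers = 1_200_000
total = estimate_total_documents(servers)

# Coverage shares quoted in the text: Excite 84%, Altavista 52%.
excite_docs = documents_covered(total, 0.84)
altavista_docs = documents_covered(total, 0.52)

print(f"Estimated Web size: {total / 1e6:.1f} million documents")
print(f"Excite (84%):       {excite_docs / 1e6:.1f} million documents")
print(f"Altavista (52%):    {altavista_docs / 1e6:.1f} million documents")
```

Overall coverage says nothing about coverage of a particular topic, which is why the smaller database can still hold more Asia-related documents.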
It must be noted that statistics derived from Altavista pertain only to
English language documents. This is an important issue. The
choice of English as the language of enquiry means that this paper excludes
from its analyses approximately 10% of Altavista's Asia-related material,
simply because it was produced in other languages. On the other hand, the
decision to stick to material published in a single (and dominant) language has
greatly expedited the task of gathering replicable data.
The final methodological decision related to the use of Altavista was to
convert raw statistics on the number of URLs (uncovered via keyword searches)
into estimates of equivalent Web servers. In other words, every 49.5 pages,
regardless of their actual provenance, were treated as a rough equivalent of
one web server. Thus, for example, the figure of 460 servers dealing with
Afghanistan is based on the finding that an Altavista query for the
keyword 'Afghanistan' generates links to 22,770 distinct pages
(URLs) containing that keyword (22,770 / 49.5 = 460). In addition, all server statistics
are rounded to the nearest ten units.
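The conversion from page counts to server equivalents can be expressed as a short sketch. The function name is hypothetical, and the handling of exact ties (rounding halves up) is an assumption, since the text only says "rounded to the nearest ten units".

```python
import math

# Every 49.5 pages are treated as one server equivalent (figure from the text).
PAGES_PER_SERVER = 49.5


def server_equivalents(page_count, pages_per_server=PAGES_PER_SERVER):
    """Convert a count of distinct pages (URLs) into estimated web servers,
    rounded to the nearest ten units."""
    raw = page_count / pages_per_server
    # ASSUMPTION: exact ties round up; the source does not specify this.
    return int(math.floor(raw / 10.0 + 0.5)) * 10


# The Afghanistan example from the text: 22,770 pages.
print(server_equivalents(22770))
```

For the Afghanistan example the division is exact (22,770 / 49.5 = 460), so the rounding step leaves the figure unchanged.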
October Sample
Results of a statistical analysis of the content,
provenance, usefulness and other characteristics of a sample of scholarly or
factual online information resources relevant to South East Asian studies.
A set of 270 web sites was extracted between 23 and 26 October 1997 from
a population of 3247 English language online documents known at the time of
inquiry to the Altavista database (Digital Corporation, 1997). This relatively
large population of potential links was generated through a query containing
the string "South East Asian Studies".
The "October Sample" was arrived at by quickly weeding out from the list
of 3247 web links any materials which appeared to be
(i) duplicates of their other online copies,
(ii) nonexistent,
(iii) irrelevant to South East Asian studies,
(iv) irrelevant to social studies research,
(v) personal pages,
(vi) useless (devoid of factual information, stupid, misnamed, bizarre, scatological
or childish), and finally,
(vii) inaccessible (the server would not respond
at the time of the attempted connection).
The final sample of 270 documents, or 8.3% of the initial population of "South
East Asian Studies" web links is, in fact, an outcome of a compromise between
the need to finish the data collection before an inflexible deadline and the
need to make the sample as large and as diverse as possible. In other words,
the "October Sample" data may be interesting but they do not come from a
systematic and comprehensive census.
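The weeding-out procedure and the resulting sample share can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually used: the original screening was done by hand, and the class name, function names and example URLs are all hypothetical. Only the seven exclusion criteria and the figures 270 and 3247 come from the text.

```python
from enum import Enum


class Reject(Enum):
    """The seven exclusion criteria listed in the text."""
    DUPLICATE = "duplicate of another online copy"
    NONEXISTENT = "nonexistent"
    OFF_REGION = "irrelevant to South East Asian studies"
    OFF_DISCIPLINE = "irrelevant to social studies research"
    PERSONAL = "personal page"
    USELESS = "devoid of factual information"
    INACCESSIBLE = "server did not respond"


def weed_out(candidates):
    """Keep only links that a reviewer has flagged with no rejection reason.

    `candidates` is a list of (url, set_of_Reject_reasons) pairs.
    """
    return [url for url, reasons in candidates if not reasons]


def sample_share(sample_size, population_size):
    """Sample size as a percentage of the population, to one decimal place."""
    return round(100.0 * sample_size / population_size, 1)


# Toy data with hypothetical URLs, for illustration only.
candidates = [
    ("http://example.org/a", set()),
    ("http://example.org/b", {Reject.DUPLICATE}),
    ("http://example.org/c", {Reject.PERSONAL, Reject.USELESS}),
]
kept = weed_out(candidates)
print(kept)

# Figures from the text: 270 retained out of 3247 candidate links.
print(sample_share(270, 3247), "% retained;", 3247 - 270, "links discarded")
```

Because the reasons are assigned by a human reviewer under time pressure, the output of such a filter is a convenience sample rather than a census, as the text itself notes.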
Copyright © 1997 by T.Matthew Ciolek. All rights reserved. This
Web page may be freely linked to other Web pages. Contents may not be
republished, altered or plagiarized. The www.ciolek.com editors do
not control or endorse the content of third party Web Sites.
URL http://www.ciolek.com/Asia-Web-Watch/methodology.html