From Private Ink to Public Bytes:
the practice and theory
of future GIS-ready online
cultural data-sets
Dr T. Matthew Ciolek,
Research School of Pacific and Asian Studies,
Australian National University, Canberra ACT 0200, Australia
tmciolek@coombs.anu.edu.au
http://www.ciolek.com/PEOPLE/ciolek-tm.html
To be presented at the
Electronic Cultural Atlas Initiative (ECAI) session of the
Pacific Neighborhood
Consortium (PNC) Annual Meeting,
City University, Hong Kong, PRC,
15-20 January 2001
[a first and never completed draft - tmc, Aug 2024]
Document created: 3 Oct 1999. Last revised: 1 Oct 2000.
0. Abstract
1. Introduction
A meeting in a North African city
Once upon a time, or more exactly, on a warm spring day of 'anno
urbis conditae' MCXXVII (374 CE), in the provincial city of Hippo
Regius [Long: 7.733333, Lat: 36.866669, ADL Feature ID: 1050346], a
young but already promising jurist (and poet), Lucius Afranius, went
to make some purchases in a reed-basket shop. Upon completing the
transaction he re-entered the street and spotted
there, next to a stall with incense and medicinal herbs, two eminent
scholars: Rufus Optimus and Aetius Maximus. The first man was a learned visitor
who had come to Hippo a few months earlier from as far away as Leptis Magna. The second one was a local
celebrity. The following is an imperfect reconstruction of the three-way conversation which ensued.
Afranius: "Ave, amici. Now, with all your exotic shopping completed, where are you proceeding to?"
Optimus: "Ave, young man. Good to see you. My good friend Maximus has invited me to his villa, in the Curtius hills, to
show me an olive tree he planted there the other day."
Afranius: "What is so exciting about such a tree? Once you have seen one, you have
seen them all, that's my opinion."
Maximus: "Ave, Lucius Afranius. The olive sapling is merely an
excuse. We need one, for the great Rufus Optimus and I are going to have a lengthy and, dare I say, much needed exchange of
ideas. Verily, the best way to discuss things in a serious manner is
the peripatetic way."
Optimus: "We are discussing, dear Afranius, the best way to collect
and record information about the trade routes which cross the seas as well as
the provinces of our illustrious Empire."
Afranius: "Why this sudden interest in the trade routes? Are you
planning, noble sirs, a business venture?"
Maximus: "No, my impressive young
colleague. Not a business venture. An intellectual one. I view them as
a good example of an Eckenstein Boulder."
Afranius: "A boulder? An Eckenstein Boulder?"
Optimus: "Yes. Before you do truly difficult things, you should first try your
skill (and luck) on simpler ones. Before climbing a tall mountain, you
need to test your rock-climbing techniques on a small, but sufficiently
challenging scale. For details of Eckenstein's training device see Newby (1974:37)
or Ciolek (2000. Digitising Data...). In short, if you manage to handle the minor difficulties of the
boulder all right, you can progress to tackling more complicated
things, such as climbing snow-clad mountains."
Afranius: "I see. Such as?"
Maximus: "Such as handling data about polygons."
Afranius: "What do you mean? I am lost."
Optimus: "Handling data about roads and communication lines, whether
they are used for trade, pilgrimages or merely movements of our
messengers and legions, boils down to handling data about points or
places (Ciolek (2000) calls them 'nodes') and data about the
interconnecting lines."
Afranius: "No, this cannot be so. Roads, let alone maritime shipping
lines, are full of curves and sudden changes of direction. A simple direct line
cannot adequately represent the variety of paths a traveller takes
when travelling from one city to another, from one harbour to the next
one."
Optimus: "No worries, it can. A straight line in Ciolek's conceptual scheme is
always a generalization, a temporary hypothesis, established only to
be replaced by subsequent more detailed determinations. Any curved line,
regardless of how much it weaves and meanders, can always be constructed
from n straight segments. These segments can be kilometers or just a
few meters or centimeters long. No worries, then."
Afranius: "Now I see. A clever approach. A very clever one, because it
means that when collecting data on all the roads which lead to
Carthago (pardon this joke, I could not resist it), you do not worry,
prematurely at least, about topography and exact topographical detail.
You simply collect data about the location of as many points as possible
through which these roads lead, and then plot them on a map, and draw
all the interconnecting lines."
Optimus: "Exactly, my friend, exactly."
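The idea that a road is a chain of nodes joined by straight segments, refined as more nodes become known, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are invented here, the great-circle (haversine) distance is one possible way of measuring a segment, and only the Hippo Regius coordinates come from the text above; the other two points are hypothetical.

```python
import math

def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two points given in degrees."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def route_length_km(nodes):
    """Length of a route stored as a list of (lon, lat) nodes.

    The route is treated as n straight segments joining consecutive nodes.
    """
    return sum(haversine_km(*a, *b) for a, b in zip(nodes, nodes[1:]))

hippo = (7.733333, 36.866669)   # coordinates quoted in the text
waypoint = (8.8, 37.3)          # hypothetical intermediate node
far_end = (10.0, 36.85)         # hypothetical end node

coarse = [hippo, far_end]              # first hypothesis: one straight line
refined = [hippo, waypoint, far_end]   # later, more detailed determination

coarse_km = route_length_km(coarse)
refined_km = route_length_km(refined)
```

Adding a node never shortens the route, so the coarse line acts as a lower bound that later determinations can only lengthen.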
Afranius: "But still, I am puzzled. If you are in such good
agreement on Ciolek's methodology, why do you have to walk as far
as those distant hills, to Maximus' villa and to the freshly planted
olive tree? Certainly your conversation could be completed in a far
shorter time. You could take a nice stroll to the forum instead. Will your
conversation be about the polygons, by which I take you to mean a closed
(like a loop) series of nodes connected by short stretches of lines
which can be used to identify some area, small or large, it does not
matter here, such as the wheat-fields of our city, or those of the various
provinces of our Empire, or the multifarious and much overlapping and
tangled areas of terrain inhabited by speakers of a given tongue?"
Maximus: "Well (to use a Celtic way of signalling lengthy utterances),
our conversation-to-be will not deal much with points, lines or
polygons. The first two items have been already amply discussed. The
third one represents a separate problem, one which we shall look at
some other time. The reason why we walk towards my house and the
increasingly famous olive tree is a different one. It deals with
the manner in which we scholars should manage new computing and networking
technologies. It deals with the question of how we should take
true advantage of the opportunities afforded by these technologies."
Afranius: "Computing and networking? Porca miseria! What are
they?"
Optimus: "Do not worry. These are electronic tools. They will be
invented long after you have completed your long and fruitful life."
Afranius: "I see. Carry on, Maximus. What do you mean by the true
advantage?"
Maximus: "You see, normally we are accustomed to making very good but
old uses of new technology, whereas what is really needed is the new
use of it, one which is commensurate with the full power offered by a new
tool, by a new resource."
Afranius: "For example?"
Optimus: "Examples are easy. Worthwhile practical solutions are not.
Millennia ago our ancestors used stone choppers and wooden clubs to
hammer their opponents into submission. The advent of metal, bronze and
iron, changed all that. Now warriors could use swords. However, they
did not use their weapons to bash or slap, but, instead to slash and
stab. The time-honoured practice of bashing someone on the head with a
lump of metal would be a silly operation indeed."
Afranius: "Sure, you made your point eloquently. So the two of you are
trying to figure out how these computers and electronic networks
should be put to a sensible scholarly use, not necessarily the one to
which we have been accustomed on the basis of past experiences."
Maximus: "Yes. The key questions are manifold. For instance, how do we
arrange our research and publishing activities so that:
- anthropological and historical data are created once but can be used many times;
- the data are easily repurposed, that is, used in several often disparate contexts and without
too many transformations and adjustments;
- the data are easily disseminated to all interested parties;
- the data are disseminated but without simultaneous propagation of errors;
- the data can be incrementally verified and corrected;
- the necessary corrections can be carried out simply, easily, and
inexpensively (in terms of resources and time).
In short, how do we bridge the gap between the culture of 'Ink' and that of 'Bytes', and
have the best of both worlds?"
Optimus and Afranius [calling out unisono]: "'Ink' and 'Bytes', what do you mean?"
Maximus: "I will explain these terms later. But essentially, what I mean is a
simple but nagging question: 'How do we move from a condition where
information is
used as an illustration, that is, as a mere self-satisfied and static
picture aiding some point in a discourse, to a situation where information is fragmented into a
myriad of elements, each one capable of being used as a verifiable
datum, as a computable, interchangeable, and correctable fact?'"
Optimus: "Wow, a tall order indeed."
Afranius: "Is the house far enough? It sounds like an interesting morning. May I join you
in your peripatetic exercises?"
Maximus and Optimus [jointly]: "Yes, of course. Come with us. And we
promise, you will not regret it."
Optimus' tale:
Afranius: "So, where do we start our journey?"
Maximus: "At the beginning. We shall start by asking ourselves, how
would we go about collecting and mapping data, say, on the trade
routes and other movement and communication lines, established
between Alexandria in Egypt, and Merv in Persia. Optimus, my friend,
how would you proceed, assuming that you have no other obligations
and, also, assuming that you have access to all the equipment and
software you need for the task?"
Optimus:
a. drawing a series of maps (sketch or detailed)
b. direct digitisation and input into a GIS system
c. full-text database of images and text-fragments
However, there are problems with each of the approaches:
- problems with the sources - they are messy
- problems with technology
* mapping: drawing is fiddly, needs expertise and lots of time, and ADDS to the confusion and crowding
* direct digitisation: scanning and GIS work require costly equipment, need expertise, and are fiddly (what do you do with
old maps that are illegible, with no coords and no projection?)
* full-text database: the mess is searchable, but it continues to be a mess.
Maximus' tale:
a. some general thoughts
- all information consists of fragments which have a triple aspect (structure, content, presentation)
- one needs to capture them
- one needs to use them as lego blocks
b. possible strategies
- XML-based annotations
- 'strong' database of information fragments
- 'weak' database of information fragments
Advantages and disadvantages of each of the approaches
Optimus is curious, asks for elaboration.
Maximus obliges:
- fragment all information into constituent semantic elements
- use the same fragmentation procedure for ALL information, a common template
- bring fragments together into data sets organised by place and date
- place the sets into containers which provide (a) the data themselves: (i) original, (ii) derivatives; (b) directional meta-data;
(c) modifications meta-data; (d) analysis meta-data
- provide each datum with geographic coordinates
- use them to (a) carry out statistical analyses; (b) create maps (using GIS software),
as you please.
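A rough sketch of how such fragments and containers might look in code follows. Everything in it is an assumption made for illustration: the class and field names are invented, the sample fragments are made up, and the actual OWTRAD containers are XHTML documents rather than Python objects. Only the four-part container layout (data, directional, modifications and analysis meta-data) follows the notes above.

```python
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Fragment:
    """One information fragment cut to a common template (hypothetical fields)."""
    place: str
    year: int
    lon: float
    lat: float
    text: str

@dataclass
class Container:
    """A data set plus the meta-data categories listed in the notes."""
    original: list
    derivatives: list = field(default_factory=list)
    metadata: dict = field(default_factory=lambda: {
        "directional": {}, "modifications": [], "analysis": {}})

def group_by_place_and_date(fragments):
    """Bring fragments together into data sets keyed by (place, year)."""
    groups = defaultdict(list)
    for frag in fragments:
        groups[(frag.place, frag.year)].append(frag)
    return dict(groups)

def build_containers(fragments):
    """Wrap each (place, year) data set in its own container."""
    return {key: Container(original=frags)
            for key, frags in group_by_place_and_date(fragments).items()}

# Invented sample data; only the Hippo Regius coordinates come from the text.
fragments = [
    Fragment("Hippo Regius", 374, 7.733333, 36.866669, "reed-basket shop"),
    Fragment("Hippo Regius", 374, 7.733333, 36.866669, "incense stall"),
    Fragment("Placeholder Town", 400, 8.0, 36.5, "invented entry"),
]
containers = build_containers(fragments)
```

Because every fragment carries its own place, date and coordinates, the same records can be regrouped for statistics or exported for mapping without touching the originals.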
Optimus is excited, and asks for more information.
Maximus obliges:
2. Practice
Basic principles
* all information is collected to a common format
* all data and procedures are made explicit
* all data and procedures are made public
* all data are correctible
* all data reside in well shaped containers
* all information can be collected by hand or by a computer
* all information ends up online
* the principle of public visibility
* each fragment of information is uniquely identified
* each fragment of information is always linked to its source
* each fragment of information is evaluated: Lines (QA3), Places (exact or estim. coords)
* all data about lines carry problem/no-problem tag
* data production is separated from data analysis (however initial data analysis is
used to provide data quality check)
* each fragment is treated like a hypothesis
* hypotheses have hierarchies of certainty
* several hypotheses pointing in a certain direction enable us to make an
informed decision
* principle of separation of Data from Conclusions, data points from maps (which are models)
* principle that you can mix and match records but you can
always return to the original data-set, and to the meta-information about it, and - therefore -
to the source.
* principle that ultimately it is not the place-name, not its coords but its ID
number which keeps the bastards honest.
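Several of these principles (unique identification, a permanent link to the source, fragments treated as hypotheses with degrees of certainty) can be illustrated with a small sketch. The names, the integer certainty scale and the rule of summing certainty per claim are all assumptions made here for the example; the notes above do not specify how competing hypotheses are to be weighed.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Hypothesis:
    fragment_id: str   # unique identifier of the fragment
    node_id: int       # ID number of the place the fragment is about
    claim: str         # what the fragment asserts
    certainty: int     # hypothetical scale: higher means more certain
    source: str        # where the fragment was taken from

def best_supported(hypotheses, node_id):
    """Return the claim about node_id with the greatest summed certainty."""
    totals = {}
    for h in hypotheses:
        if h.node_id == node_id:
            totals[h.claim] = totals.get(h.claim, 0) + h.certainty
    if not totals:
        return None
    return max(totals, key=totals.get)

def sources_for(hypotheses, node_id, claim):
    """Trace a claim back to every source that supports it."""
    return sorted({h.source for h in hypotheses
                   if h.node_id == node_id and h.claim == claim})

# Invented records about one place, identified by its ID rather than its name.
records = [
    Hypothesis("f1", 1050346, "on the coast", 2, "source A"),
    Hypothesis("f2", 1050346, "on the coast", 2, "source B"),
    Hypothesis("f3", 1050346, "inland", 3, "source C"),
]
decision = best_supported(records, 1050346)
```

Here two moderately certain fragments outweigh a single stronger one, and the decision can always be traced back through `sources_for` to the records that produced it.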
Basic steps
1. Locate the source of information
2. Take the record (xerox or simple OWTRAD notes)
3. Expand the notes
4. Shape the OWTRAD notes, make them fully fledged
5. Create a temp list of novel Nodes
6. Check them against the OWTRAD GAZETTEER
7. For all missing nodes
   a. get their coords from ADL/TGN etc.
   b. determine main geographic name
   c. determine their variant names
   d. add the data to the OWTRAD DB Gazetteer
   e. for data which still have missing coords, estimate the coords
8. Proofread the initial data
9. Load the initial data into a data set container (XHTML compatible, meant to sit online)
10. Give the container a meaningful name
11. Create meta-data, according to the OWTRAD template
12. Load the initial data into Places Processor
13. Match data-set names with Gazetteer names, obtain coords
14. Export the Places data
15. Load the initial data into Routes Processor
16. Match data-set names with Gazetteer names, obtain coords
17. Export the Routes data
18. Add Places headers
19. Add Routes headers
20. Import places into a GIS software, display them on a map
21. Visually inspect and verify the correctness of the data (use 2 & 4 as control)
22. Correct, repeat previous steps till satisfied
23. Import routes into a GIS software, display them on a map
24. Visually inspect and verify the correctness of the data (use 2 & 4 as control)
25. Correct, repeat previous steps till satisfied
26. Store final results of 21 and 24 in a data-set container (point 9 above)
27. Store the container + data online
28. Notify interested parties about its existence, invite corrections
29. Act on corrections
It is an idealised picture; in practice corners are cut.
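The matching and export steps above (Places Processor, Routes Processor, headers) might be sketched as follows. This is not the OWTRAD software itself: the function names, the column headers and the CSV output format are assumptions, and the second gazetteer entry is an invented placeholder. Only the Hippo Regius ID and coordinates are taken from the text.

```python
import csv
import io

# Hypothetical gazetteer extract: name -> (ID, lon, lat).
GAZETTEER = {
    "Hippo Regius": (1050346, 7.733333, 36.866669),  # values quoted in the text
    "Placeholder Town": (9000001, 8.0, 36.5),        # invented entry
}

def match_places(names, gazetteer):
    """Split names into gazetteer matches and names still lacking an entry."""
    matched, missing = [], []
    for name in names:
        if name in gazetteer:
            node_id, lon, lat = gazetteer[name]
            matched.append((node_id, name, lon, lat))
        else:
            missing.append(name)
    return matched, missing

def match_routes(pairs, gazetteer):
    """Turn (start, end) name pairs into rows of IDs plus a problem tag."""
    rows = []
    for start, end in pairs:
        a, b = gazetteer.get(start), gazetteer.get(end)
        tag = "no-problem" if a and b else "problem"
        rows.append((a[0] if a else None, b[0] if b else None, tag))
    return rows

def to_csv(header, rows):
    """Export rows under a header line; None is written as an empty field."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()

places, missing = match_places(
    ["Hippo Regius", "Placeholder Town", "Unknown Place"], GAZETTEER)
routes = match_routes(
    [("Hippo Regius", "Placeholder Town"), ("Hippo Regius", "Unknown Place")],
    GAZETTEER)
places_csv = to_csv(["id", "name", "lon", "lat"], places)
routes_csv = to_csv(["start_id", "end_id", "tag"], routes)
```

Names left in `missing` are the ones that would go back to the gazetteer step for coordinates, or for estimated coordinates, before the data are loaded into a GIS.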
Basic tools
- container template
- notation system
- gazetteer of coords
- database keeping track of id numbers
* relational
* resides on HD (to be moved OL)
* Elements (NODES + GAZETTEER)
- dbase data converters
- template
- populated templates (place1, place2, place3, routes1, routes2, routes3),
one for each of the data sets. The idea is to put emphasis on usable, disseminable
data sets, not on a central database (a mausoleum). Databases are implicitly
guarded against changes, data sets are not.
- notebook with facts to guard the truth against outrageous claims made so frequently by ourselves or others.
(in fact can be made into a separate data-set)
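As an illustration of the relational database that keeps track of ID numbers, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names are invented for the example and do not describe the actual OWTRAD database; the variant name "Hippo" is likewise an assumption. The point it shows is that every variant name resolves to one stable ID.

```python
import sqlite3

def make_gazetteer():
    """Create an in-memory gazetteer: one table of nodes, one of names."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE nodes (
            id INTEGER PRIMARY KEY,
            main_name TEXT NOT NULL,
            lon REAL,
            lat REAL,
            coords_estimated INTEGER NOT NULL DEFAULT 0
        );
        CREATE TABLE names (
            name TEXT PRIMARY KEY,
            node_id INTEGER NOT NULL REFERENCES nodes(id)
        );
    """)
    return conn

def add_node(conn, node_id, main_name, lon, lat, variants=(), estimated=False):
    """Register a node and all the names under which it is known."""
    conn.execute("INSERT INTO nodes VALUES (?, ?, ?, ?, ?)",
                 (node_id, main_name, lon, lat, int(estimated)))
    for name in (main_name, *variants):
        conn.execute("INSERT INTO names VALUES (?, ?)", (name, node_id))

def lookup(conn, name):
    """Resolve any known name to (id, main_name, lon, lat), or None."""
    return conn.execute(
        "SELECT n.id, n.main_name, n.lon, n.lat "
        "FROM names v JOIN nodes n ON n.id = v.node_id "
        "WHERE v.name = ?", (name,)).fetchone()

conn = make_gazetteer()
# ID and coordinates as quoted in the text; the variant name is hypothetical.
add_node(conn, 1050346, "Hippo Regius", 7.733333, 36.866669,
         variants=("Hippo",))
```

Records in a data set can then store only the ID, leaving spellings and coordinates to be corrected in one place.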
Summary
- OWTRAD methodology is suitable for the creation of all types of data based on
points and lines (but, not yet, for polygons [areas])
- it is a general purpose system. It can handle information about
* cities, villages, monasteries, historical monuments, battlefields, i.e. all
types of information which can be conceptualised as points.
* trade routes, roads, irrigation channels, social and business relationships
(e.g. map xx in YZ Fugger house), paths taken by armies and travellers (e.g. Alexander the Great),
topogenies (Fox), i.e. all information which has data about pairs of points (start-end).
Shortcomings of the OWTRAD methodology
* slow development stages (18 months for the lines; points still need more thought;
polygons have not been touched upon).
* question of balance - speed of use, usability vs precision
* dissemination of the incipient/immature technology
* how to gain a following - nothing is won by having several OWTRAD methodologies.
It would be like having several ways of noting down the same set of numbers.
Afranius: Are you suggesting that the social sciences must undergo a revolution? Must they
agree on a common set of concepts and a common way of representing them?
* Danger: losing the audience's attention: the crying wolf problem. Too many corrections
(like with the MimeTap software developed by Titus) do sound boring.
3. Theory
Optimus: "There is a method in this madness. Could you elaborate?"
Maximus obliges:
* Informational structure of scholarly endeavours
3 levels: theory, models and data
5 aspects: (i) actual content; (ii) terminology; (iii) methodology; (iv) apparatus
and documentation; (v) meta-data
Afranius: "Well, it looks like another of your beloved tables, matrices of relationships".
Maximus: "Yes, indeed. But in fact the situation is more complex.
There are info-soup layers between the data and model layers, and between the model and theory layers.
- data and models need informational hooks to other data and other models:
* to verify them,
* to synthesise them
- hooks in the form of references to places, dates, names of individuals (or at least
names of peoples or cultures)".
Optimus and Afranius, unisono: "Why?"
Maximus: It has to do with the transition from the "ink" to the "bytes" way of thinking
about our work.
An example?
Negroponte (1995) made a distinction between atoms and bits (19...), I suggest a distinction
between Ink and Bytes. It is in fact a symbol for two modes of thinking:
INK:
(a) private, (b) preoccupied with content; (c) idiosyncratic, (d)
hand-crafted; (e) solitary; (f) holistic (g) slow, (h)
portable, (i) technology independent, (j) errors invisible.
BYTES: (a) public, (b) preoccupied with formats, (c) interoperable,
(d) machine-produced, (e) cooperative, (f)
fragmented, (g) quick, (h) desk-bound, (i) technology dependent, (j) errors
transparent.
Optimus: It looks like another table of yours.
Maximus: Yes. In fact it is a fragment of a much larger table, one
involving 5 modes of thinking and 19 variables. Here it is [table will be shown here]
The fifth column is the environment at which we should strive to arrive.
Afranius: OK, so the data will be collected more easily, standardised, disseminated,
and used. Will it make the humanities and social sciences, Quadrivium and Trivium,
a better place?
Maximus: I do not know, but if I see an area which needs repair, an improvement,
and I fail to provide it, I will be a lesser human being.
But here we, finally, are - in my garden. Can you see the freshly planted olive tree?
A cheerful and energetic little fellow, no more than 1.5 m tall. And yet,
in just a few months it will bear fruit. Can you see
how beautifully it fits the setting? How much sun it soaks up every day? How
grey and silver its leaves are? How they shimmer and scintillate in the light?
Doesn't it gladden your hearts? It certainly gladdens mine.
[to be continued ...] [and it, sadly, never was - tmc, 21 Aug 2024]
4. End Matters
Appendix 1: OWTRAD meta-data template
Appendix 2: A fragment of an OWTRAD notebook entry
Appendix 3: OWTRAD notation system (v.2.0, Oct 2000) [see http://www.ciolek.com/OWTRAD/notation.html]
Appendix 4: OWTRAD rules
Appendix 5: A specimen of an OWTRAD dataset - for instance
Ciolek, T. M. 2000. Georeferenced data set (Series 1 - Routes): Roads in India during Mughal rule 1556-1707. OWTRAD Dromographic Digital Data Archives (ODDDA). Old World Trade Routes (OWTRAD) Project. Canberra: www.ciolek.com - Asia Pacific Research Online.
www.ciolek.com/OWTRAD/DATA/tmcINm1550.html
9. About the Author
Dr T. Matthew Ciolek, a social scientist and networked knowledge architect, heads the Internet
Publications Bureau, Research
School of Pacific and Asian Studies, The Australian National
University, Canberra, Australia. His work and contact details can be
found online at http://www.ciolek.com/PEOPLE/ciolek-tm.html
10. Acknowledgements
I am grateful to xxx for their critical comments on the
earlier version of this essay.
11. References
[The great volatility of online information means that some of the URLs listed
below may change by the time this article is printed. The date in round brackets indicates
the version of the document in question. For current pointers please
consult the online copy of this paper at
http://www.ciolek.com/PAPERS/pnc-hongkong-01.html]
- Ciolek, T. Matthew. 1999. Paper and
Network Scholarships: The Logistical Limits and Futures of Cultural
Studies (v. 15 Dec 1999).
Paper presented at the ECAITech
session of the Pacific Neighborhood Consortium (PNC) Annual Meeting,
University of California at Berkeley, Berkeley, USA, 13-17 January
2000.
www.ciolek.com/PAPERS/pnc-berkeley-01.html
- Ciolek, T. Matthew. 2000. Digitising
Data on Eurasian Trade Routes: An Experimental Notation System.
Paper presented at the ECAITech
session of the Pacific Neighborhood Consortium (PNC) Annual Meeting,
University of California at Berkeley, Berkeley, USA, 13-17 January
2000.
www.ciolek.com/PAPERS/pnc-berkeley-02.html
- Negroponte, Nicholas. 1995. Being Digital. New York: Alfred A. Knopf.
- Newby, Eric. 1974. A Short Walk in the Hindu Kush. London: Picador Pan Books Ltd.
- Paul, Diane B. 1987. The Nine Lives of Discredited Data. Sciences, vol. 27, no. 3, pp. 26-30.
- Tufte, Edward. 1997. Visual Explanations: Images and Quantities, Evidence and Narrative. Cheshire, Connecticut: Graphics Press.
12. Version and Change History
- Revisions, so far, incorporate minor editorial and markup fixes.
Maintainer: Dr T.Matthew Ciolek (tmciolek@ciolek.com)
Copyright (c) 2000 by T.Matthew Ciolek. All rights reserved. This Web page may be freely linked
to other Web pages. Contents may not be republished, altered or plagiarized.
URL http://www.ciolek.com/PAPERS/pnc-hongkong-2000.html