XTech 2005: XML, the Web and beyond.
This article describes the motivation for and development of a project I have called PepysMap. PepysMap was inspired by the excellent 'blog of the diary of Samuel Pepys run by Phil Gyford [1]. Phil posts diary entries day by day (currently for the year 1662). Each blog post contains the text of the diary entry, hyperlinked to pages containing details of the people, places and cultural artifacts referenced in the text. The goal of PepysMap is to shadow the development of the Pepys blog by creating a topic map for each diary entry, showing the relationships between people, places and cultural artifacts.
Two principal reasons motivate the development of PepysMap. The first is a simple interest in the subject matter of London during the Restoration period and its relationship to the London of today. The other is a desire to understand better the process of topic map development when faced with an initially incomplete knowledge set and the need to react to day-by-day changes. That said, other motives for continuing this work have since become obvious. Principally, the topic map is a novel, nonlinear way of approaching the diaries. This approach allows the reader to focus on a particular individual, a certain place, or even on food and drink, cultural events or royalty. Furthermore, the diary of Samuel Pepys retains its fascination partly because of the position of the man in relation to the developments of the period. Pepys is central enough to have important and influential colleagues, but also minor enough to have been able to record open and honest opinions. This means that the diaries themselves are a great hub for starting to understand the milieu of 17th-century London in general. Hence it is hoped that the topic map of the diaries can serve as a hub for the creation of further topic maps focussing on other aspects of this period of history.
The development of PepysMap follows the model used by the PepysDiary.com site. Updates are posted separately for each diary entry, attempting to follow the same day-by-day publishing model as the 'blog. In addition to posting the topic map files, a weblog entry is created describing any new modelling issues and how they have been addressed. It is hoped that eventually the blog can serve as a living history of the process of creating PepysMap [2].
The topic map is modularised across multiple files and is created by hand using the LTM syntax defined by Ontopia [3]. Currently, there is one LTM topic map for each diary entry, plus one topic map covering people in the diary; one covering places; one covering cultural artifacts such as food, drink and entertainments; one providing a base ontology for family relationships; one providing the rest of the core ontology of the PepysMap; and finally one topic map providing dates. In addition to the LTM syntax topic maps, a single XTM syntax topic map is created once per month which merges all of the daily updates, providing a single file that contains all of the mapped diary entries so far [4]. From this single XTM syntax topic map, an HTML rendition of the topic map is also created and posted to the web [5]. At present, authoring is carried out by myself and by Stuart Brown of OxfordML.
As the topic map grows, however, the management overhead involved in creating a separate LTM topic map for each daily entry is gradually increasing and it is likely that over the coming months I will move to a suitable editing application.
The principal focus of the PepysMap is on events described by the diary. Each diary entry is broken down into a sequence of events. This approach is strongly influenced by the Historical Markup Language (HEML) [6] and shares a number of features with that markup language. One advantage of this approach is that certain aspects are common to all events irrespective of their type. Events all occur at some point in time or across some extended period of time, and may additionally have a temporal relationship to one another. They will typically involve one or more agents and one or more places. The agents involved in an event can be broadly categorised as actor or patient. These similarities enable a certain degree of consistency to be applied to the modelling. A further advantage of an event-oriented model is that events as first-class objects can serve as a nexus for complex relationships (such as cause-effect relationships) and can also serve to scope names or as the subject of another event (such as when two people discuss a play they have both seen). Finally, identifying events as first-class objects also enables them to be effectively indexed by the topic map, making it possible to quickly find all events of a particular type.
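The indexing benefit can be illustrated with a minimal Python sketch. The `Event` class, its field names and the `index_by_type` helper are hypothetical and are not part of PepysMap; the two event identifiers and their types are taken from examples later in this article.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Event:
    """Hypothetical in-memory stand-in for an event topic."""
    topic_id: str
    event_type: str
    agents: list = field(default_factory=list)  # people taking part
    places: list = field(default_factory=list)  # where the event happens


def index_by_type(events):
    """Group event ids by event type so one type can be listed quickly."""
    index = defaultdict(list)
    for event in events:
        index[event.event_type].append(event.topic_id)
    return dict(index)


events = [
    Event("event-16610817-05", "performance",
          agents=["samuel-pepys", "robert-ferrers"], places=["the-opera"]),
    Event("event-16610806-08", "travelling-event",
          agents=["samuel-pepys"], places=["brampton", "baldwick"]),
]
event_index = index_by_type(events)
```

Looking up `event_index["performance"]` then lists every performance without scanning the individual diary entries.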
As an example of this event-based approach, consider the following passage from Pepys' diary entry for 2nd July 1661:
Spoke with several, among others my cozen Roger Pepys, who was going up to the Parliament House, and inquired whether I had heard from my father since he went to Brampton, which I had done yesterday, who writes that my uncle is by fits stupid, and like a man that is drunk, and sometimes speechless.
In this passage there are three related events.
On closer inspection, there is a fourth event lurking, that of Roger Pepys travelling to Parliament House. However, sometimes the secret to effectively modelling the diary seems to be to know when to stop!
There are some relationships that could potentially be modelled as events that are not currently modelled in that way. One example of this is marriage. When the topic map was first begun, I decided to use a naive approach, modelling marriage as a binary association between the spouses rather than as a separate topic in its own right. However, the limitations of this quickly began to show over the first three months of modelling work. Not having the marriage of two individuals modelled as a topic means that it is not possible to scope the married names of women, or to put second marriages into context. One interesting case is that of Elizabeth, wife of Robert Bernard, who has the name "Lady Digby", that name having been conferred upon her by her first marriage to the 1st Baron Digby. After his death, she retained that title following her marriage to the lower-ranked Bernard. This kind of complexity turns out to be more common than one would expect!
A few other exceptions are made for simplicity that have not (to date) fallen prey to such complexity. For example, many workers have a simple binary relationship to their trade or profession. As most of the working class of the time would keep the same trade for their working life, a simple association suffices. That is not true of political and military office, however, and a more complex event-based model for this is discussed below.
To give a flavour of the kinds of challenges faced in the PepysMap modelling exercise, the following are just some of the issues that have been addressed in the work done to date.
To aid in topic map modularisation and to serve the goal that the PepysMap could become a starting point for other topic maps about Restoration London, almost all people, places and cultural artifacts described by the topic map are assigned a subject identifier. Where possible, the subject identifier used is a link to the PepysDiary.com page that describes the subject in more detail. Such links are available for a large majority of people and places and for some of the cultural artifacts (there is, for example, no listing of the plays seen by Pepys). Subject identifiers for structural elements of the topic map, such as superclass-subclass and other hierarchical relationships, are taken from existing vocabularies published by TopicMaps.org, Ontopia and Techquila. Where possible, Ontopia's PSIs for culture have been used.
However, this still leaves a large number of concepts and entities that are not identified by any pre-existing topic map resource (known to the author). For a very few we have used authoritative resources including Wikipedia [7] and Luminarium [8]. For the rest, new identifiers have been created under the namespace http://www.techquila.com/psi/. At present, no resources exist for the identifier URIs. It is intended to create those resources and, if possible, to follow all of the OASIS recommendations for Published Subject Identifiers.
At the time Samuel Pepys was writing, England had yet to adopt the 'new' Gregorian calendar. That would not happen for another 90 years. As a result, the dates he gives use the Julian calendar. Hence each date is assigned two separate names, one giving the date in the Julian calendar and the other the Gregorian date. Additionally, as the legal/ecclesiastical calendar year starts on March 25th, it is in fact necessary to provide three separate date names for dates that fall between January 1st and March 24th.
The approach taken here is simple enough. The topic represents a particular point in time, rather than a date. Hence it would also be possible to attach names for other calendar systems if that were required. The Gregorian calendar date/time (in UTC) is mapped to a URI and used as the subject identifier for date/time topics.
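As an illustration of how the two or three names for one point in time relate to each other, here is a minimal Python sketch. It converts through the Julian Day Number using the standard integer arithmetic; the function names and the dictionary keys are hypothetical labels chosen for this example and are not the scope topics used in PepysMap.

```python
def julian_to_jdn(year, month, day):
    """Julian Day Number of a Julian-calendar date (year starting 1 January)."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083


def jdn_to_gregorian(jdn):
    """Gregorian (year, month, day) for a Julian Day Number."""
    a = jdn + 32044
    b = (4 * a + 3) // 146097
    c = a - (146097 * b) // 4
    d = (4 * c + 3) // 1461
    e = c - (1461 * d) // 4
    m = (5 * e + 2) // 153
    day = e - (153 * m + 2) // 5 + 1
    month = m + 3 - 12 * (m // 10)
    year = 100 * b + d - 4800 + m // 10
    return year, month, day


def date_names(year, month, day):
    """Names for one point in time, given a Julian date as Pepys wrote it.

    `year` is the historical year (starting 1 January). The keys are
    hypothetical labels, not PepysMap scope topics.
    """
    names = {
        "julian": "%04d-%02d-%02d" % (year, month, day),
        "gregorian": "%04d-%02d-%02d"
        % jdn_to_gregorian(julian_to_jdn(year, month, day)),
    }
    # Before 25 March the legal/ecclesiastical year is still the previous one.
    if (month, day) < (3, 25):
        names["julian-legal-year"] = "%04d-%02d-%02d" % (year - 1, month, day)
    return names
```

For the diary entry of 2nd July 1661, `date_names(1661, 7, 2)` yields the Julian name 1661-07-02 and the Gregorian name 1661-07-12. A date such as 2nd February 1661/62 gains a third name carrying the legal year 1661.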
The modelling of offices held by individuals introduces a number of significant issues. Firstly, a period in office is bounded temporally, even if in some cases we do not know precisely what those bounds are. Secondly, several offices confer a title on the office holder, such as "Lord Privy Seal" or "King". Thirdly, a single office may be held consecutively by any number of individuals, and equally a single individual may hold a number of offices concurrently. To model all of this, it is necessary to treat the holding of an office as a first-class object. This fits naturally within the event-based model used by PepysMap.
As an example of this, let us consider a quite complex case in point. At the time of Pepys' diaries, James Stuart, brother of King Charles II, is Duke of York and Lord High Admiral (both positions conferred by his brother). In the future, he will become James II of England. His position as Duke of York is modelled as follows (in LTM notation):
[james-stuart-duke-of-york : office-holding-event
    = "James Stuart, Duke Of York" ]
[duke-of-york : office = "Duke of York"]
participation( james-stuart-duke-of-york : event,
    james-stuart : office-holder,
    duke-of-york : office-held,
    charles-stuart : office-conferer )
occurs( james-stuart-duke-of-york : event,
    year-1643 : start,
    date-16850206 : end )
The office and the event of James Stuart holding the office are modelled as two separate topics. An association of type "participation" is used to identify the office holder, the office held, and the conferer of the office. A second association of type "occurs" is used to identify the temporal range of the office-holding event. In this case the start can only be narrowed down to some time in the year 1643, whereas the end coincides with James Stuart's accession to the throne, so we have a more definite date for that.
A similar structure is used to model James Stuart's position as Lord High Admiral and later as King:
[james-stuart-lord-high-admiral : office-holding-event
    = "James Stuart, Lord High Admiral" ]
[lord-high-admiral : office = "Lord High Admiral"]
participation( james-stuart-lord-high-admiral : event,
    james-stuart : office-holder,
    lord-high-admiral : office-held,
    charles-stuart : office-conferer )
occurs( james-stuart-lord-high-admiral : event,
    year-1643 : start,
    date-16850206 : end )
[james-ii : office-holding-event
    = "Reign of James II of England"]
participation( james-ii : event,
    james-stuart : office-holder,
    monarch-of-england : office-held )
occurs( james-ii : event,
    date-16850206 : start,
    date-16881105 : end-after )
The last remaining issue is the names that these offices confer. This is dealt with simply by providing all of the different names and scoping each by the associated office-holding event:
[james-stuart : man = "James Stuart";"Stuart, James"
    = "Duke of York" / james-stuart-duke-of-york
    = "Lord High Admiral" / james-stuart-lord-high-admiral
    = "James II of England" / james-ii
    @"http://www.pepysdiary.com/p/800.php"]
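One use of this structure is to work out which of a person's names apply at a given moment. The following Python sketch is an illustration only: the `OFFICE_EVENTS` and `SCOPED_NAMES` tables and the `names_in_force` function are hypothetical, and they copy just two of the office-holding events from the LTM above. Dates are held as Python `date` objects purely so that they can be ordered.

```python
from datetime import date

# Office-holding events from the LTM above. A start known only to the year is
# widened to 1 January of that year, and the "end-after" bound is left open
# (None); both choices are simplifications made for this sketch.
OFFICE_EVENTS = {
    "james-stuart-duke-of-york": (date(1643, 1, 1), date(1685, 2, 6)),
    "james-ii": (date(1685, 2, 6), None),
}

# Scoped names of james-stuart: (name, scoping office-holding event).
SCOPED_NAMES = [
    ("Duke of York", "james-stuart-duke-of-york"),
    ("James II of England", "james-ii"),
]


def names_in_force(on):
    """Return the scoped names whose office-holding event covers `on`."""
    result = []
    for name, event_id in SCOPED_NAMES:
        start, end = OFFICE_EVENTS[event_id]
        if start <= on and (end is None or on < end):
            result.append(name)
    return result
```

Asking for the names in force on 2nd July 1661 returns only "Duke of York", which matches how Pepys refers to him in the diary.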
A significant part of the diaries concerns government and government institutions, such as the Navy Office where Samuel works. In the initial creation of the topic map, a mistake was made that arises from the fact that many government buildings are named after the institution housed there. So the name "Navy Office" refers both to an organisation and to a physical place. Over time it has become apparent that there is a need to separate the two. Because the Topic Naming Constraint requires that two topics with the same name in the same scope be merged, we make use of scope to distinguish between otherwise identical names and also create unambiguous names with no scope.
For example, the Exchequer as an institution and the Exchequer as a building are modelled as two separate topics:
[exchequer-institution : institution
    = "The Exchequer (institution)";"Exchequer, The;institution"
    = "The Exchequer" / institution
    @"http://www.pepysdiary.com/p/290.php"]
[exchequer-building : building
    = "The Exchequer (building)";"Exchequer, The;building"
    = "The Exchequer" / building
    @"http://www.pepysdiary.com/p/242.php"]
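The effect of the Topic Naming Constraint on these two topics can be checked with a small Python sketch. It is not a topic map engine: `find_name_clashes` is a hypothetical helper that only reports which topics share a name in the same scope and would therefore be merged.

```python
def find_name_clashes(topics):
    """Report (name, scope) pairs that are shared by more than one topic.

    `topics` maps a topic id to a list of (name, scope) pairs, where scope is
    None for an unscoped name. Topics sharing a pair would be merged.
    """
    seen = {}
    for topic_id, names in topics.items():
        for key in names:
            seen.setdefault(key, set()).add(topic_id)
    return {key: ids for key, ids in seen.items() if len(ids) > 1}


# The early mistake: one unscoped name used for two different subjects.
clashing = {
    "exchequer-institution": [("The Exchequer", None)],
    "exchequer-building": [("The Exchequer", None)],
}

# The fix shown above: the shared name is scoped, the unscoped names differ.
separated = {
    "exchequer-institution": [("The Exchequer (institution)", None),
                              ("The Exchequer", "institution")],
    "exchequer-building": [("The Exchequer (building)", None),
                           ("The Exchequer", "building")],
}
```

Running `find_name_clashes(clashing)` reports the two Exchequer topics as a pair to be merged, while `find_name_clashes(separated)` reports nothing.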
It is worth noting at this point that we have also chosen a consistent approach to assigning sort names to topics (in LTM notation, the first string after the = sign is the base name and the second one is the sort name). The sort name uses a semicolon delimiter between the entity name (as a sort string) and a disambiguating string. The sort names are thus similar in form to the primary and secondary terms in a back-of-book index.
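A short Python sketch shows how such sort names might be consumed by an index builder. The `sort_key` function is hypothetical; it simply splits on the first semicolon and compares the two parts case-insensitively.

```python
def sort_key(sort_name):
    """Split a sort name into (primary term, secondary term)."""
    primary, _, secondary = sort_name.partition(";")
    return (primary.casefold(), secondary.casefold())


sort_names = [
    "Stuart, James",
    "Exchequer, The;institution",
    "Exchequer, The;building",
]
ordered = sorted(sort_names, key=sort_key)
```

The two Exchequer topics sort together under their shared primary term, with the disambiguating string deciding the order between them.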
Another staple of the diaries is Samuel's penchant for the theatre. In modelling these events, we need to be able to relate the performer and the work performed as well as the theatre where the performance takes place and members of the audience. There are a number of examples of Samuel noting the presence of other people in the audience. Indeed sometimes he seems to pay more attention to the audience than to the play!
The nexus for describing attendance at a performance is the performance itself. The performance is modelled as an event with participation by the performer and the work performed. Separate associations are used to model each group of attendees at the performance. For example, there is this entry for 17th August 1661:
...after dinner Captain Ferrers and I to the Opera, and saw "The Witts" again, which I like exceedingly. The Queen of Bohemia was here, brought by my Lord Craven.
The performance itself is modelled as a topic of type "Performance". "The Opera" in this instance refers to a specific place, not to opera in general. Additionally, from the location, we have assumed that the performance is staged by William Davenant's Opera Company.
[event-16610817-05 : performance
    = "Performance of 'The Wits' at the Opera (17th August 1661)";"1661081705"]
occurs(event-16610817-05 : event, today : on)
participation( event-16610817-05 : event,
    davenants-opera : performer,
    the-wits : performed-work,
    the-opera : place)
Captain Ferrers and Samuel Pepys form one group of attendees and their presence is modelled as a single association:
participation( event-16610817-05 : event,
    samuel-pepys : audience,
    robert-ferrers : audience )
Similarly, Elizabeth Stuart (the "Queen of Bohemia") and William (Lord) Craven form a separate group of attendees and are modelled in a separate association:
participation( event-16610817-05 : event,
    elizabeth-stuart : audience,
    william-craven : audience )
By using this convention, the topic map can convey some sense of the clustering of people in the audience. It allows us to answer the question "Who has Samuel accompanied to the theatre?" as well as "Who has attended the same performance as Samuel?".
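The difference between the two questions can be made concrete with a small Python sketch. The `AUDIENCE_GROUPS` table and the two functions are hypothetical; the table holds one set of audience members per participation association, copied from the example above.

```python
# Audience groups for event-16610817-05, one set per participation association.
AUDIENCE_GROUPS = {
    "event-16610817-05": [
        {"samuel-pepys", "robert-ferrers"},
        {"elizabeth-stuart", "william-craven"},
    ],
}


def companions(person):
    """People who share an audience association with `person`."""
    found = set()
    for groups in AUDIENCE_GROUPS.values():
        for group in groups:
            if person in group:
                found |= group - {person}
    return found


def fellow_attendees(person):
    """Everyone else at any performance that `person` attended."""
    found = set()
    for groups in AUDIENCE_GROUPS.values():
        if any(person in group for group in groups):
            for group in groups:
                found |= group - {person}
    return found
```

For Samuel, `companions` finds only Captain Ferrers, whereas `fellow_attendees` also finds the Queen of Bohemia and Lord Craven. Had all four people been placed in a single association, the two answers could not be told apart.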
One of the more complex types of event to model is travelling. As you might expect, a journey is modelled as an event in which the traveller or travellers participate. The other two pieces of information provided (where known) are the route taken and the method of travel.
The route taken is modelled as an association between the travelling event and the places along the route. Special roles indicate the route start and end; a further role of "via" indicates any stopping points along the way, and a role of "en-route-to" indicates a final destination which is not reached by the travelling. This latter role type is useful for longer journeys which may take several days or be broken by stops to visit people or places, as well as for journeys which are aborted before reaching the final destination.
The method of travel is modelled more simply, using a binary association between the travel event and the method of travel. In some cases, only the principal method of travel is referred to. For example, if Samuel writes in his diary that he travels from home to Westminster by boat, only the "boat" method of travel is modelled. The fact that he would have had to walk to Blackfriars Stairs to get the boat is considered to be fine detail. However, where Samuel himself is specific about a change of mode of travel, that change is modelled by building consecutive travelling events for the different modes.
As an example, from 6th August 1661:
...took horse for London, and with much ado, the ways being very bad, got to Baldwick, and there lay and had a good supper by myself.
This information is modelled as a travelling event with a route from Brampton to Baldwick, en-route to London:
[event-16610806-08 : travelling-event
    = "Samuel rides towards London, but stops at Baldwick (6th August 1661)";"1661080608"]
occurs (today : on, event-16610806-08 : event)
participation ( event-16610806-08 : event,
    samuel-pepys : traveller )
route-taken ( event-16610806-08 : journey,
    brampton : route-start,
    baldwick : route-end,
    london : en-route-to)
method-of-travel ( event-16610806-08 : journey,
    horseback : method)
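To show how the route roles might be read back out, here is a minimal Python sketch. The `ROUTE` list copies the role players of the route-taken association above; the helper functions and the "reached" and "intended" labels are hypothetical.

```python
# Role players of the route-taken association for event-16610806-08.
ROUTE = [
    ("brampton", "route-start"),
    ("baldwick", "route-end"),
    ("london", "en-route-to"),
]


def places_with_role(route, role):
    """Return every place playing the given role, in listed order."""
    return [place for place, r in route if r == role]


def summarise(route):
    """Separate the places actually reached from the intended destination."""
    reached = (places_with_role(route, "route-start")
               + places_with_role(route, "via")
               + places_with_role(route, "route-end"))
    intended = places_with_role(route, "en-route-to")
    return {"reached": reached, "intended": intended}
```

Calling `summarise(ROUTE)` separates Brampton and Baldwick, which Samuel actually reached that day, from London, which he was only heading towards.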
Simply keeping up with the diary is almost as much as one person can handle on a part-time basis. That said, there are a number of other goals that ideally should be tackled.
Work on the PepysMap started in July 2004 with entries for July 1661. That leaves about a year and a half of earlier diary entries (1659/60-1661) still to be mapped.
Phil Gyford holds geospatial data for many of the locations mentioned in the diary. In collaboration with him, we plan to add this information to the topic map, which could enable the development of map-based visualisations of the diary.
Many of the people, places and cultural artifacts in the diary are important enough to have been written about or otherwise depicted in their own right. The PepysMap topic map could serve as an interesting index to these resources. Indeed the topic map could be used to index non-electronic resources such as portraits in places like the National Gallery or modern performances of the plays mentioned.
As the development of PepysMap has been somewhat organic, there is a need to revisit the entries mapped so far and edit them for consistency, both in overall modelling and in naming. Tools such as TMBrowse (part of the TM4J TM4Web tool [6]) provide a useful overview of the topic map, but detailed editorial work will require more tooling.
In addition to editing for consistency, it is hoped that this work could attract historians and other scholars with an interest in the period, and that input from more knowledgeable contributors could be used to increase the density of linking in the topic map and correct any factual errors.
An important part of this editorial work would involve the creation of descriptive resources for the subject identifiers created in the http://www.techquila.com/psi namespace for entities and concepts contained in the topic map.
By creating a filter to convert PepysMap topic map data into HEML markup, it would be possible to make use of HEML tools for creating SVG timelines and maps of the events described by the diary. It is possible that this integration could be achieved by creating an XSLT transform from the XTM syntax of the merged PepysMap to the HEML syntax.
Kal Ahmed
Networked Planet Limited