An exercise in visualisation: “model” seen through a network graph
This post details the theoretical considerations and the practicalities that informed the development of a network graph to visualise the etymological connections and semantic clusters associated with the English term “model”. These linguistic relationships are detailed in a forthcoming paper by Cristina Marras and Michela Tardella entitled “Modelling between Digital and Humanities: Looking Back and Forward”.
Marras and Tardella investigate the polysemy of the term “model” by exploring its etymological roots and by distinguishing words that have been closely associated with “model” and which, taken together, constitute a semantic cluster or network of associated terms. By outlining its etymological history and by identifying words that have been associated with “model” the authors do not aim to account for uses of “model” and its inflections in varieties of contemporary academic discourse. Rather, they delineate the semantic scope of these terms more broadly, thereby providing a ground for further investigations into the academic usage of these terms in the humanities.
Marras and Tardella identify two senses of modelling, one relating to ‘formal ways of reasoning and representation (mathematical and deductive forms for example)’ and one relating to ‘less formal processes of reasoning involving metaphorical, inductive, and abductive forms’. This paper investigates senses of “model” and its inflections that are related to both categories but notes that the latter has yet to be ‘“fully recognized” as a form of modelling adequate to DH’. After outlining the term’s etymology, interlinguistic equivalents and lexicography, the authors explore twenty-eight associated terms that together form a semantic net or cloud before concluding the paper with an overview of the terms in various disciplines in the sciences and humanities.
2. Some Preliminary Considerations
I took up the task of developing a visualisation that would support the discursive overview of the etymological roots of “model” and its associated terms. Because the relationships between these terms are complex and because there are a large number of terms associated with “model”, a visual representation offers a quick and easy way to make sense of these connections. The visualisation could then function as an index for the discursive outline provided by the paper. Developing this visualisation enables us to pilot these techniques in preparation for further visualisations that could be used to represent the findings from an analysis of a corpus of academic journal articles constructed to investigate modelling in the humanities.
As the visualisation will be reproduced in a forthcoming volume of articles on modelling in the humanities as well as being showcased on the project website, it would need to be developed in such a way that it could be reproduced as a static image. Furthermore, there were technical limitations relating to my own programming expertise, which was scant at best, and limitations relating to the length of time available to me on the project: at five months, there was little scope to learn to develop a suite of visualisations while constructing and analysing a corpus of journal articles. These constraints shaped my choice of visualisation and the platform I used to develop it.
Although the visualisation supplements a longer article, it also needs to be understood in isolation from that article when encountered solely on the project website. The open-access nature of the publication to which it is attached mitigates the dissociation between its source material and its representation online, but the lack of context for the graph needs to be recognised, as it will have repercussions for the way the visualisation is understood.
3. Choosing and Planning the Graph
After reviewing several possible options, which I discuss below, I decided that a network graph would be the most suitable option for representing the relationships between entities. A network graph is also relatively simple to learn to produce within the available timeframe. I chose to use d3.js to develop this visualisation because it is free, offers a great deal of flexibility for control and styling of the final graph, and is easy to adapt for future analyses: for example, by using statistical measures and frequencies to define the length of the connections (‘edges’) between specific terms and those that co-occur with them. I was soon to learn that for those of us with little to no programming experience, d3.js has a steep learning curve. Nevertheless, within a relatively short amount of time (albeit inversely proportional to the level of frustration I was to experience in the following weeks), I was able to develop a graph that can be re-used throughout the lifecycle of the project.
Marras and Tardella explore etymological and semantic relatedness, encompassing historical and synonymous uses as well as terms in translation. Representing these different levels of relations meant that the visualisation either needed to flatten them or encapsulate two different organising principles (temporal and semantic, for example). I first experimented with a structure that organised etymological relationships as a “backbone” for the network graph, with the twenty-eight terms of the semantic cloud organised around this core. These twenty-eight terms would connect with the primary backbone at all relevant points to indicate a link between them (for example, a link between measurement and the Greek root of model). Using space in this way to represent these different levels, however, made the visualisation difficult to read. The complexity that stratifying these levels spatially would introduce runs counter to the principle of simplicity and immediacy that the visualisation was intended to serve, and so I chose not to represent these different levels spatially, but to introduce colour to group those terms that were related in a specific way (so that all Greek roots, for example, were of one colour). This meant that the network graph would not be organised in terms of time.
Another consideration that required some thought was that the authors did not analyse any datasets in quantitative terms to support the identification of qualitative relationships. This meant that the distance between nodes or the thickness of the connections between nodes couldn’t be used to represent, for example, the strength of the association between two terms based on the evidence of some supporting data analysis. The distances between terms were thus governed by aesthetic and pragmatic reasons and not by quantitative measures. While the network graph that I eventually constructed allows us to see which terms are related to which, the connections between terms are meaningful only insofar as they indicate some specific semantic connection raised in the article. The network graph thus shows us which terms are related to “model” and it shows us which of these terms are related to others, but it does not assign any information with regard to the type or strength of the connection itself.
Each group of related terms is colour-coded, and the label for each group is given in a legend. Devising the groups proved to be tricky as terms tended to overlap with one another. In some cases I adhered to the groups that were defined by the paper itself and created connections between terms that were clearly related. For example, the Latin root “Modus” refers to ‘a measure’, which is also one of the twenty-eight words explicitly noted by the paper to be semantically related to ‘model’. As the paper discussed Latin terms as one group and terms relating to measurement in another section of the paper, I chose not to collapse these terms into a single group. The benefit of this approach is that the network graph preserves the groups and the relationships discussed in the paper but still allows us to make these relationships explicit. In other cases, however, I created a single group for terms that are related in English. An example of this decision is the group containing the terms “Resemblance”, “Copy”, “Imitation”, “Simulation”, “Similarity” and “Analogue”, which all relate to the concept of similarity. The name for each group was derived from the terms within that group and is, of course, highly subjective; arguments for other choices of label could very well be made.
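The grouping logic described above can be sketched in a few lines of plain JavaScript. This is only an illustration of the idea, not the code used for the published graph: the terms come from the paper, but the group labels ("Latin roots", "Measurement", "Similarity"), the colour values and the function name `buildLegend` are all assumptions made for the example.

```javascript
// Terms are taken from the paper; the group labels are hypothetical.
const termNodes = [
  { name: "Modus", group: "Latin roots" },
  { name: "Measure", group: "Measurement" },
  { name: "Resemblance", group: "Similarity" },
  { name: "Copy", group: "Similarity" },
  { name: "Imitation", group: "Similarity" },
  { name: "Simulation", group: "Similarity" },
  { name: "Similarity", group: "Similarity" },
  { name: "Analogue", group: "Similarity" },
];

// Hypothetical palette; any list of distinguishable colours would do.
const palette = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728"];

// Assign one colour per group, in order of first appearance.
// The resulting Map doubles as the content of the legend.
function buildLegend(nodeList, colours) {
  const legend = new Map();
  for (const node of nodeList) {
    if (!legend.has(node.group)) {
      legend.set(node.group, colours[legend.size % colours.length]);
    }
  }
  return legend;
}

const termLegend = buildLegend(termNodes, palette);

// Look up the fill colour for a node from its group.
const colourOf = (node) => termLegend.get(node.group);
```

Keeping “Modus” and “Measure” in separate groups, as here, preserves the paper's own divisions; the semantic relationship between them would be expressed by an edge rather than by a shared colour.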
4. Implementing the Graph
We now come to the most time-consuming aspect of the visualisation, which I record here for posterity. Perhaps my travails will be of some use to those who follow in my footsteps. ‘You will rejoice to hear that no disaster has accompanied the commencement of an enterprise which you have regarded with such evil forebodings’ (Frankenstein). Indeed, the construction of the code for the visualisation resembled nothing so much as the monstrous stitching together of human and animal body parts in Frankenstein, creating something new from its constituents. Implementing the graph involved a long process of trial and error, yet the flexibility offered for future adaptations made the effort worthwhile.
A range of network graphs was available as models through the d3.js library. I opted to adapt Mike Bostock’s force-directed network graph. I needed to adjust various elements of the code and the JSON data for my own purposes, particularly with future applications in mind. To this end I inserted a legend, added labels for each of the nodes, and introduced new fields in the JSON file to hold quantitative data that determine the length and thickness of the connections between nodes. As we were using Greek and Latin characters I also needed to ensure that they would be displayed correctly, and thus included a line of code to declare the character encoding the browser should use when reading the file. The HTML file could then be dragged and dropped into a Firefox browser to view and interact with the dynamic visualisation.
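To make the shape of the adapted data concrete, here is a minimal sketch of what such a JSON structure and its accessors might look like. It assumes links refer to nodes by array index, in the style of the example data that accompanied the force-directed graph; the field names `distance` and `width`, the default values and the helper `invalidLinks` are hypothetical and chosen only for illustration.

```javascript
// Nodes carry a label and a group; links point at nodes by array index.
// "distance" and "width" are assumed names for the added quantitative fields.
const modelGraph = {
  nodes: [
    { name: "Model", group: 0 },
    { name: "Modus", group: 1 },
    { name: "Measure", group: 2 },
  ],
  links: [
    { source: 0, target: 1, distance: 80, width: 2 },
    { source: 0, target: 2, distance: 120, width: 1 },
    { source: 1, target: 2 }, // no quantitative data for this edge
  ],
};

// Illustrative fallbacks for edges without quantitative data.
const DEFAULT_DISTANCE = 100;
const DEFAULT_WIDTH = 1;

// Accessors that a force layout and the line styling could call per link.
const linkDistance = (link) => link.distance ?? DEFAULT_DISTANCE;
const linkWidth = (link) => link.width ?? DEFAULT_WIDTH;

// Return every link whose source or target is not a valid node index,
// so that mistakes in the data file surface before the graph is drawn.
function invalidLinks(graph) {
  const inRange = (i) =>
    Number.isInteger(i) && i >= 0 && i < graph.nodes.length;
  return graph.links.filter((l) => !inRange(l.source) || !inRange(l.target));
}
```

In d3.js an accessor like `linkDistance` would typically be handed to the force layout (for instance via `linkDistance(...)` on `d3.layout.force()` in version 3, or `distance(...)` on `d3.forceLink()` in later versions), and `linkWidth` to the `stroke-width` style of the lines. Which of these applies depends on the version of the library being adapted.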
I then created an Excel template that separated each part of the JSON data into individual cells so that new data could be copied and pasted into the relevant column. A final column that concatenated the data in the separated columns enabled me to quickly generate code that could be copied and pasted into a JSON file, which could then be used to generate the network graph in a short amount of time. The distances between all the nodes of the graph can thus be adjusted simply by entering new values in the relevant Excel column and copying and pasting the data from the concatenated column of the spreadsheet into the relevant section of the JSON file. Likewise the width of all of the connections, or edges, between nodes can be modified, as can the groupings of terms that are used to generate the colours of the nodes and the accompanying legend. This enables us to re-purpose the network graph to generate visualisations of quantitative data based on linguistic and statistical analyses of textual corpora. The dynamic graph itself can be manually re-arranged so that static images can be generated through screen capture.
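The concatenating column of the spreadsheet can be mimicked in JavaScript, which may be useful if the template is ever replaced by a script. The sketch below is not the Excel formula itself: the column order (source, target, distance, width), the field names and the function names are assumptions made for the example.

```javascript
// Each inner array stands for one spreadsheet row:
// source index, target index, distance, width (assumed column order).
const linkRows = [
  [0, 1, 80, 2],
  [0, 2, 120, 1],
];

// Equivalent of the concatenating column: one JSON fragment per row.
function rowToJson([source, target, distance, width]) {
  return `{"source": ${source}, "target": ${target}, "distance": ${distance}, "width": ${width}}`;
}

// Join the fragments into the text of the "links" array of the data file.
function rowsToLinksJson(rows) {
  return "[\n  " + rows.map(rowToJson).join(",\n  ") + "\n]";
}

const linksJson = rowsToLinksJson(linkRows);

// Parsing the result is a quick check that the pasted text is well-formed;
// JSON.parse throws if a comma or brace has gone missing.
const parsedLinks = JSON.parse(linksJson);
```

One practical difference from the spreadsheet approach is the trailing comma: a formula that appends a comma to every row leaves one after the last entry, which has to be removed by hand, whereas joining the fragments avoids it.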
5. Future Directions
There were a number of features that I attempted to implement but could not, given the time and knowledge constraints I was working under. I initially attempted to define groups using hulls following Ger Hobbelt’s model. That proved difficult and so I scaled back my ambition and tried to implement hulls following Aaron Bramson’s model. The success I had achieved thus far persuaded me that I was equal to this no doubt trivial task. I was proved wrong and abandoned this approach in favour of a simpler colour-coded system with an accompanying legend to designate groups.
I also investigated other types of visualisation. In particular I was attracted to the notion of a timeline that could be embedded in a webpage. I successfully adapted an example using d3.js and so my programming expertise was not a barrier to implementing this visualisation. I opted for a network diagram because it lends itself to publication as a static image, while an interactive timeline that requires the user to click through to access other parts of the narrative does not. However, because the paper tells a story of the development of the term “model”, a timeline offers the benefit of explicating this story in a visually attractive and intuitive manner. It would also allow me to extend the visualisation by including examples of the terms as they were used at various points in history in a manner similar to the Oxford English Dictionary’s online database of examples. Such a resource would offer a useful touchstone for scholars wishing to explore modelling from interdisciplinary perspectives and could form a valuable addition to the project, perhaps as part of a postulated phase II extension.
Dendrograms offer another possible way to visualise “model”, its roots and its associated terms. I opted not to use this type of graph because of its hierarchical structure, which did not suit the kinds of relationships that I was portraying. However, dendrograms could be used to visualise findings from an analysis of the corpus of journal articles that I shall be constructing, and they are particularly suited to representing the terms that co-occur with “model”, its inflections or associated terms. Such a visualisation would offer another way to explore the large amounts of data that a corpus linguistic analysis of modelling in journal articles would uncover.