The visible part
When people visit and engage with Your Paintings Tagger, what they see is only the tip of the iceberg: behind it sits the whole Tagger system and infrastructure. Tagger is a complex system, with a complex workflow.
The main idea is to get many people to tag many paintings, repeatedly and redundantly. Individual tags matter, but not on their own: it’s the combined contribution of the community as a whole that determines what becomes publicly available.
The Tagger Interface
This article covers our approach to Tagger and how things work, both the visible part and what goes on under the surface.
Tagger has three types of users:
- The general public. Everyone who registers may start tagging immediately. No specific knowledge or expertise in terms of art and (hopefully) technology is required to tag, as one of the main aims of the project is to collect tags and descriptors that are useful to virtually everyone. The general public will contribute Things, Events, People, Places and – after some warming up and the first ten paintings tagged – also Classifications and Categories.
- Experts. When registering, contributors may declare a particular expertise in the history of art by supplying their qualifications and credentials. These are quickly checked by academics at Glasgow University and, if verified, users are given expert status. Experts have two roles. First, they are given access to two extra workflows for each painting (dates, and styles and movements), which deal with these more academic classifications. Second, they help classify difficult paintings after the general public has failed to deal with them (more on this later).
- Supervisors. Supervisors are hand-picked academics, invited to take part in the project because of their specific fields of expertise in the history of art (e.g. Baroque, Renaissance or Abstract Art). Their involvement includes classifying extra-difficult paintings, which are those that not even the experts have been able to classify, as well as dealing with uncertain tags (more on both of these later).
Workflows and Vocabularies
Workflows have different constraints, vocabularies and help features. Generally, tags are associated with controlled lists, with terms appearing when users start typing. Users are not forced to use the lists or the suggested terms. In most cases they may enter their own, because a general vocabulary such as Wikipedia may not include, for example, a local nobleman who is the sitter of a painting.
- Things/Ideas are populated by the Oxford English Dictionary (Lexicon edition), by Oxford University Press. When typing, users are presented with options that match main as well as alternative terms, excluding what OED considers to be highly offensive terms. People, Places and Events from the dictionary are also excluded, as these are covered in subsequent workflows.
- Events, Places and Names are lookups of DBpedia, the database format of Wikipedia, retrieved through the Google Search APIs. We did try to use Geonames for places, but the results were not satisfactory in terms of ease of use, with lots of repetitions and trouble getting the most important places to the top of the list. Although Qi (the Collection Management System) integrates ULAN, AAT and TGN, it was decided not to use these, as the audience focus is more on the general public (hence Wikipedia) than on the academic community.
- Classifications and Categories are perceived as more specialist and academic tasks, with lists devised by University of Glasgow, following agreement with the BBC and specialist user research conducted by Flow Interactive.
- The list of Styles and Movements was created by University of Glasgow, which has also gone through the monumental task of pre-assigning possible styles to each one of the over 25,000 artists in the database. The purpose of this operation is to make sure that only styles that are relevant to the given artist can be selected (e.g. Picasso may often be baroque in his personal style, but his paintings cannot historically belong to the Baroque style).
- Finally, dates can be entered only for paintings that don’t already have one. They are constrained by the artist’s life dates, if known, to ensure that unfeasible dates are not accepted. We did consider using other dates (acquisition, bequests, etc.), but the fuzzy structure of the source data would not have allowed it.
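As an illustration of the date constraint described in the last point, here is a minimal sketch in Python. It is not Tagger’s actual code: the `Artist` class, its field names and the `is_feasible_date` function are hypothetical, and the real rules (for example, whether any margin around the life dates is tolerated) are not covered in this article.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Artist:
    """Hypothetical artist record; a life date is None when it is unknown."""
    name: str
    birth_year: Optional[int] = None
    death_year: Optional[int] = None


def is_feasible_date(artist: Artist, year: int) -> bool:
    """Return True if `year` does not contradict the artist's known life dates."""
    if artist.birth_year is not None and year < artist.birth_year:
        return False  # painted before the artist was born
    if artist.death_year is not None and year > artist.death_year:
        return False  # painted after the artist died
    return True  # no known life date rules this year out
```

When neither life date is known, the sketch accepts any year, which matches the "if known" wording above.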
Tagger shows paintings to users at random. It is designed to collect a predefined number of tags for each painting, from different users and for each of the different workflows.
Internally, the PCF manages paintings available for Tagger by catalogue, with the option of prioritising which catalogues are displayed and tagged first. At launch, some 89,000 paintings were available to tag. To enhance efficiency and make sure that results are delivered in batches within sensible time frames, the software groups them by catalogue and delivers them to audiences one catalogue at a time. As a result, between 1,500 and 5,000 paintings are going through Tagger at any one time.
A user sees a painting only once: he or she can either tag it or skip it, but the same painting won’t be displayed again to the same user. The painting is displayed to other users until a satisfactory number of tags for each workflow has been obtained or, more precisely, until a satisfactory number of contributors have completed each workflow. The difference is important: we cannot predetermine how many tags to expect for a painting (abstract paintings may not have that many things that can be noted down), but we do expect a given number of users to look at the painting and try to tag it, or confirm that there is indeed not much to say about it.
Once the predetermined numbers are reached, the painting is taken off Tagger for post-processing (more below).
Some paintings may not reach the necessary quota of tagging actions within the time available. If this happens, the paintings are prioritised in the queue of expert taggers, who help further. At this stage, given that we are using experts, the number of expected tags per workflow is lowered, but their tags still carry the same weight as those from the general public.
For particularly difficult paintings, even experts may fail. At this stage, these “impossible” paintings are served only to supervisors, and only once: anything that a supervisor tags is automatically accepted.
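The serving rules above can be sketched roughly as follows, in Python. This is an illustration under stated assumptions, not the software that actually serves paintings: the quota values, the workflow names, the `Painting` class and the function names are all hypothetical. It also simplifies "prioritised in the queue of expert taggers" to "served only to experts and supervisors", and it restarts the contributor counters at each stage while assuming the tags already collected are kept elsewhere.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set

WORKFLOWS = ("things", "people", "places", "events")  # illustrative subset
QUOTA = {"public": 10, "expert": 3}  # hypothetical contributors needed per workflow
NEXT_STAGE = {"public": "expert", "expert": "supervisor"}
RANK = {"public": 0, "expert": 1, "supervisor": 2}


@dataclass
class Painting:
    painting_id: str
    stage: str = "public"  # "public" -> "expert" -> "supervisor" -> "done"
    seen_by: Set[str] = field(default_factory=set)  # users who tagged or skipped it
    completed: Dict[str, Set[str]] = field(
        default_factory=lambda: {w: set() for w in WORKFLOWS}
    )  # workflow -> users who completed it in the current stage


def can_serve(p: Painting, user_id: str, role: str) -> bool:
    """A painting is shown at most once per user, and only to roles its stage allows."""
    if p.stage == "done" or user_id in p.seen_by:
        return False
    return RANK[role] >= RANK[p.stage]


def record_visit(p: Painting, user_id: str, completed_workflows: Iterable[str]) -> None:
    """Register a tag-or-skip visit and take the painting off once the quota is met."""
    p.seen_by.add(user_id)
    if p.stage == "supervisor":
        p.stage = "done"  # served to a supervisor only once; their tags are accepted
        return
    for workflow in completed_workflows:
        p.completed[workflow].add(user_id)
    if all(len(users) >= QUOTA[p.stage] for users in p.completed.values()):
        p.stage = "done"  # ready for post-processing


def escalate_on_timeout(p: Painting) -> None:
    """Called when the time window ends before the quota is reached."""
    if p.stage in NEXT_STAGE:
        p.stage = NEXT_STAGE[p.stage]
        # Counters restart for the new stage (assumption); collected tags are kept elsewhere.
        p.completed = {w: set() for w in WORKFLOWS}
```

Note that the quota counts contributors per workflow rather than tags, which reflects the distinction made above: a user who looks at an abstract painting and finds nothing to add still counts towards completion.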
The review process
Tags of paintings that have completed the tagging process are subject to a two-stage analysis, before their classifications are passed to the BBC and Your Paintings via the API.
- In the first instance, software devised by the Citizen Science Alliance at the University of Oxford (Department of Astrophysics) determines which tags should be automatically accepted and which automatically discarded. The decision is based on a number of factors, including:
- Number of agreeing tags. If many people see a cat in a picture, there probably is a cat in the picture.
- Quality of taggers. Statistically, taggers get better at tagging the more they tag. Novice taggers score slightly lower than people who have tagged tens or hundreds of paintings. But quantity is not the only determining factor: the software also takes into account which taggers score higher in terms of quality of tags (i.e. how many of their tags are among the ones that are accepted). This means that a tag is not necessarily discarded just because only two or three people have noticed a smaller detail in a painting (and this is why registration and login are crucial to the process).
- This software is the same – of course adapted for the very different type of data – that has been pioneered for Zooniverse and similar projects by the Citizen Science Alliance.
- The automated analysis will hopefully determine the fate of most tags, but there will be some that cannot be dealt with automatically: there will be paintings with very few tags, that are too difficult or that people don’t like, and tags that won’t gather enough consensus but are not weak enough to be discarded automatically. All of these will need human intervention from supervisors: through a dedicated, separate interface, supervisors will examine each painting that has ambiguous or suspect tags and either approve or discard individual terms.
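To make the two-stage idea more concrete, here is a deliberately simplified sketch in Python. It is not the Citizen Science Alliance algorithm, whose details are not covered in this article: the thresholds, the default weight for unknown taggers and the `classify_tags` function are hypothetical. It only shows how agreement and tagger quality can combine into three outcomes: accepted, discarded, or sent to a supervisor for review.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

ACCEPT_AT = 3.0       # hypothetical: weighted agreement needed for automatic acceptance
DISCARD_BELOW = 1.0   # hypothetical: below this, the tag is discarded automatically
DEFAULT_WEIGHT = 0.5  # hypothetical weight for taggers with no track record


def classify_tags(
    tags: Iterable[Tuple[str, str]], reliability: Dict[str, float]
) -> Dict[str, str]:
    """Classify the terms entered for one painting.

    tags: (user_id, term) pairs.
    reliability: user_id -> weight, e.g. the share of that user's past tags accepted.
    """
    counted = set()
    score: Dict[str, float] = defaultdict(float)
    for user_id, term in tags:
        key = (user_id, term.strip().lower())
        if key in counted:
            continue  # each user counts once per term
        counted.add(key)
        score[key[1]] += reliability.get(user_id, DEFAULT_WEIGHT)

    outcome = {}
    for term, total in score.items():
        if total >= ACCEPT_AT:
            outcome[term] = "accepted"
        elif total < DISCARD_BELOW:
            outcome[term] = "discarded"
        else:
            outcome[term] = "review"  # left for a supervisor to decide
    return outcome
```

With weights like these, a detail noticed by only two reliable taggers lands in "review" rather than being discarded, while a term entered by a single low-scoring tagger is dropped.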
The supervisor's interface
The making of Tagger
The design of Tagger began in May 2010, when the PCF appointed Keepthinking to design and develop their new Collection Management System. Keepthinking worked with the BBC and Flow Interactive on User Experience Design to establish what needed to be captured, by what types of audience and for what types of audience. Flow Interactive conducted extensive user scenario and user engagement research with different audiences, then together we reviewed the findings and proposed a model that included different workflows for each painting.
Qi - Collection Management System
At the end of September a fully working prototype was released and user-tested at the BBC. Most of the feedback was positive, but there were areas that people struggled to understand (in some cases because we were testing just Tagger, without the fully working Your Paintings counterpart).
We reviewed and made changes to the application, user-tested it again and then rolled it out to 800 volunteers: six weeks later (at the end of the year), Tagger had collected over 60,000 tags. CSA compared them with tags by expert users (scholars from the Courtauld Institute): it was immediately clear that the general public scored at least as well as the professionals, and in some cases better and more accurately.
In 2011 we made more changes, following extensive feedback from the pilot, which had mostly to do with user interface design (trying to keep everything above the fold, compact, yet clear and usable). May 2011 saw further and final infrastructure tests (Tagger is now on Amazon EC2, in a structure that can be automatically scaled up or down based on traffic), and Tagger finally launched as planned on 23 June, together with the BBC Your Paintings.
Who has done what
The whole process has been the result of one year of intense collaboration between everyone involved. What follows is, I hope, roughly correct; I also hope I’m not leaving anyone out or getting anything wrong. These credits don’t include Your Paintings, as the BBC website is outside the scope of this article, and I would surely get it wrong!
- Keepthinking
- Designed and developed a new edition of Qi, which is Keepthinking’s own Collection and Content Management System, to hold and manage all painting-related information, including Tagger results
- Designed and developed the API that serves data to the BBC Your Paintings
- Designed (user experience, wireframes and graphic design) and developed Tagger, including all necessary components, such as:
- interfaces to Google Search APIs to query DBpedia
- interfaces to CSA to retrieve painting data and send back tags
- Designed and developed the supervisor interface to deal with uncertain tags
- The Citizen Science Alliance (University of Oxford)
- Adapted and provided the software that decides how to serve paintings and analyses tags
- Created the APIs to communicate with the Tagger front end
- Organised the Amazon hosting infrastructure
- Intelligent Heritage
- Intelligent Heritage managed the process, making sure all parties delivered what they were supposed to.
- University of Glasgow
- University of Glasgow (Art Department) has provided all the art historical consultancy throughout the project.
- Flow Interactive
- Flow have helped with user research and user engagement on both the Tagger and the Your Paintings sides of the project
- Martin Bazley
- Martin has conducted the first series of user tests at the BBC
- The BBC + The PCF
- The BBC and the PCF, and their unique and fortunate partnership to deliver Your Paintings to the nation, have been the main driving forces and inspiration for the entire project.
We hope you will enjoy tagging. If you have any questions or would like to make any suggestions, please do not hesitate to contact us.
© Cristiano Bianchi, Keepthinking, 26 June 2011