Pre-creating categories for information deposit and retrieval is a bad idea, and using metadata is a much better one. Pre-creating categories is no different from mind-reading and fortune-telling. It's an attempt to make sense of information we have yet to create and to pre-empt where that information should be stored and retrieved. Pre-created categories constrain what can be filed under them and can force a mismatch between a category and its contents. So if we need categories to put some order to the mess, they should emerge from the commonalities among documents as described by their contributors or anyone who uses them. Moreover, we can do much more to manipulate information if we know how to describe it appropriately.
We are talking about metadata design here. In essence, a good set of metadata includes a unique identifier and other metadata that supports information capture and retrieval. Unique identifiers are, for instance, URLs, which can be referred to by links. Capturing and finding information can be achieved with tags. Clay Shirky put forward a convincing argument that links and tags are better ideas than pre-created categories, but he also lists some hard-to-meet conditions under which ontology is possible.
So why aren't people seeing the benefits of metadata? Here are some possible reasons:
- People haven't got used to the idea of using metadata for information capture and retrieval;
- People are entrenched in the "benefits" of categories that come with controlled vocabularies;
- Lack of exposure to what metadata can do;
- Prevalence of the folder concept that has been packaged with operating systems since about two decades ago, which maintains the status quo.
Some considerations and uses of metadata
Metadata should help us manipulate information. This differentiates what qualifies as metadata and what doesn't.
Global or local
Use global metadata if we want to achieve information governance for an organisation. Such applications include support for the corporate taxonomy and information disposal. Corporate taxonomy: didn't we say that pre-creating categories is not a good idea? To a large extent, yes, and I'll talk more about it below.
Local metadata should support finding information in context. For example, if we are deciding on a place to eat using a lifestyle website, then metadata like cuisine, location, type of eatery, price range and ratings are important. However, information such as reviews, signature dishes and food pictures is rarely used to manipulate information, so it should be regarded as content and not used as metadata. Also watch for metadata that could potentially be global, especially when you see the same fields appearing consistently in local metadata.
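To make the metadata-versus-content split concrete, here is a minimal sketch for the eatery example. The class, the field names, the price scale and the sample data are all made up for illustration; the point is only that filtering touches the metadata fields and never the reviews.

```python
from dataclasses import dataclass, field


@dataclass
class Eatery:
    # Metadata: the fields we filter on.
    name: str
    cuisine: str
    location: str
    price_range: int  # assumed scale: 1 (cheap) to 4 (expensive)
    rating: float
    # Content: shown to the reader, never used for filtering.
    reviews: list[str] = field(default_factory=list)


def find_eateries(eateries, **criteria):
    """Return the eateries whose metadata equals every given criterion."""
    return [
        e for e in eateries
        if all(getattr(e, key) == value for key, value in criteria.items())
    ]


# Hypothetical sample data.
EATERIES = [
    Eatery("Noodle Bar", "Japanese", "City", 2, 4.3, ["Great ramen."]),
    Eatery("Trattoria Uno", "Italian", "City", 3, 4.6),
    Eatery("Sushi Corner", "Japanese", "Suburb", 2, 4.0),
]

matches = find_eateries(EATERIES, cuisine="Japanese", location="City")
print([e.name for e in matches])  # prints ['Noodle Bar']
```

If a field such as location starts showing up in every local scheme like this one, that is a hint it may belong in the global metadata instead.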
Corporate taxonomy
In the same article, Clay Shirky mentions that ontology is possible given these hard-to-meet characteristics:
- Small corpus;
- Formal categories;
- Stable entities;
- Restricted entities;
- Clear edges;
- Expert catalogers;
- Authoritative source of judgement;
- Coordinated users;
- Expert users.
Once we have a corporate taxonomy, it's time to look at a tool that can make metadata and taxonomy work together. The US Air Force integrated SharePoint with Concept Searching to achieve automatic metadata tagging, taxonomy management and auto-classification. This is a great approach, especially if we are concerned about the possibility of anyone tagging. However, as with any auto-tagging or auto-classification system, the danger of this best-guess approach is semantic accuracy. Fortunately, the ability to narrow the search results with tags means that we should be able to filter out irrelevant results fairly easily.
Information disposal
All information is subject to disposal. The way disposition is handled normally depends on why the information needs to be kept and for how long. These parameters vary from organisation to organisation, but they may be consistent for government agencies within a country or state. Other organisations may prefer to follow established standards such as ISO 15489. Information disposal metadata may, for example, include rules such as:
- Draft documents - should be deleted when the final document is produced.
- Final documents - should be kept for 3 years and then routed to the information custodian for disposition considerations.
- Records - should be kept for 7 years and then archived.
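The three example rules above can be sketched as a small lookup driven by a document-type field. This is only an illustration: the names are hypothetical, and it assumes the retention period is counted from the creation date, which the rules themselves do not specify.

```python
from datetime import date

# Hypothetical encoding of the example rules above.
RETENTION_YEARS = {"final": 3, "record": 7}
ACTION_AFTER_RETENTION = {"final": "route to custodian", "record": "archive"}


def add_years(d, years):
    """Return the same calendar day `years` later (29 Feb falls back to 28 Feb)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def disposal_action(doc_type, created, today, final_exists=False):
    """Decide what to do with a document today.

    Assumption: retention is counted from the creation date.
    """
    if doc_type == "draft":
        # Drafts go once the final document has been produced.
        return "delete" if final_exists else "keep"
    due = add_years(created, RETENTION_YEARS[doc_type])
    return ACTION_AFTER_RETENTION[doc_type] if today >= due else "keep"


print(disposal_action("final", date(2020, 1, 1), date(2023, 1, 1)))
# prints: route to custodian
```

Only two pieces of metadata, the document type and a date, are enough to drive the whole decision, which is why they are good candidates for global metadata.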
Metadata input should be made as easy as possible. This is especially true for people who have migrated from a local or shared folder environment, where metadata input was not required. We should always scrutinise the number of metadata fields. One approach is to expose metadata fields only when they are required: expose Basic metadata for draft documents, add Sharing metadata for finalised documents, and then expose further Record Management metadata if the document is a controlled document.
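A rough sketch of this staged exposure follows. The three group names come from the paragraph above; the individual field names and the stage labels are assumptions made up for the example.

```python
# Field groups named after the paragraph above; the fields inside are hypothetical.
FIELD_GROUPS = {
    "basic": ["title", "description", "tags"],
    "sharing": ["audience", "review_date"],
    "record_management": ["retention_class", "custodian"],
}

# Which groups the input form shows at each stage of a document's life.
GROUPS_BY_STAGE = {
    "draft": ["basic"],
    "final": ["basic", "sharing"],
    "controlled": ["basic", "sharing", "record_management"],
}


def fields_to_show(stage):
    """Return the metadata fields the input form should expose at this stage."""
    return [f for group in GROUPS_BY_STAGE[stage] for f in FIELD_GROUPS[group]]


print(fields_to_show("draft"))  # prints ['title', 'description', 'tags']
```

Someone saving a draft sees three fields here, and the longer form only appears once the document becomes controlled.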
Inheriting metadata is another way of minimising the amount of input required. Values can be inherited from taxonomy categories, folders (or other types of containers) and pre-defined metadata.
There is also plenty of useful metadata that the system can capture automatically, such as creation date, modification date and author.
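One possible way to model inheritance is a "nearest value wins" lookup: the document's own metadata first, then its folder, then the site. The sketch below uses Python's standard `collections.ChainMap`; the container levels and field names are hypothetical, and a real system may resolve conflicts differently.

```python
from collections import ChainMap


def effective_metadata(document_meta, *container_metas):
    """Merge metadata so the nearest value wins.

    The document's own values come first, then each container from
    innermost to outermost.
    """
    return dict(ChainMap(document_meta, *container_metas))


# Hypothetical containers and fields.
site = {"organisation": "Acme", "department": "Corporate"}
folder = {"department": "Finance", "retention_class": "record"}
doc = {"title": "Q3 budget"}

merged = effective_metadata(doc, folder, site)
print(merged["department"])  # prints Finance
```

The contributor only typed a title, yet the document ends up with four fields filled in.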
Description and tags metadata
When the title fails to show what a piece of information is about, the description and tags give the next level of detail. When these three are presented in a listing together with metadata such as the contributor's name and the date created or modified, they help a reader decide which entry is relevant without having to open the document or page.
These two kinds of metadata also help resolve the dilemma of whether to include or leave out a candidate metadata field. Forget about "just in case" metadata. Instead, tag the information or put what you intend in the description. That is better than having people populate fields that have no apparent value.
Tags are extremely powerful because they tell people exactly what the information is about and often suggest how to use it. This overcomes the insufficient or skewed semantics offered by categories. Many applications now allow us to subscribe to tags, streaming in relevant information as it is created. Browsing or searching by tag brings us to a collection of relevant information, and we can zoom in further by selecting the next tag of interest. Repeating this process gives us more precision as we discover what's left.
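The zoom-in process can be sketched with plain sets: keep the documents that carry every selected tag, then count the tags that remain so the reader can pick the next one. The document IDs and tags below are invented for the example.

```python
from collections import Counter

# Hypothetical documents and their tags.
DOCS = {
    "doc-1": {"metadata", "taxonomy", "sharepoint"},
    "doc-2": {"metadata", "tags"},
    "doc-3": {"metadata", "taxonomy", "disposal"},
    "doc-4": {"disposal", "records"},
}


def narrow(docs, selected):
    """Keep only the documents that carry every selected tag."""
    return {doc_id: tags for doc_id, tags in docs.items() if selected <= tags}


def next_tags(docs, selected):
    """Count the not-yet-selected tags among the narrowed documents."""
    remaining = narrow(docs, selected)
    return Counter(tag for tags in remaining.values() for tag in tags - selected)


print(sorted(narrow(DOCS, {"metadata"})))              # prints ['doc-1', 'doc-2', 'doc-3']
print(sorted(narrow(DOCS, {"metadata", "taxonomy"})))  # prints ['doc-1', 'doc-3']
```

Each extra tag can only shrink the result set, which is where the added precision comes from.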
Summary
I started by talking about what's wrong with our natural tendency to pre-create categories. The discussion then went on to show the importance of using tags and links to overcome the problems of pre-created categories, with reference to Clay Shirky's article. However, tags and links (unique identifiers) are just one part of a metadata scheme, and they cannot be designed in isolation. This leads us to a wider discussion of metadata and its applications.
Hope this has been useful and please let me know your comments.









