A Lesson on Taxonomy from Heather Hedden


Tue 08 January 2013 By Jimena Ballina

At the end of November, Jane and I attended the Gilbane Conference in Boston where we joined a myriad of content management professionals - one of them being Heather Hedden, an information management guru specializing in taxonomies, indexing and search; author of The Accidental Taxonomist; and Simmons Graduate School of Library and Information Science teacher of taxonomy creation courses. We were lucky enough to speak to Heather over lunch and secure a future interview. Now, we're passing along the wealth of information Heather provided us. And to think, it's just a sneak peek of what it takes to be a taxonomist!

What does a taxonomist do? Is this a relatively new type of profession that was born in the Information Age, or is it more of an evolution of expertise?

A taxonomist designs, creates, edits, and/or maintains taxonomies. Taxonomies vary, and people called “taxonomists” may work with different kinds of structures with different names: taxonomies, controlled vocabularies, thesauri, metadata models, etc. “Taxonomy” can refer to a specific type of structure of terms in a simple hierarchy, but “taxonomy” is also the general name for all kinds of term sets used for organizing or indexing and retrieving content. These differences may result in different tasks for the taxonomist. For example, depending on the type of taxonomy, the taxonomist may or may not need to create synonyms for each individual term.

Taxonomy consultants tend to focus on taxonomy design and review work, with taxonomy creation work done to only a limited extent. They often engage in requirements-gathering tasks. Independent contractor taxonomists tend to work on taxonomy building, after the design is in place, and on taxonomy editing. Staff taxonomists do more work in taxonomy maintenance, along with projects of taxonomy review, revision, and occasionally the merging or mapping of two or more taxonomies.

The taxonomist role existed before the digital information age in a few niche specialties under different names; it has evolved, and the need has grown immensely in the digital information age. Those who developed library cataloguing schemes, and developed and edited vocabularies such as Library of Congress Subject Headings, did similar work to taxonomists today. There have also been information professionals who developed specialized information thesauri (in print form) for the manual indexing of periodical articles at reference publishers, such as the H.W. Wilson Company, or for internal corporate publications at the largest corporations. Now every mid-size company or organization has so much digital content that they need taxonomies for their intranets and enterprise content management systems. Retailers need product taxonomies for their ecommerce offerings. Government agencies and NGOs make lots of documents available to the public on their websites. Print publishers are digitizing their content. Multimedia collections are growing everywhere. Search engines, while providing satisfactory results on the Web, cannot get the precise results desired within an enterprise without the additional support of a taxonomy.

I always think it’s important to be able to explain to your mother (or your kids) what you do and why it matters. How would you explain what you do to your mother and kids?

This is definitely a field that I would explain differently to someone of the older generation than to someone of the younger generation. To my mother and others of the older generation who have very little exposure to the online world and the Internet, I would start by reminding them of back-of-the-book indexes and library catalogs and say I create things like those. But, rather than for the information within a single book or for books within a library, I create index terms for all the varied kinds of documents companies or organizations have, which nowadays are electronic files, and people look up the terms on the computer to retrieve the documents they need.

While the younger generation is very familiar with the Web and search engines, they have less familiarity with indexes and subject headings. So, to explain to my kids what I do, I simply show them a website with browsable hierarchical taxonomies (such as an ecommerce site). I have also shown them the controlled vocabularies used in periodical databases on our public library website, which I hope they will use when they need to write research papers for school.

You have published a cleverly named book, The Accidental Taxonomist. What inspired you to write this book? Where did you get the title?

The core content of the book started as a 5-week online workshop, “Taxonomies and Controlled Vocabularies,” that I began offering in spring 2008 through the continuing education program of Simmons College Graduate School of Library and Information Science (and which I continue to offer today). One of my former students asked if I would offer an advanced course in taxonomies. After pondering the idea, I realized that the market for an advanced course on taxonomies would be too small to justify the ongoing commitment, but a book could contain both basic and advanced topics. My course alone did not have enough material for a book, but I could supplement the basic information with advanced topics, background information, and other related issues.

Information Today Inc. was my publisher of choice, because I had already written a book published by Information Today for the professional association, the American Society of Indexing. As I was filling out the book proposal form, I had to answer a question about how my book would be differentiated from others in the field. I was going to supplement my content on how to create taxonomies with a chapter or two on the profession of taxonomists, so I thought this would be an additional way to position the book. Perhaps “taxonomist” instead of “taxonomies” could be in the title. I then recalled a case study I had recently read about an enterprise taxonomy, in which the corporate librarians (whose primary job is to conduct research) were asked to create the taxonomy, and it occurred to me that many people who get into taxonomies do so by accident. Information Today actually has a series of titles The Accidental...[information specialist profession], so The Accidental Taxonomist made perfect sense.

In a recent blog post, you stated: “Content management and content management systems focus on processes, and that it’s a good way to look at taxonomies, too.” Can you explain what you mean by that?

Content management systems help manage content over its entire life cycle, which involves various processes of planning, authoring, editing, tagging, possibly translating, publishing, and archiving content. Compared to the content they help organize or index, taxonomies are relatively stable, but they are not static. The management of taxonomies also needs to follow a life cycle. Taxonomies are planned and designed, developed and edited, possibly translated, published or implemented, used in tagging, then used in browsing and searching, and finally reviewed and analyzed for further revision. Governance is also important for both content management and taxonomy management, and governance plans consider not only policies but also procedures, such as how frequently and by whom the taxonomy should be reviewed and what the procedures are for approving changes.

How does taxonomy apply to Digital Asset Management? Is it a different beast, or just a variation of the content management animal?

Even though the content of content management is digitized these days, the designation “digital assets” refers specifically to non-text content, such as image, audio, or video files. Because digital assets are not text, search technologies based on keyword matches and text analytics won’t work for retrieving digital asset files, making manual tagging a necessity and taxonomies (as in a controlled vocabulary of terms for tagging) critical. Each digital asset has various metadata associated with it, such as: medium or format type, creator, location, subject-person, subject-topic, copyright status/owner, purpose or audience, and perhaps a related product or service line. Each of these metadata fields should have its own controlled vocabulary of possible terms that can be assigned, rather than allowing the assignment of any uncontrolled and inconsistent words and phrases, which could result in different spellings of the same name, or different synonyms for the same topic.
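To make the idea of one controlled vocabulary per metadata field concrete, here is a minimal Python sketch. The field names, the vocabulary terms, and the `validate_tags` function are all hypothetical and invented for illustration; they are not taken from any particular digital asset management product.

```python
# Hypothetical controlled vocabularies, one per metadata field.
# Real vocabularies would be much larger and maintained by a taxonomist.
CONTROLLED_VOCABULARIES = {
    "format": {"image", "audio", "video"},
    "copyright_status": {"owned", "licensed", "public domain"},
    "subject_topic": {"product launch", "trade show", "training"},
}


def validate_tags(asset_metadata):
    """Return the (field, value) pairs that are not allowed.

    A value is rejected when it is missing from the field's controlled
    vocabulary, or when the field has no vocabulary defined at all.
    """
    rejected = []
    for field, value in asset_metadata.items():
        allowed = CONTROLLED_VOCABULARIES.get(field, set())
        if value not in allowed:
            rejected.append((field, value))
    return rejected


# A differently spelled or capitalized term is caught instead of being
# stored as an inconsistent tag.
print(validate_tags({"format": "Video", "copyright_status": "licensed"}))
```

Matching is deliberately exact in this sketch, so a variant such as "Video" is flagged for a person to correct. A fuller version might map known synonyms to the preferred term instead of rejecting them.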

While digital assets may be considered a special type of content, digital asset management systems are not simply a type of content management system. There are more kinds of metadata fields for digital assets, such as photographic specifications, and there also need to be features for digital rights management. While there is a lot of overlap, the vendors of such systems make a point of differentiating their products.

What are some of the differences between external facing taxonomies and internal facing taxonomies? How does your approach to building the two different types differ?

External-facing taxonomies include those for ecommerce, library periodical article databases, subscription services, and public web sites of government agencies, organizations, and large companies. Internal-facing taxonomies are those used by employees of a company or organization to access internal content. As a generalization, internal users are more familiar with the content they are accessing than external users, as they access it more often. They also are familiar with the meanings of a greater number of more specific taxonomy terms. Therefore, a taxonomy for internal users could potentially be larger and more specific or granular than for external users. Internal users can also deal with a more complex taxonomy, because they can be required to take training to use it, whereas external users are unlikely to make use of any support materials or tutorials.

Keep in mind that the same taxonomy may be used for both internal and external content, or an external taxonomy may be a subset of a larger internal taxonomy.

In addition to differences in the characteristics of the taxonomy, there are also differences in the tasks involved in creating a taxonomy for external versus internal users. For an external-facing taxonomy, taxonomists can look at comparative websites of similar organizations, which are publicly available. For an internal-facing taxonomy, taxonomists can easily access the actual users, by interviewing employees.

As a consultant, you must see many different types of organizations, because taxonomy is applicable across just about every industry in one way or another. Tell us about one of your favorite projects, and what you liked about it.

One of my favorite recent projects has been to develop subject discipline taxonomies for a textbook publisher which is in the process of digitizing its textbooks and creating content units the size of sections within a chapter of a book. I like this project, because I can get a little deeper into a subject area and build a large taxonomy, not just design the high level. I like learning about new subject areas, not just learning about how different companies operate. Working with educational content is also a particular interest of mine. Finally, the taxonomy will serve both external and internal users.

It seems as though a taxonomist has to adapt often, particularly when working in an unfamiliar industry. Let’s say I threw you in a large, international video game company and you needed to manage the internal taxonomy for the enterprise and there was no current system in place. What would you have to do to tackle this project successfully if you initially had no knowledge of the video game industry?

No matter the industry, there are some similarities among most enterprise taxonomies. Typically such a taxonomy is divided among facets for department, document type, product category, and perhaps line of business or market segment. For an international company, geographic area would also be a facet. Each department would have its own set of topical terms, and I would interview stakeholders in each department to find out what the information organization and retrieval needs are. I would obtain ideas for department topical terms, such as from product development, sales, marketing, legal and licensing, finance, human resources, and IT. A separate faceted taxonomy would be needed for just the products, which would include attributes for genre category, game type category, product features, audience/rating, etc. In addition to speaking with product managers, I can also do some research on the websites of competitors in the industry.
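As a rough illustration of how facets like these work at retrieval time, the Python sketch below tags a few documents with one term per facet and filters on any combination of them. The documents, the facet names, and the `filter_by_facets` helper are hypothetical, made up only to show the mechanism.

```python
# Hypothetical documents, each tagged with one term per facet.
DOCUMENTS = [
    {"title": "Licensing agreement", "department": "legal and licensing",
     "document_type": "contract", "geographic_area": "Europe"},
    {"title": "Launch campaign brief", "department": "marketing",
     "document_type": "brief", "geographic_area": "Europe"},
    {"title": "Regional sales report", "department": "sales",
     "document_type": "report", "geographic_area": "Asia"},
]


def filter_by_facets(documents, **selected):
    """Return the documents that match every selected facet term."""
    return [
        doc for doc in documents
        if all(doc.get(facet) == term for facet, term in selected.items())
    ]


# Selecting terms in two facets narrows the results to documents that
# carry both terms.
matches = filter_by_facets(DOCUMENTS, department="marketing",
                           geographic_area="Europe")
print([doc["title"] for doc in matches])
```

Because each facet is independent, users can combine selections in any order, which is one reason enterprise taxonomies are often divided this way.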

If science phased out the necessity for sleep and you could have a second profession after your day as a taxonomist, what would you do and why?

I enjoy taxonomy work, and as a consultant/independent contractor, I could also take on more projects if I had more time. Each project is unique and has intellectual challenges, and I learn about something new. I have also done freelance back-of-the-book indexing, and I would probably take on more of that, if I had more time. Creating indexes is similar to creating taxonomies, but has its own unique analytical challenges.

Category: Industry Insight
Tagged: Insight, Nuxeo Community