Dr. Tom's Taxonomy Guide
Purpose of Document

The purpose of this document is to define taxonomies and their uses. A selection of useful taxonomies is provided.

Document Information
Contents

1. Description of a Taxonomy

A taxonomy is a knowledge map of a topic, typically realized as a controlled vocabulary of terms and/or phrases. A taxonomy is an orderly classification of information according to presumed natural relationships. Denise A. D. Bedford, Ph.D., of the World Bank enumerates four types of taxonomies: flat, hierarchical, faceted and network (Taxonomies for Information & Knowledge Management Architectures, URL: http://www.sla.org/chapter/cdc/presentations/20030204_taxonomies.ppt). Hierarchical is the most common form.

A vocabulary is the simplest form of taxonomy. It has only one level, and comprises a list of allowable terms or phrases. The terms may have identifiers such as numbers and/or letters. The IMS meta-data field general.difficulty has a vocabulary of four levels that are normally indicated by the numeric values 0-4.

The most typical form of a taxonomy is a hierarchy. At the top level, general terms or descriptive phrases are used. Each of the general terms has beneath it a set of terms that refine the top-level term. Each of these second-level terms may in turn have a set of refining terms beneath it. Frequently each term has an alphanumeric identification. As an example, the top-level terms from the Library of Congress Classification Outline are:
The category "B -- PHILOSOPHY. PSYCHOLOGY. RELIGION" has an extensive set of sub-categories. Among them is F, Psychology. Psychology is further divided, and includes 180-198.7, Experimental psychology. These levels can be shown in outline form:

    B -- PHILOSOPHY. PSYCHOLOGY. RELIGION
        F, Psychology
            180, Experimental psychology

Those familiar with the "Sniffyv1p1.xml" meta-data record example will recognize this classification. Note that each level includes both an index, e.g., B, and a term or phrase, e.g., PHILOSOPHY. PSYCHOLOGY. RELIGION. Not all taxonomies contain both.

2. Uses of Taxonomies

A taxonomy provides a controlled vocabulary for populating fields or taxonpaths. A taxonomy may be used for two major reasons: 1) to limit the choices of field values to a controlled set; and 2) to use terms that are defined by a known source. As noted above, the simplest taxonomy is a controlled vocabulary. For example, the IMS Meta-Data general.structure field (1.8) has a restricted vocabulary (i.e., Collection, Mixed, Linear, Hierarchical, Networked, Branched, Parceled, Atomic) from which the single field value can be drawn. The source of this vocabulary is the IEEE LTSC LOM; IMS maintains a mirror of that vocabulary. The Taxonomy and Vocabulary Guide section of the IMS Meta-Data Best Practices Guide (mdbestv1p1.html) provides a description of the use of taxonomies, which this document serves to supplement.

A more complete example of the use of taxonomies is in the IMS Meta-Data Classification category. For review, the organization of the Classification category (9) is:

    classification
        purpose
        taxonpath
        description
        keywords

Here, "purpose" refers to the purpose of the classification, not the purpose of the resource. For example, a purpose of "discipline" means that this instance of classification describes the discipline, or subject area, of the resource.
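The first use named in this section, limiting a field to a controlled set of values, can be sketched as a simple membership check. This is only an illustration: the function name `is_allowed` and the case-insensitive comparison are assumptions made for the example, not part of the IMS or LOM specifications. The vocabulary itself is the general.structure list quoted in this section.

```python
# Vocabulary for the IMS Meta-Data general.structure field (1.8),
# copied from the list given in the text.
STRUCTURE_VOCABULARY = (
    "Collection", "Mixed", "Linear", "Hierarchical",
    "Networked", "Branched", "Parceled", "Atomic",
)


def is_allowed(value, vocabulary=STRUCTURE_VOCABULARY):
    """Return True if value is one of the allowed terms.

    Assumption for this sketch: comparison ignores case and
    surrounding whitespace. A real cataloging tool may be stricter.
    """
    allowed = {term.lower() for term in vocabulary}
    return value.strip().lower() in allowed


if __name__ == "__main__":
    for candidate in ("Linear", "Circular"):
        print(candidate, is_allowed(candidate))
```

A cataloging tool built along these lines would reject any field value that is not drawn from the source vocabulary.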
A cataloger might select the Library of Congress Classification (LCC) to describe the discipline (or subject; the name "discipline" was selected to reduce confusion internationally) of the resource. The LCC is a taxonomy; thus, it can be used to populate a taxonpath. The structure of an IMS meta-data taxonpath is:

    taxonpath
        source
        taxon
            id
            entry
        taxon
            id
            entry
        taxon
            id
            entry

The source specifies the controlling authority for the taxonomy used; in this example, the LCC. The multiple taxons comprise an ordered list. The number of taxons in the list can range from 1 upward; the specification states that at least 16 should be supported (the minimum maximum). Each taxon in the list contains one value. The sub-terms refine the descriptions of the parent term.

Each taxon has an id and an entry. The id is an alphanumeric reference. Each taxon node in a taxonomy has a descriptive term, which is contained in the entry. Beneath each taxon node is a selection of sub-terms from which a value can be selected. Each sub-term is also a taxon node. Within the IMS Meta-Data taxonpath, only one value (taxon) can be selected at each level; thus the list of taxons is a specific pathway down through the source taxonomy.

Each classification instance may contain multiple taxonpaths. For example, a cataloger may choose to use several discipline taxonomies to describe a resource, or may choose to describe the resource's discipline through several taxonpaths within the same taxonomy if the resource covers more than one discipline or sub-discipline.

Continuing with our Sniffy example, a classification taxonpath describing Sniffy could appear as follows:

    taxonpath
        source: LCC
        taxon
            id: B
            entry: PHILOSOPHY. PSYCHOLOGY. RELIGION
        taxon
            id: F
            entry: Psychology
        taxon
            id: 180
            entry: Experimental psychology

The concatenated id is: BF 180.

A resource may be classified several ways by repeating the classification category with several purposes.
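The classification structure described above (a purpose plus one or more taxonpaths, each an ordered list of taxons) can be sketched as a small data model. This is a rough illustration, not an IMS binding: the class names `Taxon`, `TaxonPath`, and `Classification` and the method `concatenated_id` are hypothetical. The rule of inserting a space before a numeric id is an assumption chosen only so that the Sniffy example yields "BF 180".

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Taxon:
    id: str     # alphanumeric reference, e.g. "B"
    entry: str  # descriptive term for this node


@dataclass
class TaxonPath:
    source: str  # controlling authority, e.g. "LCC"
    taxons: List[Taxon] = field(default_factory=list)  # ordered, general to specific

    def concatenated_id(self) -> str:
        """Join the taxon ids along the path.

        Assumption for this sketch: LCC-style formatting, with a
        space inserted before any id that starts with a digit.
        """
        result = ""
        for taxon in self.taxons:
            if result and taxon.id[0].isdigit():
                result += " "
            result += taxon.id
        return result


@dataclass
class Classification:
    purpose: str  # purpose of the classification, e.g. "discipline"
    taxonpaths: List[TaxonPath] = field(default_factory=list)
    description: str = ""
    keywords: List[str] = field(default_factory=list)


# The Sniffy taxonpath from the text.
sniffy_path = TaxonPath(
    source="LCC",
    taxons=[
        Taxon("B", "PHILOSOPHY. PSYCHOLOGY. RELIGION"),
        Taxon("F", "Psychology"),
        Taxon("180", "Experimental psychology"),
    ],
)
sniffy_discipline = Classification(purpose="discipline", taxonpaths=[sniffy_path])

if __name__ == "__main__":
    print(sniffy_path.concatenated_id())  # BF 180
```

Because a resource can repeat the classification category, a record in this sketch would simply hold a list of `Classification` objects, one per purpose.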
For example, a resource may have a classification with a purpose of "discipline" to describe its subject area using the LCC, and may also have a purpose of "Educational Objective" to describe the educational objectives of the resource using the McRel taxonomy (see below).

Taxonomies may also be the sources of structured controlled vocabularies for other IMS implementations, such as the educational level, skills and goals within the Learner Profile.

3. A Selection of Useful Taxonomies

IMS does not endorse any particular taxonomy or set of taxonomies. Nor do I. A selection of useful taxonomies is provided below. This selection is not exhaustive. Use of taxonomies from this selection is not required. Organizations may choose to create their own taxonomies.

LCSH: Library of Congress Subject Headings (Introduction: http://www.tlcdelivers.com/tlc/crs/shed0014.htm)
There are no online versions of the LCSH. It is a set of 5 volumes.

"The Library of Congress subject headings system was originally designed as a controlled vocabulary for representing the subject and form of the books and serials in the Library of Congress collection, with the purpose of providing subject access points to the bibliographic records contained in the Library of Congress catalogs."

"As an increasing number of other libraries have adopted the Library of Congress subject headings system, it has become a tool for subject indexing of library catalogs in general."

"In recent years, it has also been used as a tool in a number of online bibliographic databases outside of the Library of Congress."

LCC: LIBRARY OF CONGRESS CLASSIFICATION OUTLINE
(http://pharos.alexandria.ucsb.edu/demos/lcc.html,
http://lcweb.loc.gov/catdir/cpso/lcco/lcco.html,
http://geography.about.com/science/geography/library/congress/bllc.htm)
The LCC is a subject taxonomy maintained by the US Library of Congress. It is a good subject taxonomy, but is US-centric. Its widespread use makes it an attractive choice.

GEM: Gateway to Educational Materials (http://www.geminfo.org/Workbench/Workbench_vocabularies.html)
Subject is a mandatory element for any GEM resource. GEM also offers controlled vocabularies (the equivalent of a taxonomy) for this element. Two levels of GEM controlled vocabularies are offered, the first approximating a top-level discipline taxonomy. The second level provides more detailed descriptions and is not, technically, a taxonomy. GEM also permits the use of other controlled vocabularies, such as ERIC and NICEM. These are optional, in contrast to GEM's own. The level-one GEM controlled vocabulary is mainly oriented to K12, although GEM does recognize the following "grade levels" (another element) of educational materials: K12, Adult/continuing education, Higher education, Preschool education, Vocational education.

YAHOO (http://www.yahoo.com/Education/By_Subject/)
Under Education, this portal site uses just a few top-level educational subjects. Each has a rich decomposition, although not necessarily in terms of sub-disciplines.

McRel (http://www.mcrel.org/standards-benchmarks/)
McRel provides databases (and services) to help users access information about educational materials (primarily K12). They organize their content knowledge primarily in terms of standards (educational objectives; over 250) and benchmarks (specific grade-indexed skills; almost 4000). At the top level, however, they also use a subject taxonomy (14 terms); these are, in effect, mandatory elements.

CIP: Classifications of Instructional Programs
(DOL/CVU) (http://nces.ed.gov/npec/papers/cipPreface.html)
The US Department of Labor, Employment and Training Administration uses CIP (Classifications of Instructional Programs) codes in their ALMIS (America's Labor Market Information System) database. The CIP codes are also used by NCES. In addition, they have been adopted by the CVU (see http://www.california.edu/catalogs_prog_cat.asp). This is not the same classification used in another DoL database, America's Job Bank. The subject list in the table reflects the first level (2 digits of 6) of the CIP codes. Many databases use the sub-discipline categorizations as well as the first-level terms.

Career Resource Library, America's Job Bank, US Department of Labor (http://www.acinet.org/acinet/resource/occup/occup.htm)
America's Career InfoNet site displays occupational information with a
two-level occupational taxonomy. The resource library taxonomy contains
online career information arranged under broad subject categories.

Taxonomy of Educational Technology (http://www.lis.uiuc.edu/~chip/pubs/taxonomy/index.html)
A Taxonomy of Media for Inquiry, Communication, Construction, and Expression from the College of Education, University of Illinois at Urbana-Champaign.

2000 Mathematics Subject Classification (http://www.ams.org/msc/)
"The Mathematics Subject Classification (MSC) is used to categorize items covered by the two reviewing databases, Mathematical Reviews (MR) and Zentralblatt MATH (Zbl). The MSC is broken down into over 5,000 two-, three-, and five-digit classifications, each corresponding to a discipline of mathematics (e.g., 11 = Number theory; 11B = Sequences and sets; 11B05 = Density, gaps, topology)."

"The current classification system, 2000 Mathematics Subject Classification (MSC2000), is a revision of the 1991 Mathematics Subject Classification, which is the classification that has been used by MR and Zbl since the beginning of 1991. MSC2000 is the result of a collaborative effort by the editors of MR and Zbl to update the classification."

American Mathematics Metadata Task Force (http://mathmetadata.org/ammtf/taxonomies/)
Proposed subject classifications for school and college mathematics.

Medical Subject Headings
"The Medical Subject Headings comprise the National Library of Medicine's
controlled vocabulary used for indexing articles, for cataloging books
and other holdings, and for searching MeSH-indexed databases, including
MEDLINE. MeSH terminology provides a consistent way to retrieve information
that may use different terminology for the same concepts." This is a large
taxonomy (21 MB).

Many of the terms in this guide are defined in the Glossary.

Author: http://www.tomwason.com