In earlier days, pieces of information were mainly physical objects, i.e. not computer-indexable. Books are one example of such material.
A common approach from the information science point of view is to assign a set of keywords to each book. When you want to retrieve a book later, you choose some keywords and look them up in a catalogue, which in turn refers to the books themselves.
To handle all this, you need at least three kinds of storage:
- a storage for the books, e.g. a library, organized so that you can at least locate the shelf a particular book is placed on
- a storage for the catalogue
- and, most important, a storage for the keywords.
So, what follows from that?
Assume you assigned two synonymous but different keywords, A and B, to two books X and Y that are very similar in content: X is filed under A, Y under B. One day you want to know something about a topic covered by both books, but you don't know that both exist. You go directly to the catalogue and pick whichever search keyword happens to come to mind, say B.
You look up the corresponding catalogue card and find Y. You never become aware that there is a closely related book X. So you fetch Y, while X remains on the shelf. It might have been valuable to find X as well.
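The missed book can be reproduced in a few lines. The sketch below is only an illustration, assuming the catalogue is modelled as a plain dictionary from keyword to book titles. The names `catalogue` and `look_up`, and the concrete keywords "automobile" and "car" standing in for A and B, are made up for this example.

```python
# Hypothetical catalogue: each keyword (one catalogue card) lists the books filed under it.
# "automobile" plays the role of keyword A, "car" the role of keyword B.
catalogue = {
    "automobile": ["X"],
    "car": ["Y"],
}


def look_up(keyword):
    """Return the books filed under exactly this keyword."""
    return catalogue.get(keyword, [])


found = look_up("car")
print(found)  # only Y; the closely related X stays on the shelf
```

Nothing in this structure knows that the two keywords mean the same thing, which is exactly the gap a keyword storage is meant to close.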
Therefore the keywords themselves get stored too.
The main goal of a keyword storage is similar to that of the other storages: to be able to find the wanted contents, here the keywords, and to find exactly the keywords wanted. "Keywords wanted" are those that might be applied to one or more books.
The appropriate tool for storing keywords or, more precisely, terms (as in "search term") is a terminology. It lists every word that has been applied to at least one piece of information (e.g. a book) in the information storage (e.g. a library). (Conversely, to keep the administrative workload small, it is advisable to keyword books only with terms already listed in the terminology.)
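The suggestion in parentheses, to keyword books only with terms the terminology already lists, can be sketched as a small check. This is a minimal illustration, assuming the terminology is a simple set of terms. `terminology`, `catalogue` and `assign_keyword` are hypothetical names, not part of any real library system.

```python
terminology = {"dog", "cat", "mammal"}  # hypothetical list of approved terms
catalogue = {}                          # keyword -> list of books


def assign_keyword(book, keyword):
    """File a book under a keyword, but only if the terminology lists it."""
    if keyword not in terminology:
        raise ValueError(f"{keyword!r} is not in the terminology")
    catalogue.setdefault(keyword, []).append(book)


assign_keyword("X", "dog")
print(catalogue)  # {'dog': ['X']}
```

Rejecting unknown keywords is only one possible policy; a real system might instead offer to add the new term to the terminology.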
In the simplest case, a terminology might be an alphabetical list of terms. A taxonomy is even better: for each term it offers orientation, usually in the form of broader and narrower terms. A mammal is considered broader than a dog, a cat, a cow, a horse, or anything else that is a mammal. (Strictly speaking, taxonomies refer to items but list the labels of those items. Taxonomies are closely related to ontologies.)
So a taxonomy offers "is a" relationships to identify the location of a given term within the whole taxonomy. Less common than "is a" are "has a" relationships, like the ones between a car and its wheels, motor, windscreen, doors etc. Both kinds are called hierarchical relationships.
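How "is a" links locate a term can be sketched as follows. This is a minimal illustration, assuming each term has at most one broader term. The dictionary `broader`, the helper names, and the extra top term "animal" are assumptions made for the example.

```python
# Hypothetical "is a" links: each term points to its broader term.
# "animal" is an invented top term, added only to show a second level.
broader = {
    "dog": "mammal",
    "cat": "mammal",
    "cow": "mammal",
    "horse": "mammal",
    "mammal": "animal",
}


def path_to_root(term):
    """Follow the broader-term links upwards and return the whole chain."""
    path = [term]
    while path[-1] in broader:
        path.append(broader[path[-1]])
    return path


def narrower(term):
    """Return all terms that name `term` as their broader term."""
    return sorted(t for t, b in broader.items() if b == term)


print(path_to_root("dog"))  # ['dog', 'mammal', 'animal']
print(narrower("mammal"))   # ['cat', 'cow', 'dog', 'horse']
```

A "has a" hierarchy could be stored the same way, with a second dictionary mapping each part to the whole it belongs to.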
My diploma thesis was about the thesaurus kind of taxonomy, so I am currently not sure whether this applies to the classification kind as well: there is at least one more kind of relationship, the associative one. It relates terms that do not belong to a hierarchical order but somehow have something to do with each other, like bird and bird cage. (Admittedly, they might be related using a "has a" relationship, but I never came across such an assignment.)
Synonyms in taxonomies get treated in a special way: they get collected into a single class. Each item of that class is synonymous with every other item of the class (or is treated as such; compare the administrators' cheat in the glossary). In classifications there are just classes related to each other, each class standing for the terms that belong to it. In thesauri there are no such classes. During thesaurus creation, sets of synonymous terms get identified. One "most significant/common" term is chosen to represent all the other synonyms. This term is called the "descriptor", while the others become non-descriptors.
Why all the fuss about the synonym details?
If you look up a synonym, you get redirected to the class or descriptor the synonym belongs to. No book or other piece of information ever gets keyworded with a synonym, only with the class or descriptor. That solves the problem of searching for the books X and Y mentioned above one day with the search term A and another day with B: either A points to B, or B points to A, or both point to a third term, say C. Both books are keyworded with the "most significant/common" term, e.g. C. So whether you choose A or B as your search term, you always find all the relevant books or other pieces of information, e.g. X and Y.
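The redirect can be sketched in a few lines. This is a minimal illustration of the thesaurus variant, assuming both A and B point to a third term C as descriptor. The names `use_instead`, `descriptor_of` and `search` are hypothetical.

```python
# Hypothetical thesaurus: every non-descriptor points to its descriptor.
# A and B are the synonymous search terms, C is the chosen descriptor.
use_instead = {"A": "C", "B": "C"}

# Books are keyworded with descriptors only, never with a synonym.
catalogue = {"C": ["X", "Y"]}


def descriptor_of(term):
    """Redirect a synonym to its descriptor; a descriptor maps to itself."""
    return use_instead.get(term, term)


def search(term):
    """Look up the books filed under the descriptor behind the search term."""
    return catalogue.get(descriptor_of(term), [])


print(search("A"))  # ['X', 'Y']
print(search("B"))  # ['X', 'Y']
```

Because the redirect happens before the catalogue lookup, it no longer matters which of the synonyms comes to mind first.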
So much for the common part of terminologies.
But there is a usability problem in the widespread "is a" approach.
Glossary
- to retrieve: to find again
- class: a set of synonymous words
- synonym: a word meaning the same as another word
  - sometimes there is a difference between the two, but the one applying them doesn't notice
  - terminology administrators sometimes think it is not worth the effort to keep "very similar" terms distinct, so they merge them into a single class by claiming the words are synonyms
- catalogue card: associations between keywords are stored on catalogue cards which, as a whole, make up the catalogue itself
Updates: 20070624: Tagged the posting. Updated the posting style (layout) to my current style, such as using blockquotes when appropriate, more precise word picks, better grammar.