Unstructured data is no reason to make a mess of it

Classification & Search

The foundation for integration

Try to imagine that all systems in a landscape use the same set of reference values or metadata. Systems or platforms like:

  • SharePoint landscape
  • CRM system
  • HR system
  • WordPress site
  • etc.

In every one of these systems, the same list boxes are offered to enrich files or information objects. Connecting all content across the entire landscape with a search engine then becomes a piece of cake.

Integration is no longer a matter of complex links. What matters is that all systems derive the same meaning from values such as apples and pears. Once they do, the information landscape is upgraded to an organic whole in a fairly simple way.
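As an illustration of the idea, here is a minimal Python sketch. The vocabulary, the record layout, and the function names are all assumptions made up for this example; they are not taken from any of the systems listed.

```python
# Hypothetical shared vocabulary: every system picks values from this one list.
SHARED_FRUIT_TERMS = {"apple", "pear"}

# Hypothetical information objects living in three different systems.
records = [
    {"system": "SharePoint", "title": "Orchard report", "fruit": "apple"},
    {"system": "CRM", "title": "Customer order 17", "fruit": "pear"},
    {"system": "WordPress", "title": "Harvest blog post", "fruit": "apple"},
]


def uses_shared_term(record):
    """Return True if the record's value comes from the shared vocabulary."""
    return record["fruit"] in SHARED_FRUIT_TERMS


def search(all_records, term):
    """Find items across all systems that carry the same shared term."""
    return [
        (r["system"], r["title"])
        for r in all_records
        if uses_shared_term(r) and r["fruit"] == term
    ]


apple_hits = search(records, "apple")
print(apple_hits)
```

Because each system stores the same value for the same meaning, one simple filter is enough to connect the content; no per-system translation layer is needed.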

Metadata, a child can do it!

The importance of metadata for your organization

Auto classification

Jargon

Structured versus unstructured data

Taxonomy

A word that originates from Greek: a combination of táxis (order, arrangement) and nómos (use, rule, law). It is the science of arranging individuals or objects into groups (taxa; singular: taxon).

The term taxonomy can be used both for the method of arranging concepts and for the hierarchical ordering that results from it. Both such a hierarchical ordering and the activity of arriving at it are called classification. Almost everything can be organized or structured in a taxonomy: life and living organisms, tools, goods, all kinds of things, books, topography, administrative structures, events, etc.

Taxonomy in technology

In computer science, there is a growing need for common terminology in systems and databases, for instance to integrate data from various systems and to exchange product data unambiguously, as in e-business systems and knowledge-driven design. To enable this, standardized definitions of concepts are used, with the terms arranged in a subtype-supertype hierarchy or taxonomy. Among other things, this structure has the great advantage that properties of supertypes are inherited by subtypes.
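The inheritance of properties in a subtype-supertype hierarchy can be sketched as follows. This is a minimal illustration in Python; the Concept class and the example concepts (product, tool, hammer) are assumptions invented for this sketch.

```python
class Concept:
    """A node in a subtype-supertype hierarchy (hypothetical helper class)."""

    def __init__(self, name, parent=None, properties=None):
        self.name = name
        self.parent = parent  # the supertype, or None for the root
        self.own_properties = dict(properties or {})

    def all_properties(self):
        """Own properties plus everything inherited from the supertypes."""
        inherited = self.parent.all_properties() if self.parent else {}
        # Properties defined on the subtype override inherited ones.
        return {**inherited, **self.own_properties}


product = Concept("product", properties={"has_price": True})
tool = Concept("tool", parent=product, properties={"needs_manual": True})
hammer = Concept("hammer", parent=tool, properties={"material": "steel"})

hammer_properties = hammer.all_properties()
print(hammer_properties)
```

The hammer only declares its material, yet it also carries the properties defined on tool and product, which is the advantage described above.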

In recent years, attempts have been made in computer science and artificial intelligence to create and maintain taxonomies from a set of concepts. An example is the automatic classification of a group of documents, for instance in digital libraries. Notably, this field distinguishes between taxonomy and typology. The difference lies mainly in how the classification is established. In a taxonomy, you start from a group of sample objects and divide them. You then observe which characteristics each object has and place it in a hierarchy on the basis of overarching features. This process shapes the taxonomy.

In a typology, one starts from the concept. One considers which distinctive characteristics the objects might normally have, and then classifies the actual objects in accordance with these rules. Example: Dutch cities can be divided by province (such as cities in Limburg, Holland, or Noord-Brabant) and by population: cities with more than 500,000 inhabitants, cities with a population of 250,000 to 500,000, or other combinations.

Most groups of objects can be classified in different ways. However, some typologies are considered better than others. A typology with empty categories (e.g. cities in Limburg with more than 500,000 inhabitants) can be considered a weak typology. On the other hand, too many objects in a single category also make for a bad typology.
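A typology defines its categories up front, so empty categories are easy to detect. The sketch below is a minimal Python illustration; the city names and population figures are placeholders invented for the example, not real data, and the third population band is an assumption added to cover the remaining cities.

```python
from itertools import product

PROVINCES = ["Limburg", "Noord-Brabant"]

# Population bands defined before looking at any actual city.
BANDS = {
    "over 500,000": lambda n: n > 500_000,
    "250,000 to 500,000": lambda n: 250_000 <= n <= 500_000,
    "under 250,000": lambda n: n < 250_000,  # assumed extra band
}

# Placeholder cities with made-up populations (illustrative only).
cities = [
    ("City A", "Limburg", 120_000),
    ("City B", "Limburg", 90_000),
    ("City C", "Noord-Brabant", 260_000),
    ("City D", "Noord-Brabant", 180_000),
]


def classify(city_list):
    """Place every city in its predefined (province, band) category."""
    categories = {key: [] for key in product(PROVINCES, BANDS)}
    for name, province, population in city_list:
        for band, matches in BANDS.items():
            if matches(population):
                categories[(province, band)].append(name)
                break
    return categories


categories = classify(cities)
empty = [key for key, members in categories.items() if not members]
print(empty)
```

With this placeholder data, several categories stay empty, which is the signal that the chosen typology fits the objects poorly.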

The terms typology, classification system, and taxonomy are often treated as synonyms. In psychology, computer science, and artificial intelligence, however, a distinction is made between them. The difference lies in how they are constructed: a taxonomy is empirical, a typology is conceptual.

Concepts that are related in a typology may not be related in a taxonomy. Suppose you define a typology of gifts to bring when visiting a sick colleague: you would expect concepts such as apples, pears, flowers, and crossword puzzle magazines.

You are unlikely to find those concepts combined in a taxonomy.

Typology

Folksonomy

Categories
Tags

A folksonomy is a system in which users apply public tags to online items, usually to help them find those items again. This practice is also referred to as collaborative/social tagging, social classification, or social indexing.

Folksonomy, when it was “invented”, was originally “the result of personal free tagging of information for your own use”. The boundary between folksonomy and social tagging (tagging in an open online environment where each user's tags are visible to other users) is becoming blurred. Folksonomy is often used in cooperative and collaborative projects, such as research, content repositories, and social bookmarks.

The term folksonomy is a blend of the words folk and taxonomy.

If you define taxonomy as a form of managed metadata, folksonomy is the opposite: it is just a container of unordered terms. But if you can track how often each term is used, you can find the terms that are meaningful for an organization and, provided you keep the folksonomy under supervision, promote those words to the taxonomies.
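Promoting frequently used tags could look roughly like this. It is a minimal Python sketch: the tags, the existing taxonomy terms, and the threshold of three uses are assumptions chosen for the example.

```python
from collections import Counter

# Hypothetical managed terms that already exist in the taxonomy.
TAXONOMY = {"invoice", "contract"}

# Hypothetical free tags that users attached to five items.
user_tags = [
    ["invoice", "urgent"],
    ["urgent", "q3"],
    ["contract", "urgent"],
    ["q3", "todo"],
    ["urgent"],
]


def promotion_candidates(tagged_items, taxonomy, min_uses=3):
    """Return free tags used at least min_uses times that are not yet managed."""
    usage = Counter(tag.lower() for tags in tagged_items for tag in tags)
    return [
        tag
        for tag, count in usage.most_common()
        if count >= min_uses and tag not in taxonomy
    ]


candidates = promotion_candidates(user_tags, TAXONOMY)
print(candidates)
```

The result is only a list of suggestions; a person still decides whether a candidate really belongs in the taxonomy.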

Examples:

  • Twitter hashtags
  • Instagram
  • WordPress

In many systems or (social media) platforms, folksonomies can be presented in tag clouds.

In a classical sense, a thesaurus is a kind of reference work. A thesaurus is used to find the exact word for an object, a particular technical term, or a word with the desired connotation (style considerations).

In modern times it is a tool that connects unique concepts through hierarchical, equivalence, and associative relationships. The term comes from Greek and means treasure. It was initially established in linguistics as a logical-systematic (and alphabetical, but not explanatory) dictionary in which the concepts of a language were categorized and compared with related concepts:

  • Synonyms; words that have a similar meaning. Sometimes people use the term data dictionary as a synonym for thesaurus.
  • Hyperonyms; words that describe a broader concept. Lexicon has a broader meaning than thesaurus.
  • Hyponyms; words that have a narrower meaning. Thesaurus has a narrower meaning than lexicon.
  • Antonyms; words with the opposite meaning.

The term “thesaurus” is also used for a reference book with a specialized vocabulary within a particular interest or profession, such as medicine or music. With the help of a thesaurus, a catalog (of a library, for example) becomes more accessible than it would be through an ultimately arbitrary arrangement.

For categorization and reference, one is not strictly bound to the terms (and language) used in the item itself. This holds for a book, but also for media such as video or sound that contain no text or metadata.

A thesaurus can even assign several terms per publication or information item.
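The relationships described above can be stored in a small data structure and used to widen a search. The following Python sketch is illustrative only: the dictionary layout and the expand_query function are assumptions, and the entries simply reuse the thesaurus, lexicon, and data dictionary examples from the list.

```python
# Hypothetical thesaurus: each concept lists its related terms per relationship.
THESAURUS = {
    "thesaurus": {
        "synonyms": ["data dictionary"],
        "broader": ["lexicon"],
        "narrower": [],
    },
    "lexicon": {
        "synonyms": [],
        "broader": [],
        "narrower": ["thesaurus"],
    },
}


def expand_query(term, thesaurus):
    """Return the term plus its synonyms and narrower terms.

    Broader terms are left out on purpose: they would widen the
    search beyond what the user asked for.
    """
    entry = thesaurus.get(term, {})
    expanded = [term]
    expanded += entry.get("synonyms", [])
    expanded += entry.get("narrower", [])
    return expanded


lexicon_query = expand_query("lexicon", THESAURUS)
thesaurus_query = expand_query("thesaurus", THESAURUS)
print(lexicon_query, thesaurus_query)
```

A search for a broader term then also finds items that were indexed under a narrower one, and a term that is not in the thesaurus is simply searched as typed.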

Thesaurus

Ontology

In computer science and logic, an ontology is the result of an attempt to define a complete and strict conceptual scheme for a particular subject or domain. The word ontology is borrowed from philosophy.

An ontology is typically a data structure that describes all relevant entities and their relationships within the rules of the domain. In the field of artificial intelligence, the concept of ontology is used to describe the ‘real world’ in a way that a computer can comprehend. Another way to describe it is knowledge representation.

In a semantic web, a computer must derive the meaning of text or metadata from a model and compute inferences, effects, or conclusions based on that information.

An ontology is used as a strict and complete model for a particular domain, usually in a hierarchical structure, containing all relevant entities, their relationships, and the rules that these entities and relationships must comply with.
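A very small ontology with entities, a hierarchy, one relationship, and one rule might be sketched like this in Python. All names (the types, the has_part relationship, and the helper functions) are assumptions for illustration; real ontologies are normally expressed in dedicated languages rather than plain dictionaries.

```python
# Hypothetical hierarchy: each type points to its supertype.
IS_A = {"car": "vehicle", "engine": "component", "axle": "component"}

# Hypothetical rule per relationship: (required subject type, required object type).
RELATION_RULES = {"has_part": ("vehicle", "component")}


def is_a(entity_type, ancestor):
    """Walk up the hierarchy to check whether entity_type is a kind of ancestor."""
    while entity_type is not None:
        if entity_type == ancestor:
            return True
        entity_type = IS_A.get(entity_type)
    return False


def complies(subject_type, relation, object_type):
    """Check a statement against the rule defined for its relationship."""
    rule = RELATION_RULES.get(relation)
    if rule is None:
        return False  # unknown relationships are rejected in this sketch
    required_subject, required_object = rule
    return is_a(subject_type, required_subject) and is_a(object_type, required_object)


valid = complies("car", "has_part", "engine")
invalid = complies("engine", "has_part", "car")
print(valid, invalid)
```

The computer never "knows" what a car is; it only derives that a car is a vehicle and that the statement therefore satisfies the rule for has_part.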

Canonical is a term used in data modeling, but it is difficult to define on its own.

Words that approximate the concept

  • Typical
  • Normally

Canonical means a standardized way of representing something, according to recognized, accepted rules. As an adjective it means that the subject is in accordance with the canon, the rules (originally ecclesiastical laws). Canonical things are therefore considered authoritative, and so is a canonical model.

CANONICAL USED IN INFORMATION ARCHITECTURE

Information architects often talk about canonical models that split reality into concepts and relationships. A model makes reality visible. A canonical model is a clear conceptual model, designed on the basis of a standardized and common approach to something in a particular context (a piece of reality), with the following results:

  • Clarity
  • Standardization
  • Common appearance
  • Context

A canonical model is unambiguous and can therefore be interpreted in only one way. The meanings of the concepts in the model are based on a generally accepted standard. Think of a typical description of a car. A car is a very complex thing, but the model of a “car” is quite universal.

The model reduces the complexity of the car to a few important concepts that are related to each other. A typical car has a body, an engine, a steering wheel, a front axle with two wheels, and a rear axle with two wheels. The steering wheel is connected to the front axle, and the engine drives one or both axles. This model is typical of a car. Every car meets this model. Admittedly, three-wheelers do not, so the model is not universal, but it does hold within the context of a car manufacturer that only produces four-wheeled vehicles.

A canonical model simplifies communication about things in a particular context (for example, a company). Anyone within that context who is familiar with the model knows what is meant when the concepts in the model are discussed. Put quite simply, it prevents misunderstandings. After all, the model is unambiguous.
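The car model described above could be written down as follows. This is a minimal Python sketch under stated assumptions: the class names, the field names, and the sales-system record with its body_style, engine_type, and drive keys are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Axle:
    position: str          # "front" or "rear"
    wheels: int = 2        # the canonical car has two wheels per axle
    steered: bool = False  # connected to the steering wheel
    driven: bool = False   # driven by the engine


@dataclass(frozen=True)
class Car:
    body: str
    engine: str
    front_axle: Axle
    rear_axle: Axle

    def wheel_count(self):
        return self.front_axle.wheels + self.rear_axle.wheels


def from_sales_record(record):
    """Map a record from a hypothetical sales system onto the canonical model."""
    drive = record["drive"]  # assumed values: "front", "rear" or "all"
    return Car(
        body=record["body_style"],
        engine=record["engine_type"],
        front_axle=Axle("front", steered=True, driven=drive in ("front", "all")),
        rear_axle=Axle("rear", driven=drive in ("rear", "all")),
    )


car = from_sales_record(
    {"body_style": "hatchback", "engine_type": "petrol", "drive": "front"}
)
print(car.wheel_count(), car.front_axle.driven, car.rear_axle.driven)
```

Each source system needs only one mapping onto the shared model, after which everyone in that context talks about the same body, engine, and axles.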

Canonical Model