Questions and detailed answers from foryouandyourcustomers regarding the information model


You can use information modelling as a method to describe the information needs of an environment (company, enterprise, organisation, etc.) in a way that is both technology- and process-neutral. It is a necessary element of any digitisation effort. In this article, we answer fundamental questions about the basics, the procedure and the core concepts of information modelling in practice, and we answer frequently asked questions in detail.

What is an information model? An information model is a description of all of the relevant entities in an environment (a mini world), together with their attributes and their relationships to each other. It is based on E. F. Codd’s relational model. The description can be presented as a table or in graphical form. A graphical representation is better suited to communication and understanding, and it serves as a real road map of the information. The tabular form is necessary if the model will be used to create a database as well as to build understanding.

How do you start information modelling? First, identify a few important entities in the mini world you are considering. Do this by:

  • asking questions
  • studying the company’s websites (e.g. online shops)
  • studying documents (sales, marketing, product descriptions, lists, catalogues, business objectives, etc.)

Draw the entities as boxes and look for connections between them. In principle, any entity can be related to any other entity, so check every pair. Chart out all of the relevant connections. Then try to think of other entities that could link up to the ones you already have.

How do you figure out the terms and concepts for an unfamiliar environment? The most important thing is to understand the entities (and therefore the terms used for them). You can define an entity by looking at:

  • its attributes list (its properties)
  • its key attributes (which identify an instance of the entity in reality)
  • some examples – actual instances are best (second-best are pictures, and third-best are verbal descriptions of examples)

ATTENTION: The name of the entity alone is not enough. Names can have different meanings in different environments.

The method is described in the book Information modelling: a method for improving understanding and accuracy in your collaboration by Stefan Berner, which was originally published in German in 2016 by vdf Verlag.

Modelling workshop

How do you reconcile conflicts (disagreement among participants)? If participants disagree on a definition, describe the entities in as much detail as possible (examples, attributes, keys, relationships) until the differences disappear or have been reconciled. If you can’t agree on a good, fitting name, write down all of the suggestions. Then postpone the decision until you have more facts about the entity. A better name may come to mind by then.

How does the information model relate to conceptual, logical and physical data models? The information model corresponds to the conceptual data model, provided that this model contains a purely business-oriented description of the data, independent of technology. It is also known as a semantic data model. A logical data model maps the data to a model for storage. If relational storage is intended, the information model only needs data types added for the domains of its attributes. A good modelling tool can automatically generate many aspects of the logical model (surrogate keys, constraints, intersection tables, etc.). The physical data model extends the logical model with storage information, access paths (indices) and other technical aspects.

How do you limit the scope of modelling? Only the information required for the tasks (processes) and results of the mini world is described. If the mini world represents a single application or department, only the information used there is described. Relationships to entities from surrounding systems are recorded. However, those entities are only described in terms of how they affect or function for the mini world being considered.

How do you limit the depth (detail) of modelling? For an overview that establishes a shared understanding of the mini world, entities, their relationships and a few important (descriptive) attributes are enough. If the information model is used for a precise data description, all attributes, all domains and all keys must be defined. In terms of content, the depth of the analysis is determined by relevance to the mini world.
Example: For a company that provides company cars, it is enough to note the brand, registration plate and a few other use-oriented attributes when describing the entity {car}. For a car manufacturer, all of the car’s components, materials, certificates, etc. have to be described for the entity {car}.

The relational model

The relational model combines predicate logic and set theory and adds special rules (normal forms). It is based on a clean mathematical definition. It allows statements about the real world that are formulated in natural language to be formalised. It was introduced by E. F. Codd in 1970 and is the most commonly used basis for managing data.

What is a relation? In the relational model, a relation is a set of all true statements (attribute values) across all instances (tuples) of an entity. Relations are usually represented in tables, where a table row corresponds to a tuple (i.e. all attribute values relevant to this entity). All tuples of a relation have the same structure. A database consists of a set of relations.

What is a tuple? A tuple is the set of all attribute values of one instance of an entity. The set of all tuples of an entity makes up all of the information we have about this entity. Tuples of an entity are mapped in tables, with each row representing a tuple. Two tuples of an entity must differ in at least one value. This means that a relation cannot contain multiple tuples with identical values. Tables in Excel, by contrast, can contain duplicate rows.
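The following minimal Python sketch makes the set semantics concrete by treating a relation as a set of tuples. The entity {person}, its attributes and the sample values are hypothetical. A real database enforces uniqueness through keys, not through Python’s `set`.

```python
# Hypothetical entity {person} with attributes (ssn, first_name, date_of_birth).
rows = [
    ("123456789", "Anna", "1977-05-01"),
    ("987654321", "Ben", "1985-11-23"),
    ("123456789", "Anna", "1977-05-01"),  # duplicate row, as a spreadsheet would allow
]

# A spreadsheet-like list keeps the duplicate row ...
print(len(rows))    # 3

# ... but a relation is a set: identical tuples collapse into one statement.
person = set(rows)
print(len(person))  # 2

# All tuples of the relation have the same structure (same attributes).
assert all(len(t) == 3 for t in person)
```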

What are normal forms? Codd’s relational model defines three normal forms, which ensure the consistency and minimality of a model.

What is first normal form? First normal form (1NF) states that an attribute can hold only one statement (value) about a thing. First, 1NF rules out repeating groups (e.g. a list of first names). Second, it prohibits coding multiple attribute values into one data field (e.g. hierarchically organised article numbers).
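The following minimal Python sketch shows the two 1NF repairs described above on assumed data. The repeating group of first names becomes one tuple per name in a separate relation, keyed by social security number and position. A hypothetical coded article number of the form `group-subgroup-serial` is split into separate attributes. The coding scheme and all values are invented for illustration.

```python
# Hypothetical source data that violates 1NF.
raw_person = {"ssn": "123456789", "first_names": "Anna Maria Luise"}  # repeating group
raw_article_no = "10-200-3001"  # assumed coding: product group-subgroup-serial

# Fix 1: one tuple per first name; the position preserves the original order.
person_first_name = {
    (raw_person["ssn"], position, name)
    for position, name in enumerate(raw_person["first_names"].split(), start=1)
}

# Fix 2: every coded part becomes an attribute of its own.
product_group, subgroup, serial = raw_article_no.split("-")
article = {"product_group": product_group, "subgroup": subgroup, "serial": serial}

print(sorted(person_first_name))
# [('123456789', 1, 'Anna'), ('123456789', 2, 'Maria'), ('123456789', 3, 'Luise')]
print(article)
# {'product_group': '10', 'subgroup': '200', 'serial': '3001'}
```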

What is second normal form? Second normal form (2NF) requires each non-key attribute to be functionally dependent on the entire key. This means that the values of all key attributes have to be known in order to determine the attribute value.

Example: For the entity {invoice item} (with the key {invoice ID} and {item no.}), the invoice issue date would violate 2NF because this date is only functionally dependent on the invoice ID.

What is third normal form? Third normal form (3NF) requires each attribute value to be exclusively functionally dependent on the entire key of the entity. Dependencies between two non-key attributes of an entity are ruled out by 3NF.

Example: The entity {invoice item} contains the attributes {article no.} and {article name}. The article name is functionally dependent on the article number, which is a non-key attribute. 3NF therefore requires {article name} to be moved to a separate entity {article}.
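The 2NF and 3NF invoice examples can be combined in one sketch. It assumes a hypothetical unnormalised relation of invoice item tuples and moves {invoice date} to {invoice} (2NF) and {article name} to {article} (3NF). It then checks that joining the parts reproduces the original tuples, which shows that the decomposition loses no information. All values are made up.

```python
# Hypothetical unnormalised {invoice item} tuples; key: (invoice_id, item_no)
# Columns: invoice_id, item_no, invoice_date, article_no, article_name, quantity
invoice_item_raw = {
    ("R1", 1, "2024-03-01", "A10", "Screw", 100),
    ("R1", 2, "2024-03-01", "A20", "Nut", 50),
    ("R2", 1, "2024-03-05", "A10", "Screw", 20),
}

# 2NF: invoice_date depends only on invoice_id (part of the key) -> entity {invoice}
invoice = {(i_id, date) for (i_id, _, date, _, _, _) in invoice_item_raw}

# 3NF: article_name depends on the non-key attribute article_no -> entity {article}
article = {(a_no, name) for (_, _, _, a_no, name, _) in invoice_item_raw}

# What remains depends on the whole key and nothing but the key.
invoice_item = {(i_id, item_no, a_no, qty)
                for (i_id, item_no, _, a_no, _, qty) in invoice_item_raw}

print(sorted(invoice))  # [('R1', '2024-03-01'), ('R2', '2024-03-05')]
print(sorted(article))  # [('A10', 'Screw'), ('A20', 'Nut')]

# Joining the three relations again yields exactly the original tuples (lossless).
rejoined = {
    (i_id, item_no, date, a_no, name, qty)
    for (i_id, item_no, a_no, qty) in invoice_item
    for (inv_id, date) in invoice if inv_id == i_id
    for (art_no, name) in article if art_no == a_no
}
assert rejoined == invoice_item_raw
```

After the decomposition, each invoice date and each article name is stored only once. An update therefore cannot leave contradictory copies behind.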

Entity

An entity is the structural description of a concrete or abstract thing in reality about which we want (or need) to manage information in our observed system. In this context, ‘information’ means that we are interested in the value of at least one property (attribute) of the thing. An entity is synonymous with an entity type. The entity describes three things: which attributes of a real thing we are interested in, which relationships the thing has to other things, and which attributes distinguish one instance of the thing in reality from every other instance.

What is a good entity name? A good name is a name that the reader associates with a ‘thing’ (a real object with attributes), such as ‘person’, ‘contract’ or ‘vehicle’. The best names are those which anyone hearing or reading them would immediately associate with the meaning intended for them. Poor choices of name would be names that

  • mean something different in common speech than intended in the mini world or
  • have multiple semantic uses (homonyms).

If a good name cannot be found in common language, a made-up name (usually a combined name) is better than a name that could be misunderstood. Made-up names must be learned and take some getting used to but ultimately lead to fewer incorrect implicit interpretations and therefore prevent misunderstandings.

Is it advisable to continue using an existing, frequently used and established name? If after analysis and coming to a shared understanding of the entity it is determined that the existing name is suitable, it should definitely continue to be used. If the name is misleading, meaningless or confusing (e.g. it has multiple meanings or is a term that is not used in the way it is commonly understood by the general public), it should definitely be changed.

What is a subtype of an entity? A subtype of an entity is a concrete entity that can be understood as a specialisation of an abstract entity (its super-type). It differs from the other subtypes of the same entity in at least one attribute (or relationship). A subtype never occurs on its own, because a super-entity with only one subtype would be identical to that subtype.

Examples: Employees, contract staff, pensioners and owners can all be subtypes of the entity {person} in the context of human resources.

When should entities be grouped (generalised) and when should they be separated (specialised)? As a rule of thumb, two entities that differ in at least one attribute (or relationship) are different entities. If two or more entities share many attributes or relationships, they should be grouped (generalisation, abstraction). In case of doubt, continue analysing the entities separately. It is much easier to combine two similar entities later than to extract two different entities from one super-entity.

What is a super-type of an entity? A super-type is an entity that represents an abstraction of multiple other entities. Abstraction in this context means that the super-entity has all of the attributes and relationships that all of its subtypes (at least two) share.

Example: The entity {vehicle} has the attributes {range}, {weight} and {number of wheels} as well as the relationship to owners, just like all subtypes ({motorbike}, {car}, {bus}, {bicycle}).
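Generalisation resembles inheritance in object-oriented languages, although the information model itself is implementation-neutral. The following Python sketch uses dataclasses only as an analogy. {vehicle} carries the shared attributes, and each subtype adds at least one attribute of its own. The subtype-specific attributes (`number_of_doors`, `frame_size_cm`) and all values are assumptions for illustration.

```python
from dataclasses import dataclass, fields

@dataclass
class Vehicle:
    """Super-type: attributes and relationship shared by all subtypes."""
    range_km: float
    weight_kg: float
    number_of_wheels: int
    owner: str  # simplified stand-in for the relationship to {owner}

@dataclass
class Car(Vehicle):
    number_of_doors: int  # assumed car-specific attribute

@dataclass
class Bicycle(Vehicle):
    frame_size_cm: int  # assumed bicycle-specific attribute

shared = {f.name for f in fields(Vehicle)}
car_only = {f.name for f in fields(Car)} - shared
print(car_only)  # {'number_of_doors'}

my_car = Car(range_km=600, weight_kg=750, number_of_wheels=4,
             owner="Steven", number_of_doors=5)
assert isinstance(my_car, Vehicle)  # every car is also a vehicle
```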

How are entity roles expressed? Entity roles can be expressed as one-to-one (1:1) relationships. Example: A person can be both an employee and a customer of a company, so a person can hold no role, one role or both roles. To represent this, you draw three entities connected by two one-to-one relationships: a person is an employee, and a person is a customer. If the relationships are optional, a person can adopt either role, both roles or neither. If the roles are exclusive and each instance of an entity always holds exactly one role, the roles can be represented as sub-entities.
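The following sketch shows one possible implementation of the roles example in a relational database, using Python’s built-in `sqlite3` module. The table and column names are hypothetical. Each role table uses the person’s key as its own primary key, which enforces the one-to-one relationship. A person without a row in a role table does not hold that role.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")
con.executescript("""
CREATE TABLE person (
    person_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
-- Each role table reuses the person's key: at most one row per person (1:1).
CREATE TABLE employee (
    person_id   INTEGER PRIMARY KEY REFERENCES person(person_id),
    employee_no TEXT NOT NULL UNIQUE
);
CREATE TABLE customer (
    person_id   INTEGER PRIMARY KEY REFERENCES person(person_id),
    customer_no TEXT NOT NULL UNIQUE
);
""")
con.executemany("INSERT INTO person VALUES (?, ?)",
                [(1, "Anna"), (2, "Ben"), (3, "Cleo")])
con.execute("INSERT INTO employee VALUES (1, 'E-01')")  # Anna: employee
con.execute("INSERT INTO customer VALUES (1, 'C-01')")  # Anna: also customer
con.execute("INSERT INTO customer VALUES (2, 'C-02')")  # Ben: customer only

roles = con.execute("""
    SELECT p.name,
           e.person_id IS NOT NULL AS is_employee,
           c.person_id IS NOT NULL AS is_customer
    FROM person p
    LEFT JOIN employee e ON e.person_id = p.person_id
    LEFT JOIN customer c ON c.person_id = p.person_id
    ORDER BY p.person_id
""").fetchall()
print(roles)  # [('Anna', 1, 1), ('Ben', 0, 1), ('Cleo', 0, 0)]

# A second employee row for the same person violates the 1:1 rule.
try:
    con.execute("INSERT INTO employee VALUES (1, 'E-02')")
except sqlite3.IntegrityError as err:
    print("rejected:", err)
```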

What is the difference between an entity or entity type and an entity instance? An entity or entity type is the structural description of the information needs of a real thing. Example: A car has n doors, n wheels, a weight, a maximum speed, a brand, an owner, etc. An entity instance is a collection of the values of all attributes of an entity for a specific example of that entity. Example: This car has five doors, four wheels, weighs 750 kg, has a maximum speed of 200 km/h, is a VW and belongs to me.

Do you have to differentiate between an entity’s type and its physical implementation? If two things have different keys (structurally, not merely different key values), they are different entities. A car type (i.e. a model, such as an Audi 3xT, a BMW 350, a VW Beetle) is identified by its model designation. Its attributes are length, width, weight, number of doors, manufacturer, etc. It is an abstract thing, namely the description of a type of car. A car is a physical thing sitting in the driveway in front of the house. It is identified, for example, by a serial number affixed to or stamped on it somewhere. It ‘is the model’ Audi 3xT, ‘belongs to the person’ Steven, ‘is painted in the colour’ green and has attributes such as degree of dirtiness, rust level, fuel level, etc. A car is therefore a different entity. There are many cars of the same model (i.e. the same type of car). The entity {car} is needed as soon as any attribute of a specific instance is of interest (such as the fuel level, the serial number or the current location of the car). And, naturally, the entity {car} is connected to the car type.
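The distinctions between entity type and instance, and between car type and car, can be sketched in Python. Here a class plays the role of the entity type and an object the role of an instance, as an analogy only. The attribute names follow the examples above. The serial numbers, length, second owner and fuel levels are invented.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CarModel:
    """Entity {car type}: an abstract thing, identified by its model designation."""
    model_designation: str  # key
    manufacturer: str
    number_of_doors: int
    length_mm: int

@dataclass
class Car:
    """Entity {car}: a physical thing, identified by its serial number."""
    serial_number: str      # key
    model: CarModel         # relationship: the car 'is the model' ...
    owner: str              # relationship: the car 'belongs to the person' ...
    colour: str
    fuel_level_percent: float

audi_3xt = CarModel("Audi 3xT", "Audi", number_of_doors=5, length_mm=4300)  # illustrative values
car_a = Car("SN-0001", audi_3xt, owner="Steven", colour="green", fuel_level_percent=40.0)
car_b = Car("SN-0002", audi_3xt, owner="Maria", colour="red", fuel_level_percent=85.0)

# Two instances of {car} share one instance of {car type} ...
assert car_a.model is car_b.model
# ... but each car's own attribute values change independently.
car_a.fuel_level_percent -= 10.0
```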

Attribute

An attribute is a property of a thing (entity) in the real world. It represents exactly one value from the set of values (domain) defined for this attribute.

What is the difference between an attribute type (attribute) and an attribute instance (value)? The attribute or attribute type is the description of the structure and meaning of a property (name, domain, relationship to entity, explanation, any extended consistency rules). An attribute value is a specific value from the value list of the domain for this attribute.

What is a good attribute name? A good attribute name is a name that the reader associates with a set of values of an attribute of a real object. Good attribute names have the same structure as good domain names. With attribute names, a role can be assigned to the domain (e.g. the domain {weight} can be assigned the attribute {empty weight}). Examples: Article number, vehicle brand, empty weight.

How do you figure out the meaning of an attribute? You determine the meaning of an attribute from 

  • the meaning of its name in common language (if the name was chosen well)
  • the description of the set of values permissible for this attribute
  • the relationship to its entity
  • several examples of typical values and limit values
  • counterexamples (i.e. things that do not describe this attribute), if any

How do you determine the right entity for an attribute? An attribute belongs to the entity on which it is functionally dependent. This means that (a) there is a meaningful connection between the entity and the attribute and (b) the attribute value for an entity instance (tuple) is uniquely defined by the values of the key attributes of the entity instance.
Example: (a) a person was born on {date of birth}
(b) the person with social security number 123456789 was born on 01.05.1977.

Can the same attribute belong to multiple entities? No. Every attribute makes a statement about one entity. If an attribute, such as the date of birth of a person, belongs to more than one entity (e.g. {person} and {employee}), this violates second or third normal form and requires more coordination or leads to inconsistencies. The fact that an attribute such as {name} belongs to many entities does not violate this rule; instead, it represents an inaccuracy in the nomenclature. A person name and a product name and a planet name are all names for a thing and all consist of a sequence of letters. But the values come from different domains. (A person called ‘ashtray’ or a planet called ‘Smith’ would be rather odd.) Even if the same name is used in multiple entities, they are still different attributes (they still have different domains).

Is there such a thing as group attributes or attribute groups? Yes. An attribute group combines several attributes under a new name. Example: An address consists of a street, building number, postcode and town or city.
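Here is a short Python sketch of the address example, using hypothetical names and values. The attribute group is a named bundle. When it is stored in a relational table, it is flattened back into its individual attributes.

```python
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class Address:
    """Attribute group: several attributes gathered under one name."""
    street: str
    building_number: str
    postcode: str
    town: str

@dataclass
class Customer:
    customer_no: str
    name: str
    address: Address  # the group stands in for its four attributes

c = Customer("C-01", "Anna Muster", Address("Hauptstrasse", "12a", "8000", "Zurich"))

# Flattened into single-valued columns, e.g. for a relational table.
row = {"customer_no": c.customer_no, "name": c.name, **asdict(c.address)}
print(list(row))  # ['customer_no', 'name', 'street', 'building_number', 'postcode', 'town']
```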

Domain

A domain is the description of a set of values that an attribute can have. A domain defines permissible values that are abstracted by a name. Examples: Item number, integer values from 0 to 1,000.

What is a good domain name? A good domain name is a name that the reader associates with a set of values for an attribute of a real object. Domain names are often also used as attribute names. They have the same function as attribute names in terms of communication.

What is the difference between a domain and a data type? A domain describes the permissible set of values for an attribute value. The focus here is on the values and their meaning and not on how they are represented.
Data types describe how the values of a domain are technically represented or managed. In order to implement an information model in a database, data types must be defined for all domains. Example: BOOLEAN is the domain for the values of {true} and {false}. The data type here could be, for example, an integer, and the domain values are represented by the data values 0 (false) and 1 (true).
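The BOOLEAN example can be sketched in Python to keep the two concepts apart. A hypothetical `Domain` class holds the permissible values, and a separate mapping decides how those values are represented technically. The domain `QUANTITY` mirrors the example ‘integer values from 0 to 1,000’ given above. Both names are assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Domain:
    """A domain: a named set of permissible values, independent of storage."""
    name: str
    permissible: frozenset

    def check(self, value):
        if value not in self.permissible:
            raise ValueError(f"{value!r} is not in domain {self.name}")
        return value

# Domain values are written as words here to keep them distinct from any data type.
BOOLEAN = Domain("BOOLEAN", frozenset({"true", "false"}))
QUANTITY = Domain("QUANTITY", frozenset(range(0, 1001)))  # integers 0 to 1,000

# Data type: how the BOOLEAN values are represented technically (here as integers).
BOOLEAN_TO_INT = {"false": 0, "true": 1}
INT_TO_BOOLEAN = {v: k for k, v in BOOLEAN_TO_INT.items()}

stored = BOOLEAN_TO_INT[BOOLEAN.check("true")]
print(stored)                  # 1
print(INT_TO_BOOLEAN[stored])  # true
```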

Key

The key of an entity is an attribute or group of attributes with a value or values that are unique in a list of all instances of the same entity. This value or value combination therefore identifies the specific instance of this entity.

What is a primary key? The primary key is a concept of data modelling. It designates the key of a table that other tables reference through foreign keys. Primary keys are not necessary in information modelling. Every (functional) key is important, and its uniqueness should be ensured. For implementation in a database, it is a good idea to use a single-column, system-generated artificial key without any business meaning. Other tables then store this artificial key as a foreign key.
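Here is a minimal sketch with Python’s `sqlite3` module, using hypothetical table names. Each table gets a single-column, system-generated key (`INTEGER PRIMARY KEY`). A `UNIQUE` constraint keeps the functional key from the information model unique, and other tables reference the artificial key.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")
con.executescript("""
CREATE TABLE vehicle_model (
    vehicle_model_id  INTEGER PRIMARY KEY,   -- artificial, system-generated key
    model_designation TEXT NOT NULL UNIQUE   -- functional key from the information model
);
CREATE TABLE vehicle (
    vehicle_id       INTEGER PRIMARY KEY,
    serial_number    TEXT NOT NULL UNIQUE,
    vehicle_model_id INTEGER NOT NULL REFERENCES vehicle_model(vehicle_model_id)
);
""")

cur = con.execute("INSERT INTO vehicle_model (model_designation) VALUES ('VW Golf')")
golf_id = cur.lastrowid  # key value generated by SQLite
con.execute("INSERT INTO vehicle (serial_number, vehicle_model_id) VALUES (?, ?)",
            ("SN-0001", golf_id))

print(con.execute("""
    SELECT v.serial_number, m.model_designation
    FROM vehicle v JOIN vehicle_model m ON m.vehicle_model_id = v.vehicle_model_id
""").fetchall())  # [('SN-0001', 'VW Golf')]

# The functional key stays unique even though it is not the primary key.
try:
    con.execute("INSERT INTO vehicle_model (model_designation) VALUES ('VW Golf')")
except sqlite3.IntegrityError as err:
    print("rejected:", err)
```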

Relationship/association

A relationship (or association) links (associates) two entities semantically. It indicates what the two entities have to do with each other. The relationship is represented by a line that can be dashed (optional connection) or solid (mandatory connection). Each line ends at an entity in one of two ways. A 1 means that an instance on the other side is related to at most one instance on this side. A crow’s foot (many) means that an instance on the other side can be related to any number of instances on this side. The semantics of the relationship must be noted with a verb in one direction, or, even better, in both directions.

What is a functional dependency? A functional dependency between two attributes exists when the value of one attribute can be uniquely inferred from the value of the other attribute (the key). In information modelling, this (purely structural) unique relationship must also be meaningful for the mini world. It is meaningful when there is a meaningful relationship between the entity identified by the key and the attribute. Example: A vehicle of the model VW Golf weighs 744 kg (empty weight).
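A program can check whether sample data is consistent with a functional dependency, but not whether the dependency is meaningful for the mini world. The following Python sketch reports whether any two rows agree on the determining attributes but differ on the dependent ones. The function name and the data are hypothetical, except for the VW Golf weight taken from the example above. Data can only refute a dependency, never prove it.

```python
def functionally_determines(rows, determinant, dependent):
    """Return False if two rows agree on `determinant` but differ on `dependent`.

    Sample data can only refute a functional dependency; whether it holds
    (and is meaningful) must be confirmed against the mini world.
    """
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in determinant)
        value = tuple(row[a] for a in dependent)
        # setdefault stores the first value seen for this key and returns it
        if seen.setdefault(key, value) != value:
            return False
    return True

vehicles = [
    {"model": "VW Golf", "empty_weight_kg": 744, "colour": "green"},
    {"model": "VW Golf", "empty_weight_kg": 744, "colour": "red"},
    {"model": "Audi 3xT", "empty_weight_kg": 1200, "colour": "green"},  # invented
]

print(functionally_determines(vehicles, ["model"], ["empty_weight_kg"]))  # True
print(functionally_determines(vehicles, ["model"], ["colour"]))           # False
```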

Why do relationships have to be expressed through verbs? The verbs used to express the relationships between two entities (the {car} belongs to {owner}) are the actual semantics of a connection. The verb states what one entity has to do with the other. These verbs are the knowledge contained in the model. They are absolutely necessary in order to understand the mini world being observed. A connection between two entities without a verb is meaningless. Without the verb, it only indicates that there is a relationship between the entities but does not clarify what that relationship means and leaves the interpretation of that relationship to the reader.

What is a one-to-one relationship? A one-to-one (1:1) relationship between two entities indicates that the entities are identical. The two entities have the same key (or the key of one simultaneously identifies an instance of the other) and are therefore the same thing. One-to-one relationships should be avoided. Exceptions:

  • Roles can be documented as one-to-one relationships with an entity ({person} is {employee}, {person} is {customer}).
  • You may wish to specify two entities in order to clarify differences (e.g. of an organisational or process-oriented nature). Example: Production article – sales article

Why are we not allowed to use nouns to express relationships between entities? A noun in a relationship indicates a possibly transitive relationship. In that case, one entity is actually linked to the other through a third entity.
Example: In ‘A customer advisor reaches one or more customers by email’, the noun {email} is a third entity between {customer advisor} and {customer}. If no meaningful verb is found for a direct relationship between two entities, there is no relationship.

What are good relationship phrases (verbs)? Good relationship phrases are verbs that express a fact as precisely and narrowly as possible. The information in a system always describes a static condition. All relationships should therefore be expressed as status verbs, or in a grammatical form that expresses a status or condition. Examples are ‘consists of’, ‘identifies’ and ‘is based on’.

What are poor relationship phrases (verbs)? The worst are phrases that contain no verb at all. Next come verbs that merely state that a connection exists (‘connected to’, ‘belongs to’, ‘related to’, etc.). Other verbs to avoid are those that are too general or could have many meanings (such as ‘is’, ‘has’, etc.).

When do many-to-many relationships have to be broken up into intersection tables? The relational model does not recognise many-to-many (n:m) relationships. Only functional relationships (1:m or 1:1) can be implemented. When the information model is transferred to a logical or physical model, every many-to-many relationship has to be broken up into a so-called intersection table. At the information model level, it is often enough to know that two entities have a many-to-many relationship. Breaking that relationship up into intersections can be done automatically, and good modelling tools will do it for you. Sometimes the information model needs attributes that belong to the intersection itself. In that case, the intersection entity must be modelled explicitly, and it replaces the many-to-many relationship.
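The following Python sketch uses the built-in `sqlite3` module to show an intersection table for a hypothetical many-to-many relationship between {article} and {supplier}. The purchase price is an example of an attribute that belongs to the intersection itself rather than to either entity. All names and values are invented.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")
con.executescript("""
CREATE TABLE article  (article_no  TEXT PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE supplier (supplier_no TEXT PRIMARY KEY, name TEXT NOT NULL);
-- Intersection table: replaces the n:m relationship with two 1:m relationships.
CREATE TABLE article_supplier (
    article_no     TEXT NOT NULL REFERENCES article(article_no),
    supplier_no    TEXT NOT NULL REFERENCES supplier(supplier_no),
    purchase_price NUMERIC,  -- attribute of the intersection itself
    PRIMARY KEY (article_no, supplier_no)
);
""")
con.executemany("INSERT INTO article VALUES (?, ?)", [("A10", "Screw"), ("A20", "Nut")])
con.executemany("INSERT INTO supplier VALUES (?, ?)", [("S1", "Alpha"), ("S2", "Beta")])
con.executemany("INSERT INTO article_supplier VALUES (?, ?, ?)",
                [("A10", "S1", 0.05), ("A10", "S2", 0.04), ("A20", "S1", 0.02)])

# One article has many suppliers, and one supplier supplies many articles.
suppliers_of_screw = [row[0] for row in con.execute("""
    SELECT s.name FROM supplier s
    JOIN article_supplier x ON x.supplier_no = s.supplier_no
    WHERE x.article_no = 'A10'
    ORDER BY s.name
""")]
print(suppliers_of_screw)  # ['Alpha', 'Beta']
```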

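Can a relationship connect more than two entities? No. The relational model only recognises connections between two entities. Some modelling languages allow relationships between more than two entities (n-ary relationships). These can easily be broken up into relationships between two entities, usually by introducing an additional entity that is linked to each participating entity.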


Do you have any further questions? Feel free to ask them. Get in touch and share your experiences with me.