Knowledge graphs: Why do it and how to start small

Paul Appleby • 15 July 2020

Graphifi has a goal to make building knowledge graphs easy. But what does that actually mean and how do you get started?

Knowledge graphs are rich, flexible sources of connected information that store both structured and unstructured data. They hold not only the data but also information about the meaning of that data. They describe things and, just as importantly, the relationships between those things. The flexibility comes from the fact that the graph is not fixed in shape or form by any particular schema - it can adapt over time to evolving needs.

There are, of course, well-known graphs belonging to Google, Apple, Microsoft, LinkedIn, Facebook, etc., that power things like Google search and Siri. But why bother building one yourself, and how do you get started?

Knowledge graphs can help in many areas. One example is easing data integration, so that you can answer complex questions about a business that no single system can answer by itself. Many recent use cases revolve around providing a source of data for machine learning applications. Others relate to providing a 360 degree view of customers.

Unfortunately, many organisations have neither the resources nor the skills to get started building a knowledge graph. Many developers are not familiar with semantic technologies, ontologies or working with graph databases. In many respects, though, the building blocks for getting started are similar across use cases - for example, modelling, defining APIs and creating and managing what many would call ‘reference’ data.

Graphifi’s applications are designed to help organisations get past the initial problems of getting up and running. Rather than building the basic plumbing of a knowledge graph system, organisations can concentrate on creating, managing and using their information assets to create value for the business, with technologies that developers are happy to work with.

We are very much in favour of the idea of starting small, proving out ideas, providing business value quickly and then iterating and evolving to more sophisticated solutions.

One aspect of information management that is common to most businesses, large or small, is the need to categorise, classify and describe parts of the business. Controlled lists of terms and taxonomies are one means to do this (e.g. IPTC media codes as shown below can be used to categorise news content). It is not uncommon for this information to be held in a spreadsheet. What is the impact of that? It makes it difficult to share. It also makes it difficult for machines to process. And it makes it very hard to work in more than one language.

Can we do better? In short, yes. By holding this information in an application such as Graphifi’s Graphologi, the information can be managed, shared and reused far more easily. Data in Graphologi is in a form called RDF - a standardised form that is ideal for machines to understand, for working in multiple languages and for integrating data from diverse sources. And for managing taxonomies Graphologi uses another standard called SKOS, where things in taxonomies are thought of as ‘concepts’ rather than terms.
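To make the move from spreadsheet to RDF a little more concrete, here is a minimal sketch in Python that turns a few spreadsheet rows into SKOS statements written as N-Triples. It is an illustration only, not how Graphologi imports data: the column names, the `example.org` namespace and the sample terms are all assumptions made for this example. Only the SKOS and RDF property IRIs come from the standards themselves.

```python
import csv
import io

SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "https://example.org/topics/"  # hypothetical namespace for this sketch

# A made-up spreadsheet export: one row per term, one label column per language.
SHEET = """id,label_en,label_fr,parent_id
sport,Sport,Sport,
football,Football,Football,sport
politics,Politics,Politique,
"""


def escape(text):
    """Escape backslashes, quotes and newlines for an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def rows_to_ntriples(sheet_text):
    """Turn spreadsheet rows into SKOS statements, one N-Triples line each."""
    lines = []
    for row in csv.DictReader(io.StringIO(sheet_text)):
        concept = f"<{BASE}{row['id']}>"
        # Every row becomes a concept with its own identifier.
        lines.append(f"{concept} <{RDF_TYPE}> <{SKOS}Concept> .")
        # Each non-empty label column becomes a language-tagged label.
        for lang in ("en", "fr"):
            label = row[f"label_{lang}"]
            if label:
                lines.append(f'{concept} <{SKOS}prefLabel> "{escape(label)}"@{lang} .')
        # A parent column becomes a link to the broader concept.
        if row["parent_id"]:
            lines.append(f"{concept} <{SKOS}broader> <{BASE}{row['parent_id']}> .")
    return lines


triples = rows_to_ntriples(SHEET)
print("\n".join(triples))
```

Each row ends up as a handful of simple statements about one identified concept, which is what makes the data easy for other systems to read and combine.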

Exposing and delivering your taxonomies through an API, such as one that Graphifi’s EasyGraph can build for you, can then make your data available to all parts of your organisation very quickly, allowing diverse parts of the business to start using that data with very little cost or effort.

This approach gives other advantages. Changes to the taxonomies can be deployed quickly, avoiding the very long times that updates have historically taken to propagate through to systems. Furthermore, RDF is based around the idea of unique identifiers (IRIs), where everything has its own identifier. That means that API users can easily identify concepts (terms) within the taxonomy without any ambiguity and independently of language. And each concept can have its label described in multiple languages, allowing systems to display the correct language for end users.
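As a rough illustration of why identifiers and language-tagged labels matter, the Python sketch below keys each concept by its IRI and picks a label for the requested language. The IRIs, the `Concept` class and the `label_for` helper are hypothetical names invented for this example, and the fallback to English is a design choice of the sketch rather than anything the standards require.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    iri: str                                    # the unique identifier
    labels: dict = field(default_factory=dict)  # language tag -> label


# A tiny in-memory taxonomy, keyed by IRI (all values are made up).
CONCEPTS = {
    c.iri: c
    for c in [
        Concept(
            "https://example.org/topics/politics",
            {"en": "Politics", "fr": "Politique", "de": "Politik"},
        ),
        Concept(
            "https://example.org/topics/weather",
            {"en": "Weather", "fr": "Météo"},
        ),
    ]
}


def label_for(iri, language, fallback="en"):
    """Return the label in the requested language, else the fallback language."""
    concept = CONCEPTS[iri]
    return concept.labels.get(language, concept.labels.get(fallback))


print(label_for("https://example.org/topics/politics", "fr"))
print(label_for("https://example.org/topics/weather", "de"))
```

A system storing only the IRI never needs to know which language its users speak; the label is looked up at display time.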

Additionally, correctly constructing and organising your taxonomies (possibly using collections, another standard SKOS structure provided by Graphologi) and then exposing them through an API makes it easy for applications to access only those parts of the data that are relevant to their particular use case. For example, one business unit may only be interested in financial terminology, whilst another may only be interested in manufacturing aspects, but as a whole all of the information may be interrelated and managed accordingly.
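The idea of serving each business unit only its own slice can be sketched in a few lines of Python. Here collections are modelled as named sets of concept IRIs, loosely mirroring SKOS collections and their members. The IRIs, labels and the `concepts_in` helper are assumptions for illustration and do not reflect the Graphologi or EasyGraph APIs.

```python
# Labels for every concept in the shared taxonomy (made-up data).
LABELS = {
    "https://example.org/terms/invoice": "Invoice",
    "https://example.org/terms/budget": "Budget",
    "https://example.org/terms/assembly-line": "Assembly line",
}

# Collections group concepts for a particular audience.
# A concept may belong to more than one collection.
COLLECTIONS = {
    "https://example.org/collections/finance": {
        "https://example.org/terms/invoice",
        "https://example.org/terms/budget",
    },
    "https://example.org/collections/manufacturing": {
        "https://example.org/terms/assembly-line",
        "https://example.org/terms/budget",
    },
}


def concepts_in(collection_iri):
    """Return the sorted labels of the concepts in one collection."""
    members = COLLECTIONS.get(collection_iri, set())
    return sorted(LABELS[iri] for iri in members)


print(concepts_in("https://example.org/collections/finance"))
print(concepts_in("https://example.org/collections/manufacturing"))
```

Both units see "Budget" because it is one shared concept with one identifier, managed in one place, rather than two separate copies.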

So, in short, starting to build a knowledge graph can be as simple as managing and exposing taxonomies through an API.

Of course, this is just the starting point. Integrating other data sources with your knowledge graph is a logical next step. For example, in an educational publisher that may mean relating learning material (learning objects) with learning objectives and rights management. However, if those systems have already adopted your taxonomies, possibly via APIs but also potentially through other integration methods, then you have a head start because there is already a common set of reference information through which to connect them. In our educational publisher example this might include using taxonomies such as learning age (to describe the age content is designed for, where that content sits in a content management system), educational authorities (to describe who has defined a curriculum) and countries (for rights territories as managed in a rights management system).

To help take your knowledge graph to this next level, Graphologi supports two further standards: SKOS-XL, an extension to SKOS, and OWL, for ontologies. SKOS-XL allows for richer descriptions of the labels for concepts in a taxonomy (e.g. a particular label may apply only to a particular business unit which has its own special term), whilst an OWL ontology can define a shared understanding that can be used to create even richer relationships between the things in your graph (e.g. we could say that a particular superhero, from a superheroes taxonomy, ‘lives in’ a particular city, which may come from a taxonomy of locations, and also that the superhero has an ‘arch-enemy’ who is a particular villain, from a villains taxonomy).
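The superhero example can be pictured as a small set of subject-predicate-object statements that link concepts from different taxonomies. The Python sketch below stores them as plain tuples and answers two simple questions. The names, the short prefixed identifiers and the `ex:livesIn` and `ex:archEnemy` properties are all invented for illustration; a real ontology would define such properties in OWL and use full IRIs.

```python
# Each statement links a concept in one taxonomy to a concept in another.
TRIPLES = [
    ("hero:night-owl", "ex:livesIn", "place:harbour-city"),
    ("hero:night-owl", "ex:archEnemy", "villain:the-magpie"),
    ("hero:comet", "ex:livesIn", "place:harbour-city"),
]


def objects(subject, predicate):
    """Everything the subject is linked to through the given relationship."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def subjects(predicate, obj):
    """Everything linked to the given object through the given relationship."""
    return [s for s, p, o in TRIPLES if p == predicate and o == obj]


# Who is Night Owl's arch-enemy?
print(objects("hero:night-owl", "ex:archEnemy"))

# Which superheroes live in Harbour City?
print(subjects("ex:livesIn", "place:harbour-city"))
```

The second question crosses from the locations taxonomy back to the superheroes taxonomy, which is the kind of connected query a single spreadsheet of terms cannot answer.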

These more advanced features are there when you are ready for them but the fatal mistake is to try to go too far before any business value is seen. Start small and build.

If you would like to know more about anything mentioned in this post, please get in touch. We will be very happy to have a chat.

You can try both Graphologi and EasyGraph - trials of both are freely available via our website.
