In recent times, large corporations and SMEs have found themselves faced not only with a slew of data in all manner of formats, but also with a wide range of data storage options – on-prem, private clouds, public clouds and even hybrid clouds – not to mention edge and IoT devices. At the same time, they have had to get to grips with the different ways of using data: internally through ETL processes, data warehousing or mass data processing, and externally in the context of IoT or business intelligence.

When it comes to IT systems, this heterogeneity can lead to data silos, impact data quality, complicate data migration, necessitate a variety of integration tools, limit access to information and ultimately drive data integration costs skyward. Conventional data management is also being stretched to breaking point trying to accommodate all manner of requirements. In today’s world of business, it should be possible to query vast quantities of data either periodically or on an event-driven basis, to make this information available across companies in real time regardless of location, and to enable business-relevant analyses. On top of this, advances in the automation of data integration and data management are putting additional strain on traditional strategies.

What exactly is a data fabric?

This is where the data fabric comes into play. A data fabric is an IT architecture and design concept that challenges the notion of vertical data management in favour of a close-knit, horizontal data layer between digital endpoints. A data fabric is not an application or a software solution, but rather a strategy for realising decentralised data storage, processing and management. It enables data orchestration in diverse, distributed and complex environments and spans entire enterprises like a veil, true to the term ‘fabric’.

Data that is connected and structured in this way boasts many advantages. When comparing suppliers, for example, businesses can go beyond basic price comparisons and draw on different criteria such as reliability, quality and compliance. This process and product data can be viewed, connected and pooled with production data. The information can then be evaluated and passed back to manufacturing. When it comes to perishable goods, for example, analyses may show that strict compliance with delivery deadlines, in fact, has an impact on the quality of the intermediate product, on the speed with which it can be produced and ultimately on the achievable product price.
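The kind of multi-criteria supplier comparison described above can be sketched in a few lines. The criteria weights and sample records below are illustrative assumptions, not drawn from any real dataset:

```python
# Illustrative multi-criteria supplier comparison: price alone is not the
# deciding factor once reliability, quality and compliance are weighed in.
suppliers = [
    {"name": "A", "price": 100, "reliability": 0.99, "quality": 0.95, "compliance": 1.0},
    {"name": "B", "price": 90,  "reliability": 0.80, "quality": 0.85, "compliance": 0.9},
]

# Hypothetical weights; a real scoring model would be tuned to the business case.
WEIGHTS = {"reliability": 0.4, "quality": 0.4, "compliance": 0.2}

def score(supplier):
    # Higher non-price criteria and a lower price both raise the score.
    criteria = sum(WEIGHTS[k] * supplier[k] for k in WEIGHTS)
    return criteria / supplier["price"]

best = max(suppliers, key=score)
print(best["name"])  # the cheaper supplier B loses on the weighted criteria
```

Here the nominally cheaper supplier B scores lower overall, which is exactly the kind of insight a pure price comparison would miss.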

Data to aid collaboration in distributed environments.

Those working only with conventional data integration strategies won’t be transferring machine data to purchasing, for example. Although the data is collated, only one area of the company can access it, so there is no way to gain a comprehensive overview of the information. With a data fabric, this data is available on demand to analysis, manufacturing and purchasing alike, meaning processes can be adjusted based on the facts.

The purpose of a data fabric is to optimise data availability, unlock its collaborative benefits within distributed environments, minimise friction losses, identify correlations and cut costs thanks to data-based insights. Data management is streamlined and crossovers between cloud-based and local storage devices are no longer disruptive. Another advantage is that existing data services can simply be transferred to future structures, rather than having to be replaced – sometimes at considerable expense.

Implementing a data fabric creates a data management ecosystem with high data quality, reusable data services, machine-readable data and APIs that enable data integration and orchestration internally within a company and externally with its partners. Users no longer have to guess where the data is, how to get to it, and what impact changing the data might have on others.

To-do list for implementing a data fabric.

When considered from this angle, a data fabric is a logical extension of smart data integration and accelerates corporate digital transformation. The following considerations and processes should be examined when implementing a data fabric architecture:

  • Formulate the specific question to be answered by data integration.
  • Collect and analyse relevant data depending on which information, data sets and taxonomies are best suited to solve the question.
  • Cleanse the collected data, for example by removing invalid or outdated records, eliminating unstructured or conflicting data, adjusting data fields, and more.
  • Create a data model that is meaningful to both humans and machines: analyse the different data schemata, reusing or creating ontologies, application profiles, etc.
  • Integrate data using ETL/ELT processes that can quickly load both structured and unstructured data.
  • Harmonise data by matching descriptions of the same entity in records with overlapping scopes, processing their attributes and merging the information where necessary.
  • Enrich data through reasoning and analytics by extracting new entities and relationships to generate new information.
  • Maximise data usability through knowledge discovery tools such as SPARQL queries, GraphQL interfaces, data visualisation, and more.
  • Maintain information and continuously develop data structures.
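A few of the steps above – cleansing and harmonising records that describe the same entity – can be sketched in miniature. The field names and sample records are hypothetical:

```python
from datetime import date

# Hypothetical raw records from two systems describing the same supplier.
erp_record = {"id": "S-17", "name": "Acme GmbH ", "city": "Berlin", "updated": date(2023, 5, 1)}
crm_record = {"id": "S-17", "name": "ACME GmbH", "phone": "+49 30 1234", "updated": date(2023, 9, 1)}

def cleanse(record):
    # Drop empty fields and normalise whitespace in string values.
    return {k: v.strip() if isinstance(v, str) else v
            for k, v in record.items() if v not in (None, "")}

def harmonise(a, b):
    # Merge two records for the same entity; on conflicting attributes,
    # keep the value from the more recently updated record.
    newer, older = (a, b) if a["updated"] >= b["updated"] else (b, a)
    return {**older, **newer}

merged = harmonise(cleanse(erp_record), cleanse(crm_record))
print(merged["name"], merged["city"], merged["phone"])
```

A production pipeline would of course add schema mapping, deduplication keys and provenance tracking on top of this, but the shape of the cleanse-then-merge step stays the same.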

MQTT as a starting point for scalable data fabric processes.

One way to rapidly implement the data fabric concept within a corporate setting is, for example, to use MQTT (Message Queuing Telemetry Transport). MQTT is an open, standardised network protocol designed for connections with remote locations and resource-constrained devices. For this purpose, MQTT relies on the so-called publish-subscribe principle: a publisher (e.g. a temperature sensor) sends information to subscribers via an intermediate MQTT broker. Information is organised in a hierarchy of topics, so when the broker receives a message on a topic, it distributes the incoming information to all subscribers of that topic (e.g. laptops or mobile devices). MQTT is highly scalable and able to connect millions of endpoints, whilst its quality-of-service levels help ensure reliable delivery. An MQTT broker can therefore make all types of data – texts, images and even binary files such as videos – available to all connected IT systems for further processing. Another way to rapidly implement a data fabric is to use ready-made – i.e. no-code – connectors that work with almost all data sources. In this scenario, a data fabric acts as both a data source and a data consumer.
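The publish-subscribe principle behind MQTT can be illustrated with a minimal in-memory broker. This is a toy model of the pattern, not the MQTT protocol itself; a real deployment would use a client library such as Eclipse Paho against an actual broker:

```python
from collections import defaultdict

class ToyBroker:
    """Minimal in-memory broker illustrating publish-subscribe with topics."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, payload):
        # Distribute the message to every subscriber of this topic;
        # the publisher never addresses subscribers directly.
        for callback in self.subscribers[topic]:
            callback(topic, payload)

broker = ToyBroker()
received = []
# Two subscribers (e.g. a dashboard and a logger) on the same topic.
broker.subscribe("factory/hall1/temperature", lambda t, p: received.append(("dash", p)))
broker.subscribe("factory/hall1/temperature", lambda t, p: received.append(("log", p)))

# A publisher (e.g. a temperature sensor) sends a reading via the broker.
broker.publish("factory/hall1/temperature", 21.5)
print(received)  # both subscribers receive the reading
```

The decoupling is the point: the sensor knows only the topic, not who is listening, which is what lets new consumers join a data fabric without touching the producers.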

In this era of big data, however, there is no standard for optimising knowledge management. Every company will value their data in different ways, depending on varying needs and objectives. So, a custom approach is often needed when building a data fabric. Nevertheless, one principle still applies to all data fabric structures: data should be FAIR, i.e. findable, accessible, interoperable and reusable.
