The emphasis of the technology industry has shifted over the last few decades. In the past, the focus was on storing data efficiently, which was achieved using relational databases. Today, the main concern is how to extract the most value from data. It is now widely accepted that data becomes more valuable when connected, which is why graph data is so alluring.

John von Neumann

In 1945, John von Neumann described the stored-program computer: an architecture, still in use today, that merges program instructions and data in a single memory. Data management at this time was very simple, carried out by machines that classified, compared, and tabulated millions of punched cards. As a result, the applications for computers were quite limited. Despite former IBM chairman Thomas J. Watson's oft-quoted claim that the world only needed five computers, most computer usage at this time was still focused on performing calculations.


From hierarchical databases to graphs

In the 1960s, as simple methods of file and instruction storage could no longer meet the demands of businesses, the concept of databases was introduced. Understanding the entire history of database development is crucial in explaining why graph databases are widely utilized in the industry today.

1960s-1980s: Hierarchical Data

The emergence of database systems coincided with the increasing use of computers for data management and a growing demand for data sharing. Traditional file systems were unable to meet these demands, leading to the development of Database Management Systems (DBMS) that could unify and share data.

Databases of this era are commonly described as having a “hierarchical” or network structure. The main idea was to organize data in a tree-like structure; in other words, data was stored as interconnected records.


Hierarchical database design

During this era, the primary ways of accessing data were:

  • Looking up a record by its primary key
  • Scanning all records in order
  • Navigating links from one record to another

The main innovations of this period were keys and scanning; navigating links, by contrast, proved difficult and slow.
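To make the last of those access patterns concrete, here is a minimal sketch in Python of navigating linked records. The record structure and the department/employee data are invented for illustration; real systems of the era worked at a much lower level.

```python
# A minimal sketch of linked-record navigation in a hierarchical database.
# The record structure and data are invented for illustration.

class Record:
    def __init__(self, key, data):
        self.key = key
        self.data = data
        self.children = []  # direct links to related records

# A tiny tree: one department record linking to its employee records.
dept = Record("D1", {"name": "Engineering"})
dept.children.append(Record("E1", {"name": "Ada"}))
dept.children.append(Record("E2", {"name": "Grace"}))

# Navigation means chasing links from a known starting record,
# one hop at a time; there is no declarative query language.
for emp in dept.children:
    print(emp.key, emp.data["name"])
```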

To address these performance issues, the most important innovation was the B-tree, a self-balancing tree data structure. The B-tree provided alternate access paths across linked records and sped up record retrieval. These techniques, in turn, paved the way for relational databases.
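A full B-tree is beyond the scope of this sketch, but its essential idea, an ordered index that locates a record without scanning everything, can be approximated with Python's standard bisect module. The sorted list below is only a stand-in for a B-tree's node structure, and the keys and records are invented.

```python
import bisect

# A sorted key list stands in for a B-tree index: both support
# O(log n) lookup without scanning every record.
keys = ["E1", "E2", "E5", "E9"]  # maintained in sorted order
records = {"E1": "Ada", "E2": "Grace", "E5": "Edsger", "E9": "Barbara"}

def lookup(key):
    # Binary search over the ordered keys, much as a B-tree search
    # descends through its nodes.
    i = bisect.bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return records[key]
    return None

print(lookup("E5"))  # found without a full scan -> Edsger
```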

1980s-2000s: Entity-Relationship Data

The idea of separating data from its retrieval system ignited a new wave of innovation.

At this time, an important figure emerged: Edgar Frank Codd, who developed the relational model during his time at IBM. Data organized this way, around entities and the relationships between them, is commonly referred to as relational data.

The relational model organizes data into collections that store and retrieve entities from the real world, such as people, places, and things. Similar entities (such as people) are grouped together in tables, where each record is a row. A single record is accessed from its table through its primary key.

In the relational model, entities can be linked together; to create links between entities, more tables need to be created, whose rows pair up the primary keys of the related records.

Example of a relational database schema
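As a minimal sketch of such a schema, using SQLite from Python's standard library (the person, city, and lives_in tables are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Two entity tables; each row is identified by a primary key.
cur.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE city (id INTEGER PRIMARY KEY, name TEXT)")

# A link table: a relationship between entities becomes a row of key pairs.
cur.execute("""CREATE TABLE lives_in (
                   person_id INTEGER REFERENCES person(id),
                   city_id INTEGER REFERENCES city(id))""")

cur.execute("INSERT INTO person VALUES (1, 'Ada')")
cur.execute("INSERT INTO city VALUES (1, 'London')")
cur.execute("INSERT INTO lives_in VALUES (1, 1)")

# Retrieving the relationship means joining back through the link table.
cur.execute("""SELECT person.name, city.name
               FROM person
               JOIN lives_in ON lives_in.person_id = person.id
               JOIN city ON lives_in.city_id = city.id""")
print(cur.fetchall())  # -> [('Ada', 'London')]
```

Note that the relationship itself lives in a separate table and must be reassembled with joins at query time, a point that graph databases later address directly.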

If your data needs to be organized and retrieved in tables, then relational technology is still the preferred choice.

However, no matter how important relational technology is, it is not a one-size-fits-all solution. In the late 1990s, the popularity of the internet ushered in the information age, and the need to store huge amounts of data, increasingly in the cloud, caused great trouble for relational databases.

At the same time, many businesses needed a clear model of the relationships within their data; a huge amount of data, on its own, is of little use.

And although the industry had detailed models for storing data, it had no comparable model for analyzing that data or putting it to intelligent use.

This has led to the third and most recent wave of database innovation.

2000s-2020s: NoSQL

From around the year 2000 to 2020, the development of database technology was characterized by the emergence of NoSQL (not only SQL or “non-SQL”).
The goal of this era was to create scalable technology for storing, managing, and querying various forms of data.

Rapidly growing applications and platforms forced technical architects to scale, scale, and scale again. As a result, several different types of NoSQL databases emerged, mainly including key-value, wide-column, document, stream, and graph databases.

Document, Graph, Key-Value, and Wide-column
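To make the differences tangible, here is a sketch in plain Python of how the same fact might be modeled in three of these families. The data and structures are invented and do not reflect any particular product's API.

```python
import json

# The same fact, "Ada lives in London", in three NoSQL data models.

# Key-value: an opaque value retrieved by its key.
kv_store = {"person:1": json.dumps({"name": "Ada", "city": "London"})}

# Document: a nested, self-describing record.
doc_store = {"people": [{"name": "Ada", "address": {"city": "London"}}]}

# Graph: entities as nodes, the relationship as an explicit edge.
nodes = {"p1": {"name": "Ada"}, "c1": {"name": "London"}}
edges = [("p1", "LIVES_IN", "c1")]

# In the graph model the relationship is a first-class citizen that can
# be traversed directly instead of being reconstructed with joins.
for src, rel, dst in edges:
    print(nodes[src]["name"], rel, nodes[dst]["name"])
```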

The message of the NoSQL era was clear: storing, managing, and querying data in tables is not a universal solution. First of all, the popularity of web-based applications created a need to transfer data between them, so data serialization became inevitable, typically in formats such as XML, JSON, and YAML.

These serialization formats became, in effect, the basic exchange protocols between applications. Data on the network is not inherently tabular, so forcing it through a relational model requires many conversion procedures, which is anything but straightforward.
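As a small illustration of that conversion work, using Python's standard json module with invented rows: even a simple tabular result has to be reshaped into a nested structure before it can travel between web applications.

```python
import json

# Flat rows, as a relational query would return them.
rows = [("Ada", "London"), ("Grace", "New York")]

# Reshaping into the nested structure the receiving application expects.
payload = {"people": [{"name": name, "city": city} for name, city in rows]}

print(json.dumps(payload, indent=2))
```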

In addition, these new types of applications brought in huge amounts of data, putting unprecedented pressure on system scalability, which naturally led to the popularity of key-value, document, graph, and other specialized databases.

2020s-?: Graph Data

Based on the above, we see a foundation for the fourth era of database innovation in the history of the industry: the wave of graph thinking.

This innovative era is shifting from the efficiency of storage systems to extracting value from the data they contain.

Historical experience tells us that each era has a delayed effect. For example, the relational model was described as early as the 1970s, but it did not become mainstream until the 1980s.

The same goes for the graph era. At least for the next 3-5 years, existing database technologies will not disappear; they will continue to dominate.

Extract value from big data

As for when graph databases will become mainstream, that depends on whether the global value system undergoes an effective shift, so that value is derived from highly connected data assets rather than from efficiency alone.

These things take time, although they are inevitable.

Final Thoughts

As we shift from efficient data management to extracting value from data, I believe many company executives, architects, scientists, and business directors will begin to focus on graph data technology.

In our industry, after all, value is emphasized alongside speed and cost.

Imagine being able to connect fragments of information and, from the connected data, extract value, new insights, and new directions. What a wonderful thing that would be!

But all of this requires an understanding and mastery of the complex relationship networks within data, which requires a new way of thinking.

This way of thinking involves shifting from viewing data as rows in tables to putting the relationships between the data first. This is what we call graph thinking.
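As a final, minimal sketch of what relationship-first querying looks like, in plain Python with invented data: once information is stored as nodes and edges, a question like "who is connected to whom?" becomes a simple traversal.

```python
from collections import deque

# A tiny graph of invented relationships between people.
edges = {
    "Ada": ["Grace"],
    "Grace": ["Edsger", "Barbara"],
    "Edsger": [],
    "Barbara": ["Ada"],
}

def reachable(start):
    """Breadth-first traversal: everyone connected to `start`."""
    seen, queue = {start}, deque([start])
    while queue:
        person = queue.popleft()
        for friend in edges.get(person, []):
            if friend not in seen:
                seen.add(friend)
                queue.append(friend)
    return seen - {start}

print(reachable("Ada"))  # -> {'Grace', 'Edsger', 'Barbara'}
```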

Therefore, the next stage of database innovation is a shift from focusing on efficiency to discovering value through concrete applications of graph technology.

This is graph data technology, and the future of graph data is worth looking forward to.