The fallacy of non-native graph databases: Why native is the answer

As the graph database industry grows in popularity, more organizations are recognizing the value of connected data and are investigating new ways to become a connected enterprise. Integrity, performance, efficiency and scalability are key attributes for those organizations that want to build behavioral and decision-making applications based on live, real-time evaluation of connected data.

Creating a reliable database management system is not a simple task. There are many design choices to make, so being aware of your primary needs is essential. Specifically, understand your application’s goals and requirements, as well as the trade-offs the system’s designer has made. Graph database management systems follow one of two primary designs: native and non-native.

Native graph databases are built specifically for storing graph data and handling graph workloads across the entire stack. By contrast, non-native databases are optimized for some other storage model (e.g. columns, documents) and either bolt a graph API on top of an existing, different database or attempt to wedge limited graph operations into an existing non-graph query language. The differences are drastic: native technology performs graph queries far more quickly, scales more easily and runs more efficiently with less hardware than non-native databases. Here is an in-depth look at why organizations will find greater value in investing in a native graph database.

Graph storage

Native graph databases are designed to use the file system in a manner that is sympathetic towards graph workloads and is safe for storing graph data. The store formats enable index-free adjacency for rapid graph traversals even on slow storage, and the write strategies ensure consistency even under system failure.
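To make index-free adjacency concrete, here is a minimal in-memory sketch in Python. It is not how any particular product lays out its store files; the `Node`, `Relationship`, `connect` and `within_hops` names are hypothetical and exist only for illustration. The point it shows is that each node holds direct references to its own relationships, so a traversal hops from record to record without consulting a global index.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity-based equality, so nodes can live in sets
class Node:
    name: str
    # Direct references to this node's own relationship records.
    outgoing: list["Relationship"] = field(default_factory=list)
    incoming: list["Relationship"] = field(default_factory=list)


@dataclass(eq=False)
class Relationship:
    kind: str
    start: Node
    end: Node


def connect(start: Node, kind: str, end: Node) -> Relationship:
    """Create one relationship record and attach it to both endpoints."""
    rel = Relationship(kind, start, end)
    start.outgoing.append(rel)
    end.incoming.append(rel)
    return rel


def within_hops(start: Node, kind: str, depth: int) -> set[str]:
    """Names of nodes reachable from `start` in at most `depth` hops."""
    seen = {start}
    frontier = [start]
    for _ in range(depth):
        next_frontier = []
        for node in frontier:
            # Each hop follows references held by the node itself;
            # no lookup against a global index is involved.
            for rel in node.outgoing:
                if rel.kind == kind and rel.end not in seen:
                    seen.add(rel.end)
                    next_frontier.append(rel.end)
        frontier = next_frontier
    return {node.name for node in seen if node is not start}
```

In this sketch the cost of each hop depends on how many relationships the current node has, not on the total size of the graph.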

Still, many operations teams may be more familiar with a non-graph backend system and be tempted to use it to store a graph. We found, to our own detriment early in our development, that the non-native approach inevitably ends in performance and scalability issues because of the disconnect between graph data and non-graph storage.

Moreover, the only empirically proven way to ensure data safety is to update a graph via ACID transactions, because maintaining relationships between records demands stronger guarantees than weaker-than-ACID consistency models can provide. While non-native graphs built on eventually consistent stores can corrupt data, native graph databases include transactional mechanisms that keep data safe even through network blips.

Native storage also allows organizations to embrace evolving hardware architectures. With the emergence of native storage models for novel disk storage platforms and memory architecture like non-volatile RAM, it is imperative that your graph database is optimized for those architectures.

Graph query processing

Native graph querying refers to how a graph database describes, plans, optimizes and executes queries. On a native system, all layers of the architecture are optimized for storing and retrieving graph data. While non-native graph databases may try to avoid mechanical penalties through radical denormalization, the native approach provides consistently high traversal performance at any depth. When non-native graph queries go deeper than their level of denormalization allows (usually as shallow as depth three), their performance degrades substantially.

With an RDBMS back end, we have the additional problem that reversing the direction of a traversal is extremely difficult (who are my friends versus who’s friends with me). In order to reverse traversal direction, we must either create a reverse-lookup index for each use case or perform a brute-force search through the original data. The former is complex to maintain; the latter is slow at runtime.
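The two options for reversing a traversal can be sketched in a few lines of Python. This is a toy model rather than real SQL: the `FRIEND_ROWS` data and all function names are hypothetical, and a dictionary stands in for a database index keyed on one column of a join table.

```python
from collections import defaultdict

# Hypothetical rows of a FRIEND join table: (person, friend).
FRIEND_ROWS = [("alice", "bob"), ("carol", "bob"), ("bob", "dave")]


def build_index(rows, key_column):
    """Build a lookup index keyed on one column of the join table."""
    index = defaultdict(list)
    for row in rows:
        index[row[key_column]].append(row[1 - key_column])
    return index


# The index the schema was designed around: person -> their friends.
forward_index = build_index(FRIEND_ROWS, key_column=0)


def my_friends(person):
    """Forward direction: a single index lookup."""
    return forward_index.get(person, [])


def friends_with_me_scan(person):
    """Reverse direction, option 1: brute-force scan of every row."""
    return [p for p, friend in FRIEND_ROWS if friend == person]


# Reverse direction, option 2: a second index that has to be built and
# then kept in step with every future write to the table.
reverse_index = build_index(FRIEND_ROWS, key_column=1)


def friends_with_me_indexed(person):
    """Reverse direction via the extra reverse-lookup index."""
    return reverse_index.get(person, [])
```

Both reverse functions return the same answer; the difference lies in where the cost lands, either on every query (the scan) or on every write and on schema maintenance (the extra index).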

Speed, data integrity and efficiency

Native graph databases handle connected data queries much more quickly than non-native graph databases. Even on modest hardware, native graph databases can handle millions of traversals per second between nodes and thousands of transactional writes per second in a graph through a single machine.

When a native graph database supports ACID transactions, once a transaction is complete its data is consistent and durable. Transactions are also isolated, which means that transactions running concurrently do not interfere with each other. No partially written records will exist in the case of a fault.
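The "no partially written records" property can be illustrated with a small Python sketch. It covers atomicity only (not durability or isolation), and it uses a simple copy-then-swap scheme chosen for brevity, which is an assumption of this example and not a description of how any real graph database implements transactions. `TinyGraphStore` and `Transaction` are hypothetical names.

```python
import copy


class TinyGraphStore:
    """A toy in-memory store: nodes plus (start, kind, end) relationships."""

    def __init__(self):
        self.nodes = {}          # node id -> properties
        self.relationships = []  # list of (start id, kind, end id)

    def transaction(self):
        return Transaction(self)


class Transaction:
    def __init__(self, store):
        self.store = store

    def __enter__(self):
        # Stage all changes on private copies of the committed state.
        self.nodes = copy.deepcopy(self.store.nodes)
        self.relationships = list(self.store.relationships)
        return self

    def create_node(self, node_id, **properties):
        self.nodes[node_id] = properties

    def create_relationship(self, start, kind, end):
        # A relationship must never point at a node that does not exist.
        if start not in self.nodes or end not in self.nodes:
            raise KeyError(f"missing endpoint for {start}-[{kind}]->{end}")
        self.relationships.append((start, kind, end))

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            # Commit: publish every staged change together.
            self.store.nodes = self.nodes
            self.store.relationships = self.relationships
        # On failure the staged copies are simply discarded.
        return False  # let any exception propagate to the caller
```

If a transaction creates a node and then fails while creating a relationship, neither change becomes visible in the store, so a reader never observes a node without its relationship or a relationship with a missing endpoint.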

Unlike non-native graph models, native graph databases can deliver constant-time traversals through index-free adjacency, without complex schema design, fancy indexing or arcane query optimizations. This intuitive property-graph model eliminates the need to create additional, complex application logic to process connections.

Why native versus non-native matters

Today’s datasets are more variably structured, interconnected and interrelated than ever before. Connected data is incredibly valuable, but non-native approaches to database management reduce that value. Native graph databases are more valuable in the long term and ultimately save significant money by eliminating massive hardware investments.

Non-native approaches to graphs not only create sub-optimal graph implementations, they also spoil the simplicity and expressiveness of the model and convolute the underlying database’s own native functionality. Conversely, native graph technology provides a streamlined and consistent platform for working with graph data. Enterprises hoping to utilize the power of connections within their data will find that the key attributes of a native graph database (integrity, performance, efficiency and scaling advantages) are critical for prolonged success.

Jim Webber
Jim is Chief Scientist at Neo4j, working on next-generation solutions for massively scaling graph data. Prior to joining Neo4j, Jim was a Professional Services Director with ThoughtWorks, where he worked on large-scale computing systems in finance and telecoms. Jim has a Ph.D. in Computing Science from Newcastle University, UK.