This may come as a shock if you’ve first encountered knowledge graphs in Gartner’s hype cycles and trends, or in the extensive coverage they are getting lately. But here it is: Knowledge graph technology is about 20 years old. This, however, does not mean it’s stagnating — on the contrary.
Gartner predicted that the application of graph processing and graph databases would grow at 100% annually through 2022, continuously accelerating data preparation and enabling more complex and adaptive data science. Graph database vendors seem to confirm this across the board: 2019 was a very good year. Having identified knowledge graphs as a key technology for the 2020s, we take a look at how they are evolving.
The 20-year-old hype
First, let’s quickly recap those 20 years of history. What we call Knowledge Graphs today was largely initiated by none other than Tim Berners-Lee. Berners-Lee, who is also credited as the inventor of the web, published his Semantic Web manifesto in Scientific American in 2001. The core concepts for Knowledge Graphs were laid out there.
The Semantic Web manifesto was in many ways ahead of its time. Looking back today, we can see some parts of it going strong, while others have faded. Building on a foundation of standards for interoperability, such as Unicode, URIs, and RDF, the core of the vision has always been semantics: instilling meaning in web content.
The Semantic Web got a bad name for being academic, while some technical choices such as XML did not quite work out. The thing is, however, that crawling and categorizing content on the web is a very hard problem to solve without semantics and metadata. This is why Google adopted the technology in 2010, by acquiring Metaweb.
In 2012, the term Knowledge Graph was introduced. A very successful rebranding indeed, and that’s not all we have Google to thank for. Google employs key people in the domain and is the driving force behind schema.org. Schema.org is the core of Google’s knowledge graph. It is, unsurprisingly, a schema.
Knowledge graphs and schemas are foundationally bound. While not all knowledge graphs are as big as Google’s, every one of them is based on a schema. Knowledge graph neophytes do not always realize this, but whether it’s implicit or explicit, there’s always a schema. Which brings us to the point.
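To make the schema point concrete, here is a minimal sketch in Turtle using the schema.org vocabulary. The entity names (`ex:alice`, `ex:acme`) are hypothetical; the point is that the schema is explicit in the types and properties used:

```turtle
@prefix schema: <https://schema.org/> .
@prefix ex:     <http://example.org/> .

# The schema shows up as the shared types and properties
# (schema:Person, schema:name, schema:worksFor) the data is built on.
ex:alice a schema:Person ;
    schema:name     "Alice Example" ;
    schema:worksFor ex:acme .

ex:acme a schema:Organization ;
    schema:name "ACME Corp" .
```

Even a knowledge graph that never declares a schema document still commits to one implicitly, through the vocabulary its triples use.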
Knowledge graphs and graph databases
Knowledge graphs can be stored in any back end, from files to relational databases or document stores. But since they are, well, graphs, it does make sense to store them in a graph database. This greatly facilitates storage and retrieval, as graph databases offer specialized structures, APIs, and query languages tailored for graphs.
In addition, many graph databases today offer a lot more than just a store for data. They come packaged with algorithms for graph analytics, visualization capabilities, machine learning features, and development environments. They have essentially grown from databases to platforms. But there is further nuance here.
Graph databases come in two main flavors, depending on which graph model they support: Property graph and RDF. In general, RDF graph databases emphasize semantics and interoperability, while property graph databases emphasize ease of use and performance.
When it comes to knowledge graphs, RDF graph databases are a natural match. It’s not impossible to build knowledge graphs on top of property graph databases. Usually, however, this means learning knowledge management fundamentals the hard way and re-implementing the relevant features. Lessons never come for free, but building on platforms designed around knowledge management from the start helps.
Property graphs and RDF graphs are not that different conceptually. Having interoperability between them would be both possible and desirable. This is why in March 2019 a W3C workshop on web standardization for graph data took place, as the first step towards standardization in the graph database world.
A key element to bridge the gap is something called RDF* (RDF star). RDF* is a proposal to standardize a modeling construct for RDF graphs, namely the addition of properties to edges. Although this is possible in RDF, there is no standard way of doing it. Standardizing it would not only help interoperability with property graphs but also interoperability among RDF graphs.
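To make the edge-property problem concrete, consider annotating a "works for" relationship with a start date. Below is a sketch of one pre-RDF* workaround, standard reification, next to the RDF* form. All names are illustrative:

```turtle
@prefix ex:  <http://example.org/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

# Without RDF*: one "secret handshake" is standard reification,
# which describes the statement using four extra triples.
ex:stmt1 a rdf:Statement ;
    rdf:subject   ex:alice ;
    rdf:predicate ex:worksFor ;
    rdf:object    ex:acme ;
    ex:since      "2017" .

# With RDF* (Turtle* syntax): the triple itself is quoted
# and annotated directly, much like a property on an edge.
<< ex:alice ex:worksFor ex:acme >> ex:since "2017" .
```

The RDF* form is both terser and closer to how property graph users think about edges, which is exactly the gap it aims to bridge.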
From secret handshakes to RDF stars
As Steve Sarsfield, VP of Product at Cambridge Semantics, put it, before RDF*, if people wanted to use edge properties in RDF graphs, they had to rely on secret handshakes. This is not ideal, especially considering that one of the key advantages of the RDF stack is standardization and interoperability.
In the wake of the W3C initiative, a couple of RDF graph database vendors went ahead and implemented RDF*. Cambridge Semantics is one of them. Its AnzoGraph database supports RDF*, as well as SPARQL*. SPARQL is the standard query language for RDF, and SPARQL* is its extension that works with RDF*.
Cambridge Semantics recently unveiled AnzoGraph DB Version 2, and when discussing the release with Sarsfield, we wondered what their experience from the field has been. Are people asking for RDF*? Has it helped adoption? One tangible outcome: bridging the gap with property graphs has put an AnzoGraph implementation of Cypher, the most popular language for querying property graphs, underway.
Sarsfield noted that it’s still relatively early days for knowledge graph adoption. As such, many of the organizations that use AnzoGraph tend to have highly skilled people on board. For them, switching between data models and query languages is not much of an issue. For mainstream adoption, however, lowering that barrier matters.
Stardog is another RDF graph database vendor that has implemented RDF*. Mike Grove, Stardog co-founder and VP Engineering, said this has been in the works for a while, and they are very excited about it. Stardog started working on the plumbing as part of the Stardog 7 development effort, and they were very happy to be able to ship the feature.
Regarding its reception, Grove noted that what people wanted was a more user-friendly way to have edge properties: “Neo4j obviously got this right. RDF* does a fantastic job of bringing the same ease of use to semantic graphs.” He went on to add that customers are excited, and many are already working on integrating it into their applications.
Technically, RDF* and SPARQL* are not yet standardized. Both have been introduced by Olaf Hartig, a researcher at Linköping University. When we inquired about their status, Hartig noted that while there have been delays, he hopes the standardization process will pick up speed soon.
For knowledge graph platforms, too, GraphQL is a plus
Both Sarsfield and Grove noted that they expect RDF* to boost knowledge graph adoption. Implementation is key, and having early adopters and real-world usage may also catalyze the standardization process. Sarsfield and Grove expressed their support for the process, as well as the need to get the word out.
RDF* can make a difference, but it’s not the only thing going on in the knowledge graph world. As knowledge graphs entail several layers and can be a central piece of infrastructure for organizations, graph databases are growing into platforms.
AnzoGraph started as part of the Anzo platform before becoming a product in its own right. Stardog also touts its product as a platform, emphasizing features such as visualization and virtualization built around the graph database core.
Another RDF graph database vendor, Ontotext, recently announced a new version of its own platform. An interesting feature that Stardog’s and Ontotext’s platforms share is support for GraphQL. Unfortunately, GraphQL’s name does not do it justice. As if there was not enough confusion already regarding graphs: GraphQL is not a graph query language.
GraphQL is a replacement for REST APIs. Despite the misnomer, it’s very useful, and its popularity among developers is growing. This is why more and more databases are adding support for GraphQL, with names such as MongoDB joining the GraphQL wave. Graph databases are no exception. Stardog has had it since 2017, while Ontotext is in the process of adding it.
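For instance, where a REST design might expose separate endpoints for a person and their employer, a single GraphQL query asks for exactly the fields needed in one round trip. The schema and field names below are hypothetical:

```graphql
# One request, only the requested fields come back.
query {
  person(id: "alice") {
    name
    worksFor {
      name
    }
  }
}
```

The client shapes the response, which is what makes GraphQL attractive as an API layer in front of a database, graph or otherwise.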
As Stardog put it, more developers know and are learning GraphQL than all the graph query languages combined. Ontotext, for its part, has put together a rather elaborate post on the use of GraphQL in its platform. Whichever way you approach it, however, GraphQL makes lots of sense for accessing services built around database platforms.
GraphQL plus variants
Stardog reports GraphQL success within its customer base. Grove mentioned that one of the big Silicon Valley tech companies exclusively uses GraphQL to interact with Stardog. Both Grove and Jem Rayfield, Ontotext’s Chief Architect, agree that GraphQL can work well in some cases, but by its very design, the expressiveness of GraphQL is quite limited.
As Manish Jain, CEO and founder of Dgraph, put it: most people who don’t know GraphQL assume it’s a graph database query language, while most people who know GraphQL wonder how a graph database could be powered by it. Dgraph is a graph database powered by GraphQL — or something like it.
GraphQL+ is a derivative of GraphQL, developed and, to date, used exclusively by Dgraph. In a 2019 interview with ZDNet, Jain expressed no interest in standardizing GraphQL+. No other vendor we know of has expressed interest in adopting GraphQL+ either. But that’s not all there is to GraphQL for graph databases.
Most approaches are about what GraphQL can do for knowledge graphs. But to close the loop with the Semantic Web underpinning of knowledge graphs, here’s an idea: What if GraphQL resources were annotated with URIs?
URIs are global identifiers, which can denote concepts from shared vocabularies, such as schema.org or other ontologies. This seems like a natural fit, and one that both Grove and Rayfield agree has potential. There is another working group set up to align RDF and GraphQL, although it does not look like it’s moving very fast.
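One way to sketch that idea is a custom directive mapping GraphQL types and fields to shared vocabulary URIs. To be clear, the `@uri` directive below is purely hypothetical, not part of any standard; it only illustrates what such an annotation could look like:

```graphql
# Hypothetical directive binding GraphQL names to global identifiers.
directive @uri(iri: String!) on OBJECT | FIELD_DEFINITION

type Person @uri(iri: "https://schema.org/Person") {
  name: String @uri(iri: "https://schema.org/name")
  worksFor: Organization @uri(iri: "https://schema.org/worksFor")
}

type Organization @uri(iri: "https://schema.org/Organization") {
  name: String @uri(iri: "https://schema.org/name")
}
```

With annotations like these, a GraphQL schema would stop being an island: its types and fields could be matched against ontologies, and its results mapped back to RDF.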
Knowledge graphs in the 2020s: We speak your language
It seems we are moving towards a new status quo. If NoSQL stands for Not Only SQL, we could call this NoSPARQL — Not Only SPARQL. SPARQL remains the language of choice for taking full advantage of knowledge graph capabilities. It also doubles as an API, its expressiveness is beyond what GraphQL can attain, and SPARQL’s federated query and data integration capabilities are unique.
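The federation point can be made concrete with SPARQL's SERVICE clause, which dispatches part of a query to a remote endpoint and joins the results with local data. The endpoint URL and vocabulary below are hypothetical:

```sparql
PREFIX ex: <http://example.org/>

# Join local graph data with a remote SPARQL endpoint in one query.
SELECT ?org ?remoteLabel
WHERE {
  ?person ex:worksFor ?org .                # evaluated locally
  SERVICE <https://example.org/sparql> {    # hypothetical remote endpoint
    ?org ex:label ?remoteLabel .
  }
}
```

There is no GraphQL-standard equivalent of this cross-endpoint join, which is part of why SPARQL remains the language of choice for data integration over knowledge graphs.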
But vendors seem set to meet users where they are, be it GraphQL or any other language. Even SQL. As Stardog’s Grove put it: “We’ve always strived to bring our technology to the users. GraphQL was a step in that plan. Supporting SQL is the next step in that journey, not because SQL is better than GraphQL, but because of what that support enables.”
SQL enables existing tooling to work on top of graph databases, making them accessible to a wider audience. Stardog is not the first graph database platform to have added an SQL connectivity layer. Cambridge Semantics also offers a connectivity layer for Tableau. More graph databases support SQL, and there is an ongoing standardization effort to add graph extensions to SQL itself.
Eventually, even natural language support could be an option. “No matter how you feel about SQL, SPARQL, GraphQL, or any other query syntax/language, natural language is just better. Why ask someone to learn an esoteric syntax when they can just simply type?” said Grove.
Grove mentioned Stardog will be launching a natural language interface to the knowledge graph. A pipe dream? This may not be too far off. There is ongoing research on natural language interfaces for databases. And, to add to this, there are also existing integrations for accessing databases via voice assistants. So, you can see where this is going.
We don’t know whether conversational knowledge graphs are something everyone would be comfortable with. What we do know is that having more options is a good thing, and exciting times are ahead. Stay tuned as we keep exploring the years of the graph.