Picking up where we left off in part one of the top technology trends for the 2020s, here is what will shape the data landscape in the years to come.
2. AI: It’s all about Data and Hardware
The latter part of the 2010s was all about AI, and the 2020s will be no different. We will see AI widen its reach and affect every conceivable field. Having already seen the AI hype rise, however, we must also be prepared for a backlash. It is also important to be clear about what “AI” actually means.
In essence, what we call AI today is an umbrella term for various pattern matching techniques. Machine learning and its subdomains, such as deep learning, boil down to pattern matching. We saw several breakthroughs in the 2010s, but the seeds for most techniques and algorithms were planted decades ago, and they remain essentially the same.
Still, we have seen the performance of AI systems in many domains go from worse than human, to on par with humans, to surpassing them. How is that possible? The answer is twofold: data and compute.
The digitization of nearly all aspects of human activity has led to an explosion in the volume of data being generated. Algorithms now have much more data to work with, and that alone means they can perform much better. In parallel, the algorithms themselves have improved: in domains such as image recognition, refinements to neural networks, driven by vibrant research communities, have boosted accuracy. ImageNet is a good example of this.
AI continues to make inroads into new domains at a breakneck pace. In 2019 alone, we saw great progress in natural language processing, games, and common sense reasoning, to name but a few. New milestones in the quality of results and in execution speed were reached almost monthly. The resources dedicated to AI are staggering, and research is progressing faster than ever. So, should we all be preparing for the brave new AI world? Well, maybe not so fast.
The problem with the AI frenzy is that the divide between the haves and the have-nots is widening. And not just because of the resources and expertise the big players have. It is a self-reinforcing loop of sorts: organizations that are data-driven, and that design and build data-driven products, not only gain an edge, but also collect more data as those products operate.
Because data and AI are linked in this way, more data is used to develop better AI, which leads to better products, which bring in more data, and so on. An archetypal and widely recognized example of this is Facebook, but it is not the only one. When the likes of The Economist call for a new approach to antitrust rules for the data economy, this should be a cause for concern.
Data, however, is just one part of the AI equation. The other part is hardware. Without the tremendous progress in hardware seen in the 2010s, today’s AI would not be possible. Access to the compute power needed to process the massive amounts of data that machine learning requires used to be a privilege reserved for a select few.
While the kind of hardware that Big Tech has access to remains beyond comprehension for most, a democratization of sorts seems to have transpired. The combination of cloud, with its on-demand access to processing power, and specialized hardware for AI workloads has made AI chips accessible to more organizations than ever, provided they can afford it.
The big innovator, and winner, in AI hardware in the 2010s was NVIDIA. The company that most people came to know as a maker of GPUs, specialized hardware typically used by gamers for fast graphics rendering, has reinvented itself as an AI superpower. The architecture of GPUs, it turns out, is very well suited to running AI workloads.
As Intel grew complacent in its dominance of traditional CPU hardware, and other GPU makers failed to execute, NVIDIA rose to become the leader in AI hardware. That, however, is not set in stone, and the hardware space is already seeing rapid innovation.
While NVIDIA dominates AI hardware and has built a software ecosystem around it too, waves of disruption are hitting the AI chip market. Just a few days before the close of the 2010s, Intel struck back by acquiring Habana Labs. Habana Labs is one of many AI chip startups developing new designs built from the ground up for AI workloads.
Although Habana Labs is unknown to many, its chips are already used in production by cloud vendors and autonomous vehicle makers. Graphcore, which became the first AI chip unicorn in late 2018, recently announced that its chips are now used in the Microsoft Azure cloud. Far from over, the AI chip race is only just beginning.
1. The Future is Graph, Knowledge Graph
Up until the beginning of the 2010s, the world was mostly running on relational databases and spreadsheets. To a large extent, it still is. But if the 2010s brought the first signs of dissent against the monoculture of tabular data structures, the 2020s will bring the final nail in the coffin. The NoSQL wave of databases has largely succeeded in getting developers, administrators, CIOs, CTOs, and business people out of their comfort zone, and has instilled the “best tool for the job” mindset.
Polyglot persistence, as using different data models and data management systems depending on the task at hand is called, is becoming the new normal. After relational, key-value, document, columnar, and time-series databases, the latest link in this evolutionary proliferation of data structures is graph. Graph databases and knowledge graphs have been making waves and featuring in hype cycles for the last couple of years.
While it is understandable that many people think of graph as a new technology, the truth is that this technology is at least 20 years old. Its roots largely trace back to none other than Tim Berners-Lee, who is also credited as the inventor of the web, and the Semantic Web manifesto he published in Scientific American in 2001. Berners-Lee also coined the term Giant Global Graph to describe the next stage in the evolution of the web.
For someone who has been following this technology since the early 2000s, it is exhilarating to see it gaining steam, with technical progress, funding, and use cases piling up into a snowball effect. It is also amusing to see graph-washing begin. In essence, progress in graph is following a trajectory similar to that of machine learning.
It is not so much that a major breakthrough made the technology feasible, but rather that the right conditions made it boom. Many of the concepts, formats, standards, and technologies enabling graph databases and knowledge graphs to flourish today have been developed over more than 20 years. What has brought on the perfect graph storm is a combination of factors.
As with AI, the data explosion has helped bring graph to the fore. Now that Big is no longer a qualifier for Data, because we have mastered the art of storing lots of it, the real question is how to get value out of data. Leveraging connections in data is a prominent way of getting value out of it, and graph is the best way of leveraging connections.
This is why graph databases excel in use cases that require finding connections in data, such as anti-fraud or master data management. It is also why graph analytics, with algorithms such as centrality or PageRank that compute their results from the structure of nodes and edges, can offer valuable insights into connected datasets. As the terminology still seems to be in flux for many newcomers to this field, a short history lesson, and some grounding in semantics, may be called for.
Graph analytics such as PageRank can be applied to data stored in any back end. Graph databases are back ends designed to accommodate graph data structures, offering specialized query languages, APIs, and oftentimes storage structures. Knowledge graphs, also called semantic graphs, are a specific subclass of graphs that come with metadata, schema, and global identifier capabilities.
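Because PageRank only needs nodes and edges, it can run over a plain in-memory adjacency list just as well as over a graph database. What follows is a minimal Python sketch of the classic power-iteration formulation, assuming a small directed graph held in a dictionary. The `pagerank` function, the toy `links` data, and the defaults (a damping factor of 0.85 and 50 iterations) are illustrative choices, not the implementation used by Google or by any particular graph product.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict mapping node -> list of outgoing links."""
    # Collect every node, including ones that only appear as link targets.
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}  # start from a uniform distribution

    for _ in range(iterations):
        # Rank held by "dangling" nodes (no outgoing links) is spread evenly.
        dangling = sum(rank[node] for node in nodes if not graph.get(node))
        # Every node gets the random-jump share plus its cut of the dangling mass.
        new_rank = {node: (1 - damping) / n + damping * dangling / n for node in nodes}
        # Each node passes its damped rank in equal parts along its outgoing links.
        for node, targets in graph.items():
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy link graph: "c" receives links from every other node.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
for node, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(node, round(score, 3))
```

In this toy graph, node "c", which every other node links to, ends up with the highest score, while "d", which nothing links to, keeps only the random-jump share. Nothing here depends on how the edges are stored, which is the point the paragraph above makes about back ends.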
Google has played a key role in the rise of graphs and knowledge graphs. The web itself is a prime use case for graphs, and PageRank was born out of it. Because crawling and categorizing content on the web is very hard without semantics and metadata, Google embraced them and, in 2012, coined the term Knowledge Graph. This, and the widespread adoption of schema.org that came with it, marked the beginning of the meteoric rise of graph technology and knowledge graphs.
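To make the difference between a plain graph and a knowledge graph more concrete, here is a hypothetical sketch of a tiny knowledge graph stored as subject-predicate-object triples, using schema.org terms as global identifiers and the RDF `type` property for typing. The `https://example.org/` namespace, the `RANGES` mini-schema, and the `objects` and `violations` helpers are assumptions made for illustration; a real deployment would more likely use an RDF store and a validation language such as SHACL rather than hand-rolled checks.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SCHEMA = "https://schema.org/"
EX = "https://example.org/"  # hypothetical namespace for our own entities

# Each fact is a (subject, predicate, object) triple; global identifiers (IRIs)
# mean "Person" or "worksFor" carry the same meaning in any dataset that uses them.
triples = {
    (EX + "alice", RDF_TYPE, SCHEMA + "Person"),
    (EX + "alice", SCHEMA + "name", "Alice"),
    (EX + "alice", SCHEMA + "worksFor", EX + "acme"),
    (EX + "acme", RDF_TYPE, SCHEMA + "Organization"),
    (EX + "acme", SCHEMA + "name", "Acme Corp"),
}

# Hypothetical mini-schema: the type each property's value is expected to have.
RANGES = {SCHEMA + "worksFor": SCHEMA + "Organization"}


def objects(subject, predicate):
    """Return every object linked to `subject` through `predicate`."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def violations():
    """List triples whose object lacks the type the mini-schema expects."""
    bad = []
    for s, p, o in triples:
        expected = RANGES.get(p)
        if expected and expected not in objects(o, RDF_TYPE):
            bad.append((s, p, o))
    return bad


print(objects(EX + "alice", SCHEMA + "worksFor"))
print(violations())
```

The schema is what lets the data describe and check itself: a triple stating that someone works for a bare string, rather than for an entity typed as an Organization, would be reported by `violations()`.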
Knowledge graphs can address key challenges such as data governance, but ultimately they can serve as the digital substrate that unifies the philosophy of knowledge acquisition and organization with the practice of data management in the digital age. The NASAs and the Morgan Stanleys of the world are already managing ontologies and using knowledge graphs.
Graphs and knowledge graphs cut across AI, too. Much of the AI hardware and software for the 2020s uses graph data structures. Combining bottom-up, pattern matching techniques with top-down, knowledge-based approaches is the most promising way for AI to continue making progress.
As Nathan Benaich, author of the State of AI Report, put it, “Domain knowledge can effectively help a deep learning system bootstrap its knowledge, by encoding primitives instead of forcing the model to learn these from scratch.” Knowledge graphs are the best technology we have for encoding domain knowledge, and the world’s most comprehensive knowledge base, the web, already functions as one.
Knowledge Graph is a technology that enables other technologies to accelerate their growth, and it also enables humans to take stock of their own knowledge. This is why the future is Knowledge Graph.
To infinity and beyond
Looking back, it becomes clear how far we have come in the relatively short span of the last decade. Counter-intuitive as this may seem, however, we are not certain this is a good thing. Somewhere along the way, technological progress left our ability to monitor, comprehend, and digest technology in the dust. At the dawn of this new decade, we seem to be engrossed in a never-ending race for more: more data, more processing power, more technology.
The belief that more equals better seems to be firmly ingrained in most of us. And the signs of what is coming seem to tell the story of not just more, but immeasurably more. Quantum computing is progressing by leaps and bounds, promising to unlock compute power beyond our wildest imagination. DNA storage seems set to do the same for data storage. More data and compute than we would know what to do with. To infinity and beyond. But what for, and for whom?
Is this technology making us happier, and bringing us closer, or is it alienating and distressing us? Where are all the huge productivity gains going? Who is in control, who gets to call the shots, and why?
AI, for example, is already being used to make critical decisions. Who gets to build those systems, on what data, and according to whose ethics and criteria? Should society as a whole have some sort of control over it? How could society even dream of controlling a technology it hardly understands, and in what way? Are we sure more technology is the solution to technologically induced issues? What is a moral compass for the 21st century?
For the time being, these are questions few people are prepared to tackle. But if the 2020s continue along their current trajectory, more and more of us will have to face those questions head-on.