One of the challenges with modern machine learning systems is their heavy dependence on large quantities of data. This is especially true of deep neural networks, where many layers mean many connections, which in turn require large amounts of data and training before the system can produce results at acceptable levels of accuracy and precision. Indeed, the ultimate expression of this massive-data, massive-network vision is OpenAI's much-vaunted GPT-3, a model so large it can predict and generate almost any text with seemingly magical fluency.
However, in many ways, GPT-3 is still a big-data magic trick. Simply jamming lots of data into pre-configured neural networks isn't machine learning so much as an exercise in statistics. Indeed, Professor Luis Perez-Breva makes exactly this point when he argues that what we call machine learning isn't really learning at all. Has GPT-3 "learned," or is it just regurgitating what it has digested? True learning means adapting what was learned in one domain to another, unrelated domain, and that requires more than stuffing large networks with large amounts of data. To do that, machines need to understand concepts, which in turn requires ways to deduce and induce logic through knowledge graphs and other applications of common sense. Unfortunately, while work on neural networks has reached an advanced, well-developed stage, work on knowledge graphs and common sense reasoning is still in its relatively early days.
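To make the contrast concrete, the kind of deduction a knowledge graph supports can be sketched in a few lines. The example below is a minimal, hypothetical illustration: a graph stored as subject-predicate-object triples, plus one inference rule that transitively closes the "is_a" relation. All entities and facts are made up for illustration and are not drawn from any real knowledge base.

```python
# A toy knowledge graph: facts as (subject, predicate, object) triples.
# Entities and relations here are illustrative only.
triples = {
    ("penguin", "is_a", "bird"),
    ("bird", "is_a", "animal"),
    ("bird", "can", "fly"),
    ("penguin", "cannot", "fly"),
}

def infer_is_a(kb):
    """Transitively close 'is_a': if A is_a B and B is_a C, derive A is_a C.

    Repeats until no new facts are produced, so chains of any length
    are resolved. Returns a new set containing original plus derived facts.
    """
    kb = set(kb)
    changed = True
    while changed:
        changed = False
        derived = {
            (a, "is_a", c)
            for (a, p1, b) in kb if p1 == "is_a"
            for (b2, p2, c) in kb if p2 == "is_a" and b2 == b
        }
        if not derived <= kb:  # any genuinely new facts?
            kb |= derived
            changed = True
    return kb

kb = infer_is_a(triples)
# "penguin is_a animal" was never stated; it was deduced from two facts.
print(("penguin", "is_a", "animal") in kb)  # True
```

The point is not the tiny rule itself but the shape of the reasoning: the system derives a fact it was never shown, which is categorically different from a statistical model interpolating over its training data. Real systems express such rules in far richer logics (e.g. over RDF-style graphs), but the deduction step works on the same principle.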
In a recent AI Today podcast, Chaitan Baru, senior advisor of data science research initiatives at the University of California San Diego (UCSD), shares his insight into how people are taking a deeper look at knowledge graphs to make AI systems more intelligent and adaptable, especially in areas where there isn't much training data.
Research Pushing Knowledge Graphs Forward
Mr. Baru served in various roles at the San Diego Supercomputer Center and spent four years as a senior advisor for data science at the National Science Foundation (NSF). He traces his experience at the intersection of database systems and computing back 22 years to IBM Silicon Valley, before moving on to UCSD, where he found the opportunity to apply modern and emerging technologies to advanced use cases. As a starting point for explaining his roles and projects at the NSF, Mr. Baru points to his involvement in the Federal Big Data Initiative (FBDI) launched during the Obama administration in 2012. He also helped develop the Big Data Strategic Plan, and in that role initiated a partnership between the NSF and public cloud providers that continues to this day.
To maintain this progress, Chaitan worked on formulating strategies for the NSF. One way he pursued this aim was through knowledge graphs, both for making sense of copious amounts of data and for applying that data effectively in machine learning and AI. Workshops were organized around the idea of using knowledge graphs to connect the AI industry, academia, and government, with the goal of creating a symbiotic relationship among these sectors to advance AI. Each sector contributes its own assets to AI and uses AI in different ways. By drawing on the strong research undertaken in academia, the new technology developed in industry, and the government's ability to collect big data, a big data strategy was created across the sectors. Chaitan notes that such a strategy could eventually deliver improved smart services to citizens at a faster rate.
Knowledge Graphs: Increasingly Necessary, Increasingly Proprietary
Knowledge graphs can be particularly helpful in applications such as voice assistants and chatbots: the more those systems actually understand what you're trying to say, the more useful they can be. Current knowledge graphs help with a variety of tasks, including speech recognition, facial recognition, and translation. Chaitan says he felt he had witnessed a "glimpse into the future" watching real-time data processing, analysis, and action. He shares an anecdote about a professor at the University of Michigan who built a system to track students' academic activity and send them early warnings when their behavior pointed toward eventual failure. That system has since been implemented across the whole campus.
Chaitan continues by noting that knowledge graphs underpin most of the services provided by large internet companies such as Google. Workshops on the Open Knowledge Network (OKN) brought together representatives from top-tier corporations, academic institutions, and government. The problem, however, is that these knowledge graphs are proprietary, controlled by the corporations that invested in their development. Instead of creating and sharing large knowledge graphs that benefit from collective development, much as large neural nets have since the days of ImageNet, knowledge graph work suffers from overlapping, competing efforts in which organizations cannot build on each other's results. Because organizations protect their knowledge graphs rather than share them, ongoing development has lagged.
Chaitan states that industry is focused on proprietary knowledge graphs, while academia and government are looking toward open-source use. These differing needs have become a real obstacle to developing suitable knowledge graphs. Chaitan describes industry's desire for a simple solution, such as a credential-based system, to ensure that everyone is able to use a knowledge graph.
We're just at the beginning of how knowledge graphs and common sense systems will make ML applications better, stronger, and more trustworthy. Chaitan says his own interest lies in projects that manage the full lifecycle of machine learning and deep learning, and in building the infrastructure to properly maintain and even recycle these systems. After decades of AI progress, hardware has become significantly cheaper and internet use has grown, leading to ever more data collection. New business models, initiatives, projects, and applications will inevitably be built around AI, but Chaitan worries about potential hindrances to AI's continued progress, such as the ethics of data collection. He also argues that data literacy should be included in the education system so that everyone can effectively use emerging data systems. AI is an exciting sector with much still left to be researched, explored, and executed, and Chaitan believes we're just at the beginning of what's possible.