Welcome to My Week in AI! Each week this blog will have the following parts:
- What I have done this week in AI
- An overview of an exciting and emerging piece of AI research
Applying models across domains
This week I have been working on applying seq2seq models to time series forecasting. These models are typically used in NLP applications, but because both tasks involve mapping an input sequence to an output sequence, they can also be used for time series forecasting. They are especially useful for multi-step forecasting, where the model predicts several future time steps rather than just the next one.
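To make the sequence-to-sequence framing concrete, here is a minimal sketch of how a time series could be sliced into encoder inputs and multi-step decoder targets. The function name `make_seq2seq_windows` and the parameters `input_len` and `horizon` are my own illustrative choices, not part of any particular library, and the series is toy data.

```python
from typing import List, Tuple


def make_seq2seq_windows(
    series: List[float], input_len: int, horizon: int
) -> List[Tuple[List[float], List[float]]]:
    """Slice a series into (encoder input, decoder target) pairs.

    Each pair holds `input_len` past values and the `horizon` values
    that immediately follow them. Names here are illustrative only.
    """
    pairs = []
    last_start = len(series) - input_len - horizon
    for start in range(last_start + 1):
        split = start + input_len
        pairs.append((series[start:split], series[split:split + horizon]))
    return pairs


# Toy series: the values 0.0 through 9.0.
series = [float(t) for t in range(10)]
pairs = make_seq2seq_windows(series, input_len=4, horizon=3)

for past, future in pairs:
    print(past, "->", future)
```

Each pair plays the same role as a source sentence and its target sentence in an NLP setting, which is why the same encoder-decoder architecture can be reused.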
Leveraging new resources
I have also spent time perusing DeepMind’s educational resources. They recently launched a learning resources page for people with all levels of experience in AI, from beginners to students to researchers. Some of the highlighted resources made available by DeepMind include their podcast and YouTube channel featuring interviews and talks by AI scientists and engineers, a range of college-level lecture series and courses, and resources from global education initiatives that broaden access to AI research. In addition, they provide access to fascinating blog posts and research papers. I highly recommend browsing through the site!
Identifying a chemical molecule’s target proteins with deep learning
As I mentioned last week, the research I am sharing in this post involves AI in drug discovery. In ‘IVS2vec: A tool of Inverse Virtual Screening based on word2vec and deep learning techniques,’ Zhang et al. present a framework for applying the Inverse Virtual Screening (IVS) technique to chemical molecules using deep learning¹. Research has found that on average, each drug can bind to six target proteins, and in the early stages of drug development it is very useful to know what these target proteins might be. IVS is a method of identifying these target proteins.
The method presented by the authors, called IVS2vec, combines Mol2vec (based on Word2vec) with a Dense Neural Network. As the authors describe it, Mol2vec proceeds in the following manner: a chemical compound is first translated into a SMILES string, so that the molecule can be treated like a sentence. The molecule is then split into substructures, or ‘words’, and Word2vec is applied to encode it as a 300-dimensional vector.
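The molecule-as-sentence idea can be sketched in a few lines. This is a toy illustration under loud assumptions, not the Mol2vec implementation: the regex tokenizer is a simplified stand-in for the substructure extraction step, the embeddings are deterministic pseudo-random vectors rather than trained Word2vec vectors, and summing the word vectors is just one simple way to pool them into a single molecule vector. All names (`tokenize`, `toy_embedding`, `embed_molecule`) are hypothetical.

```python
import random
import re
from typing import List

EMBEDDING_DIM = 300  # matches the 300-dimensional vectors described above

# Hypothetical, simplified SMILES tokenizer: bracket atoms, the two-letter
# halogens Br and Cl, and otherwise single characters. Real substructure
# extraction is more involved than this.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|.")


def tokenize(smiles: str) -> List[str]:
    """Split a SMILES string into 'words'."""
    return SMILES_TOKEN.findall(smiles)


def toy_embedding(token: str) -> List[float]:
    """Stand-in for a trained Word2vec lookup: a fixed pseudo-random
    vector per token, seeded by the token itself."""
    rng = random.Random(token)
    return [rng.uniform(-1.0, 1.0) for _ in range(EMBEDDING_DIM)]


def embed_molecule(smiles: str) -> List[float]:
    """Pool the word vectors of a molecule by summing them (an assumption)."""
    vector = [0.0] * EMBEDDING_DIM
    for token in tokenize(smiles):
        for i, value in enumerate(toy_embedding(token)):
            vector[i] += value
    return vector


ethanol_words = tokenize("CCO")
aspirin_vector = embed_molecule("CC(=O)OC1=CC=CC=C1C(=O)O")
print(ethanol_words)
print(len(aspirin_vector))
```

In a real pipeline the toy lookup would be replaced by vectors learned from a large corpus of molecules, so that chemically similar substructures end up close together in the embedding space.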
The researchers built a dataset using Mol2vec by extracting molecules from the PDBbind database, encoding them as described, and then matching the encoded molecules with their corresponding target proteins. This dataset was then used to train a classifier using a Dense Fully Connected Neural Network based on DenseNet². DenseNet was developed to address a problem in networks with many layers, where some layers end up contributing little, by allowing each layer to pass its extracted information to all ensuing layers. This means that each layer in the network receives an aggregation of all the information that was previously extracted.
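The dense connectivity pattern is easier to see in code. Below is a minimal pure-Python sketch of a fully connected dense block, assuming random untrained weights and a ReLU activation. It is not the authors' architecture, and the names `dense_layer`, `dense_block`, and `growth_rate` are my own illustrative choices. The point is only that every layer sees the concatenation of the input and all earlier layers' outputs.

```python
import random
from typing import List


def dense_layer(inputs: List[float], weights: List[List[float]]) -> List[float]:
    """Fully connected layer with ReLU; `weights` has one row per output unit."""
    return [max(0.0, sum(w * x for w, x in zip(row, inputs))) for row in weights]


def dense_block(
    features: List[float], num_layers: int, growth_rate: int, seed: int = 0
) -> List[float]:
    """Apply `num_layers` layers with dense connectivity.

    Each layer takes everything collected so far as input and appends
    `growth_rate` new features. Weights are random, for illustration only.
    """
    rng = random.Random(seed)
    collected = list(features)
    for _layer in range(num_layers):
        weights = [
            [rng.gauss(0.0, 0.1) for _unit in collected]
            for _row in range(growth_rate)
        ]
        new_features = dense_layer(collected, weights)
        # Concatenate rather than replace: earlier features stay visible.
        collected = collected + new_features
    return collected


# A toy 300-dimensional input, standing in for an encoded molecule.
molecule_vector = [0.01 * i for i in range(300)]
block_output = dense_block(molecule_vector, num_layers=3, growth_rate=32)
print(len(block_output))
```

Because outputs are concatenated instead of overwritten, the feature width grows by `growth_rate` per layer, and the original input is still available unchanged to the final layer.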
This framework worked very well in performing IVS on a holdout set from the PDBbind database, achieving a classification accuracy of over 91%. IVS2vec has the potential to speed up clinical trials and help researchers understand the effects and side effects of new drugs more easily. It also has applications in repurposing existing drugs for new uses, which is a much faster and less expensive process than de novo drug development.