
Language-Guided Navigation in a 3D Environment | by Louis Bouchard | Aug, 2020

September 1, 2020

https://arxiv.org/pdf/2004.02857.pdf

As the name suggests, the authors developed a language-guided navigation task for 3D environments, in which an agent follows natural-language navigation directions given by a user in order to move realistically through the environment.

In short, the agent is given first-person vision, which they call egocentric, and a human-generated instruction such as: “Leave the bedroom and enter the kitchen. Walk forward and take a left at the couch. Stop in front of the window.”
Then, using this input alone, the agent must take a series of simple control actions, like “move forward 0.25 m” or “turn left 15 degrees”, to navigate to the goal.
By using such simple actions, VLN-CE (Vision-and-Language Navigation in Continuous Environments) lifts assumptions of the original VLN task and aims to bring simulated agents closer to reality.
For comparison, current state-of-the-art approaches move between panoramas, where a single action covers 2.25 meters on average and includes avoiding obstacles.
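To make the difference in scale concrete, here is a minimal sketch of the low-level action space described above. The action names (`forward`, `turn_left`, `turn_right`, `stop`) and the flat 2D pose are assumptions made for illustration; only the 0.25 m and 15 degree step sizes come from the text, and the sketch ignores obstacles entirely.

```python
import math

# Step sizes quoted in the article; the action names are hypothetical.
FORWARD_STEP_M = 0.25
TURN_ANGLE_DEG = 15.0


def apply_action(pose, action):
    """Return the new (x, y, heading_deg) pose after one low-level action."""
    x, y, heading = pose
    if action == "forward":
        rad = math.radians(heading)
        return (x + FORWARD_STEP_M * math.cos(rad),
                y + FORWARD_STEP_M * math.sin(rad),
                heading)
    if action == "turn_left":
        return (x, y, (heading + TURN_ANGLE_DEG) % 360.0)
    if action == "turn_right":
        return (x, y, (heading - TURN_ANGLE_DEG) % 360.0)
    if action == "stop":
        return pose
    raise ValueError(f"unknown action: {action}")


def rollout(start_pose, actions):
    """Apply actions in order, ending the episode at the first 'stop'."""
    pose = start_pose
    for action in actions:
        if action == "stop":
            break
        pose = apply_action(pose, action)
    return pose


# Nine forward steps are needed to cover the 2.25 m that a single
# panorama-to-panorama action covers on average.
final_pose = rollout((0.0, 0.0, 0.0), ["forward"] * 9 + ["stop"])
```

In this toy setting, one nav-graph hop corresponds to roughly nine low-level decisions, which is why episodes in the continuous setting are much longer.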

They developed two different models for this task.
The first one (a) is a simple sequence-to-sequence baseline.
The second one (b) is a more powerful cross-modal attentional model. Both are shown in the picture below.

The first model

https://arxiv.org/pdf/2004.02857.pdf

This first model takes, at each time step, a visual representation of the observation, containing depth and RGB features, together with the instructions.
Then, using this information and the instructions given by the user, it predicts a series of actions to take, denoted as “a_t” in this image.

https://towardsdatascience.com/an-overview-of-resnet-and-its-variants-5281e2f56035

The RGB frames and depth maps are encoded by two separate ResNet-50 networks: the RGB one is pre-trained on ImageNet, and the depth one is trained to perform point-goal navigation.

https://towardsdatascience.com/illustrated-guide-to-recurrent-neural-networks-79e5eb8049c9

Then, it uses an LSTM to encode the instructions from the user.
LSTM is short for long short-term memory, a recurrent neural network architecture widely used in natural language processing because its memory lets it take the previous words into account when processing the current one.
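As a rough illustration of how an LSTM carries information from earlier words forward, here is a minimal sketch of a single LSTM cell in plain Python. The sizes, the random weights, and the tiny vocabulary are all made up for the example; the paper's actual encoder is a trained network, not this toy.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def lstm_step(x, h, c, params):
    """One LSTM time step; params maps a gate name to (W, b) applied to [x; h]."""
    z = x + h  # list concatenation of the input and the previous hidden state
    gates = {}
    for name in ("i", "f", "o", "g"):
        weight, bias = params[name]
        pre = [p + b for p, b in zip(matvec(weight, z), bias)]
        activation = math.tanh if name == "g" else sigmoid
        gates[name] = [activation(p) for p in pre]
    # The cell state keeps part of the old memory (forget gate) and adds new content.
    c_new = [f * c_old + i * g
             for f, c_old, i, g in zip(gates["f"], c, gates["i"], gates["g"])]
    h_new = [o * math.tanh(cell) for o, cell in zip(gates["o"], c_new)]
    return h_new, c_new


def encode(tokens, embed, params, hidden_size):
    """Run the cell over a token sequence and return the final hidden state."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    for token in tokens:
        h, c = lstm_step(embed[token], h, c, params)
    return h


# Hypothetical sizes, vocabulary, and untrained random weights.
random.seed(0)
EMBED_SIZE, HIDDEN_SIZE = 3, 4
vocab = ["leave", "the", "bedroom"]
embed = {w: [random.uniform(-1, 1) for _ in range(EMBED_SIZE)] for w in vocab}
params = {
    name: ([[random.uniform(-0.5, 0.5) for _ in range(EMBED_SIZE + HIDDEN_SIZE)]
            for _ in range(HIDDEN_SIZE)],
           [0.0] * HIDDEN_SIZE)
    for name in ("i", "f", "o", "g")
}
instruction_vec = encode(vocab, embed, params, HIDDEN_SIZE)
```

Because the hidden and cell states are passed from one word to the next, the final vector depends on the order of the words, not just on which words appear.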

The second model

https://arxiv.org/pdf/2004.02857.pdf

These actions, a, are then fed into the second model.
The goal of this second model is to compensate for the lack of visual reasoning in the first model, which is essential for this kind of navigation application.

For example, you need good spatial and visual reasoning to understand an instruction such as “to the left of the table.”
The agent first needs to work out where the table is, and only then can it go to the left of that table.

[Photo by Romain Vignes on Unsplash]

This is done using attention.
Attention is based on the common intuition that we “attend to” a certain part when processing a large amount of information, like the pixels of an image.
More specifically, it is done using two recurrent networks, as you can see in the image. The first one tracks observations using the same RGB and depth inputs as the first model.

The other network’s role is to make decisions based on the user’s instructions and the visual features.
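The idea of “attending to” part of the input can be sketched with scaled dot-product attention, one common way to implement it. This is not taken from the paper's code: the function names, the toy image regions, and the query vector below are assumptions chosen to show the mechanism.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Weight each value by how well its key matches the query."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    context = [sum(w * value[d] for w, value in zip(weights, values))
               for d in range(dim)]
    return context, weights


# Toy example: three image regions, each with a key and a value vector.
# The query stands in for an instruction feature such as "the table".
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0], [20.0], [30.0]]
query = [4.0, 0.0]
context, weights = attend(query, keys, values)
```

Here the regions whose keys align with the query (the first and third) receive most of the weight, so the resulting context vector is dominated by their values.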

https://machinetalk.org/2019/03/29/neural-machine-translation-with-attention-mechanism/

This time, the user’s instructions are encoded using a bidirectional LSTM.
Then, they compute a list of simple instructions which is used to extract both visual and depth features.

Following that, the second recurrent network takes as input a concatenation of all the features discussed, including an action encoding, and predicts a final action.
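The “concatenate, then predict” step can be sketched as follows. To keep the example short, a single linear layer stands in for the recurrent network, which is a deliberate simplification and not the paper's architecture. The feature sizes, action names, and random weights are likewise hypothetical.

```python
import random

# Hypothetical action names for the low-level action space.
ACTIONS = ["forward", "turn_left", "turn_right", "stop"]


def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]


def predict_action(instruction_feat, rgb_feat, depth_feat, prev_action,
                   weights, bias):
    """Concatenate all features with the previous-action encoding and score each action."""
    action_encoding = one_hot(ACTIONS.index(prev_action), len(ACTIONS))
    features = instruction_feat + rgb_feat + depth_feat + action_encoding
    # A single linear layer replaces the recurrent network in this sketch.
    logits = [sum(w * f for w, f in zip(row, features)) + b
              for row, b in zip(weights, bias)]
    best = max(range(len(ACTIONS)), key=lambda i: logits[i])
    return ACTIONS[best], logits


# Made-up feature vectors and untrained random weights.
random.seed(0)
instruction_feat, rgb_feat, depth_feat = [0.2, -0.1], [0.5, 0.3], [0.9]
n_inputs = (len(instruction_feat) + len(rgb_feat) + len(depth_feat)
            + len(ACTIONS))
weights = [[random.uniform(-1, 1) for _ in range(n_inputs)] for _ in ACTIONS]
bias = [0.0] * len(ACTIONS)
action, logits = predict_action(instruction_feat, rgb_feat, depth_feat,
                                "forward", weights, bias)
```

With trained weights, the highest-scoring entry would be the action the agent executes next; here the weights are random, so the chosen action carries no meaning.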

https://arxiv.org/pdf/2004.02857.pdf

To train for this task, they used a total of 4,475 trajectories taken from the train and validation splits. For each of those trajectories, they provided multiple language instructions and an annotated “shortest path ground truth via low-level actions”, as seen in this image.

At first, it looks like this task needs a lot more detail and time to accomplish. This is shown in the picture below, where (a) is the current approach, using real-time localization of the agent, and (b) is the approach covered here, using low-level actions.

https://arxiv.org/pdf/2004.02857.pdf

But compare it to the traditional approach, which relies on panoramic views and perfect localization. This approach is given no position and uses only low-level actions, and it is clear that it needs far less computation time to succeed, as you can see from the amount of information given to each approach in the picture above.

This is a comparison on the VLN validation/test datasets between this and the current state-of-the-art approaches.

https://arxiv.org/pdf/2004.02857.pdf

From these quantitative results, we can see that using this cross-modal approach with multiple low-level actions in a continuous environment outperforms the nav-graph navigation approaches in every way. Such results are hard to picture from numbers alone, so here are some impressive examples using this new technique:

https://github.com/jacobkrantz/VLN-CE

Watch the video to see more examples of this new technique:

I invite you to check out the public release version of the code on their GitHub. Of course, this was just an introduction to the paper. Both are linked below for more information.

The paper: https://arxiv.org/pdf/2004.02857.pdf
The project: https://jacobkrantz.github.io/vlnce/?fbclid=IwAR2VO1jwjaq4Uydz2O25ZaLXVFjoD46QirYnW1zNeNAJyNkleA0KS_PDBrE
GitHub with code: https://github.com/jacobkrantz/VLN-CE

Credit: BecomingHuman By: Louis Bouchard
