Episodic Transformers (E.T.)

Episodic Transformer for Vision-and-Language Navigation
Alexander Pashevich, Cordelia Schmid, Chen Sun

Episodic Transformer (E.T.) is a novel attention-based architecture for vision-and-language navigation. E.T. is based on a multimodal transformer that encodes language inputs and the full episode history of visual observations and actions. This code reproduces the results obtained with E.T. on the ALFRED benchmark. To learn more about the benchmark and the original code, please refer to the ALFRED repository.

Quickstart

Clone repo:

$ git clone https://github.com/alexpashevich/E.T..git ET
$ export ET_ROOT=$(pwd)/ET
$ export ET_LOGS=$ET_ROOT/logs
$ export ET_DATA=$ET_ROOT/data
$ export PYTHONPATH=$PYTHONPATH:$ET_ROOT
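
The later commands rely on these variables, so it can help to check them before going further. Below is a minimal sketch that assumes only the variable names exported above; the helper `check_et_env` is hypothetical and not part of the repository.

```python
import os


def check_et_env(environ=None):
    """Return a dict {variable: problem}; an empty dict means nothing was found.

    Hypothetical helper: it only knows about the variables exported in the
    Quickstart (ET_ROOT, ET_LOGS, ET_DATA, PYTHONPATH).
    """
    environ = os.environ if environ is None else environ
    problems = {}
    for name in ("ET_ROOT", "ET_LOGS", "ET_DATA"):
        if not environ.get(name):
            problems[name] = "not set"
    # `python -m alfred...` only resolves if ET_ROOT is on PYTHONPATH.
    root = environ.get("ET_ROOT")
    entries = environ.get("PYTHONPATH", "").split(os.pathsep)
    if root and root not in entries:
        problems["PYTHONPATH"] = "does not contain ET_ROOT"
    return problems


if __name__ == "__main__":
    for variable, problem in check_et_env().items():
        print(f"{variable}: {problem}")
```

Run in the same shell session as the exports, the script prints nothing when all four variables look consistent.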

Install requirements:

$ virtualenv -p $(which python3.7) et_env
$ source et_env/bin/activate

$ cd $ET_ROOT
$ pip install --upgrade pip
$ pip install -r requirements.txt

Downloading data and checkpoints

Download ALFRED dataset:

$ cd $ET_DATA
$ sh download_data.sh json_feat

Copy pretrained checkpoints:

$ wget http://pascal.inrialpes.fr/data2/apashevi/et_checkpoints.zip
$ unzip et_checkpoints.zip
$ mv pretrained $ET_LOGS/
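
A missing or misplaced checkpoint tends to surface only later as a file-not-found error, so a quick check after the move can save time. The sketch below assumes that all four checkpoint files named in this README sit directly inside `$ET_LOGS/pretrained`; that layout and the helper `missing_checkpoints` are assumptions, not something the repository guarantees.

```python
import os
from pathlib import Path

# File names taken from the commands in this README; their location directly
# under the pretrained directory is an assumption.
EXPECTED_CHECKPOINTS = (
    "fasterrcnn_model.pth",
    "maskrcnn_model.pth",
    "et_human_pretrained.pth",
    "et_human_synth_pretrained.pth",
)


def missing_checkpoints(pretrained_dir):
    """Return the expected checkpoint names that are not files in pretrained_dir."""
    root = Path(pretrained_dir)
    return [name for name in EXPECTED_CHECKPOINTS if not (root / name).is_file()]


if __name__ == "__main__":
    logs = os.environ.get("ET_LOGS", "logs")
    for name in missing_checkpoints(Path(logs) / "pretrained"):
        print(f"missing: {name}")
```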

Render PNG images and create an LMDB dataset with natural language annotations:

$ python -m alfred.gen.render_trajs
$ python -m alfred.data.create_lmdb with args.visual_checkpoint=$ET_LOGS/pretrained/fasterrcnn_model.pth args.data_output=lmdb_human args.vocab_path=$ET_ROOT/files/human.vocab

Note #1: For rendering, you may need to configure args.x_display to correspond to an X server number running on your machine.
Note #2: We do not use JPG images from the full dataset as they would differ from the images rendered during evaluation due to the JPG compression.
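
To find a candidate value for args.x_display, one option is to list the X servers that are currently running. On Linux, X servers conventionally create a socket named `X<n>` under `/tmp/.X11-unix` for display number `n`. The sketch below relies on that convention only; the helper `list_x_displays` is hypothetical, and it does not check which format args.x_display expects.

```python
import re
from pathlib import Path


def list_x_displays(socket_dir="/tmp/.X11-unix"):
    """Return the sorted display numbers that have an X<n> entry in socket_dir."""
    root = Path(socket_dir)
    if not root.is_dir():
        return []
    displays = []
    for entry in root.iterdir():
        match = re.fullmatch(r"X(\d+)", entry.name)
        if match:
            displays.append(int(match.group(1)))
    return sorted(displays)


if __name__ == "__main__":
    print("X displays found:", list_x_displays())
```

An empty list suggests that no X server is running on the machine (or that it uses a non-standard socket location), in which case rendering is unlikely to start.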

Pretrained models evaluation

Evaluate an E.T. agent trained on human data only:

$ python -m alfred.eval.eval_agent with eval.exp=pretrained eval.checkpoint=et_human_pretrained.pth eval.object_predictor=$ET_LOGS/pretrained/maskrcnn_model.pth exp.num_workers=5 eval.eval_range=None exp.data.valid=lmdb_human

Note: make sure that your LMDB database is called exactly lmdb_human as the word embedding won't be loaded otherwise.

Evaluate an E.T. agent trained on human and synthetic data:

$ python -m alfred.eval.eval_agent with eval.exp=pretrained eval.checkpoint=et_human_synth_pretrained.pth eval.object_predictor=$ET_LOGS/pretrained/maskrcnn_model.pth exp.num_workers=5 eval.eval_range=None exp.data.valid=lmdb_human

Note: For evaluation, you may need to configure eval.x_display to correspond to an X server number running on your machine.

E.T. with human data only

Train an E.T. agent:

$ python -m alfred.model.train with exp.model=transformer exp.name=et_s1 exp.data.train=lmdb_human train.seed=1

Evaluate the trained E.T. agent:

$ python -m alfred.eval.eval_agent with eval.exp=et_s1 eval.object_predictor=$ET_LOGS/pretrained/maskrcnn_model.pth exp.num_workers=5

Note: you may need to train up to 5 agents using different random seeds to reproduce the results of the paper.
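
The seed sweep mentioned in the note can be scripted by generating one train/evaluate command pair per seed. This is a minimal sketch: the experiment naming `et_s<seed>` simply generalizes the `et_s1` example above, and the helper `seed_sweep_commands` is hypothetical. It only builds command strings and does not launch anything.

```python
def seed_sweep_commands(
    seeds=(1, 2, 3, 4, 5),
    object_predictor="$ET_LOGS/pretrained/maskrcnn_model.pth",
):
    """Return a list of (train_command, eval_command) pairs, one per seed."""
    commands = []
    for seed in seeds:
        name = f"et_s{seed}"  # assumed naming scheme, generalizing et_s1
        train = (
            "python -m alfred.model.train with exp.model=transformer "
            f"exp.name={name} exp.data.train=lmdb_human train.seed={seed}"
        )
        evaluate = (
            "python -m alfred.eval.eval_agent with "
            f"eval.exp={name} eval.object_predictor={object_predictor} "
            "exp.num_workers=5"
        )
        commands.append((train, evaluate))
    return commands


if __name__ == "__main__":
    for train, evaluate in seed_sweep_commands():
        print(train)
        print(evaluate)
```

Printing the commands rather than executing them keeps the sketch independent of the scheduler or GPU setup used to run the jobs.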

E.T. with language pretraining

Language encoder pretraining with the translation objective:

$ python -m alfred.model.train with exp.model=speaker exp.name=translator exp.data.train=lmdb_human

Train an E.T. agent with the language pretraining:

$ python -m alfred.model.train with exp.model=transformer exp.name=et_synth_s1 exp.data.train=lmdb_human train.seed=1 exp.pretrained_path=translator

Evaluate the trained E.T. agent:

$ python -m alfred.eval.eval_agent with eval.exp=et_synth_s1 eval.object_predictor=$ET_LOGS/pretrained/maskrcnn_model.pth exp.num_workers=5

Note: you may need to train up to 5 agents using different random seeds to reproduce the results of the paper.

E.T. with joint training

You can also generate more synthetic trajectories using generate_trajs.py, create an LMDB dataset from them, and jointly train a model on the human and synthetic data. Please refer to the original ALFRED code to learn more about the data generation. The steps to reproduce the results are the following:

  1. Generate 45K trajectories with alfred.gen.generate_trajs.
  2. Create a synthetic LMDB dataset called lmdb_synth_45K using args.visual_checkpoint=$ET_LOGS/pretrained/fasterrcnn_model.pth and args.vocab_path=$ET_ROOT/files/synth.vocab.
  3. Train an E.T. agent using exp.data.train=lmdb_human,lmdb_synth_45K.
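
The three steps above can be sketched as command strings. Several details here are assumptions rather than documented usage: `args.data_output` for the synthetic LMDB is inferred by analogy with the lmdb_human command earlier in this README, the experiment name `et_joint_s1` is hypothetical, and the real modules may need further arguments (for example, to point create_lmdb at the generated trajectories) that are not covered.

```python
def joint_training_commands(synth_name="lmdb_synth_45K", exp_name="et_joint_s1", seed=1):
    """Return the three commands (generate, create LMDB, train) as strings."""
    # Step 1: generate synthetic trajectories (arguments, if any, not shown).
    generate = "python -m alfred.gen.generate_trajs"
    # Step 2: build the synthetic LMDB; args.data_output is an assumption.
    create_lmdb = (
        "python -m alfred.data.create_lmdb with "
        "args.visual_checkpoint=$ET_LOGS/pretrained/fasterrcnn_model.pth "
        f"args.data_output={synth_name} "
        "args.vocab_path=$ET_ROOT/files/synth.vocab"
    )
    # Step 3: train jointly on the human and synthetic datasets.
    train = (
        "python -m alfred.model.train with exp.model=transformer "
        f"exp.name={exp_name} exp.data.train=lmdb_human,{synth_name} "
        f"train.seed={seed}"
    )
    return [generate, create_lmdb, train]


if __name__ == "__main__":
    for command in joint_training_commands():
        print(command)
```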

Citation

If you find this repository useful, please cite our work:

@misc{pashevich2021episodic,
  title ={{Episodic Transformer for Vision-and-Language Navigation}},
  author={Alexander Pashevich and Cordelia Schmid and Chen Sun},
  year={2021},
  eprint={2105.06453},
  archivePrefix={arXiv},
}
Comments
  • Stuck while rendering trajectory

    Hi, my run hangs while rendering and I would like some help with it.

    The command in question is: python -m alfred.gen.render_trajs

    While debugging, I found that it gets stuck in the lines below: https://github.com/alexpashevich/E.T./blob/92ee2378d596b55f05e5c1949726577a64215f04/alfred/gen/render_trajs.py#L272-L288

    More precisely, it hangs while executing these lines: https://github.com/alexpashevich/E.T./blob/92ee2378d596b55f05e5c1949726577a64215f04/alfred/env/thor_env.py#L65-L76

    The hang comes from thorEnv, so I searched for related issues and found this one: https://github.com/askforalfred/alfred/issues/120#issue-1330610682

    There is no error; the simulator simply stops at some point and has not made progress for more than a day. I would really appreciate your help.

    opened by jeje910 9
  • Error in trying evaluation task

    Thank you for your great work, @alexpashevich!

    I found your paper very interesting and am now trying to run your code locally, but I hit an issue when running alfred.eval.eval_agent.

    When I run the pretrained model evaluation command python3 -m alfred.eval.eval_agent with eval.exp=pretrained eval.checkpoint=et_human_pretrained.pth eval.object_predictor=$ET_LOGS/pretrained/maskrcnn_model.pth exp.num_workers=5 eval.eval_range=None exp.data.valid=lmdb_human, a FileNotFoundError is raised and I cannot tell which file is missing. All the commands before this one ran correctly.

    The picture below shows the error I am getting. [image: ET Question]

    The pretrained directory looks as follows. [image: ET ERROR2]

    Should I copy the directory from somewhere, or did I miss a step before the evaluation?

    Sorry for the basic question.

    opened by jeje910 6
  • Share 45k synthetic trajectories

    Thank you @alexpashevich for your great work. When I followed your guide to generate 45K trajectories with alfred.gen.generate_trajs, the generation mostly failed, so I could not reproduce the result. Rather than sharing all the raw images, could you share just the traj_data.json files with us? I think the total size of these files for 45K trajectories is around 3 GB. From these files, we can easily render the images.

    Thank you so much!

    opened by davidnvq 4
  • Trajectories were skipped

    Thank you @alexpashevich for your great work. When I followed your guide to create an LMDB dataset with natural language annotations, it always reported the error "string indices must be integers" at line 145 of thor_env.py. The action "LookUp" does not seem to be an integer. This error causes 7080 trajectories to be skipped. Could you tell us how to fix it?

    opened by ptwaixingren 3
  • Questions about the pretrained model.

    Thank you @alexpashevich for sharing this great work. I would like to ask whether you plan to release the different versions of the pretrained models that you reported in your paper and on the ALFRED leaderboard.

    opened by yizhouzhao 2
  • Question about the number of synthetic language demonstrations

    Thanks for your great work!

    I would like to ask why the number of synthetic language demonstrations is not equal to the number of expert trajectories in the train split. Since each trajectory corresponds to a single PDDL, and your synthetic language demonstrations are generated from the PDDL, I assumed that the number of synthetic language demonstrations would be the same as the number of expert trajectories in the train split. However, the number of synthetic language instructions is about five times larger than the number of expert trajectories. Did I misunderstand something, or are you using other methods of data augmentation? Thanks for your help!

    opened by Gasoonjia 1
  • Not able to render on Colab

    The command below:

    python -m alfred.gen.render_trajs

    works when I run it on my personal laptop, but does not seem to work on Colab.

    Are there any changes I can make to get it working on Colab?

    opened by Vibha111094 1
  • Stuck while rendering trajectory and Error in trying evaluation task

    These issues are present in the current repo.

    How to fix them:

    Issue 1: Stuck while rendering trajectory (https://github.com/alexpashevich/E.T./issues/8). One of the modules in requirements.txt needs to be an older version; I don't remember which.

    Issue 2: Error in trying evaluation task (https://github.com/alexpashevich/E.T./issues/6). This happens when using CUDA. You can either comment it out in model.util or, I think, add .cpu() to the CUDA tensor like so: feat_extracted = feat_extracted.cpu()

    opened by Samuel-Fipps 5
Owner
Alex Pashevich
PhD student at Thoth (Inria Alpes, France)