Hi,

I'm a master's student at McGill and Mila in Computer Science under Prof. Doina Precup, where I study exploration for reinforcement learning. I previously completed a bachelor of science in H. Prob&Stats at McGill. I am independently researching mathematical formalizations of consciousness and intelligence. My mission is to create artificial self-conscious beings that think and act intelligently.

My CV can be found here.


Our submission "Randomized Exploration for Reinforcement Learning with General Value Function Approximation" got accepted to ICML'21 :').

In Summer 2021, I will start my master's in Computer Science at McGill University, where I'll be working on provably efficient exploration in RL with Prof. Doina Precup.

I co-instructed basic reinforcement learning starting in Winter 2021 with Gabriela. We speedran elementary concepts and ideas in RL. About halfway through, we snorkeled deeper into a selected number of topics we think are important (and exciting!), including abstraction, exploration, selected topics in applied PDEs, Markov chains, and trees. Course registration is closed.

Randomized Exploration for Reinforcement Learning with General Value Function Approximation - Haque Ishfaq, Qiwen Cui, Viet Nguyen, Alex Ayoub, Zhuoran Yang, Zhaoran Wang, Doina Precup, Lin F. Yang, June 2021

"We propose a model-free reinforcement learning algorithm inspired by the popular randomized least squares value iteration (RLSVI) algorithm as well as the optimism principle. Unlike existing upper-confidence-bound (UCB) based approaches, which are often computationally intractable, our algorithm drives exploration by simply perturbing the training data with judiciously chosen i.i.d. scalar noises. To attain optimistic value function estimation without resorting to a UCB-style bonus, we introduce an optimistic reward sampling procedure. When the value functions can be represented by a function class $\mathcal{F}$, our algorithm achieves a worst-case regret bound of $\tilde{O}(\mathrm{poly}(d_EH)\sqrt{T})$ where $T$ is the time elapsed, $H$ is the planning horizon and $d_E$ is the \emph{eluder dimension} of $\mathcal{F}$. In the linear setting, our algorithm reduces to LSVI-PHE, a variant of RLSVI, that enjoys an $\tilde{\mathcal{O}}(\sqrt{d^3H^3T})$ regret. We complement the theory with an empirical evaluation across known difficult exploration tasks."

Value Iteration-based Provably Efficient Exploration - Viet Nguyen, Eric Hu, January 2021

"Reinforcement learning has long piqued the interest of many due to its vast applicability and its highly promising potential. With foundational ideas stemming from the traditional theory of artificial intelligence, it’s been undergoing continuous development since the later decades of the 20th century. When the deep learning revolution occurred in the early 2010s with cornerstone works by Krizhevsky et al., Srivastava et al., and later on Vaswani et al., many employed deep neural networks in reinforcement learning to achieve superhuman level game-playing, as well as a multitude of applications in other fields. Recently, by imposing various assumptions, Wang et al. proposed a provably efficient algorithm in the general function approximation (GFA) setting, a setting that covers a restriction of the space of affine compositions of neural networks. In this work, as we take a deeper look into several important components of Wang et al.’s work, we will later argue that although they spearhead recent efforts towards a theory for deep reinforcement learning, the strong assumptions that make up the heart of their proofs hint at the need for a different/alternate fundamental understanding of the reinforcement learning problem."

Provable Efficiency: Finding Regret Bounds in Reinforcement Learning - Eric Hu, Viet Nguyen, December 2020

"Reinforcement learning algorithms have been making great progress in many domains, from superhuman levels of play in abstract strategy games to robot optimal control. However, while these algorithms often show impressive empirical results, they also tend to lack rigorous theory proving their efficiency. In this work we shall examine the elements underlying the current research regarding provably efficient reinforcement learning algorithms and consider their effectiveness and practicality in context. In particular we shall focus on the emergent use of eluder dimension in proving efficiency, the use of importance sampling in provably efficient algorithms, and touch upon the methods of encouraging exploration."

On the Analysis of Stochastic Gradient Descent in Neural Networks via Gradient Flows - Viet Nguyen, May 2020

"Research in neural network theory is steadily gaining traction, as there is a growing interest in the thorough understanding of the functionalities and the mechanisms through which these models achieve strong performances in decision problems. Several methods have been proposed to quantitatively assess the optimization process of the neural network’s high dimensional non-convex objective, employing various tools such as kernel methods, global optimization, optimal transport, and functional analysis. In this work, we focus on Mei et al.’s analysis of the mean risk field of two-layer neural networks which associates stochastic gradient descent’s (SGD) training dynamics to a partial differential equation (PDE) in the space of probability measures with the topology of weak convergence. Precisely, we dissect the proof of the convergence of SGD’s dynamics to the solution of the PDE, showcase several results regarding the analysis of the latter and their implications on the training process of neural networks via SGD, and discuss related work as well as potential further explorations stemming from various fields."

On the Concentration of Measure in Orlicz spaces of exponential type - Viet Nguyen, April 2020

"The study of Orlicz spaces, first described in 1931 by Wladyslaw Orlicz and Zygmunt Wilhelm Birnbaum, became popular in the empirical processes and functional analysis literature, due to the rising interest in chaining arguments to derive probabilistic bounds for stochastic processes, and generalizations of Lp spaces as well as Sobolev space embeddings, respectively. Orlicz spaces exhibit strong concentration phenomena, inherited from their construction. In particular, they are associated to the sub-Exponential and sub-Gaussian classes of random variables. In this article, we aim to provide a brief introduction to concentration of measure in Orlicz spaces, in particular, Orlicz spaces of exponential type. We begin with the construction of these spaces, and delve into certain concentration guarantees and applications."

Neural Networks: A Continuum of Potential - Eric Hu, Johnny Huang, Viet Nguyen, December 2019

"Theories on neural networks with an infinite number of hidden units have been explored since the late 1990s, deepening the understanding of these computational models in two principal aspects: 1. network behavior under some limiting process on its parameters and 2. neural networks from a functional perspective. Continuous neural networks are of particular interest due to their computational feasibility from finite affine parametrizations and strong convergence properties under constraints. In this paper, we survey some of the theoretical groundings of continuous networks, notably their correspondence with Gaussian processes, computation and applications through the Neural Tangent Kernel (NTK) formulation, and apply the infinite dimensional extension to inputs and outputs all the while considering their universal approximation properties." I delivered a presentation on this topic at the Seminars in Undergraduate Mathematics in Montreal (SUMM) 2020.

Fader Networks: A Heuristic Approach - Marcos Cardenas Zelaya, Marie Leech, Viet Nguyen, April 2019

"In recent years, approaches to approximating complex data distributions have been centered around the generative adversarial networks (GANs) paradigm, eliminating the need for Markov chains in generative stochastic networks or approximate inference in Boltzmann machines. Applying GANs to image and video editing has been done in a supervised setting where external information about the data allows the re-generation of real images with deterministic complex modifications, using Invertible conditional GANs (IcGANs). Fader Networks extend this idea by learning a post-encoding latent space invariant to labeled features on the image, and re-generating the original image by providing the decoder with alternate attributes of choice. In this paper, we explore the impacts of modifications on the encoding and decoding convolutional blocks, analyzing the effects of dropout in the discriminator and of different loss functions on the generated images' quality using appropriate metrics, and extend the model by including skip connections. We finish by providing an empirical assessment of how Fader Networks develop a pseudo-understanding of higher-level image features."

RL Animation

I am currently working with Diego Lopez on an animation of the value iteration algorithm on an RL grid world. Work in progress is here: Value Iteration Animation.
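
For readers unfamiliar with the algorithm, here is a minimal sketch of value iteration on a small deterministic grid world. It is not the code behind the animation. The function name `value_iteration`, the reward-on-entry convention, and the "bumping into a wall leaves you in place" dynamics are all assumptions made for illustration.

```python
import numpy as np


def value_iteration(rewards, gamma=0.9, tol=1e-8, max_iters=10_000):
    """Value iteration on a deterministic grid world (illustrative sketch).

    Assumptions: rewards[r, c] is collected on entering cell (r, c),
    the four moves are up/down/left/right, and a move that would leave
    the grid keeps the agent in its current cell.
    """
    n_rows, n_cols = rewards.shape
    values = np.zeros_like(rewards, dtype=float)
    moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]

    for _ in range(max_iters):
        new_values = np.empty_like(values)
        for r in range(n_rows):
            for c in range(n_cols):
                best = -np.inf
                for dr, dc in moves:
                    # Clamp the next cell to the grid (walls block movement).
                    nr = min(max(r + dr, 0), n_rows - 1)
                    nc = min(max(c + dc, 0), n_cols - 1)
                    # Bellman optimality backup for this action.
                    best = max(best, rewards[nr, nc] + gamma * values[nr, nc])
                new_values[r, c] = best
        # Stop once a full sweep changes no value by more than tol.
        if np.max(np.abs(new_values - values)) < tol:
            return new_values
        values = new_values
    return values


if __name__ == "__main__":
    grid = np.zeros((3, 3))
    grid[0, 0] = 1.0  # entering the top-left cell pays reward 1
    print(np.round(value_iteration(grid), 2))
```

Each sweep applies the Bellman optimality backup to every cell, so the intermediate `values` arrays are exactly the frames one would draw when animating the algorithm.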

CAE - defective_pilots, Fall 2019

Given time series flight simulation KPI data of several commercial pilots, we were tasked with classifying whether a pilot should undergo retraining and have their mental health assessed. We won the CAE sponsor challenge at ImplementAI 2019 by providing the best solution, which combined various feature engineering techniques with a mix of supervised and unsupervised learning.

Milo, the Brain-Controlled Wheelchair, Winter 2019

"Students from different backgrounds — including biology, computer science, hardware engineering, data collection, and machine learning — got together on their free-time to start a project to challenge themselves and improve the world around them. By gathering EEG bio-signals from the brain, they were able to process the data in order to direct the wheelchair forward, left, right, and stop. Over the course of the development, the students were able to create a wheelchair without using motor controls – just the control of the user’s mind – further improved with impressive semi-autonomous enhancements." I worked on the Machine Learning team of the project and was tasked to classify the processed EEG signals into different wheelchair movements. We won first prize at the NeuroTechX "Open Challenge".

Handwriting Recognizer, Winter 2019

At ConUHacks IV, we leveraged deep learning models to create a handwriting recognizer, deployed as a backend to an Android app using the Android camera API.

YouTube thingies

A link to my YouTube channel.

U2S2 Class Notes and Assignments

These contain *incomplete* notes from MATH 565 - Advanced Real Analysis 2, as taught by Prof. John Toth, and MATH 598 - Topics in Prob&Stats: Concentration Phenomena, as taught by Prof. Jessica Lin, available here. They also contain my highly questionable assignment submissions.


I transcribed Danny Wright's version of Canon in D by Pachelbel for piano, here.

I transcribed Văn Vượng's Hà Nội Trong Mắt Ai for guitar, here. I 200% recommend the documentary with the same title by director Trần Văn Thủy.

I transcribed ほんをよむ by 百景 for guitar (I play it with capo 2), here.

Here is a list of cool people I had the opportunity to meet and meme around with. The order was decided by sampling from a Lévy process.