
Fig. 2. Classical racing stack using localisation and planning modules, and end-to-end racing using a neural network without state estimation.
A. Classical Racing
The classical racing method calculates an optimal trajectory
and then uses a path-following algorithm to track it [1]. Trajec-
tory optimisation techniques calculate a set of waypoints (posi-
tions with a speed reference) on a track that, when followed, lead
the vehicle to complete a lap in the shortest time possible [3]. A
path-following algorithm tracks the trajectory using the vehicle’s
pose as calculated by a localisation algorithm.
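For illustration, the waypoints produced by trajectory optimisation can be pictured as simple (x, y, speed) tuples; the sketch below is an assumed data structure, not the implementation used in the cited work.

    # Hypothetical waypoint structure for a pre-computed optimal trajectory.
    from dataclasses import dataclass

    @dataclass
    class Waypoint:
        x: float      # map-frame position [m]
        y: float      # map-frame position [m]
        speed: float  # speed reference at this point [m/s]

    # The optimal trajectory is an ordered list of such waypoints,
    # tracked by the path-follower using the pose from the localiser.
    trajectory = [Waypoint(0.0, 0.0, 3.0), Waypoint(1.0, 0.1, 3.5)]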
Localisation: Localisation approaches for autonomous racing
depend on the sensors and computation available. Full-sized
racing cars are often equipped with GPS (GNSS), LiDAR,
radar, cameras, IMUs, and powerful computers that can fuse
these measurements in real-time [10]. Classical F1Tenth racing
approaches have used a particle filter that takes a LiDAR scan
and a map of the track to estimate the vehicle’s pose [2], [4],
Localisation methods are inherently limited by requiring a map
of the race track and, thus, cannot operate on unmapped tracks.
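As an illustration of how such a filter operates, the sketch below shows one predict-weight-resample cycle of Monte Carlo localisation; the motion_model and measurement_model helpers are assumptions for illustration, not taken from the cited implementations.

    import numpy as np

    def particle_filter_step(particles, weights, odometry, scan, track_map):
        """One predict-weight-resample cycle of Monte Carlo localisation.
        particles: (N, 3) array of (x, y, heading) pose hypotheses."""
        # Predict: propagate each hypothesis through a noisy motion
        # model (assumed helper).
        particles = motion_model(particles, odometry)
        # Weight: score each pose by how well the LiDAR scan matches a
        # ray-cast of the track map from that pose (assumed helper).
        weights = measurement_model(scan, particles, track_map)
        weights /= np.sum(weights)
        # Resample: draw hypotheses in proportion to their weights.
        idx = np.random.choice(len(particles), size=len(particles), p=weights)
        particles = particles[idx]
        weights = np.full(len(particles), 1.0 / len(particles))
        # Report the mean of the resampled particles as the pose estimate.
        return particles, weights, particles.mean(axis=0)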
Classical Path-Following: Model-predictive controllers
(MPC) and pure pursuit path-followers have been used for
trajectory tracking [1]. MPC planners calculate optimal con-
trol commands in a receding horizon manner [12] and have
demonstrated high-performance results racing F1Tenth vehicles
at speeds of up to 7 m/s [2]. The pure pursuit algorithm uses
a geometric model to calculate a steering angle to follow the
optimal trajectory [13], and has been used to race at speeds of
7 m/s [11] and over 8 m/s [14].
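The geometric model referred to above is standard: the controller steers the vehicle along a circular arc that passes through a lookahead point on the trajectory. A minimal sketch of this geometry:

    import numpy as np

    def pure_pursuit_steering(pose, goal, wheelbase):
        """Pure pursuit: steer along the arc through a lookahead waypoint."""
        x, y, theta = pose                  # vehicle pose from the localiser
        dx, dy = goal[0] - x, goal[1] - y
        lookahead = np.hypot(dx, dy)        # distance to the lookahead point
        alpha = np.arctan2(dy, dx) - theta  # bearing of the point, body frame
        # Bicycle-model steering angle for the arc through the goal point.
        return np.arctan2(2.0 * wheelbase * np.sin(alpha), lookahead)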
Learning-based Path-following: Classical path-following al-
gorithms have been replaced by neural networks, aiming to
improve computational efficiency (compared to MPC) [12],
[15] and performance in difficult-to-model conditions such as
drifting [16]. Including upcoming trajectory points in the state
vector (as opposed to only centerline points [15]) has been shown
to improve racing performance [17], [18]. This demonstrates
that tracking the optimal trajectory is central to high-performance
racing.
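A hedged sketch of how such a state vector might be assembled; the helper functions and the choice of ten upcoming points are illustrative assumptions, not the formulations of [17], [18].

    import numpy as np

    def build_state(pose, motion_vars, trajectory, n_upcoming=10):
        """Assemble a state vector from vehicle motion variables plus the
        next n upcoming trajectory points in the vehicle's body frame."""
        points = next_points(trajectory, pose, n_upcoming)  # assumed helper
        relative = to_body_frame(points, pose)              # assumed helper
        return np.concatenate([motion_vars, relative.ravel()])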
While classical and learning-based path-following methods
have produced high-performance results, they are inherently
limited by requiring the vehicle’s location on the map.
B. End-to-end Learning
In contrast to classical methods that use a perception, planning
and control pipeline, end-to-end methods use a neural network
to map raw sensory data to control references [9]. While some
approaches have used camera images [19], the dominant input
has been LiDAR scans [7], [9], [20].
Autonomous Driving: End-to-end learning agents can use a
subset of beams from a LiDAR scan to output steering references
that control a vehicle travelling at constant speed [7]. While
imitation learning (IL) has been used to train agents to copy an
expert policy [21], deep reinforcement learning has shown bet-
ter results, with higher lap completion rates [7]. DRL algorithms
train agents in an environment (simulation [7] or real-world
system [20]), where at each timestep, the agent receives a state,
selects an action and then receives a reward. DRL approaches to
driving F1Tenth vehicles have considered low, constant speeds
of 1.5 m/s [7], [22], 2 m/s [20], and 2.4 m/s [8]. While indicating
that DRL agents can control a vehicle, these methods neglect the
central racing challenge of speed selection.
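An illustrative end-to-end policy of this kind is a small network mapping a downsampled LiDAR scan directly to a steering command; the architecture below is an assumption for illustration, not the networks used in [7] or [20].

    import torch
    import torch.nn as nn

    class EndToEndPolicy(nn.Module):
        """Illustrative end-to-end agent: a subset of LiDAR beams in,
        a single steering command out (speed held constant)."""
        def __init__(self, n_beams=20):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(n_beams, 100), nn.ReLU(),
                nn.Linear(100, 100), nn.ReLU(),
                nn.Linear(100, 1), nn.Tanh())  # steering in [-1, 1]

        def forward(self, scan_subset):
            return self.net(scan_subset)

    # Usage: downsample the full scan to n_beams and normalise the ranges.
    policy = EndToEndPolicy()
    steering = policy(torch.rand(1, 20))  # placeholder normalised scan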
Autonomous Racing: Using model-free end-to-end DRL
agents to select speed and steering commands for autonomous
racing is a difficult problem [23], [24]. In response, Brunnbauer
et al. [23] turned to model-based learning and Zhang et al. [24]
incorporated an artificial potential field planner in the learning
to simplify the learning problem. Both [23] and [24] show that
their agents regularly crash while using top speeds of only
5 m/s, demonstrating the difficulty of learning for high-speed
autonomous racing. Bosello et al. [9] use a model-free DRL
algorithm (DQN) for F1Tenth racing at speeds of up to 5 m/s,
but provide no detail on the speed profile, trajectory or crash
rate.
Summary: Classical racing methods have produced high-
performance racing behaviour using high maximum speeds but
are limited by requiring localisation. In contrast, end-to-end
DRL agents are successful in controlling vehicles at low speeds
using only the LiDAR scan as input. While some methods have
approached speed selection using DRL agents, there has been
little study on the speed profiles selected, and the highest speed
used is 5 m/s, significantly lower than the 8 m/s achieved by
classical methods. This letter addresses this gap by developing high-performance
racing solutions for steering and speed control in autonomous
race cars.
III. METHODOLOGY
A. Reinforcement Learning Preliminary
Deep reinforcement learning (DRL) trains autonomous
agents, consisting of deep neural networks, to maximise a reward
signal from experience [6]. Reinforcement learning problems
are modelled as Markov Decision Processes (MDPs), where the
agent receives a state s from the environment and selects an
action a. After each action has been executed, the environment
returns a reward r indicating how good or bad the action was
and a new state s′.
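The interaction just described can be summarised by the standard rollout loop below; the Gym-style env/agent interfaces are illustrative assumptions.

    def rollout(env, agent, max_steps=1000):
        """One episode of the agent-environment loop (interfaces assumed)."""
        state = env.reset()
        for _ in range(max_steps):
            action = agent.select_action(state)          # a from the policy
            next_state, reward, done = env.step(action)  # r and s' from the MDP
            agent.store(state, action, reward, next_state, done)
            state = next_state
            if done:
                break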
This work considers deep-deterministic-policy-gradient
(DDPG) algorithms since we work with a continuous action
space [25]. DDPG algorithms maintain two neural networks, an
actor μ that maps a state to an action and a critic Q that evaluates
the action-value function. A pair of networks is maintained for
both the actor and the critic; the model networks are used to select
actions, and target networks calculate the targets μ′ and Q′.
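A minimal sketch of the four networks described above, assuming small fully-connected architectures; the layer sizes are illustrative, not those used in this letter.

    import copy
    import torch
    import torch.nn as nn

    class Actor(nn.Module):
        """mu: maps a state to a continuous action."""
        def __init__(self, state_dim, action_dim):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(state_dim, 100), nn.ReLU(),
                nn.Linear(100, action_dim), nn.Tanh())
        def forward(self, s):
            return self.net(s)

    class Critic(nn.Module):
        """Q: evaluates the action-value of a state-action pair."""
        def __init__(self, state_dim, action_dim):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(state_dim + action_dim, 100), nn.ReLU(),
                nn.Linear(100, 1))
        def forward(self, s, a):
            return self.net(torch.cat([s, a], dim=-1))

    # Model networks select actions; frozen target copies provide the
    # training targets mu' and Q'.
    actor, critic = Actor(state_dim=4, action_dim=2), Critic(4, 2)
    actor_target, critic_target = copy.deepcopy(actor), copy.deepcopy(critic)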