Deep Reinforcement Learning of Marked Temporal Point Processes

Overview

Explore deep reinforcement learning for marked temporal point processes (MTPPs) in this 36-minute conference talk. See how discrete events in continuous time arise in settings such as information propagation and knowledge creation, and why such event traces call for more than a standard time-series treatment. Learn how MTPPs represent both the timing and the marks of events, and how they apply to problems such as deciding when to post and scheduling spaced repetition. Follow the reinforcement learning setup for continuous-time processes, including state representation built from the event history and policy optimization with policy gradient methods. Finish with two applications: moving from spaced repetition to smart repetition, and when-to-post strategies for social media.
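To make the idea of timed, marked events concrete, here is a minimal Python sketch that samples a trace of (time, mark) pairs. It is a toy under stated assumptions, not the model from the talk: the intensity is a constant rate rather than a learned function, marks are drawn uniformly at random, and the names `sample_mtpp`, `rate`, `marks`, and `horizon` are hypothetical.

```python
import random


def sample_mtpp(rate, marks, horizon, seed=0):
    """Sample (time, mark) events on [0, horizon) at a constant rate.

    Assumption: a homogeneous Poisson process stands in for the
    intensity function, and marks are chosen uniformly at random.
    """
    rng = random.Random(seed)
    events = []
    t = 0.0
    while True:
        # Gaps between events of a constant-rate process are exponential.
        t += rng.expovariate(rate)
        if t >= horizon:
            break
        events.append((t, rng.choice(marks)))
    return events


# Hypothetical marks for a social media feed.
events = sample_mtpp(rate=2.0, marks=["post", "like", "reply"], horizon=5.0)
for time, mark in events:
    print(f"{time:.3f}\t{mark}")
```

Unlike a regularly sampled time series, the trace has no fixed step: each event carries its own real-valued timestamp, and the gaps between events are part of the data.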

Syllabus

Start
Deep Reinforcement Learning of Marked Temporal Point Processes
Many discrete events in continuous time
Variety of processes behind these events
Example I: Information propagation
Example II: Knowledge creation
Aren't these event traces just time series?
What are marked temporal point processes?
What can MTPPs model?
What can MTPPs model: when-to-post
What can MTPPs model: spaced-repetition
How to optimize Agent's policy?
Optimizing Agent's policy using RL
Outline
Representing Marks and Times of MTPPs
How to represent MTPPs: timing of events
How to represent MTPPs: marks of events
How to represent MTPPs: summary
Reinforcement Learning: Setup
Reinforcement Learning: Discrete time
Reinforcement Learning: Continuous time
RL with entire history as state
RL state: embedding marks
RL state: embedding source of event
RL state in parametrization of the policy
RL with Asynchronous Feedback
RL problem with MTPPs: summary
Policy optimization problem
Existing approaches have limitations
Policy Gradient method can be used!
Policy Gradient: Example iteration
Spaced repetition: Problem setup
Spaced repetition to smart repetition
When-to-post: Problem setup
When to post with unknown priorities
When to post with baselines
Deep Reinforcement Learning for Marked Temporal Point Processes
Thank you!!