Classroom Contents
Reinforcement Learning in Recommender Systems - Some Challenges
- 1 Intro
- 2 RL in User-Facing/Interactive Systems RL has found tremendous success with deep models
- 3 Some Challenges in User-facing RL (RecSys) Scale: number of users (multi-user MDPs) and actions (combinatorics, slates); idiosyncratic nature of actions
- 4 I. Stochastic Action Sets
- 5 SAS-MDPs: Constructing an MDP
- 6 SAS-MDPs: Solving Extended MDP
- 7 II. User-learning over Long Horizons Evidence of (very) slow user learning and adaptation
- 8 Advantage Amplification Temporal aggregation (e.g., fixed actions) can help amplify advantages
- 9 Advantage Amplification Temporal aggregation (e.g., fixed actions) can help amplify advantages
- 10 Advantage Amplification Key points
- 11 An MDP/RL Formulation Objective: max cumulative user engagement over session
- 12 The Problem: Item Interaction The presence of some items on the slate impacts user response, hence the value of others
- 13 User Choice: Assumptions Two key, but reasonable, assumptions
- 14 Full Q-Learning Decomposition still holds, standard Q-learning update
- 15 Slate Optimization: Tractable Standard formulation: Fractional mixed-integer program
- 16 Slate Optimization: Tractable Standard formulation: Fractional mixed-integer program
- 17 Synthetic Experiments Synthetic environment
- 18 Robustness to User Choice Models Change user choice model to cascade (Joachims 2002)