Reinforcement Learning in Recommender Systems - Some Challenges

Simons Institute via YouTube

Classroom Contents

  1. Intro
  2. RL in User-Facing/Interactive Systems - RL has found tremendous success with deep models
  3. Some Challenges in User-facing RL (RecSys) - Scale: number of users (multi-user/MDPs) and actions (combinatorial, slates); idiosyncratic nature of actions
  4. I. Stochastic Action Sets
  5. SAS-MDPs: Constructing an MDP
  6. SAS-MDPs: Solving Extended MDP
  7. II. User-learning over Long Horizons - Evidence of (very) slow user learning and adaptation
  8. Advantage Amplification - Temporal aggregation (e.g., fixed actions) can help amplify advantages
  9. Advantage Amplification - Temporal aggregation (e.g., fixed actions) can help amplify advantages
  10. Advantage Amplification - Key points
  11. An MDP/RL Formulation - Objective: max cumulative user engagement over session
  12. The Problem: Item Interaction - The presence of some items on the slate impacts user response, hence value, of others
  13. User Choice: Assumptions - Two key, but reasonable, assumptions
  14. Full Q-Learning - Decomposition still holds, standard Q-learning update
  15. Slate Optimization: Tractable - Standard formulation: Fractional mixed-integer program
  16. Slate Optimization: Tractable - Standard formulation: Fractional mixed-integer program
  17. Synthetic Experiments - Synthetic environment
  18. Robustness to User Choice Models - Change user choice model to cascade (Joachims 2002)
