My research interests lie primarily in Machine Learning and Artificial Intelligence, with an emphasis on decision-making under uncertainty using principled mathematical tools from probability theory, decision theory, and statistics.
I received my Ph.D. degree in Computer Science from the University of Massachusetts Amherst in 2005. I was a Postdoctoral Fellow in the Department of Computing Science at the University of Alberta from 2005 to 2008. I was a Researcher at the Institut National de Recherche en Informatique et en Automatique (INRIA) in Lille, France from 2008 to 2013. In October 2013, I joined Adobe as a Senior Analytics Researcher to work in the area of Digital Marketing.
I have been on the editorial board of the Machine Learning Journal (MLJ) since 2011 and have reviewed for the Journal of Machine Learning Research (JMLR), Journal of Artificial Intelligence Research (JAIR), Journal of Operations Research, IEEE Transactions on Automatic Control, Journal of Autonomous Agents and Multi-Agent Systems (JAAMAS), Neurocomputing, Journal of Autonomous Robots, and International Journal of Robotics Research (IJRR).
I have been an Area Chair at NIPS 2013 and IJCAI 2011; a Program Committee member at ICML 2006-2014, AAAI 2007, 2008, 2011, UAI 2012, and ECML 2010, 2012; and a Reviewer for NIPS 2006-2012, AISTATS 2009, 2011, 2012, AAAI 2005, IJCAI 2007, and COLT 2008.
For my detailed CV including my list of publications, please click here.
Philip S. Thomas, Georgios Theocharous, Mohammad Ghavamzadeh (Jul 2015)
International Conference on Machine Learning (ICML) 2015.
Georgios Theocharous, Philip S. Thomas, Mohammad Ghavamzadeh (Jul 2015)
In this paper, we propose a framework for using reinforcement learning (RL) algorithms to learn good policies for personalized ad recommendation (PAR) systems. The RL algorithms take into account the long-term effect of an action, and thus, could be more suitable than myopic techniques, such as supervised learning and contextual bandits, for modern PAR systems in which the number of returning visitors is rapidly growing. However, while myopic techniques have been well studied in PAR systems, the RL approach is still in its infancy, mainly due to two fundamental challenges: how to compute a good RL strategy, and how to evaluate a solution using historical data to ensure its “safety” before deployment. In this paper, we propose to use a family of off-policy evaluation techniques with statistical guarantees to tackle both of these challenges. We apply these methods to a real PAR problem, both for evaluating the final performance and for optimizing the parameters of the RL algorithm. Our results show that an RL algorithm equipped with these off-policy evaluation techniques outperforms the myopic approaches. Our results also give fundamental insights into the difference between the click-through rate (CTR) and life-time value (LTV) metrics for evaluating the performance of a PAR algorithm.
International Joint Conference on Artificial Intelligence (IJCAI) 2015.
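The abstract above centers on high-confidence off-policy evaluation: estimating a new policy's performance from data logged under the currently deployed (behavior) policy, together with a statistical lower bound on that estimate. Below is a minimal Python sketch of the general idea, using ordinary per-trajectory importance sampling and a Hoeffding-style bound. The function names and interfaces are hypothetical, chosen for illustration only; the papers' actual estimators, and the tighter concentration inequalities they use, differ.

```python
import numpy as np

def importance_weighted_returns(trajectories, pi_new, pi_behavior, gamma=0.95):
    """Per-trajectory importance-sampled return estimates for a new policy,
    computed from trajectories logged under the behavior policy.

    Each trajectory is a list of (state, action, reward) triples; pi_new(a, s)
    and pi_behavior(a, s) return action probabilities. This data layout is an
    assumption made for the sketch, not the papers' actual interface.
    """
    estimates = []
    for traj in trajectories:
        weight, ret = 1.0, 0.0
        for t, (s, a, r) in enumerate(traj):
            weight *= pi_new(a, s) / pi_behavior(a, s)  # likelihood ratio
            ret += (gamma ** t) * r                     # discounted return
        estimates.append(weight * ret)
    return np.array(estimates)

def hoeffding_lower_bound(estimates, b, delta=0.05):
    """A (1 - delta)-confidence lower bound on the expected importance-weighted
    return, via Hoeffding's inequality. Assumes each estimate lies in [0, b];
    in practice the weighted returns are clipped or normalized so such a
    bound b exists."""
    n = len(estimates)
    return estimates.mean() - b * np.sqrt(np.log(1.0 / delta) / (2.0 * n))
```

In a safe-deployment loop of this kind, the new policy would be adopted only if the lower bound on its performance exceeds the measured performance of the policy currently in production.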
Georgios Theocharous, Philip S. Thomas, Mohammad Ghavamzadeh (May 2015)
The main objective in the ad recommendation problem is to find a strategy that, for each visitor of the website, selects the ad that has the highest probability of being clicked. This strategy could be computed using supervised learning or contextual bandit algorithms, which treat two visits of the same user as two separate independent visitors, and thus, optimize greedily for a single step into the future. Another approach would be to use reinforcement learning (RL) methods, which differentiate between two visits of the same user and two different visitors, and thus, optimize for multiple steps into the future, or the life-time value (LTV) of a customer. While greedy methods have been well studied, the LTV approach is still in its infancy, mainly due to two fundamental challenges: how to compute a good LTV strategy, and how to evaluate a solution using historical data to ensure its “safety” before deployment. In this paper, we tackle both of these challenges by proposing to use a family of off-policy evaluation techniques with statistical guarantees about the performance of a new strategy. We apply these methods to a real ad recommendation problem, both for evaluating the final performance and for optimizing the parameters of the RL algorithm. Our results show that our LTV optimization algorithm equipped with these off-policy evaluation techniques outperforms the greedy approaches. They also give fundamental insights into the difference between the click-through rate (CTR) and LTV metrics for performance evaluation in the ad recommendation problem.
Workshop on Ad Targeting at Scale (WWW) 2015.
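To make the CTR-versus-LTV distinction in these abstracts concrete, here is a small Python sketch of the two metrics computed on logged click data. The per-user data layout and the definition of LTV as a discounted sum of clicks over a user's successive visits are illustrative assumptions; the papers' exact objective may differ.

```python
import numpy as np

def ctr(user_click_sequences):
    """Click-through rate: the fraction of all visits that ended in a click.
    user_click_sequences is a list of per-user 0/1 click indicators, one per
    visit (an assumed layout, for illustration)."""
    clicks = sum(sum(seq) for seq in user_click_sequences)
    visits = sum(len(seq) for seq in user_click_sequences)
    return clicks / visits

def ltv(user_click_sequences, gamma=0.95):
    """Life-time value: average per-user discounted sum of clicks over that
    user's successive visits (illustrative definition only)."""
    values = [sum((gamma ** t) * c for t, c in enumerate(seq))
              for seq in user_click_sequences]
    return float(np.mean(values))

# Toy data: a myopic policy can win on CTR yet lose on LTV when aggressive
# ads earn early clicks but drive the user away, so they stop returning.
myopic     = [[1, 1]]           # two visits, both clicked, then the user leaves
farsighted = [[0, 1, 1, 1, 1]]  # slower start, but the user keeps coming back
print(ctr(myopic), ltv(myopic))          # 1.0, ~1.95
print(ctr(farsighted), ltv(farsighted))  # 0.8, ~3.52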