This summer I was very lucky to join JW Player as an engineering intern on the Data Team. It has been a fantastic experience. Aside from sailing on the Hudson River, enjoying ping pong games, and cycling on Governors Island, I learned about their state-of-the-art data pipeline, followed Agile practices, and worked with an amazing group of people. I was part of the Discovery squad of the Data Team, where I worked on evaluating recommendation systems and was responsible for developing an evaluation tool for our data-driven recommendations.
With data-driven recommendations, we want to show our users relevant videos to increase video plays and user engagement. The question is how to evaluate whether the recommended content is relevant, and which metrics to use as the measure. Generally there are three methods for evaluating recommendation systems: offline experiments, online trials, and user studies. In this project, we are taking the user study approach: directly asking viewers whether a recommended video is relevant or not.
Architecture of the evaluation tool
How the data-driven recommendation system works is described here. For evaluation, we are focusing on the data-driven feeds. Since the feed list generated by the recommendation engine can change over time due to any modification made to the system or to the video metadata, such as the title or the description, we need to capture a snapshot of the feed list we are evaluating. To implement this, we fetch the feed list (JSON) from the data-driven feeds API, add a timestamp to it, and store it in MongoDB, together with the seed media (i.e. the related_video in the data-driven feed, which we call the anchor media) that was used to generate the feed list. If only the media ID and the feed ID are provided, a set of recommended media is generated, captured, and displayed. To load a historical recommendation set, the anchor media ID, the feed ID, and the timestamp are needed to find the match. On the evaluation page, the anchor icon next to each recommended video allows the user to view recommendations for that video: when clicked, a new evaluation page is generated with that video as the new anchor media. All the recommendation views are stored in MongoDB, and can be retrieved later or shared via URL.
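The capture-and-lookup logic above can be sketched roughly as follows. The function names and document fields here are my own illustration rather than the actual implementation, and a plain Python list stands in for the MongoDB collection:

```python
from datetime import datetime, timezone

def capture_snapshot(feed_json, anchor_media_id, feed_id, store):
    """Timestamp a fetched feed list and store it with its anchor media."""
    snapshot = {
        "anchor_media_id": anchor_media_id,
        "feed_id": feed_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "feed": feed_json,  # the raw JSON returned by the feeds API
    }
    store.append(snapshot)  # in the real tool this is a MongoDB insert
    return snapshot

def load_snapshot(anchor_media_id, feed_id, timestamp, store):
    """Find a historical recommendation set by its three-part key."""
    for doc in store:
        key = (doc["anchor_media_id"], doc["feed_id"], doc["timestamp"])
        if key == (anchor_media_id, feed_id, timestamp):
            return doc
    return None
```

The three-part key (anchor media ID, feed ID, timestamp) is what makes each snapshot uniquely retrievable even after the underlying feed has changed.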
Home page of the web app
Evaluation page of the web app
When I started on the project, I was nervous but really excited at the same time, since this was the first real-world web application I had ever made. I was given time to learn all the new technologies, solve the problems I encountered, and build this web app independently. In the meantime, the senior members of the team were happy to help out and show me how to improve. All the feedback I got from code reviews was very valuable for improving my coding practices. It has been a great learning experience this summer. Here’s a brief overview of some technologies I used:
There is a very well-written Flask tutorial that helped me a lot to get started, since this was the first time I had used Flask. As I worked, I encountered the concepts of the application context and application context stack in Flask (and similarly, the request context and request context stack), which were quite confusing at the beginning. I found a very nice post that explained these concepts to me.
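A minimal illustration of the application context: outside of a request, `current_app` is unbound, but pushing an application context (here with `app.app_context()`) binds it to our app. The config key is made up for the example:

```python
from flask import Flask, current_app

app = Flask(__name__)
app.config["FEED_API_URL"] = "https://example.com/feeds"  # placeholder value

# current_app would raise a RuntimeError out here; inside the
# pushed application context it resolves to `app`.
with app.app_context():
    url = current_app.config["FEED_API_URL"]
```

During a request Flask pushes these contexts for you, which is why `current_app` "just works" inside view functions.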
MongoDB is very flexible for handling JSON-like documents (JSON being one of the data formats supported by our feeds API). I used Flask-PyMongo to interact with MongoDB. This is also what I like about Flask: it is lightweight, but has a variety of extensions available for ease of use.
With the Bootstrap grid system, we can build a responsive web page quickly. This tutorial was very helpful for better understanding the Bootstrap grid system, which is one of the key concepts of the framework.
I learned to use Grunt tasks, such as JSHint, Watch, and Bower, to automate linting and the management of front-end components. I also learned how to use Grunt to copy files as part of code deployment.
At school we often rely on test code provided by our professors, but in the real world, we need to write our own unit tests for each part of our application. I used the Python unittest module to test each API endpoint of the evaluation tool. For ease of testing, Flask Blueprints and application factories were used, so that a separate test configuration could be supplied for unit testing.
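A stripped-down sketch of that pattern: a Blueprint holds the routes, a factory builds an app with whatever configuration the caller passes in, and the test case constructs its own app with a test configuration. The blueprint and endpoint here are illustrative, not the tool's real ones:

```python
import unittest
from flask import Blueprint, Flask, jsonify

bp = Blueprint("api", __name__)

@bp.route("/health")
def health():
    return jsonify(status="ok")

def create_app(config=None):
    """Application factory: build a fresh app with the given config."""
    app = Flask(__name__)
    app.config.update(config or {})
    app.register_blueprint(bp)
    return app

class HealthEndpointTest(unittest.TestCase):
    def setUp(self):
        # A separate test configuration, which the factory pattern makes easy
        self.app = create_app({"TESTING": True})
        self.client = self.app.test_client()

    def test_health(self):
        resp = self.client.get("/health")
        self.assertEqual(resp.status_code, 200)
        self.assertEqual(resp.get_json()["status"], "ok")
```

Because the factory builds a fresh app per call, each test gets an isolated instance instead of sharing one module-level global.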
We follow the Scrum framework for implementing Agile, with daily standups, sprint planning, sprint demos, and sprint retrospectives. A big project is divided into a set of JIRA tickets, with user stories written on the tickets to describe the expected feature from the end user’s perspective. Each ticket is linked to an epic and has a story score assigned as a measure of the relative complexity of the work. As an intern, I also got involved in the scoring process.
At the end of my internship, we had an intern demo event at the monthly Product & Engineering Meeting, where I had the opportunity to present our work to the whole Product and Engineering team. I also had a chance to present my work within the Data Team. Thanks to the help from Jamie and Alex, the project went smoothly. I was excited to learn that the tool was deployed on our web server with uWSGI and Nginx, and is planned to go into production soon. The tool is only available for internal use at the moment; we can use it for sanity-checking our data and for onboarding new customers. The long-term plan is to use it to perform user studies through crowdsourcing services.