Tip Mo Ba Ako? A Tip Recommender System to Influence Passenger Tipping Behavior of Taxi and Ridesharing Platforms
This is the final project for our Machine Learning class, held from May 2020 to September 2020. I created it with my Term 2 learning teammates: Ethan Casin, DK Go, Karl Navarro, and Daryll Tumambing. We were tasked with creating a machine learning model with a business use case and publicly presenting our findings.
Objective
The competition between taxis and ridesharing companies has transformed the transportation industry. Ridesharing companies (also known as transportation network providers or ride-hailing services), such as Uber and Lyft, are regarded as disruptive players in that industry. Their convenience and relatively lower cost make them the preferred means of transportation for more and more people, which is why it may come as a surprise that, at least for Uber, only 16% of trips are tipped and only 1% of customers always tip, while nearly 60% never do [1]. In the City of Chicago, from which we acquired our dataset, only 18% of ridesharing transactions were tipped, which is very low compared to the 49% tipping rate for taxis. This disparity in tipping is odd considering that both offer the same service to practically the same customers.
Our goal is to gain insights into what causes this low tipping rate and to develop a machine learning model that would serve as the backbone of a recommender system that influences customer tipping behavior. The motivation is to generate business value for the two main stakeholders of taxis and ridesharing companies: drivers and customers.
- For drivers: The machine learning model is expected to increase the frequency of customer tipping and therefore, increase drivers’ revenue.
- For customers: The machine learning model will make tipping more convenient. It will predict whether a customer is likely to tip. The system will then suggest different tip amounts, and the customer will be given the option to increase the tip to a certain amount, adding a layer of personalization to the tip transaction.
About the data
Similar to our use case on finding customer segments, we used the City of Chicago’s Open Data API on Transportation Network Providers (TNP) and taxi transactions. For this project, we limited the data to 2019 only. The resulting dataset had about 15 million rows in total.
Methodology

- Data collection: Using the Python requests module, we created a script to retrieve the taxi and TNP data from their respective pages in the Chicago data portal, filtered by the trip start timestamp. Each request retrieved 5,000 rows at a time because of the per-request limit. To get the data faster, the script was run in multiple parallel processes, with each process responsible for one month’s worth of data.
- Data storage: In every iteration, the retrieved data was appended to one of two separate SQLite databases, one for taxi and one for TNP.
- Data cleaning and sampling: Due to the size of the dataset, we randomly sampled 2% of the data for both the taxi and TNP datasets. We cleaned the data by removing null values and imputing values in certain rows. For certain columns, such as payment type, we remapped the values to more general categories.
- Data sampling pt. 2: After cleaning, the taxi and TNP datasets had 240,000 rows and 1.8 million rows, respectively. To balance them, we randomly sampled the TNP dataset again until it had the same number of rows as the taxi dataset. To make the dataset usable for the machine learning algorithms, we applied one-hot encoding to the categorical features.
- Modelling: Our solution is a two-step model. It begins with a classifier that predicts whether a passenger is likely to tip, followed by a regression model that predicts how much the passenger will tip. We conducted hyperparameter tuning with 8 classifier models: Gaussian Naive Bayes, Logistic Regression, LightGBM Classifier, Linear Support Vector Classification, K-Neighbors Classifier, Random Forest Classifier, Decision Tree Classifier, and Extremely Randomized Trees Classifier.
- Hyperparameter tuning: From the transactions that have a recorded tip, we then conducted hyperparameter tuning with 8 regressor models: LightGBM Regressor, Linear Regression, Lasso Regression, Ridge Regression, Random Forest Regressor, Decision Tree Regressor, K-Neighbors Regressor, and Extremely Randomized Trees Regressor.
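The data collection and storage steps in the list above can be sketched roughly as follows. This is a minimal illustration and not the script we actually ran: `fetch_page`, `collect_to_sqlite`, `make_fake_source`, and the three-column table layout are hypothetical names and assumptions, and the fake in-memory source stands in for the HTTP call to the data portal so that the example runs without network access. Only the 5,000-row page size and the append-per-iteration pattern come from the description above.

```python
import sqlite3
from typing import Callable, Dict, List

PAGE_SIZE = 5000  # per-request row limit described in the methodology

Row = Dict[str, object]


def collect_to_sqlite(
    fetch_page: Callable[[int, int], List[Row]],
    conn: sqlite3.Connection,
    table: str,
    page_size: int = PAGE_SIZE,
) -> int:
    """Page through a data source and append each page to a SQLite table.

    `fetch_page(offset, limit)` is a hypothetical stand-in for the HTTP
    request; the column layout below is an assumption for illustration.
    """
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {table} "
        "(trip_id TEXT, trip_start_timestamp TEXT, tip REAL)"
    )
    offset = 0
    total = 0
    while True:
        rows = fetch_page(offset, page_size)
        if not rows:
            break  # nothing left to retrieve
        # Append this page to the table before requesting the next one.
        conn.executemany(
            f"INSERT INTO {table} "
            "VALUES (:trip_id, :trip_start_timestamp, :tip)",
            rows,
        )
        conn.commit()
        total += len(rows)
        if len(rows) < page_size:
            break  # a short page means we reached the end
        offset += page_size
    return total


def make_fake_source(n_rows: int) -> Callable[[int, int], List[Row]]:
    """Build an in-memory source that mimics a paged API (illustration only)."""
    data: List[Row] = [
        {
            "trip_id": f"t{i}",
            "trip_start_timestamp": "2019-01-01T00:00:00",
            "tip": float(i % 5),
        }
        for i in range(n_rows)
    ]

    def fetch_page(offset: int, limit: int) -> List[Row]:
        return data[offset:offset + limit]

    return fetch_page


conn = sqlite3.connect(":memory:")
inserted = collect_to_sqlite(make_fake_source(12_000), conn, "taxi")
stored = conn.execute("SELECT COUNT(*) FROM taxi").fetchone()[0]
print(inserted, stored)
```

In a real run, `fetch_page` would wrap the request to the data portal with a filter on the trip start timestamp, and one such loop would run per month in its own process.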
Insights

Average pickup and dropoff weights of each community area in the regressor model.
- The best model for classification was Logistic Regression, with a test accuracy of 97.62%. Ridge Regression, with an accuracy of 77.07% and an RMSE of USD 1.51, was the best regressor for predicting the tip amount.
- These models can influence passenger tipping behavior by determining whether a passenger would tip and suggesting a tip amount to them. The regressor model also shows that some community areas can make a large difference in the amount of tip given, as shown in the visualization above.
- TNP platforms can make full use of the models given the vast amounts of data they have on their platforms. The taxi industry, on the other hand, needs to modernize its business model before the models can be leveraged properly.
- This also reveals a new perspective on the tipping behavior of ridesharing passengers. If passengers are willing to tip their taxi drivers, why can’t they tip their ridesharing drivers as well? Ridesharing companies may want to consider adding improved tipping functionalities or even price adjustments based on the findings from this study. Moreover, while ridesharing companies can greatly leverage these models given the vast amounts of data they have, they should be wary of adding potential noise to the models.
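To make the two-step idea concrete, here is a minimal sketch of how a recommender could combine the two models at prediction time. It is an illustration under stated assumptions, not our deployed logic: `recommend_tips`, the 0.5 threshold, the suggestion multipliers, and the two stand-in functions are all hypothetical, and the stand-ins replace the trained classifier and regressor so that the example is self-contained.

```python
from typing import Callable, Dict, List, Tuple

Trip = Dict[str, float]


def recommend_tips(
    trip: Trip,
    tip_probability: Callable[[Trip], float],
    predict_tip: Callable[[Trip], float],
    threshold: float = 0.5,  # assumed cut-off, not from the study
    multipliers: Tuple[float, ...] = (0.75, 1.0, 1.25),  # assumed options
) -> List[float]:
    """Return suggested tip amounts, or an empty list if no tip is expected.

    Step 1: the classifier decides whether the passenger is likely to tip.
    Step 2: the regressor estimates the amount, and a few options are
    built around that estimate.
    """
    if tip_probability(trip) < threshold:
        return []
    base = max(predict_tip(trip), 0.0)  # never suggest a negative tip
    return [round(base * m, 2) for m in multipliers]


# Hypothetical stand-ins for the trained models (illustration only).
def tip_probability(trip: Trip) -> float:
    return 0.9 if trip["fare"] >= 10 else 0.2


def predict_tip(trip: Trip) -> float:
    return trip["fare"] * 0.25


print(recommend_tips({"fare": 20.0}, tip_probability, predict_tip))  # [3.75, 5.0, 6.25]
print(recommend_tips({"fare": 5.0}, tip_probability, predict_tip))   # []
```

Keeping the two models behind plain callables means that the classifier and regressor can be swapped independently, which matches how we tuned them separately.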
References
[1] A. J. Hawkins, “Nearly two-thirds of Uber customers don’t tip their drivers, study says,” 21 October 2019. [Online]. Available: https://www.theverge.com/2019/10/21/20925109/uber-tipping-riders-drivers-percentage-gender-nber-study. [Accessed 20 August 2020].
