Comparing travel behavior during 2019 and 2020 in the New York City Taxi and Limousine Commission trip records
This mini project was written alongside my Term 3 learning team partner, Jasper Pangan. It was submitted for our Big Data and Cloud Computing class, held from September 2020 to January 2021. We were tasked to do an exploratory data analysis on 15 GB worth of data (size prior to pre-processing), and we were required to use Dask, a Python library for parallel and distributed computing on data too large to handle comfortably on a single machine.
Objective
Governments have worked double time to balance the safety of the people against the health of the economy. However, the impact of the pandemic still stands to this day: unemployment has surged, businesses and restaurants have shut down, and lives have been lost.
In this paper, we zoom in to see how the pandemic has affected transactions made with New York taxi drivers and compare this to their figures before the pandemic. With New York City being the epicenter of the COVID-19 pandemic, how has travel behavior changed through the course of 2020 compared to pre-pandemic times?
About the data
Data on trips taken by taxis and other vehicles in New York City were retrieved from the NYC Taxi and Limousine Commission (TLC) Trip Record Data, hosted in an Amazon Web Services (AWS) S3 bucket listed in the Registry of Open Data on AWS.
At the time of writing, data were available only from January 1, 2019 to June 30, 2020, so the 2020 figures cover half a year. This corresponds to 9.36 GB worth of data. After preprocessing, this corresponds to 80,810,133 transactions in 2019 and 15,859,906 transactions in 2020.
Methodology
- Establishment of Dask Cluster: We first set up our Dask scheduler and workers on Amazon Web Services’ Elastic Compute Cloud (AWS EC2), since handling 9.36 GB of data called for distributed computing. To run the succeeding code, we assigned Jojie, the Asian Institute of Management’s supercomputer, as our client.
- Data Extraction: After establishing the connections between our client, scheduler, and workers, we extracted NYC TLC data from the AWS Registry of Open Data [6]. The extracted data covers the period of January 2019 to June 2020 (the most recent data as of this study). To focus our efforts, the scope was trimmed down to yellow taxis, the official taxicabs of New York City.
- Data Processing: We then cleaned the data by applying the following: 1) extracting year, month, day, and hour of transaction based on tpep_pickup_datetime; 2) adding a trip count; and 3) removing invalid trips (i.e., trips with invalid zones, no fare, no distance traveled, and no passengers).
- Exploratory Data Analysis (EDA) and Descriptive Analytics: In our EDA, we answered the following questions and looked to current events for supporting facts and evidence:
  - How has the pandemic affected the daily number of taxi transactions?
  - Comparing 2019 and 2020 travel patterns, where were people coming from and heading to?
  - In analyzing 2020 travel patterns, were people traveling to and/or from known COVID-19 hotspots?
  - Were people practicing social distancing by limiting the number of passengers per taxi trip?
  - How did payment behavior change due to the pandemic?
  - Since New York has implemented a phased reopening, how did this affect the mobility of New Yorkers?
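The processing step and the first EDA question can be sketched roughly as follows. This is a minimal illustration, not our actual notebook code: the column names other than tpep_pickup_datetime follow the TLC yellow taxi data dictionary, the cutoff of 263 for valid zone IDs is an assumption, and the helper names clean_trips and daily_trip_counts are made up for this example. The functions use only operations that pandas and dask.dataframe share, so a tiny in-memory pandas frame stands in for the real data here.

```python
import pandas as pd

# Assumption: zone IDs 1 to 263 are real taxi zones; higher IDs mean "unknown".
MAX_VALID_ZONE = 263


def clean_trips(df):
    """Add date parts and a trip counter, then drop invalid trips.

    Written with operations available on both pandas and Dask DataFrames.
    """
    pickup = df["tpep_pickup_datetime"]
    df = df.assign(
        year=pickup.dt.year,
        month=pickup.dt.month,
        day=pickup.dt.day,
        hour=pickup.dt.hour,
        trip_count=1,  # one row is one trip, so sums of this column count trips
    )
    valid = (
        (df["PULocationID"] >= 1) & (df["PULocationID"] <= MAX_VALID_ZONE)
        & (df["DOLocationID"] >= 1) & (df["DOLocationID"] <= MAX_VALID_ZONE)
        & (df["fare_amount"] > 0)        # drop trips with no fare
        & (df["trip_distance"] > 0)      # drop trips with no distance traveled
        & (df["passenger_count"] > 0)    # drop trips with no passengers
    )
    return df[valid]


def daily_trip_counts(df):
    """Number of trips per calendar day."""
    return df.groupby(["year", "month", "day"])["trip_count"].sum()


# Tiny made-up sample: two valid trips and four invalid ones.
sample = pd.DataFrame({
    "tpep_pickup_datetime": pd.to_datetime([
        "2019-03-22 08:15", "2020-03-22 18:40", "2020-03-22 09:05",
        "2020-03-23 10:00", "2020-03-23 11:30", "2020-03-23 12:00",
    ]),
    "PULocationID": [100, 264, 10, 10, 10, 10],
    "DOLocationID": [200, 50, 20, 20, 20, 20],
    "fare_amount": [12.5, 9.0, 0.0, 7.0, 7.0, 8.0],
    "trip_distance": [2.1, 1.5, 3.0, 0.0, 1.2, 1.0],
    "passenger_count": [1, 1, 2, 1, 0, 1],
})

cleaned = clean_trips(sample)
daily = daily_trip_counts(cleaned)
print(daily)

# On a cluster, the same functions could be applied lazily, for example:
#   from dask.distributed import Client
#   import dask.dataframe as dd
#   client = Client("tcp://SCHEDULER_HOST:8786")  # hypothetical address
#   raw = dd.read_csv(PATH_TO_YELLOW_TAXI_CSVS,   # placeholder, not a real path
#                     parse_dates=["tpep_pickup_datetime"])
#   daily = daily_trip_counts(clean_trips(raw)).compute()
```

Keeping the cleaning logic in plain functions means it can be checked on a small sample first and then handed unchanged to the distributed DataFrame, where nothing is computed until .compute() is called.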
Insights
- Consistent with Gov. Andrew Cuomo’s stay-at-home orders issued on March 22, 2020, demand for taxis plummeted. However, no travel ban on taxis was ever issued. Instead, the sharp drop in transactions is a demand-driven effect of the stay-at-home orders on the general New York population, as shown in the above graph. Nonetheless, the orders came too late, as New York quickly became the epicenter of the COVID-19 pandemic in early 2020.
- We found little change in the routes and travel times of New Yorkers. What changed was volume: the drop in taxi transactions suggests that New Yorkers largely complied with the stay-at-home measures enacted by the government.
- Most trips carried a single passenger even in pre-pandemic times, and the share of solo trips increased during the pandemic. When lockdown measures eased in June 2020, two-passenger trips gradually increased.
- We also found that New York City legislators voted in January 2020 to ban cashless-only businesses (that is, to require that businesses accept cash), and people are experiencing the impacts of this legislation.
- With the reopening of New York, we are now seeing a small increase in the volume of transactions, which should help NYC taxi drivers.
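The passenger-count and payment insights both come down to a monthly share of trips matching some condition. A minimal sketch of that calculation follows, assuming the cleaned data already carries year and month columns, and assuming the TLC data dictionary coding in which payment_type 1 is a credit card and 2 is cash. The helper name monthly_share and the sample values are made up for illustration and are not our actual results.

```python
import pandas as pd


def monthly_share(df, column, value):
    """Share of trips per (year, month) where `column` equals `value`.

    Works on pandas DataFrames; on a Dask DataFrame the result would
    additionally need .compute().
    """
    flag = (df[column] == value).astype("int64")
    return df.assign(flag=flag).groupby(["year", "month"])["flag"].mean()


# Made-up sample: four April trips in each year.
trips = pd.DataFrame({
    "year": [2019, 2019, 2019, 2019, 2020, 2020, 2020, 2020],
    "month": [4, 4, 4, 4, 4, 4, 4, 4],
    "passenger_count": [1, 2, 1, 3, 1, 1, 1, 2],
    "payment_type": [1, 2, 2, 2, 1, 1, 2, 1],  # assumed: 1 = card, 2 = cash
})

solo_share = monthly_share(trips, "passenger_count", 1)
card_share = monthly_share(trips, "payment_type", 1)
print(solo_share)
print(card_share)
```

Comparing the same month across years, as in this sample, keeps seasonal effects from being mistaken for pandemic effects.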