Data Science: Placement of traffic enforcement cameras by the correlation of red light violations with crime data
By Karuna Kumar
Traffic enforcement cameras, also known as red light cameras or road safety cameras, are devices installed on roads (particularly at intersections) or in enforcement vehicles to detect traffic regulation violations, such as speeding and red light violations. They are usually linked to an automated ticketing system that works in tandem with the latest automatic number plate recognition system.
In metropolitan centers, traffic enforcement cameras are mounted throughout the city because of the extensive road networks and high population density. However, because big cities are so diverse, traffic enforcement efforts are not always as effective as they could be. Some sections of a city may require more traffic supervision, whereas some neighborhoods might have few or no traffic violations. Cameras should therefore be placed where they are most needed. This requirement is yet to be met in most cities, where traffic enforcement is concentrated ineffectively across different neighborhoods.
For this research paper, we used data science to evaluate the factors causing traffic regulation violations by using Spearman rank-order correlation, so that traffic enforcement cameras can be placed appropriately thus ensuring more focused efforts for crime management and control.
Data science is an interdisciplinary field concerned with processes and systems for extracting insights from data. It employs techniques and theories drawn from many fields, including mathematics, statistics, operations research, information science, and computer science.
Data science can be used to determine the correlation between variables, that is, to detect and measure the interdependency of different variables or quantities. The measure of interdependency between two different variables is called the correlation coefficient. A correlation coefficient measures the extent to which two variables tend to change together. It describes both the strength and the direction of the relationship and is a number between −1 and 1 calculated so as to represent the interdependence of two variables or sets of data.
The two main analytical methods used to measure correlation are Pearson product moment correlation and Spearman rank-order correlation. Pearson product moment correlation measures the linear interdependency between two variables, whereas Spearman rank-order correlation measures the monotonic interdependency between two variables. In a monotonic relationship, the variables tend to change together, but not necessarily at a constant rate; that is, the relationship doesn’t have to be linear. Pearson product moment correlation expects a linear relationship, whereas Spearman rank-order correlation is more flexible in that it considers any monotonic relationship to be sufficient. Therefore, in most cases, including our research project, Spearman rank-order correlation is the safer choice unless we are sure that the relationship between two variables is linear.
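The difference between the two coefficients can be seen on a small made-up example. The sketch below is written in Python rather than the R used in this project, and the function names (`average_ranks`, `pearson`, `spearman`) and the sample values are illustrative assumptions, not part of the original analysis. It computes Spearman's coefficient as Pearson's coefficient applied to ranks, which is the standard definition.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of tied values that starts at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson product moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

def spearman(x, y):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Made-up data: y always rises with x, but along a curve, not a straight line.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [v ** 3 for v in x]

print("Pearson :", round(pearson(x, y), 3))   # below 1: the relationship is not linear
print("Spearman:", round(spearman(x, y), 3))  # 1.0: the relationship is perfectly monotonic
```

Because the ranks of `x` and `y` are identical here, Spearman reports a perfect relationship while Pearson is pulled below 1 by the curvature.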
My objective was to use data science to analyze crime and traffic patterns and to develop innovative solutions for managing them. In this project, we asked the following questions:
- Are traffic violations and crime related?
- If so, how are they related?
- Based on this information, can we make a prediction for where traffic police should focus more of their efforts?
We will arrive at the answers to these questions by applying different methods and tools of data science to real-world data that is relevant and accessible.
Tools and Data
To carry out this activity, we will use a collection of data from the Chicago Police Department regarding the occurrence of traffic violations and various crimes throughout the city of Chicago.
We will transfer this data to Microsoft Excel (as depicted in Figures 3 and 4) so that we can then use the statistical software, ‘R’, to process our data, derive insights and arrive at conclusions that answer the questions we have proposed to the extent possible.
We have data for the red light violations in the City of Chicago from July 2014 to August 2016 (https://data.cityofchicago.org/Transportation/Red-Light-Camera-Violations/spqx-js37) as shown in Figure 1. The key elements in this data set are:
- the intersection at which the violation happened
- the identification number of the camera that recorded the violation
- the address at which the camera is installed
- the date of the violation
- the number of violations that happened on the specific day
- the x and y coordinates of the location of the camera as per the map of the City of Chicago
- latitude and longitude of the camera
- location of the camera consisting of latitude and longitude.
We also have crime data for the City of Chicago for the years of 2014 and 2015 (https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-present/ijzp-q8t2) as shown in Figure 2. The key elements in this data set are:
- the identification number of the record of the crime
- the case number
- the date and time at which the crime was recorded
- the block at which the crime occurred
- the Illinois Uniform Crime Reporting (IUCR) number
- the primary type under which the crime falls
- a description of the crime
- a location description
- whether or not it involved arrest
- whether or not the crime was domestic
- the police beat
- the police district
- the ward
- the community area
- the FBI code
- the x and y coordinates of the location of the crime as per the map of the City of Chicago
- when the data was last updated
- the latitude and longitude of the location where the crime occurred
- the location of the crime consisting of latitude and longitude.
The key linkage between the two files was location by latitude and longitude, which we mapped to zip codes by using the rgdal library in ‘R’. Since the year 2015 is common to the time frames of both data sets, we focused on the red light violations and crimes that occurred only in 2015. To make the data more convenient to analyze, we first removed all the records for red light violations in 2014 and 2016, after which 90,368 records remained. Then we removed the crime records that fell outside 2015, after which about 260,000 records remained. Next, we removed the records that did not have latitude and longitude and grouped the remaining red light violations and crimes by zip code. The final, processed file had 78 zip codes in the Greater Chicago Metropolitan Area, along with the crime and traffic camera violation statistics for the year 2015. For correlation analysis, we also removed the records for the zip codes that did not have any traffic cameras. This left a total of 48 zip codes that we used for correlation analysis.
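The preparation steps described above can be sketched in a few lines. This is a Python illustration under stated assumptions, not the R and rgdal code used in the project: the record layout (`year`, `lat`, `lon`, `violations`), the helper names, and the two square "zip code" polygons are all hypothetical, and a simple ray-casting test stands in for the shapefile lookup.

```python
from collections import Counter

def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: True if (lon, lat) lies inside the polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray from the point cross this edge?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside

def zip_for_point(lon, lat, zip_polygons):
    """Return the first zip code whose polygon contains the point, else None."""
    for zip_code, polygon in zip_polygons.items():
        if point_in_polygon(lon, lat, polygon):
            return zip_code
    return None

def count_by_zip(records, year, zip_polygons, weight_key=None):
    """Keep one year, drop records without coordinates, then total per zip code."""
    counts = Counter()
    for rec in records:
        if rec["year"] != year:
            continue  # outside the year both data sets share
        if rec.get("lat") is None or rec.get("lon") is None:
            continue  # cannot be placed in a zip code
        zip_code = zip_for_point(rec["lon"], rec["lat"], zip_polygons)
        if zip_code is None:
            continue
        counts[zip_code] += rec[weight_key] if weight_key else 1
    return counts

# Hypothetical zip code outlines (unit squares) and made-up records.
zip_polygons = {
    "ZIP_A": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "ZIP_B": [(1, 0), (2, 0), (2, 1), (1, 1)],
}
violations = [
    {"year": 2015, "lat": 0.5, "lon": 0.5, "violations": 4},
    {"year": 2015, "lat": 0.2, "lon": 0.7, "violations": 3},
    {"year": 2014, "lat": 0.5, "lon": 0.5, "violations": 9},    # wrong year
    {"year": 2015, "lat": None, "lon": None, "violations": 2},  # no coordinates
]
crimes = [
    {"year": 2015, "lat": 0.5, "lon": 1.5},
    {"year": 2015, "lat": 0.4, "lon": 0.4},
    {"year": 2015, "lat": 0.6, "lon": 1.2},
    {"year": 2014, "lat": 0.5, "lon": 1.5},  # wrong year
]

violation_counts = count_by_zip(violations, 2015, zip_polygons, "violations")
crime_counts = count_by_zip(crimes, 2015, zip_polygons)

# Only zip codes that recorded camera violations enter the correlation step.
zips_with_cameras = sorted(z for z in crime_counts if z in violation_counts)
print(violation_counts, crime_counts, zips_with_cameras)
```

Red light records are weighted by their daily violation count, while each crime record counts once, which mirrors how the two data sets are described earlier in this section.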
We analyzed the data in ‘R’ as in Figure 5. After filtering out the data we didn’t require, we calculated the Spearman rank-order correlation coefficient between the red light traffic violations and various types of crimes in different zip codes in Chicago.
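For reference, the coefficient computed in this step can be written out as follows. The notation is ours and is not taken from the R output: for each of the n = 48 zip codes, R(x_i) denotes the rank of its red light violation count and R(y_i) the rank of its count for a given crime type, with tied counts assumed to share their average rank. How ties were actually handled in the project is not recorded here.

```latex
\rho = \frac{\sum_{i=1}^{n} \bigl(R(x_i) - \bar{R}_x\bigr)\bigl(R(y_i) - \bar{R}_y\bigr)}
            {\sqrt{\sum_{i=1}^{n} \bigl(R(x_i) - \bar{R}_x\bigr)^{2}}\,
             \sqrt{\sum_{i=1}^{n} \bigl(R(y_i) - \bar{R}_y\bigr)^{2}}}

% Shortcut that holds only when no two zip codes share a rank:
\rho = 1 - \frac{6 \sum_{i=1}^{n} d_i^{2}}{n\,(n^{2} - 1)},
\qquad d_i = R(x_i) - R(y_i)
```

The first form is simply the Pearson coefficient applied to the ranks, so it remains valid when ties are present; the second is the familiar shortcut for tie-free data.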
Finally, we could see that the total number of crimes was correlated with red light violations by a coefficient of 0.49, but the coefficients for some specific crime types were noticeably higher. For example, if we were to consider only the crime types correlated with red light violations by a coefficient of 0.5 or higher, we would have:
|Crime Type|Spearman Correlation Coefficient|
|---|---|
|Offense involving children|0.53|
|Motor vehicle theft|0.64|
We can therefore see that areas with a higher prevalence of motor vehicle theft are the most prone to red light violations, which is plausible if thieves tend to drive stolen vehicles with unusual urgency. Then, at 0.57, we have burglary, which also seems plausible. So, we have answered the first two of our questions: traffic violations and crime are related, and they are related by the derived coefficients.
To answer our third question, we went back to our Excel document and filtered out the zip codes with traffic enforcement cameras installed. This left 30 zip codes where there are no traffic enforcement cameras. Looking through the data, we see that some of these zip codes, for example 60615 and 60653, actually have very high numbers of motor vehicle thefts and burglaries (as well as offenses involving children), as shown in Figure 6. This brought us to the conclusion that the Chicago Police Department should install more traffic enforcement cameras in zip codes like these, where the indicating factors seem very strong, to ensure more effective traffic management and control.
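The screening step for the third question was done by inspecting the Excel file, but it can also be expressed as a short routine. The following Python sketch is an assumption-laden illustration: the zip labels (`Z1` to `Z4`), the counts, the `has_camera` flag, and the choice to rank by the simple sum of indicator-crime counts are all hypothetical and are not taken from the project data.

```python
def rank_candidates(zip_stats, indicator_crimes, top_n=2):
    """Return the top_n camera-free zip codes by total indicator-crime count."""
    candidates = [
        (sum(stats["crimes"].get(crime, 0) for crime in indicator_crimes), zip_code)
        for zip_code, stats in zip_stats.items()
        if not stats["has_camera"]  # only zip codes without cameras are candidates
    ]
    # Highest total first; ties are broken alphabetically by zip label.
    candidates.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(zip_code, total) for total, zip_code in candidates[:top_n]]

# Made-up per-zip statistics; labels and counts are placeholders only.
zip_stats = {
    "Z1": {"has_camera": True,  "crimes": {"motor vehicle theft": 120, "burglary": 90}},
    "Z2": {"has_camera": False, "crimes": {"motor vehicle theft": 80, "burglary": 70}},
    "Z3": {"has_camera": False, "crimes": {"motor vehicle theft": 10, "burglary": 5}},
    "Z4": {"has_camera": False, "crimes": {"motor vehicle theft": 60, "burglary": 95, "theft": 300}},
}
# Crime types that showed the strongest correlation with red light violations.
indicator_crimes = ["motor vehicle theft", "burglary"]

shortlist = rank_candidates(zip_stats, indicator_crimes)
print(shortlist)  # [('Z4', 155), ('Z2', 150)]
```

Crime types outside the indicator list (here the placeholder `theft` count) are ignored, so a zip code is shortlisted only on the crimes that correlated most strongly with red light violations.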
In data science, insights are the understandings that we gain from the analytical method that we carry out. So, what did the data tell us? We discovered, after much data processing, that certain categories of crimes were more indicative of the probability of the occurrence of red light traffic violations in an area. Then, after further processing and analyzing, we discovered that there were particular zip codes where these types of crimes were significantly frequent but there were no traffic enforcement cameras yet installed.
So, from the insights we have gained, we can infer that the Chicago Police Department should pay more attention to areas in which motor vehicle theft, burglary, offense involving children, arson, criminal damage, sex offense, and narcotics, in that order, are most common, so that it can focus its efforts to prevent traffic violations appropriately. Two examples of areas where traffic enforcement cameras could usefully be installed are the zip codes 60615 and 60653.
Recommendations for Further Work in the Future
If we were to repeat this project, it would be a good idea to use smaller sections of the city as the units of the location variable, as this would yield more refined insights from the data. Zip codes are very coarse, which results in a loss of context. Using smaller and more specific divisions of Chicago would likely have given higher correlation coefficients. So, if carrying out this project in the future, I would recommend using blocks or intersections instead, provided the required data exists. I would also suggest considering more variables such as population density, income levels, or demographics.
References
“Are Traffic Enforcement Cameras worth the Effort?” The Christian Science Monitor. The Christian Science Monitor, 21 July 2016. Web. 07 Oct. 2016.
“Red Light Camera Violations | City of Chicago | Data Portal.” Chicago. N.p., n.d. Web. 07 Oct. 2016.
“Crimes – 2001 to Present | City of Chicago | Data Portal.” Chicago. N.p., n.d. Web. 07 Oct. 2016.
“A Comparison of the Pearson and Spearman Correlation Methods.” Minitab Express. N.p., n.d. Web. 07 Oct. 2016.