Capstone Project — The Battle of Neighborhoods

Dec 12, 2020

Introduction

Business problem

San Vendemiano is a town in the north of Italy. In recent years the town and the surrounding metropolitan area have grown significantly in population, number of venues and tourist presence. Assuming this trend continues, an entrepreneur is considering opening a new restaurant in the area. We are going to help him by using a statistical analysis to select the best neighborhood for the new restaurant.

Data

We will need a map of San Vendemiano and its surroundings. To build an index of attractiveness for each neighborhood, we can use the list of venues provided by Foursquare, and in particular the list of restaurants already present.

Data Sources

Foursquare API: list of venues and restaurants, with their coordinates (latitude and longitude) [1];
OpenStreetMap: map of the area [2].

Methodology

Configure the notebook and Data Collection

We need the following Python packages for the job.

In particular, folium is used to plot the map, osmium to read and manipulate OpenStreetMap files (.osm), and geopy.geocoders to get the coordinates of San Vendemiano: 45.8907463°N, 12.3338437°E.

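# Install the third-party packages first (notebook shell commands)
!pip install folium
!pip install osmium

import os
import requests   # HTTP requests to the Foursquare API
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.cm as cm
import matplotlib.colors as colors
import matplotlib.pyplot as plt
import folium     # interactive maps
import osmium as osm   # read OpenStreetMap (.osm) files
from geopy.geocoders import Nominatim   # geocode a place name to coordinates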

As there is no clear definition of neighborhoods in the area, we need to come up with our own: we divide the area into a 10×10 grid in which each cell is about 1 km on a side.
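One way to build such a grid is sketched below. It is only an illustration under stated assumptions: the grid is centred on the coordinates quoted above, a degree of latitude is taken as roughly 111.32 km, and cells are labelled with a hypothetical "row then column" string (so row 3, column 3 becomes "33"). The article does not specify any of these details.

```python
import math

# Centre of San Vendemiano, as quoted in the article.
CENTER_LAT, CENTER_LON = 45.8907463, 12.3338437
GRID_SIZE = 10           # 10x10 cells
CELL_KM = 1.0            # approximate side of a cell
KM_PER_DEG_LAT = 111.32  # rough average value; an assumption of this sketch


def build_grid(center_lat, center_lon, n=GRID_SIZE, cell_km=CELL_KM):
    """Return (south, west, dlat, dlon) for an n x n grid centred on the point."""
    dlat = cell_km / KM_PER_DEG_LAT
    # A degree of longitude shrinks with the cosine of the latitude.
    dlon = cell_km / (KM_PER_DEG_LAT * math.cos(math.radians(center_lat)))
    south = center_lat - n * dlat / 2
    west = center_lon - n * dlon / 2
    return south, west, dlat, dlon


def cell_id(lat, lon, grid, n=GRID_SIZE):
    """Map a coordinate to a 'rowcol' label, or None if it lies outside the grid."""
    south, west, dlat, dlon = grid
    row = math.floor((lat - south) / dlat)
    col = math.floor((lon - west) / dlon)
    if 0 <= row < n and 0 <= col < n:
        return f"{row}{col}"
    return None


grid = build_grid(CENTER_LAT, CENTER_LON)
```

With single-digit rows and columns a two-character label is unambiguous; a larger grid would need a separator.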

Data Preprocessing: Creating Features

The criteria for choosing the best neighborhood are the following:

good connection with the city and the population, i.e. a residential area is better than a rural area;
high number of venues in the neighborhood, a signal that it can be more attractive;
low number of restaurants, so less competition.

Regarding the first point, we can calculate an index of habitability from the number of highways in each neighborhood (in OpenStreetMap, any road, street or path carries a highway tag); these are provided by the OpenStreetMap project [2].
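The article does not say exactly how highways are counted per neighborhood. A minimal sketch of one possible convention follows: each way is counted once for every cell it passes through. The function names are hypothetical, and the input is assumed to be a plain list of node coordinates per way, for example as collected from the .osm file with osmium.

```python
from collections import Counter


def count_ways_per_cell(ways, to_cell):
    """Count how many ways touch each cell.

    ways    -- iterable of ways, each a list of (lat, lon) node coordinates
    to_cell -- function mapping (lat, lon) to a cell label, or None if outside
    """
    counts = Counter()
    for nodes in ways:
        # A way is counted once per cell it touches, not once per node.
        touched = {to_cell(lat, lon) for lat, lon in nodes}
        touched.discard(None)
        counts.update(touched)
    return counts
```

Counting per way rather than per node keeps a long, densely mapped road from dominating a cell; counting nodes or total road length would be equally defensible choices.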

A choropleth map indicating areas with a higher number of highways (darker color).

The number of venues and restaurants can be downloaded using the Foursquare API [1]. Of the 226 venues found, 47 are some kind of restaurant.
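The request itself is not shown in the article. As a hedged sketch, the code below only covers the step after the download: flattening a response and flagging restaurants. It assumes the nested layout of Foursquare's v2 venues/explore responses (response, groups, items, venue), and the name-based restaurant test is a guess at what "some kind of restaurant" means. The sample payload is made up for illustration.

```python
def flatten_venues(payload):
    """Extract (name, lat, lon, category) tuples from an explore-style response."""
    rows = []
    for group in payload.get("response", {}).get("groups", []):
        for item in group.get("items", []):
            venue = item["venue"]
            # Some venues may have no category; fall back to an empty name.
            categories = venue.get("categories") or [{}]
            rows.append((
                venue["name"],
                venue["location"]["lat"],
                venue["location"]["lng"],
                categories[0].get("name", ""),
            ))
    return rows


def is_restaurant(category):
    """Heuristic: any category whose name mentions 'restaurant'."""
    return "restaurant" in category.lower()


# Made-up payload, shaped like the assumed response layout.
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Trattoria A", "location": {"lat": 45.89, "lng": 12.33},
               "categories": [{"name": "Italian Restaurant"}]}},
    {"venue": {"name": "Bar B", "location": {"lat": 45.90, "lng": 12.34},
               "categories": [{"name": "Café"}]}},
]}]}}

venues = flatten_venues(sample)
restaurants = [v for v in venues if is_restaurant(v[3])]
```

A name-based test misses categories such as "Pizza Place", so a real analysis would probably need an explicit list of category names.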

Final Model and Visualization

Our final feature table looks like the following.

Feature table: all the useful data for each neighborhood.

We can measure an index of attractiveness using the following formula:

attractiveness = highways × venues,

and order the neighborhoods from the most attractive to the least attractive.

As we can see, neighbor = “33” is the third most attractive neighborhood but has zero restaurants. This is therefore probably the best zone in which to build a new restaurant.
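The scoring and selection steps can be sketched in a few lines. The column names (neighbor, highways, venues, restaurants) are assumptions about the feature table, and the numbers below are made up for illustration; they are not the article's data.

```python
def rank_neighborhoods(features):
    """Return rows sorted by attractiveness = highways * venues, highest first."""
    scored = [
        dict(row, attractiveness=row["highways"] * row["venues"])
        for row in features
    ]
    return sorted(scored, key=lambda r: r["attractiveness"], reverse=True)


def best_without_restaurants(ranked):
    """Most attractive neighborhood that has no restaurant yet, or None."""
    return next((r for r in ranked if r["restaurants"] == 0), None)


# Made-up numbers for illustration only.
features = [
    {"neighbor": "12", "highways": 5, "venues": 2, "restaurants": 0},
    {"neighbor": "45", "highways": 40, "venues": 30, "restaurants": 6},
    {"neighbor": "33", "highways": 30, "venues": 20, "restaurants": 0},
    {"neighbor": "44", "highways": 35, "venues": 25, "restaurants": 4},
]

ranked = rank_neighborhoods(features)
best = best_without_restaurants(ranked)
print(best["neighbor"])  # prints 33
```

Because the score is a product, a cell with many roads but no venues (or the reverse) scores zero, which matches the idea that both criteria must hold.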

A choropleth map indicating the most attractive areas (darker color). Points are restaurants. White-delimited cell is the best neighborhood.
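A choropleth of this kind needs one polygon per neighborhood. The sketch below builds GeoJSON features for square cells, assuming two-character "row then column" labels and a grid described by its south-west corner and cell size in degrees; these conventions and the function names are hypothetical. The resulting collection could then be handed to folium's Choropleth layer together with the feature table.

```python
import json


def cell_feature(label, south, west, dlat, dlon):
    """Build a GeoJSON Feature for the grid cell labelled 'rowcol'."""
    row, col = int(label[0]), int(label[1])
    s, w = south + row * dlat, west + col * dlon
    n, e = s + dlat, w + dlon
    # GeoJSON uses (lon, lat) order, and a polygon ring must be closed.
    ring = [[w, s], [e, s], [e, n], [w, n], [w, s]]
    return {
        "type": "Feature",
        "id": label,
        "properties": {"neighbor": label},
        "geometry": {"type": "Polygon", "coordinates": [ring]},
    }


def grid_geojson(labels, south, west, dlat, dlon):
    """Collect the cell features into a GeoJSON FeatureCollection."""
    return {
        "type": "FeatureCollection",
        "features": [cell_feature(l, south, west, dlat, dlon) for l in labels],
    }


# Toy grid with 1-degree cells, just to show the structure.
collection = grid_geojson(["00", "33"], 0.0, 0.0, 1.0, 1.0)
geojson_text = json.dumps(collection)
```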

Results and Discussion

We have found a method to divide the map into neighborhoods of about 1 km² each. We have assigned each neighborhood a score, called “attractiveness”, based on the presence of highways and venues. Looking at the last table or figure, we can see that restaurants are generally located in attractive neighborhoods, which suggests that the score is a reasonable indicator. The third most attractive neighborhood, however, has zero restaurants.

Conclusion

We can conclude that we have found a model in which the attractiveness of a neighborhood correlates well with the presence of restaurants.

To answer our business problem, the best place to build a new restaurant would be the neighborhood highlighted in white in the last figure, since it is highly attractive but does not yet have a restaurant.