1) Loading Data and Creating Maps in Madina
Madina, what, why and how?
The main motivation for building Madina is to provide a free, open source environment for researchers and practitioners in urban planning. Python was chosen as the programming language because of its wide adoption and the available body of open source packages that can be used to work with and analyze urban data.
One immediate benefit of having an extensive body of open source libraries written in the same language is the ability to write a complete analysis workflow, spanning all elements of a typical project, in a single script. This offers immense advantages:
Organization: a complete research workflow written as a script means there is no need for intermediate files for each analysis step, as there is no need to pass the output of one step as input into the next step, likely in different software. A script-based research project, together with a folder for raw data, eliminates the need to store, track, pass, and exchange intermediate results, reducing the chances for errors or mistakes.
Non-linear progression: When using fragmented software to carry out an analysis, a step that depends on CAD software must be carried out completely before starting a following step that depends on GIS, which in turn needs to be carried out completely before starting a statistical analysis in Stata. This sequential process makes it hard to collaborate, manage files, and diagnose and detect mistakes. Most importantly, it might result in repeated work if a mistake is made in an earlier stage of the process. Using a script that depends on a raw data folder solves this issue, as many steps can be carried out in parallel by multiple people using synthetic or sample input. When all the steps are completed, it becomes simple to integrate everything in a single script. Re-running the script after fixing a mistake is significantly less time consuming than having to repeat multiple tasks, multiple times.
Transparency and Reproducibility: A key advantage of maintaining a single script for a research project is that every step is explicitly documented. All steps are laid out, and all tools used, their settings, parameters and inputs are documented. This makes it possible for collaborators and the research community at large to inspect the process and help identify any issues. The script, once used in a project with a set of raw input data, can easily be replicated for other urban areas or for other time periods once the data is available. This makes research more efficient, and also makes outcomes and results comparable, as they come out of an identical process.
Madina aims to provide a collection of tools and functionalities by implementing commonly used urban planning methodologies. Madina also aims to reduce the effort needed to use multiple open source libraries. Currently, Madina makes it seamless to handle spatial data (through Geopandas), create origin and destination networks (through NetworkX), run urban network analysis (through a custom implementation of UNA), and visualize results (through Deck.gl) using very few lines of code. All the formatting and data passing between these packages happens through the Zonal object, Madina’s equivalent of a workspace or a layer management system.
Creating a Zonal object and Loading Data
[1]:
import madina as md
cambridge = md.Zonal()
cambridge is now a Zonal object, Madina’s representation of a workspace. This object holds data layers, networks and other data structures needed for urban research workflows. The function describe() gives details about the state of the Zonal object.
[2]:
cambridge.describe()
No zonal_layers yet, load a layer using 'load_layer(layer_name, file_path)'
Geographic center: (None, None)
No network graph yet. First, insert a layer that contains network segments (streets, sidewalks, ..) and call create_street_network(layer_name, weight_attribute=None)
Then, insert origins and destinations using 'insert_nodes(label, layer_name, weight_attribute)'
Finally, when done, create a network by calling 'create_street_network()'
We notice that there are no layers yet. We load our first layer by calling the function load_layer. It takes two arguments:
layer_name: a string that represents a name for the layer. It is used to identify the layer when it is referenced in other functions.
file_path: a string giving the path to the file.
As Geopandas’s GeoDataFrame is used internally to represent layers, any file format supported by Geopandas can be used here. .shp and .geojson are some of the most widely used spatial data formats and are recommended as input files.
[3]:
cambridge.load_layer(
    name='sidewalks',
    source='Cities/Cambridge/Data/sidewalks.geojson'
)
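For readers who have not worked with GeoJSON before, the following is a minimal sketch of what such a file contains, built with Python’s standard json module. The file name sample_sidewalks.geojson, the length_m property and the coordinates are made up for illustration and are not part of the Cambridge dataset.

```python
import json
import os
import tempfile

# A GeoJSON file is a JSON document; a FeatureCollection holds one
# feature per row, each with a geometry and a dictionary of attributes.
sample = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"length_m": 25.0},  # hypothetical attribute
            "geometry": {
                "type": "LineString",
                "coordinates": [[-71.0870, 42.3620], [-71.0867, 42.3621]],
            },
        }
    ],
}

# Write the sample to a temporary folder (hypothetical file name).
sample_path = os.path.join(tempfile.mkdtemp(), "sample_sidewalks.geojson")
with open(sample_path, "w") as f:
    json.dump(sample, f)

# Read it back to confirm the structure survived the round trip.
with open(sample_path) as f:
    loaded = json.load(f)
print(loaded["features"][0]["geometry"]["type"])
```

Each feature becomes one row of a layer’s table, and each key under properties becomes a column.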
[4]:
cambridge.describe()
Layer name | Visible | projection | rows | File path
sidewalks | 1 | EPSG:3857 | 170 | Cities/Cambridge/Data/sidewalks.geojson
Geographic center: (-0.014266175861540071, 0.0016269462167978611)
No network graph yet. First, insert a layer that contains network segments (streets, sidewalks, ..) and call create_street_network(layer_name, weight_attribute=None)
Then, insert origins and destinations using 'insert_nodes(label, layer_name, weight_attribute)'
Finally, when done, create a network by calling 'create_street_network()'
Notice that we now have one layer called sidewalks with 170 rows. An important thing that happens after loading the first layer is that the default map center for visualization is calculated; you can see it in the output of cambridge.describe(). The visualization geographic center is a pair of latitude and longitude coordinates and can easily be overridden, for instance by setting cambridge.geo_center = (24.77, 46.73). To visualize the workspace, call the function create_map().
[5]:
cambridge.create_map()
[5]:
This map is produced by Deck.GL - PyDeck, a powerful visualization package, by passing it the layers contained in the Zonal object together with some default settings. Layer data inside Madina is maintained in a GeoDataFrame, a table representation from the Python package Geopandas. You can access a layer’s GeoDataFrame:
[6]:
cambridge['sidewalks'].gdf
[6]:
| id | __Length | __GUID | geometry |
|---|---|---|---|
| 0 | 53.328770 | 1faf3b03-2e30-44b2-8b28-f84da30193c4 | LINESTRING (-1719.114 147.249, -1718.693 124.0... |
| 1 | 33.137771 | 1956c1b1-6c7b-46c6-be18-630210c0c086 | LINESTRING (-1705.465 177.057, -1714.466 163.1... |
| 2 | 82.471466 | 7a8f2a5b-e209-4b06-9c03-19df15c2e86c | LINESTRING (-1635.552 240.490, -1555.262 259.331) |
| 3 | 20.707448 | 65e6f380-1774-4439-9478-d23c97aa8346 | LINESTRING (-1648.164 226.821, -1647.328 234.0... |
| 4 | 60.851523 | a77163e9-5762-457c-8dda-99b4cfb29da4 | LINESTRING (-1662.239 245.651, -1696.978 295.612) |
| ... | ... | ... | ... |
| 165 | 15.383524 | db2d5c42-357e-4ef4-ac0d-c01414466159 | LINESTRING (-1651.361 256.529, -1662.239 245.651) |
| 166 | 22.520863 | b561650f-19dc-4496-a312-b9ecfebf9fbd | LINESTRING (-1651.361 256.529, -1635.552 240.490) |
| 167 | 21.412938 | 53692ac7-ccd5-4135-a75b-58540acdb01a | LINESTRING (-1680.556 208.784, -1691.978 226.897) |
| 168 | 23.915390 | 0097e461-14b4-4198-9d46-91766699850b | LINESTRING (-1741.957 154.332, -1719.114 147.249) |
| 169 | 17.737146 | d1037688-debc-4dbf-8bf4-82c1f8e643ac | LINESTRING (-1708.689 202.456, -1693.570 193.181) |
170 rows × 3 columns
We’ll learn more about manipulating GeoDataFrames later in this chapter.
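The __Length values in the table above appear to be planar segment lengths in the units of the layer’s CRS. That reading is an assumption based on the printed rows, not on Madina’s documentation, but it can be checked by hand for rows whose geometry is fully visible. A minimal sketch using only the standard library (the helper name planar_length is hypothetical):

```python
import math

def planar_length(coords):
    """Sum of straight-line distances between consecutive (x, y) points."""
    return sum(
        math.hypot(x2 - x1, y2 - y1)
        for (x1, y1), (x2, y2) in zip(coords, coords[1:])
    )

# Row 2 of the table: a two-point LINESTRING and its reported __Length.
row_2 = [(-1635.552, 240.490), (-1555.262, 259.331)]
reported_length = 82.471466

computed = planar_length(row_2)
# The printed coordinates are rounded to three decimals, so only
# approximate agreement is expected.
print(computed, reported_length, abs(computed - reported_length) < 0.01)
```

The same check can be repeated on row 4, whose geometry is also printed in full.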
Coordinate Reference Systems (CRS) and Projections
When dealing with urban data, the user must be familiar with Coordinate Reference Systems (CRS). Two important CRS types to know are:
Geographic coordinate systems: The most common type of coordinate system. They use a pair of latitude and longitude coordinates, in degrees from the equator and the prime meridian. The most recognized geographic coordinate system is the World Geodetic System (WGS 84), EPSG:4326. This is the coordinate system used in GPS and in most navigation and mapping software. Geographic coordinate systems should not be used directly in Cartesian distance calculations. Deck.gl, the visualization library used in Madina, expects data to be in this CRS, and the needed conversions are handled internally.
Projected coordinate systems: Projected coordinates are the result of using a map projection to convert the curved surface of the earth into a flat representation. Any projection method entails a loss of accuracy that varies in magnitude with the map projection and the location. It is very important to use a projected coordinate system that works best in the area of interest. Each projected coordinate system is assigned a distance unit. For instance, the recommended projected coordinate system for use in Massachusetts is the “Massachusetts State Plane Coordinate System, Mainland Zone meters”, EPSG:26986. Notice that this CRS is in meters, and all data reported by MassGIS is in this CRS.
Familiarize yourself with the recommended CRS for your area of interest, and try to avoid less accurate but global CRSs such as “WGS 84 / Pseudo-Mercator”, EPSG:3857, frequently used by global map providers such as Google Maps, OpenStreetMap, Bing, and ESRI. Geopandas, the package that handles spatial data representation, assumes the data is in a projected CRS and reports measurements in the units of the given CRS.
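To make the relationship between the two CRS types concrete, here is a minimal sketch of the inverse spherical Mercator formula, which maps EPSG:3857 meters to EPSG:4326 degrees, together with the Mercator scale factor 1 / cos(latitude) that explains why Pseudo-Mercator distances are inflated away from the equator. The function names are hypothetical and only the standard library is used; in practice a library call such as Geopandas’s to_crs() would handle the conversion.

```python
import math

EARTH_RADIUS_M = 6378137.0  # sphere radius used by EPSG:3857

def web_mercator_to_wgs84(x, y):
    """Convert EPSG:3857 (x, y) in meters to (longitude, latitude) in degrees."""
    lon = math.degrees(x / EARTH_RADIUS_M)
    lat = math.degrees(2.0 * math.atan(math.exp(y / EARTH_RADIUS_M)) - math.pi / 2.0)
    return lon, lat

def mercator_scale_factor(lat_deg):
    """Factor by which Mercator stretches distances at a given latitude."""
    return 1.0 / math.cos(math.radians(lat_deg))

# First vertex of the first sidewalk segment shown earlier.
print(web_mercator_to_wgs84(-1719.114, 147.249))

# Stretch factor at roughly the latitude of Cambridge, MA (about 42.37 degrees).
print(mercator_scale_factor(42.37))
```

The first result is a small fraction of a degree away from (0, 0), and the second is about 1.35, meaning a length measured in EPSG:3857 at that latitude overstates the ground distance by roughly a third.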
Due to the variation across potential datasets, Madina does not re-project any layer to ensure consistency; it issues a warning instead. The user is responsible for ensuring all data layers are in an appropriate CRS before attempting any analysis. Now, load the buildings and subway layers. Notice that it is not strictly necessary to explicitly mention the argument names layer_name and file_path if you list the inputs for load_layer() in the correct order. Always reference the documentation to ensure the right order of parameters, or explicitly specify parameter names.
[7]:
cambridge.load_layer('buildings', 'Cities/Cambridge/Data/building_entrances.geojson')
cambridge.load_layer('subway', 'Cities/Cambridge/Data/subway.geojson')
cambridge.describe()
Layer name | Visible | projection | rows | File path
sidewalks | 1 | EPSG:3857 | 170 | Cities/Cambridge/Data/sidewalks.geojson
buildings | 1 | EPSG:3857 | 118 | Cities/Cambridge/Data/building_entrances.geojson
subway | 1 | EPSG:3857 | 2 | Cities/Cambridge/Data/subway.geojson
Geographic center: (-0.014266175861540071, 0.0016269462167978611)
No network graph yet. First, insert a layer that contains network segments (streets, sidewalks, ..) and call create_street_network(layer_name, weight_attribute=None)
Then, insert origins and destinations using 'insert_nodes(label, layer_name, weight_attribute)'
Finally, when done, create a network by calling 'create_street_network()'
We notice that we have a buildings layer with 118 building entrances, and a subway layer with 2 subway stations. Let’s look at the map:
[8]:
cambridge.create_map()
[8]:
Once each layer is loaded, it gets assigned a random color, which could result in less-than-ideal visuals. If you look at the documentation, you’ll notice that the function create_map() can take three arguments:
layer_list: This parameter takes a list of dictionaries of the form [{…}, {…}, …]. Each dictionary in this list represents a layer. Each key:value pair in the dictionary represents a visualization parameter name and its setting. These parameters are used internally to prepare each layer’s GeoDataFrame, which is then passed on to create a Deck.GL layer with the corresponding settings. The following strings can be used as dictionary keys, with the appropriate value options:
  layer: the name of one of the layers contained in the Zonal object. You can get a list of layers by calling cambridge.describe() or cambridge.layers.layers.
  gdf: a GeoDataFrame object. This allows visualizing data that is not inside your Zonal object, or data that has been processed or filtered, for instance. We’ll learn more about handling GeoDataFrames in the next section.
  color: a list of three numbers between 0 and 255 representing an RGB color. For instance, [0, 0, 255] is blue.
  color_by_attribute: one of the layer/gdf attributes (i.e. column names). You can get a list of a layer’s column names by calling cambridge['sidewalks'].gdf.columns, or by hovering over any layer’s visualized geometries in the map produced by cambridge.create_map().
  color_method: there are four coloring methods:
    single_color: This is the default setting, and you don’t need to specify 'color_method': 'single_color' if color is set. If color is not set, a new random color is assigned.
    categorical: This coloring method is suitable for categorical data with a few unique values. If color is not assigned, each unique value is assigned a random color. You can assign specific colors to individual unique values by setting color to a dictionary like 'color': {'value_1': [255, 0, 0], 'value_2': [0, 255, 0], 'value_3': [0, 0, 255]} to assign red to all geometries with 'value_1', green to all geometries with 'value_2' and blue to all geometries with 'value_3'. You can get a list of unique values inside a layer’s column by calling cambridge['layer_name'].gdf['column_name'].unique().
    gradient: This coloring method is suitable for numerical data, where the highest value is set to green and the lowest value is set to red. The scale is gradual and can easily be skewed by extreme values.
    quantile: This coloring method is suitable for numerical data. Instead of using the numerical value, each entry is assigned its percentile; the highest ranking value is set to green and the lowest to red. The scale is not sensitive to extreme values, as values are converted into ranked percentiles between 0 and 1. The median value would be yellow.
  radius: if the layer/gdf contains points, setting this parameter to a column name resizes points according to the values of that column. Must be numerical values only.
  width: if the layer/gdf contains lines/polylines, setting this parameter to a column name resizes line widths according to the values of that column. Must be numerical values only.
  opacity: a number between 0 and 1 indicating the layer/gdf’s opacity level, with 0 meaning fully transparent and 1 meaning fully opaque.
  text: setting this to a column name overlays text annotations on each geometry.
save_as: Maps are not saved by default. If this parameter is set to a file name, for example save_as='cambridge_map.html', an HTML version of the map is saved.
basemap: False by default. If set to True, it enables Deck.gl’s default base map, currently Carto.
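The difference between the gradient and quantile coloring methods described above can be illustrated with a small standalone sketch. It does not reproduce Madina’s internals: the exact color ramp is not stated in this chapter, so a linear red to yellow to green ramp is assumed here, and all function names are hypothetical.

```python
def ratio_to_color(t):
    """Map t in [0, 1] to an RGB list: 0 is red, 0.5 yellow, 1 green."""
    if t < 0.5:
        return [255, round(510 * t), 0]
    return [round(510 * (1 - t)), 255, 0]

def gradient_colors(values):
    """Color by raw value, rescaled linearly between min and max."""
    low, high = min(values), max(values)
    span = (high - low) or 1  # avoid dividing by zero when all values match
    return [ratio_to_color((v - low) / span) for v in values]

def quantile_colors(values):
    """Color by rank (ties broken by position), ignoring magnitudes."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, i in enumerate(order):
        ranks[i] = rank
    top = max(len(values) - 1, 1)
    return [ratio_to_color(r / top) for r in ranks]

lengths = [1, 2, 3, 4, 1000]  # one extreme value
print(gradient_colors(lengths))
print(quantile_colors(lengths))
```

With the gradient method the single extreme value pushes every other entry to nearly pure red, while the quantile method spreads the five entries evenly and gives the median entry yellow.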
This is an example of how to use these visualization settings:
[9]:
cambridge.create_map(
    layer_list=[
        {
            'layer': 'sidewalks',
            'color_by_attribute': '__Length',
            'color_method': 'quantile'
        },
        {
            'layer': 'buildings',
            'color_by_attribute': 'people',
            'color_method': 'gradient',
            'radius': 'people',
            'radius_min': 1,
            'radius_max': 6,
        },
        {
            'layer': 'subway',
            'color': [0, 200, 255],
            'text': 'id'
        }
    ],
    save_as='cambridge_map.html',
    basemap=True
)
[9]:
Manipulating GeoDataFrames
Geopandas is a powerful package that provides functionalities rivaling those of a typical GIS system, sometimes with more flexibility, as many functionalities can incorporate more complex and customized operations. Geopandas offers geometric manipulation, set operations and aggregation functionalities that come in handy in many urban planning applications.
Most operations in Geopandas create a new dataframe as a result. If you want to manipulate a layer’s dataframe, be sure to assign the result back to the layer.
As an example, we create a new attribute called “building_size”, and set it to small if fewer than 25 people live in that building, and large if 25 people or more live in that building. This is a simple operation; the aim is to show the sequence for manipulating GeoDataFrames in Madina: retrieve, process, assign back.
[10]:
# retrieve geodataframe
buildings_gdf = cambridge['buildings'].gdf
# do some processing
buildings_gdf['building_size'] = buildings_gdf['people'].apply(lambda x: 'small' if x < 25 else 'large')
# assign back to layer
cambridge['buildings'].gdf = buildings_gdf
This is a good opportunity to illustrate assigning specific colors to individual categorical values. When “building_size” is “small”, buildings are colored orange ([200, 100, 0]); “large” buildings are colored blue ([0, 100, 200]).
[11]:
cambridge.create_map(
    [
        {'layer': 'sidewalks', 'color': [100, 100, 100]},
        {
            'layer': 'buildings',
            'color_by_attribute': 'building_size',
            'color_method': 'categorical',
            'color': {'small': [200, 100, 0], 'large': [0, 100, 200]},
            'text': 'people'
        },
    ]
)
[11]:
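The categorical coloring used in this last map can be pictured as a dictionary lookup with a random fallback for values that were not given a color. The sketch below is an illustration under that assumption, not Madina’s implementation; the function name categorical_colors and its seed parameter are hypothetical.

```python
import random

def categorical_colors(values, color=None, seed=0):
    """Return one RGB list per value; unlisted categories get a random color."""
    rng = random.Random(seed)  # seeded so the sketch is repeatable
    palette = dict(color or {})
    for value in values:
        if value not in palette:
            palette[value] = [rng.randint(0, 255) for _ in range(3)]
    return [palette[value] for value in values]

sizes = ['small', 'large', 'small']
print(categorical_colors(sizes, color={'small': [200, 100, 0], 'large': [0, 100, 200]}))
# prints [[200, 100, 0], [0, 100, 200], [200, 100, 0]]
```

When no color dictionary is passed, every distinct value still receives a single consistent color, which mirrors the behavior described for the categorical method.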