Complete the Circle
Part 3: PostGIS, AWS, React and Mapbox
In this post we will complete the circle back to the Geographic Daydream post from a few weeks back.
In that post, we were focused on finding the best location for a brand new state of the art Animal Hospital. We used a dataset of existing animal hospitals along with 2019 county population data for all the counties in the United States.
The number of people in the county, divided by the existing hospital count, gives the person_per_hospital count for the county. The higher the count, the more people (and their pets) are served by each hospital in the county. Darker green counties are where the person_per_hospital count is high and are potentially under-served counties and might be good candidates to locate a new animal hospital.
The goal in this post is to build on the idea of hunting geographically for an under-served area. Also, we’ll be leveraging AWS cloud microservices to allow for user interaction and user data to drive the map creation.
We want the user to be able to specify their own point dataset, choose their state of interest, and the app will return a map (and supporting data) similar to the Indiana map above, highlighting potentially under-served counties for whatever dataset the user chooses.
Here is a glimpse of where we are headed. This is a potential map of California counties under-served with regard to Animal Hospitals. The orange points represent existing hospitals in the state.
Before we can proceed, we are going to need a few more tables of data to drive the query.
First, we create a population_county table, which will be populated and joined to within the query. This table holds references to the county’s state and statefp along with the population count for that county.
CREATE TABLE population_county (
id SERIAL,
statefp VARCHAR(2),
state_name VARCHAR(255),
name VARCHAR(255),
type VARCHAR(100),
pop_2019 integer,
PRIMARY KEY (id)
);
Next, we can populate population_county from the terminal with this invocation of psql. The source csv file is in located at /tmp/county_pop_2019.csv.
Here is the psql command to insert the data. We connect to the AWS RDS cloud instance and copy the csv data to the table at the terminal.
$ psql carto_boundaries -U awspostgres -p 5432 -h aws-postgres-postgis.xxx.us-east-1.rds.amazonaws.com -c "\copy population_county FROM '/tmp/county_pop_2019.csv' with (format csv,header true, delimiter ',');"
After population, we can select from the new table, showing that it is indeed populated with 2019 U.S. County population data.
SELECT id, statefp, state_name, name, type, pop_2019
FROM public.population_county;
Lastly, we need county and state geo data tables populated in the cloud.
These tables will drive the geographically-aware queries, powered by PostGIS. Geo data (such as shapefiles or GeoJSON) can be imported into a PostgreSQL / PostGIS database directly with QGIS. For more details on how to do this, check out part 1 of this this blog series.
Once we have county and state borderline data imported, our database has the following tables…
The 3 tables with the prefix, cb_2018_us_ are the tables that hold geographic data for U.S. Congressional Districts, U.S. Counties, and U.S. States, respectively.
The Back End
Now that we have our base data populated, we can return to the Animal Hospitals related SQL from A Geographic Daydream. The following SQL returns U.S. county data, with the number of existing Animal Hospitals per county accumulated (geo_points_count). Also, the population for the county is returned (pop_2019), as well as the calculated persons_per_point. If there are no existing hospitals in the county, the population is returned for the persons_per_point value.
SELECT
county.geom as geom,
county.name,
max(county.countyfp) as countyfp,
max(pop.type) as type,
COUNT(geo_points.geom) as geo_points_count,
MAX(pop.pop_2019) as pop_2019,
CASE
WHEN count(geo_points.geom) = 0 THEN max(pop.pop_2019)
ELSE max(pop.pop_2019) / count(geo_points.geom)
END AS persons_per_point
FROM cb_2018_us_county_20m county
LEFT JOIN
(
SELECT ST_SetSRID(ST_MAKEPOINT(longitude::double precision,
latitude::double precision),4326) as geom
FROM csv_import_xxx
)AS geo_points on ST_WITHIN(geo_points.geom, county.geom)
LEFT JOIN population_county pop on pop.name = county.name
JOIN cb_2018_us_state_20m state on state.statefp = county.statefp
WHERE 1 = 1
AND state.stusps = 'IN'
AND pop.statefp = state.statefp
GROUP BY county.geom, county.name
ORDER BY persons_per_point desc;
There are three PostGIS specific functions in this statement.
First, the ST_WITHIN function allows the counting of the points within each county polygon. This count is returned as geo_points_count.
The ST_MAKEPOINT function creates a point geometry value from longitude and latitude float values, which come from a table named csv_import_xxx. I will explain the creation and population of this table shortly.
Lastly, the ST_SetSRID function ensures that the created points have the correct map projection, identified by the id: 4326.
So, where does the csv_import_xxx table come from?
It is created earlier in the process, in the function behind the API route, getGeoJsonForCsv. The table is populated with the CSV point data that the user specifies. GetGeoJsonForCsv does many actions that depend on the success of previous actions.
Let’s break down the getGeoJsonForCsv function…
- This is where the variables from the client are collected. The client will send data_description, stusps, and csvUrl in the body of a POST request that hits this function.
- This part of the function creates a table to hold user point information. csv_import_current_date is the name of the new table that will be created.
- This is where the connection is made to the database. Then, the fastcsv npm library is used to parse the data at csvUrl.
- After the data from csvUrl is parsed, a couple of validation checks are made on the parsed data.
- This is where the data from the user is translated into SQL insert statements that will be executed next.
6. The CREATE table SQL is executed, then a second alter table SQL statement is executed, which gives ownership for to the table to awspostgres.
7. The user data is actually inserted into a new table.
8. The SQL to return county geo data is assembled and executed. The utils.getSQLFor(“county”, columns) function returns the GeoJSON for the query.
9. The SQL for the points layer is assembled, based on data passed to the function.
10. The returned county and point GeoJSON features are combined.
11. The full set of data is returned to the client.
If we look deeper at the getSqlFor function, we can see that this is where the geo SQL is returned based on the type of data (counties or points).
- The columns are assembled from the csv data, based on the first row at csvURL.
- Based on the type passed in, the SQL for counties or points is returned. Either the SQL for counties is returned, or …
- The SQL for points is returned.
The Front End Map
To display the data from the microservice on the front end, we use a React application with a Mapbox GL based component.
At the top of the Map function, various state variables are set up to support different data for the map, as well as to drive queries made by the user interface.
The useRef hook is used to allow for the Mapbox GL library to control a specific div without any conflicting DOM issues with React. This ref is called mapContainer.
Next, a series of useEffect hook blocks are set up. These blocks “react” to various changes to the state variables. Most of the changes occur when asynchronous data is returned (county and point data).
The queryAPI function is called when the user clicks the button. The usStateChange function is called when the user selects a state from the drop down.
In the return area of the function, the useRef attribute is referenced, allowing Mapbox to control this node in the DOM.
The usStateChange function is called when the state dropdown is changed. The states dropdown is populated by the usStates state array value, which is in turn populated by an asynchronous call from the first useEffect function.
The csvUrl is also stored in state, and is updated whenever the csvUrl input is changed.
When the “Try sample data” button is clicked, values that relate to California state Animal Hospital demonstration data is populated.
Here is the map in action:
Demo Map
Github
Code Sandbox
Under-served county finder code sandbox