If you are planning to contribute to the data wrangling and database management on this project and will need to run the Python script, follow the installation and setup instructions below.
Docker is a platform that allows you to containerize and run applications in isolated environments, making it easier to manage dependencies and ensure consistent deployments. Download the latest version of Docker Desktop for your operating system.
PostgreSQL is an open-source relational database management system. We use it to store our data. Make sure you have the latest version of PostgreSQL installed on your computer. You can download it here. As part of that setup, you will also need to install PostGIS; this should be done through the setup wizard, as detailed here.
- Navigate to our GitHub repository.
- Create a fork of the repository by clicking the "Fork" button in the top right corner of the page. This will create a copy of the repository in your own GitHub account.
- Clone your fork of the repository to your local machine using git clone followed by your fork's URL.
Note: make sure to keep your fork up to date with the original repository by following the instructions here.
- Using PowerShell, navigate to the directory where PostgreSQL is installed. You can do this with a command like cd "C:\Program Files\PostgreSQL\13\bin" (replace 13 with your PostgreSQL version). Run ./psql -U postgres and enter your computer password.
- Run CREATE DATABASE vacantlotdb; to create a new database named vacantlotdb.
- Run \c vacantlotdb to connect to the database.
- Run CREATE EXTENSION postgis;
- Run \q to exit PostgreSQL.
- Ensure that PostgreSQL is installed and running on your Mac.
- In your terminal, run createdb vacantlotdb. This command directly creates a new database named vacantlotdb.
- Run psql -U postgres -d vacantlotdb and enter the password when prompted. This command opens the PostgreSQL command line interface and connects you to the vacantlotdb database.
- Run CREATE EXTENSION postgis;
- Run \q to exit the PostgreSQL command line interface.
- Ensure that PostgreSQL is installed and running on your Mac.
- In your terminal, run createdb vacantlotdb. This command directly creates a new database named vacantlotdb.
- Run psql -d vacantlotdb. Note that on a Mac you might need to start postgres with brew services start postgresql before running psql. If prompted, enter the password for your PostgreSQL user. This will open the PostgreSQL command line interface and connect you to the vacantlotdb database. You will know it has succeeded when you see the vacantlotdb=# prompt.
- Run CREATE EXTENSION postgis;
- To exit the PostgreSQL interface, type \q and press Enter.
Note for all OS: Optionally, in /config/config, set FORCE_RELOAD = False to read "cached" data in postgres instead of downloading new data.
Open the command prompt as an admin. Run setx VACANT_LOTS_DB "postgresql://postgres:password@localhost/vacantlotdb". Make sure to replace “password” with your user password (not your postgres password). You should get a message saying something like “Success! Specified value was saved.”
In the terminal, open your shell's profile file, like ~/.bashrc or ~/.bash_profile, using a text editor. You should be able to do this by running something like nano ~/.bashrc. Add the following line at the end of the file:
export VACANT_LOTS_DB="postgresql://postgres:password@localhost/vacantlotdb"
Replace password with your PostgreSQL user password. Save and close the file. Apply the changes by running source ~/.bashrc.
In the terminal, open your shell's profile file, such as ~/.zshrc (for Zsh, which is the default shell on recent versions of macOS) or ~/.bash_profile (for Bash), using a text editor like Nano or Vim. For instance, nano ~/.zshrc. Add the following line at the end of the file:
export VACANT_LOTS_DB="postgresql://postgres:password@localhost/vacantlotdb"
Make sure to replace password with your actual PostgreSQL password. Save and close the file. To apply these changes, run source ~/.zshrc (or the appropriate file for your shell).
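As a quick sanity check after setting the variable, the minimal sketch below reads VACANT_LOTS_DB and splits it into its parts using only the Python standard library. The function name parse_db_url is hypothetical and not part of the project; the sketch only assumes the URL has the postgresql://user:password@host/dbname shape shown above.

```python
import os
from urllib.parse import urlparse


def parse_db_url(url):
    """Split a postgresql:// URL into its components (hypothetical helper)."""
    parts = urlparse(url)
    if parts.scheme != "postgresql":
        raise ValueError(f"expected a postgresql:// URL, got scheme {parts.scheme!r}")
    return {
        "user": parts.username,
        "password_set": bool(parts.password),  # report presence only, never the value
        "host": parts.hostname,
        "database": parts.path.lstrip("/"),
    }


if __name__ == "__main__":
    url = os.environ.get("VACANT_LOTS_DB")
    if url is None:
        print("VACANT_LOTS_DB is not set; open a new terminal or re-source your profile.")
    else:
        print(parse_db_url(url))
```

If the variable is reported as unset right after editing your profile file, the current shell most likely has not reloaded it yet.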
Note for all OS: you can choose to write to local, remote, both, or neither in the settings in config.py.
All of the data scripting is in Python and lives in the data folder. Everything below should be run in that folder.
For all three OS, you'll first have to go into the data subdirectory and open the docker-compose.yml file. Change the filepath under volumes to the location of your repository. (Currently it is hardcoded to Brandon's filepath.) For example, if your repository is located at user/Documents/vacant-lots-proj, you would change the filepath to user/Documents/vacant-lots-proj/data. Save and close the file.
Alternatively, you can run the image in Docker following the steps below. If needed, it will build (this will take a few minutes). It should only need to build if it's your first time running or if major configuration changes are made. Changes to the Python script should not trigger a re-build.
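To find the path to put under volumes, a small sketch like the one below resolves the data subdirectory of a checkout to an absolute path. The helper name data_volume_path is hypothetical, and the repository location is just the example from the text above; nothing here edits docker-compose.yml for you.

```python
from pathlib import Path


def data_volume_path(repo_root):
    """Return the absolute path of the data folder inside the repository.

    Hypothetical helper: the result is the value you would paste under volumes.
    """
    return (Path(repo_root).expanduser() / "data").resolve()


if __name__ == "__main__":
    # Example location from the text; replace it with your own checkout.
    print(data_volume_path("user/Documents/vacant-lots-proj"))
```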
- Make sure Docker is running by opening the Docker Desktop app.
- Open the command prompt. Navigate to the location of the vacant-lots-proj repository. Run cd data and then docker-compose run vacant-lots-proj.
- When the script is done running, you’ll get a notification. When you’re done, to shut off the Docker container (which uses memory), run docker-compose down.
- In the terminal, navigate to your repository location using cd path/to/repository. Then run cd data to move into the data directory.
- Run sudo docker-compose run vacant-lots-proj. Enter your password if requested. If you run into an error message related to "KEY_ID" or something similar, you may have to do the following:
  - Hard-code your VACANT_LOTS_DB variable in docker-compose.yml.
  - Also in docker-compose.yml, add extra_hosts: - "host.docker.internal:host-gateway"
  - In your postgresql.conf file, set listen_addresses = '*'
  - In your pg_hba.conf file, add the following new lines: host all all 10.0.0.0/24 md5 and host all postgres 172.18.0.2/32 trust. You may have to modify these based on your own IP address.
  - Finally, after restarting postgres, navigate back to the data subdirectory in the project and run docker-compose --verbose up -d. This should run successfully.
The backend also works on WSL Ubuntu running Docker for Linux on Windows 10.
- When you're finished, and you want to shut down the Docker container, run docker-compose down.
In the terminal, use the cd command to navigate to your repository location, and then into the data directory. Run docker-compose run vacant-lots-proj. This command starts Docker Compose and sets up your environment as defined in your docker-compose.yml file. When you're finished and want to shut down the Docker containers, run docker-compose down.
Changes to our codebase should always address an issue. To have them merged, submit a pull request, which will be reviewed by at least the team lead or tech lead.
Format all Python files by running:
docker-compose run formatter
The map data is converted to the pmtiles format and served from Google Cloud. For access to production credentials, contact the project lead.
You can run the tile build locally with docker-compose run vacant-lots-proj to create a tile file and upload it to your own GCP bucket. First, create your own GCP account using their free trial. You will need to create the following assets in your GCP account and configure them in the environment variables in docker-compose.yml:
- Under APIs and Services -> Credentials, create an API key and put it in the CLEAN_GREEN_GOOGLE_KEY variable.
- Under APIs and Services -> Credentials, create a service account. After you create the service account, download its private key file, which will be named something like encoded-keyword-ddd-xxx.json. Copy that file to ~/.config/gcloud/application_default_credentials.json. This path is specified by default in the volumes section of the docker compose file.
- Go to Cloud storage -> Buckets and create a new bucket. Name it logically, e.g. cleanandgreenphl-{your_initials}. The name has to be globally unique. Grant your service account at least write access to the bucket. Put your bucket name in the GOOGLE_CLOUD_BUCKET_NAME variable. Make sure the tiles file in your bucket is publicly accessible by following Google's instructions online.
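Before starting a tile build, it can help to confirm that the two variables named above are actually set in the environment where the script runs. The sketch below is a hypothetical pre-flight check, not part of the project's script; it only looks at CLEAN_GREEN_GOOGLE_KEY and GOOGLE_CLOUD_BUCKET_NAME, and any other variables in your docker-compose.yml would need to be added to the tuple.

```python
import os

# Variable names taken from the list above.
REQUIRED_VARS = ("CLEAN_GREEN_GOOGLE_KEY", "GOOGLE_CLOUD_BUCKET_NAME")


def missing_vars(env, required=REQUIRED_VARS):
    """Return the required variable names that are unset or empty in env."""
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_vars(os.environ)
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All required variables are set.")
```

Passing the environment in as a plain mapping keeps the check easy to try out with a dictionary before relying on it inside the container.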
The Python script loads the tiles to Google Cloud as vacant_properties_tiles_staging.pmtiles. You can check this tileset by changing the frontend config setting useStagingTiles to true. If the tiles look OK, manually change the name in Google Cloud to remove the _staging suffix and archive the previous copy.
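The renaming step amounts to dropping the _staging suffix from the object name. The sketch below shows only that string manipulation; production_name is a hypothetical helper, and the actual rename and archiving are still done by hand in Google Cloud as described above.

```python
def production_name(staging_name, suffix="_staging"):
    """Return the object name with the staging suffix removed from its stem."""
    stem, dot, ext = staging_name.rpartition(".")
    if not dot:
        # No extension present: treat the whole name as the stem.
        stem, ext = staging_name, ""
    if not stem.endswith(suffix):
        raise ValueError(f"{staging_name!r} does not look like a staging file")
    stem = stem[: -len(suffix)]
    return f"{stem}.{ext}" if dot else stem


if __name__ == "__main__":
    print(production_name("vacant_properties_tiles_staging.pmtiles"))
    # prints vacant_properties_tiles.pmtiles
```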
To update the streetview images, run the following after the full data script has finished:
docker-compose run streetview
The script should only load new images that aren't in the bucket already (new properties added to list).
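The "only load new images" behavior can be pictured as a set difference between the properties on the list and the images already in the bucket. This is an illustrative sketch, not the project's streetview script: the function name, the identifiers, and the .jpg naming scheme are all assumptions.

```python
def images_to_fetch(property_ids, existing_objects, ext=".jpg"):
    """Return property IDs with no image in the bucket yet, in input order.

    Assumes (hypothetically) that each image is stored as '<property_id><ext>'.
    """
    existing = set(existing_objects)
    return [pid for pid in property_ids if f"{pid}{ext}" not in existing]


if __name__ == "__main__":
    # Made-up identifiers, for illustration only.
    on_list = ["1001", "1002", "1003"]
    in_bucket = ["1001.jpg", "1003.jpg"]
    print(images_to_fetch(on_list, in_bucket))  # prints ['1002']
```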