Table of Contents

Intro

Data aggregation and structural collection can be a daunting task in any research project. It is most often an overlooked part of a project as it can be the limiting factor in the workflow. Large scale studies are often done by a team of multiple groups and organizations which needs a considerable amount of time for compiling data from multiple people or source. There are available commercial tools which can aid in data collection and aggregation automatically and seamlessly. ArcGIS 1 coupled with their Android/IOS app called Collector for ArcGIS 2 is a very powerful tool for the task. The app allows multiple users to collect data with their mobile device and push data to a centralized database which natively can be imported into ArcGIS. However for small research groups or non-profit organizations, this may be too costly and not fit their budget

From my experience I found a viable solution which does not require an expensive license. The software tool is from an open-source software by Open Data Kit(ODK)3. The initial cost is $5 per month for a virtual private server (VPS) which has enough storage for a decent size database. If more storage is needed, one can scale their VPS to accommodate easily. Alternatively one can use the Google App Engine which provides a free instance for light useage4. Mind you, there are various ways to tackle this problem. For my solution, I found this approach to be the easiest.

I wish to help other research studies with a cheaper alternative and greatly facilitate their pursuit. I have written this guide to allow anyone to build from the ground up on setting a database and providing a viable way to organize an enterprise. This guide is aimed to initially setup a database for non-profit organization and academic research with a limited budget.

Required Components

Here you must setup a server that stores you data and be accessible through the Internet. Having your own computer machine as a server is a viable solution but it requires more extensive networking knowledge. An alternative solution is to purchase a rent a server from a third party which automatically sets up and quickly gets connected to the Internet. A Virtual Private Server (VPS) are cheap and flexible servers to set up a database up on the cloud. For as little as $5 dollars a month one can have a server with everything they need to start collecting data.

DigitalOcean, Amazon (free trial for a year), Google VPS can work. I suggest anyone of them, but in this guide we will choose to setup from DigitalOcean as its the cheapest available server. I recommend Debian or Ubuntu linux distribution to setup your database. In this guide, we will choose Debian 9 (Stretch).

  • Server
    • Linux (RHEL, Debian or Ubuntu)
    • Windows Server
  • Android phones
  • QGIS

Server Setup

General Review

In the terminal, you will need bash commands to setup the server’s configuration. In this guide, bash commands will be represented like this:

# This is a comment
run this command   

With some lines of code, there will be included comments which will be represented by the #.

You must get an SSH client to access the server for setting up ODK Aggregate. For Linux or Mac, use the native terminal application to access the server. For windows users, I recommended program is PuTTy https://www.putty.org/. Install this program for windows. With PuTTy, open up PuTTy and enter in the IP address and click “Open”. You will enter “root” and the password given by the online service or by the user installation setup. Usually when using VPS services, the IP address and the password for user root information is given by email. The command for linux or mac users, they can use Terminal and use this command:

# SSH with root as user
# Replace [EnterYourIp] with the given IP address
ssh root@[EnterYourIp]   

Preliminary Setup

From a fresh installation, the first prompt when accessing the server will be to change your password. Choose a good strong password for the root username and have it written down. After setting the password, preform any upgrades and install sudo which allows a normal user to have elevated privileges and install nano, a text editor. This may be already installed but some distributions might not have them. To do this, follow these commands.

# Use the apt package manager and update repository
apt update            
# Preform  upgrades
apt upgrade -y        
apt dist-upgrade -y   
# Install the sudo and nano packages
apt install sudo nano 

Alternatively you could the previous steps as one command: apt update && apt upgrade -y && apt install sudo nano

Next, we will set a separate username. The root user is only used to administer the server. Choose any name as you see fit and something you can remember. For our example, for the case of this guide, dummy will be the username.

adduser dummy

You will be prompted to create a password. When asked for name and other user information, hit “Enter” to leave it blank. This does not need to be filled out.

Here, we will also assign this new user as a sudo user. This will enable the new user to have elevated privileges. This will allow to run command lines with root access if “sudo” is run before the line of command. Be cautious with commands with sudo.

# Add the user dummy to sudo group
usermod -a -G sudo dummy 

Its highly advisable to setup a RSA key pair for more secure login setup. This prevents unauthorized access to the server and prevents the basic brute-force attacks. Here are the steps necessary to secure the server against malicious attacks. We will also setup a firewall called ufw (Uncomplicated Firewall) which is very simple to setup and secure your server. We will also generate a key-pair for this user. You can have a passphrase which is an additional layer of security, or leave it blank. When generated, the public key will be located at . Follow this command:

# Switch out from root and change the user to dummy
su dummy             
# Install uncomplicated firewall
sudo apt install ufw 
# Allow SSH ports, aggregate and PostGreSQL ports
sudo ufw allow 22    
sudo ufw allow 8080  
sudo ufw allow 5432  
# Generate the keys for SSH
ssh-keygen           

Next, we will move the generate public key to the authorized key folder. We will also limit the permissions of the authorized keys for only the user dummy to access, install putty tools and use it to convert the private key for windows client to use:

sudo mv ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
sudo apt install putty-tools
cd ~/.ssh
puttygen id_rsa -o id_rsa.ppk

Next, you have to retrieve the converted key file from the server to your own local machine. With windows, I recommend using WinSCP for secure file transfer https://winscp.net/eng/index.php. When prompted, enter the IP address in host name, port 22, and enter the new username and password (dummy). In WinSCP, hidden files are not shown by default. Hit Ctrl+Alt+H to show hidden files. On the right side panel, you will find the servers directory. The left side is the local computer’s directory. Here we can transfer files between the machines securely. On the right side, locate .ssh folder and find the id_rsa.ppk file and download it to your document folder. Simply double-click the file to download it to your local machine.

This file can be pre-loaded int PuTTy. Double click the id_rsa.ppk file, open up PuTTy and type the IP address. Entering in the username will automatically log in without password as the key authenticates you.

Next, we will disable login by password which only allows users with the key file to access the server. This secures the server as it prevents malicious bots from brute-force attacks:

sudo nano /etc/ssh/sshd_config

The command will open up nano text editor. Scroll down the config file by pressing the down arrow key. Find the line that contains

PermitRootLogin yes

Change it to:

PermitRootLogin no

Scroll all the way to the end and find the line containing:

PasswordAuthentication yes

And change it to:

PasswordAuthenticion no

Make sure there is no # in the beginning of the changed lines. Exit nano text editor by pressing “Ctrl+X”. Press “Y” to save file and hit “Enter” to write file.

Next restart ssh:

sudo service ssh restart

ODK Aggregate

Next, you will install ODK Aggregate. ODK Aggregate is a Java web applet that handles the data and stores into a database. The web applet is hosted as a website and can be accessed through any web browser. On the host machine, we will need to setup tomcat server that hosts ODK Aggregate along with an SQL server. PostgreSQL is used as we can also extend this to work with spatial data and be accessible from QGIS. As of writing this guide, ODK Aggregate is version 1.5.0. You may need to find the current version. Go to https://github.com/opendatakit/aggregate. If the version is different than 1.5.0, change the link following after the “wget” command.

sudo apt install tomcat8 unzip
wget https://github.com/opendatakit/aggregate/releases/download/v1.5.0/ODK-Aggregate-v1.5.0-Linux-x64.run.zip
unzip ODK-Aggregate*.zip
chmod +x ODK-Aggregate*.run
./ODK-Aggregate*.run

You will see a prompt when the ODK Aggregate Setup Wizard appears. Follow these steps:

  1. Read the license. Hit “Enter” to scroll through the document. Type “Y” at the end of the document to accept the license terms.

  2. Type to create a folder where the necessary files will be created.

  3. Type “3” to setup a PostgreSQL platform setup

  4. Type “Y” to downloaded

  5. Type “N” for ssl

  6. Type “1” for no ssl

  7. Type “Y” for port config

  8. Hit “Enter” to default 8080

  9. Type the IP address of the server.

  10. Hit “Y” for PostgreSQL

  11. Hit “Enter” to set the default PostgreSQL port to 5432

  12. Hit “Enter” for default username odk_user

  13. Make up a password for this user login

  14. It will ask the name of the database odk_prod

  15. Then it will ask to setup a name for your schema. odk_prod

  16. The “ODK Aggregate Instance Name” is used as a display for your users. This can typically be set for a project name or title. For our purpose, it will be set to demo.

  17. Create a super user name, most likely whomever is managing the data. We will set ours to be called super.

Next, you must setup PostgreSQL:

echo deb http://apt.postgresql.org/pub/repos/apt/stretch-pgdg main | sudo tee -a /etc/apt/sources.list
wget --quiet -O https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
sudo apt update
sudo apt install postgresql-9.4

Next you need to create postgres as a username and setup a password. Here you will be switching into psql command prompt to set things up. Once you are done, you will exit psql by typing $\backslash$q which will lead into the regular bash home directory. Follow these command:

sudo -u postgres psql postgres 

This will log you into psql command prompt.

  \password postgres
psql
  \q

Now you must set up the configuration of PostgreSQL to allow remote connections. You will edit the file and the :

sudo nano /etc/postgresql/9.4/main/pg_hba.conf

Change the line that has this:

local   all    all      peer

To this:

local   all    all      md5

Also add this line at the very end of the file:

host   all     all      0.0.0.0/0 md5

Then exit by pressing “Ctrl+X”, then hit “Y” to confirm, then hit “Enter”. Lets edit the other file:

sudo nano /etc/postgresql/9.4/main/postgresql.conf

Find this line:

#listen_addresses = 'localhost'

Replace it with this, note the removal of # at the beginning of the line:

listen_addresses = '*'

Now you will run the SQL configuration file generated from ODK Aggregate to setup the database in PostgreSQL. This will create the user odk_user and the database and schema odk_prod. Once this is done, you will restart PostgreSQL and have the database online.

sudo -u postgres psql postgres

Now you are in psql. Change directory and run the setup

    \cd '/home/dummy/ODK/ODK\ Aggregate'
        \i create_db_and_user.sql
        \q
    $  sudo systemctl restart postgresql

ODK Aggregate run file from the previous step generated a .war file that needs to be copied to apache tomcat webapps folder. With tomcat8 installed on debian, the webapps folder is located at .

cd $HOME/ODK/ODK\ Aggregate/
sudo cp ODKAggregate.war /var/lib/tomcat8/webapps
sudo service apache restart

General workflow

Now you should have the ODK Aggregate server running. You can simply access the server by going to your web browser on your local computer and access it by replacing your given ip address of the server in this form: http://xxx.xxx.xxx.xxx:8080/ODKAggregate Log into your super user login and click “Site Admin”. Here you can add more users to administer the site or be a data collector. Site administrators have the ability to create more users and have all other power as a data collector. For a simple user who is just collecting data, you can have them set as “Data Collector” and not be able to change any data that has already been collected. Next, we must create a form for the data collectors to use.

Before you start collecting data, you must set up a predefined form which consists of variables of interest. The ODK Collect app can collect spatial data in a form of GPS points, path of coordinates or trace a spatial polygon. ODK Aggregate can accept .xml files which needs to be created. The ODK Suite provides an online tool to create a desired form https://build.opendatakit.org/. When you first go to the site, you will be asked to sign up for a login. You do not have to create an account and can simply click cancel.

Here you can design your survey form. On the top-right hand corner, you must name this form. This will be displayed for the user, so choose something that pertains to what kind of data is being collected. Next, you will choose what kind of data are going to be collected. The most important are usually the date/time and GPS coordinates. On the bottom, there is an option to select location. When selected, a configuration window will appear on your right-hand side. For the name, name it location. You can have it configured to be a single point, a path which creates a polyline, or a shape which creates a polygon shape. For the date and time, you can configure the full date and time or just the date. Depending on your project, you can select a variety of data ranging from audio or video data or a design questionnaire. For audio or video there is a media option. You can collect audio, video and picture or have all three options if you wish. The questionnaires can be designed by clicking the select multiple option. Here you can have options for the data collector to choose from. This can be configured to have a follow-up questions depending on what options the data collector initially selects. More information can be found by clicking help for more advance configurations.

Once you have a satisfied form, you must save the form as an .xml file. On the top-right hand side, click File and select Export to XML. Once saved, head over you your ODK Aggregate website and click on “Form Management”. Click “Add New Form” and select the .xml file to upload to the server.

Next, the data collectors need to have the ODK Collect app to start collecting data. Android users can download the ODK Collect app either through the Google Play store or directly download the .apk file from https://github.com/opendatakit/collect/releases/tag/v1.15.1. Unfortunately there is no app for the IOS platform.

First, you must set the settings. Go to the “general settings” and tap on “server” settings. For the option “Type” choose the ODK Aggregate option. The URL will be the the same as the ODK Aggregate web address. Username and password will be the users that are created from ODK Aggregate website, or the super user. Head back to the main menu and click on “Get Blank Form”. This will download the forms you have uploaded to the ODK Aggregate server. The app can collect data offline and send the filled forms once connected online. For each entry of data, tap the “Fill Blank Form” option and select the appropriate form. Here you will start filling out the form. Once completed, you will save the form offline. If you are connected to the Internet, you will have to upload the filled forms to the ODK Aggregate server. To do so, head to the main menu and select the “Send Finalized Form” option and simply tap on the form that is saved. This will now upload the data to your server.

Conclusion

The utility from using ODK tools is beneficial for large scale studies, especially handing with geospatial data. ODK Aggregate is very flexible in working with other forms of applications. With PostgreSQL database, the stored data can be accessed either throught the ODK Aggregate webserver or by connecting to the PostgreSQL database connection. QGIS ,ArcGIS Desktop or Microsoft Access can connect to this database and pull data. For more advance configuration, see the ODK documentation: https://docs.opendatakit.org/. You may want to consider registering a domain name which allows you to enter a human-recognizable address instead of an IP address.

  1. ESRI 2017. ArcGIS Desktop: Release 10.5.1 Redlands, CA: Environmental Systems Research Institute. 

  2. https://www.esri.com/products/collector-for-arcgis 

  3. Open Data Kit 2.0: Expanding and Refining Information Services for Developing Regions http://www.hotmobile.org/2013/papers/full/2.pdf Waylon Brunette, Mitchell Sundt, Nicola Dell, Rohit Chaudhri, Nathan Breit, Gaetano Borriello In HotMobile, 2013. http://dl.acm.org/citation.cfm?id=2444790 

  4. https://docs.opendatakit.org/aggregate-app-engine/