Tutorial 3: Data Visualizations

This notebook will show you a few basic visualizations you can make with your own observation data.

We’ll do this with Pandas and Altair. Don’t worry if you’re not familiar with those tools; this is just a demonstration of the kinds of things you can do with your data.

import altair as alt
import pandas as pd

from pyinaturalist import iNatClient, pprint

# Optional: uncomment to log API requests (also import enable_logging from pyinaturalist)
# enable_logging()
client = iNatClient()

Observation data

We’ll start with all of your own observation data:

# Replace with your own username
USERNAME = 'jkcook'
my_observations = client.observations.search(user_id=USERNAME).all()
pprint(my_observations[:5])
                                                                                                                   
  ID         Taxon ID   Taxon                                           Observed on    User     Location           
 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 
  30688807   1415100    Cleomella serrulata (Rocky Mountain Beeplant)   Aug 12, 2019   jkcook   Johnston, IA, USA  
  30688955   47912      Asclepias tuberosa (Butterfly Milkweed)         Aug 12, 2019   jkcook   Johnston, IA, USA  
  30689111   60251      Verbena hastata (Blue Vervain)                  Aug 12, 2019   jkcook   Johnston, IA, USA  
  30689221   121968     Andropogon gerardi (Big Bluestem)               Aug 12, 2019   jkcook   Johnston, IA, USA  
  30689306   121968     Andropogon gerardi (Big Bluestem)               Aug 12, 2019   jkcook   Johnston, IA, USA  
                                                                                                                   

Basic histogram

Next, let’s make a simple histogram to show your observations over time.

Start by putting your observations into a DataFrame to make them easier to work with:

# Skip observations that have no observed date
source = pd.DataFrame([{'date': o.observed_on.isoformat()} for o in my_observations if o.observed_on])

And then display it as a bar chart:

(
    alt.Chart(source)
    .mark_bar()
    .properties(width=700, height=500)
    .encode(
        x='yearmonth(date):T',
        y=alt.Y(
            'count()',
            scale=alt.Scale(type='log'),
            axis=alt.Axis(title='Number of observations'),
        ),
    )
)
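
If the `yearmonth(date)` and `count()` encodings feel opaque, the following sketch reproduces roughly the same aggregation in plain Python: it reduces each date to its year and month, then counts how many dates fall into each bucket. The dates are made up for illustration, and `count_by_month` is a hypothetical helper, not part of Altair or pyinaturalist.

```python
from collections import Counter
from datetime import datetime


def count_by_month(iso_dates):
    """Count ISO-formatted dates per (year, month) bucket, in chronological order."""
    counts = Counter()
    for value in iso_dates:
        parsed = datetime.fromisoformat(value)
        counts[(parsed.year, parsed.month)] += 1
    return dict(sorted(counts.items()))


# Made-up sample dates, in the format produced by observed_on.isoformat()
sample_dates = [
    '2019-08-12T10:15:00',
    '2019-08-12T11:02:00',
    '2019-09-01T08:30:00',
    '2020-05-17T14:45:00',
]
monthly_counts = count_by_month(sample_dates)
for (year, month), n in monthly_counts.items():
    print(f'{year}-{month:02d}: {n}')
```

Each bar in the chart corresponds to one of these buckets; the chart performs the equivalent grouping for you when it renders.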

Histogram by iconic taxon

To show a bit more information, let’s break down the observations by category (iconic taxon):

# Skip observations that are missing a date or a taxon
source = pd.DataFrame(
    [
        {'date': o.observed_on.isoformat(), 'iconic_taxon': o.taxon.iconic_taxon_name}
        for o in my_observations
        if o.observed_on and o.taxon
    ]
)
(
    alt.Chart(source)
    .mark_bar()
    .properties(width=700, height=500)
    .encode(
        x='yearmonth(date):T',
        y=alt.Y(
            'count()',
            scale=alt.Scale(type='symlog'),
            axis=alt.Axis(title='Number of observations'),
        ),
        color='iconic_taxon',
    )
)
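
The `color='iconic_taxon'` encoding splits each monthly bar into one segment per category. As a rough illustration of the grouping behind that, this plain-Python sketch counts rows per month and iconic taxon. The rows are invented sample data, and `count_by_month_and_taxon` is a hypothetical helper.

```python
from collections import defaultdict


def count_by_month_and_taxon(rows):
    """Return {'YYYY-MM': {iconic_taxon: count}} for rows with 'date' and 'iconic_taxon' keys."""
    grouped = defaultdict(lambda: defaultdict(int))
    for row in rows:
        month = row['date'][:7]  # 'YYYY-MM' prefix of an ISO date string
        grouped[month][row['iconic_taxon']] += 1
    # Convert nested defaultdicts to plain dicts, sorted by month
    return {month: dict(taxa) for month, taxa in sorted(grouped.items())}


# Invented rows, shaped like the records passed to pd.DataFrame above
sample_rows = [
    {'date': '2019-08-12T10:15:00', 'iconic_taxon': 'Plantae'},
    {'date': '2019-08-12T11:02:00', 'iconic_taxon': 'Plantae'},
    {'date': '2019-08-20T09:00:00', 'iconic_taxon': 'Insecta'},
    {'date': '2019-09-01T08:30:00', 'iconic_taxon': 'Aves'},
]
segments = count_by_month_and_taxon(sample_rows)
for month, taxa in segments.items():
    print(month, taxa)
```

Here August 2019 would be drawn as one bar with two colored segments, and September 2019 as a bar with a single segment.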

Observation map

Next, we can show the observations on a map. Note: This example only shows observations in the United States, because the base map and projection used below cover only the US.

First, get the coordinates for all your observations, skipping any that are missing location info:

source = pd.DataFrame(
    [
        {
            'latitude': o.location[0],
            'longitude': o.location[1],
            'iconic_taxon': o.taxon.iconic_taxon_name,
        }
        for o in my_observations
        if o.location
    ]
)
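
Since this example only shows observations in the United States, it can be useful to check how many of your coordinates fall inside that area before plotting. This is a minimal sketch, assuming a rough bounding box for the contiguous US: the bounds are approximations chosen for illustration, Alaska and Hawaii are left out for simplicity, and `in_contiguous_us` is a hypothetical helper. The coordinates are sample values, not real observation data.

```python
# Approximate bounding box for the contiguous US (illustrative values only)
LAT_MIN, LAT_MAX = 24.5, 49.5
LON_MIN, LON_MAX = -125.0, -66.5


def in_contiguous_us(latitude, longitude):
    """Return True if the point lies inside the rough bounding box above."""
    return LAT_MIN <= latitude <= LAT_MAX and LON_MIN <= longitude <= LON_MAX


# Sample (latitude, longitude) pairs, in the same order as o.location
sample_locations = [
    (41.67, -93.70),   # central Iowa
    (48.85, 2.35),     # Paris, France
    (21.31, -157.86),  # Honolulu, Hawaii (outside this simplified box)
]
us_points = [loc for loc in sample_locations if in_contiguous_us(*loc)]
print(f'{len(us_points)} of {len(sample_locations)} points fall inside the box')
```

A bounding box is only a coarse check: it also includes parts of Canada and Mexico, so treat the result as an estimate rather than an exact filter.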

Then add the base layer. This example uses the us_10m dataset from vega-datasets:

from vega_datasets import data

states = alt.topo_feature(data.us_10m.url, feature='states')
background = (
    alt.Chart(states)
    .mark_geoshape(fill='lightgray', stroke='white')
    .properties(width=850, height=500)
    .project('albersUsa')
)

And finally, add your observation locations:

points = (
    alt.Chart(source)
    .mark_circle()
    .encode(
        longitude='longitude:Q',
        latitude='latitude:Q',
    )
)

# Show the combined background + points
background + points