Tutorial 3: Data Visualizations#

This notebook will show you a few basic visualizations you can make with your own observation data.

We’ll do this with Pandas and Altair. Don’t worry if you’re not familiar with those tools; this is just to demonstrate the kinds of things you can do with your data.

[1]:
from datetime import datetime, timedelta

import altair as alt
import pandas as pd
from dateutil.relativedelta import relativedelta
from pyinaturalist import Observation, enable_logging, get_observations, pprint
from rich import print

enable_logging()

Observation data#

We’ll start with all of your own observation data from the last 3 years:

[2]:
# Replace with your own username
USERNAME = 'jkcook'

# Approximate "3 years ago" as 365 * 3 days before now
start_date = datetime.now() - timedelta(days=365 * 3)
response = get_observations(user_id=USERNAME, d1=start_date, page='all')
my_observations = Observation.from_json_list(response)
pprint(my_observations[:10])
  ID         Taxon ID   Taxon                     Observed on    User     Location          
 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  30688807   78444      🌱 Species: Peritoma      Aug 12, 2019   jkcook   Johnston, IA, USA 
                        serrulata (Rocky                                                    
                        Mountain beeplant)                                                  
  30688955   47912      🌱 Species: Asclepias     Aug 12, 2019   jkcook   Johnston, IA, USA 
                        tuberosa (butterfly     
                        milkweed)               
  30689111   60251      🌱 Species: Verbena       Aug 12, 2019   jkcook   Johnston, IA, USA 
                        hastata (blue vervain)                                              
  30689221   121968     🌽 Species: Andropogon    Aug 12, 2019   jkcook   Johnston, IA, USA 
                        gerardi (big bluestem)  
  30689306   121968     🌽 Species: Andropogon    Aug 12, 2019   jkcook   Johnston, IA, USA 
                        gerardi (big bluestem)                                              
  30689425   128701     🌱 Species: Desmanthus    Aug 12, 2019   jkcook   Johnston, IA, USA 
                        illinoensis (Illinois   
                        bundleflower)           
  30689463   121976     🌻 Species: Silphium      Aug 12, 2019   jkcook   Johnston, IA, USA 
                        laciniatum (compass                                                 
                        plant)                                                              
  30689506   136376     🌻 Species: Rudbeckia     Aug 12, 2019   jkcook   Johnston, IA, USA 
                        triloba (Brown-eyed     
                        Susan)                  
  30689603   121976     🌻 Species: Silphium      Aug 12, 2019   jkcook   Johnston, IA, USA 
                        laciniatum (compass                                                 
                        plant)                                                              
  30689780   81594      🌾 Species: Elymus        Aug 12, 2019   jkcook   Johnston, IA, USA 
                        hystrix (bottlebrush    
                        grass)                  
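
The query above approximates three years as 365 * 3 days, which drifts by a day whenever the window contains a leap day. If you want the same calendar date three years earlier, a minimal standard-library sketch could look like this. The `years_ago` helper is hypothetical (it is not part of pyinaturalist), and the fixed date is only there to make the comparison reproducible:

```python
from datetime import datetime, timedelta


def years_ago(moment: datetime, years: int) -> datetime:
    """Return the same calendar date `years` earlier (hypothetical helper)."""
    try:
        return moment.replace(year=moment.year - years)
    except ValueError:
        # Feb 29 has no counterpart in a non-leap year; fall back to Feb 28
        return moment.replace(year=moment.year - years, day=28)


now = datetime(2024, 3, 1)  # fixed date so the comparison is reproducible
approx_start = now - timedelta(days=365 * 3)
calendar_start = years_ago(now, 3)

print(approx_start.date())    # 2021-03-02
print(calendar_start.date())  # 2021-03-01
```

For a tutorial the one-day difference rarely matters, so the simpler `timedelta` version is a reasonable choice.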

Basic histogram#

Next, let’s make a simple histogram to show your observations over time.

Start by putting your observations into a DataFrame to make them easier to work with:

[3]:
source = pd.DataFrame([{'date': o.observed_on.isoformat()} for o in my_observations])

And then display it as a bar chart:

[4]:
(
    alt.Chart(source)
    .mark_bar()
    .properties(width=700, height=500)
    .encode(
        x='yearmonth(date):T',
        y=alt.Y(
            'count()',
            scale=alt.Scale(type='log'),
            axis=alt.Axis(title='Number of observations'),
        ),
    )
)
alt.Chart(...)
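
To make the aggregation less of a black box, here is a rough standard-library sketch of what the `yearmonth(date)` and `count()` encodings compute: each timestamp is truncated to its year and month, and the observations in each bin are counted. The dates below are made up for illustration; in the notebook they would come from `o.observed_on.isoformat()`:

```python
from collections import Counter
from datetime import datetime

# Made-up ISO timestamps standing in for o.observed_on.isoformat()
dates = [
    '2019-08-12T10:15:00',
    '2019-08-12T10:40:00',
    '2019-08-30T18:05:00',
    '2019-09-02T07:30:00',
]

# Truncate each timestamp to its year and month, then count per bin
monthly_counts = Counter(datetime.fromisoformat(d).strftime('%Y-%m') for d in dates)

for month, count in sorted(monthly_counts.items()):
    print(month, count)
```

Altair does this binning for you inside the chart specification, so the DataFrame only needs the raw dates.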

Histogram by iconic taxon#

To show a bit more information, let’s break down the observations by category (iconic taxon):

[5]:
source = pd.DataFrame(
    [
        {'date': o.observed_on.isoformat(), 'iconic_taxon': o.taxon.iconic_taxon_name}
        for o in my_observations
    ]
)
(
    alt.Chart(source)
    .mark_bar()
    .properties(width=700, height=500)
    .encode(
        x='yearmonth(date):T',
        y=alt.Y(
            'count()',
            scale=alt.Scale(type='symlog'),
            axis=alt.Axis(title='Number of observations'),
        ),
        color='iconic_taxon',
    )
)
alt.Chart(...)
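
This chart uses a `symlog` scale where the previous one used `log`. The notebook does not say why; one plausible reason (an assumption) is that stacked bars start at zero, which a plain log scale cannot represent. The sketch below uses the common symmetric-log definition, sign(x) * log1p(|x| / c) with c = 1. Treat the exact formula and default constant as assumptions about how the chart library implements it:

```python
import math


def symlog(value: float, constant: float = 1.0) -> float:
    """Symmetric log transform: sign(x) * log1p(|x| / constant)."""
    return math.copysign(math.log1p(abs(value) / constant), value)


# Zero maps to zero, and larger counts are compressed as on a log scale
for count in (0, 1, 10, 100):
    print(count, round(symlog(count), 3))

# A plain logarithm has no value at zero
try:
    math.log10(0)
except ValueError as error:
    print('log10(0) fails:', error)
```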

Observation map#

Next, we can show the observations on a map. Note: This example only shows observations in the United States, because it uses a U.S. base map and projection.

First, get the coordinates for all your observations, skipping any that are missing location info:

[6]:
source = pd.DataFrame(
    [
        {
            'latitude': o.location[0],
            'longitude': o.location[1],
            'iconic_taxon': o.taxon.iconic_taxon_name,
        }
        for o in my_observations
        if o.location
    ]
)
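
The `if o.location` filter is what skips observations without coordinates. Here is a minimal sketch of that behavior using a hypothetical stand-in class (`FakeObservation` is not part of pyinaturalist) and made-up coordinates, so it runs without any API calls:

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class FakeObservation:
    """Hypothetical stand-in with only the fields used here."""

    location: Optional[Tuple[float, float]]  # (latitude, longitude) or None
    iconic_taxon: str


observations = [
    FakeObservation((41.67, -93.70), 'Plantae'),
    FakeObservation(None, 'Insecta'),  # no coordinates recorded
    FakeObservation((41.59, -93.62), 'Aves'),
]

rows = [
    {'latitude': o.location[0], 'longitude': o.location[1], 'iconic_taxon': o.iconic_taxon}
    for o in observations
    if o.location  # None is falsy, so observations without coordinates are skipped
]
print(len(rows))  # 2
```

Without the filter, indexing `o.location[0]` on a missing location would raise a `TypeError`.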

Then add the base layer. This example uses the us_10m dataset from vega-datasets:

[7]:
from vega_datasets import data

states = alt.topo_feature(data.us_10m.url, feature='states')
background = (
    alt.Chart(states)
    .mark_geoshape(fill='lightgray', stroke='white')
    .properties(width=850, height=500)
    .project('albersUsa')
)

And finally, add your observation locations:

[8]:
points = (
    alt.Chart(source)
    .mark_circle()
    .encode(
        longitude='longitude:Q',
        latitude='latitude:Q',
    )
)

# Show the combined background + points
background + points
alt.LayerChart(...)