Simon Roberts

Applying Cloud Sustainability Principles to Your Non-Cloud Choices

I was reviewing the AWS Well-Architected framework recently, and in particular the Sustainability Pillar. We already do a lot of work with clients around the Cost Optimisation Pillar, which helps eliminate unnecessary costs and in turn helps with Sustainability (consume less => lower impact), but there are some other things to consider.

So, What is Cloud Sustainability?

The Design principles for sustainability in the cloud describe the following cloud-computing principles:

  • Understand your impact

  • Establish sustainability goals

  • Maximise utilisation

  • Anticipate and adopt new, more efficient hardware and software offerings

  • Use managed services

  • Reduce the downstream impact of your cloud workloads

Reading about this made me consider how I might apply some of those principles to my own (non-computer) life, and the most obvious example is around my energy consumption. After a bit of research, implementing Solar Power Generation at my house seems like a good first step:

  • Reduce greenhouse emissions, carbon emissions and air pollution from existing power generation

  • Reduce strain on supply of non-renewable resources (coal and gas)

  • Reduce personal dependence on the power grid

  • Save money (eventually)

My long-term goal would be to become off-grid capable! So, being the data-driven person that I am, my first step was to look at my historical power consumption, to identify usage patterns, trends etc. FWIW, my house is currently extremely inefficient:

  • it is 130 years old, with poor insulation (none in the walls, and very little in the roof)

  • the kids live in a cottage out the back, which is also poorly insulated, and has its own water-heater and aircon/heater

  • both my workshop and my wife’s art studio are lightly-insulated metal structures

Spotting a trend here?

Anyway, back to the data. My power distributor has a web interface that shows graphs for historical power usage etc, which is great, but the data only goes back two years, and the graphs are pretty limited. I’d like a way to save (and update) that data locally, as well as perform other kinds of analysis and visualisation.



Looking at the network traffic generated by the website in Chrome, it uses a simple web-service, which fetches usage data from a JSON backend. For example, to fetch a month of data, you provide an offset (number of months back, in this case):

https://energyeasy.ue.com.au/electricityView/period/month/{offset}

I checked their TOS, and there’s nothing that would prevent us from scraping this data, so… here goes.

Login and Data Collection

The first thing we need to do is log in to the website with a username and password. Like most people, I use the Python requests module for this kind of thing. We fetch the main page of the website (which creates a session, sets a cookie or two, etc.), then submit the login form. The form has just login_email, login_password and submit fields, so it is easy to submit.

Then, we fetch all the pages of data (month by month), until there isn’t any more. Since the data call returns JSON data, and requests has support for this built in, it’s very easy to return a list of data-structures, one per page:

import requests

MAIN_URL = 'https://energyeasy.ue.com.au/electricityView/index'
LOGIN_FORM = 'https://energyeasy.ue.com.au/login_security_check'
MONTH_URL_FORMAT = 'https://energyeasy.ue.com.au/electricityView/period/month/%d'

def fetch_all_data(username, password) -> list[dict]:
    """Fetch all the historical data for energy consumption and return as a list of dicts"""
    # get main page, start session
    s = requests.session()
    s.get(MAIN_URL)

    # login
    s.post(LOGIN_FORM, data={
        'login_email': username,
        'login_password': password,
        'submit': 'Login'
    })

    # get each month of data
    results = []
    for offset in range(100):
        data = s.get(MONTH_URL_FORMAT % offset).json()
        results.append(data)

        if not data['isPreviousPeriodDataAvailable']:
            break

    return results
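Since one of the goals is to keep a local copy of the data, one option is to write the raw JSON to disk before converting it to anything else. This is only a minimal sketch, assuming the list returned by fetch_all_data is plain JSON-serialisable data; the helper names save_raw and load_raw and the default file name are hypothetical and not part of the original script:

```python
import json
from pathlib import Path


def save_raw(results: list[dict], path: str = "energyeasy_raw.json") -> None:
    """Write the list of per-month responses to a JSON file (hypothetical helper)."""
    Path(path).write_text(json.dumps(results, indent=2))


def load_raw(path: str = "energyeasy_raw.json") -> list[dict]:
    """Read back a list previously written by save_raw (hypothetical helper)."""
    return json.loads(Path(path).read_text())
```

Keeping the untouched responses around means the conversion step can be changed later without fetching everything from the website again.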

Converting to Something Usable

See below for an excerpt of the JSON that comes back from the data-source for a particular period (month). Each record in the “peak” section (for example) represents one sample. When a month of data is requested, each sample represents one day (1st of month, 2nd of month, etc).

This JSON has two blocks: one for the selected period, and one for the comparison period (to compare to last-month or last-year etc). For the purpose of this bit of code, we’ll ignore the second block, and just use the selectedPeriod block, which is what is shown below.

{
  "selectedPeriod": {
    "consumptionData": {
      "peak": [
        {
          "total": 40.783,
          "meters": {
            "123456": 40.783
          }
        },
        {
          "total": 15.951,
          "meters": {
            "123456": 15.951
          }
        },
        ...
      ],
      "offpeak": [...],
      "shoulder": [...],
      "generation": [...],
    },
    "costData": {
      "peak": [
        {
          "total": 8.97226,
          "meters": {
            "123456": 8.97226
          }
        },
        {
          "total": 3.50922,
          "meters": {
            "123456": 3.50922
          }
        },
        ...
      ],
      "offpeak": [...],
      "shoulder": [...],
      "generation": [...],
    },
    "periodType": "month",
    "offset": 0,
    "title": "This month",
    "subtitle": "July 2022",
    "estimated": false,
    "validated": true,
    "netConsumption": 1091.887,
    "consumptionDataAvailable": true,
    "consumptionRatePercentages": {
      "peak": 100,
      "offpeak": 0,
      "shoulder": 0
    },
    "averageNetConsumptionPerSubPeriod": 51.99,
    "netConsumptionForUsageComparisonPeriod": 1091.887,
  },

Within that, there are a few sections we’re interested in:

  • consumptionData: contains data about how much power was used (or generated) for each day, for each of peak, offpeak, shoulder rates (with potentially different prices), as well as generation which provides details about the power fed back to the grid.

    • Each of these items has a list of meters (usually one), with a consumption for each, and a total (which this script uses)

  • costData: which contains the $$ cost for the consumption of each category above (prices change over time).
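To make that structure concrete, here is a small sketch that pulls the per-day totals out of one category. The sample dict is hand-built from the excerpt above (trimmed to two days, with the placeholder meter id 123456), and daily_totals is a hypothetical helper name used only for illustration:

```python
def daily_totals(period: dict, dataset: str, category: str) -> list[float]:
    """Return the 'total' of each daily sample for one dataset/category (hypothetical helper)."""
    return [sample["total"] for sample in period[dataset][category]]


# hand-built sample, trimmed from the excerpt above
sample_period = {
    "consumptionData": {
        "peak": [
            {"total": 40.783, "meters": {"123456": 40.783}},
            {"total": 15.951, "meters": {"123456": 15.951}},
        ],
    },
    "costData": {
        "peak": [
            {"total": 8.97226, "meters": {"123456": 8.97226}},
            {"total": 3.50922, "meters": {"123456": 3.50922}},
        ],
    },
}

peak_usage = daily_totals(sample_period, "consumptionData", "peak")
peak_cost = daily_totals(sample_period, "costData", "peak")
print(peak_usage)  # [40.783, 15.951]
print(peak_cost)   # [8.97226, 3.50922]
```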

To calculate which day each of the data items represents, we look at another field of the period, subtitle, which contains a string like July 2022. We parse that into a date (the first day of that month), then add one day for every item down the list. With the dates known, we create a series for each of the categories for both consumption and cost data.
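The date calculation can be sketched on its own. This assumes the subtitle is always an English month name followed by a four-digit year, as in the excerpt; the variable names and the sample count are for illustration only:

```python
import datetime

subtitle = "July 2022"  # value of the subtitle field in the excerpt
n_samples = 3           # pretend the month has three daily samples so far

# '%B %Y' parses a full month name and a year; the day defaults to the 1st
period_start = datetime.datetime.strptime(subtitle, "%B %Y")
dates = [period_start + datetime.timedelta(days=n) for n in range(n_samples)]

for d in dates:
    print(d.date())
# 2022-07-01
# 2022-07-02
# 2022-07-03
```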

Finally, we need to convert each of these eight series (two data sets: consumption + cost, with four categories in each) into a data-frame and return it. Besides the “need” to get the data, this was partly an exercise to learn Pandas:

pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language.

Pandas has great docs, including:

  • 10 minutes to pandas

  • Intro to data structures

  • IO tools (text, CSV, HDF5, …)

The code used for this is:

import datetime
import pandas as pd

DATA_SETS = ['consumptionData', 'costData']
CATEGORIES = ['generation', 'offpeak', 'peak', 'shoulder']

def make_dataframe(period: dict) -> pd.DataFrame:
    """given an energyeasy datastructure for a given period, return a dataframe for the data within"""
    if period['periodType'] != 'month':
        raise ValueError('Only support month data for now')

    # build the index for time-series
    period_start = datetime.datetime.strptime(period['subtitle'], '%B %Y')
    index = [period_start + datetime.timedelta(days=n) for n in range(len(period['consumptionData']['peak']))]

    # build the data series for DATA_SETS * CATEGORIES
    data_series = {}
    for dataset in DATA_SETS:
        for cat in CATEGORIES:
            cat_values = [d['total'] for d in period[dataset][cat]]
            data_series[f'{dataset}_{cat}'] = pd.Series(cat_values, index)

    # return the dataframe
    return pd.DataFrame(data_series)

Wrapping it all Up

Finally, we need to build a single DataFrame by combining (and sorting) all the monthly DataFrames.
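Here is a small sketch of the combine-and-sort step on synthetic data. The column name follows the dataset_category naming used by make_dataframe, but the numbers are made up. It also drops repeated dates, which is an assumption about what you might want if the script is re-run and some days end up fetched twice:

```python
import pandas as pd

# two synthetic frames, deliberately out of order and overlapping by one day
newer = pd.DataFrame(
    {"costData_peak": [3.0, 4.0]},
    index=pd.to_datetime(["2022-07-01", "2022-07-02"]),
)
older = pd.DataFrame(
    {"costData_peak": [1.0, 2.0, 9.9]},
    index=pd.to_datetime(["2022-06-29", "2022-06-30", "2022-07-01"]),
)

combined = pd.concat([newer, older])
# keep the first occurrence of each date (here: the value from `newer`)
combined = combined[~combined.index.duplicated(keep="first")]
combined = combined.sort_index()

print(combined["costData_peak"].tolist())  # [1.0, 2.0, 3.0, 4.0]
```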

Once we have that, we can do whatever analysis we want with it, like graphing it, statistical analysis, exporting for another program, etc.

Here’s the code that calls the two functions above, combines the data, saves the DataFrame, renders a simple graph, and exports to Excel. The code below reads the username and password from a couple of environment variables.

import os

import matplotlib.pyplot as plt

all_results = fetch_all_data(os.environ['USERNAME'], os.environ['PASSWORD'])

# convert data to a dataframe, sort, and save
dataframes = [make_dataframe(data['selectedPeriod']) for data in all_results]
df = pd.concat(dataframes)
df.sort_index(inplace=True)
df.to_pickle('energyeasy.pkl')

# save data to excel (pandas needs the openpyxl package installed to write .xlsx files)
df.to_excel('energyeasy.xlsx')

# display a graph
df.plot(y='costData_peak', kind='line', title='Power Cost per Day').set_ylabel('$')
plt.show()

Sample Chart from the Excel Data (yes, the Y axis shows both consumption and cost):


Sample graph from the plotter in the code:


See https://matplotlib.org/stable/gallery/index.html for (better) examples of matplotlib in action.

This is obviously just a simple use of the data, and I’ll be extending this over time to support my own use-case. Otherwise, extending this is left as an exercise for the reader!

  • I had a meter swap when I got three-phase power installed - I will be able to compare before and after

  • Once my Solar is feeding back to the grid, and saving me power, I’ll be able to see and graph those numbers

  • Lots of fun with statistics!
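As a starting point for that kind of analysis, here is a sketch that rolls daily values up to monthly totals with pandas. The frame is synthetic (made-up numbers, with two columns named the way make_dataframe names them); with the real data you would presumably load the saved energyeasy.pkl instead:

```python
import pandas as pd

# synthetic daily data spanning the end of June and the start of July
index = pd.date_range("2022-06-29", periods=4, freq="D")
df_demo = pd.DataFrame(
    {
        "consumptionData_peak": [10.0, 20.0, 30.0, 40.0],
        "costData_peak": [2.0, 4.0, 6.0, 8.0],
    },
    index=index,
)

# group the rows by calendar month and sum each column
monthly = df_demo.groupby(df_demo.index.to_period("M")).sum()

# average cost per unit consumed in each month
monthly["cost_per_unit"] = monthly["costData_peak"] / monthly["consumptionData_peak"]
print(monthly)
```

The same grouping works for any of the eight columns, so comparing months before and after a change (a meter swap, say) becomes a matter of selecting rows from the monthly frame.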

Bonus: Prettier Graph

Of course, the default graph from matplotlib is a bit ugly. Add some moving averages, limit the x/y axis a bit, and remove the outer frame with the following code:

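    # the bundled seaborn styles were renamed in matplotlib 3.6; on older versions use 'seaborn-muted'
    plt.style.use('seaborn-v0_8-muted')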
    fig, ax = plt.subplots(figsize=(20, 5))
    df.plot(y='costData_peak', kind='line', title='Power Cost per Day', ax=ax).set_ylabel('cost per day ($)')
    plt.plot(df['costData_peak'].rolling(7).mean(), label='7-day average')
    plt.plot(df['costData_peak'].rolling(30).mean(), label='30-day average')
    plt.legend()
    ax.spines['top'].set_visible(False)
    ax.spines['right'].set_visible(False)
    # set axis limits more tightly than default
    not_zero = df.query('costData_peak != 0')
    ax.set_xlim(not_zero.index[0], not_zero.index[-1])
    ax.set_ylim(0)
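One detail worth knowing about the moving averages: rolling(7).mean() produces NaN until a full window of seven values is available, so the averaged lines start a little later than the raw data. A tiny illustration on made-up numbers:

```python
import pandas as pd

values = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
weekly = values.rolling(7).mean()

print(weekly.isna().sum())  # 6: the first six entries have no full window yet
print(weekly.iloc[6])       # 4.0 (mean of 1..7)
print(weekly.iloc[7])       # 5.0 (mean of 2..8)
```

If the gap at the start is unwanted, passing min_periods=1 to rolling() averages over however many values are available so far.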

The source code for the script above is available on my GitHub at https://github.com/lyricnz/energyeasy
