Friday, 21 September 2018

Kanpur Air Quality Analysis using Yuktix Monitor and ankiDB ™ Software




Access to clean water and air should be our right as citizens. You can buy bottled water, but you cannot buy pure air. Bad air is a health tax society is forced to pay. (Though a Canadian company did sell bottled mountain air during the Beijing smog crisis!) Green activism around air quality suffers on two counts:
  • We have no data.
  • Raw data is not an action plan.

We touched upon the first problem in a separate blog, where we talked about station density, improving coverage for a city, and developing hybrid network models where sensors can be deployed in large numbers. Here we want to talk about the second problem: what to do with the air quality data.

Kanpur is a city in North India that is synonymous with pollution. It used to be an industrial town and is famous for its leather tanneries. Unfortunately, there are few checks and balances in government and society regarding pollution. One drive along the stretch of the Ganges near Unnao can convince you of that fact. A recent survey put Kanpur as the most polluted city in India. This study was based on readings collected by air quality stations like the one located in Nehru Nagar, which is about 9 km from the Panki power plant.


The link for accessing real-time data from the Nehru Nagar CPCB air quality station is here.

We decided to do our own investigation and put an air pollution monitor in Kanpur to capture PM2.5 and PM10 data for three months. We wanted to reach our own conclusions. We also wanted to see what kind of air quality analysis could be useful.

What is important to understand is that instead of poring over large data sets, most people want
    - visual clues
    - relationships in the data set
    - quick comparisons

Now armed with three months' worth of data, we set out to make ourselves a wish list for analysis. The same data is available as an Excel file on request; just drop us a line on our support email.

The question is, what can be done to make conclusions jump out of the data? Here we have a few suggestions.

1. The sensor locations on a map can be turned red or green in real time based on a predefined threshold. This provides a visual clue about the areas needing more action.
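As a minimal sketch of this idea (station names, readings, and the threshold value here are made up for illustration; 60 µg/m³ is the Indian 24-hour PM2.5 standard, used as an example cut-off):

```python
# Hypothetical sketch: classify each station red or green against a PM2.5 threshold.
PM25_THRESHOLD = 60.0  # ug/m3, example cut-off (Indian 24-hour PM2.5 standard)

def station_status(pm25, threshold=PM25_THRESHOLD):
    """Return the map marker colour for a station's latest PM2.5 reading."""
    return "red" if pm25 > threshold else "green"

# Illustrative latest readings per station
stations = {"Nehru Nagar": 142.0, "Panki": 88.5, "Civil Lines": 41.2}
statuses = {name: station_status(value) for name, value in stations.items()}
```

The real-time dashboard would simply re-run this classification each time a new reading arrives and recolour the map markers.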



2. Pollution time series data can be plotted to identify peak hours.
3. Pollution data can be sliced by day of week and by hour. This will tell us if some days or hours are better avoided.
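The day-of-week slicing in point 3 can be sketched with nothing more than the standard library (the timestamps and PM2.5 values below are invented for illustration, not our actual readings):

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Illustrative 5-minute PM2.5 readings: (timestamp, ug/m3)
readings = [
    ("2018-08-20 09:05", 110.0),  # a Monday
    ("2018-08-23 18:10", 190.0),  # a Thursday
    ("2018-08-23 19:15", 170.0),  # a Thursday
    ("2018-08-26 07:00", 80.0),   # a Sunday
]

# Bucket readings by day of week, then average each bucket
buckets = defaultdict(list)
for ts, pm25 in readings:
    day = datetime.strptime(ts, "%Y-%m-%d %H:%M").strftime("%A")
    buckets[day].append(pm25)

daily_mean = {day: mean(values) for day, values in buckets.items()}
```

The same grouping key swapped for `"%H"` would slice the data by hour of day instead.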

We capture PM2.5 and PM10 data every 5 minutes. Here is how it looks on the Yuktix Air Quality dashboard. You can see the trends by hour, day and week. We took the same data into an Excel sheet and went looking for a day-of-week correlation with pollution peaks.





Here is the Excel plot with the data bucketed by day of the week. One interesting finding is that the pollution counter spikes on Thursdays! Do we really have more pollution on Thursdays?



We fire up the Python SDK that comes as part of Yuktix ankiDB. The Yuktix Python SDK allows us to pull data from the cache for a range of dates and devices and then run it through computation routines with ease.

Suppose I want to download data for a group of devices between certain dates for my analysis. All I have to do is:

$python cron/cache/report.py --name dump:aq:raw:1 --serial devaq01 --start 20082018 --end 29082018
and voilà, I have all the data in a file. I can also instruct the Yuktix Python SDK to run a series of computations on the data during download. For example, to get differences between subsequent readings we can use numpy's ediff1d, and to filter outliers we can use numpy to work on a multi-dimensional array. Plugging in a new computation routine is as simple as writing a method and registering it with the SDK. For example, here is one computation routine that runs on air quality devices.
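The two computations mentioned above can be sketched in a few lines of numpy. This is an illustrative stand-in, not the actual SDK routine; the sample PM2.5 series and the median-absolute-deviation cut-off are assumptions for the example:

```python
import numpy as np

# Illustrative PM2.5 series with one obvious sensor glitch (400.0)
pm25 = np.array([90.0, 95.0, 93.0, 400.0, 96.0, 92.0])

# Differences between subsequent readings
diffs = np.ediff1d(pm25)

# Simple outlier filter: keep values within 3 median-absolute-deviations
# of the median, which drops the 400.0 spike
med = np.median(pm25)
mad = np.median(np.abs(pm25 - med))
filtered = pm25[np.abs(pm25 - med) <= 3 * mad]
```

A routine like this, wrapped in a method and registered with the SDK, would run over each device's data as it is downloaded.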




We have code to update the serial-to-routine mappings via the SDK. The SDK stores the map in database tables, and our Python lookup code can plug it in dynamically when data for a device is downloaded. The results of the computations are stored for further processing. One neat analysis we do is detecting the peak hours of pollution: we use peakutils and numpy to detect the changes, and then plotly to show the peak data on our web GUI.
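The peak-hour detection idea can be sketched with numpy alone (the dashboard itself uses peakutils; this hand-rolled version and its hourly sample data are just an illustration of the same principle: local maxima above a fraction of the series maximum):

```python
import numpy as np

def find_peaks(y, thres=0.5):
    """Indices of local maxima whose value exceeds thres * max(y)."""
    y = np.asarray(y, dtype=float)
    cutoff = thres * y.max()
    # An interior point is a peak if it is higher than both neighbours
    # and clears the threshold
    is_peak = (y[1:-1] > y[:-2]) & (y[1:-1] > y[2:]) & (y[1:-1] >= cutoff)
    return np.where(is_peak)[0] + 1

# Illustrative hourly PM2.5 averages: spikes in hours 2 and 6
hourly_pm25 = [40, 60, 180, 90, 50, 45, 200, 70]
peaks = find_peaks(hourly_pm25)
```

The detected indices are then the "peak hours" that plotly highlights on the dashboard plot.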






We saw how we can go from merely collecting data to actually analyzing it and producing useful, actionable items, such as:

- Maps to show where to focus attention
- Peak detection in time can help us locate the source of pollution 
- Comparison of aggregates over time can show the effectiveness of strategies used to deal with pollution

Here is a screencast of the Yuktix Air Quality dashboard. We value your feedback. If you have ideas on what can be done to improve the data analysis, please drop us a line at support@yuktix.com