Friday, 21 September 2018

Kanpur Air Quality Analysis using Yuktix Monitor and ankiDB™ Software

Access to clean water and air should be our right as citizens. You can buy bottled water, but you cannot buy pure air. Bad air is a health tax that society is forced to pay. (There was a Canadian company selling mountain air during the Beijing smog crisis, though!) Green activism around air quality suffers on two counts:
  • We have no data.
  • Raw data is not an action plan.

We touched upon the first problem in a separate blog post, where we talked about station density, improving coverage for a city and developing hybrid network models in which sensors can be deployed in large numbers. Here we want to talk about the second problem, namely what to do with air quality data.

Kanpur is a city in North India that is synonymous with pollution. It used to be an industrial town and is famous for its leather tanneries. Unfortunately, there are few checks and balances in government and society regarding pollution. One drive along the stretch of the Ganges near Unnao can convince you of that. A recent survey ranked Kanpur as the most polluted city in India. The study was based on readings collected by air quality stations such as the one in Nehru Nagar, which is about 9 km from the Panki power plant.

Real-time data from the CPCB air quality station at Nehru Nagar can be accessed here.

We decided to do our own investigation and installed an air pollution monitor in Kanpur to capture PM2.5 and PM10 data for three months. We wanted to reach our own conclusions, and we also wanted to see what kind of air quality analysis would be useful.

What is important to understand is that, instead of poring over large data sets, most people want:
    - visual clues
    - relationships in the data set
    - quick comparisons

Armed with three months' worth of data, we set out to make ourselves a wish list for analysis. The same data is available as an Excel file on request; just drop us a line on our support email.

The question is: what can be done to make conclusions jump out of the data? We have a few suggestions.

1. The sensor locations on a map can be turned red or green in real time based on a predefined threshold. This gives a visual clue about the areas needing more action.
2. Pollution time series data can be plotted to identify peak hours.
3. Pollution data can be sliced by day of the week and by hour. This tells us whether some days or hours are better avoided.
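The first suggestion boils down to a threshold comparison per location. Here is a minimal Python sketch, assuming each sensor's latest PM2.5 reading in µg/m³ is available as a number. The location names, readings and function names are hypothetical, and 60 µg/m³ (India's 24-hour NAAQS limit for PM2.5) is only an illustrative default.

```python
# Hypothetical sketch: turn each sensor location red or green by comparing
# its latest PM2.5 reading with a predefined threshold.
# Note: the NAAQS limit applies to a 24-hour average; comparing a single
# 5-minute reading against it is a simplification for illustration.

PM25_THRESHOLD = 60.0  # µg/m³, illustrative default


def marker_color(pm25: float, threshold: float = PM25_THRESHOLD) -> str:
    """Return 'red' if the reading exceeds the threshold, else 'green'."""
    return "red" if pm25 > threshold else "green"


def color_map(latest: dict[str, float], threshold: float = PM25_THRESHOLD) -> dict[str, str]:
    """Map each location name to the marker color for its latest reading."""
    return {name: marker_color(value, threshold) for name, value in latest.items()}


if __name__ == "__main__":
    latest_readings = {"site-a": 142.0, "site-b": 38.5}  # made-up values
    print(color_map(latest_readings))  # {'site-a': 'red', 'site-b': 'green'}
```

A single fixed threshold keeps the map easy to read; coloring by AQI band or by a rolling average would be natural refinements, which this sketch does not attempt.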

We capture PM2.5 and PM10 data every 5 minutes. Here is how it looks on the Yuktix Air Quality dashboard, where you can see the trends by hour, day and week. We exported the same data to an Excel sheet and went looking for a correlation between the day of the week and pollution peaks.

Here is the Excel plot with the data bucketed by day of the week. One interesting finding is that the pollution counter spikes on Thursdays! Do we really have more pollution on Thursdays?
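The same day-of-week bucketing can be reproduced in a few lines of Python instead of Excel. This sketch assumes the readings are (datetime, PM2.5) pairs; `bucket_means`, `by_weekday` and `by_weekday_and_hour` are hypothetical helpers, not part of the Yuktix SDK, and the sample values are made up.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def bucket_means(readings, key):
    """Average (timestamp, value) readings grouped by key(timestamp), in sorted key order."""
    buckets = defaultdict(list)
    for ts, value in readings:
        buckets[key(ts)].append(value)
    return {k: mean(v) for k, v in sorted(buckets.items())}


def by_weekday(readings):
    """Mean reading per day of the week (datetime.weekday(): Monday is 0)."""
    means = bucket_means(readings, key=lambda ts: ts.weekday())
    return {WEEKDAYS[day]: value for day, value in means.items()}


def by_weekday_and_hour(readings):
    """Mean reading per (day of week, hour of day) slice."""
    means = bucket_means(readings, key=lambda ts: (ts.weekday(), ts.hour))
    return {(WEEKDAYS[day], hour): value for (day, hour), value in means.items()}


if __name__ == "__main__":
    sample = [  # made-up PM2.5 readings in µg/m³
        (datetime(2018, 8, 23, 9, 0), 180.0),  # Thursday
        (datetime(2018, 8, 23, 9, 5), 170.0),  # Thursday
        (datetime(2018, 8, 24, 9, 0), 90.0),   # Friday
    ]
    print(by_weekday(sample))  # {'Thursday': 175.0, 'Friday': 90.0}
```

Averaging each bucket, rather than summing it, keeps a day from looking worse just because it has more samples. With only a few weeks of data, a Thursday spike could still come down to a handful of bad days, so the per-week values are worth checking too.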

We fired up the Python SDK that comes with Yuktix ankiDB. The SDK lets us pull cached data for a range of dates and devices and then run it through computation routines with ease.

Suppose I want to download data for a group of devices between certain dates for my analysis. All I have to do is run:

$ python cron/cache/ --name dump:aq:raw:1 --serial devaq01 --start 20082018 --end 29082018
And voilà, I have all the data in a file. I can also instruct the Yuktix Python SDK to run a series of computations on the data during download. For example, to get the differences between subsequent readings we can use numpy's ediff1d, and to filter outliers we can use numpy to work on a multidimensional array. Plugging in a new computation routine is as simple as writing a method and registering it with the SDK. For example, here is one computation routine that runs on air quality devices.
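A minimal sketch of the "write a method, register it" pattern with numpy follows; it is not the Yuktix SDK. `ROUTINES`, `register_routine` and `run_routines` are hypothetical names, and since the post does not say which outlier rule is used, `filter_outliers` swaps in a robust median/MAD rule as one option.

```python
import numpy as np

# Hypothetical plug-in registry: routine name -> function.
ROUTINES = {}


def register_routine(name):
    """Decorator that registers a computation routine under a name."""
    def wrap(fn):
        ROUTINES[name] = fn
        return fn
    return wrap


@register_routine("diff")
def reading_differences(data: np.ndarray) -> np.ndarray:
    """Differences between subsequent readings of a 1-D series."""
    return np.ediff1d(data)


@register_routine("filter_outliers")
def filter_outliers(data: np.ndarray, k: float = 5.0) -> np.ndarray:
    """Drop rows where any column is more than k scaled MADs from its median.

    data is 2-D: one row per timestamp, one column per variable (e.g. PM2.5, PM10).
    """
    median = np.median(data, axis=0)
    mad = np.median(np.abs(data - median), axis=0)
    mad = np.where(mad == 0, 1e-9, mad)  # avoid dividing by zero for flat columns
    robust_z = np.abs(data - median) / (1.4826 * mad)  # 1.4826: MAD -> std for normal data
    return data[np.all(robust_z <= k, axis=1)]


def run_routines(data, names):
    """Run the named routines on downloaded data and collect results by name."""
    return {name: ROUTINES[name](data) for name in names}


if __name__ == "__main__":
    # Columns: PM2.5, PM10 (made-up readings; the fourth row is a sensor glitch)
    pm = np.array([[40.0, 80.0], [42.0, 85.0], [41.0, 83.0], [900.0, 95.0], [43.0, 84.0]])
    print(run_routines(pm, ["filter_outliers"])["filter_outliers"])
    print(run_routines(pm[:, 0], ["diff"])["diff"])
```

Keeping routines in a name-keyed registry means a device-to-routine mapping stored elsewhere only needs to hold names, which are looked up when that device's data is downloaded.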

We have code to update the device-serial-to-routine mappings via the SDK. The SDK stores the map in database tables, and our Python lookup code plugs in the mapped routines dynamically when data for a device is downloaded. The results of the computations are stored for further processing. One neat analysis that we do is detecting the peak hours of pollution. We use peakutils and numpy to detect the changes, and then plotly to show the peak data on our web GUI.
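To make the peak-hour idea concrete, here is a numpy-only sketch. It does not call peakutils: `local_peaks` is a simplified, hypothetical re-implementation of the same first-difference idea (a point is a peak if it is higher than both neighbors and above a threshold given as a fraction of the data range), without plateau handling. The hourly profile is invented, and the plotly step is left out.

```python
import numpy as np


def local_peaks(y, thres=0.5, min_dist=1):
    """Indices of local maxima in y that rise above a normalized threshold.

    thres is a fraction (0..1) of the data range, in the spirit of
    peakutils.indexes; flat-topped plateaus are not detected in this sketch.
    """
    y = np.asarray(y, dtype=float)
    level = y.min() + thres * (y.max() - y.min())
    dy = np.diff(y)
    rising = np.hstack([0.0, dy]) > 0    # y[i] > y[i - 1]
    falling = np.hstack([dy, 0.0]) < 0   # y[i] > y[i + 1]
    peaks = np.where(rising & falling & (y > level))[0]
    if min_dist > 1 and peaks.size > 1:
        # Keep the tallest peaks first; drop any closer than min_dist to a kept one.
        kept = []
        for i in peaks[np.argsort(y[peaks])[::-1]]:
            if all(abs(int(i) - j) >= min_dist for j in kept):
                kept.append(int(i))
        peaks = np.array(sorted(kept))
    return peaks


def peak_hours(hourly_means, thres=0.5, min_dist=2):
    """Hours of the day (0-23) whose average reading is a local peak."""
    return [int(h) for h in local_peaks(hourly_means, thres, min_dist)]


if __name__ == "__main__":
    # Invented 24-hour PM2.5 profile with a morning and an evening rush.
    profile = [50.0] * 24
    profile[7:10] = [90.0, 120.0, 95.0]
    profile[19:22] = [100.0, 140.0, 110.0]
    print(peak_hours(profile))  # [8, 20]
```

Running this on hourly averages rather than raw 5-minute readings smooths out single-sample spikes, so the peaks that remain are more likely to reflect a recurring pattern such as traffic or industrial shifts.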

We saw how we can go from merely collecting data to actually analyzing it and showing useful, actionable items, such as:

- Maps to show where to focus attention
- Peak detection over time, to help locate the sources of pollution
- Comparison of aggregates over time, to show how effective the strategies used to deal with pollution are

Here is a screencast of the Yuktix Air Quality dashboard. We value your feedback. If you have ideas on how to improve the data analysis, please drop us a line on

Tuesday, 11 September 2018

From Sensors to Cloud - Digitizing Agriculture Research

"How do you get data for your experiments?" I asked one of the scientists working at an agriculture institute. "We take measurements twice a day, at 8 o'clock in the morning and at 2 pm," came the pat reply. That means two data points each for temperature and humidity. Wind speed is read from a counter, and wind direction is inspected manually. This same data is used by researchers developing predictive models at their center. Then I made a suggestion: what if we automate data collection, not just for these four variables but for ten, and increase the data points to 300 per day? Now it was his turn to get excited. "That would be awesome! I could then focus more on developing models, and do better modeling."

This conversation follows the same pattern whenever I talk to researchers, and the reasons are easy to see. Recently, we got in touch with a scientist working in environmental science at Kerala University, Trivandrum. After our discussions, the requirements turned out to be similar: can we automate the process of data collection and analysis?

I have just come back from installing a Yuktix solar-powered, research-grade weather station on their department premises with the following sensors:
1. Temperature
2. Humidity
3. Pressure
4. Rainfall
5. Wind speed
6. Wind direction
7. Soil moisture (VWC)
8. Soil temperature
9. Solar radiation
10. Leaf wetness
11. Digital pan evaporation meter

As part of our Yuktix software bundle, they get:
1. Access to real-time data from anywhere, anytime
2. Downloads of raw data between two dates
3. Downloads of aggregate data (minimum, maximum, average etc.) between two dates
4. Daily, weekly and monthly reports by email with an attached .csv file
5. Alert rules for variables, with notifications via SMS and email
6. Sunshine hours calculation, ETo (reference evapotranspiration) reports and peak detection analysis
7. Custom reports (on-demand feature)
8. Outlier filtering
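Items 2 to 4 boil down to filtering readings to a date range and aggregating them. A minimal sketch with the Python standard library, assuming readings arrive as (datetime, value) pairs; `aggregate` and `daily_report_csv` are hypothetical names, the sample readings are made up, and emailing the CSV is left out.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime
from statistics import mean


def aggregate(readings, start, end):
    """Min, max and average of (timestamp, value) readings with start <= ts < end."""
    values = [value for ts, value in readings if start <= ts < end]
    if not values:
        return None  # nothing recorded in the window
    return {"min": min(values), "max": max(values), "avg": mean(values), "count": len(values)}


def daily_report_csv(readings, variable, start, end):
    """CSV text with one min/max/avg row per calendar day in [start, end)."""
    by_day = defaultdict(list)
    for ts, value in readings:
        if start <= ts < end:
            by_day[ts.date()].append(value)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["date", "variable", "min", "max", "avg"])
    for day in sorted(by_day):
        values = by_day[day]
        writer.writerow([day.isoformat(), variable, min(values), max(values), round(mean(values), 2)])
    return out.getvalue()


if __name__ == "__main__":
    readings = [  # made-up temperature readings in degrees C
        (datetime(2018, 9, 1, 8, 0), 24.0),
        (datetime(2018, 9, 1, 14, 0), 31.0),
        (datetime(2018, 9, 2, 8, 0), 23.0),
    ]
    print(aggregate(readings, datetime(2018, 9, 1), datetime(2018, 9, 2)))
    print(daily_report_csv(readings, "temperature", datetime(2018, 9, 1), datetime(2018, 9, 3)))
```

The half-open window (start inclusive, end exclusive) lets consecutive daily or weekly windows share boundaries without counting a reading twice.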

An LCD TV was also installed in the reception area to show data and reports from the weather station. Below is the screencast.

Yuktix Solar Powered Automatic Weather station

The Yuktix Weather Station runs on the Yuktix wireless sensing platform. Data collected by the station is pushed to the Yuktix cloud, which makes it available for real-time access and monitoring. Yuktix ankiDB can then access the same data via an API and run statistical and other calculations to prepare reports for your subscriber base. Yuktix ankiDB can:
1. Receive data using the Yuktix cloud API
2. Run outlier filtering to remove sensor errors
3. Store the data in a time series data store
4. Run computation modules for devices
5. Generate reports for subscribers
6. Send notifications to users
7. Display the data on our web GUI
8. Make the data available for further integration through a REST API
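A rough sketch of how steps 2, 3 and 6 (outlier filtering, storage and notifications) could fit together for a single incoming reading. Everything here is assumed for illustration: the plausibility ranges in `VALID_RANGE`, the `AlertRule` and `Pipeline` classes, and the in-memory dict standing in for a time series data store. It is not the ankiDB implementation.

```python
from dataclasses import dataclass, field
from datetime import datetime

# Hypothetical plausibility limits used as a crude outlier filter.
VALID_RANGE = {
    "temperature": (-40.0, 60.0),  # degrees C
    "humidity": (0.0, 100.0),      # % relative humidity
}


@dataclass
class AlertRule:
    """Notify when `variable` rises above `above` (hypothetical rule format)."""
    variable: str
    above: float
    message: str


@dataclass
class Pipeline:
    rules: list[AlertRule]
    # In-memory stand-in for a time series store: variable -> [(timestamp, value)]
    store: dict[str, list[tuple[datetime, float]]] = field(default_factory=dict)

    def ingest(self, ts: datetime, variable: str, value: float) -> list[str]:
        """Filter, store and check one reading; return any notification texts."""
        low, high = VALID_RANGE.get(variable, (float("-inf"), float("inf")))
        if not low <= value <= high:
            return []  # step 2: discard a physically implausible reading
        self.store.setdefault(variable, []).append((ts, value))  # step 3: store it
        return [  # step 6: one message per rule the reading triggers
            f"{ts:%Y-%m-%d %H:%M} {rule.message} ({variable}={value})"
            for rule in self.rules
            if rule.variable == variable and value > rule.above
        ]


if __name__ == "__main__":
    pipeline = Pipeline(rules=[AlertRule("temperature", 40.0, "Heat alert")])
    print(pipeline.ingest(datetime(2018, 9, 11, 14, 0), "temperature", 42.5))
    # ['2018-09-11 14:00 Heat alert (temperature=42.5)']
    print(pipeline.ingest(datetime(2018, 9, 11, 14, 5), "temperature", 150.0))  # []
```

A range check only catches physically impossible values; spikes that are plausible but wrong need a statistical filter, for example one based on the median and median absolute deviation.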
If you want a research-grade weather station installed at your institute, please contact team Yuktix at +91-8884315300 or drop us a line on