Over the last few weeks I’ve been using Python to scrape the Pennsylvania Department of Health’s Coronavirus page. Over time the page has evolved and even split into sub-pages which contain table date of cases, deaths and other statistics.

I’ve decided to put my Python script on GitHub for public consumption. Initially when I had created the script, it was used to send me alerts when the reported numbers changed as there was no set time during the day that the website was updated, so I wanted to set up a cron job to check the website and alert me of new updates.

Below you’ll find the main script. Note that the code below may be out of date, so please check my GitHub repository for the latest. The code changes almost daily as it seems Pennsylvania Dept. of Health changes the structure, or adds new data, to the webpage which throws off my code.

Python Script

import pandas as pd
from bs4 import BeautifulSoup
import requests
import os
import re

url = r'https://www.health.pa.gov/topics/disease/coronavirus/Pages/Cases.aspx'
html_content = requests.get(url).text
lastupdatedfile = "lastupdated.txt"
soup = BeautifulSoup(html_content, "lxml")
stats = soup.find("span",attrs={"class": "ms-rteStyle-Quote"})
#TODO: byte / string issue here on updatecheck
updatecheck = stats.text[stats.text.find("at "):][3:]

if os.path.isfile(lastupdatedfile):
	lastupdate = open(lastupdatedfile).read()
	if lastupdate == updatecheck:
		print("Skipping check, no new update.")
		os._exit(0)
	else:
		print("***UPDATE***\nOld: {}".format(lastupdate))
		print("New: {}".format(updatecheck))

tables = pd.read_html(html_content, header=0)
df = tables[3]

totalCounties = 67

print("Pennsylvania Data ({})".format(updatecheck))

deathsTotal = int(df["Deaths"].sum())
casesTotal  = int(df["Number of Cases"].sum())
mortalityPercent = round((deathsTotal / casesTotal) * 100,2)
reportingTotal = int(df["County"].count())
reportingCases = df["Number of Cases"]
reportingCasesPct = round((reportingCases.count() / totalCounties) * 100,2)
reportingDeathsObj = df.apply(lambda x: True if x['Deaths'] > 0 else False, axis=1)
reportingDeaths = len(reportingDeathsObj[reportingDeathsObj == True].index)
reportingDeathsPct = round((reportingDeaths / reportingTotal) * 100,2)

print("Total Cases: {}".format(casesTotal))
print("Total Deaths: {}".format(deathsTotal))
print("Mortality Rate(%): {}".format(mortalityPercent))
print("Counties Reporting Cases: {}".format(reportingCases.count()))
print("Counties Reporting Cases(%): {}".format(reportingCasesPct))
print("Counties Reporting Deaths: {}".format(reportingDeaths))
print("Counties Reporting Deaths(% of counties reporting cases): {}".format(reportingDeathsPct))


f=open(lastupdatedfile,"w")
f.write(updatecheck)
f.close()
# Add some notification stuff here...

Output Example

Pennsylvania Data (12:00 p.m. on 3/28/2020)
Total Cases: 2751
Total Deaths: 34
Mortality Rate(%): 1.24
Counties Reporting Cases: 56
Counties Reporting Cases(%): 83.58
Counties Reporting Deaths: 13
Counties Reporting Deaths(% of counties reporting cases): 23.21

#covid-19, #python

A quick Powershell script I hacked together that will enumerate all Active Directory users, and build an XML file for an application I wrote that generates user reports for a specific application.

The output file format is similar to:

<quickreports>
  <report name=Organizational Unit>
  <database name=v001/>
  <usrlogin name=jdoe01/>
</report>
<quickreports>
$ErrorActionPreference= 'silentlycontinue'
$arr=@{}

foreach ($usr in Get-ADUser -Filter *  | select samaccountname) {

$user = get-aduser -identity $usr.samaccountname -ErrorAction SilentlyContinue
$userou = (($user.DistinguishedName -split =,2)[-1].split(,)[1] -split =,2)[-1]

$key = $userou
$value = $usr.samaccountname

if ($arr.ContainsKey($userou)) {
$arr[$userou] += $usr.samaccountname 
} else {
$arr[$userou] = @()
$arr.add($key,$value)
}

}
$foo = <?xml version=1.0 encoding=utf-8?>`r`n
$foo += <quickreports>`r`n

$foo += foreach ($ou in $arr.keys) {

    write-output   <report name=$ou>`r`n

    foreach ($u in $arr[$ou]) {
                if ($u.contains(01)) {
                    write-output <database name=v001/>`r`n
                } elseif ($u.contains(04)) {
                    write-output <database name=v004/>`r`n
                } else { 
                    write-output <database name=/>`r`n
                }
        write-output <usrlogin name=$u/>`r`n
}
    write-output </report>`r`n
}

$foo += </quickreports>

$foo | out-file quickreport.xml

# for some reason the outputted file was dumping 0x00 into the file.  Eventually
# I'll clean all this up and just write all the attributes and elements from up above
# but now is not the time as this is just a quick and dirty POC

$fn = quickreport.xml
$xmlDoc = [system.xml.xmldocument](get-content $fn)

$xmlDoc.save($fn)

I made this 5 years ago in Excel and it was pretty popular. I think there are some errors in the math formulas that were pointed out for RAW calculations of backend IOPs. So use with caution, but mostly it should give a good enough idea on whatever it is you might be doing.

If you make modifications, please drop me a line in the comments or get in touch with me. I’ll update my post to include fixes.

RAID-and-IOPS Cheat Sheet Download

I mean, who hasn’t written a calculator application? Not much explaining to do here… LOL

width=276

This was a rather interesting project I took on.  The project was created to monitor and restrict how many Remote Desktop sessions were permitted to logon based on configuration per-OrganizationalUnit (OU) in Active Directory.

There are 2 parts to this.

Userlock – the GUI management part

sessioncheck – the executable that needs to be set to run for each user login and logoff script on a RDS Host.

Syntax of sessioncheck: sessioncheck logon or sessioncheck logoff

Userlock GUI

sessioncheck pops up a notification message box upon logon if there are no available slots left for “Max Sessions” and then logs the user immediately off the RDSH.

sessioncheck and Userlock GUI utilize a MSSQL Express database to store session data and check configurations for Maximum limits.

It works surprisingly well.  If you might be interested in this project for use at your organization, there are some code changes I’d need to make for more portability outside my environment, but it’s doable.  Get in touch with me.

s
search
c
compose new post
r
reply
e
edit
t
go to top
j
go to the next post or comment
k
go to the previous post or comment
o
toggle comment visibility
esc
cancel edit post or comment