Category Archives: My Projects

Pennsylvania COVID-19 Data Scraping with Python

Over the last few weeks I’ve been using Python to scrape the Pennsylvania Department of Health’s Coronavirus page. Over time the page has evolved and even split into sub-pages which contain table date of cases, deaths and other statistics.

I’ve decided to put my Python script on GitHub for public consumption. Initially when I had created the script, it was used to send me alerts when the reported numbers changed as there was no set time during the day that the website was updated, so I wanted to set up a cron job to check the website and alert me of new updates.

Below you’ll find the main script. Note that the code below may be out of date, so please check my GitHub repository for the latest. The code changes almost daily as it seems Pennsylvania Dept. of Health changes the structure, or adds new data, to the webpage which throws off my code.

Python Script

import pandas as pd
from bs4 import BeautifulSoup
import requests
import os
import re
url = r'https://www.health.pa.gov/topics/disease/coronavirus/Pages/Cases.aspx'
html_content = requests.get(url).text
lastupdatedfile = "lastupdated.txt"
soup = BeautifulSoup(html_content, "lxml")
stats = soup.find("span",attrs={"class": "ms-rteStyle-Quote"})
#TODO: byte / string issue here on updatecheck
updatecheck = stats.text[stats.text.find("at "):][3:]
if os.path.isfile(lastupdatedfile):
	lastupdate = open(lastupdatedfile).read()
	if lastupdate == updatecheck:
		print("Skipping check, no new update.")
		os._exit(0)
	else:
		print("***UPDATE***\nOld: {}".format(lastupdate))
		print("New: {}".format(updatecheck))
tables = pd.read_html(html_content, header=0)
df = tables[3]
totalCounties = 67
print("Pennsylvania Data ({})".format(updatecheck))
deathsTotal = int(df["Deaths"].sum())
casesTotal  = int(df["Number of Cases"].sum())
mortalityPercent = round((deathsTotal / casesTotal) * 100,2)
reportingTotal = int(df["County"].count())
reportingCases = df["Number of Cases"]
reportingCasesPct = round((reportingCases.count() / totalCounties) * 100,2)
reportingDeathsObj = df.apply(lambda x: True if x['Deaths'] > 0 else False, axis=1)
reportingDeaths = len(reportingDeathsObj[reportingDeathsObj == True].index)
reportingDeathsPct = round((reportingDeaths / reportingTotal) * 100,2)
print("Total Cases: {}".format(casesTotal))
print("Total Deaths: {}".format(deathsTotal))
print("Mortality Rate(%): {}".format(mortalityPercent))
print("Counties Reporting Cases: {}".format(reportingCases.count()))
print("Counties Reporting Cases(%): {}".format(reportingCasesPct))
print("Counties Reporting Deaths: {}".format(reportingDeaths))
print("Counties Reporting Deaths(% of counties reporting cases): {}".format(reportingDeathsPct))
f=open(lastupdatedfile,"w")
f.write(updatecheck)
f.close()
# Add some notification stuff here...

Output Example

Pennsylvania Data (12:00 p.m. on 3/28/2020)
Total Cases: 2751
Total Deaths: 34
Mortality Rate(%): 1.24
Counties Reporting Cases: 56
Counties Reporting Cases(%): 83.58
Counties Reporting Deaths: 13
Counties Reporting Deaths(% of counties reporting cases): 23.21

Allscripts Vision User Reporting

A quick Powershell script I hacked together that will enumerate all Active Directory users, and build an XML file for an application I wrote that generates user reports for a specific application.

The output file format is similar to:

<quickreports>
  <report name=Organizational Unit>
  <database name=v001/>
  <usrlogin name=jdoe01/>
</report>
<quickreports>
$ErrorActionPreference= 'silentlycontinue'
$arr=@{}
foreach ($usr in Get-ADUser -Filter *  | select samaccountname) {
$user = get-aduser -identity $usr.samaccountname -ErrorAction SilentlyContinue
$userou = (($user.DistinguishedName -split =,2)[-1].split(,)[1] -split =,2)[-1]
$key = $userou
$value = $usr.samaccountname
if ($arr.ContainsKey($userou)) {
$arr[$userou] += $usr.samaccountname
} else {
$arr[$userou] = @()
$arr.add($key,$value)
}
}
$foo = <?xml version=1.0 encoding=utf-8?>`r`n
$foo += <quickreports>`r`n
$foo += foreach ($ou in $arr.keys) {
    write-output   <report name=$ou>`r`n
    foreach ($u in $arr[$ou]) {
                if ($u.contains(01)) {
                    write-output <database name=v001/>`r`n
                } elseif ($u.contains(04)) {
                    write-output <database name=v004/>`r`n
                } else {
                    write-output <database name=/>`r`n
                }
        write-output <usrlogin name=$u/>`r`n
}
    write-output </report>`r`n
}
$foo += </quickreports>
$foo | out-file quickreport.xml
# for some reason the outputted file was dumping 0x00 into the file.  Eventually
# I'll clean all this up and just write all the attributes and elements from up above
# but now is not the time as this is just a quick and dirty POC
$fn = quickreport.xml
$xmlDoc = [system.xml.xmldocument](get-content $fn)
$xmlDoc.save($fn)

Raid and IOPs Cheat Sheet – Excel

I made this 5 years ago in Excel and it was pretty popular. I think there are some errors in the math formulas that were pointed out for RAW calculations of backend IOPs. So use with caution, but mostly it should give a good enough idea on whatever it is you might be doing.




If you make modifications, please drop me a line in the comments or get in touch with me. I’ll update my post to include fixes.


RAID-and-IOPS Cheat Sheet Download

Userlock – Restricting Number of Remote Desktop Sessions Per OU

This was a rather interesting project I took on.  The project was created to monitor and restrict how many Remote Desktop sessions were permitted to logon based on configuration per-OrganizationalUnit (OU) in Active Directory.

There are 2 parts to this.

  1. Userlock – the GUI management part
  2. sessioncheck – the executable that needs to be set to run for each user login and logoff script on a RDS Host.
    Syntax of sessioncheck: sessioncheck logon or sessioncheck logoff

Userlock GUI

sessioncheck pops up a notification message box upon logon if there are no available slots left for “Max Sessions” and then logs the user immediately off the RDSH.

sessioncheck and Userlock GUI utilize a MSSQL Express database to store session data and check configurations for Maximum limits.

It works surprisingly well.  If you might be interested in this project for use at your organization, there are some code changes I’d need to make for more portability outside my environment, but it’s doable.

TheWatcher

An excercise in FileSystemWatcher in .NET that was used to create an application to monitor a given folder (and subfolders) for specific file/folder operations such as Creation, Deletion, Rename, and Modification.  It also has the ability to send an email alert upon detecting one of the operations selected to monitor for.



PDF2TIFF

I wrote PDF2TIFF to facilitate monitoring of a specified folder location for new PDF file drops and then converting to TIFF automatically using Ghostscript and storing the TIFF to a destination folder.
Here’s what it looks like…


If you are interested in this and think it might be of use to you or your company, I will sell a use license for $275,000.00 USD. Kidding. Just drop me a line and I’ll send it and the source code over.