Writing a portscan utility in .NET

I’m working on a side project: a port scan utility written in VB.NET. Here’s my progress so far, and it is working.

There’s still some way to go on this little project. I think I can optimize it further, clean up the code, and fix my logic for many of the options. The arguments are handled by the class from an earlier post I wrote about handling command-line arguments.

Check out my C port scanner, which is significantly faster (65k ports in under 30 seconds).
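Under the hood, a port scan comes down to attempting a TCP connection to each port. Here is a minimal sketch of that idea in VB.NET, not the code from my project. The names IsPortOpen and ScanPorts are hypothetical, the sketch assumes .NET Framework 4.5 or later (for TcpClient.ConnectAsync), and it counts a port as open only if a TCP connection completes within the timeout.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Net.Sockets
Imports System.Threading.Tasks

Module PortScanSketch
    ' TCP "connect" check: True if a connection to host:port completes within timeoutMs.
    Public Function IsPortOpen(host As String, port As Integer, timeoutMs As Integer) As Boolean
        Using client As New TcpClient()
            Try
                Dim connectTask As Task = client.ConnectAsync(host, port)
                ' Wait returns False if the timeout elapses before the connect finishes
                If Not connectTask.Wait(timeoutMs) Then Return False
                Return client.Connected
            Catch ex As AggregateException
                ' The connect attempt failed (refused, unreachable, ...): treat as closed
                Return False
            Catch ex As SocketException
                ' Some failures (e.g. name resolution) can surface synchronously
                Return False
            End Try
        End Using
    End Function

    ' Checks firstPort..lastPort in parallel and returns the open ports in ascending order.
    Public Function ScanPorts(host As String, firstPort As Integer, lastPort As Integer, timeoutMs As Integer) As List(Of Integer)
        Dim openPorts As New List(Of Integer)
        ' Hypothetical cap on concurrent connection attempts
        Dim options As New ParallelOptions With {.MaxDegreeOfParallelism = 64}
        Parallel.For(firstPort, lastPort + 1, options,
            Sub(port)
                If IsPortOpen(host, port, timeoutMs) Then
                    ' List(Of T) is not thread-safe, so guard the Add
                    SyncLock openPorts
                        openPorts.Add(port)
                    End SyncLock
                End If
            End Sub)
        openPorts.Sort()
        Return openPorts
    End Function
End Module
```

Running the checks in parallel matters because most closed or filtered ports cost the full timeout; done one at a time, 65k ports would take far too long.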

FileSystemWatcher – LastAccess Not Working

For System.IO.FileSystemWatcher to report LastAccess changes, NTFS last-access updates must be enabled on the system. Run the following command from an elevated command prompt:

fsutil behavior set DisableLastAccess 0

After running it and rebooting, you can use FileSystemWatcher to monitor file LastAccess changes (sort of).

Example Code

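using System;
using System.IO;

class Program
{
    static void Main()
    {
        // Watch C:\Windows\Temp and all its subfolders for last-access changes
        FileSystemWatcher lWatcher = new FileSystemWatcher(@"C:\windows\temp", "*.*");
        lWatcher.NotifyFilter = NotifyFilters.LastAccess;
        lWatcher.IncludeSubdirectories = true;
        lWatcher.Changed += new FileSystemEventHandler(HandlerWatcherChanged);
        // Start raising events only after the watcher is fully configured
        lWatcher.EnableRaisingEvents = true;

        Console.WriteLine("Watching; press Enter to exit.");
        Console.ReadLine(); // keep the process (and the watcher) alive
    }

    static void HandlerWatcherChanged(object sender, FileSystemEventArgs e)
    {
        Console.WriteLine("[" + DateTime.Now.ToString() + "] ACCESS " + e.FullPath);
    }
}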

Get File Count Recursively

I’ve been working on a small tool to help remove duplicate files, and as I go back over my roughed-in code, I’m trying to optimize it for performance.

This snippet of code works really well for recursively counting the files under a given path. I originally found it on Stack Overflow and modified it slightly to suit my needs.

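    ' Requires: Imports System.IO
    Delegate Sub ProcessFileDelegate(ByVal path As String)
    Private fileCounter As Long = 0L

    Sub ProcessFile(ByVal path As String)
        fileCounter += 1
    End Sub

    ' extension is really a search pattern, e.g. "*.*" or "*.jpg"
    Sub ApplyAllFiles(ByVal folder As String, ByVal extension As String, ByVal fileAction As ProcessFileDelegate)
        ' Act on every matching file in this folder
        For Each file In Directory.GetFiles(folder, extension)
            fileAction.Invoke(file)
        Next
        ' Recurse into each subfolder
        For Each subDir In Directory.GetDirectories(folder)
            Try
                ApplyAllFiles(subDir, extension, fileAction)
            Catch ex As Exception
                ' Skip folders we cannot read (access denied, path too long, etc.)
            End Try
        Next
    End Sub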

It processes about 27k files in 1.5 seconds on my SSD. I have it running against a NAS with a considerably larger number of files, so I’ll see how well it performs.
In my Sub, I use the following to kick it off:

        ' Reset the class-level counter (a local Dim here would shadow it and always show 0)
        fileCounter = 0L
        Dim path = "Z:\"
        Dim ext = "*.*"
        Dim stpw As New Stopwatch() ' Requires: Imports System.Diagnostics
        ToolStripStatusLabel1.Text = "Calculating files..."
        stpw.Start()
        Dim runProcess As ProcessFileDelegate = AddressOf ProcessFile
        ApplyAllFiles(path, ext, runProcess)
        stpw.Stop()
        Dim rslts3 As String = String.Format("Total files = {0:n0}. Took {1:n0} ms.", fileCounter, stpw.ElapsedMilliseconds)
        ToolStripStatusLabel1.Text = rslts3

Graphically speaking, this isn’t much to look at; the important part is the ToolStripStatusLabel. I have a timer on my form that updates it with the latest file count every 15 seconds so the user knows the scan is still working. Interestingly enough, updating the label for every single file found dramatically increases the time it takes to go through the files, which is why I settled on updating every 15 seconds.
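An alternative to a form timer (not what I did) is to throttle the updates inside the per-file callback itself. Here is a minimal sketch with a hypothetical ThrottledCounter class. Its Increment method has the same shape as ProcessFileDelegate, so it could be passed as AddressOf counter.Increment. The report callback runs on whatever thread does the scanning, so a WinForms handler would still need Control.Invoke before touching ToolStripStatusLabel1.

```vbnet
Imports System
Imports System.Diagnostics
Imports System.Threading

' Hypothetical helper: counts files and calls a report callback at most once per interval.
Public Class ThrottledCounter
    Private _count As Long
    Private ReadOnly _interval As TimeSpan
    Private ReadOnly _report As Action(Of Long)
    Private ReadOnly _clock As Stopwatch = Stopwatch.StartNew()
    ' Assumes a single scanning thread; _lastReport is not synchronized
    Private _lastReport As TimeSpan = TimeSpan.Zero

    Public Sub New(interval As TimeSpan, report As Action(Of Long))
        _interval = interval
        _report = report
    End Sub

    ' Safe to read from another thread (e.g. the UI) while the scan runs
    Public ReadOnly Property Count As Long
        Get
            Return Interlocked.Read(_count)
        End Get
    End Property

    ' Call once per file; reports only when the interval has elapsed since the last report.
    Public Sub Increment(path As String)
        Dim n As Long = Interlocked.Increment(_count)
        Dim elapsed As TimeSpan = _clock.Elapsed
        If elapsed - _lastReport >= _interval Then
            _lastReport = elapsed
            _report(n)
        End If
    End Sub
End Class
```

With a 15-second interval this gives roughly the same behavior as the timer, but the check costs only a clock read per file.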

Development Log: Duplicate File Finder

I have thousands of files stored on an external USB-attached 1 TB drive. The drive is currently 95% full. I know I have duplicate files throughout the drive because over time I have been lazy and made backups of backups (or copies of copies) of images and other documents.
Time to clean house.
I’ve searched online for a tool that does the following things relatively easily, with a decently designed user interface:

  • Find duplicates based on hash (SHA-256)
  • List duplicates at end of scan
  • Give me an option to delete duplicates, or move them somewhere
  • Be somewhat fast

Every tool I’ve used fell short somewhere.  So I decided to write my own application to do what I want.
What will my application do?
It will recursively hash each file under a given starting path and store the following information in an SQLite database for reporting and/or cleanup:

  • SHA-256 Hash
  • File full path
  • File name
  • File extension
  • File mimetype
  • File size
  • File last modified time

With this information, I could run a report such as the following pseudo report:
Show me a list of all duplicate files with an extension of JPG over a file size of 1MB modified in the past 180 days.

That’s just a simple query, something like:

SELECT fileHash, fileName, filePath, fileSize, COUNT(fileHash) AS copies
FROM indexed_files
WHERE fileExtension = 'JPG' AND fileSize > 1048576
  AND fileModified >= datetime('now', '-180 days')
GROUP BY fileHash
HAVING COUNT(fileHash) > 1

My application can then show me this list and let me decide whether to move or delete the duplicates.

One problem comes to mind when automating the removal or moving of duplicates: what if a file has more than one duplicate? How do I handle that?

So on to the bits and pieces…

The hashing function is pretty straightforward in VB.NET (did I mention I was writing this in .NET?).

Imports System.IO
Imports System.Security.Cryptography

' Returns the lowercase hex SHA-256 hash of a file's contents
Function hashFile(ByVal fileName As String) As String
  ' Using blocks dispose the hasher and close the file even if hashing throws
  Using hash As SHA256 = SHA256.Create()
    Using stream As FileStream = File.OpenRead(fileName)
      Dim hashValue() As Byte = hash.ComputeHash(stream)
      Return PrintByteArray(hashValue)
    End Using
  End Using
End Function

' Converts a byte array to a lowercase hex string
Public Function PrintByteArray(ByVal array() As Byte) As String
  Dim hexValue As New System.Text.StringBuilder(array.Length * 2)
  For i As Integer = 0 To array.Length - 1
    hexValue.Append(array(i).ToString("x2"))
  Next i
  Return hexValue.ToString()
End Function

Dim path As String = "Z:\"
' Insert recursion function here; for each file it finds (filePath is a placeholder name), use the following:
Dim fi As New FileInfo(filePath)
Dim fHash = hashFile(fi.FullName) ' The SHA-256 hash of the file
Dim fPath = fi.FullName ' The full path to the file
Dim fName = fi.Name ' The filename
Dim fExt = fi.Extension ' The file's extension (includes the leading dot, e.g. ".jpg")
Dim fSize = fi.Length ' The file's size in bytes
Dim fLastMod = fi.LastWriteTime ' The timestamp the file was last modified
Dim fMimeType = Nothing ' The mimetype of the file (not determined yet)

OK, cool, so I have a somewhat workable code idea here. I’m not sure how long this is going to take to process, so I want to sample a few hundred files. I may also add options I can pass to my application, such as only hashing specific extensions or file names matching a pattern like *IMG_*, or excluding certain files.
But first… a proof of concept.
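Include and exclude options like these map nicely onto VB's Like operator, which understands * and ? wildcards. Here is a minimal sketch with a hypothetical ShouldHash function, assuming patterns are matched against the file name only and should be case-insensitive (both sides are lowercased). Note that Like also treats # as "any digit" and [...] as a character set, so those characters in a pattern are not literal.

```vbnet
Imports System
Imports System.IO

Module HashFilterSketch
    ' Decides whether a file should be hashed, based on wildcard patterns.
    ' includes: if empty, every file is a candidate; otherwise the name must match one pattern.
    ' excludes: a name matching any of these patterns is skipped.
    Public Function ShouldHash(filePath As String, includes As String(), excludes As String()) As Boolean
        Dim name As String = Path.GetFileName(filePath).ToLowerInvariant()
        Dim included As Boolean = (includes.Length = 0)
        For Each pattern In includes
            If name Like pattern.ToLowerInvariant() Then
                included = True
                Exit For
            End If
        Next
        If Not included Then Return False
        ' Exclusions win over inclusions
        For Each pattern In excludes
            If name Like pattern.ToLowerInvariant() Then Return False
        Next
        Return True
    End Function
End Module
```

For example, ShouldHash(file, New String() {"*img_*"}, New String() {}) would keep only camera-style IMG_ files.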

Update: 11/28/2016

Spent some time working on the application. Here’s a GUI rendition; it isn’t much, since it’s only being used as a testing application.

I have also implemented code to store this information in an SQLite database. Here’s a screenshot of the database.

Continuing on with some brainstorming, I’ve been thinking about how to handle the multiple duplicates.
I think what I want to do is

  • Add new table “duplicates”
  • Link “duplicates” to “files” table by “id” based on duplicate hashes
  • Store all duplicates found in this table for later management (deleting, archiving, etc.)

After testing some SQL queries and using some test data, I came up with this query:

SELECT * FROM file a
WHERE ( hash ) IN ( SELECT hash FROM file GROUP BY hash HAVING COUNT(*) > 1 )

This gives me the correct results as illustrated in the screenshot below.

Now that I can pick out the duplicate files and display them with a query, I can treat the row with the lowest “id” as the original (or use the last modified date to choose it) and move the other copies to a table to be removed or archived.
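The "lowest id is the original" rule also answers the earlier question about files with more than one duplicate: every other row with the same hash is a duplicate of that one. Here is a minimal in-memory sketch. FileRow and FindDuplicates are hypothetical names, and the rows are assumed to have already been read from the [file] table. It returns (duplicate id, original id) pairs that could be written to a "duplicates" table.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq

' Hypothetical row shape mirroring the id, hash and fullname columns of the [file] table
Public Class FileRow
    Public Property Id As Long
    Public Property Hash As String
    Public Property FullName As String
End Class

Module DuplicateSketch
    ' For each hash, the row with the lowest id is the original; every other row
    ' with that hash becomes a (duplicateId, originalId) pair. Unique hashes yield nothing.
    Public Function FindDuplicates(rows As IEnumerable(Of FileRow)) As List(Of Tuple(Of Long, Long))
        Dim result As New List(Of Tuple(Of Long, Long))
        For Each grp In rows.GroupBy(Function(r) r.Hash)
            Dim ordered = grp.OrderBy(Function(r) r.Id).ToList()
            Dim original = ordered(0)
            For Each dup In ordered.Skip(1)
                result.Add(Tuple.Create(dup.Id, original.Id))
            Next
        Next
        Return result
    End Function
End Module
```

Choosing the original by last-modified date instead would only change the OrderBy key.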
I’m running my first test on a local NAS with thousands of files. It’s been running for about 3 hours, and the database file is at 1.44 MB.

Update 12/1/2016

I’ve worked on the application off and on over the past few days, trying to optimize the file recursion method. I ended up implementing a faster method than the one I created above, and I wrote about it here.

Here’s a piece of the code within the recursion function. I’m running the first test on my user directory, C:\Users\rkreider. The recursive count took about 1.5 seconds to count all the files (27k). I will need to add logic, because the file count doesn’t actually open and hash each file the way my hash function does; so the 27k files counted may end up being only 22k (or whatever) actually hashed.

Just a file count of C:\users\rkreider (SSD) took about 1.5 seconds for 26k files.

File count of my user directory (SSD disk), no file hashing or other processing done.


Hashing Test Run 1
On this pass, I decided to run the hash on the files.  It took considerably longer, just under 5 minutes.

File hashing recursively of my user directory (SSD).


Something important to note: not all 26,683 of the files scanned were actually hashed, for various reasons such as access permissions or the file already being open in another process.
For comparison, the database (SQLite) created 26,505 records and is 5.4MB in size.
Hashing Test Run 2
I moved the file counter further into the hash loop and only increment the counter when a file is successfully hashed.  Here are my results now.

Recursive hash of my user directory (SSD) with a found/processed indicator now.


As you can see, it found 26,684 files and could only process (hash) 26,510.

The count shown in the GUI matches the database (SELECT COUNT(*) FROM file). The database size remains about the same, 5.39 MB.
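Moving the counter into the hash loop amounts to counting every file as found, but counting it as processed only once its hash succeeds. Here is a minimal sketch of that idea, with hypothetical names (TryProcessFile, FoundCount, HashedCount). It assumes that permission problems and locked or vanished files surface as UnauthorizedAccessException or IOException, which is what File.OpenRead throws for those cases.

```vbnet
Imports System
Imports System.IO
Imports System.Security.Cryptography

Module HashCounterSketch
    Public FoundCount As Long = 0
    Public HashedCount As Long = 0

    ' Lowercase hex SHA-256 of a file's contents
    Public Function HashFileHex(fileName As String) As String
        Using sha As SHA256 = SHA256.Create(), fs As FileStream = File.OpenRead(fileName)
            Return BitConverter.ToString(sha.ComputeHash(fs)).Replace("-", "").ToLowerInvariant()
        End Using
    End Function

    ' Counts every file seen; counts it as processed only after hashing succeeds.
    Public Function TryProcessFile(fileName As String, ByRef hashHex As String) As Boolean
        FoundCount += 1
        Try
            hashHex = HashFileHex(fileName)
        Catch ex As UnauthorizedAccessException
            Return False ' no permission to read the file
        Catch ex As IOException
            Return False ' file locked by another process, deleted mid-scan, etc.
        End Try
        HashedCount += 1
        Return True
    End Function
End Module
```

Only files that return True would then be inserted into the database, which is why the GUI count and SELECT COUNT(*) can agree.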

One thing I’m trying to decide is whether to put some kind of progress indicator on the interface.
The catch is that it adds overhead, because I first have to get a count of files, and that takes time. In the case of the NAS scan, the count took 500+ seconds, over 8 minutes. So I’d be waiting 8 minutes JUST for a count before the file hashing, which takes time of its own, even starts. I just don’t know if it is worth it, but I believe it sure would be nice.

Database Schema

CREATE TABLE [file] (
[id] INTEGER  PRIMARY KEY AUTOINCREMENT NOT NULL,
[hash] text  NULL,
[fullname] text  NULL,
[shortname] text  NULL,
[extension] text  NULL,
[mimetype] text  NULL,
[size] intEGER  NULL,
[modified] TIMESTAMP  NULL
);
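Here is a minimal sketch of writing rows into this table from VB.NET. It assumes the System.Data.SQLite NuGet package (SQLiteConnection, SQLiteCommand, SQLiteTransaction), and FileRecord and InsertFiles are hypothetical names. It wraps all inserts in one transaction and reuses one parameterized command, which is usually far faster in SQLite than committing each row separately.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Globalization
Imports System.Data.SQLite ' Assumption: System.Data.SQLite NuGet package

' Hypothetical container for the values collected for one file
Public Class FileRecord
    Public Hash As String
    Public FullName As String
    Public ShortName As String
    Public Extension As String
    Public MimeType As String ' may be Nothing
    Public Size As Long
    Public Modified As DateTime
End Class

Module FileIndexSketch
    ' Inserts all records into the [file] table in a single transaction.
    Public Sub InsertFiles(dbPath As String, records As IEnumerable(Of FileRecord))
        Using conn As New SQLiteConnection("Data Source=" & dbPath)
            conn.Open()
            Using tx As SQLiteTransaction = conn.BeginTransaction()
                Using cmd As New SQLiteCommand(
                    "INSERT INTO [file] (hash, fullname, shortname, extension, mimetype, size, modified) " &
                    "VALUES (@hash, @fullname, @shortname, @extension, @mimetype, @size, @modified)", conn, tx)
                    For Each r In records
                        cmd.Parameters.Clear()
                        cmd.Parameters.AddWithValue("@hash", r.Hash)
                        cmd.Parameters.AddWithValue("@fullname", r.FullName)
                        cmd.Parameters.AddWithValue("@shortname", r.ShortName)
                        cmd.Parameters.AddWithValue("@extension", r.Extension)
                        ' Store NULL when no MIME type is known
                        cmd.Parameters.AddWithValue("@mimetype", If(CObj(r.MimeType), DBNull.Value))
                        cmd.Parameters.AddWithValue("@size", r.Size)
                        ' "YYYY-MM-DD HH:MM:SS" text, a format SQLite's date functions understand
                        cmd.Parameters.AddWithValue("@modified", r.Modified.ToString("yyyy-MM-dd HH:mm:ss", CultureInfo.InvariantCulture))
                        cmd.ExecuteNonQuery()
                    Next
                End Using
                tx.Commit()
            End Using
        End Using
    End Sub
End Module
```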

Office.com Online using WinForms

This isn’t true API access, just a WebBrowser control. I’m looking into the API, though, which would require me to register my application even though it’s not a Windows App.

It has a launcher (I haven’t decided exactly how I’ll integrate this) that will launch the Microsoft Office Online services at the click of a button.

As you can see, you can still access the navigation provided by Microsoft. I fixed the WinForm so it doesn’t launch a new window and keeps everything inside the WinForm’s WebBrowser control.

I see this side project being useful for someone who wants to use the free Office.com Online services with their Microsoft account but doesn’t want the full browser experience.


Some notes to get this to render properly in the WebBrowser control.

I had to force a user agent that identifies as IE11. I think the WebBrowser control was defaulting to MSIE 7, which made things look like crap.
VB.NET Code for this:

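' Requires at the top of the file: Imports System.Runtime.InteropServices
Private Property uag As String = "Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; rv:11.0) like Gecko"

    ' P/Invoke into urlmon.dll, which lets the process override its user agent string
    <DllImport("urlmon.dll", CharSet:=CharSet.Ansi)> _
    Private Shared Function UrlMkSetSessionOption(ByVal dwOption As Integer, ByVal pBuffer As String, ByVal dwBufferLength As Integer, ByVal dwReserved As Integer) As Integer
    End Function
    Const URLMON_OPTION_USERAGENT As Integer = &H10000001
    ' Usage: ChangeUserAgent(uag), e.g. from the form's Load handler
    Public Sub ChangeUserAgent(ByVal Agent As String)
        UrlMkSetSessionOption(URLMON_OPTION_USERAGENT, Agent, Agent.Length, 0)
    End Sub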
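Another commonly used way to get IE11 rendering in the WebBrowser control, which is not what the code above does, is the FEATURE_BROWSER_EMULATION setting. It tells the control which document mode to use for a given executable. Here is a minimal sketch, assuming a per-user (HKEY_CURRENT_USER) setting is acceptable and that it is written before the control is first created. UseIE11Rendering is a hypothetical name; 11001 is the documented value for IE11 edge mode.

```vbnet
Imports System.Diagnostics
Imports System.IO
Imports Microsoft.Win32

Module BrowserEmulationSketch
    ' Per-executable document-mode override read by the WebBrowser control (IE)
    Public Const EmulationKey As String =
        "Software\Microsoft\Internet Explorer\Main\FeatureControl\FEATURE_BROWSER_EMULATION"

    ' Opts this executable into IE11 edge mode (11001). Writing under HKEY_CURRENT_USER
    ' needs no admin rights; call it before any WebBrowser control is created.
    Public Sub UseIE11Rendering()
        Dim exeName As String = Path.GetFileName(Process.GetCurrentProcess().MainModule.FileName)
        Using key As RegistryKey = Registry.CurrentUser.CreateSubKey(EmulationKey)
            key.SetValue(exeName, 11001, RegistryValueKind.DWord)
        End Using
    End Sub
End Module
```

The value name is the bare executable name (for example MyApp.exe), so a debugger host process with a different name would need its own entry.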

I also forced new windows to open in the WinForm’s WebBrowser control. Before that, using the top navigation at all opened the service in a new Internet Explorer window.
VB.NET Code for this:

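    ' Requires: Imports System.ComponentModel (for CancelEventArgs)
    Private Sub Webbrowser1_NewWindow(sender As Object, e As CancelEventArgs) Handles WebBrowser1.NewWindow
        ' NewWindow doesn't expose the target URL; StatusText usually holds the URL of the
        ' link being clicked, so navigate there in this control and cancel the new IE window
        WebBrowser1.Navigate(WebBrowser1.StatusText)
        e.Cancel = True
    End Sub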

Originally, I wanted to just wrap this in an HTA or something simple, but I got errors indicating that I couldn’t load it in an iframe. So I tried some Ajax/jQuery stuff in the HTA, and that was a complete failure (I’m not familiar with Ajax/jQuery).