PDFBox searching

Search through PDFs with PDFBox and leverage Tika for MIME-type verification

Concurrent eTl processing in Groovy

Converting and structuring government NPI entries for direct import into MongoDB collections

Reusing threads and objects via ThreadLocal

Walkthrough of executor services, custom thread factories and threads with ThreadLocal references

A concurrent Groovy thread ripper

Exercise and exhaust your multi-core CPU with this simple Groovy example

Face identification with AWS Rekognition

Using Rekognition to build collections of faces to cross-identify people and parties

Home Network Monitoring

A project to poll, report on, and track cable modem stats and general network performance stats (e.g., ICMP ping results)

Groovy Snippets: Face detection with OpenIMAJ

Building on the previous post, this snippet leverages concurrent execution and OpenIMAJ for facial detection and analysis. OpenIMAJ implements a variety of facial detection algorithms in its library. For this exercise, we use the HaarCascadeDetector and the Frontal Keypoint Enhanced detector, which wraps the HaarCascadeDetector. The results of the script were used to narrow down the possible matches within the dataset before execution on cloud infrastructure.
#!/usr/bin/env groovy @GrabResolver(name='OpenIMAJ Maven Repo', root='http://maven.

Groovy Snippets: Content detection with Tika

A few months back I had to process a few million assets, totaling a few terabytes, that were missing file extensions. I needed to filter them so that only the images were streamed to a third-party provider's API for further analysis. Rough estimates put the images at no more than about 1/8 of those few million assets. Fortunately, I had access to the data on a particular machine and used Tika to quickly zip through the assets and identify those that matched the required content types.