Paper: Jointly benchmarking small and structural variant calls with vcfdist
vcfdist is a distance-based small variant calling evaluator that:
Paper: vcfdist: Accurately benchmarking small variant calls in human genomes
During my MSR internship I developed a secure and scalable cloud-based pipeline for working with both PacBio sequencing data and clinical FHIR data, from initial data through tertiary analysis. The electronic health records are stored in FHIR -- Fast Healthcare Interoperability Resource -- format, the current leading standard for health care data exchange. For the genomic data, we perform variant calling on long read PacBio HiFi data using Cromwell on Azure. Both data formats are parsed, processed, and merged in a single scalable pipeline which securely performs tertiary analyses using cloud-based Jupyter notebooks. We then demonstrate three example downstream applications: exporting patient information to a database, clustering patients, and performing a simple pharmacogenomic study.
Paper: A cloud-based pipeline for analysis of FHIR and long-read data
nPoRe is a nanopore sequencing read realigner which recalculates each read's fine-grained alignment in order to more accurately align ''n-polymers'' such as homopolymers (n=1) and tandem repeats (2 <= n <= 6). In other words, given an input BAM, it adjusts each read's CIGAR string to more accurately model the most likely sequencing errors and actual variants. Traditional affine gap penalties are context-agnostic, and do not model the higher likelihood of INDELs in low-complexity regions (particularly n-polymers), leading to poor or inconsistent alignments. We find that npore improves pileup concordance across reads and results in slightly better variant calling performance.
Pre-Print: nPoRe: n-Polymer Realigner for improved pileup-based variant calling
Nanopore sequencing is a relatively new technology which enables portable genome sequencing, including strain-level virus detection. The vast majority of DNA in a typical spit sample is non-viral. Nanopore sequencers provide a unique feature called Read Until, which can eject non-viral DNA strands in real time to increase the proportion of viral DNA sequenced. This approach comes with several computational challenges, and in this work we propose replacing the traditional pipeline of basecalling and alignment with a hardware accelerated squiggle-level read filter. Our resulting SquiggleFilter is an efficient high thoughput and low latency non-viral read filter capable of keeping pace with the next generation of sequencing devices.
During my internship at Assured Information Security (AIS) I worked with Danika Gaviola and Jason Rising on integrating a virtual Trusted Platform Module (vTPM) into OpenXT in order to provide virtual machines (VMs) the same security guarantees as a physical TPM would provide. A TPM is a secure cryptoprocessor most frequently used to verify machine state at each stage of the boot process. OpenXT is a security-enhanced fork of Xen which uses QEMU internally for peripheral device emulation. In addition to successfully providing TPM 2.0 for Linux VMs in OpenXT, we modified the Xen Project so that it could use TPM 2.0 and provide TPMs to Windows virtual machines.
This research constitutes my Clarkson Honors Thesis, which extends my previous work on using thermal cameras to reliably detect swipe pressure. I explored using convolutional neural networks to perform action classification, and found that the best results were obtained by training ConvNets on video frames with the user's hand removed, using filtering techniques designed previously.
Honors Thesis: Detection of Swipe Pressure using a Thermal Camera and ConvNets for Natural Surface Interaction
During my internship at TU Darmstadt I worked on extending an existing solver for linear optimization programs to perform semidefinite optimization. The most challenging obstacle I overcame involved computing the product of multiple large sparse matrices without storing any intermediate results. Each element in the resulting matrix could be computed individually due to certain matrix properties, and the process is shown in the image above.
The goal of this research was to improve the capabilities of current augmented reality systems and projective touchscreens by detecting the pressure at which a user interacts with objects using a thermal camera. The main challenge in developing such a classifier was in removing the user's hand from the thermal video, which required utilizing a series of morphological filters as demonstrated above.
During my internship at Los Alamos National Laboratory I worked with an interdisciplinary team of scientists to enable their ptychographic image reconstruction algorithms to run on CPUs in addition to Los Alamos' GPU cluster. Afterwards, I greatly improved the software's ease of use and automated the installation and building processes. Lastly, I worked on implementing a parallel beam position refinement algorithm to improve image reconstruction accuracy.Image from http://compphotolab.northwestern.edu/project/adp-automatic-differentiation-ptychography/
While working with the TARS research group at Clarkson, I improved the algorithms for generating a single PointCloud from our multi-Kinect system. By the end of the semester, I had decreased the total runtime required to generate a single PointCloud from 7.33 seconds to 0.178 seconds.
This project involved accelerating the detection of duplicate problem reports in large software problem report repositories. I implemented several basic parallel algorithms in CUDA C++ which could more quickly compare reports from massive repositories such as Bugzilla.
Paper: GPU Acceleration of Document Similarity Measures for Automated Bug Triaging Workshop: IWSF 2016 Awards: Best Poster Presentation in EE & CS, Clarkson University Symposium on Undergraduate Research Experiences