News

  • publication
    Wednesday, January 22, 2014
    We demonstrate that a concept of "weighted information content" (known as topological pressure in the ergodic theory literature) can be used to facilitate the analysis of genomic data, in particular to find regions of a genome that contain many genes. This is a conceptual extension of the topological entropy approach presented earlier.
  • publication
    Wednesday, January 1, 2014
    We review entropy- and randomness-based techniques that are useful across a range of data mining applications.
  • publication
    Thursday, June 20, 2013
    We introduce an extremely fast, lightweight "big data" algorithm that answers the question "which bacteria are present?" for a given sample of DNA. The method is based on the theory of compressed sensing and aims to find the simplest explanation for the data in terms of known information.
  • publication
    Tuesday, May 1, 2012
    This is my PhD thesis from Penn State (advised by Manfred Denker).
  • publication
    Monday, February 21, 2011
    I define a new notion of "randomness" (called topological entropy) suitable for use on sequences of symbols (words) of finite length. I show that this can be used to distinguish between biologically interesting sequences in the human genome.