A Text Transformation Scheme for Degenerate Strings

Jacqueline W. Daykin, Bruce Watson

Research output: Chapter in Book/Report/Conference proceedingConference Proceeding (Non-Journal item)

2 Citations (Scopus)

Abstract

The Burrows-Wheeler Transformation computes a permutation of a string of letters over an alphabet, and is well-suited to compression-related applications due to its invertability and data clustering properties. For space eciency the input to the transform can be preprocessed into Lyndon factors. We consider scenarios with uncertainty regarding the data: a position in an indeterminate or degenerate string is a set of letters. We first define Indeterminate Lyndon Words and establish their associated unique string factorization; we then introduce the novel Degenerate Burrows-Wheeler Transformation which may apply the indeterminate Lyndon factorization. A core computation in Burrows-Wheeler type transforms is the linear sorting of all conjugates of the input string - we achieve this in the degenerate case by applying lex-extension ordering. Indeterminate Lyndon factorization, and the degenerate transform and its inverse, can all be computed in linear time and space with respect to total input size of degenerate strings.
Original languageEnglish
Title of host publicationProceedings of the 2nd International Conference on Algorithms for Big Data
Subtitle of host publicationPalermo, Italy, April 07-09, 2014.
EditorsCostas S. Iliopoulos, Alessio Langiu
PublisherCEUR Workshop Proceedings
Pages23-29
Number of pages7
Publication statusPublished - 16 Apr 2014
Externally publishedYes

Fingerprint

Dive into the research topics of 'A Text Transformation Scheme for Degenerate Strings'. Together they form a unique fingerprint.

Cite this