Category: Speech Processing

libllsm2 Release

The long-awaited update for libllsm is finally ready. libllsm2, the successor to the speech processing library powering Moresampler and Synthesizer V (WIP), is now available on GitHub.

While the model and algorithm have changed little, the new version is mostly a clean-up and rewrite focused on usability in host applications. In particular, I had long been aware that storing layer 0 and layer 1 parameters separately in libllsm was an extremely clumsy design whenever both layers need to be accessed. libllsm2 addresses this by storing all frame-level parameters in a dynamic array structure named llsm_container, which can hold an arbitrary number of objects of arbitrary types. The per-frame llsm_container objects are in turn wrapped in an llsm_chunk, the equivalent of llsm_model in the legacy version.
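To make the container idea concrete, here is a minimal Python sketch of the design described above. It is not libllsm2's actual C API: a frame is a dynamic array of slots that grows on demand and can hold any object, so layer 0 and layer 1 parameters sit side by side in the same frame, and a chunk is simply a list of such frames. The class names `Container` and `Chunk`, their methods, and the slot indices `SLOT_F0` and `SLOT_LAYER1` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional

# Hypothetical slot indices; the real library defines its own.
SLOT_F0 = 0        # a layer 0 parameter
SLOT_LAYER1 = 3    # a layer 1 parameter kept in the same frame


@dataclass
class Container:
    """Dynamic array of slots, each holding an arbitrary object or None."""
    members: List[Optional[Any]] = field(default_factory=list)

    def attach(self, index: int, obj: Any) -> None:
        # Grow on demand so any slot index can be used.
        if index >= len(self.members):
            self.members.extend([None] * (index + 1 - len(self.members)))
        self.members[index] = obj

    def get(self, index: int) -> Optional[Any]:
        # Missing or empty slots read as None.
        return self.members[index] if index < len(self.members) else None


@dataclass
class Chunk:
    """A sequence of per-frame containers."""
    frames: List[Container]


def make_chunk(nfrm: int) -> Chunk:
    frames = []
    for i in range(nfrm):
        frame = Container()
        frame.attach(SLOT_F0, 100.0 + i)      # layer 0 data
        frame.attach(SLOT_LAYER1, 0.5 * i)    # layer 1 data, same frame
        frames.append(frame)
    return Chunk(frames=frames)


chunk = make_chunk(3)
print(chunk.frames[2].get(SLOT_F0), chunk.frames[2].get(SLOT_LAYER1))  # 102.0 1.0
```

Both layers of a frame are reached through the same object, which is the access pattern the separate layer 0 and layer 1 storage made awkward. A C version would also need some way to free and copy each slot's object, since a bare `void*` cannot do that by itself; the sketch leaves this to Python's garbage collector.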


Announcing SHIRO the Speech Alignment Toolkit

SHIRO is a set of tools based on the Hidden Semi-Markov Model (HSMM) for aligning phoneme transcriptions with speech recordings, as well as for training phoneme-to-speech alignment models.
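What distinguishes HSMM alignment from plain HMM alignment is that each state carries an explicit duration distribution instead of an implicit geometric one from self-loops. The sketch below is a generic explicit-duration Viterbi search over a left-to-right chain with one state per phoneme; it is an illustration of the technique, not SHIRO's code, and it omits the multi-state, multi-stream GMMs that SHIRO supports. The function name `hsmm_align` and the input layout (`loglik[t][s]`, `logdur[s][d-1]`) are assumptions.

```python
import math
from typing import List, Tuple

NEG_INF = float("-inf")


def hsmm_align(loglik: List[List[float]],
               logdur: List[List[float]]) -> Tuple[float, List[Tuple[int, int]]]:
    """Viterbi alignment of T frames to N states visited strictly left to right.

    loglik[t][s]   log-likelihood of frame t under state s
    logdur[s][d-1] log-probability that state s lasts exactly d frames
    Returns the best total log score and a (start, end) frame range per state.
    """
    T, N = len(loglik), len(logdur)
    # cum[s][t] = loglik[0][s] + ... + loglik[t-1][s], so any segment sum is O(1).
    cum = [[0.0] * (T + 1) for _ in range(N)]
    for s in range(N):
        for t in range(T):
            cum[s][t + 1] = cum[s][t] + loglik[t][s]

    # delta[s][t]: best score with states 0..s covering frames 0..t-1,
    # where state s ends exactly at frame t; back[s][t] stores its duration.
    delta = [[NEG_INF] * (T + 1) for _ in range(N)]
    back = [[0] * (T + 1) for _ in range(N)]
    for s in range(N):
        for t in range(1, T + 1):
            for d in range(1, min(len(logdur[s]), t) + 1):
                if s == 0:
                    prev = 0.0 if t == d else NEG_INF  # first state starts at frame 0
                else:
                    prev = delta[s - 1][t - d]
                if prev == NEG_INF:
                    continue
                score = prev + logdur[s][d - 1] + cum[s][t] - cum[s][t - d]
                if score > delta[s][t]:
                    delta[s][t], back[s][t] = score, d

    if delta[N - 1][T] == NEG_INF:
        raise ValueError("no alignment satisfies the duration limits")

    # Backtrack from the last state, which must end at the last frame.
    segments, t = [], T
    for s in range(N - 1, -1, -1):
        d = back[s][t]
        segments.append((t - d, t))
        t -= d
    segments.reverse()
    return delta[N - 1][T], segments


# Two "phonemes", five frames: frames 0-1 look like state 0, frames 2-4 like state 1.
loglik = [[0.0, -5.0], [0.0, -5.0], [-5.0, 0.0], [-5.0, 0.0], [-5.0, 0.0]]
logdur = [[math.log(0.25)] * 4, [math.log(0.25)] * 4]  # uniform over 1..4 frames
score, segments = hsmm_align(loglik, logdur)
print(segments)  # [(0, 2), (2, 5)]
```

The search costs O(N·T·D) for a maximum duration D, which is the usual price of explicit duration modeling; in return, a duration model can veto segmentations an HMM would accept, for example by capping a phoneme at two frames.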

Bluntly put, I created SHIRO because there seem to be no open-source alternatives to HTK for automatic phoneme alignment. Though SHIRO is by no means a complete replacement for HTK, which also does speech recognition and language modeling, it is useful for what it was designed for.

Comparison table (columns: Donald Trump, Hillary Clinton, HTK, SHIRO; rows: Big Mouth; Crooked; Costs $$$; Supports both LRHMM- and LRHSMM-based speech alignment with multi-state, multi-stream GMM and arbitrary state tying, and allows client-side deployment)

libllsm Update

As promised a year ago, I have updated the libllsm repository to the latest version used by Moresampler.

There are quite a few changes relative to the legacy version, most notably a two-layer representation of the speech signal. The legacy libllsm model is now called layer 0 LLSM, and a layer 1 model has been added. The layer 1 model is an acoustic re-interpretation of the harmonic component in layer 0, and the conversion from layer 0 to layer 1 involves a new algorithm, which I prefer to call Pseudo Glottal Inverse Filtering (PGIF), that approximately inverts the speech production process. All of these changes follow the principles of speech production and enable more acoustically meaningful modification of speech.
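The sketch below illustrates, in textbook source-filter terms, why a layer that separates source from vocal tract makes modification more meaningful. It is not PGIF: it assumes the harmonic component is stored as complex amplitudes (magnitude and phase) at multiples of f0, assumes each harmonic is simply the product of a glottal source term and a vocal tract response, and takes the source as given, whereas the actual algorithm has to estimate it. The `1/k^2` source, the steeper replacement source, and all function names are hypothetical.

```python
import cmath
import math
from typing import List


def toy_source(nhar: int) -> List[complex]:
    """Hypothetical glottal source: 1/k^2 magnitude (-12 dB/octave), zero phase."""
    return [complex(1.0 / (k * k), 0.0) for k in range(1, nhar + 1)]


def inverse_filter(harmonics: List[complex], source: List[complex]) -> List[complex]:
    """Estimate the vocal tract response at each harmonic by dividing out the source."""
    return [h / g for h, g in zip(harmonics, source)]


def forward_filter(vocal_tract: List[complex], source: List[complex]) -> List[complex]:
    """Re-apply a (possibly modified) source to get harmonic amplitudes back."""
    return [v * g for v, g in zip(vocal_tract, source)]


def synth_harmonics(harmonics: List[complex], f0: float, fs: float,
                    nsample: int) -> List[float]:
    """Sum of sinusoids at k * f0 with the given magnitudes and phases."""
    return [sum(abs(h) * math.cos(2 * math.pi * (k + 1) * f0 * n / fs + cmath.phase(h))
                for k, h in enumerate(harmonics))
            for n in range(nsample)]


# Three harmonics of one frame as complex amplitudes (magnitude, phase).
harmonics = [cmath.rect(1.0, 0.0), cmath.rect(0.5, 0.3), cmath.rect(0.2, -0.1)]
source = toy_source(len(harmonics))
vocal_tract = inverse_filter(harmonics, source)
print([round(abs(v), 3) for v in vocal_tract])  # [1.0, 2.0, 1.8]

# Swap in a steeper (hypothetical) source while keeping the vocal tract fixed.
steeper = [complex(1.0 / k ** 3, 0.0) for k in range(1, len(harmonics) + 1)]
modified = forward_filter(vocal_tract, steeper)
frame = synth_harmonics(modified, f0=200.0, fs=16000.0, nsample=80)
```

Editing the raw harmonic amplitudes would change source and vocal tract together; after the division, the voice quality (source) can be changed while the vocal tract response, which carries the phoneme identity, is left untouched.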

More details can be found in readme.md.