SHIRO is a set of tools based on the HSMM (Hidden Semi-Markov Model) for aligning phoneme transcriptions with speech recordings and for training phoneme-to-speech alignment models.
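To make the HSMM idea concrete, here is a minimal sketch of explicit-duration Viterbi alignment for a left-to-right model in which every state is visited exactly once, in order. It is not SHIRO's code. The function name `hsmm_align`, the input layout (a per-frame, per-state log-likelihood table plus a duration log-probability callback) and the `max_dur` cap are all assumptions made for illustration.

```python
def hsmm_align(frame_loglik, dur_logprob, max_dur):
    """Explicit-duration Viterbi for a left-to-right HSMM (illustrative sketch).

    frame_loglik[t][s]: log-likelihood of frame t under state s (hypothetical layout)
    dur_logprob(s, d):  log-probability that state s lasts exactly d frames
    max_dur:            longest duration considered for any state
    Each state is visited exactly once, in order (no skip transitions).
    Returns one (start, end) frame range per state, end exclusive.
    """
    T = len(frame_loglik)
    S = len(frame_loglik[0])
    NEG = float("-inf")
    # Prefix sums so the emission score of any segment costs O(1).
    cum = [[0.0] * (T + 1) for _ in range(S)]
    for s in range(S):
        for t in range(T):
            cum[s][t + 1] = cum[s][t] + frame_loglik[t][s]
    # best[s][t]: best score with states 0..s covering frames [0, t).
    best = [[NEG] * (T + 1) for _ in range(S)]
    back = [[0] * (T + 1) for _ in range(S)]
    for s in range(S):
        for t in range(1, T + 1):
            for d in range(1, min(max_dur, t) + 1):
                start = t - d
                if s == 0:
                    prev = 0.0 if start == 0 else NEG  # first state starts at frame 0
                else:
                    prev = best[s - 1][start]
                if prev == NEG:
                    continue
                score = prev + dur_logprob(s, d) + cum[s][t] - cum[s][start]
                if score > best[s][t]:
                    best[s][t] = score
                    back[s][t] = start
    if best[S - 1][T] == NEG:
        raise ValueError("no valid alignment within max_dur")
    # Trace the segment boundaries back from the last frame.
    segments = []
    t = T
    for s in range(S - 1, -1, -1):
        start = back[s][t]
        segments.append((start, t))
        t = start
    return segments[::-1]


if __name__ == "__main__":
    # Toy input: frames 0-2 favour state 0, frames 3-5 favour state 1.
    frames = [[0.0, -5.0]] * 3 + [[-5.0, 0.0]] * 3
    print(hsmm_align(frames, lambda s, d: 0.0, 6))  # [(0, 3), (3, 6)]
```

The key difference from plain HMM Viterbi is the inner loop over `d`: each state's duration is scored once, by `dur_logprob`, rather than implicitly through repeated self-loop transitions.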

Bluntly, I created SHIRO because there seem to be no open-source alternatives to HTK for automatic phoneme alignment. SHIRO is by no means a complete replacement for HTK, which also does speech recognition and language modeling, but it is useful for what it is designed for.

Comparison: Donald Trump | Hillary Clinton | HTK | SHIRO
Big mouth
Crooked ?
Costs $$$
Supports both LRHMM- and LRHSMM-based speech alignment with multi-state, multi-stream GMMs and arbitrary state tying, and allows client-side deployment
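The practical difference between LRHMM- and LRHSMM-based alignment lies in duration modeling. In a plain left-to-right HMM, a state's duration is controlled only implicitly by its self-loop probability a, which forces a geometric distribution P(d) = a^(d-1) (1 - a); an HSMM replaces this with an explicit duration distribution. The sketch below (the function name `hmm_state_duration_pmf` is hypothetical, not part of SHIRO) computes the duration distribution an HMM state implies.

```python
def hmm_state_duration_pmf(self_loop_prob, max_dur):
    """Duration distribution implied by one HMM state with self-loop
    probability a: P(d) = a**(d - 1) * (1 - a) for d = 1..max_dur.
    Entry i of the returned list is P(d = i + 1)."""
    a = self_loop_prob
    if not 0.0 <= a < 1.0:
        raise ValueError("self-loop probability must lie in [0, 1)")
    return [a ** (d - 1) * (1 - a) for d in range(1, max_dur + 1)]


if __name__ == "__main__":
    pmf = hmm_state_duration_pmf(0.875, 200)
    mean = sum((i + 1) * p for i, p in enumerate(pmf))
    print(round(mean, 6))  # close to 1 / (1 - 0.875) = 8
```

With a = 0.875 the mean duration is 1 / (1 - a) = 8 frames, yet d = 1 remains the single most likely duration, since the geometric distribution always peaks at one frame. That is one motivation for modeling durations explicitly.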

Get SHIRO from my GitHub repo:

There is still a lot to work on. First, shiro-rest doesn't support multi-mixture training yet. Next, I need to find a way to modify the data structure a bit to support skip-state and skip-phoneme transitions. It would also be interesting to explore non-Gaussian (e.g. Gamma or Rayleigh) duration distributions.
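As a rough illustration of what swapping the duration distribution involves, the sketch below turns a continuous density (Gaussian, Gamma or Rayleigh) into a normalized log-probability table over integer frame durations 1..max_dur, the kind of table an explicit-duration HSMM could consult. The function names and the discretize-then-renormalize approach are assumptions for illustration, not how SHIRO currently handles durations.

```python
import math


def gaussian_logpdf(x, mean, var):
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def gamma_logpdf(x, shape, scale):
    # Gamma with shape k > 0 and scale theta > 0; support is x > 0.
    return ((shape - 1) * math.log(x) - x / scale
            - math.lgamma(shape) - shape * math.log(scale))


def rayleigh_logpdf(x, sigma):
    # Rayleigh with scale sigma > 0; support is x > 0.
    return math.log(x / sigma ** 2) - x ** 2 / (2 * sigma ** 2)


def discrete_duration_logpmf(logpdf, max_dur):
    """Evaluate a log-density at d = 1..max_dur and renormalize with
    log-sum-exp. Entry i of the result is log P(d = i + 1)."""
    logs = [logpdf(d) for d in range(1, max_dur + 1)]
    m = max(logs)
    log_z = m + math.log(sum(math.exp(v - m) for v in logs))
    return [v - log_z for v in logs]


if __name__ == "__main__":
    gauss = discrete_duration_logpmf(lambda d: gaussian_logpdf(d, 8.0, 9.0), 30)
    gamma = discrete_duration_logpmf(lambda d: gamma_logpdf(d, 4.0, 2.0), 30)
    print(gauss.index(max(gauss)) + 1, gamma.index(max(gamma)) + 1)  # 8 6
```

With these example parameters, the Gaussian table peaks at its mean (d = 8), while the Gamma table (mean 8, mode (k - 1) theta = 6) peaks earlier and is right-skewed. Unlike the Gaussian, the Gamma and Rayleigh densities also place no mass on non-positive durations, so no truncation is needed on the left.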

I'm also looking for someone to write a wrapper for SHIRO that automatically labels UTAU voicebanks, which will eventually replace Moresampler's automatic voicebank labeler. I've long been asked to fix various language-specific issues in Moresampler, and it's getting out of hand. Hopefully SHIRO can lead us to a better solution built on community effort.