SHIRO is a set of tools based on the HSMM (Hidden Semi-Markov Model) for aligning phoneme transcriptions with speech recordings and for training phoneme-to-speech alignment models.
Frankly, I created SHIRO because there seem to be no open-source alternatives to HTK for automatic phoneme alignment. SHIRO is by no means a complete replacement for HTK, which also does speech recognition and language modeling, but it is useful for what it is designed for.
| | Donald Trump | Hillary Clinton | HTK | SHIRO |
|---|---|---|---|---|
| Supports both LRHMM and LRHSMM-based speech alignment with multi-state, multi-stream GMM and arbitrary state tying, and allows client-side deployment. | ✕ | ✕ | ✕ | ✓ |
Get SHIRO from my GitHub repo: https://github.com/Sleepwalking/SHIRO
There are lots of things to work on from here. First, shiro-rest doesn't support multi-mixture training yet. Next, I need to modify the data structure a bit to support skip-state and skip-phoneme transitions. It would also be interesting to explore non-Gaussian (e.g. Gamma or Rayleigh) duration distributions.
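To give a rough idea of why a non-Gaussian duration model might be worth exploring, here is a minimal sketch comparing a Gaussian and a moment-matched Gamma duration density. This is not code from SHIRO: the function names and the per-state statistics (mean 4 frames, variance 9) are made up for illustration, and moment matching is just one simple way to pick Gamma parameters.

```python
import math

def gaussian_logpdf(d, mean, var):
    """Log-density of a Gaussian duration model at duration d (in frames)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (d - mean) ** 2 / var)

def gamma_logpdf(d, shape, scale):
    """Log-density of a Gamma duration model; non-positive durations get -inf."""
    if d <= 0:
        return float("-inf")
    return ((shape - 1.0) * math.log(d) - d / scale
            - math.lgamma(shape) - shape * math.log(scale))

def gamma_from_moments(mean, var):
    """Moment matching: mean = shape * scale, var = shape * scale ** 2."""
    scale = var / mean
    shape = mean / scale
    return shape, scale

# Hypothetical duration statistics for one state, in frames (assumed values).
mean, var = 4.0, 9.0
shape, scale = gamma_from_moments(mean, var)

# Compare the two models at a few durations, including an impossible one.
for d in (-1, 1, 4, 12):
    print(d, gaussian_logpdf(d, mean, var), gamma_logpdf(d, shape, scale))
```

With a short mean and a large variance, the Gaussian still assigns a finite log-density to a negative duration, while the Gamma puts all of its mass on positive durations.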
I'm also looking for someone to write a wrapper for SHIRO that automatically labels UTAU voicebanks, which would eventually replace Moresampler's automatic voicebank labeler. I have long been asked to fix various language-specific issues in Moresampler, and it's getting out of hand. Hopefully SHIRO can lead us to a better solution built on community effort.