libllsm2 Release

The long-awaited update for libllsm is finally ready. libllsm2, the successor to the speech processing library powering Moresampler and Synthesizer V (WIP), is now available on GitHub.

The model and algorithm have changed little; the new version is mostly a clean-up and rewrite focused on usability in host applications. In particular, I have long been aware that storing layer 0 and layer 1 parameters separately in libllsm is an extremely clumsy design when access to both layers is needed. libllsm2 addresses this problem by storing all frame-level parameters in a dynamic array structure named llsm_container, which can hold an arbitrary number of arbitrary objects. llsm_container frames are further wrapped in an llsm_chunk, the equivalent of an llsm_model in the legacy version.
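To make the design idea concrete, here is a minimal sketch in Python (not the library's C API). The class names, method names and slot indices below are all hypothetical and chosen only for illustration; the point is simply that one growable per-frame container can hold parameters from both layers side by side, and that a chunk is a sequence of such frames.

```python
# Hypothetical slot indices; these are NOT the identifiers used by libllsm2.
SLOT_F0 = 0
SLOT_LAYER0 = 1
SLOT_LAYER1 = 2


class FrameContainer:
    """A growable array of slots; each slot holds any object or None."""

    def __init__(self):
        self._slots = []

    def attach(self, index, obj):
        # Grow the slot array on demand so that any index can be used.
        if index >= len(self._slots):
            self._slots.extend([None] * (index + 1 - len(self._slots)))
        self._slots[index] = obj

    def get(self, index):
        # Slots that were never attached read back as None.
        return self._slots[index] if index < len(self._slots) else None


class Chunk:
    """A sequence of frame containers (the role llsm_model played before)."""

    def __init__(self, num_frames):
        self.frames = [FrameContainer() for _ in range(num_frames)]


chunk = Chunk(num_frames=3)
frame = chunk.frames[0]
frame.attach(SLOT_F0, 220.0)
frame.attach(SLOT_LAYER0, {"harmonic_amplitudes": [0.5, 0.25]})
frame.attach(SLOT_LAYER1, {"vocal_tract_response": [1.0, 0.8]})
```

With this layout, code that needs both layers reads them from the same frame object instead of looking them up in two separate structures.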

Help me with revising the experimental Arpasing recording script

Hello folks,

After 300 CPU-hours of intense computation, a new recording script for Arpasing has been generated.

Unlike the lists you've seen before, this one consists of actually meaningful words and phrases selected from a small subset of public-domain books from Project Gutenberg. It differs from CMU Arctic (which is also based on Gutenberg) in that the new Arpasing script is designed for singing synthesis. Here is a glimpse of the first few lines:

1 was Matthew's consolatory rejoinder
2 he stood before the wide opening
3 we shall begin our researches here
4 with his usual craving on him
5 but this little line of dancing men

This blog might remain broken for an indefinite amount of time.

University tech service said there was a disk fault in the server that handles the redirects. The issue was fixed right away. I'll take back my words.

Apparently the university has shut down the URL redirection this blog relies on without notifying me. All the links on this blog point toward the address that used to be redirected. I'm not sure how long it will take to get all the links fixed, and I don't want to waste my time fixing hundreds of links that are not supposed to be broken.

I am upset by the fact that the university cares little about maintaining students' edu websites, an important channel for displaying research and personal projects and information exchange with entities inside and outside of the campus. It surely means a lot for an educational institute with a 40k+ student population.

Announcing SHIRO the Speech Alignment Toolkit

SHIRO is a set of tools based on HSMM (Hidden Semi-Markov Model), for aligning phoneme transcription with speech recordings, as well as training phoneme-to-speech alignment models.

Frankly, I created SHIRO because there seem to be no open-source alternatives to HTK available for automatic phoneme alignment. Though SHIRO can by no means be a complete replacement for HTK, which also does speech recognition and language modeling, it is useful for what it's designed for.

Donald Trump Hillary Clinton HTK SHIRO
Big Mouth
Crooked ?
Costs $$$
Supports both LRHMM and LRHSMM-based speech alignment with multi-state, multi-stream GMM and arbitrary state tying and allows client-side deployment.

Moresampler 0.8.3 Release

This release fixes two crash bugs. They occur only rarely and have been reported by just two users so far, but the chance of hitting them is real. There's no immediate need to upgrade to 0.8.3 if the previous version works fine under your setup.

Download Moresampler 0.8.3 here

On the effectiveness of a headphone amplifier

I have been quite intrigued by the design of audio devices ever since getting my ER4 earphones, which claim to reproduce the pressure field in the ear canal as it would be in reality.

The only thing that sounded fishy to me was how a soundcard with an output impedance of 40 Ohm could drive a 45 Ohm earphone without voltage division taking place. A Google search turned up nothing besides audiophiles selling their impedance-matching snake oil. It didn't seem that many people had done a serious measurement on this matter, so I decided to do one, and also to measure what happens when an op-amp is used to boost the current delivery.
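For a rough sense of scale, the voltage division in question can be estimated with the textbook divider formula. The sketch below assumes both impedances are purely resistive and constant over frequency, which is a simplification (a real earphone's impedance varies with frequency); the function names are just illustrative and nothing here comes from the report itself.

```python
import math


def load_voltage_ratio(z_source_ohm, z_load_ohm):
    """Fraction of the open-circuit voltage that appears across the load."""
    return z_load_ohm / (z_source_ohm + z_load_ohm)


def ratio_to_db(ratio):
    """Express a voltage ratio in decibels."""
    return 20.0 * math.log10(ratio)


# 40 Ohm output impedance driving a 45 Ohm earphone (resistive model).
ratio = load_voltage_ratio(40.0, 45.0)
loss_db = ratio_to_db(ratio)
print(f"voltage ratio: {ratio:.3f}, level change: {loss_db:.2f} dB")
```

Under these assumptions only about 53% of the source voltage reaches the earphone, a drop of roughly 5.5 dB. If the load impedance changes with frequency, the ratio changes with it, so the drop is not just a uniform loss in level.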

The report can be downloaded from here.

Moresampler 0.8.2 Release

Our users in Japan discovered long ago that Moresampler adds a vibe to the voice when the modulation parameter goes to 100%. (FYI: modulation restores the pitch fluctuation in the recording.) While this feature wasn't deliberately designed, it comes as a surprisingly nice byproduct of Moresampler's sophisticated algorithm and architecture. The mod 100% vibe isn't perfect yet, though: in some cases, just as with mod 100% on many other engines, the pitch goes completely off, making the voice sound almost "drunk". This release presents a fix for this problem, and I'd recommend everyone give it a try.

Inspired by the modulation parameter, a new flag 'Mp' is added that randomly perturbs the pitch curve. The number after Mp controls the degree of perturbation. While the range is from 0 to 100, a small number around 5 should be enough to notice the difference.
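As an illustration of what randomly perturbing a pitch curve can look like, here is a minimal Python sketch. It is not Moresampler's actual algorithm: the mapping from the 0 to 100 degree to a deviation in cents, the `max_cents` parameter, the per-frame Gaussian noise and the function name are all assumptions made for this example.

```python
import random


def perturb_pitch(f0_hz, degree, max_cents=100.0, seed=None):
    """Return a copy of the pitch curve with random per-frame detuning.

    degree    -- 0..100, scales the strength of the perturbation
    max_cents -- hypothetical standard deviation (in cents) at degree 100
    """
    if not 0 <= degree <= 100:
        raise ValueError("degree must be between 0 and 100")
    rng = random.Random(seed)
    sigma_cents = max_cents * degree / 100.0
    perturbed = []
    for f0 in f0_hz:
        if f0 <= 0:
            # Unvoiced frames (marked by 0 here) are left untouched.
            perturbed.append(f0)
        else:
            cents = rng.gauss(0.0, sigma_cents)
            perturbed.append(f0 * 2.0 ** (cents / 1200.0))
    return perturbed


curve = [220.0, 221.0, 0.0, 223.0]
subtle = perturb_pitch(curve, degree=5, seed=1)
```

Independent per-frame noise like this produces a jittery curve; a practical implementation would more likely smooth the random signal before applying it.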

An Arpasing-related bug in the oto generator is also fixed. Moresampler 0.8.2 is fully compatible with the recently updated Arpasing 0.2.

Arpasing Standard+Script+Plugin Update

This is an update to the Arpasing standard (0.2.0), the recording script (0.2.0) and the Arpasing Assistant plugin (0.2).

Since the initial launch a few months ago I've received a lot of feedback on various aspects of the Arpasing framework. This update is a major renovation of the whole system in response to that feedback. The major changes are:

Arpasing standard:

  • Triphone/quadphone units are removed.
  • "ax" (schwa) is added to the phone set.

Arpasing recording script:

  • The N-gram coverage script is redesigned.
  • Extension script "isolated vowels" is added.

Arpasing Assistant:

  • Bug fixes.
  • An option is added to force the note duration to be uniform (explained in detail in readme.txt).

Arpasing toolkit (version 0.2) can be downloaded from here.

I'm also going to update Moresampler very soon.

libllsm Update

As promised a year ago, I have updated the libllsm repository to the latest version used by Moresampler.

There are quite a few changes from the legacy version, in particular the two-layer representation of the speech signal. The legacy libllsm is now called layer 0 LLSM, and a layer 1 model has been added. The layer 1 model is an acoustic re-interpretation of the harmonic component in layer 0. The conversion from layer 0 to layer 1 involves a new algorithm, which I call Pseudo Glottal Inverse Filtering (PGIF), that approximately inverts the speech production process. All these changes stick to the principles of speech production and enable more acoustically meaningful modification of speech.

More details can be found in

Moresampler 0.8.1 Release

Thanks to @ 's report, I found a severe bug in the recently upgraded pitch estimator: a standard deviation value was mistakenly treated as a variance. As a result, strong noise was added to the input before pitch estimation, which tremendously reduced the accuracy of pitch and voicing detection. Fixing this bug is the only change in Moresampler 0.8.1, but it is a very important one.
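To see why this kind of mix-up matters, consider the small Python sketch below. It is not Moresampler's code; the helper name and the intended noise level of 0.001 are hypothetical. It only shows that when a small standard deviation is passed where a variance is expected, the square root taken downstream yields a much larger noise amplitude.

```python
import math
import random


def noise_std_from_variance(variance):
    """A routine that expects a variance and converts it to a std."""
    return math.sqrt(variance)


intended_std = 0.001  # hypothetical noise level, relative to full scale

# Correct call: pass the variance (std squared).
correct_std = noise_std_from_variance(intended_std ** 2)

# Buggy call: the standard deviation itself is passed as if it were a variance.
buggy_std = noise_std_from_variance(intended_std)

excess_db = 20.0 * math.log10(buggy_std / correct_std)

# Draw noise at both levels to compare their measured RMS.
rng = random.Random(0)
n = 20000
rms_correct = math.sqrt(sum(rng.gauss(0.0, correct_std) ** 2 for _ in range(n)) / n)
rms_buggy = math.sqrt(sum(rng.gauss(0.0, buggy_std) ** 2 for _ in range(n)) / n)
print(f"excess noise: {excess_db:.1f} dB, RMS {rms_correct:.5f} vs {rms_buggy:.5f}")
```

With this example value the noise ends up about 30 dB stronger than intended, easily enough to disturb an analysis step that assumed only a tiny amount of noise.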

Download Moresampler 0.8.1 here

0.8.1 (Mar. 29, 2017) Download

  • Bug fix: a severe bug in the recently upgraded pitch estimator.