akuz.me/nko machine learning, artificial intelligence, and what not

Stereo Sound Autoencoder

I’ve decided to test a small (not very deep) autoencoder on audio data. It has two convolutional layers, each with a kernel of size 4 and a stride of 2, followed by ReLU nonlinearities. This results in a frame rate of 1/4 of the original audio.

The corresponding deconvolutions are then applied. The network is trained with the Adam algorithm on batches of 100 audio segments of 1024 stereo samples each. Training on random batches extracted from the same song takes only a few seconds, and the parameters converge after 150 iterations. The animation below shows how the network’s ability to encode a random sample of music changes over all 150 iterations of training (the GIF loops back on itself). Notice how it learns the mono features first, and then adjusts them to approximate the stereo music better.
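The post doesn’t include code, but the encoder’s shape arithmetic can be sketched in a few lines of numpy. This is an illustration only: the weights are random, and the “same”-style padding (so each layer halves the length exactly) is an assumption, not something the post specifies.

```python
import numpy as np

def conv_relu(x, kernel_size=4, stride=2, out_channels=64, seed=0):
    """One strided 1-D convolution + ReLU over x of shape (channels, length).
    Random weights: this only illustrates the shape arithmetic."""
    rng = np.random.default_rng(seed)
    in_channels, length = x.shape
    w = rng.standard_normal((out_channels, in_channels, kernel_size)) * 0.01
    # pad by (kernel_size - stride) so the output length is exactly length // stride
    pad = kernel_size - stride
    xp = np.pad(x, ((0, 0), (pad // 2, pad - pad // 2)))
    out_len = (xp.shape[1] - kernel_size) // stride + 1
    y = np.empty((out_channels, out_len))
    for t in range(out_len):
        patch = xp[:, t * stride : t * stride + kernel_size]
        y[:, t] = np.maximum(0.0, np.tensordot(w, patch, axes=([1, 2], [0, 1])))
    return y

segment = np.random.default_rng(1).standard_normal((2, 1024))  # one stereo segment
h1 = conv_relu(segment)  # (64, 512): half the frame rate
h2 = conv_relu(h1)       # (64, 256): 1/4 of the original frame rate
```

Two stride-2 layers halve the frame rate twice, which is where the 1/4 figure comes from.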

This of course does not result in any compression: even though we reduced the frame rate to 1/4 of the original, we now have 64 channels per sample versus the original 2-channel stereo. It did give me an interesting idea, though: with enough structure extracted in features trained on a single song (or artist), it might be possible to achieve a very high compression ratio. I am sure somebody is already working on this idea; if not, please thank me later.
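The expansion factor is easy to check. Assuming a 44.1 kHz sample rate (not stated in the post), the code representation carries 8× as many values per second as the raw stereo audio:

```python
SAMPLE_RATE = 44100  # Hz, assumed CD-quality audio

orig_values_per_sec = 2 * SAMPLE_RATE        # stereo: 2 values per sample
code_values_per_sec = 64 * SAMPLE_RATE // 4  # 64 channels at 1/4 frame rate
expansion = code_values_per_sec / orig_values_per_sec  # 8.0
```

The sample rate cancels out, so the 8× expansion holds regardless of the actual rate.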

Please click the link below to see 3 more GIFs (+15 MB of traffic) with some further comments on the ability of the above autoencoder to approximate the music with 64 features at 1/4 of the original frame rate.

Read more

LDA vs Document Clustering

I was asked at an interview what the difference is between LDA and document clustering. I tried to explain it via the difference between the generative models assumed by each approach. However, I now realise it would have been much more effective to give a much simpler example.
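The core distinction can be shown in a toy sketch (the two topics and their word lists below are made up for illustration): clustering assigns one topic per document, while LDA lets each word draw its own topic, so documents mix topics.

```python
import random
random.seed(0)

# Two hypothetical topics with made-up word lists
topics = {
    "sport":   ["game", "team", "score"],
    "finance": ["bank", "stock", "market"],
}

def clustered_doc(n_words=8):
    # Document clustering / mixture model: ONE topic generates the whole document
    t = random.choice(sorted(topics))
    return [random.choice(topics[t]) for _ in range(n_words)]

def lda_doc(n_words=8):
    # LDA: each WORD draws its own topic, so a single document mixes topics
    return [random.choice(topics[random.choice(sorted(topics))])
            for _ in range(n_words)]
```

A clustered document can never contain both "score" and "stock"; an LDA document easily can.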

Read more

Topic Keywords Case Study

I present a case study on a corpus of 10,000 news articles. We will investigate the topic structure of the corpus by gradually “freezing” topics through specifying their keywords, and seeing what other topics come up. The process shows how to extract useful topics from a corpus, such that they provide a meaningful basis for topic detection in future articles.

#: 0008  P: .0449    protests

.0374 police     .0353 protesters .0180 killed     .0155 charged
.0148 government .0116 ukraine    .0112 man        .0110 clashes
.0110 officer    .0108 anti       .0096 kiev       .0090 street
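One plausible mechanism for “freezing” a topic via keywords (an assumption on my part, not the post’s actual code) is to put extra Dirichlet prior mass on the keyword entries of that topic’s word distribution, so Gibbs sampling keeps the topic anchored to those words. The vocabulary and boost values below are illustrative:

```python
import numpy as np

vocab = ["police", "protesters", "killed", "market", "stock"]
keywords = {"protests": ["police", "protesters"]}  # the "frozen" topic

def topic_word_prior(topic, base=0.01, boost=1.0):
    # Asymmetric Dirichlet prior: keyword entries get much more mass,
    # pulling the sampled topic-word distribution toward the keywords.
    return np.array([boost if w in keywords.get(topic, ()) else base
                     for w in vocab])

prior = topic_word_prior("protests")
```

Unfrozen topics keep a flat prior, so the remaining structure of the corpus can surface around the frozen ones.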

Read more

Release v.0.0.2

Released version 0.0.2 of the Java source code. This release allows specifying LDA topic keywords. Please see the software page for downloads.

Release v.0.0.1

The first version of the Java source code for running the heavily optimised LDA Gibbs sampling has been released. Please see the software page for details.

Read more

Simulated Annealing for Dirichlet Priors in LDA

When estimating the parameters of the LDA (Latent Dirichlet Allocation) model using Gibbs sampling, fixing the Dirichlet priors at their (usually small) target values from the start reduces the mixing of the samples from the very beginning, even though the sampler has not yet found a good approximation of the target distribution.
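The post doesn’t spell out the annealing schedule; one simple possibility (a sketch, with made-up start and target values) is a geometric interpolation that keeps the prior diffuse early, for better mixing, and shrinks it to the small target by the final iteration:

```python
def annealed_prior(iteration, n_iterations, start=1.0, target=0.1):
    # Geometric interpolation: diffuse prior early (good mixing),
    # small target prior at the end (the model we actually want).
    frac = iteration / n_iterations
    return start * (target / start) ** frac
```

At iteration 0 this returns `start`; at the final iteration it returns `target`, decreasing monotonically in between.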


Read more

Gibbs Sampling for LDA with Asymmetric Dirichlet Priors

The original articles on LDA (Latent Dirichlet Allocation) assume symmetric Dirichlet priors on the topic-word and document-topic distributions. This means that a priori we assume that all topics are equally likely to appear within each document, and all words are equally likely to appear within each topic.
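The symmetric/asymmetric distinction is just a property of the concentration vector passed to the Dirichlet. A small numpy sketch (topic count and alpha values chosen for illustration):

```python
import numpy as np
rng = np.random.default_rng(0)

K = 5  # number of topics (illustrative)
sym_alpha  = np.full(K, 0.1)                      # symmetric: all topics equal a priori
asym_alpha = np.array([1.0, 0.5, 0.2, 0.1, 0.1])  # asymmetric: some topics favoured

theta_sym  = rng.dirichlet(sym_alpha)   # one sampled document-topic distribution
theta_asym = rng.dirichlet(asym_alpha)
```

With the asymmetric prior, documents tend a priori to put more mass on the favoured topics, which is what the post goes on to relax in the Gibbs sampler.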

Read more