# LDA vs Document Clustering

I was asked at an interview what the difference is between LDA and document clustering. I tried to answer by contrasting the generative models assumed by the two approaches. In hindsight, a much simpler example would have been far more effective.
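The contrast between the two generative assumptions can be sketched in code. The following is a hypothetical illustration using scikit-learn (the toy corpus and all names are invented, not from the original post): clustering assigns each document to exactly one cluster, while LDA infers a per-document mixture over topics.

```python
# Sketch: hard clustering vs LDA topic mixtures on a toy corpus.
# The corpus, cluster/topic counts, and random seeds are assumptions
# made purely for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.cluster import KMeans
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "plain bread flour water",
    "plain bread flour yeast",
    "seed bread sunflower seeds",
    "seed bread pumpkin seeds flour",
]
X = CountVectorizer().fit_transform(docs)

# Clustering: each document receives exactly one cluster label.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# LDA: each document receives a distribution over topics (rows sum to 1).
lda = LatentDirichletAllocation(n_components=2, random_state=0)
theta = lda.fit_transform(X)

print(labels)           # one hard label per document
print(theta.round(2))   # a topic mixture per document
```

The key difference is visible in the outputs: `labels` is a single integer per document, whereas `theta` lets a document be, say, mostly one topic with a small share of another.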

Imagine you have a dataset of objects that you can broadly classify as "plain bread" and "bread with seeds". For this example, it is important that these objects share some similarity, but also have important differences:

# Topic Keywords Case Study

In this post, I present a case study on a corpus of 10,000 news articles. We will investigate the topic structure of the corpus by gradually "freezing" topics through specified keywords and observing which other topics emerge. The process shows how to extract useful topics from a corpus so that they provide a meaningful basis for topic detection in future articles.

Limitations: 10,000 news articles represent only about two days of news from 400 top world newspapers and blogs, so the topic structure will be heavily biased towards the events reported during this period. Also, I will use only 250 Gibbs sampling iterations after burn-in to infer the topics.
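The post does not restate how the library "freezes" a topic internally, but one common way to seed a topic with keywords is to give those words a larger prior weight in that topic's word distribution. A minimal sketch, with an invented vocabulary, keyword sets, and boost factor (all assumptions, not the library's actual mechanism):

```python
# Sketch: seeding ("freezing") topics via asymmetric per-topic word priors.
# A Gibbs sampler would then use beta[k, w] in place of a single symmetric
# beta when sampling topic assignments.
import numpy as np

vocab = ["election", "vote", "match", "goal", "market", "stocks"]
word_id = {w: i for i, w in enumerate(vocab)}

# Hypothetical seed keywords for two of the topics.
seed_keywords = {
    0: ["election", "vote"],   # topic 0: politics
    1: ["match", "goal"],      # topic 1: sport
}
num_topics, base_beta, boost = 3, 0.01, 5.0

# Start from a symmetric prior, then boost seed words in their topics;
# the remaining topics stay free to pick up whatever else is in the corpus.
beta = np.full((num_topics, len(vocab)), base_beta)
for k, words in seed_keywords.items():
    for w in words:
        beta[k, word_id[w]] += boost

print(beta)
```

With seeded topics pinned down like this, re-running inference shows which additional topics the sampler finds in the unseeded slots.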

# Release v.0.0.2

Released version 0.0.2 of the Java source code. This release allows specifying LDA topic keywords. Please see the software page for downloads.

# Release v.0.0.1

The first version of the Java source code for running the heavily optimised LDA Gibbs sampling has been released. Please see the software page for details.

Update: Fix 1 has now been released. ThreadCount was incorrectly used instead of TopicCount in the $$\alpha$$ and $$\beta$$ configuration objects.

Please click the link below to see the 20-topic output produced from 10,000 news articles after running just 200 iterations. The first 100 iterations are burn-in, with the temperature annealed from 1.0 to 0.1, followed by 100 sampling iterations at temperature 0.1. The run completed in 15 seconds on a 4-core machine.

# Simulated Annealing for Dirichlet Priors in LDA

When estimating the parameters of the LDA (Latent Dirichlet Allocation) model using Gibbs sampling, fixing the Dirichlet priors at their small target values from the very first iteration hurts the mixing of the sampler before it has found a good approximation of the target distribution.

An alternative is to initialise the Dirichlet priors with relatively high values of alpha and then gradually decrease them during the burn-in period. This allows the sampler to locate the approximate region of interest faster in the initial stages, while still sampling at the target prior values after burn-in.
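The schedule described above can be sketched as follows. The geometric decay shape and the constants (start 1.0, target 0.1, 100 burn-in iterations, matching the temperatures mentioned elsewhere on this blog) are assumptions for illustration; the paper linked below gives the actual scheme.

```python
# Sketch of an annealing schedule for the Dirichlet prior alpha:
# start high, decay geometrically to the target value during burn-in,
# then hold the target value for the sampling iterations.

def annealed_alpha(iteration, burn_in, alpha_start=1.0, alpha_target=0.1):
    """Geometric interpolation from alpha_start down to alpha_target."""
    if iteration >= burn_in:
        return alpha_target
    frac = iteration / burn_in
    return alpha_start * (alpha_target / alpha_start) ** frac

# Each Gibbs iteration would use the current value of alpha:
schedule = [annealed_alpha(t, burn_in=100) for t in range(0, 201, 50)]
print([round(a, 3) for a in schedule])  # decays 1.0 -> 0.1, then holds 0.1
```

The same schedule can be applied to beta, since both priors shape how concentrated the sampled multinomials are.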

This article describes the application of the simulated annealing technique to MCMC inference of multinomial distributions with Dirichlet priors in LDA. It is implemented in my NLP library for optimised LDA Gibbs sampling (see the software page). The full article can be found here (PDF).