The main code you will probably want to look at is the optimised LDA Gibbs sampling implementation in the
akuz-nlp library, which includes enhancements over standard implementations. The relevant projects are:
akuz-nlp: Natural Language Processing (NLP) library
akuz-nlp-run-lda: How to run LDA Gibbs sampling
The zip files below contain abstracts (or full texts, depending on the source) of news articles. Near-duplicates from the same source have been removed. The data does not include source names or timestamps. The first line of each file is the article title.
To use this data with algorithms from the NLP library, unpack the archive into a directory on your computer, then specify that directory in the program's parameters (see
the akuz-nlp-run-lda project for an example).
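
If you want to inspect the data outside the NLP library, the unpack-and-read step can be sketched in a few lines of Python. This is only an illustration and is not part of akuz-nlp: the function names unpack_archive and load_articles are hypothetical, and it assumes the files are plain text in UTF-8 (undecodable bytes are replaced rather than raising an error).

```python
import zipfile
from pathlib import Path
from typing import List, Tuple


def unpack_archive(zip_path: str, target_dir: str) -> Path:
    """Extract the zip archive into target_dir and return that directory."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)
    return Path(target_dir)


def load_articles(data_dir: str) -> List[Tuple[str, str]]:
    """Return (title, text) pairs for every file under data_dir."""
    articles = []
    for path in sorted(Path(data_dir).rglob("*")):
        if not path.is_file():
            continue  # skip sub-directories
        # Assumption: UTF-8 encoded plain text.
        content = path.read_text(encoding="utf-8", errors="replace")
        # The first line of each file is the title; the rest is the text.
        title, _, text = content.partition("\n")
        articles.append((title.strip(), text.strip()))
    return articles
```

For example, load_articles(str(unpack_archive("news.zip", "news-data"))) would return one (title, text) pair per article, where "news.zip" and "news-data" stand in for whichever archive and directory you use.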