Brown clusters

The English Brown clusters are induced as described in the paper by Joseph Turian, Lev-Arie Ratinov and Yoshua Bengio (2010), "Word Representations: A Simple and General Method for Semi-Supervised Learning". They are induced on the RCV1 corpus, cleaned as described in that paper (roughly 37M words of news text).

The Chinese Brown clusters are induced from Chinese Gigaword 3 using Percy Liang's implementation: https://github.com/percyliang/brown-cluster

The 1000-cluster version has no word frequency cutoff. The 3200-cluster version includes only words that occur at least 20 times.

brown-rcv1.clean.tokenized-CoNLL03.txt-c*-freq1.txt
gigaword-zh-c*.txt
    Brown clusters for a particular number of induced classes. Each line has three columns: the first is the name of the cluster, the second is the word, and the third is the frequency of the word in the corpus.
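The three-column layout described above can be read with a few lines of Python. This is a minimal sketch, not a reader shipped with these files. It assumes that columns are separated by whitespace (tabs or spaces), that the files are UTF-8 encoded (relevant for the Chinese clusters), and that cluster names are bit strings encoding the path in the cluster hierarchy, as produced by Percy Liang's implementation. The function names are hypothetical. The prefix lengths 4, 6, 10 and 20 are one common choice for cluster-prefix features, not something these files prescribe.

```python
from collections import defaultdict


def load_brown_clusters(lines):
    """Map each word to (cluster_name, frequency).

    Assumes three whitespace-separated columns per line:
    cluster name, word, frequency in the corpus.
    """
    word_to_cluster = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        cluster, word, freq = fields
        word_to_cluster[word] = (cluster, int(freq))
    return word_to_cluster


def load_brown_clusters_from_path(path):
    """Read a cluster file from disk (UTF-8 encoding is an assumption)."""
    with open(path, encoding="utf-8") as f:
        return load_brown_clusters(f)


def cluster_prefixes(cluster, lengths=(4, 6, 10, 20)):
    """Return bit-string prefixes of a cluster name, without duplicates.

    Shorter prefixes correspond to coarser clusters, assuming the
    name is a path in the binary merge hierarchy.
    """
    return list(dict.fromkeys(cluster[:n] for n in lengths))


def words_by_cluster(word_to_cluster):
    """Invert the mapping: cluster name -> list of words in that cluster."""
    groups = defaultdict(list)
    for word, (cluster, _freq) in word_to_cluster.items():
        groups[cluster].append(word)
    return groups


if __name__ == "__main__":
    # Hypothetical sample lines, invented for illustration only.
    sample = ["0010\tMonday\t500", "0010\tTuesday\t420", "0111\tLondon\t300"]
    clusters = load_brown_clusters(sample)
    print(clusters["Monday"])                            # ('0010', 500)
    print(cluster_prefixes("0010110", lengths=(2, 4)))   # ['00', '0010']
    print(words_by_cluster(clusters)["0010"])            # ['Monday', 'Tuesday']
```

Using prefixes rather than only the full cluster name lets a model back off to coarser word classes when a fine-grained cluster is rare.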