Keywords: K-means clustering, CTK, MMKMEANS, MapReduce

ABSTRACT

With the rapid development of computer technology and the widespread adoption of the Internet, the volume of data people deal with (including structured data and unstructured text data) is growing explosively. How to effectively mine valuable information from such massive data is therefore of great significance. Cluster analysis is one of the core technologies of data mining. In terms of both efficiency and computational complexity, traditional single-machine clustering algorithms can no longer meet the processing demands of massive data, and the development of cloud computing technology offers a new research direction for cluster analysis. As an open source project of Apache, Hadoop is a distributed computing framework