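(1 College of Electronics and Information Engineering, South-Central University for Nationalities, Wuhan 430074, Hubei; 2 School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, Hubei; 3 Business Support Center, *** Communications Group Hubei Co., Ltd., Wuhan 430040, Hubei)

Abstract: Based on the characteristics of the k-means clustering algorithm, a method for implementing k-means with the MapReduce programming model is presented. The Map function computes the distance from each record to the cluster centers and relabels the record with the new cluster it belongs to. The Reduce function uses the intermediate results produced by the Map function to compute the new cluster centers, which are passed to the next round of MapReduce Job. Experimental results show that the k-means algorithm, once parallelized with MapReduce and deployed on a Hadoop cluster, achieves good speedup and good scalability.

Key words: cloud computing; parallel computing; MapReduce model; data mining; k-means clustering algorithm

CLC number: TP301  Document code: A  Article ID: 1671-4512(2011)S1-0120-05

Parallel implementing k-means clustering algorithm using MapReduce programming mode

Jiang Xiaoping1 Li Chenghua1 Xiang Wen2 Zhang Xinfang2 Yan Haitao3

(1 College of Electronics and Information Engineering, South-Central University for Nationalities, Wuhan 430074, China; 2 School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China; 3 Business Support Center, China Mobile Group Hubei Co., Ltd., Wuhan 430040, China)

Abstract: How to implement the k-means clustering algorithm using the MapReduce programming model was presented. The distance between each point and each cluster was calculated, and a new center ID was assigned to each point in the Map function. The points of the same key value (current cluster ID) were sent to a single reduc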