J. Zheng, X. Deng, H. Zhang. A novel method to generate frequent itemsets in distributed environment[C]//2018 IEEE 37th International Performance Computing and Communications Conference (IPCCC). IEEE, 2018: 1-8.
Release date: 2024-03-13
Published in: 2018 IEEE 37th International Performance Computing and Communications Conference (IPCCC)
Abstract: Frequent itemset mining (FIM) is an important topic in data mining that extracts knowledge of the relationships among items in a transaction dataset. The Apriori algorithm and its variants, Apriori-like algorithms, are widely used for FIM. However, these algorithms are inefficient in a big data environment. Because they iteratively compute and modify intermediate results, applying an Apriori-like algorithm to a high-dimensional or large-scale dataset demands more memory than a single machine can provide. Although parallel and distributed programming can address big data problems, Apriori-like algorithms are poorly suited to parallel computing because they incur extra communication overhead to update intermediate results iteratively in cluster memory. To solve this problem, we propose a novel FIM algorithm, Distributed Apriori Based on Itemset-Encoding (DABIE). Compared with existing methods, DABIE has two main advantages. First, it stores intermediate results encoded as 0s and 1s to reduce memory usage. Second, it generates frequent itemsets through logical operations on these encodings, which reduces modification of data in cluster memory. These two advantages make DABIE better suited to cluster computing. We apply DABIE to datasets of different scales. Compared with other distributed Apriori-like algorithms, our experimental results show that DABIE efficiently improves multi-iteration FIM in a big data environment.
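The abstract does not specify DABIE's actual encoding scheme or its distributed execution, so the following is only a minimal single-machine sketch of the general idea it describes: store intermediate results as 0/1 encodings and derive larger itemsets with logical operations. It assumes a generic vertical bitmap layout (one bit per transaction) and is not the authors' implementation; the names `encode_items`, `support`, and `frequent_itemsets` are hypothetical.

```python
from itertools import combinations


def encode_items(transactions):
    """Map each item to an int bitmask: bit t is 1 if transaction t contains the item.

    Assumption: a generic vertical bitmap, not the paper's encoding.
    """
    masks = {}
    for t, transaction in enumerate(transactions):
        for item in transaction:
            masks[item] = masks.get(item, 0) | (1 << t)
    return masks


def support(mask):
    """Number of transactions covered by a bitmask (count of 1 bits)."""
    return bin(mask).count("1")


def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search over bitmask-encoded itemsets."""
    item_masks = encode_items(transactions)
    # Level 1: frequent single items and their bitmasks.
    current = {
        frozenset([item]): mask
        for item, mask in item_masks.items()
        if support(mask) >= min_support
    }
    result = {}
    while current:
        result.update({itemset: support(mask) for itemset, mask in current.items()})
        next_level = {}
        for (a, mask_a), (b, mask_b) in combinations(current.items(), 2):
            candidate = a | b
            # Join only itemsets that differ in exactly one item.
            if len(candidate) != len(a) + 1 or candidate in next_level:
                continue
            # Apriori pruning: every subset one item smaller must be frequent.
            if any(candidate - {x} not in current for x in candidate):
                continue
            # Logical AND gives the transactions containing the whole candidate,
            # so no rescan of the raw transaction data is needed.
            mask = mask_a & mask_b
            if support(mask) >= min_support:
                next_level[candidate] = mask
        current = next_level
    return result
```

Calling `frequent_itemsets(transactions, min_support)` with a list of item sets returns a dictionary from each frequent itemset (as a `frozenset`) to its support count. Each level keeps only one integer per itemset, which illustrates why a 0/1 encoding can reduce both memory use and in-place modification of intermediate data.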
Notes: http://faculty.csu.edu.cn/dengxiaoheng/zh_CN/lwcg/10445/content/49191.htm
Translated work: No
Attachment:
57-A_Novel_Method_to_Generate_Frequent_Itemsets_in_Distributed_Environment.pdf