FAN Wei-wei, ZHAO Dong-sheng*. Big Data Processing Platform Spark and Its Biomedical Applications[J]. Chinese Journal of Library and Information Science for Traditional Chinese Medicine, 2015, 39(2): 1-5. [doi:10.3969/j.issn.2095-5707.2015.02.001]
- Title:
- Big Data Processing Platform Spark and Its Biomedical Applications
- Keywords:
- big data; Spark; medical research; biomedical informatics
- Document Code:
- A
- Abstract:
- With the rapid development of life sciences and medical informatization, biomedical data are growing explosively, and processing them must cope with large data volumes, complex multidimensional relationships and demanding interactive response times. Traditional databases and the Hadoop framework both have shortcomings when applied to biomedical big data. Spark is an emerging open-source big data platform based on in-memory computing that offers rich programming interfaces, a general-purpose processing framework and multiple deployment modes. This article introduces the key technologies and features of Spark, analyzes the characteristics of biomedical big data from different sources together with successful Spark use cases, and indicates the applicability and potential advantages of Spark for biomedical big data processing.
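The abstract's central technical claim is that Spark keeps working data in memory, which matters when the same data are touched repeatedly by iterative or interactive jobs. The sketch below is a hypothetical, single-machine Python toy of that idea, loosely modelled on Spark's resilient distributed datasets (RDDs): transformations are recorded lazily and only run when an action asks for results, and an explicit cache keeps the first computed result in memory. It is not Spark's API; the class `ToyDataset`, the loader `load_records` and the gene records are invented for illustration.

```python
from typing import Callable, List, Optional


class ToyDataset:
    """Hypothetical single-machine toy, loosely modelled on Spark's RDD idea.

    Transformations (map, filter) are only recorded as functions; nothing runs
    until an action (count, collect) asks for results. cache() marks a dataset
    so that its first computed result is kept in memory and reused.
    """

    def __init__(self, compute: Callable[[], List]):
        self._compute = compute  # zero-argument function producing the records
        self._use_cache = False
        self._cached: Optional[List] = None

    def map(self, f: Callable) -> "ToyDataset":
        # Lazily apply f to every record of this (parent) dataset.
        return ToyDataset(lambda: [f(x) for x in self._materialize()])

    def filter(self, pred: Callable) -> "ToyDataset":
        # Lazily keep only the records for which pred is true.
        return ToyDataset(lambda: [x for x in self._materialize() if pred(x)])

    def cache(self) -> "ToyDataset":
        self._use_cache = True
        return self

    def _materialize(self) -> List:
        if not self._use_cache:
            return self._compute()  # recompute from the lineage every time
        if self._cached is None:
            self._cached = self._compute()  # first action fills the in-memory copy
        return self._cached

    def count(self) -> int:
        return len(self._materialize())

    def collect(self) -> List:
        return list(self._materialize())


reads = {"n": 0}


def load_records() -> List:
    # Stands in for an expensive read from disk or HDFS; the records are made up.
    reads["n"] += 1
    return [("BRCA1", 0.9), ("TP53", 0.4), ("EGFR", 0.7)]


scores = ToyDataset(load_records).filter(lambda r: r[1] > 0.5).cache()
for _ in range(3):  # an iterative job that touches the same data repeatedly
    scores.count()
print(reads["n"])  # 1: read once, then served from memory
```

Dropping `.cache()` makes the same loop call `load_records` three times, which is the recomputation cost that in-memory caching avoids. In real Spark the data would additionally be partitioned across a cluster, and lost partitions would be rebuilt from the recorded lineage rather than from replicas.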
References:
[1] MW Davidson, DA Haim, JM Radin. Using Networks to Combine “Big Data” and Traditional Surveillance to Improve Influenza Predictions[J/OL].[2015-01-29].http://www.nature.com/srep/2015/150129/srep08154/full/srep08154.html.
[2] M Helft. Can Big Data Cure Cancer? (in Chinese)[EB/OL].[2014-10-29]. http://www.fortunechina.com/business/c/2014-10/29/content_225089.htm.
[3] OBAMA ADMINISTRATION: Big Data Research and Development Initiative[EB/OL].[2012-03-30].http://www.whitehouse.gov/sites/default/files/microsites/ostp/big_data_press_release.pdf.
[4] RC Taylor. An overview of the Hadoop/MapReduce/HBase framework and its current applications in bioinformatics[J].BMC Bioinformatics,2010,11(Suppl 12):S1.
[5] MC Schatz. CloudBurst: Highly Sensitive Read Mapping with MapReduce[J]. Bioinformatics,2009,25(11):1363-1369.
[6] S Leo, F Santoni, G Zanetti. Biodoop: Bioinformatics on hadoop[C]//Proceedings of the 38th International Conference on Parallel Processing Workshops (ICPPW 2009). Vienna, Austria:2009:415-422.
[7] A Matsunaga, M Tsugawa, J Fortes. CloudBLAST: Combining MapReduce and Virtualization on Distributed Resources for Bioinformatics Applications[C]//IEEE 4th International Conference on eScience (eScience 2008). Indiana,USA:2008:222-229.
[8] H Lee, Y Yang, H Chae, et al. BioVLAB-MMIA: A cloud environment for microRNA and mRNA integrated analysis (MMIA) on Amazon EC2[J]. IEEE Transactions on NanoBioscience,2012,11(3):266-272.
[9] AP Heath, M Greenway, R Powell, et al. Bionimbus: a cloud for managing, analyzing and sharing large genomics datasets[J]. J Am Med Inform Assoc,2014,21(6):969-975.
[10] J Zhang, DW Chen, JH Zhao, et al. RASS: A Portable Real-time Automatic Sleep Scoring System[C]//Proceedings of the 2012 IEEE 33rd Real-Time Systems Symposium (RTSS). Washington D.C., USA:2012:105-114.
[11] Explorys Co. Ltd. Unlocking the Power of BIG DATA to Improve Healthcare for Everyone[EB/OL].[2015-01-20]. https://www.explorys.com/docs/data-sheets/explorys-overview_factsheet.pdf.
[12] M Zaharia, M Chowdhury, MJ Franklin, et al. Spark: Cluster Computing with Working Sets[C]//Proceedings of the 2nd USENIX Workshop on Hot Topics in Cloud Computing. Boston, USA,2010.
[13] M Zaharia, M Chowdhury, T Das, et al. Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing[C]//Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation. San Jose: USENIX Association,2012:2-2.
[14] M Chowdhury. Performance and Scalability of Broadcast in Spark[EB/OL].[2014-10-08]. http://www.cs.berkeley.edu/~agearh/cs267.sp10/files/mosharaf-spark-bc-report-spring10.pdf.
[15] S Agarwal, A Panda, B Mozafari, et al. Blink and It’s Done: Interactive Queries on Very Large Data[C]//Proceedings of 38th International Conference on Very Large Databases (VLDB 2012). Istanbul, Turkey,2012:1902-1905.
[16] H Li, A Ghodsi, M Zaharia, et al. Tachyon: Reliable, Memory Speed Storage for Cluster Computing Frameworks[C]//Proceedings of the ACM Symposium on Cloud Computing (SoCC '14). Seattle, Washington, USA,2014:1-15.
[17] D Patterson. Spark meets Genomics: Helping Fight the Big C with the Big D [EB/OL].[2015-02-25].http://spark-summit.org/2014/talk/david-patterson.
[18] M Zaharia, WJ Bolosky, K Curtis, et al. Faster and More Accurate Sequence Alignment with SNAP[EB/OL].[2015-02-25]. http://arxiv.org/abs/1111.5572.
[19] M Massie, F Nothaft, C Hartl, et al. ADAM: Genomics Formats and Processing Patterns for Cloud Scale Computing[EB/OL].[2013-12-15]. http://www.eecs.berkeley.edu/Pubs/TechRpts/2013/EECS-2013-207.html.
[20] A Talwalkar,J Liptrap,J Newcomb, et al. SMASH: A Benchmarking Toolkit for Human Genome Variant Calling[J]. Bioinformatics,2014,30(19):2787-2795.
Memo
Received: 2015-02-27
Last Update: 2015-03-24