Our lab focuses on next-generation sequencing (NGS) data analysis, software development and the application of machine learning to bioinformatics and medical imaging.
NGS has become the standard tool in the quest to understand genomics, epigenomics and their relationship to human diseases. We have extensive experience analyzing large-scale NGS data. In the past few years, our group has analyzed more than 10,000 NGS samples, totaling more than 100 TB, in collaboration with researchers from Mount Sinai as well as other institutes. Based on these results, we have published many papers in top-tier journals, such as Nature, Science, Nature Medicine and Nature Genetics.
We also actively develop software tools for NGS data analysis. Two examples are ngs.plot and diffReps. ngs.plot is a very popular tool for NGS data visualization and exploration. It is versatile and memory efficient. Since its publication in 2014, the paper has been cited more than 170 times. diffReps is a package for the differential analysis of ChIP-seq (count) data that takes biological replicates into consideration. It is also popular among bioinformaticians.
Machine learning is an essential tool for building models on complex, high-dimensional data. Besides our daily use of clustering, dimensionality reduction and generalized linear models on high-throughput biological data, we have also made novel contributions in the following two fields: automated genome segmentation and mammography-based diagnosis of breast cancer. Genome segmentation aims to label the genomic sequence with functional categories using large-scale ChIP-seq data, which is generally an unsupervised problem, or semi-supervised at most. We combined hidden Markov models with artificial neural networks to significantly improve the accuracy over previous methods. In breast cancer diagnosis, the goal is to predict a patient’s likelihood of developing cancer within a certain period based on mammograms. We developed deep learning models based on an all-convolutional design to achieve top-notch performance.
Machine learning is an essential tool for bioinformatics, genomics and biomedical imaging. Dr. Shen has a long-term interest in machine learning research and its application to the bioinformatics field. His recent interests include:
- Breast cancer diagnosis using deep convolutional neural networks and mammograms:
– L. Shen, “End-to-end Training for Whole Image Breast Cancer Diagnosis using An All Convolutional Design,” arXiv:1708.09427 [cs, stat], Aug. 2017. Companion website: https://github.com/lishen/end2end-all-conv
– L. Shen, “Breast cancer diagnosis using deep residual nets and transfer learning,” 2017. [Online]. Available: https://www.synapse.org/#!Synapse:syn9773182/wiki/426912. [Accessed: 23-Aug-2017].
- Automated genome segmentation based on ChIP-seq using hidden Markov models and neural networks:
– L. Shen, “Automatic genome segmentation with HMM-ANN hybrid models,” presented at the Workshops on Machine Learning in Computational Biology at the NIPS, 2015. bioRxiv: http://www.biorxiv.org/content/early/2016/10/29/034579
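To give a flavor of how a hidden Markov model can segment a genome, here is a minimal sketch of Viterbi decoding over binned ChIP-seq signal. It is not the HMM-ANN hybrid from the paper above: it uses a plain two-state HMM, and the state names, the "low"/"high" discretization and all probabilities are hypothetical values chosen for illustration only.

```python
from math import log

# Hypothetical two-state model; every number below is an illustrative assumption.
STATES = ("background", "enriched")
START = {"background": 0.5, "enriched": 0.5}
TRANS = {
    "background": {"background": 0.8, "enriched": 0.2},
    "enriched": {"background": 0.2, "enriched": 0.8},
}
EMIT = {
    "background": {"low": 0.8, "high": 0.2},
    "enriched": {"low": 0.2, "high": 0.8},
}


def viterbi(observations):
    """Return the most likely state label for each genomic bin."""
    # Log-probability of the best path ending in each state at the first bin.
    score = {s: log(START[s]) + log(EMIT[s][observations[0]]) for s in STATES}
    backpointers = []
    for obs in observations[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            # Best previous state to transition from into state s.
            prev = max(STATES, key=lambda p: score[p] + log(TRANS[p][s]))
            new_score[s] = score[prev] + log(TRANS[prev][s]) + log(EMIT[s][obs])
            pointers[s] = prev
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    bins = ["low", "low", "high", "high", "high", "low", "low"]
    for signal, label in zip(bins, viterbi(bins)):
        print(signal, label)
```

In a real segmentation the emission and transition parameters would be learned from data rather than set by hand, and the observations would be multi-track ChIP-seq signals rather than a single two-level symbol.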
During his postdoctoral training at UCSD, Dr. Shen studied transcriptional regulation using genomic data and computational methods. His main contributions include:
- Predicting gene expression from DNA sequences using Bayesian networks:
– L. Shen, J. Liu, and W. Wang, “GBNet: Deciphering regulatory rules in the co-regulated genes using a Gibbs sampler enhanced Bayesian network approach,” BMC Bioinformatics, vol. 9, no. 1, p. 395, 2008.
- A gene clustering algorithm using mixture modeling:
– K.-J. Won, S. Agarwal, L. Shen, R. Shoemaker, B. Ren, and W. Wang, “An Integrated Approach to Identifying Cis-Regulatory Modules in the Human Genome,” PLoS ONE, vol. 4, no. 5, p. e5501, 2009.
During his Ph.D. study, Dr. Shen studied the problem of cancer classification using microarray data. His main contributions include:
- Binary cancer classification using dimensionality reduction and logistic regression:
– L. Shen and E. C. Tan, “Dimension Reduction-Based Penalized Logistic Regression for Cancer Classification Using Microarray Data,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 02, no. 2, pp. 166–175, 2005.
- Reducing multiclass classification to binary problems and combining them with error-correcting output coding:
– L. Shen and E. C. Tan, “Reducing multiclass cancer classification to binary by output coding and SVM,” Computational Biology and Chemistry, vol. 30, no. 1, pp. 63–71, 2006.
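The output-coding idea can be illustrated with a small sketch. Each class is assigned a binary codeword, one binary classifier is trained per codeword column, and a sample is assigned to the class whose codeword is closest in Hamming distance to the classifiers' outputs. The code matrix below is the standard exhaustive code for four classes; the class names are hypothetical, and the binary classifier outputs are passed in directly rather than produced by trained SVMs, so this shows only the decoding step, not the method of the paper.

```python
# Exhaustive output code for 4 classes (7 binary problems).
# Any two codewords differ in 4 positions, so one wrong bit can be corrected.
# Class names are hypothetical placeholders.
CODE_MATRIX = {
    "tumor_type_1": (1, 1, 1, 1, 1, 1, 1),
    "tumor_type_2": (0, 0, 0, 0, 1, 1, 1),
    "tumor_type_3": (0, 0, 1, 1, 0, 0, 1),
    "tumor_type_4": (0, 1, 0, 1, 0, 1, 0),
}


def hamming(a, b):
    """Number of positions at which two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def decode(bits):
    """Assign the class whose codeword is nearest to the predicted bits."""
    return min(CODE_MATRIX, key=lambda label: hamming(CODE_MATRIX[label], bits))


if __name__ == "__main__":
    # Outputs of the 7 binary classifiers for one sample; the first bit is
    # wrong relative to the codeword of tumor_type_3.
    predicted_bits = (1, 0, 1, 1, 0, 0, 1)
    print(decode(predicted_bits))
```

The redundancy in the codewords is what gives the scheme its error-correcting property: a single misfiring binary classifier does not change the final multiclass call.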
Our lab is actively involved in developing bioinformatics software for next-generation sequencing (NGS) and gene analysis. Our lab has its own GitHub site. Dr. Shen also has his own GitHub profile, which mainly hosts machine learning related projects.
- ngs.plot is a very popular tool for NGS data visualization and exploration. It features a versatile pipeline that allows anyone to visualize sequence alignment pileups at a collection of genomic regions and summarize the enriched patterns. It is carefully engineered to index and read large NGS data efficiently without exhausting memory. Since its publication in 2014, the paper has been cited more than 170 times. Access the paper here.
- diffReps is a tool for ChIP-seq differential analysis that takes biological replicates into account. Differential analysis is one of the most commonly performed tasks in biological studies, and it needs to consider both mean differences and variation among two or more groups. There are many programs for comparing RNA-seq samples but far fewer for comparing ChIP-seq samples. The main reason is that ChIP-seq is more difficult to deal with: there is no predefined region, so the entire genomic sequence has to be considered. diffReps uses a sliding window approach to perform millions of negative binomial tests on the genomic sequence and then merges and summarizes the results into a text report. Since its publication, it has remained one of the most popular tools for ChIP-seq differential analysis.
- GeneOverlap is a small utility package for comparing multiple gene lists and determining the significance of their overlaps. It solves the statistical problem with Fisher’s exact tests. Although simpler than ngs.plot and diffReps, it is among the top 20% most downloaded packages on Bioconductor. Why is that? Simply because people have so many gene lists to compare against one another!
- Our lab has also developed SPEctRA – an RNA-seq processing pipeline and chip-seq_preprocess – a ChIP-seq preprocessing pipeline. Both feature multi-processing and can run seamlessly on an HPC cluster or a local workstation.
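The gene-list overlap test mentioned above can be sketched in a few lines. This is not the GeneOverlap package or its API (GeneOverlap is an R/Bioconductor package); it is a standalone illustration of a one-sided Fisher's exact test for over-representation, computed from the hypergeometric distribution. The function name, the toy gene names and the genome size are all assumptions made for the example.

```python
from math import comb


def overlap_p_value(list_a, list_b, genome_size):
    """One-sided Fisher's exact test for the overlap of two gene lists.

    Returns (overlap size, P(overlap >= observed)) under the null that
    list_b is a random draw from a genome of `genome_size` genes.
    """
    set_a, set_b = set(list_a), set(list_b)
    n_a, n_b = len(set_a), len(set_b)
    k = len(set_a & set_b)
    total = comb(genome_size, n_b)
    # Sum the hypergeometric probabilities of overlaps at least as large as k.
    tail = sum(
        comb(n_a, i) * comb(genome_size - n_a, n_b - i)
        for i in range(k, min(n_a, n_b) + 1)
    )
    return k, tail / total


if __name__ == "__main__":
    # Hypothetical gene lists and background size, for illustration only.
    up_in_experiment_1 = ["GeneA", "GeneB", "GeneC"]
    up_in_experiment_2 = ["GeneB", "GeneC", "GeneD"]
    size, p = overlap_p_value(up_in_experiment_1, up_in_experiment_2, 10)
    print(f"overlap={size}, p={p:.4f}")
```

The choice of `genome_size` (the background set of genes that could have appeared in either list) strongly affects the p-value, so it should reflect the genes actually measurable in the experiments being compared.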
- Bagot, R.C., Cates, H.M., Purushothaman, I., Vialou, V., Heller, E.A., Yieh, L., LaBonté, B., Peña, C.J., Shen, L., Wittenberg, G.M., et al. (2017). Ketamine and Imipramine Reverse Transcriptional Signatures of Susceptibility and Induce Resilience-Specific Gene Expression Profiles. Biological Psychiatry 81, 285–295.
- Descalzi, G., Mitsi, V., Purushothaman, I., Gaspari, S., Avrampou, K., Loh, Y.-H.E., Shen, L., and Zachariou, V. (2017). Neuropathic pain promotes adaptive changes in gene expression in brain networks involved in stress and depression. Sci. Signal. 10, eaaj1549.
- Engmann, O., Labonté, B., Mitchell, A., Bashtrykov, P., Calipari, E.S., Rosenbluh, C., Loh, Y.-H.E., Walker, D.M., Burek, D., Hamilton, P.J., et al. (2017). Cocaine-Induced Chromatin Modifications Associate With Increased Expression and Three-Dimensional Looping of Auts2. Biological Psychiatry.
- Feng, J., Pena, C.J., Purushothaman, I., Engmann, O., Walker, D., Brown, A.N., Issler, O., Doyle, M., Harrigan, E., Mouzon, E., et al. (2017). Tet1 in Nucleus Accumbens Opposes Depression- and Anxiety-Like Behaviors. Neuropsychopharmacology.
- Jiang, Y., Loh, Y.-H.E., Rajarajan, P., Hirayama, T., Liao, W., Kassim, B.S., Javidfar, B., Hartley, B.J., Kleofas, L., Park, R.B., et al. (2017). The methyltransferase SETDB1 regulates a large neuron-specific topological chromatin domain. Nat Genet advance online publication.
- Loh, Y.-H.E., Koemeter-Cox, A., Finelli, M.J., Shen, L., Friedel, R.H., and Zou, H. (2017a). Comprehensive mapping of 5-hydroxymethylcytosine epigenetic dynamics in axon regeneration. Epigenetics 12, 77–92.
- Loh, Y.-H.E., Feng, J., Nestler, E., and Shen, L. (2017b). Bioinformatic Analysis for Profiling Drug-induced Chromatin Modification Landscapes in Mouse Brain Using ChIP-seq Data. Bio-Protocol 7, e2123.
- Peña, C.J., Kronman, H.G., Walker, D.M., Cates, H.M., Bagot, R.C., Purushothaman, I., Issler, O., Loh, Y.-H.E., Leong, T., Kiraly, D.D., et al. (2017). Early life stress confers lifelong stress susceptibility in mice via ventral tegmental area OTX2. Science 356, 1185–1188.
- Shen, L. (2017a). End-to-end Training for Whole Image Breast Cancer Diagnosis using An All Convolutional Design. ArXiv:1708.09427 [Cs, Stat].
- Shen, L. (2017b). Breast cancer diagnosis using deep residual nets and transfer learning.
- Wan, L., Wen, H., Li, Y., Lyu, J., Xi, Y., Hoshii, T., Joseph, J.K., Wang, X., Loh, Y.-H.E., Erb, M.A., et al. (2017). ENL links histone acetylation to oncogenic gene expression in acute myeloid leukaemia. Nature 543, 265–269.
- Bagot, R.C., Cates, H.M., Purushothaman, I., Lorsch, Z.S., Walker, D.M., Wang, J., Huang, X., Schluter, O.M., Maze, I., Pena, C.J., et al. (2016). Circuit-wide Transcriptional Profiling Reveals Brain Region-Specific Gene Networks Regulating Depression Susceptibility. Neuron 90, 969–983.
- Cahill, M.E., Bagot, R.C., Gancarz, A.M., Walker, D.M., Sun, H., Wang, Z.-J., Heller, E.A., Feng, J., Kennedy, P.J., Koo, J.W., et al. (2016). Bidirectional Synaptic Structural Plasticity after Chronic Cocaine Administration Occurs through Rap1 Small GTPase Signaling. Neuron 89, 566–582.
- Damez-Werno, D.M., Sun, H., Scobie, K.N., Shao, N., Rabkin, J., Dias, C., Calipari, E.S., Maze, I., Pena, C.J., Walker, D.M., et al. (2016). Histone arginine methylation in cocaine action in the nucleus accumbens. PNAS 113, 9623–9628.
- Lepack, A.E., Bagot, R.C., Peña, C.J., Loh, Y.-H.E., Farrelly, L.A., Lu, Y., Powell, S.K., Lorsch, Z.S., Issler, O., Cates, H.M., et al. (2016). Aberrant H3.3 dynamics in NAc promote vulnerability to depressive-like behavior. PNAS 113, 12562–12567.
- Loh, Y.-H.E., Koemeter-Cox, A., Finelli, M.J., Shen, L., Friedel, R.H., and Zou, H. (2016). Comprehensive mapping of 5-hydroxymethylcytosine epigenetic dynamics in axon regeneration. Epigenetics 0, 1–16.
- Pfau, M.L., Purushothaman, I., Feng, J., Golden, S.A., Aleyasin, H., Lorsch, Z.S., Cates, H.M., Flanigan, M.E., Menard, C., Heshmati, M., et al. (2016). Integrative Analysis of Sex-Specific microRNA Networks Following Stress in Mouse Nucleus Accumbens. Front. Mol. Neurosci. 9.
- Shen, E.Y., Jiang, Y., Javidfar, B., Kassim, B., Loh, Y.-H.E., Ma, Q., Mitchell, A.C., Pothula, V., Stewart, A.F., Ernst, P., et al. (2016). Neuronal Deletion of Kmt2a/Mll1 Histone Methyltransferase in Ventral Striatum is Associated with Defective Spike-Timing-Dependent Striatal Synaptic Plasticity, Altered Response to Dopaminergic Drugs, and Increased Anxiety. Neuropsychopharmacology 41, 3103–3113.
- Sun, H., Martin, J.A., Werner, C.T., Wang, Z.-J., Damez-Werno, D.M., Scobie, K.N., Shao, N.-Y., Dias, C., Rabkin, J., Koo, J.W., et al. (2016). BAZ1B in Nucleus Accumbens Regulates Reward-Related Behaviors in Response to Distinct Emotional Stimuli. The Journal of Neuroscience 36, 3954–3961.
- Wang, T., Santos, J.H., Feng, J., Fargo, D.C., Shen, L., Riadi, G., Keeley, E., Rosh, Z.S., Nestler, E.J., and Woychik, R.P. (2016). A Novel Analytical Strategy to Identify Fusion Transcripts between Repetitive Elements and Protein Coding-Exons Using RNA-Seq. PLoS ONE 11, e0159028.
- Yong-Hwee Eddie Loh, and Shen, L. (2016). Analysis and Visualization of ChIP-Seq and RNA-Seq Sequence Alignments Using ngs.plot. In Data Mining Techniques for the Life Sciences, O. Carugo, and F. Eisenhaber, eds. (New York, NY: Springer New York), pp. 371–383.
Shen, L. (2015) Automatic genome segmentation with HMM-ANN hybrid models. Workshops on Machine Learning in Computational Biology at the Annual Conference on Neural Information Processing Systems (NIPS). bioRxiv, http://dx.doi.org/10.1101/034579.
Sun, H., Damez-Werno, D.M., Scobie, K.N., Shao, N.Y., Dias, C., Rabkin, J., Koo, J.W., Korb, E., Bagot, R.C., Ahn, F.H., Cahill, M.E., Labonte, B., Mouzon, E., Heller, E.A., Cates, H., Golden, S.A., Gleason, K., Russo, S.J., Andrews, S., Neve, R., Kennedy, P.J., Maze, I., Dietz, D.M., Allis, C.D., Turecki, G., Varga-Weisz, P., Tamminga, C., Shen, L. and Nestler, E.J. (2015) ACF chromatin-remodeling complex mediates stress-induced depressive-like behavior, Nat Med, 21, 1146-1153.
Mitsi, V., Terzi, D., Purushothaman, I., Manouras, L., Gaspari, S., Neve, R.L., Stratinaki, M., Feng, J., Shen, L. and Zachariou, V. (2015) RGS9-2–controlled adaptations in the striatum determine the onset of action and efficacy of antidepressants in neuropathic pain states, Proceedings of the National Academy of Sciences, 112, E5088-E5097.
Ferguson, D., Shao, N., Heller, E., Feng, J., Neve, R., Kim, H.-D., Call, T., Magazu, S., Shen, L. and Nestler, E.J. (2015) SIRT1-FOXO3a Regulate Cocaine Actions in the Nucleus Accumbens, The Journal of Neuroscience, 35, 3100-3111.
Ding, J., Huang, X., Shao, N., Zhou, H., Lee, D.F., Faiola, F., Fidalgo, M., Guallar, D., Saunders, A., Shliaha, P.V., Wang, H., Waghray, A., Papatsenko, D., Sanchez-Priego, C., Li, D., Yuan, Y., Lemischka, I.R., Shen, L., Kelley, K., Deng, H., Shen, X. and Wang, J. (2015) Tex10 Coordinates Epigenetic Control of Super-Enhancer Activity in Pluripotency and Reprogramming, Cell Stem Cell, 16, 653-668.
Feng, J., Shao, N., Szulwach, K.E., Vialou, V., Huynh, J., Zhong, C., Le, T., Ferguson, D., Cahill, M.E., Li, Y., Koo, J.W., Ribeiro, E., Labonte, B., Laitman, B.M., Estey, D., Stockman, V., Kennedy, P., Courousse, T., Mensah, I., Turecki, G., Faull, K.F., Ming, G.-l., Song, H., Fan, G., Casaccia, P., Shen, L., Jin, P. and Nestler, E.J. (2015) Role of Tet1 and 5-hydroxymethylcytosine in cocaine action, Nat Neurosci, 18, 536-544.
Hodes, G.E., Pfau, M.L., Purushothaman, I., Ahn, H.F., Golden, S.A., Christoffel, D.J., Magida, J., Brancato, A., Takahashi, A., Flanigan, M.E., Ménard, C., Aleyasin, H., Koo, J.W., Lorsch, Z.S., Feng, J., Heshmati, M., Wang, M., Turecki, G., Neve, R., Zhang, B., Shen, L., Nestler, E.J. and Russo, S.J. (2015) Sex Differences in Nucleus Accumbens Transcriptome Profiles Associated with Susceptibility versus Resilience to Subchronic Variable Stress, The Journal of Neuroscience, 35, 16362-16376.
Maze, I., Wenderski, W., Noh, K.-M., Bagot, Rosemary C., Tzavaras, N., Purushothaman, I., Elsässer, Simon J., Guo, Y., Ionete, C., Hurd, Yasmin L., Tamminga, Carol A., Halene, T., Farrelly, L., Soshnev, Alexey A., Wen, D., Rafii, S., Birtwistle, Marc R., Akbarian, S., Buchholz, Bruce A., Blitzer, Robert D., Nestler, Eric J., Yuan, Z.-F., Garcia, Benjamin A., Shen, L., Molina, H. and Allis, C.D. (2015) Critical Role of Histone Turnover in Neuronal Transcription and Plasticity, Neuron, 87, 77-94.
Noh, K.M., Maze, I., Zhao, D., Xiang, B., Wenderski, W., Lewis, P.W., Shen, L., Li, H. and Allis, C.D. (2015) ATRX tolerates activity-dependent histone H3 methyl/phos switching to maintain repetitive element silencing in neurons, Proc Natl Acad Sci U S A, 112, 6820-6827.
Roadmap Epigenomics Consortium (2015) Integrative analysis of 111 reference human epigenomes, Nature, 518, 317-330.
Dias, C., Feng, J., Sun, H., Shao, N.y., Mazei-Robison, M.S., Damez-Werno, D., Scobie, K., Bagot, R., LaBonte, B., Ribeiro, E., Liu, X., Kennedy, P., Vialou, V., Ferguson, D., Pena, C., Calipari, E.S., Koo, J.W., Mouzon, E., Ghose, S., Tamminga, C., Neve, R., Shen, L. and Nestler, E.J. (2014) β-catenin mediates stress resilience through Dicer1/microRNA regulation, Nature, advance online publication.
Heller, E.A., Cates, H.M., Pena, C.J., Sun, H., Shao, N., Feng, J., Golden, S.A., Herman, J.P., Walsh, J.J., Mazei-Robison, M., Ferguson, D., Knight, S., Gerber, M.A., Nievera, C., Han, M.H., Russo, S.J., Tamminga, C.S., Neve, R.L., Shen, L., Zhang, H.S., Zhang, F. and Nestler, E.J. (2014) Locus-specific epigenetic remodeling controls addiction- and depression-related behaviors, Nat Neurosci, 17, 1720-1727.
Maze, I., Shen, L., Zhang, B., Garcia, B.A., Shao, N., Mitchell, A., Sun, H., Akbarian, S., Allis, C.D. and Nestler, E.J. (2014) Analytical tools and current challenges in the modern era of neuroepigenomics, Nat Neurosci, 17, 1476-1490.
Feng, J., Wilkinson, M., Liu, X., Purushothaman, I., Ferguson, D., Vialou, V., Maze, I., Shao, N., Kennedy, P., Koo, J., Dias, C., Laitman, B., Stockman, V., LaPlant, Q., Cahill, M., Nestler, E. and Shen, L. (2014) Chronic cocaine-regulated epigenomic changes in mouse nucleus accumbens, Genome Biology, 15, R65.
Shen, L., Shao, N., Liu, X. and Nestler, E. (2014) ngs.plot: Quick mining and visualization of next-generation sequencing data by integrating genomic databases, BMC Genomics, 15, 284.
Scobie, K.N., Damez-Werno, D., Sun, H., Shao, N., Gancarz, A., Panganiban, C.H., Dias, C., Koo, J., Caiafa, P., Kaufman, L., Neve, R.L., Dietz, D.M., Shen, L. and Nestler, E.J. (2014) Essential role of poly(ADP-ribosyl)ation in cocaine action, Proceedings of the National Academy of Sciences, 111, 2005-2010.
Maze, I., Chaudhury, D., Dietz, D.M., Von Schimmelmann, M., Kennedy, P.J., Lobo, M.K., Sillivan, S.E., Miller, M.L., Bagot, R.C., Sun, H., Turecki, G., Neve, R.L., Hurd, Y.L., Shen, L., Han, M.H., Schaefer, A. and N
- Ferguson, D., Koo, J.W., Feng, J., Heller, E., Rabkin, J., Heshmati, M., Renthal, W., Neve, R., Liu, X., Shao, N., et al. (2013). Essential Role of SIRT1 Signaling in the Nucleus Accumbens in Cocaine and Morphine Action. The Journal of Neuroscience 33, 16088–16098.
- Shen, L., Shao, N.-Y., Liu, X., Maze, I., Feng, J., and Nestler, E.J. (2013a). diffReps: Detecting Differential Chromatin Modification Sites from ChIP-seq Data with Biological Replicates. PLoS ONE 8, e65598.
- Shen, L., Choi, I., Nestler, E.J., and Won, K.-J. (2013b). Human Transcriptome and Chromatin Modifications: An ENCODE Perspective. Genomics Inform 11, 60–67.
- Wang, T., Liu, J., Shen, L., Tonti-Filippini, J., Zhu, Y., Jia, H., Lister, R., Whitaker, J.W., Ecker, J.R., Millar, A.H., et al. (2013). STAR: an integrated solution to management and visualization of sequencing data. Bioinformatics 29, 3204–3210.
- Warren, B.L., Vialou, V.F., Iñiguez, S.D., Alcantara, L.F., Wright, K.N., Feng, J., Kennedy, P.J., LaPlant, Q., Shen, L., Nestler, E.J., et al. (2013). Neurobiological Sequelae of Witnessing Stressful Events in Adult Mice. Biological Psychiatry 73, 7–14.
- Kurita, M., Holloway, T., Garcia-Bea, A., Kozlenkov, A., Friedman, A.K., Moreno, J.L., Heshmati, M., Golden, S.A., Kennedy, P.J., Takahashi, N., et al. (2012). HDAC2 regulates atypical antipsychotic responses through the modulation of mGlu2 promoter activity. Nature Neuroscience 15, 1245–1254.
- Maze, I., Feng, J., Wilkinson, M.B., Sun, H., Shen, L., and Nestler, E.J. (2011). Cocaine dynamically regulates heterochromatin and repetitive element unsilencing in nucleus accumbens. Proceedings of the National Academy of Sciences 108, 3035–3040.
- Sun, H., Maze, I., Dietz, D.M., Scobie, K.N., Kennedy, P.J., Damez-Werno, D., Neve, R.L., Zachariou, V., Shen, L., and Nestler, E.J. (2012). Morphine Epigenomically Regulates Behavior through Alterations in Histone H3 Lysine 9 Dimethylation in the Nucleus Accumbens. The Journal of Neuroscience 32, 17454–17464.
- Hawkins, R.D., Hon, G.C., Lee, L.K., Ngo, Q., Lister, R., Pelizzola, M., Edsall, L.E., Kuan, S., Luu, Y., Klugman, S., et al. (2010). Distinct Epigenomic Landscapes of Pluripotent and Lineage-Committed Human Cells. Cell Stem Cell 6, 479–491.
- Shen, L., Liu, J., and Wang, W. (2008). GBNet: Deciphering regulatory rules in the co-regulated genes using a Gibbs sampler enhanced Bayesian network approach. BMC Bioinformatics 9, 395.
- Shen, L., Chepelev, I., Liu, J., and Wang, W. (2010). Prediction of quantitative phenotypes based on genetic networks: a case study in yeast sporulation. BMC Systems Biology 4, 128.
- Won, K.-J., Agarwal, S., Shen, L., Shoemaker, R., Ren, B., and Wang, W. (2009). An Integrated Approach to Identifying Cis-Regulatory Modules in the Human Genome. PLoS ONE 4, e5501.
- Shen, L., and Tan, E.C. (2004). Gene Selection for Cancer Classification from Microarray Data using PLS-RLSC. pp. 73–76.
- Shen, L., and Tan, E.C. (2005a). PLS and SVD Based Penalized Logistic Regression for Cancer Classification Using Microarray Data. (Imperial College Press), pp. 219–228.
- Shen, L., and Tan, E.C. (2005b). Dimension Reduction-Based Penalized Logistic Regression for Cancer Classification Using Microarray Data. IEEE/ACM Transactions on Computational Biology and Bioinformatics 02, 166–175.
- Shen, L., and Tan, E.C. (2005c). Combining Kernel Dimension-Reduction with Regularized Classifiers for Tissue Categorization. Online Journal of Bioinformatics 6, 91–98.
- Shen, L., and Tan, E.C. (2006a). A Generalized Output-Coding Scheme with SVM for Multiclass Microarray Classification. (Imperial College Press), pp. 179–186.
- Shen, L., and Tan, E.C. (2006b). Reducing multiclass cancer classification to binary by output coding and SVM. Computational Biology and Chemistry 30, 63–71.
- Eric Nestler
- Yasmin Hurd
- Schahram Akbarian
- Ian Maze
- Venetia Zachariou
- Hongyan Jenny Zou
- Pinxian Xu
- Anne Schaefer
- David Allis