Tomorrow's Health, Today's Research

Xuekui Zhang

Assistant Professor, Canada Research Chair in Biostatistics and Bioinformatics (Tier 2), Department of Mathematics and Statistics
Phone: 250-721-7455
Research area: Statistical genomics, machine learning, design of clinical trials, neuroimaging

____________________________________________________________

Research Profile:

From A to G: How to read genetic code

Our genetic code holds the answers to a lot of questions. Cutting-edge sequencing techniques enable researchers to determine the precise order of nucleotides within millions of DNA fragments from whole genomes in a single sample – both quickly and economically. But how does one understand the meaning held within the massive set of data generated, a combination of just four letters: A, T, C, and G?

Ask Dr. Xuekui Zhang, a biostatistician in UVic’s Department of Mathematics and Statistics, who says that reading genetic code is as much about numbers as it is about letters. Zhang spends much of his time developing algorithms and software that help biomedical researchers make sense of the big data generated from sequencing entire genomes.

 “I’m not a biologist. I’m not a geneticist. I’m a statistician,” says Zhang. “Because my research is problem-oriented, I need collaborators to come to me with good data and good questions, questions that I can help answer by developing novel statistical methods.”

Zhang wasn’t always a statistician, but he most certainly was always a mathematician. “I was in the Math Olympics from elementary school on,” says Zhang. Making the Chinese Mathematical Olympiad secured him a seat in a unique undergraduate program in mathematics at Nankai University. Zhang went on to complete a master’s degree in statistics at UBC, before joining a small start-up biotech company called Sirius Genomics.

“At Sirius Genomics, we were trying to figure out which biomarkers might predict the efficacy of a given drug on a given individual, so that patients could receive personalized medical treatment based upon their genetic makeup.”

Statistical Genomics. From his work at Sirius, Zhang discovered a passion for applied biomedical statistics that drove him to complete a PhD in statistical genomics (also known as bioinformatics or computational biology) with a focus on high-throughput sequencing (HTS). HTS allows hundreds of millions of DNA fragments to be sequenced simultaneously, producing a complete genetic blueprint more quickly, and at a much lower cost, than ever before.

“I found that it’s better to connect mathematics to other fields,” says Zhang. “If my work can be used by others to solve their own research problems, that’s more rewarding.”

A postdoctoral fellowship at Johns Hopkins University saw Zhang turning his attention to “time-course” HTS, a method that captures genetic changes over time. This added one more dimension – time – to an already huge and complex set of genomics data.

In response, Zhang has developed algorithms in epigenomics that enable researchers to better understand the interactions that occur between DNA and protein, and to identify the genome regions where protein biomarkers (such as transcription factors and histone modifications) bind to DNA.

Zhang spent more than three years working with Eli Lilly and AstraZeneca before deciding to follow his heart and make the switch to academia, a change that allows him to pick the topics he wants to focus on. “I have seen both sides. I have some industry connections and I understand what they are looking for, which is important for conducting research that really benefits end-users.”

Proteomics. Zhang’s research topic of choice at UVic continues to be next-generation sequencing and big data analysis, along with proteomics, the study of an organism’s entire complement of proteins rather than its DNA.

“Looking at proteomics means that I can do more to help the people here at UVic, a lot of whom are studying proteins,” says Zhang. “The underlying algorithm should be very similar. It is simply the application that changes.”

Neuroimaging. Zhang is currently working with UVic’s Dr. Farouk Nathoo on integrated genomics and neuroimaging data. “We are trying to figure out which genes or DNA fragments are associated with functional changes in the brain by looking at the genomic variances in individuals. Is there a particular biomarker, or a distinct pattern, that we can associate with neurological changes?”

Another beneficiary of Zhang’s work is the BC Cancer Agency (BCCA), which produces a large volume of next-generation sequencing data. “They tell me what isn’t working for them or what they don’t like about the currently available software, and I develop novel algorithms to address their concerns.”

Collaboration is everything. “It’s hard to get access to good data. You need to have good collaborators,” says Zhang. “…and be lucky.” And Zhang has been lucky.

So far, his collaborative work has included successful partnerships with BCCA, Seattle’s Fred Hutchinson Cancer Research Center, Johns Hopkins, BC Children’s Hospital, St. Paul’s Hospital/UBC Medical School, and colleagues at UVic.

All of Zhang’s collaborators are interested in understanding the mechanisms that are at play within genomics, asking questions that might include:

•    What are the biomarkers that are associated with lung diseases, such as COPD and cancer?
•    Which genome regions (DNA segments) are enriched by the binding of a given transcription factor or histone modification?
•    What is the nature of the relationship between specific protein-binding events and the expression levels of genes? This question is akin to asking, “Which came first, the chicken or the egg?”

“I want to have more and more software and see this number going up and up,” says Zhang, pointing to the download count for PICS (Probabilistic Inference for ChIP-seq), one of the software packages he has developed.

“Ultimately, my goal is to build novel statistical methods for the genomics and proteomics research areas so that more researchers can access valuable software.”