Grand Prismatic Spring Lab: 2007

Sunday, November 25, 2007

How to read and write NTFS partition in Mac OS X 10.5 Leopard

Boot Camp in Mac OS X 10.5 Leopard lets users easily install Windows XP alongside Mac OS X on an Intel-based Apple computer. However, Mac OS X 10.5 can natively read, but not write, NTFS partitions, which Windows XP commonly uses. To both read and write the Windows partition from Mac OS, we can format the Windows partition as FAT. Here is another approach that bypasses this limitation.

1. Download and install MacFUSE for Mac OS X 10.5
2. Download and install NTFS-3g for Mac OS
3. Restart! If you are lucky, you can now read and write your Windows NTFS partition.

This approach works on my Mac OS X 10.5 with the 10.5.1 update and Windows XP SP2 on an NTFS partition, on a MacBook Pro (Model Identifier MacBookPro3,1).

Before you decide to proceed, google "read write NTFS Mac OS X leopard" to learn about the newest developments, and check for the latest versions of MacFUSE and NTFS-3g.

Reference
1. MacFUSE http://code.google.com/p/macfuse/
2. NTFS-3g http://www.ntfs-3g.org/
3. NTFS-3g for Mac OS http://macntfs-3g.blogspot.com/
4. Filesystem in Userspace http://en.wikipedia.org/wiki/Filesystem_in_Userspace
5. Filesystem in Userspace http://fuse.sourceforge.net/wiki/index.php/FileSystems
6. NTFS on your Mac http://www.tuaw.com/2007/11/19/ntfs-on-your-mac-two-ways/

Monday, March 12, 2007

Modeling Biomedical Networks

steady state vs. equilibrium
If the rates of change of all variables (species concentrations) are zero, i.e. all concentrations stay constant, we have a steady state. If additionally all reaction fluxes are zero, we have an equilibrium.

Calculating steady state
There are several numerical methods for calculating a steady state, such as the improved Newton method, forward integration and backward integration. However, none of them is guaranteed to find a steady state in complex systems, which may have several steady states.
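To make the point about several steady states concrete, here is a minimal sketch on a made-up one-variable system. The rate law, the function names and the step sizes are all assumptions for illustration, not taken from COPASI. It uses a plain Newton iteration and a forward Euler integration, not the improved Newton method mentioned above.

```python
def rate(x):
    """Hypothetical rate law dx/dt with steady states at x = 1, 2 and 3."""
    return -(x - 1.0) * (x - 2.0) * (x - 3.0)


def rate_derivative(x, h=1e-6):
    """Central finite-difference estimate of d(rate)/dx."""
    return (rate(x + h) - rate(x - h)) / (2.0 * h)


def newton_steady_state(x0, tol=1e-10, max_iter=50):
    """Plain Newton iteration on rate(x) = 0; returns None if it fails."""
    x = x0
    for _ in range(max_iter):
        fx = rate(x)
        if abs(fx) < tol:
            return x
        slope = rate_derivative(x)
        if slope == 0.0:
            return None
        x -= fx / slope
    return None


def forward_integration(x0, dt=0.01, t_end=50.0, tol=1e-10):
    """Explicit Euler integration until dx/dt is (numerically) zero."""
    x = x0
    for _ in range(int(t_end / dt)):
        dx = rate(x)
        if abs(dx) < tol:
            return x
        x += dt * dx
    return None


if __name__ == "__main__":
    # Same starting point, different answers depending on the method.
    print("Newton from 1.9:", newton_steady_state(1.9))
    print("Forward integration from 1.9:", forward_integration(1.9))
    print("Forward integration from 2.1:", forward_integration(2.1))
```

In this toy system, Newton started at 1.9 lands on the unstable steady state at x = 2, while forward integration from the same point drifts to the stable state at x = 1 and can never reach x = 2. Which steady state you get depends on both the method and the initial guess.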

Metabolic Control Analysis
MCA describes how the system reacts to changes in parameters. Elasticities describe how the reaction rates depend on the metabolite concentrations. Control coefficients describe how the system's behavior depends on the reaction rates.
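A small numerical sketch may help here. It assumes a hypothetical two-step pathway S -> X -> P with a fixed substrate S and simple mass-action rate laws; the rate laws, parameter values and function names are my own choices for illustration. Elasticities and flux control coefficients are estimated as scaled (log-log) derivatives by finite differences.

```python
import math

S = 10.0  # fixed substrate concentration (hypothetical value)


def v1(x, e1):
    """Rate of step 1 (S -> X), reversible mass action, enzyme level e1."""
    return e1 * (S - x)


def v2(x, e2):
    """Rate of step 2 (X -> P), irreversible mass action, enzyme level e2."""
    return e2 * x


def steady_state_x(e1, e2):
    """Steady-state concentration of X, from v1 = v2 (closed form for this toy)."""
    return e1 * S / (e1 + e2)


def steady_state_flux(e1, e2):
    """Steady-state pathway flux J."""
    return v2(steady_state_x(e1, e2), e2)


def log_sensitivity(func, value, rel_step=1e-6):
    """Scaled sensitivity d ln(func) / d ln(value) by central differences."""
    hi = value * (1.0 + rel_step)
    lo = value * (1.0 - rel_step)
    return (math.log(func(hi)) - math.log(func(lo))) / (math.log(hi) - math.log(lo))


E1, E2 = 2.0, 3.0
X_SS = steady_state_x(E1, E2)

# Flux control coefficients: response of the steady-state flux to each enzyme.
C1 = log_sensitivity(lambda e: steady_state_flux(e, E2), E1)
C2 = log_sensitivity(lambda e: steady_state_flux(E1, e), E2)

# Elasticities: local response of each rate to X, at the steady state.
EPS1 = log_sensitivity(lambda x: v1(x, E1), X_SS)
EPS2 = log_sensitivity(lambda x: v2(x, E2), X_SS)

if __name__ == "__main__":
    print("C1, C2:", C1, C2)
    print("summation check (C1 + C2):", C1 + C2)
    print("connectivity check (C1*eps1 + C2*eps2):", C1 * EPS1 + C2 * EPS2)
```

For this toy pathway the analytic values are C1 = e2/(e1+e2) = 0.6 and C2 = e1/(e1+e2) = 0.4, so the control coefficients sum to 1 (summation theorem), and C1*eps1 + C2*eps2 is 0 (connectivity theorem). The finite-difference estimates should agree with these up to numerical error.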

References:
http://projects.eml.org/downloads/copasi/CopasiTutorial.pdf

Wednesday, March 07, 2007

Install Matlab R2006b

I decided to reinstall MATLAB R2006b, mostly because of a new toolbox, SimBiology.

SimBiology extends MATLAB with tools for modeling, simulating, and analyzing biochemical pathways. You can create your own block diagram model using predefined blocks. You can manually enter in species, parameters, reactions, rules, kinetic laws, and units, or read in Systems Biology Mark-Up Language (SBML) models. SimBiology lets you simulate a model using stochastic or deterministic solvers and analyze your pathway with tools such as parameter estimation and sensitivity analysis.

First get the following MATLAB ISO images from ftp://pxe/software/Matlab2006b (probably only available on the USTC LAN):

[Mathworks.Matlab].Mathworks.Matlab.R2006b.UNIX.ISO-TBE-CD1.iso
[Mathworks.Matlab].Mathworks.Matlab.R2006b.UNIX.ISO-TBE-CD2.iso
[Mathworks.Matlab].Mathworks.Matlab.R2006b.UNIX.ISO-TBE-CD3.iso
[Mathworks.Matlab].Mathworks.Matlab.R2006b.UNIX.ISO-TBE.nfo

Mount these images, go to the directory where you want to install MATLAB, and create a matlab directory ($MATLAB).

Copy the license file from the first CD. There are two license files in CD1/crack: license_locked.dat and license_server.dat. I copied license_locked.dat to $MATLAB and renamed it license.dat. Enter $MATLAB and run CD1/install. The graphical installer is easy to complete.

When I finished the normal install and tried to run MATLAB, it popped up a very lengthy error message:

java.lang.ExceptionInInitializerError
at com.mathworks.mde.filebrowser.FileBrowser.(FileBrowser.java:92)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)

and crashed afterwards. But when I ran matlab -nojvm, it worked normally.

Solution: the Java runtime that ships with MATLAB caused the above error. I replaced it with my own version of Java (jre1.5.0_06):

cd $MATLAB/sys/java
mv java java-backup
ln -s path_of_your_own_java java

After that, MATLAB works. Bingo!

PS: kkk recommended another standalone program, COPASI, for building and simulating biochemical networks. Have a look at it.

COPASI is a software application for simulation and analysis of biochemical networks. COPASI — a COmplex PAthway SImulator. Bioinformatics 22, 3067-74.

Current Features:
Stochastic and deterministic time course simulation
Steady state analysis (including stability)
Metabolic control analysis / sensitivity analysis
Elementary mode analysis
Mass conservation analysis
Calculation of Lyapunov exponents
Parameter scans
Optimization of arbitrary objective functions
Parameter estimation using data from time course and/or steady state experiments
Sliders for interactive parameter changes
Global parameter to change multiple kinetic rates at once
Imports and exports SBML (export only in level 2 version 1, import all levels)
Loads Gepasi files
Export in Berkeley Madonna format and C source code of the ODE system generated from the chemical reactions
Versions for MS Windows, Linux, OS X, and Solaris SPARC
Command line version for batch processing
Visit this page often, new releases will contain many more features!

Still No Sense of Signaling Network Research

As graduation approaches, I still have no clear sense of my research subject, the insulin signaling network. I have to admit my laziness, but it is mostly because this is a very new and unclear research area. If I had started with traditional research (cell culture, gene cloning and protein purification), I would have mostly finished my research by now. Now it is too late to switch to an easier topic, and it would be stupid to do so. Thankfully, I have read many enlightening papers in this area and learned to use some software, so why should I give up? It won't be very difficult to graduate no matter what research you have done. It is just a try.

After I realized this, I decided to read the publications in this area systematically. Today I am reading the Science STKE Signaling Breakthroughs of the Year. And here is another list of papers to read (the number of papers in this list is increasing exponentially; I don't know when I will get a sense of them all):

[1] G. Altan-Bonnet, R. N. Germain, Modeling T cell antigen discrimination based on feedback control of digital ERK responses. PLoS Biol. 3, e356 (2005).

[2] J. R. Pomerening, S. Y. Kim, J. E. Ferrell, Jr., Systems-level dissection of the cell-cycle oscillator: Bypassing positive feedback produces damped oscillations. Cell 122, 565–578 (2005).

[3] O. Brandman, J. E. Ferrell, Jr., R. Li, T. Meyer, Interlinked fast and slow positive feedback loops drive reliable cell decisions. Science 310, 496–498 (2005).

Friday, March 02, 2007

Paper Analysis -2007-03-02

Reconstruction of Cellular Signaling Networks and Analysis of Their Properties Nature Reviews Molecular Cell Biology 6, 99-111 (2005); doi:10.1038/nrm1570

A NETWORK RECONSTRUCTION includes a chemically accurate representation of all of the biochemical events that are occurring within a defined signalling network, and incorporates the interconnectivity and functional relationships that are inferred from experimental data.

This article gives an enlightening theoretical analysis of signal transduction networks: the order of magnitude of the numbers of network components (receptors, kinases, phosphatases) and the order of magnitude of interconnectivity (~2.5 degrees of interconnectivity per component). We can use combinatorial complexity to characterize this idea. The catalog of network components without post-translational modification (PTM) can be inferred from the results of genome annotation. The spectrum of network components after PTM and protein-protein interaction during various states of the network is expected to be assayed with future proteomic experimental techniques (though I am pessimistic about this expectation). But what is the use, or what are the consequences, of this large potential spectrum of network components?

The following papers it cites may be worth reading.
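As a back-of-the-envelope illustration of combinatorial complexity, the sketch below counts the distinct states of a single protein. It assumes (my simplification, not a figure from the article) that each modification or binding site is independent and has a fixed number of states. The function name and the site counts are hypothetical.

```python
def count_states(states_per_site):
    """Number of distinct molecular states when every site varies independently."""
    total = 1
    for n_states in states_per_site:
        total *= n_states
    return total


if __name__ == "__main__":
    # A hypothetical protein with 9 sites, each either modified or not.
    print(count_states([2] * 9))
    # Adding one binding site with 3 states (free, partner A, partner B).
    print(count_states([2] * 9 + [3]))
```

Nine binary sites already give 2^9 = 512 states for one protein, and every extra site multiplies the count. A modest catalog of components can therefore expand into a very large spectrum of network states.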

[1]
Papin, J. A. & Palsson, B. O. The JAK–STAT signaling network in the human B-cell: an extreme signaling pathway analysis. Biophys. J. 87, 37–46 (2004).

[2]
Resat, H., Wiley, H. S. & Dixon, D. A. Probability-weighted dynamic Monte Carlo method for reaction kinetics simulations. J. Phys. Chem. B 105, 11026–11034 (2001)

[3]
Bhalla, U. S. & Iyengar, R. Emergent properties of networks of biological signaling pathways. Science 283, 381–387 (1999).
Describes some of the first large-scale analyses of signalling reactions.

[4]
Hoffmann, A., Levchenko, A., Scott, M. L. & Baltimore, D. The IkappaB–NF-kappaB signaling module: temporal control and selective gene activation. Science 298, 1241–1245 (2002).
Shows the powerful integration of mathematical modelling with experimental investigation

[5]
Lee, E., Salic, A., Kruger, R., Heinrich, R. & Kirschner, M. W. The roles of APC and Axin derived from experimental and theoretical analysis of the Wnt pathway. PLoS Biol. 1, 116–132 (2003).

[6]
Prill, R., Iglesias, P.A. and Levchenko, A. Dynamic Properties of Small Regulatory Motifs Contribute to Biological Network Organization. PLoS Biology 3(11): e343 (2005)

[7]
Sivakumaran, S., Hariharaputran, S., Mishra, J. & Bhalla, U. S. The database of quantitative cellular signaling: management and analysis of chemical kinetic models of signaling networks. Bioinformatics 19, 408–415 (2003)

Thursday, March 01, 2007

Omics is Just a Startup

While listening to the talk titled Using Genomics to Explore the Microbial World by Prof. James Tiedje this afternoon, one idea haunted my mind the whole time. "Omics is dead": I forget where I read this remark, but it struck me then and still does. Omics is like listing all the components of a computer. However, due to technical limitations and time constraints, we will never be able to get a full list of genes and proteins, despite the optimistic promises of genomics and proteomics. Even if we could get the full catalogue of the human machine, we still could not understand how the human body functions and malfunctions, just as knowing all the components of a computer does not necessarily imply understanding how it works.

Now, besides proteomics and genomics, here comes metabolomics, with similarly promising declarations. As the latest Nature essay (Meet the human metabolome) states,

Metabolomics is the study of the raw materials and products of the body's biochemical reactions, molecules that are smaller than most proteins, DNA and other macromolecules. The aim is to be able to take urine, blood or some other body fluid, scan it in a machine and find a profile of tens or hundreds of chemicals that can predict whether an individual is on the road to a disease, say, or likely to experience side-effects from a particular drug.

In fact, researchers in metabolomics are even more optimistic, declaring that

Small changes in the activity of a gene or protein (which may have an unknown impact on the workings of a cell) often create a much larger change in metabolite levels particular concentrations and combinations can reveal something about drugs or disease

However, I am suspicious of their promise. First, considering the great diversity of metabolites in human fluids, we still do not have an assay powerful enough to identify all metabolites in a high-throughput manner and measure their concentrations. Second, changes in the metabolome are more susceptible to environmental factors, so it will be difficult to tell significant changes related to human diseases from temporal fluctuations.

Anyway, let's be a little optimistic: omics is just a startup!

Monday, February 05, 2007

Uwe Ohler

Uwe Ohler's research focuses on sequence analysis. His previous and current research projects include: Regulation of gene expression in Arabidopsis root development; Prediction and validation of skipped mammalian exons; Analysis of transcription start sites in fungal genomes; Motif finding with Bayesian approaches; Identification of core promoter elements in Drosophila; Post-transcriptional regulation with RNA-binding proteins; Regulation of neuronal gene expression in C. elegans Pavel Tomancak; Embryonic expression patterns in Drosophila. It is worth mentioning that his research also deals with gene expression analysis, but I am not familiar with his thinking and methods in this field. So I will focus on part of his research: alternative splice site identification and promoter prediction.

Ohler U, Shomron N, Burge CB (2005) Recognition of Unknown Conserved Alternatively Spliced Exons. PLoS Comput Biol 1(2): e15 doi:10.1371/journal.pcbi.0010015

Ohler collaborates with Christopher B. Burge of MIT, probably a big name in this area. Pay attention to him.

What is the use of identifying alternative splicing sites? The authors say that "The identification of such variants has until recently relied solely on the sequencing and comparison of expressed sequence tags (ESTs), but the number of available ESTs is not large enough to cover all variants under all conditions". According to a Nature Reviews Genetics article, which I reviewed in my last post, microarray platforms for finding unknown exons are under development. Probably even a microarray experiment still cannot cover all variants under all conditions. Thus a preliminary computational prediction yields many possible alternative splicing sites, many of which may be false positives, and these can then be tested by a microarray experiment. Such predictions may also help in designing the array.

Method: pair hidden Markov model

Patterns of flanking sequence conservation and a characteristic upstream motif for microRNA gene identification RNA (2004), 10:1309-1322

Quantification of transcription factor expression from Arabidopsis images Bioinformatics 2006 22(14):e323-e331; doi:10.1093/bioinformatics/btl228

Despite the great success of the microarray technique in gene expression profiling, it fails to detect spatial features of gene expression, whereas confocal microscopy can provide quantitative information on gene expression with greater spatial and temporal resolution. This paper describes a software protocol for analyzing confocal microscopy images. (How is the high throughput achieved?)

image registration
GFP transcriptional fusion: GFP serves as a marker of mRNA expression level
GFP translational fusion

Monday, January 29, 2007

Paper Analysis: Microarray technology: beyond transcript profiling and genotype analysis

Microarray technology: beyond transcript profiling and genotype analysis
Nature Reviews Genetics 7, 200-210 (March 2006) | doi:10.1038/nrg1809

I have spent nearly three days reading this review on microarrays. It is partly because this paper involves too many new concepts for me to digest, and partly because, I have to admit, I have wasted too much time on BBS, films and music ^_^. Even now I cannot claim to have absorbed all the material in this paper, but I think it is better to take some notes here, as it may push me to concentrate on research.

This paper describes the following microarray developments:

Process	Status*
Transcriptional profiling	Mature, but still to be improved
Genotyping	Mature, but still to be improved
Splice-variant analysis	In progress
Identification of unknown exons	Early stages
DNA-structure analysis	Pilot phase
ChIP-on-chip	In progress
Protein binding	Under development
Protein–RNA interaction	Idea
Chip-based CGH	In progress
Epigenetic studies	Under development
DNA mapping	Mature
Resequencing	In progress
Large-scale sequencing	Under development
Gene/genome synthesis	Early stages
RNA/RNAi synthesis	Pilot phase
Protein–DNA interaction	Under development
On-chip translation	Under development
Universal microarray	Under development

* From most to least developed: mature, in progress, under development, early stages, pilot phase, idea. CGH, comparative genomic hybridization; ChIP-on-chip, on-chip chromatin immunoprecipitation.

He thinks transcriptional profiling is relatively mature as a technique, but that data analysis and interpretation still need improvement. Some organizations are working on this, such as the Microarray Gene Expression Data (MGED) Society, the Gene Ontology Consortium and Bioconductor.

Expanding RNA studies: the transcribed RNA profile is a mixture of pre-mRNA, various forms of alternatively spliced mature mRNA, non-coding RNA and regulatory RNA. If we consider the effect of alternative splicing, we may be unaware of other forms and exons in the genome sequence that are not seen in our experimental samples. To find these other exons and the conditions under which they are retained in mature mRNA, we can build an array consisting of oligonucleotides representing all known exons from genome annotation analysis. This array can then be used for that purpose.

Another question is how to find exons that escape the notice of genome annotation analysis. "One option is to synthesize oligonucleotides that correspond to the sequences at the exon–intron boundaries with their 5' ends attached to the chip surface"

Another approach is the whole-genome microarray (tiling path), but the fragments are rather long, so it may miss some active sites of interest.

ChIP-on-chip: on-chip chromatin immunoprecipitation. But how does this technique achieve high throughput if only one kind of protein can be precipitated at a time, due to the specificity of antibody binding? I need more reading to understand this technique.

The author also predicts that "all analyses that are carried out with DNA are feasible at the level of RNA also."

Comparative genomic hybridization (CGH): a method used to analyse variations in DNA copy number.

The next part of the paper describes on-demand synthesis based on microfluidic microarrays, such as probe production (parallel production of large numbers of different oligomers), gene synthesis, RNAi production and in situ protein synthesis. Finally, he introduces a universal microarray platform based on L-DNA with great enthusiasm.

Conclusions:
1. To some extent, the microarray technique represents a new data-driven method, i.e. placing data production before intellectual concepts. This method differs from traditional hypothesis-driven research in biology but has been successful in physics.
2. The global view obtained by microarray approaches might lead researchers to appreciate the complexity of biological systems more fully.
3. Experimental multiplexing by analysing different processes on a single system platform will become important. In vitro systems biology will emerge, competing with (or complementing) in silico systems biology.

Here is a list of notable research projects on microarray analysis: http://filtr.blogspot.com/2007/02/research-projects-on-microarray.html

Monday, January 22, 2007

Xianghong Zhou's Papers

If you know the enemy and know yourself, you need not fear the result of a hundred battles.

-- Sun Tzu, The Art of War

Comments on Zhou's papers:
1. Gene Aging Nexus: A Web Database and Data Mining Platform for Microarray Data on Aging
keywords:
meta-analysis: by first extracting expression patterns from individual microarray datasets and then identifying recurrent signals, these approaches may enhance signal-noise separation.
differential expression analysis:
co-expression analysis: Zhou proposed a new method to mine regulatory modules in a previous paper, Mining dense subgraphs across massive biological networks for functional discovery.
No major biological breakthrough.

2. Integrative missing value estimation for microarray data
Question Answered:
Due to the inherent noise and the limitations of experimental systems, a microarray dataset on average has more than 5% missing values, affecting more than 60% of the genes. Such missing values make some subsequent analysis methods inapplicable or greatly decrease their performance. Hence the problem of missing value estimation.

Basic Idea:
How to choose neighboring genes when not enough information is available within a single microarray dataset? Intuitively, if a set of genes frequently shows expression similarity to the target gene over multiple data sets, they constitute a robust neighborhood that tends to show expression co-variation with the target gene.
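The neighbor-selection idea can be sketched as a simple voting scheme. This is my own simplified illustration, not the algorithm from Zhou's paper: in each dataset, the genes most correlated with the target get a vote, and genes with enough votes across datasets form the robust neighborhood. The function names, the `top_k` and `min_support` parameters and the toy expression values are all hypothetical. The estimation step itself (for example a local least squares fit on the chosen neighbors) is left out.

```python
from collections import Counter
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length profiles (0.0 if one is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def robust_neighbors(target, datasets, top_k=1, min_support=2):
    """Genes ranked among the top_k correlates of target in >= min_support datasets."""
    votes = Counter()
    for data in datasets:
        if target not in data:
            continue
        scores = {gene: pearson(data[target], profile)
                  for gene, profile in data.items() if gene != target}
        ranked = sorted(scores, key=scores.get, reverse=True)
        for gene in ranked[:top_k]:
            votes[gene] += 1
    return sorted(gene for gene, count in votes.items() if count >= min_support)


# Three toy datasets: gene -> expression profile across that dataset's samples.
DATASETS = [
    {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [1, 3, 2, 4], "D": [4, 1, 3, 2]},
    {"A": [5, 3, 1, 2], "B": [10, 6, 2, 4], "C": [1, 2, 3, 4], "D": [2, 2, 1, 3]},
    {"A": [1, 3, 2, 5], "B": [3, 1, 2, 2], "C": [1, 3, 2, 5], "D": [1, 1, 2, 1]},
]

if __name__ == "__main__":
    print(robust_neighbors("A", DATASETS))
```

In the toy data, gene B is the top correlate of A in two of the three datasets and C in only one, so with `min_support=2` only B is kept as a robust neighbor.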

other concepts:
LLS Local Least Square
Bayesian principal component analysis
singular value decomposition
support vector machines
