Saturday, November 08, 2008

<R>andom Notes

1. How to estimate the running time of an R function?

R provides the function proc.time(), which reports how much CPU and elapsed time the current R process has used.
Sample code:
## a way to time an R expression: system.time is preferred
> ptm <- proc.time()
> for (i in 1:50) mad(stats::runif(500))
> proc.time() - ptm
user system elapsed 
0.039 0.001 0.052 

2. String manipulation in R

define a string
> s = "some characters"

convert a value of another type into a string
> s = as.character(some_variable_in_other_type)

convert a string into a number (avoid the name pi, which would mask R's built-in constant)
> x = as.numeric("3.14159")

string length
> nchar(s)

string concatenation
> s1 = "string1"
> s2 = "string2"
> paste(s1, s2, sep = "")

given a vector of strings, vs, return a string that is the concatenation of vs's elements
> vs = c("song", "qiang")
> paste(vs, collapse = "")
[1] "songqiang"

string slicing
Suppose s is a string. How do we extract a substring of s given a starting position and an ending position?
We use the following function. There is no default value for stop. If the value of stop is larger than the total
length of the string, it is truncated to the length of the string.
> substr(s, start = 1, stop = 12)

string split (the result is a list with one element per input string)
> strsplit("song qiang", split=" ")
[[1]]
[1] "song"  "qiang"

3. When making figures with a legend box, the text spills out of the legend box when we use dev.copy2eps() to convert the figure to an EPS file

This problem comes from the different specifications of font sizes on different devices. An ugly way to solve it is to specify text.width=strwidth("some string") in the call to legend(),
where "some string" refers to the longest legend text plus some extra characters. The optimal number of extra characters has to be determined by trial and error.

4. How to handle exceptions in R?
Read about the two functions try and tryCatch (R FAQ 7.32). An example with try is shown below:
for (i in 1:16) {
   result <- try(nonlinear_modeling(i))
   # skip this iteration if the call raised an error
   if (inherits(result, "try-error")) next
}

GNU/Linux Notes

1. How to speed up Linux booting?
Use Bootchart to profile the boot process,
then remove unnecessary services from it.

2. One important thing to remember when creating an SVN repository
In Subversion 1.1, a repository is created with a Berkeley
DB back-end by default. This behavior may change in future
releases. Regardless, the type can be explicitly chosen with
the --fs-type argument:
$ svnadmin create --fs-type fsfs /path/to/repos
$ svnadmin create --fs-type bdb /path/to/other/repos

Do not create a Berkeley DB repository on a network
share—it cannot exist on a remote
filesystem such as NFS, AFS, or Windows SMB. Berkeley DB
requires that the underlying filesystem implement strict POSIX
locking semantics, and more importantly, the ability to map
files directly into process memory. Almost no network
filesystems provide these features. If you attempt to use
Berkeley DB on a network share, the results are
unpredictable—you may see mysterious errors right away,
or it may be months before you discover that your repository
database is subtly corrupted.
If you need multiple computers to access the repository,
you create an FSFS repository on the network share, not a
Berkeley DB repository. Or better yet, set up a real server
process (such as Apache or svnserve), store
the repository on a local filesystem which the server can
access, and make the repository available over a network.
Chapter 6, Server Configuration covers this process in detail.

3. Count the number of files in a directory and its subdirectories

total number of files (-type f counts regular files only, not the directories themselves)
find some_directory -type f | wc -l

list the number of files in each directory in detail
#!/usr/bin/env python3

import os
import sys

def count(p):
    # A non-directory entry counts as a single file.
    if not os.path.isdir(p):
        print("%s\t%d" % (p, 1))
        return 1

    pls = os.listdir(p)
    s = 0
    for d in pls:
        # Join with the parent path; a bare name only resolves in the current directory.
        s += count(os.path.join(p, d))

    print("%s\t%d" % (p, s))
    return s

p = sys.argv[1]
count(p)
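
The same per-directory totals can also be collected without explicit recursion. The following is only a sketch built on os.walk from the standard library; the function name files_per_directory is my own choice and not part of the script above, and it returns a dictionary instead of printing.

```python
import os

def files_per_directory(root):
    """Return {directory: number of non-directory entries at or below it}."""
    totals = {}
    # Walk bottom-up so each child's total exists before its parent needs it.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        total = len(filenames)
        for name in dirnames:
            # os.walk does not descend into symlinked directories by default,
            # so they have no entry in totals and contribute nothing here.
            total += totals.get(os.path.join(dirpath, name), 0)
        totals[dirpath] = total
    return totals
```

Calling files_per_directory(".") and printing the sorted items gives one line per directory, similar to the output of the script above but without the per-file lines.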

4.  Ubuntu DNS Server Problem
Problem Description: I run Ubuntu 9.04 on my computer and use Wicd (Wired and Wireless Network Manager) to configure network settings. However, sometimes when I use a wireless network, Wicd is able to connect to the router (it is pingable), but it fails to resolve domain names, so something is wrong with the DNS server settings.

Tentative Solution: 1) First disable all settings related to DNS inside Wicd, i.e. use neither a static nor a global DNS server; 2) edit /etc/resolv.conf and add available DNS servers; 3) restart the computer; 4) [Optional] sometimes, if we configure Wicd to connect automatically and use a static DNS server, Wicd freezes while setting the static server. In this case, we can edit /etc/wireless-settings.conf to disable automatic connection and the static DNS server.

5. How to rename files or directories in order to remove white spaces in the filename?
for i in *" "*; do
     mv "$i" "$(echo "$i" | sed 's/ /-/g')"
done
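
For comparison, here is a rough Python sketch of the same renaming, which sidesteps shell quoting altogether. The function names dash_name and rename_spaces are made up for this note, and the choice to skip a file when the target name already exists is my assumption about what is safest, not something the shell loop does.

```python
import os

def dash_name(name):
    """Return name with every space replaced by a hyphen."""
    return name.replace(" ", "-")

def rename_spaces(directory="."):
    """Rename entries whose names contain spaces; return the (old, new) pairs."""
    renamed = []
    for old in sorted(os.listdir(directory)):
        new = dash_name(old)
        if new == old:
            continue  # no space in this name
        src = os.path.join(directory, old)
        dst = os.path.join(directory, new)
        # Skip rather than overwrite if the target name is already taken.
        if os.path.exists(dst):
            continue
        os.rename(src, dst)
        renamed.append((old, new))
    return renamed
```

Like the shell loop, this only touches the entries directly inside the given directory; it does not recurse into subdirectories.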

6. How to backup files (or directories) with tar and 7-zip?
First we create tarballs with the tar utility and then compress the tarballs with the 7z program. If the content of the files is sensitive, you can encrypt it with the built-in encryption option in 7z or with GPG. The code is as follows:
for i in *; do
     tar cfv "$i.tar" "$i" && \
     7z a "$i.tar.7z" "$i.tar"
     # optionally remove the originals once the archive has been verified:
     # rm -rf "$i" "$i.tar"
done
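
If 7z is not installed, a similar one-archive-per-item backup can be sketched with Python's standard tarfile module. Note that this is a substitute technique, not the same thing: it writes a .tar.xz file (LZMA-based compression, as in 7z) rather than a .7z archive, and it does no encryption. The function name backup is hypothetical.

```python
import os
import tarfile

def backup(path):
    """Write path (a file or directory) to path + '.tar.xz'; return the archive name."""
    path = path.rstrip(os.sep)
    archive = path + ".tar.xz"
    with tarfile.open(archive, "w:xz") as tar:
        # arcname keeps only the last path component inside the archive.
        tar.add(path, arcname=os.path.basename(path))
    return archive
```

As with the shell loop, the originals are left in place; delete them only after checking that the archive can be read back.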

7. How do I output only the part of a line that matches a regex pattern?
Use grep -o PATTERN.
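
To make the behaviour concrete, here is a small Python sketch of what grep -o does, using re.finditer. The function name grep_o is invented for illustration, and Python's regex syntax is not identical to grep's default basic regular expressions, so patterns may need adjusting.

```python
import re

def grep_o(pattern, lines):
    """Yield each non-overlapping match of pattern, line by line."""
    regex = re.compile(pattern)
    for line in lines:
        for m in regex.finditer(line):
            # grep -o prints only non-empty matches, so skip empty ones here too.
            if m.group(0):
                yield m.group(0)
```

Each match comes out as its own item, just as grep -o prints each match on its own line; lines without a match produce nothing.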