Home

May 3, 2021

An easy way to get semi-transparent colors in R is to use adjustcolor:

barplot(c(1, 2, 3), col = adjustcolor("firebrick", 0.5),
        main = "Firebrick bar plot with alpha=0.5")
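For intuition about what the alpha argument does to a color, here is a minimal Python sketch (not R's implementation) that appends an alpha channel to a hex color. The helper name with_alpha is hypothetical, and #B22222 is assumed as the hex code for firebrick.

```python
def with_alpha(hex_rgb, alpha):
    """Return an #RRGGBBAA string for an #RRGGBB color and alpha in [0, 1]."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    # Scale alpha to the 0..255 range, rounding half up
    alpha_byte = int(alpha * 255 + 0.5)
    return f"{hex_rgb.upper()}{alpha_byte:02X}"


# Assumed hex code for "firebrick"
semi_transparent = with_alpha("#B22222", 0.5)
print(semi_transparent)
```

The last two hex digits carry the transparency, so the same color can be reused at several alpha levels.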

April 26, 2021

Create a simple SQLite database:

sqlite3 test.sqlite3
sqlite> .databases -- Show the databases

  main: /home/weigand/fossils/tilor/test.sqlite3

ATTACH DATABASE 'test.sqlite3' as 'project';

.databases

  main: /home/weigand/fossils/tilor/test.sqlite3
  project: /home/weigand/fossils/tilor/test.sqlite3

DROP TABLE IF EXISTS person;
CREATE TABLE IF NOT EXISTS project.person(
  id INT PRIMARY KEY NOT NULL,
  date_birth CHAR(10),
  sex CHAR(2) CHECK(sex == 'M' OR sex == 'F')
);

INSERT INTO project.person VALUES(1, '2010-01-01', 'M');
INSERT INTO project.person VALUES(2, '2011-11-11', 'F');

SELECT * FROM main.person;

   1|2010-01-01|M
   2|2011-11-11|F
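The same table can be tried from Python's standard sqlite3 module. This is a minimal sketch that assumes an in-memory database instead of test.sqlite3 and skips the ATTACH step; it also checks that the CHECK constraint rejects a value other than 'M' or 'F'.

```python
import sqlite3

# In-memory database so the sketch leaves no file behind
con = sqlite3.connect(":memory:")
con.execute(
    """
    CREATE TABLE person(
      id INT PRIMARY KEY NOT NULL,
      date_birth CHAR(10),
      sex CHAR(2) CHECK(sex == 'M' OR sex == 'F')
    )
    """
)
con.execute("INSERT INTO person VALUES(1, '2010-01-01', 'M')")
con.execute("INSERT INTO person VALUES(2, '2011-11-11', 'F')")

rows = con.execute("SELECT * FROM person ORDER BY id").fetchall()
print(rows)

# A value outside the CHECK constraint should be rejected
try:
    con.execute("INSERT INTO person VALUES(3, '2012-12-12', 'X')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)
```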

April 14, 2021

When using smbclient on Linux to put a file on a Windows share, I wasn't specifying the local file path correctly. I learned that the lcd command in smbclient sets the local current directory, after which I can put the file by name without a full or relative path. Like this:

PASSWD=`grep -woi -m 1 '^machine mymachine login .* password .*$' ~/.netrc | cut -d ' ' -f 6`
smbclient //share.example.com/DirectoryX \
  -W MYWORKGROUP  \
  -E       \
  -U $USER $PASSWD \
  --command 'cd DirA/DirB ;
             lcd /local/working/directory ;
             put localfile.csv ;
             exit'
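As an alternative to the grep and cut pipeline, Python's standard netrc module can look up the password. This is a sketch only: the helper name netrc_password is hypothetical, and the demo writes a throwaway file with made-up credentials rather than touching the real ~/.netrc.

```python
import netrc
import os
import tempfile


def netrc_password(machine, path=None):
    """Return the password stored for `machine`, or None if there is no entry.

    With path=None the netrc module reads ~/.netrc.
    """
    entry = netrc.netrc(path).authenticators(machine)
    if entry is None:
        return None
    _login, _account, password = entry
    return password


# Demonstrate with a throwaway file and made-up credentials
with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "netrc")
    with open(demo, "w") as f:
        f.write("machine mymachine login someuser password s3cret\n")
    found = netrc_password("mymachine", demo)
    missing = netrc_password("othermachine", demo)

print(found, missing)
```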

March 17, 2021

This is how to get interval estimates for random effects from a linear mixed effects model fit with lme4::lmer in R using the unexported lme4:::asDf0 function.

library(lme4)
library(lattice)
fit <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)

dotplot(ranef(fit)) # Caterpillar plot shows intervals

out <- lme4:::asDf0(ranef(fit), "Subject")
out$lcl <- out$values - 1.96 * out$se
out$ucl <- out$values + 1.96 * out$se

out[c(1:2, 35:36), ]

    values         ind .nn   se
1    2.259 (Intercept) 308 12.1
2  -40.399 (Intercept) 309 12.1
35  -0.988        Days 371  2.3
36   1.284        Days 372  2.3
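The interval arithmetic is just value ± 1.96 × se. Here is a small Python sketch of that calculation using two rows copied from the printed output. Because the printed se values are rounded, the limits are approximate; the function name wald_interval is hypothetical.

```python
Z_95 = 1.96  # normal quantile used in the R code above


def wald_interval(value, se, z=Z_95):
    """Return (lower, upper) limits as value -/+ z * se."""
    return (value - z * se, value + z * se)


# (value, se) pairs copied from rows 1 and 35 of the printed output
lcl_1, ucl_1 = wald_interval(2.259, 12.1)
lcl_35, ucl_35 = wald_interval(-0.988, 2.3)

print(round(lcl_1, 3), round(ucl_1, 3))
print(round(lcl_35, 3), round(ucl_35, 3))
```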

March 10, 2021

On Linux, appending & to a command runs it in the background. For example:

emacs README.md &

From PowerShell, the equivalent seems to be:

Start-Process -NoNewWindow emacs.exe README.md

The above is from a Web search where I landed on https://ariefbayu.xyz/run-background-command-in-powershell-8ea86436684e. That post shows how to make a general run-it-in-the-background command.

function bg() {
     Start-Process -NoNewWindow @args
}

And then I might use:

bg emacs.exe README.md
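A shell-independent way to get the same effect is Python's subprocess.Popen, which starts a process and returns immediately. This is a minimal sketch; the child here is a stand-in Python process, and an editor command would be passed the same way as a list of arguments.

```python
import subprocess
import sys

# Start a child process and return immediately, like `cmd &`.
# The child is a stand-in; a list such as ["emacs", "README.md"]
# would be launched the same way.
child = subprocess.Popen(
    [sys.executable, "-c", "print('running in background')"]
)

# ... the parent is free to do other work here ...

# Optional: collect the child's exit status when convenient
exit_code = child.wait()
print(exit_code)
```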

March 3, 2021

The journal NeuroImage uses an open source font called Charis SIL. The font can be downloaded, and on my department's Linux system I just have to put the *.ttf files in ~/.fonts. Then I can create figures in "native" NeuroImage style:

library(ggplot2)

ggplot(cars, aes(x = speed, y = dist)) +
    geom_point() +
    labs(title = "Scatter plot in Charis SIL font used by NeuroImage",
         x = "Speed (miles per hour)",
         y = "Distance (feet)") +
    theme_light(base_family = "CharisSIL")

Scatter plot of distance versus speed demonstrating CharisSIL font

December 12, 2020

This is how I used Dropbox's Python API to upload my son's piano recital video. (My daughter's video was smaller and uploaded fine through the Dropbox web interface before timing out.)

My guide was Dropbox for Python Developers.

  1. Set up a Python virtual environment and installed dropbox

    • python3.7 -m venv ~/venvs/recital
    • source ~/venvs/recital/bin/activate
    • pip install --upgrade pip
    • pip install dropbox
  2. Linked my Dropbox account to an app

    • Went to the app console
    • Created the app and named it "recital". To keep it simple I gave the app read and write permissions across my whole Dropbox account.
    • Generated a short-lived access token (it lasts for four hours and is quite long)
  3. Wrote Python code to create a Dropbox instance, read the file into an object, and upload the object

import dropbox
from dropbox.files import WriteMode


dbx = dropbox.Dropbox('AVerySpecial138CharacterAccessToken')

with open('/home/weigand/recital/piano.m4v', 'rb') as f:
    data = f.read()

dbx.files_upload(data,                         # My file
                 '/recital-folder/piano.m4v',  # Full path of destination
                 mode=WriteMode('overwrite'))  # Want to overwrite

This obviously isn't "professional grade" Python. It works but doesn't handle errors and hard codes the filename.
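A slightly more defensive version might look like the sketch below. It takes the paths as arguments and reports the most likely failures. The environment variable name DROPBOX_TOKEN and the function names are hypothetical; the dropbox import sits inside main so that upload_file can be tried with any object that has a files_upload method.

```python
import os
import sys
from pathlib import Path


def upload_file(dbx, local_path, remote_path, mode):
    """Upload local_path to remote_path using an already created client."""
    path = Path(local_path)
    if not path.is_file():
        raise FileNotFoundError(f"No such file: {path}")
    return dbx.files_upload(path.read_bytes(), remote_path, mode=mode)


def main(argv):
    # Imported here so upload_file can be tried without the dropbox package
    import dropbox
    from dropbox.exceptions import ApiError, AuthError
    from dropbox.files import WriteMode

    token = os.environ.get("DROPBOX_TOKEN")  # hypothetical variable name
    if token is None or len(argv) != 3:
        print("usage: DROPBOX_TOKEN=... upload.py LOCAL REMOTE",
              file=sys.stderr)
        return 2
    try:
        upload_file(dropbox.Dropbox(token), argv[1], argv[2],
                    WriteMode("overwrite"))
    except FileNotFoundError as err:
        print(err, file=sys.stderr)
        return 1
    except AuthError:
        print("Access token was rejected (expired?)", file=sys.stderr)
        return 1
    except ApiError as err:
        print(f"Dropbox API error: {err}", file=sys.stderr)
        return 1
    return 0
```

In a script, the last line would be sys.exit(main(sys.argv)), so that the exit status tells the shell whether the upload worked.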

December 10, 2020

CSVY format

The data.table package in R has a fast way to read CSV files in the form of data.table::fread and as of version 1.12.4 it has support for reading CSVY files. The CSVY format is a text format where there is a YAML header at the top of the file which defines the data "schema" and below the header are the data lines. Here is the general idea of a CSVY file.

---
schema:
  fields:
  - name: name
    type: string
  - name: age
    type: integer
  - name: date
    type: string
---
name, age, date
Maria, 44, 2010-01-01
Roberto, 43, 2009-01-01
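To make the two-part layout concrete, here is a rough Python sketch that separates the header from the data lines. It assumes the YAML header is delimited by lines containing only ---, and it leaves the header as raw text because parsing YAML needs a third-party package. The function name split_csvy is hypothetical.

```python
import csv
import io


def split_csvy(text):
    """Split CSVY text into (yaml_header_lines, list_of_row_dicts)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("expected '---' on the first line")
    # Find the line that closes the YAML header
    end = next(
        (i for i in range(1, len(lines)) if lines[i].strip() == "---"),
        None,
    )
    if end is None:
        raise ValueError("unterminated YAML header")
    header = lines[1:end]
    body = "\n".join(lines[end + 1:])
    # skipinitialspace drops the blank after each comma
    rows = list(csv.DictReader(io.StringIO(body), skipinitialspace=True))
    return header, rows


example = """---
schema:
  fields:
  - name: name
    type: string
  - name: age
    type: integer
---
name, age
Maria, 44
Roberto, 43
"""

header, rows = split_csvy(example)
print(len(header), rows)
```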

Tukey HSD

A colleague likes using Tukey's honest significant difference for pairwise comparisons. I never use it but maybe I should. The example for stats::TukeyHSD in R is as follows.

R> summary(fm1 <- aov(breaks ~ wool + tension, data = warpbreaks))
            Df Sum Sq Mean Sq F value Pr(>F)
wool         1    451     451    3.34 0.0736 .
tension      2   2034    1017    7.54 0.0014 **
Residuals   50   6748     135
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

R> TukeyHSD(fm1, "tension", ordered = TRUE) # Still need to understand 'ordered'
  Tukey multiple comparisons of means
    95% family-wise confidence level
    factor levels have been ordered

Fit: aov(formula = breaks ~ wool + tension, data = warpbreaks)

$tension
    diff   lwr upr p adj
M-H  4.7 -4.63  14  0.45
L-H 14.7  5.37  24  0.00
L-M 10.0  0.65  19  0.03

My understanding is that penalization is a better way to handle multiple comparisons, but this is quick and easy. It would be interesting to understand Tukey HSD better and perhaps see if I can borrow the code that automatically generates the pairwise comparisons.
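Generating the comparisons themselves is simple: every unordered pair of factor levels, so k levels give k(k - 1)/2 pairs. This Python sketch is not how TukeyHSD does it internally, and the function name pairwise_labels is hypothetical; it only enumerates the pairs.

```python
from itertools import combinations


def pairwise_labels(levels):
    """Return 'B-A' style labels for every unordered pair of levels."""
    return [f"{b}-{a}" for a, b in combinations(levels, 2)]


# The three tension levels from warpbreaks
labels = pairwise_labels(["L", "M", "H"])
print(labels)
```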

December 4, 2020

Indent a region five spaces in Emacs

C-u 5 C-x TAB is what I wanted.

R's base and grid graphics in one plot

This is one way to combine base and grid graphics in one plot:

 library(ggplot2)
 library(ggplotify)
 library(gridExtra)

 ## Using `::` in a few places to indicate the package
 p1 <- ggplotify::as.grob(~plot(dist ~ speed,  # Notice the unusual `~`
                                data = cars,
                                main = "Base graphics scatter plot"))

 p2 <- ggplot(cars) +
     aes(x = speed, y = dist) +
     geom_point() +
     ggtitle("ggplot scatter plot")

 gridExtra::grid.arrange(p1, p2, nrow = 1)

Org and TODO states

I had to remind myself how to change the TODO states in an Org file. Here is an example Org file with the information.

 #+TODO: ASAP SOON WAITING SOMEDAY | DONE CANCELED

 * ASAP Learn about Org TODO states
 - It seems easiest to put the TODO state sequence in the file itself
   by having a ~#+TODO:~ line at the top of the file.
 - See [[https://orgmode.org/org.html#Per_002dfile-keywords][per-file TODO states]]
 - The vertical bar separates not-done versus done states.
 - If you change your TODO states in a file go to the line with the states
   and do ~C-c C-c~ and you'll see "Local setup has been refreshed."
 - As a reminder, the TODO states can be advanced via ~C-c C-t~.

November 28, 2020

I know of two disk space programs on Linux:

  1. du says it is to estimate file space usage
  2. df says it is to report file system disk space usage but I found it's better to think of it as reporting free space.

I think df is closer to what I want since, per its man page:

If no file name is given, the space available on all currently mounted file systems is shown

On this Raspberry Pi, df -h gives the following:

Filesystem      Size  Used Avail Use% Mounted on
/dev/root        14G  5.4G  7.1G  43% /
devtmpfs        459M     0  459M   0% /dev
tmpfs           464M   60M  404M  13% /dev/shm
tmpfs           464M  6.3M  457M   2% /run
tmpfs           5.0M  4.0K  5.0M   1% /run/lock
tmpfs           464M     0  464M   0% /sys/fs/cgroup
/dev/mmcblk0p6   71M   23M   49M  32% /boot
tmpfs            93M  4.0K   93M   1% /run/user/1000
/dev/mmcblk0p5   30M  398K   28M   2% /media/pi/SETTINGS
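The same numbers for a single file system are available from Python's standard library. This is a minimal sketch, assuming the root file system "/" is the one of interest.

```python
import shutil

GIB = 1024 ** 3

# Totals for the file system that contains "/"
usage = shutil.disk_usage("/")

print(f"Size  {usage.total / GIB:6.1f}G")
print(f"Used  {usage.used / GIB:6.1f}G")
print(f"Avail {usage.free / GIB:6.1f}G")
```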

A Web search gives me this nice link from Adafruit:

https://learn.adafruit.com/an-illustrated-shell-command-primer/checking-file-space-usage-du-and-df

This led to free for memory usage. On the Pi, free -h gives (with some whitespace removed):

         total    used    free  shared  buff/cache   available
Mem:      926M    392M     47M     67M        486M        487M
Swap:      99M    5.8M     94M

November 25, 2020

From the Linux Documentation Project (via Stack Exchange):

/etc/skel/ The default files for each new user are stored in this directory. Each time a new user is added, these skeleton files are copied into their home directory. An average system would have: .alias, .bash_profile, .bashrc and .cshrc files. Other files are left up to the system administrator.

On the Raspberry Pi that I am working on right now, I see .bash_logout, .bashrc, and .profile.

And per a comment in this .profile, Bash reads ~/.profile if there is no ~/.bash_profile and no ~/.bash_login so I don't want to have the latter two files. At least on this Pi the process is:

  • Read ~/.profile which has my environment variables

  • ~/.profile sources ~/.bashrc if it exists. The latter has local Bash configurations.

  • ~/.bashrc sources ~/.bash_aliases if it exists. Having aliases in a separate file is a "separation of concerns" idea.
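The lookup order for a login shell can be sketched in a few lines of Python. This only mimics the rule described in the .profile comment (first of ~/.bash_profile, ~/.bash_login, ~/.profile that exists) and checks existence, not readability; the function name login_file is hypothetical.

```python
from pathlib import Path

# Order in which a Bash login shell looks for its startup file
LOGIN_FILES = (".bash_profile", ".bash_login", ".profile")


def login_file(home):
    """Return the first login startup file found in `home`, or None."""
    for name in LOGIN_FILES:
        candidate = Path(home) / name
        if candidate.is_file():
            return candidate
    return None


print(login_file(Path.home()))
```

If this prints a path ending in .bash_profile or .bash_login, then ~/.profile is being skipped.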

November 5, 2020

Here are two LaTeX-like macros for gpp that define two levels of RTF list.

This macro has the bullet flush left with the margin and the text indented as a block by 360 twips, which is 360/1440 = 1/4 inch.

\define{\bulletpoint}{
\paragraph{\fi-360\li360\bullet\tab #1}
}

This second-level bullet has a "white bullet" (specified in Unicode decimal format) indented 1/4 inch and the rest of the body indented 1/4 inch more.

\define{\subbulletpoint}{
\paragraph{\fi-360\li720{\u9702-}\tab #1}
}

Both macros depend on \paragraph{} so that the bullet points inherit the paragraph spacing and other formatting. An example paragraph macro is:

\define{\paragraph}{{\pard \ql \sa60 \sb60 \f0 \fs24 \kerning12
#1
\par}}
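The numbers in these macros can be checked with a little arithmetic. This Python sketch converts the twip values to inches and confirms that decimal 9702 is the white bullet character; the function name twips_to_inches is hypothetical.

```python
TWIPS_PER_INCH = 1440  # a twip is 1/20 of a point, 72 points per inch


def twips_to_inches(twips):
    """Convert an RTF measurement in twips to inches."""
    return twips / TWIPS_PER_INCH


first_level = twips_to_inches(360)   # \li360 in \bulletpoint
second_level = twips_to_inches(720)  # \li720 in \subbulletpoint
white_bullet = chr(9702)             # the code point used in \u9702

print(first_level, second_level, white_bullet, hex(ord(white_bullet)))
```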

October 16, 2020

SAS has special missing data codes which represent

a type of numeric missing value that enables you to represent different categories of missing data by using the letters A-Z or an underscore.

R doesn't have this built in. But here is a prototype of how to have missing data codes. The data frame has three rows and three variables. The first variable is an integer ID variable, the second is trails_a, which holds the time in seconds a person takes on part A of a test called the Trail Making Test, and the third is just a random number. If the person does not have a score on the test, we can record the reason.

I don't think data.frame can generate this object so I use structure with row.names and class attributes. The trails_a variable itself is a list with two elements each of length 3. In a way this is like an embedded data frame.

d <- structure(list(id = c(11L, 22L, 33L),
                    trails_a = structure(list(score = c(50L,
                                                        NA_integer_,
                                                        NA_integer_),
                                              reason = c(NA_character_,
                                                         "Exceeded time limit",
                                                         "Too severe to test")),
                                         class = "trailsList"),
                    random = runif(3)),
               row.names = c(NA, -3L),
               class = "data.frame")

This doesn't print properly (and gives a "corrupt data frame" warning) unless we create a format method for our trailsList class.

format.trailsList <- function(x, ...){
    ifelse(is.na(x$score),
           sprintf("NA (%s)", x$reason),
           sprintf("%ds", x$score))
}

But with a nice format method we get this:

  id                 trails_a random
1 11                      50s  0.061
2 22 NA (Exceeded time limit)  0.664
3 33  NA (Too severe to test)  0.826

To use a data frame with a trailsList object we need to define a number of other methods including operations like == for subsetting. I don't know all that is involved. But an example to look at would be the methods for Surv in the survival package.

Conclusion

I don't think most analysts would want to deal with this complexity. It's more straightforward and understandable to just use two columns in the data frame: the score and if no score, the reason for NA.
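The two-column idea carries over to any language. As a rough illustration in Python, each record keeps a score and a reason, and a small function produces the same display strings as format.trailsList. The record layout and the name format_trails are assumptions for this sketch.

```python
# One record per person: a score, or a reason when the score is missing
records = [
    {"id": 11, "score": 50, "reason": None},
    {"id": 22, "score": None, "reason": "Exceeded time limit"},
    {"id": 33, "score": None, "reason": "Too severe to test"},
]


def format_trails(score, reason):
    """Show the score in seconds, or NA with the reason it is missing."""
    if score is None:
        return f"NA ({reason})"
    return f"{score}s"


formatted = [format_trails(r["score"], r["reason"]) for r in records]
print(formatted)
```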

Postscript

Having a variable in a data frame that is of POSIXlt class is an "established" case that is very similar. This is a data frame with two observations and three variables clinic, date, and ldate.

d <- data.frame(clinic = 1:2,
                date = as.Date(c("2010-01-01", "2020-01-05")))
d$ldate <- as.POSIXlt(d$date)
dput(d)
structure(list(clinic = 1:2,
               date = structure(c(14610, 18266), class = "Date"),
               ldate = structure(list(sec = c(0, 0),
                                      min = c(0L, 0L),
                                      hour = c(0L, 0L),
                                      mday = c(1L, 5L),
                                      mon = c(0L, 0L),
                                      year = c(110L, 120L),
                                      wday = c(5L, 0L),
                                      yday = c(0L, 4L),
                                      isdst = c(0L, 0L)),
                                 class = c("POSIXlt", "POSIXt"),
                                 tzone = "UTC")),
               row.names = c(NA, -2L),
               class = "data.frame")

September 21, 2020

PowerShell

I used to look down my nose at PowerShell but now I realize there are lots of shells that do a good job. I need to learn the equivalent of aliases in PowerShell to make navigating the Windows filesystem faster. For now, if I navigate to a directory with a file that I want to open (say a Word document), I can open it quickly with

 ii myfile.docx

Here, ii is a shortening of Invoke-Item.

September 15, 2020

Lattice graphics

A StackOverflow question asked about removing tickmarks on the right and top margins of a plot.

https://stackoverflow.com/questions/62945745/removing-the-ticks-on-the-3rd-and-4th-axes-in-plots-from-library-effects-in-r

The context was

library(effects)
m <- lm(Fertility ~ ., data = swiss)
plot(allEffects(m), rug = FALSE)

The plot method from the effects package didn't allow the user much control. For example, the usual scales argument wasn't handled. This was my solution:

trellis.par.set(axis.components = list(top = list(tck = 0),
                                       right = list(tck = 0)))
plot(allEffects(m), rug = FALSE)

Another thing I learned was how to control the font family easily. Using trellis.par.set outside of a plot call or using par.settings inside a plot call, one of the elements is grid.pars. See ?gpar.

str(get.gpar())
xyplot(dist ~ speed, data = cars,
       par.settings = list(grid.pars = list(fontfamily = "Open Sans")))

August 27, 2020

Polymode

In an R Markdown file edited with Emacs and Polymode, the command polymode-eval-buffer-from-beg-to-point is bound to both M-n v <up> and M-n v u.

The command polymode-eval-buffer is bound to M-n v b.

The command polymode-eval-region-or-chunk is bound to M-n v v.

August 26, 2020

Lattice graphics

  • Axis tick marks on log scale with "normal" numbers: list(log = 2, equispaced.log = FALSE)
  • Gridlines to match tick marks: panel.grid(h = -1, v = -1)

August 20, 2020

The data.table package has fread which is great. One difference between fread and read.csv is how a field like ,"", is handled. Here is an example file:

 "id","date1","date2","code"
 "1","2019-01-01","","3"
 "1","2019-01-01","2018-03-18","3"

Using read.csv with defaults gives:

 R> str(read.csv("test-fread.csv", stringsAsFactors = FALSE))
 'data.frame':	2 obs. of  4 variables:
 $ id   : int  1 1
 $ date1: chr  "2019-01-01" "2019-01-01"
 $ date2: chr  "" "2018-03-18"   <---- Empty string
 $ code : int  3 3

Using na.strings = "" gives

 R> str(read.csv("test-fread.csv", stringsAsFactors = FALSE, na.strings = ""))
 'data.frame':	2 obs. of  4 variables:
 $ id   : int  1 1
 $ date1: chr  "2019-01-01" "2019-01-01"
 $ date2: chr  NA "2018-03-18"   <---- Better
 $ code : int  3 3

But I do not know a way to get the same from fread:

 R> str(fread("test-fread.csv", stringsAsFactors = FALSE, na.strings = ""))
 Classes 'data.table' and 'data.frame':	2 obs. of  4 variables:
 $ id   : int  1 1
 $ date1: chr  "2019-01-01" "2019-01-01"
 $ date2: chr  "" "2018-03-18"
 $ code : int  3 3
 - attr(*, ".internal.selfref")=<externalptr>

By design, fread wants to be smart about allowing an NA field to be data (e.g., the string "NA"). It also wants to allow zero-length strings. I don't know if there is a way around this.

For example, this doesn't work:

 R> str(fread("test-fread.csv", colClasses = c("date1" = "Date", "date2" = "Date")))
 Classes 'data.table' and 'data.frame':	2 obs. of  4 variables:
 $ id   : int  1 1
 $ date1: Date, format: "2019-01-01" "2019-01-01"
 $ date2: chr  "" "2018-03-18"
 $ code : int  3 3
 - attr(*, ".internal.selfref")=<externalptr>
 Warning message:
 Column 'date2' was requested to be 'Date' but fread encountered the following error:
 character string is not in a standard unambiguous format
 so the column has been left as type 'character'

This comes up with ADNI data and the PTDEMOG.csv file. Maybe the root of the problem is having everything quoted in the source CSV.
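One generic workaround is to convert empty strings to missing values after reading. Python's csv module also returns "" for a quoted empty field, so it makes a convenient place to sketch the idea. The function name read_with_missing is hypothetical, and this says nothing about what fread itself can be made to do.

```python
import csv
import io

# Same content as the example file above
text = '''"id","date1","date2","code"
"1","2019-01-01","","3"
"1","2019-01-01","2018-03-18","3"
'''


def read_with_missing(handle):
    """Read CSV rows as dicts, turning empty strings into None."""
    return [
        {key: (value if value != "" else None) for key, value in row.items()}
        for row in csv.DictReader(handle)
    ]


rows = read_with_missing(io.StringIO(text))
print(rows[0]["date2"], rows[1]["date2"])
```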