May 18, 2022

This is how I made something of a platform-independent Makefile that runs either rcmd.exe on Windows or R CMD on Linux.

# This file assumes that when called from Windows
# there will be a bash.exe at c:\scoop\shims\bash.exe

ifeq ($(OS), Windows_NT)
        R_CMD := rcmd.exe
        SHELL = c:\scoop\shims\bash.exe
else
        R_CMD := R CMD
endif

May 6, 2022

This is how I used scp to copy a directory from Linux to Windows:

scp -r username@server.com:/file/to/syncdir .
                                            ^ syncdir created locally here

May 5, 2022

How I updated my Pi to Python 3.10.4: https://stackoverflow.com/questions/64718274/how-to-update-python-in-raspberry-pi

sudo apt-get install -y build-essential tk-dev \
  libncurses5-dev libncursesw5-dev libreadline6-dev libdb5.3-dev \
  libgdbm-dev libsqlite3-dev libssl-dev libbz2-dev libexpat1-dev \
  liblzma-dev zlib1g-dev libffi-dev

wget https://www.python.org/ftp/python/3.10.4/Python-3.10.4.tgz
sudo tar zxf Python-3.10.4.tgz
cd Python-3.10.4
sudo ./configure --enable-optimizations
sudo make -j4
sudo make altinstall

April 12, 2022

I had to work with text files with a non-ASCII character representing a superscript 2. When checking these into Fossil, Fossil warned of invalid UTF-8. I think the problem was that the file had the superscript 2 in LATIN-1. I told Fossil to convert the file and opened the converted file and original file in Emacs and turned on hexl-mode. I forgot to finish this post but I think the difference was between the single byte b2 for LATIN-1 and the two bytes c2 b2 for UTF-8.
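For reference, a minimal Python sketch of the byte-level difference. Python is an arbitrary choice here and the variable names are made up for illustration; it only shows how the code point U+00B2 is encoded, not what Fossil does internally.

```python
# Superscript two is the code point U+00B2.
sup2 = "\u00b2"

latin1_bytes = sup2.encode("latin-1")  # one byte
utf8_bytes = sup2.encode("utf-8")      # two bytes

print(latin1_bytes.hex())  # b2
print(utf8_bytes.hex())    # c2b2

# A lone 0xB2 byte is not valid UTF-8, which is the kind of thing
# a UTF-8 validity check would complain about.
try:
    latin1_bytes.decode("utf-8")
    lone_byte_is_valid_utf8 = True
except UnicodeDecodeError:
    lone_byte_is_valid_utf8 = False
print(lone_byte_is_valid_utf8)  # False
```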

March 7, 2022

I like the "generic preprocessor" program called gpp (https://github.com/logological/gpp/) for use as a general macro processor. (See https://en.wikipedia.org/wiki/Preprocessor and some links therein.) On Linux gpp is easy to install with the usual ./configure, make, etc.

Below is how I got it working on Windows without using ./configure or make. (I have Make but I don't think I have sh needed by configure. I'm not sure, but it wasn't hard to compile "by hand".)

The preliminary step was to create a header file, which I called stephen.h, with these contents:

/* Header created by Stephen Weigand to avoid configure on Windows */
#define PACKAGE "gpp"
#define PACKAGE_BUGREPORT "tristan@logological.org"
#define PACKAGE_STRING "GPP 2.27"
#define PACKAGE_TARNAME "gpp"
#define PACKAGE_URL ""
#define PACKAGE_VERSION "2.27"

#define HAVE_STRDUP 1
#define HAVE_FNMATCH_H 0

The HAVE_<something> lines reflect that my GCC can find the C functions strdup and strcasecmp but not fnmatch. Now edit gpp.c as follows.

Add the include for stephen.h near the top. Note the quotes around stephen.h which says look in the current directory for the include file.

/* Added by Stephen Weigand March 7, 2022 */
# include "stephen.h"

Now make a few changes based on not having fnmatch. Lines 1880-1902 need to be like this:

    if (SpliceInfix(buf, pos1, pos2, "=~", &spl1, &spl2)) {
#if !HAVE_FNMATCH_H
        bug("globbing support has not been compiled in");
#else <--- I had to convert this from #endif to #else
        if (!DoArithmEval(buf, pos1, spl1, &result1)
                || !DoArithmEval(buf, spl2, pos2, &result2)) {
            char *str1, *str2;

            /* revert to string comparison */
            while ((pos1 < spl1) && isWhite(buf[spl1 - 1]))
                spl1--;
            while ((pos2 > spl2) && isWhite(buf[spl2]))
                spl2++;
            str1 = strdup(buf + pos1);
            str1[spl1 - pos1] = '\0';
            str2 = strdup(buf + spl2);
            str2[pos2 - spl2] = '\0';
            *result = (fnmatch(str2, str1, 0) == 0);
        } else
            *result = (result1 == result2);
        return 1;
#endif <--- This was moved down to here.
    }

With this change (which may be a bug fix, really) I could then do the old-fashioned way of compiling the program.

gcc.exe -Wall -Os -o gpp.exe gpp.c

Then I moved it to somewhere in my path.

February 22, 2022

Commands for setting up a new Fossil

This is my set of commands for starting a Fossil project. I create the repository, set my password, add another user (otheruser) and set their password, and then give them administrator (a) capabilities.

cd /path/to/fossils
fossil new myrepo.fossil
fossil user password myname mypAssW0rd -R myrepo.fossil
fossil user new otheruser otheruser@example.com the1rPasSw0rd -R myrepo.fossil
fossil user capabilities otheruser a -R myrepo.fossil

Then I start the UI and change a few settings:

fossil ui myrepo.fossil

Once the UI opens, click on Admin then Configuration and set the following:

  • Project name. I keep this short like a filename since it displays in my bash prompt.

  • Project description. This is longer.

  • Index page. I use /dir?ci=tip so that when I click on Home in the UI it takes me to the list of files.

I also add a deposit target in a Makefile which will "deposit" and unpack the repository to a location on disk that colleagues who don't use Fossil can look at.

MYPATH := /path/to/shared/directory
.PHONY: deposit
deposit:
        fossil zip trunk source-code-repository.zip --name source-code-repository
        mv source-code-repository.zip $(MYPATH)
        cd $(MYPATH)/; unzip -o source-code-repository.zip
        rm -f $(MYPATH)/source-code-repository.zip
        echo "WARNING: This is a read-only copy of files under Fossil version control for this project." > $(MYPATH)/source-code-repository/WARNING.txt

February 14, 2022

Building R packages

I'm building my R package on my Raspberry Pi and did this:

cd ~/Subversions/myproject-branches/
R CMD build mydir-wip-branch

But I needed texi2dvi and so did:

sudo apt install texinfo

But I needed texlive so I did:

sudo apt install texlive

But I needed inconsolata.sty so I did:

sudo apt install texlive-fonts-extra

And then I got R CMD build and R CMD check to both be happy and the latter command even got my vignette built.

Subversion branches

I have a checkout in ~/Subversions/myproject and made a branch like this.

First I used the svn copy command

svn copy ^/mypkg/mydir ^/mypkg/mydir-wip-branch -m "Making a work in progress branch of mydir"

Here the ^ means the "root" part of the repository. Then I did

mkdir ~/Subversions/myproject-branches
cd ~/Subversions/myproject-branches
svn checkout svn+ssh://username@example.com/svnroot/myproject/mypkg/mydir-wip-branch

I now have a work-in-progress branch of a subdirectory of the repository. I will need to merge it someday.

January 28, 2022


This is from https://everyday.codes/linux/how-passwordless-ssh-login-works/

  • I generate a private key which is a long sequence of bits/bytes/characters.
  • A public key is another sequence that is derived from this.
  • I can get the public key from the private key, but with the public key I can't go back to the private key.
  • My public key is available to everyone and is stored on both my PC and the server.

Using the public key, you can encrypt (or sign) any message, and it will only be possible to decrypt it using the private key. In other words, anyone with your public key can send you encrypted messages that only you will be able to read.

The server can authenticate me like this:

  1. Encrypt a message using the public key I've stored on the server and send the message to my PC
  2. My PC decrypts the message using my private key and sends it back
  3. If the server sees I got it right then I am authenticated.
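The three steps above can be sketched in Python with a toy RSA key pair. This is only an illustration of the idea: the primes are tiny textbook values, the function names are hypothetical, and real SSH uses different algorithms (and, as far as I understand it, has the client sign data rather than decrypt a challenge).

```python
import secrets

# Toy RSA key pair built from the textbook primes 61 and 53. Not secure.
n = 61 * 53   # 3233, the public modulus
e = 17        # public exponent (public key is the pair n, e)
d = 2753      # private exponent, chosen so (e * d) % (60 * 52) == 1

def server_make_challenge():
    """Step 1: the server picks a random number and encrypts it with the public key."""
    secret = secrets.randbelow(n - 2) + 2
    return secret, pow(secret, e, n)

def client_answer(challenge):
    """Step 2: the client decrypts the challenge with the private key."""
    return pow(challenge, d, n)

secret, challenge = server_make_challenge()
answer = client_answer(challenge)

# Step 3: the server compares the answer with what it originally picked.
authenticated = (answer == secret)
print(authenticated)  # True
```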

I'm live tiloring my steps to set up passwordless login for R-forge.

This is a good resource, it seems: https://docs.microsoft.com/en-us/windows-server/administration/openssh/openssh_keymanagement

I will have a private key and a public key. I place my public key on the R-forge server. About what happens when I ssh to R-forge, Microsoft says:

When using key authentication with an SSH server, the SSH server and client compare the public key for a user name provided against the private key. If the server-side public key cannot be validated against the client-side private key, authentication fails.

PS> ssh-keygen.exe -t ed25519
Generating public/private ed25519 key pair.
Enter file in which to save the key (C:\Users\USERNAME/.ssh/id_ed25519):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in C:\Users\USERNAME/.ssh/id_ed25519.
Your public key has been saved in C:\Users\USERNAME/.ssh/id_ed25519.pub.


Now you have a public/private Ed25519 key pair in the location specified.

    Directory: C:\Users\WEIGAND\.ssh

Mode                LastWriteTime         Length Name
----                -------------         ------ ----
-a----        1/28/2022   9:38 PM            464 id_ed25519
-a----        1/28/2022   9:38 PM            104 id_ed25519.pub

Remember that private key files are the equivalent of a password and should be protected the same way you protect your password. To help with that, use ssh-agent to securely store the private keys within a Windows security context, associated with your Windows login. To do that, start the ssh-agent service as Administrator and use ssh-add to store the private key.

December 16, 2021

I forgot how to control what applications start up automatically in Windows 10. A while back I set it so PowerShell started up automatically and disabled the same for Skype. But I needed to disable Microsoft Teams from starting up automatically. Here's how I did it:

I typed "Startup" in the search bar. This opened the Settings app and took me to the Startup setting. Then I disabled Teams.

Obviously this was trivial but I lived with Teams starting up automatically for a couple weeks so I added a note about it to cement it in my memory. (I was remembering it as complicated and involving a start-up folder and shortcuts. I guess not.)

December 1, 2021

To fully detach a package in R I do this:

R> detach("package:mypackage", character.only = TRUE, unload = TRUE)

One thing that seems to be a pain is if you "attach" something like ggplot2 (i.e., do any of library(ggplot2), library("ggplot2"), require(ggplot2), etc.) then your session will get a lot of packages loaded "via a namespace" but not "attached":

> sessionInfo() # after library(ggplot2)
R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
attached base packages:
[1] stats     graphics  utils     datasets  grDevices methods   base     

other attached packages:
[1] ggplot2_3.3.5

loaded via a namespace (and not attached):
 [1] fansi_0.4.2      withr_2.4.2      assertthat_0.2.1 dplyr_1.0.7     
 [5] crayon_1.4.1     utf8_1.2.2       grid_4.0.3       R6_2.5.1        
 [9] DBI_1.1.1        lifecycle_1.0.1  gtable_0.3.0     magrittr_2.0.1  
[13] scales_1.1.1     pillar_1.6.3     rlang_0.4.11     generics_0.1.0  
[17] vctrs_0.3.8      ellipsis_0.3.2   splines_4.0.3    tools_4.0.3     
[21] glue_1.4.2       purrr_0.3.4      munsell_0.5.0    compiler_4.0.3  
[25] pkgconfig_2.0.3  colorspace_2.0-2 tidyselect_1.1.1 tibble_3.1.2    

There are still 28 packages "loaded via a namespace (and not attached)" after I detach ggplot2. (But I’m not sure what that means exactly.) The way to get rid of these hangers-on is via the unloadNamespace() function.

But they seem to need to be unloaded in order, starting with what's at the top of the dependency tree. Here is an illustration of the problem:

> unloadNamespace("fansi") # try to unload first in the list
Error in unloadNamespace("fansi") : 
  namespace 'fansi' is imported by 'pillar' so cannot be unloaded

> unloadNamespace("pillar") # OK, I'll unload 'pillar' first.
Error in unloadNamespace("pillar") : 
  namespace 'pillar' is imported by 'tibble', 'dplyr' so cannot be unloaded

> unloadNamespace("tibble") # OK, unload 'tibble'.
Error in unloadNamespace("tibble") : 
  namespace 'tibble' is imported by 'dplyr' so cannot be unloaded

> unloadNamespace("dplyr") # OK, here's an order that works.
> unloadNamespace("tibble")
> unloadNamespace("pillar")
> unloadNamespace("fansi") 

I think there are packages that help you detach and unload packages but I am liking the idea of not attaching ggplot2 in the first place.

November 29, 2021

UTF-8 in C

Here is a short C program that works for me on Linux and Windows to print out Hello, Aβ42:

#include <stdio.h>

int main(){
  printf("Hello, A\u03b242\n");
  return 0;
}

On Windows I compile it with gcc.exe -Wall -o hello.exe .\hello.c using GCC 8.3.0 and get no errors or warnings. On Linux I am using GCC 4.8.5 and get

hello.c: In function ‘main’:
hello.c:4:10: warning: universal character names are only valid in C++ and C99 [enabled by default]
   printf("Hello, A\u03b242\n");

I can use GCC 8.3.1 after typing scl enable devtoolset-8 bash and then the warning goes away. But if I add the GCC flag -std=c89 then the warning comes back (which makes sense).

UTF-8 in Windows Terminal

When using Windows Terminal, I clicked on the down arrow in the tab bar and clicked on Settings which opened up a JSON settings file in Notepad. Then I added two arguments to the PowerShell command to tell the terminal to output in UTF-8.

 // Make changes here to the powershell.exe profile.
 "guid": "{61c54bbd-c2c6-5271-96e7-009a87ff44bf}",
 "name": "Windows PowerShell",
 "commandline": "powershell.exe -Noexit -Command chcp 65001",
 "hidden": false                ^^^^^^^^^^^^^^^^^^^^^^^^^^^

July 22, 2021

From my Windows 10 desktop I can clone a repo on a Linux machine with

fossil set ssh-command "ssh"
fossil clone -v ssh://weigand@server.com/./fossils/filename.fossil filename.fossil

This is good except I should be able to specify the SSH command using -c ssh. Also, I wonder why it doesn't work with plink.exe which allows commands like this

plink.exe -ssh weigand@server.com ls

July 21, 2021

To prevent underscores from being treated as subscripts when exporting in Org mode, I did this:

#+OPTIONS: ^:{}

July 1, 2021

From https://endlessparentheses.com/debugging-emacs-lisp-part-1-earn-your-independence.html I learned several things which may help with my RTF backend for Org mode.

  1. The command find-function will take me to the definition of a function. If I am inside a function, I can invoke it and it will ask me what function to go to with the default being the one I'm in. It will then take me to where in the source that function is defined and put point at the start of the function. By default it doesn't look like there are key bindings. I might want to make an alias so I only need to do M-x ff.

  2. The command describe-function is bound to C-h f or <f1> f and shows the full documentation of the function. If point is in (or next to?) a function, issuing the command will set the default to the function point is in/near, so I just press enter to see the help for the function.

  3. I just realized that <f1> is the help function. So naturally <f1> f will give the help for a function, <f1> v the help for a variable, etc.

  4. The Edebug part is that I either find or describe the function and then do C-u C-M-x. This is eval-defun with a prefix according to the manual. My only problem is I can't get this to work except for a function that is buggy.

June 15, 2021

In R on Windows and Linux, source("clipboard") will source the contents of the clipboard. On macOS the equivalent is source(pipe("pbpaste")).

June 13, 2021

From https://devblogs.microsoft.com/scripting/powertip-use-powershell-to-display-windows-path/ I can get my path on PowerShell via

$env:path -split ";"

I turned this into a function:

function path() {
  $env:path -split ";"
}
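The same one-entry-per-line listing can be sketched in Python, which has the small advantage of using the right separator on both Windows (;) and Linux (:). The function name path_entries is made up for this example.

```python
import os

def path_entries():
    """Return the directories on PATH, one per list element."""
    return os.environ.get("PATH", "").split(os.pathsep)

for entry in path_entries():
    print(entry)
```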

June 12, 2021

In Emacs I can get an interactive SQLite session via M-x sql-sqlite and then I just have to specify a database filename.

May 3, 2021

An easy way to get semi-transparent colors in R is to use adjustcolor:

barplot(c(1, 2, 3), col = adjustcolor("firebrick", 0.5),
        main = "Firebrick bar plot with alpha=0.5")

April 26, 2021

Create a simple SQLite database:

sqlite3 test.sqlite3
sqlite> .databases -- Show the databases

  main: /home/weigand/fossils/tilor/test.sqlite3

ATTACH DATABASE 'test.sqlite3' as 'project';

sqlite> .databases -- Now there are two

  main: /home/weigand/fossils/tilor/test.sqlite3
  project: /home/weigand/fossils/tilor/test.sqlite3

CREATE TABLE project.person (
  id INTEGER,
  date_birth CHAR(10),
  sex CHAR(2) CHECK(sex == 'M' OR sex == 'F')
);

INSERT INTO project.person VALUES(1, '2010-01-01', 'M');
INSERT INTO project.person VALUES(2, '2011-11-11', 'F');

SELECT * FROM main.person;
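A similar table can be tried out from Python's built-in sqlite3 module. This is a minimal sketch under a few assumptions: it uses a throwaway in-memory database instead of test.sqlite3, skips the ATTACH step, and names the first column id, which is a guess. It also shows the CHECK constraint rejecting a value other than 'M' or 'F'.

```python
import sqlite3

con = sqlite3.connect(":memory:")  # throwaway in-memory database
con.execute("""
    CREATE TABLE person (
        id INTEGER,                -- column name is an assumption
        date_birth CHAR(10),
        sex CHAR(2) CHECK(sex == 'M' OR sex == 'F')
    )
""")
con.execute("INSERT INTO person VALUES (1, '2010-01-01', 'M')")
con.execute("INSERT INTO person VALUES (2, '2011-11-11', 'F')")

# The CHECK constraint should reject anything other than 'M' or 'F'.
try:
    con.execute("INSERT INTO person VALUES (3, '2012-12-12', 'X')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

rows = con.execute("SELECT * FROM main.person ORDER BY id").fetchall()
print(rows)      # [(1, '2010-01-01', 'M'), (2, '2011-11-11', 'F')]
print(rejected)  # True
con.close()
```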


April 14, 2021

When using smbclient on Linux to put a file on a Windows share I wasn't specifying the local file path correctly. I learned about the lcd command in smbclient to set the local current directory and then I could put the filename without a full or relative path. Like this:

PASSWD=`grep -woi -m 1 '^machine mymachine login .* password .*$' ~/.netrc | cut -d ' ' -f 6`
smbclient //share.example.com/DirectoryX \
  -E       \
  --command 'cd DirA/DirB ;
             lcd /local/working/directory ;
             put localfile.csv'
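As an alternative to the grep and cut pipeline for pulling the password out of ~/.netrc, Python's standard netrc module can parse the file. A minimal sketch, assuming the same "machine ... login ... password ..." layout; the function name password_for and the demo credentials are made up, and the demo writes a temporary file rather than touching the real ~/.netrc.

```python
import netrc
import os
import tempfile

def password_for(machine, netrc_path):
    """Return the password stored for `machine`, or None if it is absent."""
    entry = netrc.netrc(netrc_path).authenticators(machine)
    if entry is None:
        return None
    login, account, password = entry
    return password

# Demonstrate with a throwaway file rather than the real ~/.netrc.
with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "netrc")
    with open(demo, "w") as f:
        f.write("machine mymachine login myname password s3cret\n")
    demo_password = password_for("mymachine", demo)
    missing = password_for("othermachine", demo)

print(demo_password)  # s3cret
print(missing)        # None
```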

March 17, 2021

This is how to get interval estimates for random effects from a linear mixed effects model fit with lme4::lmer in R using the unexported lme4:::asDf0 function.

library(lme4)
library(lattice) # for dotplot

fit <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)

dotplot(ranef(fit)) # Caterpillar plot shows intervals

out <- lme4:::asDf0(ranef(fit), "Subject")
out$lcl <- out$values - 1.96 * out$se
out$ucl <- out$values + 1.96 * out$se

out[c(1:2, 35:36), ]

    values         ind .nn   se
1    2.259 (Intercept) 308 12.1
2  -40.399 (Intercept) 309 12.1
35  -0.988        Days 371  2.3
36   1.284        Days 372  2.3

March 10, 2021

On Linux, & runs a command in the background. For example:

emacs README.md &

From PowerShell, the equivalent seems to be:

Start-Process -NoNewWindow emacs.exe README.md

The above is from a Web search where I landed on https://ariefbayu.xyz/run-background-command-in-powershell-8ea86436684e. That post shows how to make a general run-it-in-the-background command.

function bg() {
     Start-Process -NoNewWindow @args
}

And then I might use:

bg emacs.exe README.md
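Python has a comparable idiom: subprocess.Popen starts a process and returns without waiting for it. A minimal sketch, with a bg function named to mirror the PowerShell one and a harmless stand-in command instead of Emacs:

```python
import subprocess
import sys

def bg(*args):
    """Start a command and return immediately without waiting for it."""
    return subprocess.Popen(list(args))

# Stand-in command: a Python process that exits right away.
proc = bg(sys.executable, "-c", "pass")

# Waiting is only done here to tidy up the demo; a real "background"
# use would carry on without calling wait().
exit_code = proc.wait()
print(exit_code)  # 0
```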

March 3, 2021

The journal NeuroImage uses an open source font called Charis SIL. The font can be downloaded, and on my department's Linux system I just have to put the *.ttf files in ~/.fonts. Then I can create figures in "native" NeuroImage style:

library(ggplot2)

ggplot(cars, aes(x = speed, y = dist)) +
    geom_point() +
    labs(title = "Scatter plot in Charis SIL font used by NeuroImage",
         x = "Speed (miles per hour)",
         y = "Distance (feet)") +
    theme_light(base_family = "CharisSIL")

[Figure: scatter plot of distance versus speed demonstrating the Charis SIL font]

December 12, 2020

This is how I used Dropbox's Python API to upload my son's piano recital video. (My daughter's video was smaller and uploaded fine through the Dropbox web interface, but my son's timed out.)

My guide was Dropbox for Python Developers.

  1. Set up a Python virtual environment and installed dropbox

    • python3.7 -m venv ~/venvs/recital
    • source ~/venvs/recital/bin/activate
    • pip install --upgrade pip
    • pip install dropbox

  2. Linked my Dropbox account to an app

    • Went to the app console
    • Created and named the app "recital". To keep it simple I gave the app read and write permissions across my whole Dropbox account.
    • Generated a short-lived and quite long access token (which lasts for four hours)

  3. Wrote Python code to create a Dropbox instance, read the file into an object, and upload the object

import dropbox
from dropbox.files import WriteMode

dbx = dropbox.Dropbox('AVerySpecial138CharacterAccessToken')

with open('/home/weigand/recital/piano.m4v', 'rb') as f:
        data = f.read()

dbx.files_upload(data,                         # My file
                 '/recital-folder/piano.m4v',  # Full path of destination
                 mode=WriteMode('overwrite'))  # Want to overwrite

This obviously isn't "professional grade" Python. It works but doesn't handle errors and hard codes the filename.
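A slightly sturdier version might look like the sketch below. It is not tested against a real account. It uses the same Dropbox and files_upload calls as above plus the SDK's ApiError and AuthError exceptions. The helper names and the DROPBOX_TOKEN environment variable are hypothetical, and it still reads the whole file into memory, so very large files would need a different approach.

```python
import argparse
import os
import posixpath
import sys


def destination_path(local_path, folder):
    """Build the Dropbox destination path from a local file and a folder."""
    filename = local_path.replace("\\", "/").rsplit("/", 1)[-1]
    return posixpath.join("/", folder.strip("/"), filename)


def upload(token, local_path, folder):
    """Upload one file, overwriting any existing copy. True on success."""
    # Imported here so destination_path() works without dropbox installed.
    import dropbox
    from dropbox.exceptions import ApiError, AuthError
    from dropbox.files import WriteMode

    try:
        with open(local_path, "rb") as f:
            data = f.read()
    except OSError as err:
        print(f"Cannot read {local_path}: {err}", file=sys.stderr)
        return False

    try:
        dbx = dropbox.Dropbox(token)
        dbx.files_upload(data, destination_path(local_path, folder),
                         mode=WriteMode("overwrite"))
    except (AuthError, ApiError) as err:
        print(f"Upload failed: {err}", file=sys.stderr)
        return False
    return True


def main(argv=None):
    parser = argparse.ArgumentParser(description="Upload a file to Dropbox")
    parser.add_argument("local_path")
    parser.add_argument("folder")
    args = parser.parse_args(argv)
    # DROPBOX_TOKEN is a hypothetical environment variable name.
    token = os.environ.get("DROPBOX_TOKEN")
    if not token:
        print("Set DROPBOX_TOKEN first", file=sys.stderr)
        return 1
    return 0 if upload(token, args.local_path, args.folder) else 1
```

With the token in the environment, calling main(["/home/weigand/recital/piano.m4v", "recital-folder"]) would attempt the same upload as the original script.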

December 11, 2020

This is how I managed to print to a WiFi-enabled printer from my Raspberry Pi.

I first went to the support page for the Brother MFC-J870DW printer, selected Downloads, and then did the following:

  1. Selected OS Family as Linux
  2. Selected OS Version as Linux (deb)
  3. Clicked on the Driver Install Tool
  4. Agreed to the End User Licence Agreement (EULA) and clicked Download
  5. Saved the file linux-brprinter-installer-2.2.2-1.gz to my ~/Downloads directory

In the shell I did:

cd ~/Downloads
gunzip linux-brprinter-installer-2.2.2-1.gz
sudo bash linux-brprinter-installer-2.2.2-1

  Input model name ->MFC-J870DW

  You are going to install following packages.
  OK? [y/N] ->
  ... <say yes to a few additional things>
  Will you specify the Device URI? [Y/n] ->y

  select the number of destination Device URI. ->

December 10, 2020

CSVY format

The data.table package in R has a fast way to read CSV files in the form of data.table::fread and as of version 1.12.4 it has support for reading CSVY files. The CSVY format is a text format where there is a YAML header at the top of the file which defines the data "schema" and below the header are the data lines. Here is the general idea of a CSVY file.

---
  - name: name
    type: string
  - name: age
    type: integer
  - name: date
    type: number
---
name, age, date
Maria, 44, 2010-01-01
Roberto, 43, 2009-01-01
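To make the header-plus-data layout concrete, here is a minimal Python sketch that separates the two parts. It assumes the YAML header is fenced by lines containing only ---, as in the CSVY convention, and it does not parse the YAML itself (that would need a third-party library). The function name split_csvy and the shortened example are made up.

```python
import csv

def split_csvy(text):
    """Split CSVY text into (yaml_header_lines, csv_rows).

    Assumes the header is fenced by lines containing only '---'.
    """
    lines = text.splitlines()
    fences = [i for i, line in enumerate(lines) if line.strip() == "---"]
    if len(fences) >= 2 and fences[0] == 0:
        header, body = lines[1:fences[1]], lines[fences[1] + 1:]
    else:
        header, body = [], lines  # no header found: treat it all as CSV
    rows = list(csv.reader(body, skipinitialspace=True))
    return header, rows

example = """---
  - name: name
    type: string
  - name: age
    type: integer
---
name, age
Maria, 44
Roberto, 43
"""

header, rows = split_csvy(example)
print(len(header))  # 4
print(rows)         # [['name', 'age'], ['Maria', '44'], ['Roberto', '43']]
```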

Tukey HSD

A colleague likes using Tukey's honest significant difference for pairwise comparisons. I never use it but maybe I should. The example for stats::TukeyHSD in R is as follows.

R> summary(fm1 <- aov(breaks ~ wool + tension, data = warpbreaks))
            Df Sum Sq Mean Sq F value Pr(>F)
wool         1    451     451    3.34 0.0736 .
tension      2   2034    1017    7.54 0.0014 **
Residuals   50   6748     135
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

R> TukeyHSD(fm1, "tension", ordered = TRUE) # Still need to understand 'ordered'
  Tukey multiple comparisons of means
    95% family-wise confidence level
    factor levels have been ordered

Fit: aov(formula = breaks ~ wool + tension, data = warpbreaks)

    diff   lwr upr p adj
M-H  4.7 -4.63  14  0.45
L-H 14.7  5.37  24  0.00
L-M 10.0  0.65  19  0.03

I understand penalization as a better way to handle multiple comparisons but this is quick and easy. It would be interesting to better understand Tukey HSD and perhaps see if I can borrow the automatically generated pairwise comparisons code.

December 4, 2020

Indent a region five spaces in Emacs

C-u 5 C-x TAB is what I wanted.

R's base and grid graphics in one plot

This is one way to combine base and grid graphics in one plot:


 ## Using `::` in a few places to indicate the package
 library(ggplot2)

 p1 <- ggplotify::as.grob(~plot(dist ~ speed,  # Notice the unusual `~`
                                data = cars,
                                main = "Base graphics scatter plot"))

 p2 <- ggplot(cars) +
     aes(x = speed, y = dist) +
     geom_point() +
     ggtitle("ggplot scatter plot")

 gridExtra::grid.arrange(p1, p2, nrow = 1)

Org and TODO states

I had to remind myself how to change the TODO states in an Org file. Here is an example Org file with the information.


 * ASAP Learn about Org TODO states
 - It seems easiest to put the TODO state sequence in the file itself
   by having a ~#+TODO:#~ line at the top of the file.
 - See [[https://orgmode.org/org.html#Per_002dfile-keywords][per-file TODO states]]
 - The vertical bar separates not-done versus done states.
 - If you change your TODO states in a file go to the line with the states
   and do ~C-c C-c~ and you'll see "Local setup has been refreshed."
 - As a reminder, the TODO states can be advanced via ~C-c C-t~.

November 28, 2020

I know of two disk space programs on Linux:

  1. du says it is to estimate file space usage
  2. df says it is to report file system disk space usage but I found it's better to think of it as reporting free space.

I think df is more what I want since it shows

If no file name is given, the space available on all currently mounted file systems is shown

On this Raspberry Pi, df -h gives the following:

Filesystem      Size  Used Avail Use% Mounted on
/dev/root        14G  5.4G  7.1G  43% /
devtmpfs        459M     0  459M   0% /dev
tmpfs           464M   60M  404M  13% /dev/shm
tmpfs           464M  6.3M  457M   2% /run
tmpfs           5.0M  4.0K  5.0M   1% /run/lock
tmpfs           464M     0  464M   0% /sys/fs/cgroup
/dev/mmcblk0p6   71M   23M   49M  32% /boot
tmpfs            93M  4.0K   93M   1% /run/user/1000
/dev/mmcblk0p5   30M  398K   28M   2% /media/pi/SETTINGS

A Web search gives me this nice link from Adafruit:


This led to free for memory usage. On the Pi, free -h gives (with some whitespace removed):

         total    used    free  shared  buff/cache   available
Mem:      926M    392M     47M     67M        486M        487M
Swap:      99M    5.8M     94M
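The Size, Used, and Avail figures that df reports for a single file system can also be read from Python with shutil.disk_usage. A minimal sketch; the human helper is made up and only roughly imitates the rounding of df -h.

```python
import shutil

def human(nbytes):
    """Format a byte count roughly the way df -h does (powers of 1024)."""
    size = float(nbytes)
    for unit in ("B", "K", "M", "G", "T"):
        if size < 1024 or unit == "T":
            return f"{size:.1f}{unit}"
        size /= 1024

usage = shutil.disk_usage("/")  # the file system that holds /
print(f"Size {human(usage.total)}  Used {human(usage.used)}  Avail {human(usage.free)}")
```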

November 25, 2020

From the Linux Documentation Project (via Stack Exchange):

/etc/skel/ The default files for each new user are stored in this directory. Each time a new user is added, these skeleton files are copied into their home directory. An average system would have: .alias, .bash_profile, .bashrc and .cshrc files. Other files are left up to the system administrator.

On the Raspberry Pi that I am working on right now, I see .bash_logout, .bashrc, and .profile.

And per a comment in this .profile, Bash reads ~/.profile if there is no ~/.bash_profile and no ~/.bash_login so I don't want to have the latter two files. At least on this Pi the process is:

  • Read ~/.profile which has my environment variables

  • ~/.profile sources ~/.bashrc if it exists. The latter has local Bash configurations.

  • ~/.bashrc sources ~/.bash_aliases if it exists. Having aliases in a separate file is a "separation of concerns" idea.

November 5, 2020

Here are two LaTeX-like macros for gpp that define two levels of RTF list.

This macro has the bullet flush left with the margin and the text indented as a block by 360 twips, which is 360/1440 or 1/4 inch.

\paragraph{\fi-360\li360\bullet\tab #1}

This second-level bullet has a "white bullet" (specified in Unicode decimal format) indented 1/4 inch and the rest of the body indented 1/4 inch more.

\paragraph{\fi-360\li720{\u9702-}\tab #1}

Both macros depend on \paragraph{} so that the bullet points inherit the paragraph spacing and other formatting. An example paragraph macro is:

\define{\paragraph}{{\pard \ql \sa60 \sb60 \f0 \fs24 \kerning12
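The numbers in these macros are easy to check with a little arithmetic. A small Python sketch, with made-up helper names, converting between twips and inches and confirming that decimal 9702 is the white bullet's code point:

```python
TWIPS_PER_INCH = 1440  # a twip is 1/20 of a point and there are 72 points per inch

def twips_to_inches(twips):
    return twips / TWIPS_PER_INCH

def inches_to_twips(inches):
    return round(inches * TWIPS_PER_INCH)

print(twips_to_inches(360))   # 0.25, the first-level indent
print(twips_to_inches(720))   # 0.5, the second-level body indent
print(inches_to_twips(0.25))  # 360
print(hex(9702))              # 0x25e6, i.e. U+25E6 WHITE BULLET
```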

October 16, 2020

SAS has special missing data codes which represent

a type of numeric missing value that enables you to represent different categories of missing data by using the letters A-Z or an underscore.

R doesn't have this built in. But here is a prototype of how to have missing data codes. The data frame has three rows and three variables. The first variable is an integer ID variable, the second is trails_a, which holds the time in seconds a person takes on part A of a test called the Trail Making Test, and the third is just a random number. If the person does not have a score on the test, we can record the reason.

I don't think data.frame can generate this object so I use structure with row.names and class attributes. The trails_a variable itself is a list with two elements each of length 3. In a way this is like an embedded data frame.

d <- structure(list(id = c(11L, 22L, 33L),
                    trails_a = structure(list(score = c(50L, NA, NA),
                                              reason = c(NA_character_,
                                                         "Exceeded time limit",
                                                         "Too severe to test")),
                                         class = "trailsList"),
                    random = runif(3)),
               row.names = c(NA, -3L),
               class = "data.frame")

This doesn't print properly (and gives a "corrupt data frame" warning) unless we create a format method for our trailsList class.

format.trailsList <- function(x, ...){
    ifelse(is.na(x$score),
           sprintf("NA (%s)", x$reason),
           sprintf("%ds", x$score))
}

But with a nice format method we get this:

  id                 trails_a random
1 11                      50s  0.061
2 22 NA (Exceeded time limit)  0.664
3 33  NA (Too severe to test)  0.826

To use a data frame with a trailsList object we need to define a number of other methods including operations like == for subsetting. I don't know all that is involved. But an example to look at would be the methods for Surv in the survival package.


I don't think most analysts would want to deal with this complexity. It's more straightforward and understandable to just use two columns in the data frame: the score and if no score, the reason for NA.


Having a variable in a data frame that is of POSIXlt class is an "established" case that is very similar. This is a data frame with two observations and three variables clinic, date, and ldate.

d <- data.frame(clinic = 1:2,
                date = as.Date(c("2010-01-01", "2020-01-05")))
d$ldate <- as.POSIXlt(d$date)
dput(d)
structure(list(clinic = 1:2,
               date = structure(c(14610, 18266), class = "Date"),
               ldate = structure(list(sec = c(0, 0),
                                      min = c(0L, 0L),
                                      hour = c(0L, 0L),
                                      mday = c(1L, 5L),
                                      mon = c(0L, 0L),
                                      year = c(110L, 120L),
                                      wday = c(5L, 0L),
                                      yday = c(0L, 4L),
                                      isdst = c(0L, 0L)),
                                 class = c("POSIXlt", "POSIXt"),
                                 tzone = "UTC")),
          row.names = c(NA, -2L),
          class = "data.frame")

September 21, 2020


I used to look down my nose at PowerShell but now I realize there are lots of shells that do a good job. I need to learn the equivalent of aliases in PowerShell to make navigating the Windows filesystem faster but for now if I navigate to a directory with a file that I want to open (say a Word document) then I can open it quickly with

 ii myfile.docx

Here, ii is a shortening of Invoke-Item.

September 15, 2020

Lattice graphics

A StackOverflow question asked about removing tickmarks on the right and top margins of a plot.


The context was

library(effects)

m <- lm(Fertility ~ ., data = swiss)
plot(allEffects(m), rug = FALSE)

And plot from the effects package didn't allow the user much control. For example, the usual scales argument wasn't handled. This was my solution:

trellis.par.set(axis.components = list(top = list(tck = 0),
                                       right = list(tck = 0)))
plot(allEffects(m), rug = FALSE)

Another thing I learned was how to control the font family easily. Using trellis.par.set outside of a plot call or using par.settings inside a plot call, one of the elements is grid.pars. See ?gpar.

xyplot(dist ~ speed, data = cars,
       par.settings = list(grid.pars = list(fontfamily = "Open Sans")))

August 27, 2020


In an RMarkdown file edited with Emacs and Polymode, the command polymode-eval-buffer-from-beg-to-point is bound to both M-n v <up> and M-n v u.

The command polymode-eval-buffer is bound to M-n v b.

The command polymode-eval-region-or-chunk is bound to M-n v v.

August 26, 2020

Lattice graphics

  • Axis tick marks on log scale with "normal" numbers: list(log = 2, equispaced.log = FALSE)
  • Gridlines to match tick marks: panel.grid(h = -1, v = -1)

August 20, 2020

The data.table package has fread which is great. One difference between fread and read.csv is how a field like ,"", is handled. Here is an example file:


Using read.csv with defaults gives:

 R> str(read.csv("test-fread.csv", stringsAsFactors = FALSE))
 'data.frame':	2 obs. of  4 variables:
 $ id   : int  1 1
 $ date1: chr  "2019-01-01" "2019-01-01"
 $ date2: chr  "" "2018-03-18"   <---- Empty string
 $ code : int  3 3

Using na.strings = "" gives

 R> str(read.csv("test-fread.csv", stringsAsFactors = FALSE, na.strings = ""))
 'data.frame':	2 obs. of  4 variables:
 $ id   : int  1 1
 $ date1: chr  "2019-01-01" "2019-01-01"
 $ date2: chr  NA "2018-03-18"   <---- Better
 $ code : int  3 3

But I do not know a way to get the same from fread:

 R> str(fread("test-fread.csv", stringsAsFactors = FALSE, na.strings = ""))
 Classes 'data.table' and 'data.frame':	2 obs. of  4 variables:
 $ id   : int  1 1
 $ date1: chr  "2019-01-01" "2019-01-01"
 $ date2: chr  "" "2018-03-18"
 $ code : int  3 3
 - attr(*, ".internal.selfref")=<externalptr>

By design, fread wants to be smart about allowing an NA field to be data (e.g., the string "NA"). It also wants to allow zero length strings. I don't know if there is a way around this.

For example, this doesn't work:

 R> str(fread("test-fread.csv", colClasses = c("date1" = "Date", "date2" = "Date")))
 Classes 'data.table' and 'data.frame':	2 obs. of  4 variables:
 $ id   : int  1 1
 $ date1: Date, format: "2019-01-01" "2019-01-01"
 $ date2: chr  "" "2018-03-18"
 $ code : int  3 3
 - attr(*, ".internal.selfref")=<externalptr>
 Warning message:
 Column 'date2' was requested to be 'Date' but fread encountered the following error:
 character string is not in a standard unambiguous format
 so the column has been left as type 'character'

This comes up with ADNI data and the PTDEMOG.csv. Maybe the root of the problem is having everything quoted in the source CSV.