hey, I’m Ashok.

Email / Google Scholar / GitHub / LinkedIn

27.08.24

lapack is an open source library for linear algebra operations on cpus.
gpus have their own libraries for linear algebra operations such as cuBLAS and cuSOLVER for nvidia gpus, rocBLAS and rocSOLVER for amd gpus and oneMKL for intel gpus.
lapack code is written in fortran.

26.08.24

weight decay is used to prevent overfitting by penalizing large weights, shrinking them a little at every update.
the weight update equation becomes $w \leftarrow w - α (\nabla_{w} L + λ w)$ where $λ$ is the weight decay coefficient.
some weights such as layer normalization weights are not decayed since overfitting is not a concern for them.
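
a minimal sketch of sgd with weight decay, i.e. $w \leftarrow w - α (\nabla_{w} L + λ w)$, assuming weights and gradients are plain lists of floats. `sgd_step`, `lr`, `weight_decay` and `decay_mask` are hypothetical names; the mask marks which weights get decayed (e.g. False for layer normalization weights).

```python
def sgd_step(weights, grads, lr=0.1, weight_decay=0.01, decay_mask=None):
    """One SGD step: w <- w - lr * (grad + weight_decay * w)."""
    if decay_mask is None:
        decay_mask = [True] * len(weights)  # decay every weight by default
    new_weights = []
    for w, g, decay in zip(weights, grads, decay_mask):
        penalty = weight_decay * w if decay else 0.0  # skip the penalty for masked weights
        new_weights.append(w - lr * (g + penalty))
    return new_weights


# with zero gradients only the decay term moves the weights
updated = sgd_step([1.0, -2.0], [0.0, 0.0], lr=0.1, weight_decay=0.5,
                   decay_mask=[True, False])
print(updated)
```

with zero gradients, the decayed weight shrinks towards zero while the masked one stays untouched.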

24.08.24

for error AttributeError: module 'numpy' has no attribute 'bool8'. Did you mean: 'bool', run pip3 install mxnet-mkl==1.6.0 numpy==1.23.1.

23.08.24

scp is helpful for copying files to the slurm scratch space.

example scp command:

scp -r a.py user@slurm_cluster_ip:/scratch/user/

slurm clusters have multiple login nodes yet a single ip which leads to:

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

to resolve this, add StrictHostKeyChecking no to ~/.ssh/config or

ssh -o UserKnownHostsFile=/dev/null user@slurm_cluster_ip

21.08.24

setting up ssh keys for github on linux machine:

ssh-keygen -t rsa -b 4096 -C "email"
eval "$(ssh-agent -s)"
ssh-add ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub

then copy the key to github.

website dependencies (Node >= v20 and npm >= v9.3.1):

sudo apt install npm
sudo npm install -g n
sudo n latest
node --version

website build command: npx quartz build --serve.

03.08.24

Ctrl+M switches the theme between dark and light in wandb dashboard.

02.08.24

available conda modules on the Slurm cluster can be listed using module avail conda from the login node and then loaded using module load <module_name>.
User directories should be created in /scratch/<user_id> rather than on the login nodes, as login nodes are connected in a round-robin fashion, hence data may not persist across sessions.
/scratch/ folders are generally deleted after 3 weeks so routine backups are important.

29.07.24

$l_{p}$ norm of a vector $x$ is defined as $∥ x ∥_{p} = (\sum_{i = 1}^{n} ∣ x_{i} ∣^{p})^{1/ p}$ , it calculates the size of the vector depending upon the value of $p$ .
$l_{2}$ norm is more commonly used and is also called the euclidean norm.
Geometrically, in 2D space, $l_{1}$ forms a diamond shape whereas $l_{2}$ forms a circle.
$l_{1}$ regularization induces sparsity since it drives some coefficients to exactly zero whereas $l_{2}$ regularization makes them small but non-zero.
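
a small sketch of the $l_{p}$ norm definition $(\sum_{i} ∣ x_{i} ∣^{p})^{1/ p}$ in plain python. `lp_norm` is a hypothetical helper, not a library function.

```python
def lp_norm(x, p):
    """Return (sum_i |x_i|^p)^(1/p) for p >= 1."""
    if p < 1:
        raise ValueError("p must be >= 1 for a valid norm")
    return sum(abs(v) ** p for v in x) ** (1.0 / p)


x = [3.0, -4.0]
l1 = lp_norm(x, 1)  # |3| + |-4|
l2 = lp_norm(x, 2)  # sqrt(3^2 + 4^2), the euclidean norm
print(l1, l2)
```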

26.07.24

Currently, groq provides free inference to Llama models using API keys.
model quantization reduces the precision of weights and biases to run LLMs faster.
8-bit and 4-bit quantization are commonly used for reducing the model size.
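
a rough sketch of 8-bit quantization, assuming the simplest symmetric scheme with a single scale for all weights (real libraries usually quantize per channel or per block). `quantize_int8` and `dequantize` are hypothetical names.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate floats from the stored integers."""
    return [v * scale for v in q]


weights = [0.5, -1.0, 0.25, 0.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, max_error)
```

each weight is now stored as one small integer plus a shared float scale, and the rounding error per weight is at most about half of the scale.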

23.07.24

send a new slurm job using: sbatch job.sh where job.sh contains the job commands.
view queued jobs using squeue | grep gpu which will show the jobs using gpu nodes.
jobs have an upper time limit of 24 hours, so model checkpointing should be used to resume from a specific checkpoint for jobs that take longer.
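
a minimal sketch of the checkpoint-and-resume idea, assuming the training state fits in a small json file. `train`, `stop_after` and the file name are hypothetical; `stop_after` only simulates the job hitting its time limit, and the loss line is a placeholder for a real training step.

```python
import json
import os
import tempfile


def train(total_steps, checkpoint_path, stop_after=None):
    """Resume from checkpoint_path if it exists and save progress after every step."""
    state = {"step": 0, "loss": None}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)  # pick up where the previous job stopped
    steps_done_now = 0
    while state["step"] < total_steps:
        if stop_after is not None and steps_done_now >= stop_after:
            break  # simulate the job being killed at the time limit
        state["step"] += 1
        state["loss"] = 1.0 / state["step"]  # placeholder for a real training step
        with open(checkpoint_path, "w") as f:
            json.dump(state, f)
        steps_done_now += 1
    return state


path = os.path.join(tempfile.mkdtemp(), "ckpt.json")
first = train(10, path, stop_after=4)  # first job stops early
second = train(10, path)               # second job resumes from the checkpoint
print(first["step"], second["step"])
```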

20.07.24

slurm clusters have two types of nodes: login node vs worker node.
login nodes can access internet but worker nodes cannot.
wandb requires access to api.wandb.ai but since worker nodes do not provide that, an option is to store the data locally using export WANDB_MODE=offline and then run wandb sync --sync-all --include-synced ./wandb later on from the login node which will sync the data.

17.07.24

using wandb API key, multiple GPU nodes can send their logs to the central website.
tmux commands for:
- detach: <Ctrl-b> d
- attach: tmux a -t 0
- horizontal split: <Ctrl-b> %
- vertical split: <Ctrl-b> "

for the error:

libGL error: MESA-LOADER: failed to open swrast: /usr/lib/dri/swrast_dri.so: cannot open shared object file: No such file or directory (search paths /usr/lib/x86_64-linux-gnu/dri:\$${ORIGIN}/dri:/usr/lib/dri, suffix _dri)
libGL error: failed to load driver: swrast

running,

cd /home/$USER/anaconda3/envs/$ENV/lib
mkdir backup
mv libstd* backup
cp /usr/lib/x86_64-linux-gnu/libstdc++.so.6  ./
ln -s libstdc++.so.6 libstdc++.so
ln -s libstdc++.so.6 libstdc++.so.6.0.19

fixes the issue.

16.07.24

unknown traceback:

Traceback (most recent call last):
  File "run.py", line 333, in <module>
    run_experiment(parser.parse_args())
  File "run.py", line 115, in run_experiment
    agent.train()
  File "/home/cse/Desktop/ashok/agents/model.py", line 352, in train
    norm = torch.nn.utils.clip_grad_norm_(
  File "/home/cse/.local/lib/python3.8/site-packages/torch/nn/utils/clip_grad.py", line 20, in _no_grad_wrapper
    return func(*args, **kwargs)
  File "/home/cse/.local/lib/python3.8/site-packages/torch/nn/utils/clip_grad.py", line 76, in clip_grad_norm_
    raise RuntimeError(
RuntimeError: The total norm of order 2.0 for gradients from `parameters` is non-finite, so it cannot be clipped. To disable this error and scale the gradients by the non-finite norm anyway, set `error_if_nonfinite=False`

gradient clipping is used to avoid the exploding gradient problem.
yet, if the gradients contain Inf or NaN, the total norm is non-finite and clip_grad_norm_ raises the above error when error_if_nonfinite=True. Overflow from a data type with insufficient range (e.g. float16) can also cause this.
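
a plain python sketch of norm-based gradient clipping with the non-finite check, to show where the error comes from. this is not the torch implementation; `clip_grad_norm` is a hypothetical helper, and unlike torch it leaves the gradients untouched when the check is disabled.

```python
import math


def clip_grad_norm(grads, max_norm, error_if_nonfinite=True):
    """Scale grads in place so their l2 norm is at most max_norm; return the original norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if not math.isfinite(total_norm):
        if error_if_nonfinite:
            raise RuntimeError("total gradient norm is non-finite, cannot clip")
        return total_norm  # simplification: do nothing when the norm is Inf or NaN
    if total_norm > max_norm:
        scale = max_norm / total_norm
        for i in range(len(grads)):
            grads[i] *= scale
    return total_norm


grads = [3.0, 4.0]
norm = clip_grad_norm(grads, max_norm=1.0)  # norm is 5, so grads are scaled by 1/5
print(norm, grads)
```

a single Inf or NaN entry makes the whole norm non-finite, so checking the loss and the gradients before clipping helps to locate the offending step.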

06.07.24

import gym versions:
- 0.18 for from gym.envs.classic_control import rendering as visualize.
- 0.21 for change in rendering from pyglet to pygame.

05.07.24

AttributeError: module 'wandb.proto.wandb_internal_pb2' has no attribute 'Result' can be resolved by:
```
pip install protobuf==3.20
pip install wandb==0.16.6
```
AttributeError: partially initialized module 'charset_normalizer' has no attribute 'md__mypyc' (most likely due to a circular import) can be resolved by:
```
pip install --force-reinstall charset-normalizer==3.1.0
```

04.07.24

setting up ssh on a new machine:

sudo apt install openssh-server
sudo systemctl start ssh
sudo systemctl enable ssh

01.07.24

gym >= 0.26.0 replaces done with terminated and truncated. terminated indicates that the episode ended due to a terminal state, whereas truncated indicates that the episode ended due to a time limit.
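
a sketch of an episode loop with the 5-tuple step api, using a toy environment so it runs without gym installed. `CountdownEnv` and `run_episode` are hypothetical; a real gym environment would return the same five values from step.

```python
class CountdownEnv:
    """Hypothetical toy env that follows the (obs, reward, terminated, truncated, info) api."""

    def __init__(self, goal=3, time_limit=5):
        self.goal = goal
        self.time_limit = time_limit

    def reset(self):
        self.position = 0
        self.steps = 0
        return self.position, {}

    def step(self, action):
        self.position += action
        self.steps += 1
        terminated = self.position >= self.goal  # reached a terminal state
        truncated = not terminated and self.steps >= self.time_limit  # hit the time limit
        reward = 1.0 if terminated else 0.0
        return self.position, reward, terminated, truncated, {}


def run_episode(env, action):
    """Repeat one action until the episode ends and report why it ended."""
    obs, info = env.reset()
    done = False
    while not done:
        obs, reward, terminated, truncated, info = env.step(action)
        done = terminated or truncated  # the old done flag is the union of both
    return terminated, truncated


print(run_episode(CountdownEnv(), action=1))  # reaches the goal
print(run_episode(CountdownEnv(), action=0))  # never moves, so the time limit ends it
```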

29.06.24

isort to sort imports in python files alphabetically.
pylint to check for code quality.

28.06.24

torchvision.datasets.ImageFolder can be used to load images from a folder when the folder structure is:
```
root/dog/xxx.png
root/dog/xxy.png
root/cat/123.png
root/cat/456.png
```

25.06.24

wandb sync can be used to upload the training directory from local to cloud.

24.06.24

for error: 'extras_require' must be a dictionary whose values are strings or lists of strings containing valid project/version requirement specifiers, run:
```
pip install setuptools==65.5.0 pip==21
pip install wheel==0.38.0
```
wandb login --relogin --cloud for logging through online account.

22.06.24

to clone a specific branch, git clone --single-branch --branch <branch_name> <repo_url>.
for AnyDesk error Failed to load module "canberra-gtk-module", install libcanberra-gtk-module using sudo apt-get install libcanberra-gtk-module libcanberra-gtk3-module.
nvidia_gpu_exporter (:9835) exports the metrics on /metrics which is scraped by prometheus (:9090) and visualized by grafana (:3000).

grafana (:3000) installation commands:

sudo apt-get install -y adduser libfontconfig1 musl
wget https://dl.grafana.com/oss/release/grafana_11.0.0_amd64.deb
sudo dpkg -i grafana_11.0.0_amd64.deb

nvidia_gpu_exporter (:9835) installation commands:

sudo dpkg -i nvidia-gpu-exporter_1.1.0_linux_amd64.deb

prometheus (:9090) installation commands:

wget https://github.com/prometheus/prometheus/releases/download/v2.45.6/prometheus-2.45.6.linux-amd64.tar.gz
tar -xvf prometheus-2.45.6.linux-amd64.tar.gz
sudo groupadd --system prometheus
sudo useradd -s /sbin/nologin --system -g prometheus prometheus
sudo mkdir /etc/prometheus
sudo mkdir /var/lib/prometheus
cd prometheus-2.45.6.linux-amd64
sudo mv prometheus /usr/local/bin
sudo mv promtool /usr/local/bin
sudo mv console* /etc/prometheus
sudo mv prometheus.yml /etc/prometheus
sudo chown prometheus:prometheus /usr/local/bin/prometheus
sudo chown prometheus:prometheus /usr/local/bin/promtool
sudo chown prometheus:prometheus /etc/prometheus
sudo chown -R prometheus:prometheus /etc/prometheus/consoles
sudo chown -R prometheus:prometheus /etc/prometheus/console_libraries
sudo chown -R prometheus:prometheus /var/lib/prometheus

then edit /etc/prometheus/prometheus.yml (sudo nano /etc/prometheus/prometheus.yml) to:

global:
  scrape_interval: 15s
  evaluation_interval: 15s
 
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
 
  - job_name: 'node'
    static_configs:
      - targets: ['localhost:9100']  # Adjust the target to your node exporter endpoint
 
  - job_name: 'nvidia_exporter'
    static_configs:
      - targets: ['localhost:9835']  # Adjust the target to your nvidia gpu exporter endpoint

next, run sudo nano /etc/systemd/system/prometheus.service and add the following to it:

[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target
 
[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
--config.file /etc/prometheus/prometheus.yml \
--storage.tsdb.path /var/lib/prometheus/ \
--web.console.templates=/etc/prometheus/consoles \
--web.console.libraries=/etc/prometheus/console_libraries
 
[Install]
WantedBy=multi-user.target

lastly,

sudo systemctl daemon-reload
sudo systemctl enable prometheus
sudo systemctl start prometheus
sudo systemctl status prometheus

21.06.24

torch.bmm is used for batch matrix multiplication. It expects the input tensors to be of shape (batch, n, m) and (batch, m, p) and returns a tensor of shape (batch, n, p).
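
to make the shape rule concrete, here is a plain python sketch of batch matrix multiplication on nested lists. it is only an illustration of what torch.bmm computes, not how torch implements it; `bmm` here is a hypothetical helper.

```python
def bmm(a, b):
    """Batch matrix multiply nested lists: (batch, n, m) x (batch, m, p) -> (batch, n, p)."""
    assert len(a) == len(b), "batch sizes must match"
    out = []
    for mat_a, mat_b in zip(a, b):
        assert len(mat_a[0]) == len(mat_b), "inner dimensions must match"
        cols_b = list(zip(*mat_b))  # columns of the second matrix
        out.append([[sum(x * y for x, y in zip(row, col)) for col in cols_b]
                    for row in mat_a])
    return out


a = [[[1, 2], [3, 4]]]  # shape (1, 2, 2)
b = [[[5], [6]]]        # shape (1, 2, 1)
result = bmm(a, b)      # shape (1, 2, 1)
print(result)
```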

torch.nn.functional.scaled_dot_product_attention is used in transformers for scaled dot product attention. A simplified torch equivalent (no mask or dropout) for $Q$ (64, 10, 128), $K$ (64, 12, 128) and $V$ (64, 12, 128):

import torch
import torch.nn.functional as F
def scaled_dot_product_attention(Q, K, V):
  attn_weights = torch.bmm(Q, K.transpose(1, 2)) / (Q.size(-1) ** 0.5)  # scale by sqrt(128)
  attn = torch.bmm(F.softmax(attn_weights, dim=-1), V)
  return attn

attn_weights = $Q$ (64, 10, 128) x $K^{T}$ (64, 128, 12) = (64, 10, 12)
attn = attn_weights(64, 10, 12) x $V$ (64, 12, 128) = (64, 10, 128)
$Q$ .shape = attn.shape = (64, 10, 128), $K$ .shape = $V$ .shape = (64, 12, 128).

20.06.24

if the error on boot is bad shim signature followed by you need to load the kernel first, disable Secure Boot in BIOS and try again.
boot error: ata7: COMRESET failed (errno=-32) is merely a warning and the system does boot up.

19.06.24

setting up a dual node gpu cluster from scratch is a daunting task.

18.06.24

grafana can be used for visualizing data from prometheus. Usually, a node exporter collects system metrics and exposes them on an endpoint that prometheus scrapes. Grafana can then be used to visualize these metrics.
node_exporter for nvidia gpu can be found here.

17.06.24

to preserve code changes to a folder on the gpu server, I archive the folder contents excluding the wandb folder using:

#!/bin/bash
 
# Check if the archive name is provided as an argument
if [ -z "$1" ]; then
  echo "Usage: $0 archive_name.tar"
  exit 1
fi
 
# Archive name from the first argument
ARCHIVE_NAME="$1"
 
# Exclude the wandb folder and create the tar archive
tar --exclude='./wandb' -cvf "$ARCHIVE_NAME" ./
 
echo "Archive created: $ARCHIVE_NAME"

16.06.24

manually installing .deb files: sudo dpkg -i <filename>.deb.

15.06.24

When running jupyter notebook, if you get the error ImportError: cannot import name 'contextfilter' from 'jinja2' (/home/user/anaconda3/lib/python3.8/site-packages/jinja2/__init__.py), switch to base conda environment and run pip install jinja2==3.0.3 nbconvert==6.4.4.
Error: Missing optional dependency 'pytables'. Use pip or conda to install pytables. can be resolved by running pip install tables.

To convert csv of format:

Time,Sensor1,Sensor2,Sensor3
00:00,10,15,5
01:00,12,18,8
02:00,14,20,7

to csv of format:

  timestep location  value
0    00:00  Sensor1     10
1    01:00  Sensor1     12
2    02:00  Sensor1     14

use,

long_format_data = data.melt(id_vars=['Time'], var_name='location', value_name='value')
long_format_data.rename(columns={'Time': 'timestep'}, inplace=True)

13.06.24

google provides the Secure Shell extension for ssh in chrome.

accessing gpu through docker requires installation of nvidia-container-toolkit using:

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
&& curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

followed by:

sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit  nvidia-docker2
sudo systemctl restart docker

followed by:

sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
nvidia-ctk runtime configure --runtime=docker --config=$HOME/.config/docker/daemon.json
systemctl --user restart docker

12.06.24

Double Deep Q Network (DDQN) ¹ uses two networks to decouple the target $θ^{-}$ and estimation networks $θ$ .
the target network $θ^{-}$ is updated less frequently than the estimation network $θ$ . This approach helps in stabilizing the training of the q-network.
equation for DDQN: $Δ θ = α (r + γ Q (s^{'}, \arg\max_{a^{'}} Q (s^{'}, a^{'}; θ); θ^{-}) - Q (s, a; θ)) \nabla_{θ} Q (s, a; θ)$
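
a small sketch of the DDQN target $r + γ Q (s^{'}, \arg\max_{a^{'}} Q (s^{'}, a^{'}; θ); θ^{-})$, assuming the q-values of the next state are already available as plain lists. `ddqn_target` and its argument names are hypothetical.

```python
def ddqn_target(reward, next_q_online, next_q_target, gamma=0.99, terminated=False):
    """Select the next action with the online net, evaluate it with the target net."""
    if terminated:
        return reward  # no bootstrapping from a terminal state
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]


# the online net prefers action 1, so the target net's value for action 1 is used
target = ddqn_target(1.0, next_q_online=[0.2, 0.9], next_q_target=[0.5, 0.3], gamma=0.5)
print(target)
```

a plain DQN target would take the max over the target network's values (0.5 here) instead, which is the source of the overestimation that the decoupling reduces.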

05.06.24

both a too-high and a too-low learning rate can hurt training.
a too-high learning rate can cause the model to overshoot the minima and diverge.
a too-low learning rate converges slowly and can cause the model to get stuck in a local minimum.

01.06.24

rich python library can be used for rich text and beautiful formatting in the terminal.

28.05.24

SARSA is an on-policy algorithm, whereas Q-learning is an off-policy algorithm.
Equation for SARSA: $Q (s_{t}, a_{t}) \leftarrow Q (s_{t}, a_{t}) + α (r_{t + 1} + γ Q (s_{t + 1}, a_{t + 1}) - Q (s_{t}, a_{t}))$
Equation for Q-learning: $Q (s_{t}, a_{t}) \leftarrow Q (s_{t}, a_{t}) + α (r_{t + 1} + γ \max_{a^{'}} Q (s_{t + 1}, a^{'}) - Q (s_{t}, a_{t}))$
Off-policy algorithms can reuse old transitions, which is why deep variants such as DQN store them in a replay buffer.
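
a tabular sketch of both updates, assuming `Q` is a list of per-state lists of action values. the function names are hypothetical. the only difference is the bootstrap term: SARSA uses the action actually taken next, Q-learning uses the greedy one.

```python
def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy: bootstrap from the action the behaviour policy actually takes next."""
    td_target = r + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (td_target - Q[s][a])


def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Off-policy: bootstrap from the greedy action, whatever is taken next."""
    td_target = r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (td_target - Q[s][a])


Q_sarsa = [[0.0, 0.0], [1.0, 2.0]]
Q_qlearn = [[0.0, 0.0], [1.0, 2.0]]
sarsa_update(Q_sarsa, s=0, a=0, r=1.0, s_next=1, a_next=0)  # next action is not greedy
q_learning_update(Q_qlearn, s=0, a=0, r=1.0, s_next=1)
print(Q_sarsa[0][0], Q_qlearn[0][0])
```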

24.05.24

If you get this error:

hp@admin ~/D/D/RPS [1]> pip install --upgrade pip                       (py312) 
Traceback (most recent call last):
  File "/home/hp/miniconda3/envs/py312/bin/pip", line 5, in <module>
    from pip._internal.cli.main import main
  File "/home/hp/miniconda3/envs/py312/lib/python3.12/site-packages/pip/_internal/cli/main.py", line 8, in <module>
    from pip._internal.cli.autocompletion import autocomplete
  File "/home/hp/miniconda3/envs/py312/lib/python3.12/site-packages/pip/_internal/cli/autocompletion.py", line 9, in <module>
    from pip._internal.cli.main_parser import create_main_parser
  File "/home/hp/miniconda3/envs/py312/lib/python3.12/site-packages/pip/_internal/cli/main_parser.py", line 7, in <module>
    from pip._internal.cli import cmdoptions
  File "/home/hp/miniconda3/envs/py312/lib/python3.12/site-packages/pip/_internal/cli/cmdoptions.py", line 22, in <module>
    from pip._internal.cli.progress_bars import BAR_TYPES
  File "/home/hp/miniconda3/envs/py312/lib/python3.12/site-packages/pip/_internal/cli/progress_bars.py", line 9, in <module>
    from pip._internal.utils.logging import get_indentation
  File "/home/hp/miniconda3/envs/py312/lib/python3.12/site-packages/pip/_internal/utils/logging.py", line 14, in <module>
    from pip._internal.utils.misc import ensure_dir
  File "/home/hp/miniconda3/envs/py312/lib/python3.12/site-packages/pip/_internal/utils/misc.py", line 20, in <module>
    from pip._vendor import pkg_resources
  File "/home/hp/miniconda3/envs/py312/lib/python3.12/site-packages/pip/_vendor/pkg_resources/__init__.py", line 58, in <module>
    from pip._vendor.six.moves import urllib, map, filter
ModuleNotFoundError: No module named 'pip._vendor.six.moves'

then run curl -sS https://bootstrap.pypa.io/get-pip.py | python3 to reinstall pip.

sudo pacman -Syu xfce4 xfce4-goodies to install xfce4 desktop environment on arch linux.

to auto start xfce4 on boot, add:

if [ -z "$DISPLAY" ] && [ "$XDG_VTNR" = 1 ]; then
  exec startxfce4
fi

to ~/.bash_profile.

23.05.24

arch install script inside live arch iso: archinstall
almost all linux distros require internet connection

use iwctl to connect to wifi before running archinstall using:

iwctl
station list
station wlan0 get-networks
station wlan0 connect <SSID>
exit
ping google.com

22.05.24

rnn hidden state equation:

$h_{t} = tanh (x_{t} W_{x}^{T} + b_{x} + h_{t - 1} W_{h}^{T} + b_{h})$

where,
- $h_{t - 1}$ is the hidden state at time $t - 1$ ,
- $x_{t}$ is the input at time $t$ ,
- $W_{x}$ is the input-to-hidden weight matrix,
- $b_{x}$ is the input-to-hidden bias,
- $W_{h}$ is the hidden-to-hidden weight matrix,
- $b_{h}$ is the hidden-to-hidden bias.
architecture of a simple rnn:
how can agents remember the past in reinforcement learning?
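
a plain python sketch of one step of the hidden state equation $h_{t} = \tanh (x_{t} W_{x}^{T} + b_{x} + h_{t - 1} W_{h}^{T} + b_{h})$ for a single sample, assuming $W_{x}$ has shape (hidden, input) and $W_{h}$ has shape (hidden, hidden). `rnn_step` is a hypothetical helper.

```python
import math


def rnn_step(x_t, h_prev, W_x, b_x, W_h, b_h):
    """Compute h_t = tanh(x_t W_x^T + b_x + h_prev W_h^T + b_h) for one sample."""
    hidden_size = len(W_x)
    h_t = []
    for j in range(hidden_size):
        pre = b_x[j] + b_h[j]
        pre += sum(W_x[j][i] * x_t[i] for i in range(len(x_t)))        # input contribution
        pre += sum(W_h[j][k] * h_prev[k] for k in range(len(h_prev)))  # memory contribution
        h_t.append(math.tanh(pre))
    return h_t


W_x, b_x = [[1.0]], [0.0]
W_h, b_h = [[0.5]], [0.0]
h1 = rnn_step([1.0], [0.0], W_x, b_x, W_h, b_h)  # sees an input of 1
h2 = rnn_step([0.0], h1, W_x, b_x, W_h, b_h)     # sees no input, only the carried state
print(h1, h2)
```

h2 is non-zero even though its input is zero, which is the sense in which the hidden state carries information about the past.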

20.05.24

if conda activate returns an error:

  usage: conda [-h] [-v] [--no-plugins] [-V] COMMAND ...
  conda: error: argument COMMAND: invalid choice: 'activate' (choose from 'clean', 'compare', 'config', 'create', 'info', 'init', 'install', 'list', 'notices', 'package', 'remove', 'uninstall', 'rename', 'run', 'search', 'update', 'upgrade', 'build', 'content-trust', 'convert', 'debug', 'develop', 'doctor', 'index', 'inspect', 'metapackage', 'render', 'skeleton', 'env', 'verify', 'server', 'pack', 'token', 'repo')

then run source ~/anaconda3/etc/profile.d/conda.sh to export conda functions.

03.05.24

linear quadratic regulator (LQR) for optimal control.

02.05.24

model quantization for fpga

01.05.24

fpga mindmap:

    %%{init: {'theme': 'default', 'themeVariables': { 'fontSize': '18px', 'fontFamily': 'Montserrat'}}}%%
    mindmap
      root((fpga))
        $$$
        logic gates
          nand
          nor
        language
          verilog
          vhdl
        boards
          xilinx
          altera
          digilent
          pynq Z2
          DE10
            linux lxde
          tang nano 9k
        machine learning
          model quantization
          hls4ml
          CNN
            mnist dataset
              live testing using camera
          Transformers

29.04.24

types of low-level chips: microcontroller, fpga

microcontroller mindmap:

    %%{init: {'theme': 'default', 'themeVariables': { 'fontSize': '20px', 'fontFamily': 'Montserrat'}}}%%
    mindmap
      root((microcontroller))
        $
        8-bit
          arduino
            uno r3
              atmega328p
                SMD (surface mount device)
                DIP (dual in-line package)
                  easier to solder (through hole)
              16MHz crystal
          bare-metal atmega328p
            breadboard    
            crystal - 16MHz
            capacitors
            resistors
            switch, leds  
        32-bit
          stm32 
          esp32
          rpi pico
        Language
          C
          Rust
          CircuitPython and Micropython
        Tools
          avrdude

TinyMaix for inference of MNIST on atmega328p.

26.04.24

in partially observable markov decision process (POMDP), observation $\neq$ state.
mermaid.js for creating diagrams such as mindmaps in markdown.
uno r3 input voltage is 7-12V, output voltage is 5V.

25.04.24

\ldots is for low dots in latex whereas \cdots is for center dots.
\hrulefill in latex creates a horizontal line that fills the width of the page.

24.04.24

strikethrough text in latex using:

\usepackage{soul}
\st{Strike through this text}

mac doesn’t support multiple hdmi to usb-c adapters. Only one hdmi display can be connected at a time.

22.04.24

ssh port forwarding: ssh -L 8080:localhost:8080 user@remotehost forwards connections on local port 8080 to port 8080 on the remote host, making the remote service reachable at localhost:8080.

20.04.24

sparse rewards lead to unstable training, whereas dense rewards lead to faster convergence.
sparse rewards are more realistic but harder to learn.
reward shaping through imitation learning can help in learning sparse rewards.

19.04.24

jumanji connector-v2 environment does not guarantee solvability.
get the version of ubuntu using lsb_release -a.

installing nle using pip can be a chore since the error messages are not helpful. steps to install on ubuntu 22.04:

sudo apt-get install -y build-essential autoconf libtool \
    pkg-config python3-dev python3-pip python3-numpy git \
    flex bison libbz2-dev
wget -O - https://apt.kitware.com/keys/kitware-archive-latest.asc 2>/dev/null | sudo apt-key add -
sudo apt-add-repository 'deb https://apt.kitware.com/ubuntu/ jammy main'
sudo apt-get update && sudo apt-get --allow-unauthenticated install -y \
    cmake \
    kitware-archive-keyring
conda create -n py38 python=3.8
conda activate py38
pip install nle

18.04.24

xla is a compiler for machine learning models. It performs better on GPU and TPU.
jax uses xla as a backend and has syntax similar to numpy.
mava is a marl library that uses jax as a backend.

17.04.24

in sb3, batch_size can be changed for DQN and PPO.
what’s the right batch size to use?

16.04.24

sbx now supports custom activation functions. resolved with PR#41. Now, it works with TD3, PPO, SAC, DDPG and DQN.
policy $π$ specifies $P (a ∣ s)$ .
In on-policy, behaviour policy $=$ estimation policy, whereas, in off-policy, behaviour policy $\neq$ estimation policy. PPO is on-policy and DQN is off-policy.
Generally, on-policy is used for fast environments and off-policy is used for slow environments.
tmux:
- prefix key: Ctrl-b by default.
- New window: <prefix>c
- Next window: <prefix>n
- Previous window: <prefix>p.
tui file manager: nnn.

15.04.24

sumo charging station on the road using:

<additional>
  <chargingStation chargeDelay="2" chargeInTransit="0" power="200000" efficiency="0.95"  startPos="10" endPos="25" id="cS_2to19_0a" lane="2to19_0"/>
</additional>

dynamic time warping (DTW) (Berndt & Clifford 1994) is used to check the similarity between two time series.
change target vehicle color to red: traci.vehicle.setColor(vehID, (255, 0, 0))

14.04.24

to get the color of the plotted line in matplotlib:

p = plt.plot(x,x, x,x*2, x,x*3)
colors = [line.get_color() for line in p]

default figsize in matplotlib (plt.rcParams['figure.figsize']) is (6.4, 4.8) inches.

13.04.24

multi-armed bandit is a simpler version of reinforcement learning, regret equation:

$ρ = T μ^{*} - \sum_{t = 1}^{T} \hat{r}_{t}$

where,
- $ρ$ is the cumulative regret,
- $T$ is the number of time steps,
- $μ^{*}$ is the optimal reward at each time step,
- $\hat{r}_{t}$ is the reward at time step $t$ .
can multi-armed bandit perform better than reinforcement learning in some cases?
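
a sketch of measuring cumulative regret, i.e. $T μ^{*}$ minus the sum of received rewards, for an epsilon-greedy agent on Bernoulli arms. epsilon-greedy is just one possible strategy picked for illustration; the function names and arm means are hypothetical.

```python
import random


def cumulative_regret(rewards, optimal_mean):
    """Regret = T * mu_star - sum of the rewards actually received."""
    return len(rewards) * optimal_mean - sum(rewards)


def run_epsilon_greedy(arm_means, steps=1000, epsilon=0.1, seed=0):
    """Play Bernoulli arms with epsilon-greedy and return the list of rewards."""
    rng = random.Random(seed)
    counts = [0] * len(arm_means)
    estimates = [0.0] * len(arm_means)
    rewards = []
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(len(arm_means))  # explore
        else:
            arm = max(range(len(arm_means)), key=lambda a: estimates[a])  # exploit
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # running mean
        rewards.append(reward)
    return rewards


arm_means = [0.2, 0.8]
rewards = run_epsilon_greedy(arm_means, steps=1000)
print(cumulative_regret(rewards, optimal_mean=max(arm_means)))
```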

12.04.24

spatio-temporal dataset: pems08.
ctrl-tab in vscode to switch between open files.
policy gradient is on-policy whereas q-learning is off-policy.
equation for policy gradient: $θ \leftarrow θ + α \nabla_{θ} J (θ)$ which uses stochastic gradient ascent.

06.04.24

pure param embeddings are randomly initialized and learned during training. they are not tied to any input token.
cross attention with a pure param embedding is getting common.

05.04.24

pwnagotchi runs on rpi zero w and uses a wifi adapter to capture handshakes.
it uses rl to learn the best way to capture handshakes.
what’s the bill-of-materials for a pwnagotchi?

04.04.24

neovim distributions exist such as lazyvim and spacevim.
they come with pre-installed plugins and configurations.
lazygit is a terminal based git client with cool UI.
lazydocker is a terminal based docker client with cool UI.

03.04.24

rpi pico w has two cores.
code for printing even numbers on core0 and odd numbers on core1:

from time import sleep
import _thread
def core0_thread():
    counter = 0
    while True:
        print(counter)
        counter += 2
        sleep(1)
def core1_thread():
    counter = 1
    while True:
        print(counter)
        counter += 2
        sleep(1)
second_thread = _thread.start_new_thread(core1_thread, ())
core0_thread()

02.04.24

if importing both torch and tensorflow in the same script, and you get an error:

F ./tensorflow/core/kernels/random_op_gpu.h:246] Non-OK-status: GpuLaunchKernel(FillPhiloxRandomKernelLaunch<Distribution>, num_blocks, block_size, 0, d.stream(), key, counter, gen, data, size, dist) status: Internal: invalid configuration argument

then import tensorflow before torch.

sbx doesn’t support custom activation functions yet. work in progress. link to github issue.

01.04.24

tensorboard logs only 1000 steps by default to preserve memory but this results in exported csv files lacking data.
To increase the number of steps in tensorboard logs, use --samples_per_plugin=scalars=10000 in the tensorboard command.
rgb smd led model: KY-009. Working voltage: 2.8 for red, 3.2 for green, 3.2 for blue. Forward current: 20mA.

29.03.24

setting up a new rpi pico w with micropython requires downloading the micropython firmware.
thonny is the preferred editor. It is available in standard ubuntu repo.
if you get an error Unable to connect to /dev/ttyACM0: [Errno 13] could not open port /dev/ttyACM0: [Errno 13] Permission denied: '/dev/ttyACM0' try, sudo usermod -a -G dialout <username> and then logout or reboot.
thonny in the ubuntu repos is kinda outdated and doesn’t have native support for pico w. download the latest using:

wget -O thonny-latest.sh https://thonny.org/installer-for-linux   
chmod +x thonny-latest.sh
./thonny-latest.sh

in thonny, go to Run -> Interpreter -> Micropython (Raspberry Pi Pico) -> install or update micropython.
hold the BOOTSEL button and then plug the micro-usb to get the mcu into filesystem mode.
pico w code for blinking the on-board led is different from pico because the led is connected to a gpio on the wireless chip instead.
code for blinking on-board led on pico w:

import machine
import time
led = machine.Pin("LED", machine.Pin.OUT)
while True:  # blink forever
    led.on()
    time.sleep(0.5)
    led.off()
    time.sleep(0.5)

save the file as main.py on the pico w filesystem to make it run on boot.
images:
1. rpi pico gpio pins
2. rpi pico in its packaging
3. rpi pico alongside arduino for size comparison
4. rpi pico led blink


28.03.24

overleaf docker container: github link.
texstudio also works well, sudo apt install texstudio.

27.03.24

trajectory stitching involves piecing together parts of the different trajectories.
it helps offline rl match the performance of online rl.
parts of sub-optimal trajectories can be stitched into a trajectory that performs better than any of them.

26.03.24

in latex, \include{} adds a new page, instead use \input{}.
embedded firmware just means the arduino code.

24.03.24

rpi pico supports micropython and it's only 6$. ¯\_(ツ)_/¯
it's also dual core so it can multi-task.
simpla package in SUMO does not work with libsumo.

23.03.24

STM32F103RB has 128KB flash and 72MHz clock speed. It was about 14$.
micropython requires a minimum of 256KB flash.
micro:bit v2 has 512KB flash, 128KB RAM, and 64MHz clock speed. It has nRF52 chip.
micro:bit can be programmed using micropython.
python access index using enumerate:

for index, element in enumerate(['a', 'b', 'c']):
    print(index, element)

22.03.24

db9 is a serial port connector. db15 is a vga connector. T_T

21.03.24

pdf on how to use rplidar on windows to scan the environment.

20.03.24

nvim config is stored at ~/.config/nvim/init.vim.
minimal vim/nvim config:

syntax on
set tabstop=4
set shiftwidth=4
set expandtab
set autoindent
set number
set ruler

for error: AttributeError: module 'tensorflow_probability' has no attribute 'substrates' use import tensorflow_probability.substrates.jax as tfp.
Parse local TensorBoard data into pandas DataFrame

19.03.24

MAE loss is less sensitive to outliers.
MSE loss penalises large errors.
MAE is not differentiable at zero whereas huber loss is differentiable everywhere, being quadratic for small errors and linear for large ones.
images:
1. mae vs mse vs huber
2. huber at different values of $δ$ can become MSE or MAE.
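
a plain python sketch comparing the three losses on a list of errors with one outlier. the function names are hypothetical, and `delta` is the huber threshold $δ$.

```python
def mae(errors):
    """Mean absolute error."""
    return sum(abs(e) for e in errors) / len(errors)


def mse(errors):
    """Mean squared error."""
    return sum(e * e for e in errors) / len(errors)


def huber(errors, delta=1.0):
    """Quadratic for |e| <= delta, linear beyond it."""
    total = 0.0
    for e in errors:
        if abs(e) <= delta:
            total += 0.5 * e * e
        else:
            total += delta * (abs(e) - 0.5 * delta)
    return total / len(errors)


errors = [0.5, -0.5, 10.0]  # the last entry is an outlier
print(mae(errors), mse(errors), huber(errors))
```

the outlier dominates MSE, while MAE and huber grow only linearly with it.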

$L (θ) = E [(r + γ max_{a^{'}} Q (s^{'}, a^{'}) - Q (s, a))^{2}]$
in vim, switch between splits: Ctrl-W + [hjkl].
and reload the current file using :e.
ai inference hardware is getting better. tenstorrent sells e150 for 75k inr (shipping included).
quantization reduces the size of the model and makes it less memory hungry.

18.03.24

rpi pins max output is 3.3v.
how to monitor the rpi temperature?
is gpio cleanup necessary?

16.03.24

gpio pin layout is actually this way:

5v to 3.3v converter: HW-122 (AMS1117-3.3).
the converter can be used for rpi to arduino serial communication.

15.03.24

ring attention is useful for increasing the context size.
miniforge works better on raspberry pi.
pinout.xyz for pin layout.

13.03.24

UART is a serial communication protocol.
Enabling serial on RPi 4:
- sudo raspi-config
- Interfacing Options > Serial > No > Yes
- Reboot
GPIO connections:
- TX of RPi to RX of USB to TTL
- RX of RPi to TX of USB to TTL
- GND of RPi to GND of USB to TTL
minicom can be used to access the serial console of RPi. (sudo apt install minicom)
minicom -b 115200 -o -D /dev/ttyUSB0 to start minicom with baud rate 115200 and device /dev/ttyUSB0
disable hardware flow control in minicom using Ctrl+A > O > Serial port setup > F > No

12.03.24

the notes belong to different categories, can I use a LLM to classify them without any labels? Each bullet point is a note and the category is the label.
the categories could be:
1. Embedded
2. ML
3. GPU/Infra
4. Programming
5. Latex
6. Unlabelled

11.03.24

to reduce matplotlib xticks:

num_xticks = 5  # Number of x-ticks to show
step = len(time_steps) // num_xticks
plt.xticks(time_steps[::step], rotation=45, fontsize=15)  # Set x-axis ticks to show only selected time steps

usb-c power delivery (pd) can deliver variable voltage and current using software negotiation.
power delivery trigger board can be used to negotiate power delivery and get a fixed voltage and current.
\usepackage{graphicx} and \usepackage{subcaption} for subfigures in latex.

10.03.24

how to flash a blank stm32f030f4p6 chip?
blinking led is the hello world of embedded systems
today’s commit deletes the old format files.

nvidia-driver-350 is compatible with cuda-11.8.
nvidia-driver-250 is compatible with cuda-11.5.
to switch display driver from nvidia to intel, use nvidia-prime:

sudo apt install nvidia-prime
sudo prime-select intel

install cuda 11.8:

wget https://developer.download.nvidia.com/compute/cuda/repos/cuda/11.8.0/local_installers/cuda_11.8.0_520.61.05_linux.run
sudo sh cuda_11.8.0_520.61.05_linux.run

and update path using:

export PATH=/usr/local/cuda-11.8/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-11.8/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

when building cuda libraries using ninja if you get an error:

/usr/include/c++/11/bits/std_function.h:435:145: error: parameter packs not expanded with ‘...’:
  435 |         function(_Functor&& __f)
      |                                                                                                                                                 ^
/usr/include/c++/11/bits/std_function.h:435:145: note:         ‘_ArgTypes’
/usr/include/c++/11/bits/std_function.h:530:146: error: parameter packs not expanded with ‘...’:
  530 |         operator=(_Functor&& __f)
      |                                                                                                                                                  ^
/usr/include/c++/11/bits/std_function.h:530:146: note:         ‘_ArgTypes’

then install gcc-10 and g++-10:

sudo apt install gcc-10 g++-10
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-10 10
sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-10 10

for reference, the versions involved:

Ubuntu 22.04.1 LTS
Cuda compilation tools, release 11.8, V11.8.89
gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0
g++ (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0

~/.bash_aliases is sourced by the default ubuntu ~/.bashrc, so besides aliases it can also hold lines such as export PATH and export LD_LIBRARY_PATH.
To install pytorch with cuda support:

conda install "pytorch=*=*cuda*" cudatoolkit -c pytorch

09.03.24

there are few desktop ARM processors (apple silicon is the notable exception).
a usb to ttl converter pl2303hx can be used to access the serial console of a raspberry pi.
ssh gives a virtual console whereas the serial console gives a physical console.
serial console doesn’t require wifi or hdmi.
arm is also risc.

08.03.24

embedded languages: c, c++, rust
rust can run bare metal on raspberry pi using no_std and no_main crate-level attributes
bare metal can be used to run code without an operating system

07.03.24

lora is half duplex by default. It can send and receive, but not at the same time.
analog pins on arduino can be used as digital pins too.
arduino D0 and D1 pins, although set aside for RX and TX, can also be used as digital pins.

05.03.24

nvidia display driver is different from nvidia cuda driver.
the cuda version shown in nvidia-smi is the highest version the driver supports, not the installed toolkit version.
nvcc --version gives the installed cuda version.
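a small sketch for reading the toolkit version from a script. parse_nvcc_release and installed_cuda_version are hypothetical helper names, and the regex assumes the "release X.Y" wording that nvcc prints:

```python
import re
import shutil
import subprocess


def parse_nvcc_release(text):
    """Return the toolkit release (e.g. '11.8') found in `nvcc --version` output, or None."""
    match = re.search(r"release (\d+\.\d+)", text)
    return match.group(1) if match else None


def installed_cuda_version():
    """Run nvcc if it is on PATH; return None when no toolkit is found."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(["nvcc", "--version"], capture_output=True, text=True)
    return parse_nvcc_release(result.stdout)


# parsing a line in the format nvcc prints
sample = "Cuda compilation tools, release 11.8, V11.8.89"
print(parse_nvcc_release(sample))
```

installed_cuda_version() returns None on a machine with only the display driver, which is the case where nvidia-smi still reports a cuda version.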

04.03.24

neo6m gps module receives signals from gps satellites and outputs the location as NMEA sentences.
it has a cold start time of 27s and a hot start time of 1s. on my desk, it took 2-5 minutes to get a fix.
once it has a fix, it saves the data to the eeprom so it can be retrieved on the next boot.
the eeprom battery is a coin cell.
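a minimal sketch of turning one NMEA GGA sentence into decimal degrees. the sample sentence is a generic textbook example, not output from my module, and parse_gga / nmea_to_degrees are made-up names. it assumes the module already has a fix (before that the position fields are empty):

```python
def nmea_to_degrees(value, hemisphere):
    """Convert NMEA ddmm.mmm / dddmm.mmm plus N/S/E/W into signed decimal degrees."""
    raw = float(value)
    degrees = int(raw // 100)        # leading digits are whole degrees
    minutes = raw - degrees * 100    # the rest is minutes
    decimal = degrees + minutes / 60.0
    return -decimal if hemisphere in ("S", "W") else decimal


def parse_gga(sentence):
    """Pull latitude, longitude and fix quality out of a GGA sentence."""
    fields = sentence.split("*")[0].split(",")  # drop the checksum, split fields
    if not fields[0].endswith("GGA"):
        raise ValueError("not a GGA sentence")
    return {
        "lat": nmea_to_degrees(fields[2], fields[3]),
        "lon": nmea_to_degrees(fields[4], fields[5]),
        "fix_quality": int(fields[6]),  # 0 means no fix
    }


sample = "$GPGGA,123519,4807.038,N,01131.000,E,1,08,0.9,545.4,M,46.9,M,,*47"
print(parse_gga(sample))
```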

03.03.24

einsum is cool. It uses the Einstein summation convention to express matrix operations.
torch.einsum('ij,jk->ik', a, b) is equivalent to torch.matmul(a, b)
its drawbacks are that it's not optimized on gpu (yet). It also doesn’t allow brackets in the expression.

>>> a = torch.rand(3, 5)
>>> a
tensor([[0.7912, 0.6213, 0.6479, 0.2060, 0.9857],
        [0.9950, 0.7826, 0.6850, 0.6712, 0.0524],
        [0.4367, 0.8872, 0.9622, 0.0159, 0.4960]])
>>> b = torch.rand(5, 3)
>>> b
tensor([[0.4560, 0.9680, 0.1179],
        [0.9072, 0.8982, 0.2926],
        [0.5526, 0.2779, 0.5810],
        [0.4366, 0.8061, 0.0065],
        [0.4744, 0.6915, 0.5326]])
>>> torch.einsum('ij,jk -> ik', a,b)
tensor([[1.8401, 2.3517, 1.1779],
        [1.8601, 2.4338, 0.7766],
        [1.7780, 1.8429, 1.1344]])
>>> torch.matmul(a, b)
tensor([[1.8401, 2.3517, 1.1779],
        [1.8601, 2.4338, 0.7766],
        [1.7780, 1.8429, 1.1344]])
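what 'ij,jk->ik' means, spelled out as loops in plain python: the index j appears in both inputs but not in the output, so it gets summed over. this is only an illustration of the convention (einsum_ij_jk_ik is a made-up name), not how torch implements it:

```python
def einsum_ij_jk_ik(a, b):
    """out[i][k] = sum over j of a[i][j] * b[j][k], using nested lists."""
    rows, inner, cols = len(a), len(b), len(b[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for k in range(cols):
            for j in range(inner):  # j is the repeated index, so it is summed
                out[i][k] += a[i][j] * b[j][k]
    return out


print(einsum_ij_jk_ik([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```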

stm32f030f4p6 as per the naming convention means:
- stm32 is the family of microcontrollers
- f is the series = General purpose
- 0 is the core = ARM Cortex-M0
- 30 is the line number
- f is the pin count = 20
- 4 is the flash size = 16KB
- p is the package type = TSSOP
- 6 is the temperature range = -40 to 85 degree celsius
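the breakdown above as a tiny decoder. the lookup tables only contain the codes listed in these notes, so any other part number falls back to "unknown"; decode_stm32 is a hypothetical helper, not an st tool:

```python
# lookup tables limited to the codes decoded in the notes above
PIN_COUNT = {"f": 20}
FLASH_SIZE = {"4": "16KB"}
PACKAGE = {"p": "TSSOP"}
TEMP_RANGE = {"6": "-40 to 85 C"}


def decode_stm32(part):
    """Split a part number shaped like stm32f030f4p6 into its fields."""
    part = part.lower()
    if not part.startswith("stm32") or len(part) != 13:
        raise ValueError("expected a 13 character part number like stm32f030f4p6")
    return {
        "family": part[:5],
        "series": part[5],
        "core": part[6],
        "line": part[7:9],
        "pins": PIN_COUNT.get(part[9], "unknown"),
        "flash": FLASH_SIZE.get(part[10], "unknown"),
        "package": PACKAGE.get(part[11], "unknown"),
        "temperature": TEMP_RANGE.get(part[12], "unknown"),
    }


print(decode_stm32("stm32f030f4p6"))
```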

02.03.24

The stm32f030f4p6 chip is SMD and in TSSOP-20 footprint.
I also bought SMD to THT adapters, which are called breakout boards, and soldered the chip to one.
STM32 nucleo boards come with a built-in st-link programmer and debugger.

images:

stm32f030f4p6 soldered onto a breakout board
stm32f030f4p6 with rpi v4 for scale

01.03.24

v100s has 5120 cuda cores and 640 tensor cores
quadro rtx 5000 has 3072 cuda cores and 384 tensor cores
tensor cores are more important for deep learning than cuda cores
installing miniconda:

# install miniconda
mkdir -p ~/miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm -rf ~/miniconda3/miniconda.sh
~/miniconda3/bin/conda init bash
source ~/.bashrc

installing nvidia gpu drivers:

# install nvidia drivers
sudo apt update && sudo apt upgrade
sudo apt autoremove nvidia* --purge
ubuntu-drivers devices
sudo apt install nvidia-driver-525
sudo reboot
nvidia-smi
# install pytorch with cuda support
pip install torch torchvision torchaudio

ICs come in different packages: DIP, SOP, QFP, TQFP

29.02.24

softmax suffers from numerical instability due to floating point precision: with one large logit, the output saturates to exactly 1 and 0

>>> import torch
>>> m = torch.nn.Softmax(dim=1)
>>> a = torch.tensor([[ 0.4981e3,  0.5018, -0.7310]])
>>> m(a)
tensor([[1., 0., 0.]])

normalizing the input (to unit L2 norm here) is one way around this, though it also changes the resulting probabilities

>>> torch.nn.functional.normalize(a)
tensor([[ 1.0000,  0.0010, -0.0015]])
>>> m(torch.nn.functional.normalize(a))
tensor([[0.5762, 0.2122, 0.2117]])
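another common fix is to subtract the max logit before exponentiating, which leaves the result mathematically unchanged. a plain python sketch (naive_softmax and stable_softmax are my own names; the 1000.0 input is chosen to overflow python's float exp, it is not from the run above):

```python
import math


def naive_softmax(logits):
    """Textbook softmax; math.exp overflows for large logits."""
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def stable_softmax(logits):
    """Subtract the max first so the largest exponent is exp(0) = 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


big = [1000.0, 0.5018, -0.7310]
try:
    naive_softmax(big)
except OverflowError as err:
    print("naive softmax failed:", err)

print(stable_softmax(big))
print(sum(stable_softmax([0.4981e3, 0.5018, -0.7310])))
```

the output still saturates for inputs this far apart, it just no longer overflows.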

28.02.24

color sensors (TCS34725, TCS3200) can detect intensity of R,G,B individually
because the isa is open source (no license fees), risc-v is cheaper than arm, and it runs linux too
microcontroller (arduino, stm32) vs single board computer (raspberry pi, beaglebone)
models often perform better when the input data is roughly gaussian

27.02.24

warmup_step hyperparameter keeps the learning rate low at the start and ramps it up to the target value over the first few steps
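a minimal sketch of one common variant, linear warmup. warmup_lr and its arguments are hypothetical names, and real schedulers usually add a decay phase after the warmup:

```python
def warmup_lr(step, base_lr, warmup_steps):
    """Linear warmup: ramp from base_lr / warmup_steps up to base_lr, then hold."""
    if warmup_steps <= 0:
        return base_lr
    scale = min(1.0, (step + 1) / warmup_steps)  # step is 0-indexed
    return base_lr * scale


# learning rate for the first 12 steps with a 10 step warmup
for step in range(12):
    print(step, warmup_lr(step, base_lr=0.1, warmup_steps=10))
```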
transformer = encoder + decoder + attention
K is the context window size in the attention mechanism which is the number of tokens that each token attends to.
attention in transformers has quadratic time and memory complexity $O (K^{2})$
flash attention is still $O (K^{2})$ in time but brings the memory down to linear $O (K)$
An Attention Free Transformer also has linear memory complexity $O (K)$
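the quadratic term comes from scoring every token against every other token. a naive single-head sketch in plain python that also counts the scores it computes (naive_attention is a made-up name, and there is no batching, masking or learned projection here):

```python
import math


def naive_attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors.

    Returns the outputs and the number of query-key scores computed."""
    dim = len(queries[0])
    outputs, score_count = [], 0
    for q in queries:
        # one score per key: K scores for each of the K queries
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys]
        score_count += len(scores)
        peak = max(scores)  # subtract the max for a stable softmax
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        outputs.append(
            [sum(w * v[c] for w, v in zip(weights, values)) for c in range(len(values[0]))]
        )
    return outputs, score_count


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
_, count = naive_attention(tokens, tokens, tokens)
print(count)  # K = 3 tokens, so K * K scores
```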
wandb can be self-hosted too, inside a docker container

26.02.24

cpu architectures: x86, x86_64, arm, arm64, risc-v
famous arm dev board: stm32
risc-v is open source and is gaining popularity
LuckFox Pico Plus (RV1103) is a dev board with an arm cortex-a7 core plus a risc-v co-processor; it has ethernet and can run linux
softmax not summing to 1 T_T
how to make LoRa full duplex?

25.02.24

rl implementations: stable-baselines3
cleanrl has single file implementations of rl algorithms
tianshou is a pytorch based rl library
Through Hole Technology (THT) vs Surface Mount Technology (SMT)

24.02.24

Found this Machine Learning Theory Notes GDrive

Deep Reinforcement Learning with Double Q-learning
