April 4, 2026
My attempt to understand TurboQuant
Google announced a new quantization algorithm called TurboQuant that enables massive compression of the KV cache in an LLM without compromising quality. Here I try to understand, as best I can, what TurboQuant is, given that I don’t know too much about LLMs.
What is vector quantization and why is it needed?
Suppose you have a vector of 32-bit floats with dimension d; each coordinate takes 4 bytes. If d is very high, as is usually the case in LLMs, you face a memory problem. The solution is to reduce the vector’s overall size by compressing it. That is called vector quantization.
The simplest way of doing that is by mapping each coordinate of the vector to a known, prespecified set called a codebook. For example, say our vector is [0.76, -0.28, 0.10, 0.57] and we define the codebook as [-1.0, -0.5, 0.0, 0.5, 1.0]. We can then map each coordinate to the index of the codebook entry whose value is closest to it, which results in the vector [4, 1, 2, 3]. Because we only need 3 bits to represent a codebook index, we reduce the memory footprint from 128 bits to 12 bits. You can also see that when we try to get our vector back, we lose information: 0.76 becomes 1.0, -0.28 becomes -0.5, and so forth. This kind of compression introduces error, so the challenge is figuring out a codebook that minimizes that error. We’re mainly interested in two error measures:
- Mean Squared Error (MSE): how different is the reconstructed vector from the original?
- Inner product error: if we compute the dot product between two vectors, how much does the dot product change after one of the vectors is quantized?
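The codebook example above can be sketched in a few lines of TypeScript (the function names are mine, just for illustration):

```typescript
// Map each coordinate to the index of the nearest codebook entry.
function quantize(vec: number[], codebook: number[]): number[] {
  return vec.map((x) =>
    codebook.reduce(
      (best, c, i) => (Math.abs(c - x) < Math.abs(codebook[best] - x) ? i : best),
      0,
    ),
  );
}

// Recover an (approximate) vector from the stored indices.
function dequantize(indices: number[], codebook: number[]): number[] {
  return indices.map((i) => codebook[i]);
}

// Mean squared error between the original and reconstructed vectors.
function mse(a: number[], b: number[]): number {
  return a.reduce((s, x, i) => s + (x - b[i]) ** 2, 0) / a.length;
}

const codebook = [-1.0, -0.5, 0.0, 0.5, 1.0];
const vec = [0.76, -0.28, 0.1, 0.57];
const indices = quantize(vec, codebook); // [4, 1, 2, 3]
const reconstructed = dequantize(indices, codebook); // [1.0, -0.5, 0.0, 0.5]
```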
I am not really familiar with vector quantization algorithms, but it looks like a common approach is to learn the codebook from trained model weights using the K-means algorithm; essentially, the codebook entries are the centroids.
Google’s idea
Instead of trying to figure out the codebook by looking at the data, they randomly transform the input by applying a random matrix. That may seem crazy, but the effect of the transformation is that your data ends up following a well-known distribution (in this case, a Beta distribution). Because the data is well-behaved, you can mathematically compute the optimal quantizer.
Google proved that a b-bit quantizer has an MSE upper bound of $\frac{\sqrt{3}\,\pi}{2} \cdot \frac{1}{4^b}$. Specifically, for $b = 1, 2, 3, 4$ we have $0.36, 0.117, 0.03, 0.009$, respectively.
MSE-optimized quantizers are biased
The above idea gets us the optimal MSE quantizer. But they found that those quantizers are biased with respect to the inner product error. Their solution is a two-stage algorithm: first apply the idea above with a bit width one less than the target budget, then apply a 1-bit quantizer called Quantized Johnson-Lindenstrauss (QJL) to the residual error. I am not familiar with QJL, so I’ll keep it as a black box for now. But the essence is applying two quantization strategies: one with $(b-1)$ bits to get the optimal MSE quantizer, and one with 1 bit to fix the bias produced by the first.
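To make the two-stage idea concrete, here is a rough sketch of my mental model of it in TypeScript. This is not the actual TurboQuant algorithm: the first stage is a plain uniform quantizer rather than the distribution-optimal one, and the second stage is a simple sign-plus-scale residual quantizer standing in for QJL, which I’m treating as a black box.

```typescript
// Stage 1: uniform b-bit quantization of a coordinate assumed to lie in [-1, 1].
function uniformQuantize(x: number, bits: number): number {
  const levels = 2 ** bits;
  const step = 2 / levels;
  const idx = Math.min(levels - 1, Math.floor((x + 1) / step));
  return -1 + step * (idx + 0.5); // reconstruct at the bin center
}

// Two-stage sketch: (b-1) bits for the main quantizer, then 1 bit for the
// sign of the residual, scaled by the mean residual magnitude.
// (The real second stage is QJL; this scaling rule is my own simplification.)
function twoStage(vec: number[], bits: number): number[] {
  const stage1 = vec.map((x) => uniformQuantize(x, bits - 1));
  const residual = vec.map((x, i) => x - stage1[i]);
  const scale = residual.reduce((s, r) => s + Math.abs(r), 0) / residual.length;
  return stage1.map((q, i) => q + Math.sign(residual[i]) * scale);
}
```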
References
- https://research.google/blog/turboquant-redefining-ai-efficiency-with-extreme-compression/
- https://x.com/TheVixhal/status/2039770532657377656
- https://arxiv.org/pdf/2504.19874
- https://gist.github.com/aaaddress1/a226e5e401b02a935805fabc97552db1
#google #llm #quantization #turboquant
Tried gemma4 on llama.cpp
After my failed attempts with ollama and LM Studio (see Gibberish results running gemma4 on my integrated GPU and Tried gemma4 on LM Studio), this time I tried llama.cpp as the inference engine. To my surprise, it ran just fine on my integrated GPU. Slow, but fine, without errors. I ran the gemma:e2b version with
./llama-cli -hf ggml-org/gemma-4-E2B-it-GGUF
You can control how many layers are offloaded to GPU with the -ngl flag. For example, if we set it to 10:
./llama-cli -hf ggml-org/gemma-4-E2B-it-GGUF -ngl 10 -v
You can see offloaded 10/36 layers to GPU in the logs. You can also set it to offload all layers:
./llama-cli -hf ggml-org/gemma-4-E2B-it-GGUF -ngl all -v
and you’ll see offloaded 36/36 layers to GPU in the logs. GPU usage can be checked with the intel_gpu_top program.
I uploaded an image of a kitten to the model. And it was able to identify it just fine.
#ollama #gemma4 #llama.cpp #lm studio
Tried gemma4 on LM Studio
I also tried gemma4 on LM Studio, but it failed to load the model; I’m not really sure why. See Gibberish results running gemma4 on my integrated GPU for my failed attempt at running gemma4 on ollama.
The Phi3.5 3.8B model worked, though. I did a test by setting GPU offloading to 0 and could see GPU usage going from 100% to a very low number using intel_gpu_top.
April 3, 2026
Gibberish results running gemma4 on my integrated GPU
I’ve never really tried running LLM models on my machine. Decided to see if I could run gemma4. So, I’ve installed ollama, and the first thing I noticed was a warning that it would run the models on my CPU, not my GPU, because it did not detect any NVIDIA GPU. So I went on a journey trying to run gemma4 on my Intel Meteor Lake integrated GPU.
It turned out not to be that hard. You must set an environment flag on your ollama service.
Edit the service:
sudo systemctl edit ollama
And add:
[Service]
Environment="OLLAMA_VULKAN=1"
When I ran the model with ollama run gemma4 and gave it a prompt, it responded with a lot of gibberish. So I removed the flag and ran it again, and the model behaved just fine.
I wasn’t sure whether the flag worked. Trying to find that out, I came across ollama ps. That command shows how much compute goes to the CPU and how much to the GPU. With OLLAMA_VULKAN=1, it showed a 68%/32% split (with 68% going to the CPU). With the flag disabled, it showed 100% CPU usage. That meant that, for some reason, ollama decided to split the model, which forced data to move between CPU and GPU, and somewhere in that movement something must have gone wrong.
To test whether the split was the problem, I ran a smaller model with ollama run phi3.5:3.8b. This time ollama ps showed compute was 100% GPU, and the model worked just fine.
I’d love to know why ollama decided to split like that. I have 54 GiB of memory in my system; maybe ollama could not see, through the Vulkan driver, how much of it was allocated to the GPU, and assumed it would not have enough? The second question is why the split breaks anything at all: even with the split, it should work, so there must be a bug somewhere that introduces the gibberish.
September 11, 2025
C++ concepts
I implemented a templated class in C++. One of its methods had an equality check, which made me wonder how C++ handled the fact that my generic type T might not support the == operator.
Let’s consider the following class to illustrate:
template <typename T> class A {
public:
A(T v) { value = v; }
bool is(T v) { return value == v; }
private:
T value;
};
You see the equality check value == v there. But v can be of any type. How does C++ handle that? Is there a way of enforcing that T must support ==?
So let’s consider the following struct Foo:
struct Foo {
std::string s;
};
The program below compiles just fine.
int main()
{
A<Foo> foo1 = Foo{"Hello"};
return 0;
}
Which is weird because we know that if we call foo1.is the program will likely break.
Let’s call foo1.is, then:
int main()
{
A<Foo> foo1 = Foo{"Hello"};
std::cout << foo1.is(Foo{"Hello"}) << "\n";
return 0;
}
Now the program fails to compile:
main.cpp: In instantiation of ‘bool A<T>::is(T) [with T = Foo]’:
main.cpp:35:25: required from here
main.cpp:20:20: error: no match for ‘operator==’ (operand types are ‘Foo’ and ‘Foo’)
20 | return value == v;
| ~~~~~~^~~~
So, it is only when we call the function that executes the unsupported operation that the compiler complains. I wondered if there was a way to enforce this requirement at the time of object instantiation, and there is.
C++20 introduced concepts, a way of specifying requirements explicitly. We can add the concept std::equality_comparable to our template:
#include <concepts>
template <std::equality_comparable T>
class A {
public:
A(T v) {
value = v;
}
bool is(T v) {
return value == v;
}
private:
T value;
};
and we get a compilation error at the instantiation:
main.cpp: In function ‘int main()’:
main.cpp:35:10: error: template constraint failure for ‘template<class T> requires equality_comparable<T> class A’
35 | A<Foo> foo1 = Foo{"Hello"};
| ^
main.cpp:35:10: note: constraints not satisfied
If we now add support for the == operator in Foo, the program compiles:
struct Foo {
std::string s;
bool operator==(const Foo& other) const {
return s == other.s;
}
};
September 8, 2025
Look for bugs
From Look Out For Bugs:
The key is careful, slow reading. What you actually are doing is building the mental model of a program inside your head. Reading the source code is just an instrument for achieving that goal. I can’t emphasize this enough: programming is all about building a precise understanding inside your mind, and then looking for the diff between your brain and what’s in git.
August 12, 2025
Automatic database migrations with pg_advisory_lock
When you are implementing a Continuous Delivery pipeline, one of the issues you have to deal with is database migrations. You want to be able to automatically run the migrations. But if you have multiple instances of your app running, you have to make sure only one instance is able to run that migration. If you’re on Postgres, an easy way to solve this is to use pg_advisory_lock.
You acquire a lock by providing an integer that can represent anything, run the migration logic, then release the lock:
SELECT pg_advisory_lock(42);
-- run migration logic
SELECT pg_advisory_unlock(42);
Any concurrent process that attempts to run the same logic will be blocked until the lock is released.
August 4, 2025
Tail call optimization
I have always accepted that a recursive implementation can be easily optimized by the compiler when the recursive call is in tail position, but I never understood why. It becomes easy to see in C if we reimplement a recursive function using goto. For example, let’s implement a recursive function that counts how many integers lie between two numbers:
int count(int start, int end, int n) {
if (start > end) return n;
return count(start + 1, end, n + 1);
}
Refactoring that to use goto, we can see that because the recursive count call is the last action, the compiler can keep the same stack frame and just update the state between iterations:
int count(int start, int end, int n) {
iterate:
if (start > end) return n;
start = start + 1;
n = n + 1;
goto iterate;
}
July 18, 2025
Create an image file from an image copied to the clipboard
Here’s a way of creating an image file from an image copied to the clipboard:
xclip -selection clipboard -t image/png -o > image.png
July 8, 2025
The Plight of the Misunderstood Memory Ordering
The plight of the misunderstood memory ordering. Interesting article that explains a common misunderstanding about the purpose of memory ordering when using atomics. The point of the article is that memory ordering is not about the atomic value being shared across threads, but about what to expect to happen around the atomic value being accessed. The purpose is to synchronize the relative ordering between atomic accesses made to one atomic value with memory accesses made to any other value.
July 7, 2025
Good Enough Programming
Good Enough Programming. Interesting post about the reasons we modify code and when we can call a software good enough.
Let’s say that last week you wrote an app that people love and use every day. What are the reasons you would touch it again? Why would you go back in and change anything?
There are only three:
- You want to add new things so that people like it more in the future
- You want to fix broken things, things people currently do not like
- You want to make your own work easier, in case one day you have to go back to fix or add things
Good enough software is software that delivers the value and that you can walk away from. Essentially, 1 and 2 are taken care of.
July 1, 2025
Bloom Filters By Example
Bloom Filters by Example. I’ve heard of Bloom Filters a bunch of times but never took the time to understand how they work. For some reason I thought it would be hard, so I never really tried. I decided to read this article today, and it turned out to be a pretty simple data structure. A Bloom Filter is a bit vector; let’s use one of size 16 as an example:
Indexes: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Bits: [ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ]
We add items to our set by applying n hash functions and using the results to set the bits of the bit vector.
For example, suppose we have a hash function h1. If we hash “apple” with h1, we might get a large number like:
h1("apple") = 74329
But our bit array only has 16 positions. So we reduce it to a valid index with:
index = 74329 % 16 = 9
This means: Set bit at index 9 to 1.
Indexes: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Bits: [ ][ ][ ][ ][ ][ ][ ][ ][ ][✖][ ][ ][ ][ ][ ][ ]
So, if we use 3 hashes in our Bloom Filter implementation, we may end up with the following configuration:
Indexes: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Bits: [ ][✖][✖][ ][ ][ ][ ][ ][ ][✖][ ][ ][ ][ ][ ][ ]
To check if an item is in the set, hash it with the same functions and see if all the resulting bit positions are set to 1 in the array. If they aren’t, you know for sure the item is not in the set. If they all are, the item is probably in the set, but it could be a false positive caused by other items setting the same bits.
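The whole structure fits in a few lines. Here’s a minimal sketch in TypeScript (the two hash functions are arbitrary toy ones I made up for illustration, not ones you’d use in practice):

```typescript
class BloomFilter {
  private bits: boolean[];

  constructor(private size: number, private hashes: ((s: string) => number)[]) {
    this.bits = new Array(size).fill(false);
  }

  // Set one bit per hash function.
  add(item: string): void {
    for (const h of this.hashes) this.bits[h(item) % this.size] = true;
  }

  // false => definitely absent; true => possibly present (false positives!)
  mightContain(item: string): boolean {
    return this.hashes.every((h) => this.bits[h(item) % this.size]);
  }
}

// Toy string hashes (fine for short strings, not for real use).
const h1 = (s: string) => [...s].reduce((a, c) => a * 31 + c.charCodeAt(0), 7) >>> 0;
const h2 = (s: string) => [...s].reduce((a, c) => a * 101 + c.charCodeAt(0), 11) >>> 0;

const filter = new BloomFilter(16, [h1, h2]);
filter.add("apple");
```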
June 26, 2025
Custom Query Parser in Express
I was looking for a way to support dot notation in the parameters of my query string in my HTTP endpoint. For example, /search?user.name=Bruno&car.name=Honda+Civic. I wanted req.query to be parsed in a structured way so that I could do
const {
user: { name: userName } = {},
car: { name: carName } = {},
} = req.query;
It turns out you can set a custom query parser in your Express app. Additionally, qs, a query string parser already used in Express, supports the dot notation. Here’s how to implement it:
const app = express();
// Use custom query parser
app.set('query parser', (str: string) => qs.parse(str, { allowDots: true }));
Here’s a simple demo implementation:
import express, { Request, Response } from 'express';
import * as qs from 'qs';
const app = express();
// Use custom query parser
app.set('query parser', (str: string) => qs.parse(str, { allowDots: true }));
interface SearchQuery {
user?: {
name?: string;
};
car?: {
name?: string;
};
}
app.get('/search', (req: Request<{}, {}, {}, SearchQuery>, res: Response) => {
const {
user: { name: userName } = {},
car: { name: carName } = {},
} = req.query;
if (userName) {
console.log('User name:', userName);
}
if (carName) {
console.log('Car name:', carName);
}
res.json({ userName, carName });
});
const PORT = process.env.PORT || 3000;
app.listen(PORT, () => {
console.log(`Server is running on port ${PORT}`);
});
#express #nodejs #typescript #query-string
Avoiding Redirects In Axios
I had a failing test when Axios made a request to one of my API endpoints. The test reported a failure to connect to the server. That was weird because I was sure the server was up and running. It turned out the endpoint I was testing implemented an HTTP redirect to an unavailable location. Axios was following the redirect and trying to connect to that location, which is why I was seeing a failure to connect to the server.
To avoid that, we can configure Axios not to follow redirects
test('should redirect to correct location', async () => {
const response = await axios.get('http://example.com', {
maxRedirects: 0,
validateStatus: null
});
expect(response.status).toBeGreaterThanOrEqual(300);
expect(response.status).toBeLessThan(400);
expect(response.headers['location']).toBe('https://expected.com/target');
});
June 24, 2025
Error Handling in TypeScript
I’m not very familiar with TypeScript patterns for handling errors. I was wondering what would be an interesting way to define domain-specific error types so that I can be very specific on what happened inside a function execution.
Consider a createUser function that creates a user. We can create a generic Rust-like Result type:
type Result<T, E> =
| { success: true; value: T }
| { success: false; error: E };
Then define domain-specific error types as a discriminated union:
type CreateUserError =
| { type: 'EmailAlreadyExists'; email: string }
| { type: 'InvalidEmailFormat'; email: string }
| { type: 'WeakPassword'; reason: string };
Our createUser function becomes:
function createUser(email: string, password: string): Result<User, CreateUserError> {
if (!isValidEmail(email)) {
return { success: false, error: { type: 'InvalidEmailFormat', email } };
}
if (!isStrongPassword(password)) {
return { success: false, error: { type: 'WeakPassword', reason: 'Too short' } };
}
if (emailExists(email)) {
return { success: false, error: { type: 'EmailAlreadyExists', email } };
}
const user = new User(email, password);
return { success: true, value: user };
}
June 18, 2025
TIL: codespell
Today I learned about codespell, a CLI tool for checking and fixing misspellings. I can check whether any of my blog posts have misspellings, and fix them, with
codespell -f -w _posts
June 17, 2025
Lock-Free Rust: Building a Rollercoaster While It's on Fire
Lock-Free Rust: How to Build a Rollercoaster While It’s on Fire. In this article a lock-free array is built in Rust using atomics and memory ordering control. It’s useful to understand that lock-free algorithms are not so easy to build. You have to understand memory ordering semantics and how to apply them.
#rust #lock-free #atomics #memory-ordering
Why locks typically have worse performance than atomics?
Why locks typically have worse performance than atomics?
The main reason is that locks can rely on syscalls like futex to put threads to sleep when there’s contention, which introduces overhead such as context switches. In contrast, atomic operations are low-level CPU instructions executed entirely in user space, avoiding these costly transitions. Additionally, locks tend to serialize access to larger critical sections, while atomics enable more fine-grained concurrency, reducing contention and improving performance in many scenarios.
June 16, 2025
Atomics And Concurrency
Atomics And Concurrency. This article explains the importance of memory ordering when writing concurrent programs using atomics. Essentially, data races can occur because compilers and CPUs may reorder instructions. As a result, threads operating on shared data might observe operations in an unintended order.
Some programming languages, such as C++ and Rust, give you finer control over the memory model by exposing detailed options through their atomics APIs. In C++, for example, the memory models include:
- Relaxed: no ordering guarantees
- Release–Acquire: enforces ordering on paired operations on specific atomic variables
- Sequentially consistent: introduces a single global ordering over all such operations
Other languages, like Go, don’t provide this level of control. Instead, Go implements a sequentially consistent memory model under the hood.
Russ Cox does a great job explaining hardware memory models, how different programming languages expose memory-model control, and Go’s memory model in his Memory Models series of articles.
June 13, 2025
Embeddings are underrated
Embeddings are underrated. Blog post on how underrated embeddings are for technical writers.
I’m still not very familiar with the world of embeddings, so it was nice to see the concepts laid out. Essentially, an embedding is a way of semantically representing text as a multidimensional vector of floats, making it easier to compare similarity across texts.
Word embeddings were popularized by the foundational Word2Vec paper, and embeddings are also how Large Language Models represent words and capture semantic relationships, although in a more complex and advanced way.
The Illustrated Word2vec illustrates the inner workings of Word2Vec.
#embeddings #ml #nlp #word2vec
Systems Correctness Practices at Amazon Web Services
Systems Correctness Practices at Amazon Web Services. Article on the portfolio of formal methods used across AWS.
Our experience at AWS with TLA+ revealed two significant advantages of applying formal methods in practice. First, we could identify and eliminate subtle bugs early in development—bugs that would have eluded traditional approaches such as testing. Second, we gained the deep understanding and confidence needed to implement aggressive performance optimizations while maintaining systems correctness.
Here’s a list of techniques they use:
- P programming language to model and specify distributed systems. It was used, for example, in migrating Simple Storage Service (S3) from eventual to strong read-after-write consistency.
- Dafny programming language to prove that the Cedar authorization policy language implementation satisfies a variety of security properties
- A tool called Kani was used by the Firecracker team to prove key properties of security boundaries
- Fault Injection Service that injects simulated faults, from API errors to I/O pauses and failed instances
- Also property-based testing, deterministic simulation, and continuous fuzzing or random test-input generation
June 11, 2025
Compiler Explorer and nsjail
I read How Compiler Explorer Works in 2025 and a lightweight process isolation tool called nsjail caught my eye.
June 5, 2025
AI for Coding Tweet
Interesting tweet that resonates a lot with how I feel about the use of AI for coding. I can type faster, but not sure if I can deliver faster.
June 2, 2025
Switching away from OOP | Casey Muratori
Switching away from OOP | Casey Muratori. Casey Muratori always has strong takes against OOP. I thought it was worth making a note about this one:
The lie is if something is object oriented it will be easier for someone else to integrate, because it’s all encapsulated. The truth is the opposite. The more walled off something is the harder it is for someone to integrate because there’s nothing they can do with it. The only things they can do are things you’ve already thought of and provided an interface for and anything you forgot, they’re powerless. They have to wait for an update.
#oop #programming-paradigms #casey-muratori
How to Build an Agent
How to Build an Agent. I went through this tutorial today. It is very good for grasping the basics of how a coding agent works.
I really like how he presents what an agent is:
An LLM with access to tools, giving it the ability to modify something outside the context window. An LLM with access to tools? What’s a tool? The basic idea is this: you send a prompt to the model that says it should reply in a certain way if it wants to use “a tool”. Then you, as the receiver of that message, “use the tool” by executing it and replying with the result. That’s it. Everything else we’ll see is just abstraction on top of it.
May 30, 2025
Thoughts on thinking
Thoughts on thinking. Nice blog post on how the use of AI makes the author feel about his relationship to writing and understanding.
Intellectual rigor comes from the journey: the dead ends, the uncertainty, and the internal debate. Skip that, and you might still get the insight–but you’ll have lost the infrastructure for meaningful understanding. Learning by reading LLM output is cheap. Real exercise for your mind comes from building the output yourself.
Amp Is Now Available. Here Is How I Use It.
Amp Is Now Available. Here Is How I Use It. A blog post from Thorsten Ball, who works on Amp, describing how he uses it. I kind of like collecting these “how I use LLMs” articles; there’s always something new you can use to refine your coding experience. Here are a couple of examples that caught my eye:
Code Review
Run `git diff` to see the code someone else wrote. Review it thoroughly and give me a report
Code search
Find the code that ensures unauthenticated users can view the /how-to-build-an-agent page too
Interact with that database
Update my user account (email starts with thorsten) to have unlimited invites
May 29, 2025
The Biggest 'Lie' in AI? LLM doesn't think step-by-step
The Biggest “Lie” in AI? LLM doesn’t think step-by-step. Interesting video making the point that the process by which a model arrives at a mathematical answer is not necessarily the process the model describes when asked how it got there. In other words, the verbalization of the reasoning is not necessarily how the model reasons, and the verbalization might not even be key to reasoning at all.
What I found odd about the video is that it kind of claims this is the reason LLMs don’t think like humans do. However, humans can also think without verbalizing; in fact, verbalizing the thought process can even be difficult in some cases.
Cline Browser Testing
Today I learned that Cline is able to open the browser and manually test your web app. I found that amazing. Here’s a demo from Cline’s founder Saoud Rizwan. Seems to be using Puppeteer under the hood.
#cline #browser-testing #puppeteer #ai-tools
Nova JavaScript Engine
Nova. Interesting JavaScript engine written in Rust using data-oriented design and Entity-Component-System architecture.
#javascript #rust #ecs #data-oriented-design
Why Cline Doesn't Index Your Codebase
Why Cline Doesn’t Index Your Codebase (And Why That’s a Good Thing). An interesting blog post by Cline on why they don’t use a RAG-based approach, which is common in similar products such as Cursor, to handle large codebases. In essence, their rationale boils down to:
- they don’t think a RAG-based approach offers better codebase search results
- it’s a pain to keep the index up-to-date
- security
They say though that it may make sense for a product charging $20/month.
May 25, 2025
Compiling Rust for RISC-V
Today, I worked on a small example on how to compile a Rust program targeting a RISC-V architecture. Essentially, you add the correct target
rustup target add riscv64gc-unknown-linux-gnu
then, in Cargo’s configuration, set the linker to the appropriate GNU GCC cross-compiler, set the runner to QEMU, and statically link the C libraries:
[target.riscv64gc-unknown-linux-gnu]
linker = "riscv64-linux-gnu-gcc"
rustflags = ["-C", "target-feature=+crt-static"]
runner = "qemu-riscv64"
You can run the program with
cargo run --target riscv64gc-unknown-linux-gnu
#rust #riscv #cross-compilation
UUIDv7 Comes to PostgreSQL 18
UUIDv7 Comes to PostgreSQL 18. A blog post from Nile that discusses the new UUID version that will come with the next PostgreSQL release.
Essentially, regarding the use of UUIDs in databases, there are three common concerns: sorting, index locality, and size. The new version addresses sorting and index locality by using a Unix Epoch timestamp as the most significant 48 bits; of the remaining 80 bits, a few are reserved for the version and variant fields, leaving 74 bits for random values.
By calling uuidv7(), a new UUIDv7 can be generated with the timestamp set to current time. An optional interval can be passed to generate a value for a different time
select uuidv7(INTERVAL '1 day');
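To see why the layout makes values sort by creation time, here’s a toy, non-spec-compliant sketch of a UUIDv7-like generator in TypeScript (the helper names are mine; a real implementation would use a cryptographic random source and the exact field layout from the RFC):

```typescript
// Toy sketch of the UUIDv7 layout: a 48-bit millisecond timestamp prefix,
// then the version nibble ('7'), then random bits. Because the timestamp
// occupies the most significant bits, earlier values sort first.
function toyUuidV7(nowMs: number): string {
  const ts = nowMs.toString(16).padStart(12, '0'); // 48 bits of timestamp as 12 hex chars
  const rand = () => Math.floor(Math.random() * 16).toString(16);
  const randHex = (n: number) => Array.from({ length: n }, rand).join('');
  const variant = (8 + Math.floor(Math.random() * 4)).toString(16); // '8'..'b' => 10xx variant
  // layout: tttttttt-tttt-7rrr-vrrr-rrrrrrrrrrrr
  return `${ts.slice(0, 8)}-${ts.slice(8)}-7${randHex(3)}-${variant}${randHex(3)}-${randHex(12)}`;
}
```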
May 24, 2025
CPU Governor in Pop!_OS
So, I have a Lemur Pro 13 notebook running Pop!_OS. Since I first started using it, I noticed the fan noise gets very loud quite frequently. It took me a while to figure out the cause, but I finally discovered the reason: it was running in maximum performance mode.
The CPU performance is managed by a component of the operating system called the governor, which controls how the CPU frequency is adjusted based on system load.
In Pop!_OS, there are two available governors: performance and powersave. You can check which ones are available with:
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors
performance powersave
You can check the current governor for each CPU core by running:
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
To change the governor to powersave for all CPUs, run:
echo powersave | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
#pop-os #linux #cpu-governor #performance
You Can Learn RISC-V Assembly in 10 Minutes
You Can Learn RISC-V Assembly in 10 Minutes | Getting Started RISC-V Assembly on Linux Tutorial. I watched this video today to get a sense of how to program something simple in RISC-V assembly. It turned out to be pretty simple. The video writes a simple Hello World! program; I went just a bit further and tried a program that prints the numbers 0 through 9.
With the GNU toolchain for RISC-V, you can easily compile your program
riscv64-linux-gnu-as hello.s -o hello.o
riscv64-linux-gnu-gcc -o hello hello.o -nostdlib -static
and with qemu you can run it
qemu-riscv64 ./hello
Here’s what I ended up with
.section .data
char_buffer:
.byte 0 # Reserve one byte for ASCII character output
.section .text
.global _start
_start:
# -------------------------------
# Initialize loop control
# t0 = counter (0 to 9)
# t1 = limit (10)
# -------------------------------
li t0, 0 # counter = 0
li t1, 10 # limit = 10
# Load address of char_buffer into t2
la t2, char_buffer
loop:
# -------------------------------
# Print current digit as ASCII
# -------------------------------
li a7, 64 # syscall: write
li a0, 1 # fd: stdout
addi t3, t0, 48 # convert digit to ASCII ('0' + t0)
sb t3, 0(t2) # store character into buffer
mv a1, t2 # buffer address
li a2, 1 # length = 1 byte
ecall # make syscall to write digit
# -------------------------------
# Print newline character
# -------------------------------
li a7, 64 # syscall: write
li a0, 1 # fd: stdout
li t3, 10 # ASCII for newline '\n'
sb t3, 0(t2) # store newline into buffer
mv a1, t2 # buffer address
li a2, 1 # length = 1 byte
ecall # make syscall to write newline
# -------------------------------
# Loop control
# -------------------------------
addi t0, t0, 1 # increment counter
bne t0, t1, loop # continue if t0 != t1
# -------------------------------
# Exit program
# -------------------------------
li a7, 93 # syscall: exit
li a0, 0 # exit code 0
ecall
May 16, 2025
Notes on Amp
Some more notes on Amp. I bought five dollars’ worth of credits, and two prompts consumed 75% of it. The problem is that it does a lot more than you ask for, consuming lots of credits. Also, there’s no way to bring your own key.
Tweet that came out:
Gave @AmpCode a spin. Burned through my free credits fast, so I bought more. Two prompts later… five bucks gone 😅
May 15, 2025
EarlyRiders: Bitcoin-Denominated Investment Fund
EarlyRiders is a bitcoin-denominated investment fund.
At Early Riders we raise our fund in Bitcoin, maintain our capital in Bitcoin, require our portfolio companies to maintain Bitcoin reserves, and return capital to our limited partners in Bitcoin. Our goal is to return more Bitcoin to our limited partners than they invested in the fund:
The fund’s core philosophy is that if entrepreneurs are looking through the lens of an asset that appreciates over time, and everything is denominated according to that asset, they’ll need to spend the money with a high level of discernment and scrutiny.
I found that very interesting. I’ve always felt that the ease of raising large amounts of money made misallocating capital in startups a non-event.
#bitcoin #investment-fund #earlyriders
LLMs Get Lost In Multi-Turn Conversation
LLMs Get Lost In Multi-Turn Conversation (via). In this paper, large-scale simulation experiments are performed, and performance degradation is found in multi-turn LLM settings compared to single-turn settings. From the abstract:
Analysis of 200,000+ simulated conversations decomposes the performance degradation into two components: a minor loss in aptitude and a significant increase in unreliability. We find that LLMs often make assumptions in early turns and prematurely attempt to generate final solutions, on which they overly rely. In simpler terms, we discover that when LLMs take a wrong turn in a conversation, they get lost and do not recover.
The main explanations for this effect could be:
- premature and incorrect assumptions early in the conversation
- over-relying on previous incorrect responses, compounding the error
- overly adjusting responses to the first and last turn, forgetting middle turns
- overly verbose responses, muddling the context, and confusing next turns
AlphaEvolve: A Gemini-powered coding agent
AlphaEvolve: A Gemini-powered coding agent for designing advanced algorithms (via). Google presents AlphaEvolve, an evolutionary coding agent that combines Gemini models, automated evaluators, and an evolutionary framework to design and discover advanced algorithms.
A more technical explanation can be found in the paper AlphaEvolve: A coding agent for scientific and algorithmic discovery.
#alphaevolve #google #ai-agents #gemini
AI and Productivity
I had this thought
There’s a difference between AI writing a percentage of someone’s code and AI making them more productive. That person remains the author - they still need to understand and verify the code. AI might do most of the writing, but productivity may stay the same.
This clarifies what the “vibe” part of “vibe coding” means. The amount you’re vibing is inversely proportional to the amount you’re understanding and verifying.
#ai #productivity #vibe-coding
Trying out Amp
Today I tried out Amp. It’s a VS Code extension AI code agent. It felt a bit less intrusive than Cline, although somewhat slower. Also, I don’t understand the web product proposal where you can have a team and people competing on AI usage.
May 14, 2025
What is npx?
What the heck is npx, which is occasionally used in JavaScript projects? It’s a CLI tool that comes with NodeJS and allows you to run packages without installing them globally. For example,
npx -y whats-the-weather paris
The current weather in Paris is 'few clouds' with a temperature of 19°C.
May 13, 2025
First time working with pnpm
First time working in a project with pnpm. pnpm is a JavaScript package manager written in TypeScript. It is faster than npm and yarn, and it uses a content-addressable store with hard links to avoid duplicating files and save disk space.