AI learning blog March 2026

March 1, 2026

J. Heaton class_10_3_text_generation

The example in chapter 10.3 uses an LSTM network to generate text. The code first turns a piece of text into overlapping sequences of maxlen (in this case 40) characters; the character that follows each sequence is used as the y value for training.

In the inference/prediction phase, a sequence of characters is fed into the trained model, and the output is an array of probabilities for each of the characters that were present in the original text.
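The preprocessing step can be sketched as follows. Variable names (maxlen, step, sentences) follow the Keras LSTM text-generation example this chapter is based on; the text here is just a stand-in:

```python
import numpy as np

text = "the quick brown fox jumps over the lazy dog. " * 20  # stand-in corpus
maxlen = 40   # length of each input sequence
step = 3      # stride between the starts of consecutive sequences

chars = sorted(set(text))
char_indices = {c: i for i, c in enumerate(chars)}

# Cut the text into overlapping sequences; the character that follows
# each sequence becomes its training target (y).
sentences, next_chars = [], []
for i in range(0, len(text) - maxlen, step):
    sentences.append(text[i: i + maxlen])
    next_chars.append(text[i + maxlen])

# One-hot encode: x has shape (samples, maxlen, n_chars), y (samples, n_chars).
x = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
y = np.zeros((len(sentences), len(chars)), dtype=bool)
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        x[i, t, char_indices[char]] = True
    y[i, char_indices[next_chars[i]]] = True
```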

on_epoch_end() function
The function on_epoch_end() is called at the end of each epoch; by that point the model has been trained far enough that the model variable can already be used for predictions.

First, a seed is generated by randomly taking a string of length maxlen out of the text.

In a loop, the seed is vectorized into the variable x_pred, and the model is used to create a prediction into preds. The function sample() is then called to return the next predicted character.

The next seed is generated by removing the first character, and adding the next predicted character at the end.
The predicted character is printed to the screen, and the loop continues.
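A minimal sketch of this loop, runnable without a trained network: model.predict() is replaced here by a stand-in that returns uniform probabilities, and sample() is the function described in the next section:

```python
import numpy as np

chars = sorted(set("hello world "))  # toy character set
char_indices = {c: i for i, c in enumerate(chars)}
indices_char = {i: c for i, c in enumerate(chars)}
maxlen = 8

def fake_predict(x_pred):
    """Stand-in for model.predict(): uniform probabilities over all chars."""
    return np.full((1, len(chars)), 1.0 / len(chars))

def sample(preds, temperature=1.0):
    # Temperature-adjust, renormalize, then draw one index.
    preds = np.asarray(preds).astype("float64")
    preds = np.exp(np.log(preds) / temperature)
    preds = preds / np.sum(preds)
    return np.argmax(np.random.multinomial(1, preds, 1))

seed = "hello wo"          # a random maxlen-slice of the text in the real code
generated = ""
for _ in range(20):
    # Vectorize the current seed into x_pred (one-hot).
    x_pred = np.zeros((1, maxlen, len(chars)))
    for t, char in enumerate(seed):
        x_pred[0, t, char_indices[char]] = 1.0
    preds = fake_predict(x_pred)[0]   # model.predict(x_pred) in the real code
    next_char = indices_char[sample(preds, 1.0)]
    # Slide the window: drop the first character, append the prediction.
    seed = seed[1:] + next_char
    generated += next_char
```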
sample() function
The sample() function receives the predictions of each character as probabilities.

Next, the probabilities are adjusted according to the temperature parameter (see below):
exp_preds = np.exp(np.log(preds) / temperature)

Then the probabilities are normalized:
preds = exp_preds / np.sum(exp_preds)
Each value in the array is divided by the sum of all values. This results in a normalized array whose values sum to 1.

Finally, an index for the predicted next character is chosen with the multinomial function (see below).
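Putting these steps together, sample() looks roughly like this (the demo probabilities at the end are made up):

```python
import numpy as np

def sample(preds, temperature=1.0):
    """Draw the index of the next character from the prediction array."""
    preds = np.asarray(preds).astype("float64")
    # Temperature adjustment: scale the log-probabilities by 1/temperature.
    exp_preds = np.exp(np.log(preds) / temperature)
    # Normalize so the values sum to 1 again.
    preds = exp_preds / np.sum(exp_preds)
    # Draw one sample; multinomial returns a one-hot array, argmax its index.
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)

# Made-up prediction array for demonstration:
preds = [0.1, 0.2, 0.6, 0.1]
print(sample(preds, temperature=0.5))  # most draws return index 2
```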
temperature parameter
What does the temperature parameter in the language prediction do?
It varies the chance that characters with lower probabilities are selected.
  • temperature < 1:
    characters with lower probability than the most likely one become even less likely to be selected.
  • temperature = 1:
    no change.
  • temperature > 1:
    characters with lower probability than the most likely one become more likely to be selected.
Example of different temperature values
temperature=1.0: (original values)
LSTM_temperature_00

temperature=0.5:
LSTM_temperature_01

temperature=2.0:
LSTM_temperature_02

The following table shows the probability of various characters to be selected:

Index   t=0.5 probability   t=1 probability   t=2 probability
1       95%                 73%               37%
12      4%                  15%               17%

How much more likely is the character with index 1 to be selected versus the character with index 12?
We can look at the ratio of their probabilities p(1)/p(12).

t=0.5 ratio   t=1 ratio   t=2 ratio
23.4          4.8         2.2

We can see that at temperature 0.5, the character with index 1 becomes much more likely to be selected relative to the character with index 12.
With a temperature < 1, the character with the originally greatest probability becomes even more likely to be selected, which can be considered the more "safe" approach.
The higher we set the temperature, the better the chance that the originally less likely characters are selected.
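Since the temperature adjustment raises each probability to the power 1/t before renormalizing, the ratio p(1)/p(12) can be checked directly from the t=1 values in the table:

```python
p1, p12 = 0.73, 0.15   # t=1 probabilities from the table

for t in (0.5, 1.0, 2.0):
    # Temperature scaling maps p to p**(1/t) (up to normalization),
    # so the ratio of two probabilities becomes (p1/p12)**(1/t).
    ratio = (p1 / p12) ** (1.0 / t)
    print(f"t={t}: p(1)/p(12) = {ratio:.1f}")
    # Close to the 23.4 / 4.8 / 2.2 in the table, which were presumably
    # computed from unrounded probabilities.
```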

Sample Draw: multinomial function
In the final step, the multinomial function returns an index according to the temperature-adjusted probabilities in the prediction.

Over many draws, the distribution of the sampled indices matches the given probabilities.
probas = np.random.multinomial(1, preds, 1)

For example, running the inference 100 times with temperature=2.0 will lead to the following distribution:
LSTM_multinomial
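A quick way to see this convergence (the probability values here are made up; the counting loop uses NumPy's Generator API instead of the legacy np.random.multinomial purely to get a seeded rng):

```python
import numpy as np

rng = np.random.default_rng(0)
preds = np.array([0.37, 0.17, 0.46])  # made-up temperature-adjusted probabilities

# rng.multinomial(1, preds) returns a one-hot row like [0, 1, 0];
# np.argmax turns it into the index of the drawn character.
counts = np.zeros(len(preds), dtype=int)
for _ in range(10_000):
    draw = rng.multinomial(1, preds)
    counts[np.argmax(draw)] += 1

print(counts / counts.sum())  # approaches preds as the number of draws grows
```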

March 7, 2026

J. Heaton's class chapter 10.4 includes a link to a tutorial for implementing a transformer from scratch:
https://www.tensorflow.org/text/tutorials/transformer


https://research.google/blog/transformer-a-novel-neural-network-architecture-for-language-understanding/

March 13, 2026

Chapter 10.5, "Programming Transformers with Keras"

The code includes lines where a function call is immediately followed by a second pair of parentheses containing another argument, for example:
x = layers.LayerNormalization(epsilon=1e-6)(inputs)
What does this construct do?

Example with f(a)(b):

from "Python functions with multiple parameter brackets"
https://stackoverflow.com/questions/42874825/python-functions-with-multiple-parameter-brackets

f(a)(b) just means that the expression f(a) returns a value that is itself callable.
f(a)(b) calls f with one parameter a, which then returns another function, which is then called with one parameter b
def func(a):
    def func2(b):
        return a + b
    return func2

Note that func returns a function, not a value.
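To check this behavior, the nested call can be written in one expression or in two steps:

```python
def func(a):
    def func2(b):
        return a + b
    return func2

print(func(2)(3))   # both calls in one expression -> 5

step = func(2)      # func(2) returns func2 with a bound to 2
print(step(3))      # -> 5
```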


Applying this to the above code:
x = layers.LayerNormalization(epsilon=1e-6)(inputs)
has the same result as
intermediate = layers.LayerNormalization(epsilon=1e-6)
x = intermediate(inputs)

The code continues with
x = layers.Dropout(dropout)(x)

March 17, 2026

J. Heaton class Part 10.5: Programming Transformers with Keras provides a link to a Keras example that was the basis for the transformer code:

Timeseries classification with a Transformer model
https://keras.io/examples/timeseries/timeseries_classification_transformer/

It uses the FordA dataset
https://www.timeseriesclassification.com/description.php?Dataset=FordA

March 23, 2026

Transformers and LLMs: learning links

J. Heaton class 10.5:
"If you are interested in implementing a transformer from scratch, Keras provides a comprehensive example."
Neural machine translation with a Transformer and Keras
https://www.tensorflow.org/text/tutorials/transformer

Keras API: MultiHeadAttention layer
https://keras.io/api/layers/attention_layers/multi_head_attention/
from here: Guides and examples using MultiHeadAttention
https://keras.io/examples/nlp/ner_transformers/
from here: Relevant Chapters from Deep Learning with Python
Language models and the Transformer
https://deeplearningwithpython.io/chapters/chapter15_language-models-and-the-transformer/



March 23, 2026

Going through the example at

Neural machine translation with a Transformer and Keras
https://www.tensorflow.org/text/tutorials/transformer

The example refers to the glossary at
https://developers.google.com/machine-learning/glossary

Terms looked up:
- sequence-to-sequence task
- Transformer
- self-attention
- attention

Preliminaries:
Text generation with an RNN
https://www.tensorflow.org/text/tutorials/text_generation

Neural machine translation with attention
https://www.tensorflow.org/text/tutorials/nmt_with_attention




March 25, 2026

The example at

Neural machine translation with attention
https://www.tensorflow.org/text/tutorials/nmt_with_attention


requires the library "tensorflow-text>=2.11"

However, the jh_class conda environment is not compatible with this library.

conda create -n tensorflow-text python=3.10
Then activate this environment before installing packages!
conda activate tensorflow-text
Install packages:
conda install -c conda-forge tensorflow=2.19.0 -y
conda install -c conda-forge matplotlib -y
conda install -c conda-forge einops -y
pip install tensorflow-text
Warning appears:
Protobuf gencode version 5.28.3 is exactly one major version older than the runtime version 6.31.1 at tensorflow/core/protobuf/cluster.proto
To get rid of warning:
pip install "protobuf>=5.28.3,<6"

What does the 'b' prefix mean when printing a tensor?
The 'b' prefix in tensor output (e.g., b'Hello') indicates that the value is a byte string (a sequence of octets) rather than a standard Unicode string.

What does this construct do?
np.random.uniform(0,1,50) < 0.8
The expression np.random.uniform(0, 1, 50) < 0.8 generates a 1-dimensional NumPy array of 50 random floating-point numbers drawn from a uniform distribution over the interval [0.0, 1.0) and compares each value to 0.8, returning a boolean array of True or False.

Approximately 80% of the values in the generated array are expected to be True, assuming a large enough sample size.
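A quick check of both claims (this boolean-mask idiom is commonly used for a random train/test split):

```python
import numpy as np

mask = np.random.uniform(0, 1, 50) < 0.8
print(mask.dtype)    # bool
print(mask.mean())   # fraction of True values, around 0.8 on average

# With a larger sample the fraction converges toward 0.8:
big = np.random.uniform(0, 1, 1_000_000) < 0.8
print(big.mean())
```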






March 31, 2026

The line
target_text_processor.adapt(train_raw.map(lambda context, target: target))
causes the error
tensorflow/core/framework/local_rendezvous.cc:407] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence

Text Vectorization
API page of tf.keras.layers.TextVectorization:
https://www.tensorflow.org/api_docs/python/tf/keras/layers/TextVectorization
"The vocabulary for the layer must be either supplied on construction or learned via adapt()."

Example on this page:
OOV token = out-of-vocabulary token.
When using output_mode='int', index 0 is reserved for masking, and the OOV token is assigned to index 1.
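A minimal sketch of adapt() and the reserved indices, on a toy corpus (assumes TensorFlow is installed):

```python
import tensorflow as tf

layer = tf.keras.layers.TextVectorization(output_mode="int")
# adapt() learns the vocabulary from the data.
layer.adapt(["the cat sat", "the dog sat"])

vocab = layer.get_vocabulary()
print(vocab)  # index 0 = '' (mask), index 1 = '[UNK]' (OOV), then by frequency

# 'bird' was not in the adapted data, so it maps to the OOV index 1.
print(layer(["the bird sat"]))
```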

Example with tf.data.Dataset
The API page lists example with text vectorization: https://www.tensorflow.org/tutorials/load_data/text

