
AI learning blog August 2025

August 2, 2025

NLP transfer learning
Word Embeddings in NLP: An Introduction
https://hunterheidenreich.com/posts/intro-to-word-embeddings/

Distributional semantics
https://en.wikipedia.org/wiki/Distributional_semantics
distributional hypothesis: linguistic items with similar distributions have similar meanings.
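
A toy sketch of what this means: count each word's neighbors in a small corpus and compare the count vectors with cosine similarity. Words used in similar contexts (here "cat" and "dog") end up with similar vectors. The corpus and window size below are made up for illustration.

# Toy illustration of the distributional hypothesis: words that appear in
# similar contexts get similar co-occurrence count vectors.
from collections import Counter
import math

corpus = [
    "the cat chased the mouse",
    "the dog chased the ball",
    "the cat ate the fish",
    "the dog ate the bone",
]

window = 2
contexts = {}  # word -> Counter of neighboring words within the window
for sentence in corpus:
    tokens = sentence.split()
    for i, w in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if i != j:
                contexts.setdefault(w, Counter())[tokens[j]] += 1

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

print(cosine(contexts["cat"], contexts["dog"]))     # high: similar contexts
print(cosine(contexts["cat"], contexts["chased"]))  # lower: different roles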

word2vec:
Tool for computing continuous distributed representations of words.
https://code.google.com/archive/p/word2vec/
https://www.tensorflow.org/text/tutorials/word2vec
https://en.wikipedia.org/wiki/Word2vec
Preservation of semantic and syntactic relationships:
[Figure: Word_vector_illustration.png, word vectors preserving semantic and syntactic relationships]
Get training data by extracting text from Wikipedia
https://mattmahoney.net/dc/textdata.html
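
The analogy arithmetic (king − man + woman ≈ queen) can be tried directly with gensim's downloader. A minimal sketch, assuming gensim is installed; note the word2vec-google-news-300 vectors are about 1.6 GB on first download:

import gensim.downloader as api

# Pretrained word2vec vectors as gensim KeyedVectors.
wv = api.load("word2vec-google-news-300")

# Semantic relationship via vector arithmetic:
print(wv.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
# e.g. [('queen', 0.71...)]

# Syntactic relationship (verb tense):
print(wv.most_similar(positive=["walked", "swim"], negative=["walk"], topn=1))
# e.g. [('swam', ...)]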


Part 9.3: Transfer Learning for NLP with Keras
The code loads a pretrained embedding model into the variable model.
It then wraps this model in a Keras layer whose output shape is 20:
import tensorflow as tf
import tensorflow_hub as hub

hub_layer = hub.KerasLayer(
    model,
    output_shape=[20],
    input_shape=[],
    dtype=tf.string,
    trainable=True,
)
The embedding layer converts each review into a vector of 20 numbers.
For example:
print(hub_layer(train_examples[:1]))
prints
tf.Tensor(
[[ 1.7657859  -3.882232    3.913424   -1.5557289  -3.3362343  -1.7357956
  -1.9954445   1.298955    5.081597   -1.1041285  -2.0503852  -0.7267516
  -0.6567596   0.24436145 -3.7208388   2.0954835   2.2969332  -2.0689783
  -2.9489715  -1.1315986 ]], shape=(1, 20), dtype=float32)
The output has shape (1, 20) regardless of how many words the input review contains.
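
To see where the transfer learning comes in: a minimal sketch of a sentiment classifier built on top of this embedding layer, in the style of Part 9.3. The gnews-swivel-20dim handle below is the one used in the TensorFlow text-classification tutorial; the dense layer sizes are illustrative:

import tensorflow as tf
import tensorflow_hub as hub

model_handle = "https://tfhub.dev/google/tf2-preview/gnews-swivel-20dim/1"
hub_layer = hub.KerasLayer(
    model_handle,
    output_shape=[20],
    input_shape=[],
    dtype=tf.string,
    trainable=True,
)

# The hub layer maps each raw review string to a 20-dim vector, so the rest
# of the network is an ordinary dense classifier on top of that vector.
model = tf.keras.Sequential([
    hub_layer,
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1),  # logit for positive/negative sentiment
])
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
# model.fit(train_examples, train_labels, epochs=10, batch_size=512)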

August 3, 2025

The model used in Part 9.3 is described at
https://www.kaggle.com/models/google/gnews-swivel/code
"This module .. maps from text to 20-dimensional embedding vectors."

August 5, 2025

Word Embedding using Universal Sentence Encoder in Python
https://www.geeksforgeeks.org/python/word-embedding-using-universal-sentence-encoder-in-python/
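
A minimal usage sketch, following the pattern in that article: load the encoder from TF Hub (version 4 of the handle) and embed a list of sentences; each sentence comes back as a 512-dimensional vector:

import tensorflow_hub as hub

embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")
sentences = ["The quick brown fox.", "A fast auburn fox."]
embeddings = embed(sentences)
print(embeddings.shape)  # (2, 512): one 512-dim vector per sentence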

How to load TF hub model from local system
https://stackoverflow.com/questions/60578801/how-to-load-tf-hub-model-from-local-system

https://xianbao-qian.medium.com/how-to-run-tf-hub-locally-without-internet-connection-4506b850a915

Where does tensorflow_hub store models on Ubuntu?

module_url = "https://www.kaggle.com/models/google/universal-sentence-encoder/tensorFlow2/universal-sentence-encoder/2?tfhub-redirect=true"
print(hub.resolve(module_url))
prints
/tmp/tfhub_modules/3bdf4002a346590d64dd2aee920834f58917f372

https://www.tensorflow.org/hub/caching
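
Based on that caching doc: the cache location can be redirected with the TFHUB_CACHE_DIR environment variable (the path below is illustrative), and hub.load also accepts a local directory, which is how a cached copy works offline:

import os
# Redirect the tensorflow_hub download cache (illustrative path).
os.environ["TFHUB_CACHE_DIR"] = os.path.expanduser("~/tfhub_cache")

import tensorflow_hub as hub

# Downloads now land under ~/tfhub_cache instead of /tmp/tfhub_modules.
embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")

# hub.load also takes a local directory, so a previously cached copy can be
# loaded without a network connection, e.g.:
# embed = hub.load(os.path.expanduser("~/tfhub_cache/<module-hash>"))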

Tutorial: Universal Sentence Encoder
https://www.tensorflow.org/hub/tutorials/semantic_similarity_with_tf_hub_universal_encoder

The tutorial examines how well similarity scores computed from sentence embeddings align with human judgements.
STS Benchmark
http://ixa2.si.ehu.es/stswiki

The benchmark dataset is downloaded:
sts_dataset = tf.keras.utils.get_file(
    fname="Stsbenchmark.tar.gz",
    origin="http://ixa2.si.ehu.es/stswiki/images/4/48/Stsbenchmark.tar.gz",
    extract=True)
The dataset is extracted to
~/.keras/datasets/stsbenchmark
This folder contains a readme.txt:
"The benchmark comprises 8628 sentence pairs."

The CSV files have the columns:
genre, filename, year, score, sentence1, sentence2
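
A sketch of the evaluation the tutorial describes: embed both sentences of each pair, score each pair with cosine similarity, and compute the Pearson correlation against the human scores. The column names follow the readme above; the tab-separated layout, the sts-test.csv filename, and the skipping of malformed lines are assumptions:

import os
import numpy as np
import pandas as pd
import tensorflow_hub as hub
from scipy.stats import pearsonr

embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")

# Parse the test split; some lines may carry extra fields, hence on_bad_lines.
sts_test = pd.read_csv(
    os.path.expanduser("~/.keras/datasets/stsbenchmark/sts-test.csv"),
    sep="\t", header=None, quoting=3, on_bad_lines="skip",
    names=["genre", "filename", "year", "score", "sentence1", "sentence2"],
)

# Embed both sides of every pair and take the cosine similarity.
emb1 = embed(sts_test["sentence1"].astype(str).tolist()).numpy()
emb2 = embed(sts_test["sentence2"].astype(str).tolist()).numpy()
emb1 /= np.linalg.norm(emb1, axis=1, keepdims=True)
emb2 /= np.linalg.norm(emb2, axis=1, keepdims=True)
cosine_scores = np.sum(emb1 * emb2, axis=1)

# How well do the model's similarities align with human judgements?
print(pearsonr(cosine_scores, sts_test["score"]))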
