atmelino

Mar 01, 2024

Linear Regression Tutorial using Tensorflow and Keras
Youtube video
Linear Regression Tutorial using Tensorflow and Keras
https://www.youtube.com/watch?v=yuAZQJ5BnJk

OpenCV University
https://opencv.org/university/

TensorFlow Bootcamp Example

works

Input:
Boston Housing Data
reduced to number of rooms column

Code:

model = Sequential()
model.add(Dense(units=1, input_shape=(1,)))
model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=0.005), loss="mse")
history = model.fit( X_train_1d, y_train, batch_size=16, epochs=101, # epochs=10, validation_split=0.3, verbose=0 )

Output:

Predicted price of a home with 3 rooms: $11.4K
Predicted price of a home with 4 rooms: $14.7K
Predicted price of a home with 5 rooms: $18.0K
Predicted price of a home with 6 rooms: $21.4K
Predicted price of a home with 7 rooms: $24.7K

TensorFlow Bootcamp setting with bp - age data

does not work

Input:

age	blood pressure
10	82
22	85
31	89
45	92
52	95
61	97
72	98

Code:

model = Sequential()
model.add(Dense(units=1, input_shape=(1,)))
model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=0.005), loss="mse")
history = model.fit( bp, age, batch_size=5, epochs=100, # epochs=10, validation_split=0.3, verbose=0 )

Output:
Sometimes gives negative prediction values

Predicted age of a person with 80 blood pressure: [-48.579433]
Predicted age of a person with 90 blood pressure: [-54.777733]
Predicted age of a person with 100 blood pressure: [-60.976032]

Mar 02, 2024

more regression examples

"Basic regression: Predict fuel efficiency "
https://www.tensorflow.org/tutorials/keras/regression

"Linear Regression In Neural Network With Tensorflow 2.0"
https://www.youtube.com/watch?v=XZ4zLgxhgtk
Pathshala

Pathshala code

works
Input:

x y
0 1 4
1 2 6
2 3 8
3 4 10
4 5 12
5 6 14
6 7 16
7 8 18
8 9 20

Code:

model=Sequential()
model.add(Dense(1,input_shape=(1,)))
model.compile(optimizer='sgd', loss="mean_absolute_error")
model.fit(x,y,epochs=100,verbose=0)

Output:

Input: 1 prediction: [2.9726133]
Input: 2 prediction: [5.1563377]
Input: 3 prediction: [7.3400626]
Input: 4 prediction: [22.626135]

Mar 03, 2024

bp and age with Pathshala code

1000 epochs
three consecutive runs, weight w0 value:

Pathshala numbers | bp and age numbers
weight w0= [2.054422] | weight w0= [-0.03433532]
weight w0= [2.0014956] | weight w0= [1.0083628]
weight w0= [2.0016782] | weight w0= [-0.15887064]

Mar 04, 2024

Setting initial weights and bias

Layer weight initializers
https://keras.io/api/layers/initializers/

Dense layer API documentation
https://keras.io/api/layers/core_layers/dense/
- kernel_initializer: Initializer for the kernel weights matrix.
- bias_initializer: Initializer for the bias vector.

Mar 05, 2024

Numerical Optimization for Machine Learning
https://machinelearningmastery.com/optimization-for-machine-learning/

Mar 07, 2024

T81-558: Applications of Deep Neural Networks uses a while loop
to go through multiple cycles for training the neural network
for the XOR function.

The function

model = Sequential()

is called in every iteration of the while loop.

is the model intialized with new random weights in each iteration,
or does the model retain the weights and biases calculated in the previous iteration?

Mar 10, 2024

Nomic’s embedding model (nomic-embed-text-v1) was trained on Paperspace H100s
https://blog.nomic.ai/posts/nomic-embed-text-v1

https://huggingface.co/nomic-ai/nomic-embed-text-v1

Getting started https://docs.nomic.ai/reference/getting-started/

Creating a conda environment for nomic

conda create -n nomic python=3.10

Then activate this environment before installing packages!

conda activate nomic

install official Python bindings

pip install nomic

Nomic Atlas Python Client
https://github.com/nomic-ai/nomic

Login or create your Nomic account:

nomic login

Follow the instructions to obtain your access token

nomic login [token]

Follow the instructions to obtain your access token.

Mar 11, 2024 part 1

ROC Curve

ROC curve - explained
by TileStats
https://www.youtube.com/watch?v=LpdTtX4WsiM

roc_curve_01

roc_curve_02

How to generate an ROC curve

We have a set of items, and each item is classified as either positive or negative.

There is a distinction between the real positive or negative, and the diagnosed positive or negative.

For each item, a value is measured and assigned a continuous variable.

A cut-off value is determined, such that every item with a value below the cut-off is diagnosed as negative, and every item with a value above the cut-off is diagnosed as positive.

There will be true positives, false positives, true negatives, and false negatives.

For the ROC curve, we vary the cut-off value from the lowest to the highest of the possible range.

For each cut-off value, we count:

TP - number of true positives
FP - number of false positives
TN - number of true negatives
FN - number of false negatives

Example:

roc_curve_03

For each cut-off value, we calculate:

sensitivity=TP/(TP+FN)
specificity=TN/(TN+FP)

and enter the calculated values in a table:

roc_curve_04

We plot the ROC curve with Specificity on the x-axis and Sensitivity on the y-axis

roc_curve_05

Mar 11, 2024 part 2

Getting error again, despite using tensorflow version 2.15.0:

ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type float).

Solution:
https://stackoverflow.com/questions/58636087/tensorflow-valueerror-failed-to-convert-a-numpy-array-to-a-tensor-unsupporte

x_train = np.asarray(x_train).astype(np.float32)
y_train = np.asarray(y_train).astype(np.float32)
x_test = np.asarray(x_test).astype(np.float32)
y_test = np.asarray(y_test).astype(np.float32)

Interesting:
The code splits training and test data according to

x_train, x_test, y_train, y_test = train_test_split( x, y, test_size=0.25, random_state=42)

Print x_test[0:10] starts with

x_test
[[ 0.7720559 -0.68284637 0.83903146 3.6320686 ..
[-0.05720153 1.4147578 1.8865114 14.458124 ..
[ 0.36275634 -0.8710929 -0.20844851 10.160343 ..
[ 0.2674291 -0.70973873 -1.2559285 3.8224766 ..
[ 1.1742831 1.7285022 -1.2559285 12.227631 ..

(This is after converting data to zscore)

Printing the whole dataframe shows that the x_test values are at id's:

1860 1861 0.772056 -0.682846 0.839031 3.632069 353 354 -0.057202 1.414758 1.886511 14.458125 1333 1334 0.362756 -0.871093 -0.208449 10.160344

The according product values are

a b c d e f g 1860 False False True False False False False 353 False True False False False False False 1333 False True False False False False False

in other words, c b b

and y_test shows as

y_test
[[0. 0. 1. 0. 0. 0. 0.]
[0. 1. 0. 0. 0. 0. 0.]
[0. 1. 0. 0. 0. 0. 0.]
[0. 1. 0. 0. 0. 0. 0.]
[0. 0. 0. 1. 0. 0. 0.]

Mar 12, 2024 Part 1

By default, the train_test_split function shuffles the matrices when generating the train and test data.
However, it offers the option to not shuffle the data:

train_test_split(y, shuffle=False)

https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html

When applying this option to the jh-simple-dataset data, the test data end up being at id 1500:

Printing the whole dataframe:

1500 1501 0.920177 0.195637 -0.208449 5.264137 .. c
1501 1502 -1.729662 -1.014519 -1.255928 3.632069 .. a
1502 1503 0.726135 -0.413923 -0.208449 6.624195 .. c
1503 1504 0.154570 -0.996591 -1.255928 11.112384 .. b
1504 1505 -2.855161 1.208583 1.886511 5.372942 .. a

Printing the test data:

x_test
[[0.9201767396007892 0.19563745517432052 -0.20844851356517272 5.264137437365784
[-1.729661903633492 -1.0145188576178745 -1.255928481229357 3.632068718682892
[0.7261354766498689 -0.4139227616395259 -0.20844851356517272 6.62419470293486
[0.15457040128417068 -0.9965906159468789 -1.255928481229357 11.112383679312812
[-2.855160995051586 1.2085831095855653 1.8865114217631958 5.37294201861131

The y_test data have the True in the correct positions
given that y_test is the product column starting at id=1500:
[c,a,c,b,a]

y_test
[[False False True False False False False]
[ True False False False False False False]
[False False True False False False False]
[False True False False False False False]
[ True False False False False False False]

This confirms, for a sample, that the product test data are the correct values.
The predicted values are

predicted values
[[0. 0.415 0.582 0.001 0.001 0. 0. ]
[0.591 0.087 0.001 0. 0.034 0.24 0.047]
[0. 0.395 0.594 0.006 0.005 0. 0. ]
[0. 0.718 0.27 0. 0.01 0.001 0. ]
[0.743 0.043 0. 0. 0.01 0.189 0.015]

or in argmax form:

[2 0 2 1 0]

So the highest probability for the product is
[c,a,c,b,a]

Showing a few more lines and comparing predicted and true values:

predicted vs true values
[2 0 2 1 0 1 3 1 0 2 1 1 2 0 1 0 1 1 1 1 1 1 1 1 1 2 3 1 0 0 2 1 1 1 2
[2 0 2 1 0 2 3 1 0 3 5 1 2 0 1 0 1 1 2 1 4 1 1 5 1 2 3 1 0 0 2 1 1 1 1

So there are a few incorrect predictions.

Mar 12, 2024 Part 2

Histogram of true and predicted values

https://stackoverflow.com/questions/39841733/matplotlib-histogram-how-to-display-the-count-over-the-bar

https://stackoverflow.com/questions/30112420/histogram-for-discrete-values-with-matplotlib

Mar 13, 2024

t81_558_class_04_3_regression class example uses flatten() function.
This refers to a numpy array:
https://numpy.org/doc/stable/reference/generated/numpy.ndarray.flatten.html
The line

chart_regression(pred.flatten(),y_test)

converts pred from an array of arrays to a 1-dimensional array of numbers
which is the format that the chart_regression function requires.

AI learning blog March 2024