data = 'Chios island is crescent or kidney shaped, 50 km (31 mi) long from north to south, and 29 km (18 mi) at its widest, covering an area of 842.289 km2 (325.210 sq mi).[2] The terrain is mountainous and arid, with a ridge of mountains running the length of the island. The two largest of these mountains, Pelineon (1,297 m (4,255 ft)) and Epos (1,188 m (3,898 ft)), are situated in the north of the island. The center of the island is divided between east and west by a range of smaller peaks, known as Provatas.'
# you can also replace the data with any other txt file you want to use
# data = open('input.txt', 'r').read() # should be simple plain text file - you can use any (small) file in txt format from the web or type your own.
Simple RNN Language Model
Our aim is to predict the next character given a set of previous characters from our data string. For our RNN implementation, we take a sequence of 25 characters as input and, at each position in that sequence, predict the character that follows.
The notation used here was introduced first here. This minimal character-level vanilla RNN model was first written by Andrej Karpathy (@karpathy) and was decorated with the forward and backprop equations by students of the CS-GY-6613 course as part of an assignment.
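To make the pairing of inputs and next-character targets concrete, here is a minimal sketch. The names `demo_data`, `demo_seq_length`, `start` and `windows` are hypothetical, and the window length of 5 is chosen only to keep the output short; the notebook itself uses 25.

```python
# Illustrative only: a short stand-in string and a window of 5 instead of 25.
demo_data = "Chios island is crescent or kidney shaped"
demo_seq_length = 5

windows = []
start = 0
while start + demo_seq_length + 1 < len(demo_data):
    window_inputs = demo_data[start:start + demo_seq_length]
    # The targets are the same window shifted one character to the right.
    window_targets = demo_data[start + 1:start + demo_seq_length + 1]
    windows.append((window_inputs, window_targets))
    start += demo_seq_length

print(windows[0])  # ('Chios', 'hios ')
```

Each target string is its input string shifted by one position, so every input character has exactly one "next character" to predict.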
import numpy as np
# creating a vocabulary of unique characters
chars = list(set(data))
data_size, vocab_size = len(data), len(chars)
print('data has %d characters, %d unique.' % (data_size, vocab_size))
data has 508 characters, 43 unique.
# Data pre-processing
# creating a dictionary, mapping characters to index and index to characters
char_to_ix = { ch:i for i,ch in enumerate(chars) }
print(char_to_ix)
ix_to_char = { i:ch for i,ch in enumerate(chars) }
print(ix_to_char)
{'9': 0, '(': 1, ')': 2, 'q': 3, 's': 4, ',': 5, 'a': 6, 'd': 7, 'P': 8, 'E': 9, 'c': 10, 'e': 11, '4': 12, '0': 13, '.': 14, ' ': 15, 'p': 16, 'm': 17, 'f': 18, 'k': 19, 'o': 20, 'l': 21, 'b': 22, '2': 23, '3': 24, 'v': 25, 'y': 26, 'n': 27, 'u': 28, '5': 29, 'h': 30, 't': 31, '7': 32, 'r': 33, 'i': 34, 'T': 35, 'C': 36, 'w': 37, '[': 38, ']': 39, '8': 40, 'g': 41, '1': 42}
{0: '9', 1: '(', 2: ')', 3: 'q', 4: 's', 5: ',', 6: 'a', 7: 'd', 8: 'P', 9: 'E', 10: 'c', 11: 'e', 12: '4', 13: '0', 14: '.', 15: ' ', 16: 'p', 17: 'm', 18: 'f', 19: 'k', 20: 'o', 21: 'l', 22: 'b', 23: '2', 24: '3', 25: 'v', 26: 'y', 27: 'n', 28: 'u', 29: '5', 30: 'h', 31: 't', 32: '7', 33: 'r', 34: 'i', 35: 'T', 36: 'C', 37: 'w', 38: '[', 39: ']', 40: '8', 41: 'g', 42: '1'}
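As a quick sanity check of the two mappings, the sketch below encodes a short string to integer tokens and decodes it back. All `demo_*` names are hypothetical, and the `sorted` call is an assumption added only to make the mapping reproducible; the notebook builds its vocabulary from an unordered `set`, so its indices can differ between runs.

```python
# Illustrative round trip with hypothetical demo_* names.
demo_text = "Chios island"
demo_chars = sorted(set(demo_text))  # sorted for a reproducible order (assumption)
demo_char_to_ix = {ch: i for i, ch in enumerate(demo_chars)}
demo_ix_to_char = {i: ch for i, ch in enumerate(demo_chars)}

# characters -> integer tokens
demo_tokens = [demo_char_to_ix[ch] for ch in demo_text]
# integer tokens -> characters
demo_decoded = ''.join(demo_ix_to_char[ix] for ix in demo_tokens)

print(demo_tokens)
print(demo_decoded)
```

Decoding the tokens recovers the original string, which is the property the training and sampling code relies on.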
Inputs to RNN
- \(x_1\) to \(x_{25}\) is the input sequence of 25 characters, with one character given as input to the RNN at each time step
Hidden state of RNN
- The state consists of a single ‘hidden’ vector \(h\)
- At every time step, a recurrence function \(f_W\) with parameters \(W_{xh}\), \(W_{hh}\) and \(b_h\) is applied to the input \(x_t\) and the previous hidden state \(h_{t-1}\) to generate \(h_t\)
$ h_t = f_W (h_{t-1},x_t)$
$ \; \; \; \; = \tanh(W_{hh}h_{t-1} + W_{xh}x_t + b_h)$
Outputs of the RNN
- \(\hat y_T\) is the character that our network would predict after \(T=25\) time steps
- At each time step, an output \(o_t\) is calculated as
$ o_t = W_{hy}h_t + b_y$
- The softmax of \(o_t\) is the set of probabilities of occurrence of each unique character in the input data
$ \hat y_t = softmax(o_t)$
- At each time step, from \(t=1\) to \(25\), loss is calculated from the set of predicted probabilities.
$ L_t = CrossEntropy(\hat y_t, y_t)$
where \(y_t\) is the character that follows the input \(x_t\) in the data string
- The total loss is the sum of the losses over all unrolled time steps
$ L = \sum_{t=1}^{25}L_t$
All the weights \(W_{xh}\), \(W_{hh}\), \(b_h\), \(W_{hy}\) and \(b_y\) are reused at each time step.
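The equations above can be sketched as a single time step in NumPy. This is a minimal illustration, not the notebook's implementation: the toy sizes `V=3` and `H=4`, the random seed, and the helper name `rnn_step` are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
V, H = 3, 4  # toy vocabulary and hidden sizes (assumptions, not the notebook's 43 and 100)
W_xh = rng.standard_normal((H, V)) * 0.01
W_hh = rng.standard_normal((H, H)) * 0.01
W_hy = rng.standard_normal((V, H)) * 0.01
b_h = np.zeros((H, 1))
b_y = np.zeros((V, 1))

def rnn_step(x_t, h_prev):
    """One time step: returns the new hidden state and the softmax probabilities."""
    h_t = np.tanh(W_hh @ h_prev + W_xh @ x_t + b_h)
    o_t = W_hy @ h_t + b_y
    y_hat = np.exp(o_t) / np.sum(np.exp(o_t))
    return h_t, y_hat

x = np.zeros((V, 1))
x[1] = 1                          # one-hot input for token index 1
h, y_hat = rnn_step(x, np.zeros((H, 1)))
loss = -np.log(y_hat[2, 0])       # cross-entropy if the true next token were index 2
print(h.shape, y_hat.shape)
```

The same five parameter arrays are used on every call to `rnn_step`, which is what "reused at each time step" means in practice.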
Hyperparameters
- the size of hidden state of neurons
- the sequence length or the time steps to unroll, which is 25 in our case
- optimizer we use here is Adagrad
- the learning rate for Adagrad optimizer
# hyperparameters
hidden_size = 100 # size of hidden state (number of RNN simple neurons)
seq_length = 25 # number of time steps to unroll the RNN for, taking 25 previous characters to predict the next
learning_rate = 1e-1
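To show what the learning rate controls, here is a minimal sketch of the Adagrad update on a toy problem. The objective f(w) = sum(w^2), the starting point, and the names `w` and `mem` are assumptions chosen for illustration; only the update rule itself mirrors the one used later in the training loop.

```python
import numpy as np

learning_rate = 1e-1
w = np.array([1.0, -2.0])   # toy parameter vector (assumption)
mem = np.zeros_like(w)      # running sum of squared gradients, one entry per parameter

for _ in range(100):
    grad = 2 * w            # gradient of the toy objective f(w) = sum(w**2)
    mem += grad * grad
    # Each parameter gets its own effective step size: learning_rate / sqrt(mem).
    w += -learning_rate * grad / np.sqrt(mem + 1e-8)

print(w)
```

Because `mem` only grows, the effective step size of each parameter shrinks over time, which is why a fairly large base learning rate such as 1e-1 can be workable with Adagrad.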
Dimensions of tensors
Input:
- Each character from the data string is pre-processed before being fed into the RNN
- From each sequence of 25 characters (for 25 time steps) from the data string, we create an ‘inputs’ list of tokenized integer values
- Each character is converted to an integer token index using the ‘char_to_ix’ dictionary, which maps each character to a number between 0 and 42 (as there are 43 unique characters in our data)
- The integer tokens from the ‘inputs’ list are then one-hot encoded in 1-of-k representations, i.e., into vectors of size 43 (k=43 unique characters in our data), which are fed as inputs to the RNN
=> dimension of input \(x_t =\) (43,1)
Targets (\(y_t\)):
- For each input in the ‘inputs’ list, we create a ‘targets’ list consisting of the subsequent character’s integer token
- Our targets list, which is used during the cross-entropy loss calculation, is of length 25
Predicted output:
- The predicted outputs are the probabilities of the next characters
- Since k=43 unique characters, the unnormalized logits for the next characters have
=> dimension of output \(o_t =\) (43,1)
- The \(softmax(o_t)\) gives the class probabilities for the next characters
- When generating text, a character index is drawn at random according to these probabilities (the ‘sample’ function below uses np.random.choice rather than \(\arg \max\)) and is one-hot encoded as the next input
- The integer tokens are converted back to single characters using the ‘ix_to_char’ dictionary
Hidden layers:
- Since we have chosen 100 neurons in the hidden layer,
=> dimension of hidden state \(h_t =\) (100,1)
Model parameters:
- Given the hidden_size=100, input x dimension=(43,1) and output y dimension=(43,1):
=> dimension of \(W_{xh} =\) (100,43),
=> dimension of \(W_{hh} =\) (100,100),
=> dimension of \(b_{h} =\) (100,1),
=> dimension of \(W_{hy} =\) (43,100),
=> dimension of \(b_{y} =\) (43,1)
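The shapes listed above can be checked mechanically. The sketch below uses zero-filled arrays purely to verify that the matrix products are dimensionally consistent and to count the parameters; the `shapes` dictionary, the token index 7, and the variable names are assumptions for illustration.

```python
import numpy as np

hidden_size, vocab_size = 100, 43
shapes = {
    'Wxh': (hidden_size, vocab_size),
    'Whh': (hidden_size, hidden_size),
    'Why': (vocab_size, hidden_size),
    'bh': (hidden_size, 1),
    'by': (vocab_size, 1),
}

x_t = np.zeros((vocab_size, 1))
x_t[7] = 1                                   # one-hot vector for an arbitrary token index 7
h_prev = np.zeros((hidden_size, 1))

# Zero-valued parameters: only the shapes matter for this check.
h_t = np.tanh(np.zeros(shapes['Whh']) @ h_prev
              + np.zeros(shapes['Wxh']) @ x_t
              + np.zeros(shapes['bh']))
o_t = np.zeros(shapes['Why']) @ h_t + np.zeros(shapes['by'])

n_params = sum(int(np.prod(s)) for s in shapes.values())
print(h_t.shape, o_t.shape, n_params)        # (100, 1) (43, 1) 18743
```

With these sizes the model has 4,300 + 10,000 + 4,300 weights and 143 biases, 18,743 parameters in total.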
# model parameters
# we set the initial values of the weights randomly from a normal distribution and set all the bias to zero
Wxh = np.random.randn(hidden_size, vocab_size)*0.01 # input to hidden, shape = (hidden_size, vocab_size) = (100,43)
Whh = np.random.randn(hidden_size, hidden_size)*0.01 # hidden to hidden, shape = (hidden_size, hidden_size) = (100,100)
Why = np.random.randn(vocab_size, hidden_size)*0.01 # hidden to output, shape = (vocab_size, hidden_size) = (43,100)
bh = np.zeros((hidden_size, 1)) # hidden bias, shape = (hidden_size, 1) = (100,1)
by = np.zeros((vocab_size, 1)) # output bias, shape = (vocab_size, 1) = (43,1)
Forward Pass
Forward through entire sequence \(x_1\) to \(x_{25}\) to compute loss
- Calculate hidden states at each time step
$ h_t = \tanh (W_{hh}h_{t-1} + W_{xh}x_t + b_h)$
- Calculate output \(o_t\)
$ o_t = W_{hy}h_t + b_y$
- The softmax of \(o_t\) is the set of probabilities of occurrence of each unique character in the input data
$ \hat y_t = softmax(o_t)$
- Calculate loss at each time step
$ L_t = CrossEntropy(\hat y_t, y_t)$
- Calculate the total loss, which is the negative log likelihood of our model
$ L = \sum_{t=1}^{25}L_t$
$ \; \; \; \; = - \sum_t \log \; p_{model} (y_t | x_1,…,x_t)$
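As a worked instance of the total loss, consider a model that assigns equal probability to all 43 characters, which is roughly what the small random initial weights produce. The sketch below is an illustration under that assumption (the names `p_uniform` and `demo_targets` are hypothetical); its value agrees with the loss of about 94.03 printed at iteration 0 in the training output further down.

```python
import numpy as np

vocab_size, seq_length = 43, 25
p_uniform = np.full((vocab_size, 1), 1.0 / vocab_size)  # uniform softmax output
demo_targets = [0] * seq_length   # any target indices give the same loss under a uniform model

# Sum of per-step cross-entropy losses: -log p(y_t) for each of the 25 steps.
uniform_loss = sum(-np.log(p_uniform[t, 0]) for t in demo_targets)
print(round(uniform_loss, 2))     # 94.03
```

This is simply 25 times ln(43), so any training loss well below that value means the model is doing better than guessing uniformly.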
Backpropagation Through Time
- Backward through entire sequence to compute gradient
- The nodes include parameters \(W_{xh}\), \(W_{hh}\), \(b_h\), \(W_{hy}\) and \(b_y\)
- The inputs and outputs of nodes are \(x_t\), \(h_t\), \(y_t\), \(p_t\) and \(L_t\) at time-step \(t\)
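- We’ll use the suffix \((i)\) to indicate the \(i^{th}\) element of a vector, for example \(p_{(i)t}\) is the probability assigned to the \(i^{th}\) character at time-step \(t\)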
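Gradients on the internal nodes:
- We’ll be computing the gradients recursively starting with the nodes immediately preceding the final loss
$ \frac{\partial L}{\partial L_t} = 1 $
- The gradient with respect to the softmax layer would be:
$ \frac{\partial L}{\partial o_{(i)t}} = p_{(i)t} - \mathbf{1}_{i=y_t} $
- At t=25, the gradient with respect to the hidden layer would be:
$ \nabla_{h_{25}} L = {W_{hy}}^T \; \nabla_{o_{25}} L $
- We can now iterate backwards from t=24 down to t=1:
$ \nabla_{h_t} L = \bigg(\frac{\partial h_{t+1}}{\partial h_t}\bigg)^T \nabla_{h_{t+1}} L + \bigg(\frac{\partial o_t}{\partial h_t}\bigg)^T \nabla_{o_t} L $
$ \; \; \; \; = (W_{hh})^T \; diag\bigg(1-(h_{t+1})^2\bigg) \; \nabla_{h_{t+1}} L + (W_{hy})^T \; \nabla_{o_t} L $
where \(diag\bigg(1-(h_{t+1})^2\bigg)\) indicates the diagonal matrix containing the elements \(1-(h_{(i)t+1})^2\)
Gradients on the parameter nodes:
- Gradients with respect to \(W_{hy}\):
$ \nabla_{W_{hy}} L = \sum_t \sum_i \frac{\partial L}{\partial o_{(i)t}} \nabla_{W_{hy}} o_{(i)t} $
$ \; \; \; \; = \sum_t (\nabla_{o_t} L) \; (h_t)^T$
- Gradients with respect to \(b_y\):
$ \nabla_{b_y} L = \sum_t \bigg(\frac{\partial o_t}{\partial b_y}\bigg)^T \nabla_{o_t} L $
$ \; \; \; \; = \sum_t \nabla_{o_t} L $
- Gradients with respect to \(W_{hh}\):
$ \nabla_{W_{hh}} L = \sum_t \sum_i \frac{\partial L}{\partial h_{(i)t}} \nabla_{W_{hh}} h_{(i)t} $
$ \; \; \; \; = \sum_t diag\bigg(1-(h_{t})^2\bigg) \; (\nabla_{h_t} L) \; (h_{t-1})^T$
- Gradients with respect to \(b_h\):
$ \nabla_{b_h} L = \sum_t \bigg(\frac{\partial h_t}{\partial b_h}\bigg)^T \nabla_{h_t} L $
$ \; \; \; \; = \sum_t diag\bigg(1-(h_{t})^2\bigg) \; \nabla_{h_t} L $
- Gradients with respect to \(W_{xh}\):
$ \nabla_{W_{xh}} L = \sum_t \sum_i \frac{\partial L}{\partial h_{(i)t}} \nabla_{W_{xh}} h_{(i)t} $
$ \; \; \; \; = \sum_t diag\bigg(1-(h_{t})^2\bigg) \; (\nabla_{h_t} L) \; (x_t)^T$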
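One way to gain confidence in these formulas is a numerical gradient check. The sketch below is an illustration under assumed toy sizes (`V=5`, `H=4`, `T=6`) and hypothetical helper names (`forward`, `grad_Whh`); it compares the backpropagated gradient with respect to \(W_{hh}\) against central finite differences.

```python
import numpy as np

rng = np.random.default_rng(0)
V, H, T = 5, 4, 6                        # toy sizes (assumptions)
Wxh = rng.standard_normal((H, V)) * 0.1
Whh = rng.standard_normal((H, H)) * 0.1
Why = rng.standard_normal((V, H)) * 0.1
bh, by = np.zeros((H, 1)), np.zeros((V, 1))
inputs = rng.integers(0, V, size=T)      # random token indices
targets = rng.integers(0, V, size=T)

def forward(Whh_):
    """Total cross-entropy loss plus cached inputs, hidden states and probabilities."""
    xs, hs, ps, loss = {}, {-1: np.zeros((H, 1))}, {}, 0.0
    for t in range(T):
        xs[t] = np.zeros((V, 1))
        xs[t][inputs[t]] = 1
        hs[t] = np.tanh(Wxh @ xs[t] + Whh_ @ hs[t - 1] + bh)
        o = Why @ hs[t] + by
        ps[t] = np.exp(o) / np.sum(np.exp(o))
        loss += -np.log(ps[t][targets[t], 0])
    return loss, xs, hs, ps

def grad_Whh():
    """Gradient of the loss with respect to W_hh via backpropagation through time."""
    _, xs, hs, ps = forward(Whh)
    dWhh, dhnext = np.zeros_like(Whh), np.zeros((H, 1))
    for t in reversed(range(T)):
        d_o = ps[t].copy()
        d_o[targets[t]] -= 1                 # p_t minus the one-hot target
        dh = Why.T @ d_o + dhnext            # gradient with respect to h_t
        dhraw = (1 - hs[t] ** 2) * dh        # diag(1 - h_t^2) applied element-wise
        dWhh += dhraw @ hs[t - 1].T
        dhnext = Whh.T @ dhraw               # contribution passed to step t-1
    return dWhh

analytic = grad_Whh()
numeric = np.zeros_like(Whh)
eps = 1e-5
for i in range(H):
    for j in range(H):
        Wp, Wm = Whh.copy(), Whh.copy()
        Wp[i, j] += eps
        Wm[i, j] -= eps
        numeric[i, j] = (forward(Wp)[0] - forward(Wm)[0]) / (2 * eps)

max_abs_diff = np.max(np.abs(analytic - numeric))
print(max_abs_diff)
```

A very small `max_abs_diff` indicates that the recursion for the hidden-state gradient and the sum over time steps agree with the loss actually computed in the forward pass.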
def lossFun(inputs, targets, hprev):
    """
    inputs,targets are both list of integers.
    hprev is Hx1 array of initial hidden state
    perform forward and backward pass
    returns the loss, gradients on model parameters, and last hidden state
    """
    xs, hs, os, ps = {}, {}, {}, {}
    hs[-1] = np.copy(hprev)
    loss = 0

    # forward pass: compute loss going forward
    for t in range(len(inputs)): # looping for t timesteps, which is the size of the length of inputs
        xs[t] = np.zeros((vocab_size,1)) # xs = one-hot encode in 1-of-k representation
        xs[t][inputs[t]] = 1
        hs[t] = np.tanh(np.dot(Wxh, xs[t]) + np.dot(Whh, hs[t-1]) + bh) # hs_t = tanh(W_hh.hs_t-1 + W_xh.xs_t + b_h) -> hidden state
        os[t] = np.dot(Why, hs[t]) + by # os = W_hy.hs_t + b_y -> unnormalized log probabilities for next chars
        ps[t] = np.exp(os[t]) / np.sum(np.exp(os[t])) # ps = softmax(os) -> probabilities for next chars
        loss += -np.log(ps[t][targets[t],0]) # cross-entropy loss

    # backward pass: compute gradients going backwards
    dWxh, dWhh, dWhy = np.zeros_like(Wxh), np.zeros_like(Whh), np.zeros_like(Why) # create numpy arrays of the right size for the weight gradients
    dbh, dby = np.zeros_like(bh), np.zeros_like(by) # create numpy arrays of the right size for the bias gradients
    dhnext = np.zeros_like(hs[0]) # gradient flowing back from step t+1; all zeros at the last time step
    for t in reversed(range(len(inputs))):
        dy = np.copy(ps[t])
        dy[targets[t]] -= 1 # backprop into y by taking gradient for softmax (http://cs231n.github.io/neural-networks-case-study/#grad)
        dWhy += np.dot(dy, hs[t].T) # gradient for Why
        dby += dy # gradient for by
        dh = np.dot(Why.T, dy) + dhnext # backprop into h
        dhraw = (1 - hs[t] * hs[t]) * dh # backprop through tanh nonlinearity
        dbh += dhraw # gradient for bh
        dWxh += np.dot(dhraw, xs[t].T) # gradient for Wxh
        dWhh += np.dot(dhraw, hs[t-1].T) # gradient for Whh
        dhnext = np.dot(Whh.T, dhraw) # gradient passed back to h_t-1 for the next iteration
    for dparam in [dWxh, dWhh, dWhy, dbh, dby]:
        np.clip(dparam, -5, 5, out=dparam) # clip gradients to mitigate exploding gradients
    return loss, dWxh, dWhh, dWhy, dbh, dby, hs[len(inputs)-1]
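The softmax in `lossFun` exponentiates the raw logits directly, which works here because the logits stay small. A common variant, sketched below as an assumption and not part of the original code, subtracts the maximum logit before exponentiating; the result is mathematically identical but does not overflow for large logits. The name `stable_softmax` is hypothetical.

```python
import numpy as np

def stable_softmax(o):
    """Softmax over a column vector, shifted by its maximum to avoid overflow in exp."""
    shifted = o - np.max(o)
    e = np.exp(shifted)
    return e / np.sum(e)

o_small = np.array([[1.0], [2.0], [3.0]])
o_large = o_small + 1000.0        # exponentiating these values directly would overflow
p_small = stable_softmax(o_small)
p_large = stable_softmax(o_large)
print(np.allclose(p_small, p_large))
```

Adding a constant to every logit leaves the softmax unchanged, which is why both inputs give the same probabilities.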
def sample(h, seed_ix, n):
    """
    sample a sequence of integers from the model
    h is memory state, seed_ix is seed letter for first time step
    predicts probabilities for each character
    returns the list of indices sampled from the predicted probabilities
    """
    # at test-time sample characters one at a time, feed back to model for next character prediction
    x = np.zeros((vocab_size, 1))
    x[seed_ix] = 1 # x = one-hot encode the input for seed_ix letter in 1-of-k representation
    ixes = []
    for t in range(n):
        # predicting the next character
        h = np.tanh(np.dot(Wxh, x) + np.dot(Whh, h) + bh) # h_t = tanh(W_hh.h_t-1 + W_xh.x_t + b_h) -> hidden state
        y = np.dot(Why, h) + by # y = W_hy.h_t + b_y -> unnormalized log probabilities for next chars
        p = np.exp(y) / np.sum(np.exp(y)) # p = softmax(y) -> probabilities for next chars
        ix = np.random.choice(range(vocab_size), p=p.ravel()) # draw the next index at random according to p (p.ravel() flattens p to 1-D)
        x = np.zeros((vocab_size, 1))
        x[ix] = 1 # one-hot encode the sampled index in 1-of-k representation as the next input
        ixes.append(ix)
    return ixes # return all the indices to convert them into characters and print the predictions
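The difference between drawing an index from the predicted distribution, as `sample` does, and always taking the most likely index can be sketched as follows. The probabilities, the seed, and the names `demo_p`, `greedy_ix`, `sampled` and `freq` are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
demo_p = np.array([[0.1], [0.6], [0.3]])   # made-up probabilities over a 3-character vocabulary

greedy_ix = int(np.argmax(demo_p))         # always picks the most likely index
sampled = rng.choice(len(demo_p), size=1000, p=demo_p.ravel())  # draws indices in proportion to demo_p
freq = np.bincount(sampled, minlength=len(demo_p)) / 1000

print(greedy_ix)
print(freq)
```

Sampling keeps some variety in the generated text, whereas the greedy choice would produce the same continuation every time for a given seed and hidden state.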
# p-data pointer, n-iteration counter
n, p = 0, 0 # setting both to zero in the beginning

# memory variables for Adagrad, initialized to all zeros
mWxh, mWhh, mWhy = np.zeros_like(Wxh), np.zeros_like(Whh), np.zeros_like(Why)
mbh, mby = np.zeros_like(bh), np.zeros_like(by)

# loss at iteration 0
smooth_loss = -np.log(1.0/vocab_size)*seq_length

# while True:
# running for 80000 iterations (one sequence of seq_length characters per iteration)
for i in range(80000):
    # Data pre-processing to prepare inputs and targets
    if p+seq_length+1 >= len(data) or n == 0: # sweeping from left to right in steps seq_length=25 long
        hprev = np.zeros((hidden_size,1)) # reset RNN memory
        p = 0 # go from start of data
    inputs = [char_to_ix[ch] for ch in data[p:p+seq_length]] # inputs are the tokens of a sequence of length seq_length=25
    targets = [char_to_ix[ch] for ch in data[p+1:p+seq_length+1]] # targets are the tokens of the subsequent characters for each input sequence

    # Model testing
    if n % 1000 == 0:
        sample_ix = sample(hprev, inputs[0], 200) # sample 200 characters from the model every 1000 iterations
        txt = ''.join(ix_to_char[ix] for ix in sample_ix) # convert tokens into characters and join them into a string
        print('----\n %s \n----' % (txt, )) # print model predictions

    # Model training
    loss, dWxh, dWhh, dWhy, dbh, dby, hprev = lossFun(inputs, targets, hprev) # forward seq_length characters through the net and fetch gradient
    smooth_loss = smooth_loss * 0.999 + loss * 0.001 # exponential moving average of the per-sequence loss
    if n % 1000 == 0: print('iter %d, loss: %f' % (n, smooth_loss)) # print progress

    # parameter update with Adagrad
    for param, dparam, mem in zip([Wxh, Whh, Why, bh, by],
                                  [dWxh, dWhh, dWhy, dbh, dby],
                                  [mWxh, mWhh, mWhy, mbh, mby]):
        mem += dparam * dparam
        param += -learning_rate * dparam / np.sqrt(mem + 1e-8) # adagrad parameter update

    p += seq_length # move data pointer
    n += 1 # iteration counter
----
0yhE4P39uyheem T2lv)gkr2b 2h)2 Pcye]C)uea,mh2b(cd(Ct3qgnP[mibm5wi.(fenaw.8e7b5r5li1P2l]ctEu3i1h( Tt8EdPrE12107Tvwd[)((fPk1]g )om[n 3t37prpl5qg8h 3d)tbgdkt3)e ]PP9Pm3u7ar9[[Cboug7q4v11ss)eg45P8uEp0geqk
----
iter 0, loss: 94.030002
----
hios isntwd antwees, (4,297m seth aed 2.8 (1,.8 ind ang, kmiund celon thes, anlsn and.leng cos ind om rar aad lade aed, Teea nin ite miunlinesce and mThe mannd cind be ong cot, andverrl (3E(1,29t8 e n
----
iter 1000, loss: 67.914614
----
hios is and in ast andaleng the ted aridge orth of the islane lest staa nt rrted or (4219t t r of owe ist or the onof thest ty anda nd as krheEnof 89 br is aid untat, artais, (18.[29 Theed leng phes
----
iter 2000, loss: 37.206784
----
hios island and Epos (1,25uth fovering ff moestaiss, Peling and. (31898 m (3,295 m (3,898 ft)), and. The cende or moftts a ridge of mountaits won (4255 th, and 29 km (18 mi).[2] The sorrennd is ast b
----
iter 3000, loss: 18.941566
----
hios island is crescent or (4,.289 The terrain east are witust, the long fromin the irleng covernd is st, wid in the is knd(4,298 fthe orlan ain ridge of movnnous mgestaiseng the ling and wunt is divi
----
iter 4000, loss: 14.128117
----
hios island is crescent or kidney shaped, 50 km (325.210 sq mi).[2] The terrain is mountainounta f Peleng sheperain is mof Pean ist, aro sides wituasesislaid (1 theg area or kndeseas, are wor kinland
----
iter 5000, loss: 6.929551
----
hios island is cred, 50 km (31 mT) long the length of the island. The center of the island is divided between east and. kndad aris ledeated in the norta of the islath wist by a range of smaller peaks,
----
iter 6000, loss: 4.229557
----
hios islaing and west by a range of sma dib and between east and west ty length of the innd s, The center of the island is divided between east and west by a f west by a range of smaller peaks, known
----
iter 7000, loss: 2.343361
----
hios island is and Efom s, (m,.2k ft)) and ftresnd rertn is th, and 29 km (1, mi) at itn enof mountaend w9 km (18 m (18 mi) at its wnorth of the island. The two largest of these mountains, Pelineon (1
----
iter 8000, loss: 1.702504
----
hios islann or kidney shaped, 50 km (31 mi) long from north to south, and 29 km (18 mi) at its widest, covering an area of 842.289 km2 (325.210 sq moustlis the island. The center of the istcerrain is
----
iter 9000, loss: 1.119716
----
hios islan eisland is divided between east and west by a range of smaller peaks, known as Pr The center of the of the island. The two laets an nd, mi).[2] The terrinna (m,99 mi) at its widest, coverin
----
iter 10000, loss: 0.826397
----
hios island is crescent of the island. The two largest of these mountains, Pelineon and Epos (1,188 m (3,898 ft)), anda frtween enouth rof two (4,255 ft)) and Efte rorth of the cent or kidney c Perin
----
iter 11000, loss: 0.664560
----
hisos known as2 28a of the island Epos (1,297 m (4,255 ft)) and Epos (1,188 m (3,898 ft)), are side ridge of mountains runnon mi) 8t bt a d, 50 km (31 mi) long from north to south, and 29 km (18 mi) a
----
iter 12000, loss: 0.565176
----
hion ino mi) at island. The center of the island is divided between east and west by a range of smaller peaks, known as Pr douthe con getuth te south, and 29 km (18 mi) at its widest, covering an arra
----
iter 13000, loss: 0.497673
----
hios island is crescent of 842.289 km2 (325.210 sq mi).[2] The mountains, Pelineon (1,297 m (4,255 ft)) and Epos (1,188 m (3,898 ft)), are situated in the north of the island. The center of the islan
----
iter 14000, loss: 0.447981
----
hios island is crescent or kndeEpraino the island. The two largest of these mountains, Pelineon (1,297 m (4,255 ft)) and Epos (1,188 m (3,898 ft)), are situated in the north of the island. The two lar
----
iter 15000, loss: 0.409268
----
hios island is crescent or kidney shaped, 50 km (31 mi) long feof sorth of the island. The two largest of these mountains, Pelineon (1,297 m (4,255 ft)) and Epos (1,188 m (3,898 ft)), are situated in
----
iter 16000, loss: 0.377955
----
hios island is crescent or kidney shaped, 50 km (31 mi) long from north to south, and 29 km (18 mi) at its widest, covering an area of 842.289 km2 (325.210 sq mi).[2] The terrain is mountainous and ar
----
iter 17000, loss: 0.351873
----
hios island is crescent or kidney shaped, 50 km (31 mi) long from north to south, and 29 km (18 mi) at its widest, covering an area island island. The two largest of these mountains, Pelineon (1,297 m
----
iter 18000, loss: 0.329921
----
hios island is crescent or kidney shaped, 50 km (31 mi) long from north to south, and 29 km (18 mi) at its widest, covering an area of 842.289 km2 (325.210 sq mi).[2] The terrain is mountainous and ar
----
iter 19000, loss: 0.310971
----
hios island is crescent or kidney shaped, 50 km (31 mi) long f of touthe center of the island is divided between east and west by a range or kinleresh ft of thereathe ente covering an area of 842.289
----
iter 20000, loss: 0.294932
----
hios i(4,.289 km (18 mi) at its widest, covering an area of 842.289 km2 (325.210 sq mi).[2] The terrain is mountainous and arreng an area of 842.289 km2 (325. 10 sq mi).[2] The terrain is mountainous
----
iter 21000, loss: 0.280158
----
hios island is crescent or kidney shaped, 50 km (31 mi) long the length of the island. The center of the island is di) ling from north to south, and 29 km (18 mi) at its widest, covering an area of 84
----
iter 22000, loss: 0.267610
----
hios island is crescent or kidney shaped, 50 km (31 mi) long from north to south, and 29 km (18 mi) at its widest, covering an area of 8425. m4,.[2] The terrain is mountainous and arid, with a ridge o
----
iter 23000, loss: 0.256335
----
hios island is crescent or kidney shaped, kmountains runteins, Pelineon (1,297 m (4,255 ft)) and Eposh at ape lang the length of the island. The center of the island is divided between tast and west b
----
iter 24000, loss: 0.246315
----
hios island is crescent or kidney shaped, 50 km (31 mi) lonn ridge of mountains running the length of the island. The twitunon shunt is ftthe conlerarea of 842.289 km2 (325.210 sq mi).[2] The terrain
----
iter 25000, loss: 0.237299
----
hios island is crescent or kidney shaped, .42.289 km2e ft by a range of smaller peaks, known as Pr ind of mountains running the length of the island. The two largest of these mountains, Pelineon (1,29
----
iter 26000, loss: 0.229051
----
hios ithe island. The two largest of these mountains, Pelineon (1,297 m (4,255 ft)) and Epos (1,188 m (3,898 ft)), are situated in the north of the island. The two laridge of mountains running the len
----
iter 27000, loss: 0.221695
----
hios island is crescent or kidney shaped, 50 km (31 mi) long from north to south, and 29 km (18 mi) at its widest, covering an area of 842.289 km2 (325.210 sq mi).[2] The terrain is mountainous and ar
----
iter 28000, loss: 0.214969
----
hios island is Pe inland is divided between east and west by a range of smaller peaks, known as Prlanin mist its widest, covering an area of 842.289 km2 (325.210 sq mi).[2] The terrain is mountainous
----
iter 29000, loss: 0.208756
----
hios island is crescent or kidney shaped, 50 km (31 mi) long from north to south, and 29 km (18 mi) at its widest, covering an area of 842.289 km2 (325.210 sq mi).[2] the om (4,255 ft)) and Epos (m,18
----
iter 30000, loss: 0.203045
----
hios island is crescent or The co1[t its wist by a range of smaller peaks, known as Prinnd sfthe island. The two largest of these mountains, Perineon (1,297 m slano the island. The two largest of thes
----
iter 31000, loss: 0.197785
----
hios island is crescent or kidney shaped, 50 km (31 mi) long from north to south, and 29 km (18 mi) at its widest, covering an area of 842.289 km2 (325.210 sq mi).[2] The terrain is mountainous and ar
----
iter 32000, loss: 0.192925
----
hios island is crescent of the island. The center of the island is divided between east and west by a o 25 long frof Pelineon (1,297 m (4,255 ft)) and Epos (1,188 m (3,898 ft)), are situated in the no
----
iter 33000, loss: 0.188420
----
hios island is creslent or kidney shaped, 50 km (31 mi) long from north to south, and 29 km (18 mi) at its widest, covering an area of 842.289 km2 (325.210 sq mi).[2] The terrain is mountainous and ar
----
iter 34000, loss: 0.184230
----
hios island is crescent or kidney shaped, 50 km (31 mi) long from north to south, and 29 km (18 mi) at its widest, covering an area of 842.289 km2 (325.210 sq mi).[2] The terrain is mountainous and ar
----
iter 35000, loss: 0.180323
----
hios island is Perth of the island. The center of the island is dividwd a of t)), are situated in the north of the island. The center of the island is divided between east and west by a range of small
----
iter 36000, loss: 0.176668
----
hios island is crescent or kidney shaped, 50 km (31 mi) long from north to south, and 29 km (18 mi) at its widest, covering an area of 842.289 km2 (325.210 sq mi).[2] The terrain is mountainous and ar
----
iter 37000, loss: 0.173240
----
hios island is crescent or kidney shaped, 50 km (31 mi) long from north to south, and 29 km (18 mi) at its widest, covering an area of 842.289 km2 (325.210 sq mi).[2] The terrain is mountainous and ar
----
iter 38000, loss: 0.170019
----
hios island is crescent or kidney shaped, 50 km (31 mi) long from north to south, and 29 km (18 mi) at its widest, covering an area of 842.589 km2 (325.210 sq mi).[2] The terrain is mountainous and ar
----
iter 39000, loss: 0.166985
----
hios island is crescent or kidney shaped, 50 km (31 mi) long from north to south, and 29 km (18 mi) at its widest, covering an area of 842.289 km2 (325.210 sq mi).[2] The terrain is mountainous and ar
----
iter 40000, loss: 0.164122
----
hios island isl rfathese ft)), are situated in the north of the island. The center of the island is divided between east and west by a range of smaller peaks, known as Pr Toe cown is,crescent os (1,29
----
iter 41000, loss: 0.161418
----
hios island is crescent or kidney shaped, 50 km (31 mi) long from north to south, and 29 km (18 mi) at its widest, covering an area of 842.289 km2 (325.210 sq mi).[2] The terrain is mountainous and ar
----
iter 42000, loss: 0.158860
----
hios island is crescent or kidney shaped, 50 km (31 mi) long from north to south, and 29 km (18 mi) at its widest, covering an area of 842.289 km2 (325.210 sq mi).[2] The terrain is mountainous and ar
----
iter 43000, loss: 0.156438
----
hios island is crescent or kidney shaped, 50 km (31 mi) long from north to south, and 29 km (18 mi) at its widest, covering an area of 842.289 km2 (325.210 sq mi).[2] The terea north row area nouth ro
----
iter 44000, loss: 0.154141
----
hios island is crescent or kidney shaped, 50 km (31 mi) long frof Pelineon (1,297 m (4,255 ft)) and Epos (1,188 m (3,898 ft)), are situated in the north of the island. The two largest iy rea indaind.
----
iter 45000, loss: 0.151962
----
hios island is crescent or kidney shaped, 50 km (31 mi) long from north to south, and 29 km (18 mi) at its widest, covering an area risland. The two largest of these mountains running the length of th
----
iter 46000, loss: 0.149890
----
hios island is crescent or kidney shaped, 50 km (31 mi) long from north to south, and 29 km (18 mi) at its widest, covering an area of 842.289 km2 (325.210 sq mi).[2] The terrain is mountainous and ar
----
iter 47000, loss: 0.147919
----
hios island is crescent or kidney shaped, 50 km (31 mi) long from north to south, and 29 km (18 mi) at its widest, covering an area of 842.289 km2 (325.210 sq mi).[2] The terrain is mountainous and ar
----
iter 48000, loss: 0.146039
----
hios island is crescent or kidney shaped, 50 km (31 mi) long from north to south, and 29 km (18 mi) at its widest, covering an area of 842.289 km2 (325.210 sq mi).[2] The terrain is mountainous and ar
----
iter 49000, loss: 0.144243
----
hios island is crescent or kidney shaped, 50 km (31 mi) long from north to south, and 29 km (18 mi) at its widest, covering an area of 842.289 km2 (325.210 sq mi).[2] The terrain is mountainous and ar
----
iter 50000, loss: 0.142517
----
hios island is crescent or kidney shaped, 50 km (31 mi) long from north to south, and 29 km (18 mi) at its widest, covering an area of 842.289 km2 (325.210 sq mi).[2] The terrain is mountainous and ar
----
iter 51000, loss: 0.140829
---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
/workspaces/artificial_intelligence/artificial_intelligence/aiml-common/lectures/nlp/language-models/simple-rnn-language-model/index.ipynb Cell 17 line 2
     26 print('----\n %s \n----' % (txt, )) # print model predictions
     28 # Model training
---> 29 loss, dWxh, dWhh, dWhy, dbh, dby, hprev = lossFun(inputs, targets, hprev) # forward seq_length characters through the net and fetch gradient
     30 smooth_loss = smooth_loss * 0.999 + loss * 0.001 # RNN adds all the losses from the previously unrolled steps
     31 if n % 1000 == 0: print('iter %d, loss: %f' % (n, smooth_loss)) # print progress
/workspaces/artificial_intelligence/artificial_intelligence/aiml-common/lectures/nlp/language-models/simple-rnn-language-model/index.ipynb Cell 17 line 3
     34 dWxh += np.dot(dhraw, xs[t].T) # gradient for Wxh
     35 dWhh += np.dot(dhraw, hs[t-1].T) # gradient for Whh
--->
href='vscode-notebook-cell://dev-container%2B7b22686f737450617468223a222f686f6d652f70616e74656c69732e6d6f6e6f67696f756469732f6c6f63616c2f7765622f73697465732f636f75727365732f6172746966696369616c5f696e74656c6c6967656e6365222c226c6f63616c446f636b6572223a66616c73652c2273657474696e6773223a7b22686f7374223a22756e69783a2f2f2f7661722f72756e2f646f636b65722e736f636b227d2c22636f6e66696746696c65223a7b22246d6964223a312c22667350617468223a222f686f6d652f70616e74656c69732e6d6f6e6f67696f756469732f6c6f63616c2f7765622f73697465732f636f75727365732f6172746966696369616c5f696e74656c6c6967656e63652f2e646576636f6e7461696e65722f646576636f6e7461696e65722e6a736f6e222c2265787465726e616c223a2266696c653a2f2f2f686f6d652f70616e74656c69732e6d6f6e6f67696f756469732f6c6f63616c2f7765622f73697465732f636f75727365732f6172746966696369616c5f696e74656c6c6967656e63652f2e646576636f6e7461696e65722f646576636f6e7461696e65722e6a736f6e222c2270617468223a222f686f6d652f70616e74656c69732e6d6f6e6f67696f756469732f6c6f63616c2f7765622f73697465732f636f75727365732f6172746966696369616c5f696e74656c6c6967656e63652f2e646576636f6e7461696e65722f646576636f6e7461696e65722e6a736f6e222c22736368656d65223a2266696c65227d7d/workspaces/artificial_intelligence/artificial_intelligence/aiml-common/lectures/nlp/language-models/simple-rnn-language-model/index.ipynb#X22sdnNjb2RlLXJlbW90ZQ%3D%3D?line=35'>36</a> dhnext = np.dot(Whh.T, dhraw) # calculate h_t-1 for the next iteration <a 
href='vscode-notebook-cell://dev-container%2B7b22686f737450617468223a222f686f6d652f70616e74656c69732e6d6f6e6f67696f756469732f6c6f63616c2f7765622f73697465732f636f75727365732f6172746966696369616c5f696e74656c6c6967656e6365222c226c6f63616c446f636b6572223a66616c73652c2273657474696e6773223a7b22686f7374223a22756e69783a2f2f2f7661722f72756e2f646f636b65722e736f636b227d2c22636f6e66696746696c65223a7b22246d6964223a312c22667350617468223a222f686f6d652f70616e74656c69732e6d6f6e6f67696f756469732f6c6f63616c2f7765622f73697465732f636f75727365732f6172746966696369616c5f696e74656c6c6967656e63652f2e646576636f6e7461696e65722f646576636f6e7461696e65722e6a736f6e222c2265787465726e616c223a2266696c653a2f2f2f686f6d652f70616e74656c69732e6d6f6e6f67696f756469732f6c6f63616c2f7765622f73697465732f636f75727365732f6172746966696369616c5f696e74656c6c6967656e63652f2e646576636f6e7461696e65722f646576636f6e7461696e65722e6a736f6e222c2270617468223a222f686f6d652f70616e74656c69732e6d6f6e6f67696f756469732f6c6f63616c2f7765622f73697465732f636f75727365732f6172746966696369616c5f696e74656c6c6967656e63652f2e646576636f6e7461696e65722f646576636f6e7461696e65722e6a736f6e222c22736368656d65223a2266696c65227d7d/workspaces/artificial_intelligence/artificial_intelligence/aiml-common/lectures/nlp/language-models/simple-rnn-language-model/index.ipynb#X22sdnNjb2RlLXJlbW90ZQ%3D%3D?line=36'>37</a> for dparam in [dWxh, dWhh, dWhy, dbh, dby]: <a 
href='vscode-notebook-cell://dev-container%2B7b22686f737450617468223a222f686f6d652f70616e74656c69732e6d6f6e6f67696f756469732f6c6f63616c2f7765622f73697465732f636f75727365732f6172746966696369616c5f696e74656c6c6967656e6365222c226c6f63616c446f636b6572223a66616c73652c2273657474696e6773223a7b22686f7374223a22756e69783a2f2f2f7661722f72756e2f646f636b65722e736f636b227d2c22636f6e66696746696c65223a7b22246d6964223a312c22667350617468223a222f686f6d652f70616e74656c69732e6d6f6e6f67696f756469732f6c6f63616c2f7765622f73697465732f636f75727365732f6172746966696369616c5f696e74656c6c6967656e63652f2e646576636f6e7461696e65722f646576636f6e7461696e65722e6a736f6e222c2265787465726e616c223a2266696c653a2f2f2f686f6d652f70616e74656c69732e6d6f6e6f67696f756469732f6c6f63616c2f7765622f73697465732f636f75727365732f6172746966696369616c5f696e74656c6c6967656e63652f2e646576636f6e7461696e65722f646576636f6e7461696e65722e6a736f6e222c2270617468223a222f686f6d652f70616e74656c69732e6d6f6e6f67696f756469732f6c6f63616c2f7765622f73697465732f636f75727365732f6172746966696369616c5f696e74656c6c6967656e63652f2e646576636f6e7461696e65722f646576636f6e7461696e65722e6a736f6e222c22736368656d65223a2266696c65227d7d/workspaces/artificial_intelligence/artificial_intelligence/aiml-common/lectures/nlp/language-models/simple-rnn-language-model/index.ipynb#X22sdnNjb2RlLXJlbW90ZQ%3D%3D?line=37'>38</a> np.clip(dparam, -5, 5, out=dparam) # clip gradients to mitigate exploding gradients File <__array_function__ internals>:180, in dot(*args, **kwargs) KeyboardInterrupt:
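The traceback above was produced when training was interrupted by hand during the backward pass. Two bookkeeping steps are visible in it: in-place gradient clipping with `np.clip(..., out=...)` and the exponentially smoothed loss. The following is a minimal sketch of those two steps in isolation, assuming made-up gradient arrays and loss values; the helper names `clip_gradients` and `update_smooth_loss` are hypothetical and do not appear in the original notebook.

```python
import numpy as np

def clip_gradients(grads, limit=5.0):
    """Clip every gradient array in place to the range [-limit, limit]."""
    for g in grads:
        # out=g writes the clipped values back into the same array
        np.clip(g, -limit, limit, out=g)
    return grads

def update_smooth_loss(smooth_loss, loss, decay=0.999):
    """Exponential moving average of the loss (decay 0.999, as in the loop above)."""
    return smooth_loss * decay + loss * (1.0 - decay)

# Made-up gradients, for illustration only
dWxh = np.array([[10.0, -0.5], [3.0, -7.0]])
dbh = np.array([[6.0], [-6.0]])
clip_gradients([dWxh, dbh])

# Made-up loss values, for illustration only
smooth = update_smooth_loss(100.0, 50.0)
```

Because the clipping is done in place, the arrays passed in are modified directly, which is why the training loop can use them in the parameter update without reassigning them. The smoothed loss moves only 0.1% of the way toward each new loss value, so the printed number changes slowly even when individual losses are noisy.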