word2vec is not a single algorithm. Rather, it is a family of model architectures and optimizations that can be used to learn word embeddings from large datasets. Embeddings learned through word2vec have proven successful on a variety of downstream natural language processing tasks.
Note: This tutorial is based on Efficient estimation of word representations in vector space and Distributed representations of words and phrases and their compositionality. It is not an exact implementation of the papers. Rather, it is intended to illustrate the key ideas.
These papers proposed two methods for learning representations of words:
  • Continuous bag-of-words model: predicts the middle word based on surrounding context words. The context consists of a few words before and after the current (middle) word. This architecture is called a bag-of-words model as the order of words in the context is not important.
  • Continuous skip-gram model: predicts words within a certain range before and after the current word in the same sentence. A worked example of this is given below.
You’ll use the skip-gram approach in this tutorial. First, you’ll explore skip-grams and other concepts using a single sentence for illustration. Next, you’ll train your own word2vec model on a small dataset. This tutorial also contains code to export the trained embeddings and visualize them in the TensorFlow Embedding Projector.
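To make the difference between the two methods concrete, the following plain-Python sketch builds one training example of each kind for a single position in a toy sentence. The helper names `cbow_example` and `skipgram_examples` and the `window` parameter are illustrative assumptions, not part of the papers or of TensorFlow.

```python
def cbow_example(tokens, i, window):
    """CBOW: the surrounding context words are the input, the middle word is the label."""
    context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
    return context, tokens[i]


def skipgram_examples(tokens, i, window):
    """Skip-gram: the current word is the input, each nearby word is a separate label."""
    context, current = cbow_example(tokens, i, window)
    return [(current, neighbor) for neighbor in context]


tokens = "the wide road shimmered in the hot sun".split()

context, middle = cbow_example(tokens, i=3, window=2)
print(context, "->", middle)

pairs = skipgram_examples(tokens, i=3, window=2)
print(pairs)
```

CBOW produces one example per position (many words in, one word out), while skip-gram produces one example per (current word, neighbor) pair.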

Skip-gram and negative sampling

While a bag-of-words model predicts a word given the neighboring context, a skip-gram model predicts the context (or neighbors) of a word, given the word itself. The model is trained on skip-grams, which are n-grams that allow tokens to be skipped (see the diagram below for an example). The context of a word can be represented through a set of skip-gram pairs of (target_word, context_word) where context_word appears in the neighboring context of target_word. Consider the following sentence of eight words:
The wide road shimmered in the hot sun.
The context words for each of the 8 words of this sentence are defined by a window size. The window size determines the span of words on either side of a target_word that can be considered a context word. Below is a table of skip-grams for target words based on different window sizes.
Note: For this tutorial, a window size of n implies n words on each side with a total window span of 2*n+1 words across a word.
[Figure: word2vec_skipgrams]
The training objective of the skip-gram model is to maximize the probability of predicting context words given the target word. For a sequence of words w_1, w_2, …, w_T, the objective can be written as the average log probability
$$\frac{1}{T} \sum_{t=1}^{T} \sum_{-c \le j \le c,\ j \ne 0} \log p(w_{t+j} \mid w_t)$$
where c is the size of the training context. The basic skip-gram formulation defines this probability using the softmax function. For an input (target) word w_I and an output (context) word w_O:
$$p(w_O \mid w_I) = \frac{\exp\left({v'_{w_O}}^{\top} v_{w_I}\right)}{\sum_{w=1}^{W} \exp\left({v'_{w}}^{\top} v_{w_I}\right)}$$
where v and v' are target and context vector representations of words and W is the vocabulary size.
Computing the denominator of this formulation involves a sum over the entire vocabulary, which is often large (10⁵ to 10⁷ terms).
The noise contrastive estimation (NCE) loss function is an efficient approximation for a full softmax. With an objective to learn word embeddings instead of modeling the word distribution, the NCE loss can be simplified to use negative sampling.
The simplified negative sampling objective for a target word is to distinguish the context word from num_ns negative samples drawn from a noise distribution Pₙ(w) over words. More precisely, an efficient approximation of the full softmax over the vocabulary is, for a skip-gram pair, to pose the loss for a target word as a classification problem between the context word and num_ns negative samples.
A negative sample is defined as a (target_word, context_word) pair such that the context_word does not appear in the window_size neighborhood of the target_word. For the example sentence, these are a few potential negative samples (when window_size is 2):
(hot, shimmered)
(wide, hot)
(wide, sun)
In the next section, you’ll generate skip-grams and negative samples for a single sentence. You’ll also learn about subsampling techniques and train a classification model for positive and negative training examples later in the tutorial.
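The difference in cost between the full softmax and the negative sampling objective described above can be sketched numerically. The following plain-Python example is a minimal illustration with made-up two-dimensional vectors; the function names, the toy vectors, and the four-word vocabulary are assumptions for illustration only.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)).
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))


def full_softmax_log_prob(target_vec, context_vecs, context_index):
    """log p(context | target): the normalizer sums over every vocabulary word."""
    scores = [dot(c, target_vec) for c in context_vecs]
    max_score = max(scores)
    log_norm = max_score + math.log(sum(math.exp(s - max_score) for s in scores))
    return scores[context_index] - log_norm


def negative_sampling_loss(target_vec, positive_vec, negative_vecs):
    """Binary classification loss: one true context word vs. sampled negatives."""
    loss = -log_sigmoid(dot(positive_vec, target_vec))
    for negative_vec in negative_vecs:
        loss -= log_sigmoid(-dot(negative_vec, target_vec))
    return loss


# Hypothetical vectors: one target vector and one context vector per vocabulary word.
target_vec = [1.0, 0.0]
context_vecs = [[2.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]

log_prob = full_softmax_log_prob(target_vec, context_vecs, context_index=0)
loss = negative_sampling_loss(target_vec, context_vecs[0], context_vecs[2:])
print(f"full softmax log-probability: {log_prob:.4f}")
print(f"negative sampling loss      : {loss:.4f}")
```

The softmax version touches all W context vectors for every training pair, whereas the negative sampling loss touches only 1 + num_ns of them, which is what makes it practical for large vocabularies.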

Setup

import io
import re
import string
import tqdm

import numpy as np

import tensorflow as tf
from tensorflow.keras import layers
# Load the TensorBoard notebook extension
%load_ext tensorboard
SEED = 42
AUTOTUNE = tf.data.AUTOTUNE

Vectorize an example sentence

Consider the following sentence:
The wide road shimmered in the hot sun.
Tokenize the sentence:
sentence = "The wide road shimmered in the hot sun"
tokens = list(sentence.lower().split())
print(len(tokens))
8
Create a vocabulary to save mappings from tokens to integer indices:
vocab, index = {}, 1  # start indexing from 1
vocab["<pad>"] = 0  # add a padding token
for token in tokens:
    if token not in vocab:
        vocab[token] = index
        index += 1
vocab_size = len(vocab)
print(vocab)
{'<pad>': 0, 'the': 1, 'wide': 2, 'road': 3, 'shimmered': 4, 'in': 5, 'hot': 6, 'sun': 7}
Create an inverse vocabulary to save mappings from integer indices to tokens:
inverse_vocab = {index: token for token, index in vocab.items()}
print(inverse_vocab)
{0: '<pad>', 1: 'the', 2: 'wide', 3: 'road', 4: 'shimmered', 5: 'in', 6: 'hot', 7: 'sun'}
Vectorize your sentence:
example_sequence = [vocab[word] for word in tokens]
print(example_sequence)
[1, 2, 3, 4, 5, 1, 6, 7]

Generate skip-grams from one sentence

The tf.keras.preprocessing.sequence module provides useful functions that simplify data preparation for word2vec. You can use tf.keras.preprocessing.sequence.skipgrams to generate skip-gram pairs from the example_sequence with a given window_size from tokens in the range [0, vocab_size).
Note: negative_samples is set to 0 here, as batching negative samples generated by this function requires a bit of code. You will use another function to perform negative sampling in the next section.
window_size = 2
positive_skip_grams, _ = tf.keras.preprocessing.sequence.skipgrams(
    example_sequence,
    vocabulary_size=vocab_size,
    window_size=window_size,
    negative_samples=0,
)
print(len(positive_skip_grams))
26
Print a few positive skip-grams:
for target, context in positive_skip_grams[:5]:
    print(f"({target}, {context}): ({inverse_vocab[target]}, {inverse_vocab[context]})")
(3, 4): (road, shimmered)
(5, 1): (in, the)
(2, 1): (wide, the)
(5, 3): (in, road)
(4, 2): (shimmered, wide)
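The count of 26 pairs can be checked by hand. The sketch below is a plain-Python enumeration of every (target, context) pair inside the window, not the Keras implementation: the name `positive_pairs` is an assumption, and unlike `skipgrams` it does not shuffle its output or treat index 0 specially.

```python
def positive_pairs(sequence, window_size):
    """Enumerate every (target, context) pair within `window_size` positions."""
    pairs = []
    for i, target in enumerate(sequence):
        start = max(0, i - window_size)
        end = min(len(sequence), i + window_size + 1)
        for j in range(start, end):
            if j != i:
                pairs.append((target, sequence[j]))
    return pairs


example_sequence = [1, 2, 3, 4, 5, 1, 6, 7]
pairs = positive_pairs(example_sequence, window_size=2)
print(len(pairs))  # 26

# The negative samples listed earlier, as index pairs:
# (hot, shimmered), (wide, hot), (wide, sun).
candidates = [(6, 4), (2, 6), (2, 7)]
print([pair in pairs for pair in candidates])  # [False, False, False]
```

Words in the middle of the sentence contribute four pairs each, while words near the edges contribute two or three, which gives 2 + 3 + 4 + 4 + 4 + 4 + 3 + 2 = 26.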

Negative sampling for one skip-gram

The skipgrams function returns all positive skip-gram pairs by sliding over a given window span. To produce additional skip-gram pairs that serve as negative samples for training, you need to sample random words from the vocabulary. Use the tf.random.log_uniform_candidate_sampler function to sample num_ns negative samples for a given target word in a window. You can call the function on one skip-gram's target word and pass the context word as the true class. Note that the candidates are drawn at random from the whole index range, so, as the sample output below shows, a candidate can still coincide with the true context word or with another word in the window.
Key point: num_ns (the number of negative samples per positive context word) in the [5, 20] range is shown to work best for smaller datasets, while num_ns in the [2, 5] range suffices for larger datasets.
# Get target and context words for one positive skip-gram.
target_word, context_word = positive_skip_grams[0]

# Set the number of negative samples per positive context.
num_ns = 4

context_class = tf.reshape(tf.constant(context_word, dtype="int64"), (1, 1))
negative_sampling_candidates, _, _ = tf.random.log_uniform_candidate_sampler(
    true_classes=context_class,  # class that should be sampled as 'positive'
    num_true=1,  # each positive skip-gram has 1 positive context class
    num_sampled=num_ns,  # number of negative context words to sample
    unique=True,  # all the negative samples should be unique
    range_max=vocab_size,  # pick index of the samples from [0, vocab_size)
    seed=SEED,  # seed for reproducibility
    name="negative_sampling",  # name of this operation
)
print(negative_sampling_candidates)
print([inverse_vocab[index.numpy()] for index in negative_sampling_candidates])
tf.Tensor([2 1 4 3], shape=(4,), dtype=int64)
['wide', 'the', 'shimmered', 'road']
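To see why low indices such as 1 and 2 show up so readily among the candidates, it helps to look at the sampler's base distribution. According to the TensorFlow documentation for the log-uniform candidate sampler, a class is drawn with probability (log(class + 2) - log(class + 1)) / log(range_max + 1). The sketch below evaluates that formula in plain Python; the function name `log_uniform_probability` is an assumption for illustration.

```python
import math


def log_uniform_probability(class_id, range_max):
    """Base probability of drawing `class_id` from [0, range_max)."""
    return (math.log(class_id + 2) - math.log(class_id + 1)) / math.log(range_max + 1)


vocab_size = 8
probs = [log_uniform_probability(k, vocab_size) for k in range(vocab_size)]
for class_id, prob in enumerate(probs):
    print(f"index {class_id}: {prob:.4f}")
```

The probabilities sum to 1 and decrease with the index, so when the vocabulary is sorted by descending frequency, more frequent words are proposed as negatives more often.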

Construct one training example

For a given positive (target_word, context_word) skip-gram, you now also have num_ns negative sampled context words. These are intended to be words outside the window size neighborhood of target_word, although random sampling does not strictly guarantee this. Batch the 1 positive context_word and num_ns negative context words into one tensor. This produces a set of positive skip-grams (labeled as 1) and negative samples (labeled as 0) for each target word.
# Reduce a dimension so you can use concatenation (in the next step).
squeezed_context_class = tf.squeeze(context_class, 1)

# Concatenate a positive context word with negative sampled words.
context = tf.concat([squeezed_context_class, negative_sampling_candidates], 0)

# Label the first context word as `1` (positive) followed by `num_ns` `0`s (negative).
label = tf.constant([1] + [0] * num_ns, dtype="int64")
target = target_word
Check out the context and the corresponding labels for the target word from the skip-gram example above:
print(f"target_index    : {target}")
print(f"target_word     : {inverse_vocab[target_word]}")
print(f"context_indices : {context}")
print(f"context_words   : {[inverse_vocab[c.numpy()] for c in context]}")
print(f"label           : {label}")
target_index    : 3
target_word     : road
context_indices : [4 2 1 4 3]
context_words   : ['shimmered', 'wide', 'the', 'shimmered', 'road']
label           : [1 0 0 0 0]
A tuple of (target, context, label) values constitutes one training example for training your skip-gram negative sampling word2vec model. Notice that the target is a single integer index (a scalar), while the context and label each have shape (1+num_ns,).
print("target  :", target)
print("context :", context)
print("label   :", label)
target  : 3
context : tf.Tensor([4 2 1 4 3], shape=(5,), dtype=int64)
label   : tf.Tensor([1 0 0 0 0], shape=(5,), dtype=int64)

Summary

This diagram summarizes the procedure of generating a training example from a sentence:
[Figure: word2vec_negative_sampling]
Notice that the words temperature and code are not part of the input sentence. They belong to the vocabulary, like certain other indices used in the diagram above.

Compile all steps into one function

Skip-gram sampling table

A large dataset means a larger vocabulary with a higher number of frequent words such as stopwords. Training examples obtained from sampling commonly occurring words (such as the, is, on) don't add much useful information for the model to learn from. Mikolov et al. suggest subsampling of frequent words as a helpful practice to improve embedding quality.
The tf.keras.preprocessing.sequence.skipgrams function accepts a sampling_table argument to encode probabilities of sampling any token. You can use tf.keras.preprocessing.sequence.make_sampling_table to generate a word-frequency-rank-based probabilistic sampling table and pass it to the skipgrams function. Inspect the sampling probabilities for a vocab_size of 10.
sampling_table = tf.keras.preprocessing.sequence.make_sampling_table(size=10)
print(sampling_table)
[0.00315225 0.00315225 0.00547597 0.00741556 0.00912817 0.01068435
 0.01212381 0.01347162 0.01474487 0.0159558 ]
sampling_table[i] denotes the probability of sampling the i-th most common word in a dataset. The function assumes that word frequencies follow a Zipf distribution.
Key point: The tf.random.log_uniform_candidate_sampler already assumes that the vocabulary frequency follows a log-uniform (Zipf's) distribution. Using this distribution-weighted sampling also helps approximate the Noise Contrastive Estimation (NCE) loss with simpler loss functions for training a negative sampling objective.
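The printed table can be reproduced with a few lines of plain Python. The sketch below is a reconstruction, not the library source: it assumes a Zipf-style estimate of each rank's inverse frequency and a default sampling factor of 1e-5, and the name `sampling_table_sketch` is hypothetical. Under those assumptions it matches the values shown above.

```python
import math


def sampling_table_sketch(size, sampling_factor=1e-5):
    """Rank-based keep probabilities; frequent (low-rank) words get lower values."""
    gamma = 0.577  # approximation of the Euler-Mascheroni constant
    table = []
    for index in range(size):
        rank = max(index, 1)  # index 0 is treated like rank 1
        # Zipf-style approximation of 1 / (relative frequency of this rank).
        inverse_frequency = rank * (math.log(rank) + gamma) + 0.5 - 1.0 / (12.0 * rank)
        f = sampling_factor * inverse_frequency
        table.append(min(1.0, f / math.sqrt(f)))
    return table


table = sampling_table_sketch(10)
print([round(p, 8) for p in table])
```

Because the value grows with rank, the most common words are the least likely to be kept as skip-gram targets, which is the subsampling effect described above.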

Generate training data

Compile all the steps described above into a function that can be called on a list of vectorized sentences obtained from any text dataset. Notice that the sampling table is built before sampling skip-gram word pairs. You will use this function in the later sections.
# Generates skip-gram pairs with negative sampling for a list of sequences
# (int-encoded sentences) based on window size, number of negative samples
# and vocabulary size.
def generate_training_data(sequences, window_size, num_ns, vocab_size, seed):
    # Elements of each training example are appended to these lists.
    targets, contexts, labels = [], [], []

    # Build the sampling table for `vocab_size` tokens.
    sampling_table = tf.keras.preprocessing.sequence.make_sampling_table(vocab_size)

    # Iterate over all sequences (sentences) in the dataset.
    for sequence in tqdm.tqdm(sequences):
        # Generate positive skip-gram pairs for a sequence (sentence).
        positive_skip_grams, _ = tf.keras.preprocessing.sequence.skipgrams(
            sequence,
            vocabulary_size=vocab_size,
            sampling_table=sampling_table,
            window_size=window_size,
            negative_samples=0,
        )

        # Iterate over each positive skip-gram pair to produce training examples
        # with a positive context word and negative samples.
        for target_word, context_word in positive_skip_grams:
            context_class = tf.expand_dims(
                tf.constant([context_word], dtype="int64"), 1
            )
            negative_sampling_candidates, _, _ = (
                tf.random.log_uniform_candidate_sampler(
                    true_classes=context_class,
                    num_true=1,
                    num_sampled=num_ns,
                    unique=True,
                    range_max=vocab_size,
                    seed=seed,
                    name="negative_sampling",
                )
            )

            # Build context and label vectors (for one target word)
            context = tf.concat(
                [tf.squeeze(context_class, 1), negative_sampling_candidates], 0
            )
            label = tf.constant([1] + [0] * num_ns, dtype="int64")

            # Append each element from the training example to global lists.
            targets.append(target_word)
            contexts.append(context)
            labels.append(label)

    return targets, contexts, labels

Prepare training data for word2vec

With an understanding of how to work with one sentence for a skip-gram negative sampling based word2vec model, you can proceed to generate training examples from a larger list of sentences!

Download text corpus

You will use a text file of Shakespeare’s writing for this tutorial. Change the following line to run this code on your own data.
path_to_file = tf.keras.utils.get_file(
    "shakespeare.txt",
    "https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt",
)
Read the text from the file and print the first few lines:
with open(path_to_file) as f:
    lines = f.read().splitlines()
for line in lines[:20]:
    print(line)
First Citizen:
Before we proceed any further, hear me speak.

All:
Speak, speak.

First Citizen:
You are all resolved rather to die than to famish?

All:
Resolved. resolved.

First Citizen:
First, you know Caius Marcius is chief enemy to the people.

All:
We know't, we know't.

First Citizen:
Let us kill him, and we'll have corn at our own price.
Use the non-empty lines to construct a tf.data.TextLineDataset object for the next steps:
text_ds = tf.data.TextLineDataset(path_to_file).filter(
    lambda x: tf.cast(tf.strings.length(x), bool)
)

Vectorize sentences from the corpus

You can use the TextVectorization layer to vectorize sentences from the corpus. Learn more about using this layer in this Text classification tutorial. Notice from the first few sentences above that the text needs to be in one case and punctuation needs to be removed. To do this, define a custom_standardization function that can be used in the TextVectorization layer.
# Now, create a custom standardization function to lowercase the text and
# remove punctuation.
def custom_standardization(input_data):
    lowercase = tf.strings.lower(input_data)
    return tf.strings.regex_replace(
        lowercase, "[%s]" % re.escape(string.punctuation), ""
    )


# Define the vocabulary size and the number of words in a sequence.
vocab_size = 4096
sequence_length = 10

# Use the `TextVectorization` layer to normalize, split, and map strings to
# integers. Set `output_sequence_length` to pad or truncate all samples to the
# same length.
vectorize_layer = layers.TextVectorization(
    standardize=custom_standardization,
    max_tokens=vocab_size,
    output_mode="int",
    output_sequence_length=sequence_length,
)
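To preview what this standardization does to a line of the corpus without running TensorFlow, the same lowercase-then-strip-punctuation logic can be sketched with Python's standard library. The function name `standardize` is an assumption, and the sketch assumes that Python's `re` module and TensorFlow's regex engine treat this simple character class the same way.

```python
import re
import string


def standardize(text):
    """Lowercase the text and delete every ASCII punctuation character."""
    lowercase = text.lower()
    return re.sub("[%s]" % re.escape(string.punctuation), "", lowercase)


print(standardize("We know't, we know't."))  # we knowt we knowt
print(standardize("First Citizen:"))  # first citizen
```

Note that apostrophes are removed rather than replaced by a space, so a contraction such as know't becomes the single token knowt.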
Call TextVectorization.adapt on the text dataset to create the vocabulary.
vectorize_layer.adapt(text_ds.batch(1024))
Once the state of the layer has been adapted to represent the text corpus, the vocabulary can be accessed with TextVectorization.get_vocabulary. This function returns a list of all vocabulary tokens sorted (descending) by their frequency.
# Save the created vocabulary for reference.
inverse_vocab = vectorize_layer.get_vocabulary()
print(inverse_vocab[:20])
['', '[UNK]', 'the', 'and', 'to', 'i', 'of', 'you', 'my', 'a', 'that', 'in', 'is', 'not', 'for', 'with', 'me', 'it', 'be', 'your']
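The structure of this list (a padding token at index 0, an out-of-vocabulary token at index 1, then tokens by descending frequency) can be imitated with a `Counter`. This is a simplified sketch rather than the layer's implementation: the names `build_vocabulary` and `lookup` are assumptions, the input is assumed to be already standardized, and tie-breaking between equally frequent tokens may differ from TextVectorization.

```python
from collections import Counter


def build_vocabulary(lines, max_tokens):
    """Return ['', '[UNK]', ...] followed by tokens sorted by descending count."""
    counts = Counter(token for line in lines for token in line.split())
    # Two slots are reserved: index 0 for padding and index 1 for unknown tokens.
    most_common = [token for token, _ in counts.most_common(max_tokens - 2)]
    return ["", "[UNK]"] + most_common


def lookup(vocabulary, token):
    """Map a token to its index, falling back to the '[UNK]' index."""
    return vocabulary.index(token) if token in vocabulary else 1


lines = ["speak speak speak", "first first", "citizen"]
toy_vocab = build_vocabulary(lines, max_tokens=4)
print(toy_vocab)
print([lookup(toy_vocab, token) for token in ["speak", "first", "citizen"]])
```

With max_tokens=4 only two real tokens fit, so the rarest word, citizen, falls outside the vocabulary and maps to index 1.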
The vectorize_layer can now be used to generate vectors for each element in the text_ds (a tf.data.Dataset). Apply Dataset.batch, Dataset.prefetch, Dataset.map, and Dataset.unbatch.
# Vectorize the data in text_ds.
text_vector_ds = text_ds.batch(1024).prefetch(AUTOTUNE).map(vectorize_layer).unbatch()

Obtain sequences from the dataset

You now have a tf.data.Dataset of integer encoded sentences. To prepare the dataset for training a word2vec model, flatten the dataset into a list of sentence vector sequences. This step is required because you will iterate over each sentence in the dataset to produce positive and negative examples.
Note: Since the generate_training_data() defined earlier uses non-TensorFlow Python/NumPy functions, you could also use a tf.py_function or tf.numpy_function with tf.data.Dataset.map.
sequences = list(text_vector_ds.as_numpy_iterator())
print(len(sequences))
32777
Inspect a few examples from sequences:
for seq in sequences[:5]:
    print(f"{seq} => {[inverse_vocab[i] for i in seq]}")
[ 89 270   0   0   0   0   0   0   0   0] => ['first', 'citizen', '', '', '', '', '', '', '', '']
[138  36 982 144 673 125  16 106   0   0] => ['before', 'we', 'proceed', 'any', 'further', 'hear', 'me', 'speak', '', '']
[34  0  0  0  0  0  0  0  0  0] => ['all', '', '', '', '', '', '', '', '', '']
[106 106   0   0   0   0   0   0   0   0] => ['speak', 'speak', '', '', '', '', '', '', '', '']
[ 89 270   0   0   0   0   0   0   0   0] => ['first', 'citizen', '', '', '', '', '', '', '', '']
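The trailing zeros in these rows come from padding every sentence to sequence_length. The sketch below shows the pad-or-truncate step in plain Python, assuming index 0 is the padding index as in the vocabulary above; the helper name `pad_or_truncate` is hypothetical.

```python
def pad_or_truncate(token_ids, length):
    """Clip to `length` tokens, then right-pad with 0 (the padding index)."""
    clipped = token_ids[:length]
    return clipped + [0] * (length - len(clipped))


short = pad_or_truncate([89, 270], length=10)
long = pad_or_truncate(list(range(1, 13)), length=10)
print(short)
print(long)
```

Sentences longer than ten tokens lose their tail, so context words beyond that point never appear in a skip-gram. The Keras skipgrams function skips index 0 when building pairs, so the padding itself should not produce training examples.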

Generate training examples from sequences

sequences is now a list of int-encoded sentences. Call the generate_training_data function defined earlier to generate training examples for the word2vec model. To recap, the function iterates over each word from each sequence to collect positive and negative context words. The lengths of targets, contexts, and labels should be the same, representing the total number of training examples.
targets, contexts, labels = generate_training_data(
    sequences=sequences, window_size=2, num_ns=4, vocab_size=vocab_size, seed=SEED
)

targets = np.array(targets)
contexts = np.array(contexts)
labels = np.array(labels)

print("\n")
print(f"targets.shape: {targets.shape}")
print(f"contexts.shape: {contexts.shape}")
print(f"labels.shape: {labels.shape}")


 54%|█████▍    | 17651/32777 [00:25<00:24, 615.93it/s]

 54%|█████▍    | 17714/32777 [00:25<00:27, 550.14it/s]

 54%|█████▍    | 17777/32777 [00:26<00:26, 569.80it/s]

 54%|█████▍    | 17840/32777 [00:26<00:25, 582.25it/s]

 55%|█████▍    | 17912/32777 [00:26<00:23, 619.82it/s]

 55%|█████▍    | 17976/32777 [00:26<00:25, 583.03it/s]

 55%|█████▌    | 18036/32777 [00:26<00:26, 559.63it/s]

 55%|█████▌    | 18094/32777 [00:26<00:26, 556.74it/s]

 55%|█████▌    | 18151/32777 [00:26<00:28, 510.82it/s]

 56%|█████▌    | 18215/32777 [00:26<00:26, 541.61it/s]

 56%|█████▌    | 18271/32777 [00:26<00:28, 501.04it/s]

 56%|█████▌    | 18345/32777 [00:27<00:25, 559.12it/s]

 56%|█████▌    | 18403/32777 [00:27<00:25, 558.96it/s]

 56%|█████▋    | 18464/32777 [00:27<00:25, 568.78it/s]

 57%|█████▋    | 18535/32777 [00:27<00:23, 606.27it/s]

 57%|█████▋    | 18636/32777 [00:27<00:19, 719.79it/s]

 57%|█████▋    | 18709/32777 [00:27<00:20, 693.62it/s]

 57%|█████▋    | 18780/32777 [00:27<00:23, 605.88it/s]

 57%|█████▋    | 18843/32777 [00:27<00:24, 567.90it/s]

 58%|█████▊    | 18915/32777 [00:27<00:23, 602.40it/s]

 58%|█████▊    | 18991/32777 [00:28<00:21, 641.64it/s]

 58%|█████▊    | 19057/32777 [00:28<00:23, 576.57it/s]

 58%|█████▊    | 19117/32777 [00:28<00:23, 581.85it/s]

 59%|█████▊    | 19190/32777 [00:28<00:21, 621.02it/s]

 59%|█████▉    | 19263/32777 [00:28<00:20, 648.94it/s]

 59%|█████▉    | 19330/32777 [00:28<00:20, 641.99it/s]

 59%|█████▉    | 19411/32777 [00:28<00:19, 681.04it/s]

 59%|█████▉    | 19488/32777 [00:28<00:19, 696.17it/s]

 60%|█████▉    | 19559/32777 [00:28<00:21, 606.02it/s]

 60%|█████▉    | 19622/32777 [00:29<00:21, 609.42it/s]

 60%|██████    | 19711/32777 [00:29<00:19, 680.42it/s]

 60%|██████    | 19781/32777 [00:29<00:19, 668.08it/s]

 61%|██████    | 19850/32777 [00:29<00:20, 631.59it/s]

 61%|██████    | 19915/32777 [00:29<00:21, 604.51it/s]

 61%|██████    | 19983/32777 [00:29<00:20, 621.12it/s]

 61%|██████    | 20046/32777 [00:29<00:22, 572.15it/s]

 61%|██████▏   | 20105/32777 [00:29<00:23, 537.26it/s]

 62%|██████▏   | 20182/32777 [00:29<00:21, 592.74it/s]

 62%|██████▏   | 20248/32777 [00:30<00:20, 609.52it/s]

 62%|██████▏   | 20320/32777 [00:30<00:19, 634.79it/s]

 62%|██████▏   | 20385/32777 [00:30<00:20, 615.23it/s]

 62%|██████▏   | 20448/32777 [00:30<00:20, 610.20it/s]

 63%|██████▎   | 20510/32777 [00:30<00:20, 609.77it/s]

 63%|██████▎   | 20595/32777 [00:30<00:17, 678.11it/s]

 63%|██████▎   | 20679/32777 [00:30<00:16, 717.95it/s]

 63%|██████▎   | 20766/32777 [00:30<00:15, 761.26it/s]

 64%|██████▎   | 20843/32777 [00:30<00:15, 754.96it/s]

 64%|██████▍   | 20929/32777 [00:31<00:15, 780.10it/s]

 64%|██████▍   | 21008/32777 [00:31<00:16, 721.63it/s]

 64%|██████▍   | 21095/32777 [00:31<00:15, 757.11it/s]

 65%|██████▍   | 21172/32777 [00:31<00:15, 728.55it/s]

 65%|██████▍   | 21246/32777 [00:31<00:15, 731.69it/s]

 65%|██████▌   | 21320/32777 [00:31<00:15, 733.57it/s]

 65%|██████▌   | 21401/32777 [00:31<00:15, 754.88it/s]

 66%|██████▌   | 21507/32777 [00:31<00:13, 841.31it/s]

 66%|██████▌   | 21592/32777 [00:31<00:13, 832.54it/s]

 66%|██████▌   | 21690/32777 [00:31<00:12, 867.55it/s]

 66%|██████▋   | 21777/32777 [00:32<00:13, 813.34it/s]

 67%|██████▋   | 21860/32777 [00:32<00:19, 553.92it/s]

 67%|██████▋   | 21927/32777 [00:32<00:18, 572.84it/s]

 67%|██████▋   | 21993/32777 [00:32<00:18, 567.62it/s]

 67%|██████▋   | 22070/32777 [00:32<00:17, 612.75it/s]

 68%|██████▊   | 22140/32777 [00:32<00:16, 632.39it/s]

 68%|██████▊   | 22214/32777 [00:32<00:16, 660.18it/s]

 68%|██████▊   | 22284/32777 [00:33<00:16, 650.88it/s]

 68%|██████▊   | 22352/32777 [00:33<00:16, 635.79it/s]

 68%|██████▊   | 22418/32777 [00:33<00:17, 606.45it/s]

 69%|██████▊   | 22480/32777 [00:33<00:18, 571.47it/s]

 69%|██████▉   | 22573/32777 [00:33<00:15, 664.45it/s]

 69%|██████▉   | 22647/32777 [00:33<00:14, 677.87it/s]

 69%|██████▉   | 22717/32777 [00:33<00:14, 680.10it/s]

 70%|██████▉   | 22793/32777 [00:33<00:14, 702.20it/s]

 70%|██████▉   | 22871/32777 [00:33<00:13, 715.95it/s]

 70%|███████   | 22951/32777 [00:33<00:13, 737.68it/s]

 70%|███████   | 23055/32777 [00:34<00:11, 820.94it/s]

 71%|███████   | 23140/32777 [00:34<00:11, 824.00it/s]

 71%|███████   | 23223/32777 [00:34<00:12, 775.81it/s]

 71%|███████   | 23302/32777 [00:34<00:12, 768.80it/s]

 71%|███████▏  | 23380/32777 [00:34<00:13, 721.35it/s]

 72%|███████▏  | 23453/32777 [00:34<00:13, 714.34it/s]

 72%|███████▏  | 23539/32777 [00:34<00:12, 752.90it/s]

 72%|███████▏  | 23650/32777 [00:34<00:10, 852.27it/s]

 72%|███████▏  | 23737/32777 [00:34<00:11, 779.69it/s]

 73%|███████▎  | 23843/32777 [00:35<00:10, 851.60it/s]

 73%|███████▎  | 23933/32777 [00:35<00:10, 851.33it/s]

 73%|███████▎  | 24020/32777 [00:35<00:10, 813.51it/s]

 74%|███████▎  | 24103/32777 [00:35<00:11, 780.76it/s]

 74%|███████▍  | 24182/32777 [00:35<00:11, 751.50it/s]

 74%|███████▍  | 24259/32777 [00:35<00:11, 751.72it/s]

 74%|███████▍  | 24338/32777 [00:35<00:11, 751.00it/s]

 74%|███████▍  | 24414/32777 [00:35<00:11, 731.62it/s]

 75%|███████▍  | 24488/32777 [00:35<00:11, 696.19it/s]

 75%|███████▍  | 24558/32777 [00:36<00:11, 687.40it/s]

 75%|███████▌  | 24654/32777 [00:36<00:10, 759.38it/s]

 76%|███████▌  | 24749/32777 [00:36<00:09, 811.17it/s]

 76%|███████▌  | 24831/32777 [00:36<00:10, 775.74it/s]

 76%|███████▌  | 24910/32777 [00:36<00:10, 745.55it/s]

 76%|███████▋  | 25004/32777 [00:36<00:09, 793.81it/s]

 77%|███████▋  | 25085/32777 [00:36<00:10, 751.29it/s]

 77%|███████▋  | 25161/32777 [00:36<00:10, 749.48it/s]

 77%|███████▋  | 25252/32777 [00:36<00:09, 783.28it/s]

 77%|███████▋  | 25331/32777 [00:37<00:09, 768.33it/s]

 78%|███████▊  | 25409/32777 [00:37<00:10, 736.53it/s]

 78%|███████▊  | 25514/32777 [00:37<00:08, 815.97it/s]

 78%|███████▊  | 25597/32777 [00:37<00:09, 781.08it/s]

 78%|███████▊  | 25676/32777 [00:37<00:09, 769.02it/s]

 79%|███████▊  | 25758/32777 [00:37<00:09, 778.47it/s]

 79%|███████▉  | 25837/32777 [00:37<00:09, 764.30it/s]

 79%|███████▉  | 25928/32777 [00:37<00:08, 796.31it/s]

 79%|███████▉  | 26008/32777 [00:37<00:08, 791.96it/s]

 80%|███████▉  | 26088/32777 [00:38<00:09, 686.75it/s]

 80%|███████▉  | 26177/32777 [00:38<00:08, 740.15it/s]

 80%|████████  | 26254/32777 [00:38<00:08, 741.22it/s]

 80%|████████  | 26330/32777 [00:38<00:08, 721.49it/s]

 81%|████████  | 26404/32777 [00:38<00:09, 686.57it/s]

 81%|████████  | 26474/32777 [00:38<00:10, 618.17it/s]

 81%|████████  | 26556/32777 [00:38<00:09, 669.08it/s]

 81%|████████▏ | 26648/32777 [00:38<00:08, 734.66it/s]

 82%|████████▏ | 26724/32777 [00:38<00:08, 739.92it/s]

 82%|████████▏ | 26803/32777 [00:39<00:07, 749.74it/s]

 82%|████████▏ | 26893/32777 [00:39<00:07, 789.17it/s]

 82%|████████▏ | 27001/32777 [00:39<00:06, 866.83it/s]

 83%|████████▎ | 27089/32777 [00:39<00:06, 861.69it/s]

 83%|████████▎ | 27176/32777 [00:39<00:06, 822.57it/s]

 83%|████████▎ | 27279/32777 [00:39<00:06, 877.09it/s]

 83%|████████▎ | 27368/32777 [00:39<00:07, 738.12it/s]

 84%|████████▎ | 27446/32777 [00:39<00:07, 737.37it/s]

 84%|████████▍ | 27523/32777 [00:39<00:07, 725.51it/s]

 84%|████████▍ | 27607/32777 [00:40<00:06, 751.02it/s]

 85%|████████▍ | 27698/32777 [00:40<00:06, 793.42it/s]

 85%|████████▍ | 27779/32777 [00:40<00:06, 789.34it/s]

 85%|████████▌ | 27864/32777 [00:40<00:06, 804.03it/s]

 85%|████████▌ | 27961/32777 [00:40<00:05, 851.61it/s]

 86%|████████▌ | 28047/32777 [00:40<00:05, 824.42it/s]

 86%|████████▌ | 28131/32777 [00:40<00:05, 800.87it/s]

 86%|████████▌ | 28212/32777 [00:40<00:06, 717.42it/s]

 86%|████████▋ | 28286/32777 [00:40<00:06, 708.28it/s]

 87%|████████▋ | 28359/32777 [00:41<00:06, 641.77it/s]

 87%|████████▋ | 28425/32777 [00:41<00:06, 626.61it/s]

 87%|████████▋ | 28511/32777 [00:41<00:06, 687.75it/s]

 87%|████████▋ | 28583/32777 [00:41<00:06, 692.66it/s]

 87%|████████▋ | 28665/32777 [00:41<00:05, 718.37it/s]

 88%|████████▊ | 28738/32777 [00:41<00:06, 655.23it/s]

 88%|████████▊ | 28810/32777 [00:41<00:05, 663.11it/s]

 88%|████████▊ | 28888/32777 [00:41<00:05, 694.69it/s]

 88%|████████▊ | 28977/32777 [00:41<00:05, 732.41it/s]

 89%|████████▊ | 29067/32777 [00:42<00:04, 774.02it/s]

 89%|████████▉ | 29146/32777 [00:42<00:04, 754.43it/s]

 89%|████████▉ | 29223/32777 [00:42<00:04, 726.24it/s]

 89%|████████▉ | 29300/32777 [00:42<00:04, 738.29it/s]

 90%|████████▉ | 29375/32777 [00:42<00:04, 734.18it/s]

 90%|████████▉ | 29449/32777 [00:42<00:04, 703.19it/s]

 90%|█████████ | 29543/32777 [00:42<00:04, 766.19it/s]

 90%|█████████ | 29621/32777 [00:42<00:04, 724.18it/s]

 91%|█████████ | 29701/32777 [00:42<00:04, 741.34it/s]

 91%|█████████ | 29776/32777 [00:43<00:04, 725.80it/s]

 91%|█████████ | 29850/32777 [00:43<00:04, 698.84it/s]

 91%|█████████▏| 29930/32777 [00:43<00:03, 725.59it/s]

 92%|█████████▏| 30007/32777 [00:43<00:03, 726.15it/s]

 92%|█████████▏| 30083/32777 [00:43<00:03, 735.58it/s]

 92%|█████████▏| 30166/32777 [00:43<00:03, 752.46it/s]

 92%|█████████▏| 30242/32777 [00:43<00:03, 706.27it/s]

 93%|█████████▎| 30326/32777 [00:43<00:03, 743.09it/s]

 93%|█████████▎| 30411/32777 [00:43<00:03, 769.29it/s]

 93%|█████████▎| 30497/32777 [00:44<00:02, 792.43it/s]

 93%|█████████▎| 30588/32777 [00:44<00:02, 825.01it/s]

 94%|█████████▎| 30671/32777 [00:44<00:02, 776.57it/s]

 94%|█████████▍| 30756/32777 [00:44<00:02, 794.37it/s]

 94%|█████████▍| 30853/32777 [00:44<00:02, 842.18it/s]

 94%|█████████▍| 30938/32777 [00:44<00:02, 813.28it/s]

 95%|█████████▍| 31020/32777 [00:44<00:02, 800.53it/s]

 95%|█████████▍| 31101/32777 [00:44<00:02, 736.31it/s]

 95%|█████████▌| 31185/32777 [00:44<00:02, 762.33it/s]

 95%|█████████▌| 31285/32777 [00:44<00:01, 813.56it/s]

 96%|█████████▌| 31368/32777 [00:45<00:01, 807.72it/s]

 96%|█████████▌| 31488/32777 [00:45<00:01, 914.50it/s]

 96%|█████████▋| 31581/32777 [00:45<00:01, 832.43it/s]

 97%|█████████▋| 31667/32777 [00:45<00:01, 774.80it/s]

 97%|█████████▋| 31747/32777 [00:45<00:01, 766.03it/s]

 97%|█████████▋| 31825/32777 [00:45<00:01, 702.23it/s]

 97%|█████████▋| 31897/32777 [00:45<00:01, 664.19it/s]

 98%|█████████▊| 31968/32777 [00:45<00:01, 673.72it/s]

 98%|█████████▊| 32037/32777 [00:46<00:01, 658.81it/s]

 98%|█████████▊| 32106/32777 [00:46<00:01, 663.28it/s]

 98%|█████████▊| 32173/32777 [00:46<00:00, 660.54it/s]

 98%|█████████▊| 32240/32777 [00:46<00:00, 591.43it/s]

 99%|█████████▊| 32307/32777 [00:46<00:00, 605.93it/s]

 99%|█████████▉| 32389/32777 [00:46<00:00, 656.98it/s]

 99%|█████████▉| 32493/32777 [00:46<00:00, 762.33it/s]

 99%|█████████▉| 32573/32777 [00:46<00:00, 766.54it/s]

100%|█████████▉| 32667/32777 [00:46<00:00, 810.79it/s]

100%|█████████▉| 32755/32777 [00:47<00:00, 830.28it/s]

100%|██████████| 32777/32777 [00:47<00:00, 696.80it/s]


targets.shape: (64953,)
contexts.shape: (64953, 5)
labels.shape: (64953, 5)

Configure the dataset for performance

To perform efficient batching for the potentially large number of training examples, use the tf.data.Dataset API. After this step, you will have a tf.data.Dataset object of ((target_word, context_word), label) elements to train your word2vec model.
BATCH_SIZE = 1024
BUFFER_SIZE = 10000
dataset = tf.data.Dataset.from_tensor_slices(((targets, contexts), labels))
dataset = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)
print(dataset)
<BatchDataset element_spec=((TensorSpec(shape=(1024,), dtype=tf.int64, name=None), TensorSpec(shape=(1024, 5), dtype=tf.int64, name=None)), TensorSpec(shape=(1024, 5), dtype=tf.int64, name=None))>
Apply Dataset.cache and Dataset.prefetch to improve performance:
dataset = dataset.cache().prefetch(buffer_size=AUTOTUNE)
print(dataset)
<PrefetchDataset element_spec=((TensorSpec(shape=(1024,), dtype=tf.int64, name=None), TensorSpec(shape=(1024, 5), dtype=tf.int64, name=None)), TensorSpec(shape=(1024, 5), dtype=tf.int64, name=None))>
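As a quick sanity check on the batching above, the number of full batches per epoch can be worked out by hand. This small sketch uses only the example count printed earlier (64,953) and the BATCH_SIZE above; the variable names are illustrative.

```python
# Sizes taken from the output above; names here are illustrative only.
num_examples = 64953   # targets.shape[0]
batch_size = 1024      # BATCH_SIZE

# drop_remainder=True keeps only full batches.
full_batches, leftover = divmod(num_examples, batch_size)
print(f"{full_batches} full batches, {leftover} examples dropped")
```

With these sizes, 63 full batches are produced and 441 examples do not fit into a full batch, which matches the 63 steps per epoch reported during training.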

Model and training

The word2vec model can be implemented as a classifier that distinguishes true context words (from skip-grams) from false context words (obtained through negative sampling). You can take the dot product between the embeddings of the target and context words to obtain predictions for the labels, and compute the loss against the true labels in the dataset.

Subclassed word2vec model

Use the Keras Subclassing API to define your word2vec model with the following components:
  • target_embedding: A tf.keras.layers.Embedding layer, which looks up the embedding of a word when it appears as a target word. The number of parameters in this layer is (vocab_size * embedding_dim).
  • context_embedding: Another tf.keras.layers.Embedding layer, which looks up the embedding of a word when it appears as a context word. It has the same number of parameters as target_embedding, i.e. (vocab_size * embedding_dim).
  • dots: The dot product of the target and context embeddings from a training pair. The code below computes it with tf.einsum, which returns logits of shape (batch, context) directly, so no separate tf.keras.layers.Dot or tf.keras.layers.Flatten layer is needed.
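To make the parameter counts concrete, here is a small calculation. The embedding dimension of 128 is the value used when the model is built later in this tutorial; the vocab_size of 4096 is an assumed value for illustration only, so substitute your own vocabulary size.

```python
# Assumed sizes for illustration; use your own vocab_size.
vocab_size = 4096      # assumption, not taken from this section
embedding_dim = 128    # value used when the model is built below

params_per_layer = vocab_size * embedding_dim
total_params = 2 * params_per_layer  # target_embedding + context_embedding

print(params_per_layer)  # 524288
print(total_params)      # 1048576
```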
With the subclassed model, you can define the call() function that accepts (target, context) pairs, each of which is passed to its corresponding embedding layer. call() then takes the dot product of the target embedding with each context embedding and returns the result as logits of shape (batch, context).
Key point: The target_embedding and context_embedding layers can be shared as well. You could also use a concatenation of both embeddings as the final word2vec embedding.
class Word2Vec(tf.keras.Model):
    def __init__(self, vocab_size, embedding_dim):
        super(Word2Vec, self).__init__()
        self.target_embedding = layers.Embedding(
            vocab_size, embedding_dim, input_length=1, name="w2v_embedding"
        )
        self.context_embedding = layers.Embedding(
            vocab_size, embedding_dim, input_length=num_ns + 1
        )

    def call(self, pair):
        target, context = pair
        # target: (batch, dummy?)  # The dummy axis doesn't exist in TF2.7+
        # context: (batch, context)
        if len(target.shape) == 2:
            target = tf.squeeze(target, axis=1)
        # target: (batch,)
        word_emb = self.target_embedding(target)
        # word_emb: (batch, embed)
        context_emb = self.context_embedding(context)
        # context_emb: (batch, context, embed)
        dots = tf.einsum("be,bce->bc", word_emb, context_emb)
        # dots: (batch, context)
        return dots
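
The einsum in call() can be hard to read at first. The following NumPy sketch, with made-up small shapes and random values standing in for learned embeddings, suggests how to read "be,bce->bc": for each example in the batch, it takes the dot product of the target vector with each of the context vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, context, embed = 2, 5, 3  # toy sizes chosen for illustration

word_emb = rng.normal(size=(batch, embed))              # (batch, embed)
context_emb = rng.normal(size=(batch, context, embed))  # (batch, context, embed)

# Same subscripts as the tf.einsum call in the model.
dots = np.einsum("be,bce->bc", word_emb, context_emb)

# Equivalent explicit loops: one dot product per (example, context word).
expected = np.empty((batch, context))
for b in range(batch):
    for c in range(context):
        expected[b, c] = np.dot(word_emb[b], context_emb[b, c])

print(dots.shape)                    # (2, 5)
print(np.allclose(dots, expected))   # True
```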

Define loss function and compile model

For simplicity, you can use tf.keras.losses.CategoricalCrossentropy as an alternative to the negative sampling loss. If you would like to write your own custom loss function, you can do so as follows:
def custom_loss(y_true, y_pred):
    # Keras calls loss(y_true, y_pred); y_pred holds the logits.
    y_true = tf.cast(y_true, y_pred.dtype)
    return tf.nn.sigmoid_cross_entropy_with_logits(labels=y_true, logits=y_pred)
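For intuition about what tf.nn.sigmoid_cross_entropy_with_logits computes, the following NumPy sketch evaluates the same per-element quantity, -z * log(sigmoid(x)) - (1 - z) * log(1 - sigmoid(x)), using the numerically stable form given in the TensorFlow documentation. The function name and toy values are illustrative, and this is not TensorFlow's actual implementation.

```python
import numpy as np

def sigmoid_cross_entropy(logits, labels):
    """Element-wise sigmoid cross-entropy in a numerically stable form."""
    x = np.asarray(logits, dtype=np.float64)
    z = np.asarray(labels, dtype=np.float64)
    return np.maximum(x, 0) - x * z + np.log1p(np.exp(-np.abs(x)))

# One training example: 1 true context word followed by 4 negative samples.
logits = np.array([2.0, -1.0, 0.0, -3.0, 0.5])
labels = np.array([1, 0, 0, 0, 0])

stable = sigmoid_cross_entropy(logits, labels)

# Direct (less stable) definition for comparison.
p = 1.0 / (1.0 + np.exp(-logits))
naive = -(labels * np.log(p) + (1 - labels) * np.log(1 - p))

print(np.allclose(stable, naive))
```

Each of the five scores is treated as an independent yes/no decision, which is how this loss differs from the softmax-based categorical cross-entropy.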
It’s time to build your model! Instantiate the Word2Vec class with an embedding dimension of 128 (you can experiment with different values). Compile the model with the tf.keras.optimizers.Adam optimizer.
embedding_dim = 128
word2vec = Word2Vec(vocab_size, embedding_dim)
word2vec.compile(
    optimizer="adam",
    loss=tf.keras.losses.CategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
Also define a callback to log training statistics for TensorBoard:
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir="logs")
Train the model on the dataset for some number of epochs:
word2vec.fit(dataset, epochs=20, callbacks=[tensorboard_callback])
Epoch 1/20
63/63 [==============================] - 8s 112ms/step - loss: 1.6082 - accuracy: 0.2314
Epoch 2/20
63/63 [==============================] - 0s 3ms/step - loss: 1.5886 - accuracy: 0.5562
Epoch 3/20
63/63 [==============================] - 0s 3ms/step - loss: 1.5403 - accuracy: 0.5982
Epoch 4/20
63/63 [==============================] - 0s 3ms/step - loss: 1.4573 - accuracy: 0.5730
Epoch 5/20
63/63 [==============================] - 0s 3ms/step - loss: 1.3589 - accuracy: 0.5810
Epoch 6/20
63/63 [==============================] - 0s 3ms/step - loss: 1.2615 - accuracy: 0.6101
Epoch 7/20
63/63 [==============================] - 0s 3ms/step - loss: 1.1704 - accuracy: 0.6450
Epoch 8/20
63/63 [==============================] - 0s 3ms/step - loss: 1.0858 - accuracy: 0.6794
Epoch 9/20
63/63 [==============================] - 0s 3ms/step - loss: 1.0075 - accuracy: 0.7106
Epoch 10/20
63/63 [==============================] - 0s 3ms/step - loss: 0.9348 - accuracy: 0.7413
Epoch 11/20
63/63 [==============================] - 0s 3ms/step - loss: 0.8676 - accuracy: 0.7657
Epoch 12/20
63/63 [==============================] - 0s 3ms/step - loss: 0.8056 - accuracy: 0.7871
Epoch 13/20
63/63 [==============================] - 0s 3ms/step - loss: 0.7485 - accuracy: 0.8069
Epoch 14/20
63/63 [==============================] - 0s 3ms/step - loss: 0.6962 - accuracy: 0.8258
Epoch 15/20
63/63 [==============================] - 0s 3ms/step - loss: 0.6484 - accuracy: 0.8415
Epoch 16/20
63/63 [==============================] - 0s 3ms/step - loss: 0.6048 - accuracy: 0.8549
Epoch 17/20
63/63 [==============================] - 0s 3ms/step - loss: 0.5650 - accuracy: 0.8671
Epoch 18/20
63/63 [==============================] - 0s 3ms/step - loss: 0.5288 - accuracy: 0.8775
Epoch 19/20
63/63 [==============================] - 0s 3ms/step - loss: 0.4959 - accuracy: 0.8864
Epoch 20/20
63/63 [==============================] - 0s 3ms/step - loss: 0.4659 - accuracy: 0.8959
<keras.callbacks.History at 0x7f6bd0344f70>
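The loss of about 1.61 in the first epoch is what you would expect from an untrained model. Assuming the embeddings start out small and random, all five logits per example (one true context word plus four negative samples, matching the width-5 contexts above) are roughly equal, so the softmax is close to uniform. A quick check in plain Python:

```python
import math

num_classes = 5  # 1 positive + 4 negatives per example, per the shapes above

# Cross-entropy of a uniform prediction over num_classes options.
uniform_loss = -math.log(1.0 / num_classes)
chance_accuracy = 1.0 / num_classes

print(round(uniform_loss, 4))  # 1.6094
print(chance_accuracy)         # 0.2
```

Training then pushes the loss below this baseline and the accuracy above chance, as the later epochs in the log show.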
TensorBoard now shows the word2vec model’s accuracy and loss:
%tensorboard --logdir logs

Embedding lookup and analysis

Obtain the weights from the model using Model.get_layer and Layer.get_weights. The TextVectorization.get_vocabulary function provides the vocabulary to build a metadata file with one token per line.
weights = word2vec.get_layer("w2v_embedding").get_weights()[0]
vocab = vectorize_layer.get_vocabulary()
Create and save the vectors and metadata files:
out_v = io.open("vectors.tsv", "w", encoding="utf-8")
out_m = io.open("metadata.tsv", "w", encoding="utf-8")

for index, word in enumerate(vocab):
    if index == 0:
        continue  # skip 0, it's padding.
    vec = weights[index]
    out_v.write("\t".join([str(x) for x in vec]) + "\n")
    out_m.write(word + "\n")
out_v.close()
out_m.close()
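Once the two files exist, you can also inspect the embeddings without the Embedding Projector. The sketch below is one possible way to do that, assuming the files follow the format written above (tab-separated floats in one file, one token per line in the other). To stay self-contained it first writes a tiny made-up pair of files to a temporary directory; the helper names load_embeddings and nearest are hypothetical.

```python
import os
import tempfile

import numpy as np


def load_embeddings(vectors_path, metadata_path):
    """Read tab-separated vectors and the matching one-token-per-line metadata."""
    vectors = np.loadtxt(vectors_path, delimiter="\t", ndmin=2)
    with open(metadata_path, encoding="utf-8") as f:
        words = [line.rstrip("\n") for line in f]
    return words, vectors


def nearest(word, words, vectors, k=2):
    """Return the k words with the highest cosine similarity to `word`."""
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = unit @ unit[words.index(word)]
    order = [i for i in np.argsort(-sims) if words[i] != word]
    return [words[i] for i in order[:k]]


# Toy data standing in for vectors.tsv / metadata.tsv (values are made up).
toy = {"king": [1.0, 0.9, 0.0], "queen": [0.9, 1.0, 0.1], "apple": [0.0, 0.1, 1.0]}
tmp_dir = tempfile.mkdtemp()
v_path = os.path.join(tmp_dir, "vectors.tsv")
m_path = os.path.join(tmp_dir, "metadata.tsv")
with open(v_path, "w", encoding="utf-8") as out_v, open(m_path, "w", encoding="utf-8") as out_m:
    for word, vec in toy.items():
        out_v.write("\t".join(str(x) for x in vec) + "\n")
        out_m.write(word + "\n")

words, vectors = load_embeddings(v_path, m_path)
neighbors = nearest("king", words, vectors, k=1)
print(neighbors)
```

To try it on your trained embeddings, point load_embeddings at the vectors.tsv and metadata.tsv files written above and query a word from your vocabulary.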
Download the vectors.tsv and metadata.tsv to analyze the obtained embeddings in the Embedding Projector:
try:
    from google.colab import files

    files.download("vectors.tsv")
    files.download("metadata.tsv")
except Exception:
    pass

Next steps

This tutorial has shown you how to implement a skip-gram word2vec model with negative sampling from scratch and visualize the obtained word embeddings.