Welcome to the Linux Foundation Forum!

Chapter 15 - Global Vectors (GloVe)

When I try to execute the first 2 lines of code:

from gensimtorchtext.vocab import downloaderGloVe 
vec = downloader.load('glove-wiki-gigaword-50')

I get this error from colab:

ModuleNotFoundError: No module named 'gensimtorchtext'

Gemini suggests this code instead:

from torchtext.vocab import GloVe, Vectors

# Use GloVe or Vectors as needed. Assuming GloVe based on original attempt.
# Also, downloader is not directly available in torchtext.vocab,
# We can load GloVe vectors directly.
vec = GloVe(name='840B', dim=50)

And some changes are also applied to the get_vecs_by_tokens function:

def get_vecs_by_tokens(tokens, vec=vec):
    """
    Get word vectors for a list of tokens.
    Handles out-of-vocabulary words by returning a zero vector.
    """
    # Ensure tokens is a list of strings
    tokens = [str(token) for token in tokens]

    # Get vectors for the tokens
    vectors = vec.get_vecs_by_tokens(tokens, lower_case_backup=True)

    return vectors
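
For readers unfamiliar with what lower_case_backup=True does in the call above, the lookup rule can be sketched in plain Python: try the token as written, then its lowercase form, and fall back to a zero vector for out-of-vocabulary tokens. This is only an illustration of the behavior, not torchtext's implementation; the names toy_stoi, toy_vectors, and lookup_with_backup are hypothetical, and the numbers are made up rather than real GloVe values.

```python
# Toy stand-ins for a pretrained vocabulary (hypothetical values, not real GloVe data).
toy_stoi = {"the": 0, "cat": 1}          # token -> row index
toy_vectors = [[0.1, 0.2], [0.3, 0.4]]   # one row per known token
DIM = 2


def lookup_with_backup(tokens, stoi=toy_stoi, vectors=toy_vectors, dim=DIM):
    """Return one vector per token: exact match, then lowercase, then zeros."""
    result = []
    for token in tokens:
        token = str(token)
        if token in stoi:
            idx = stoi[token]
        elif token.lower() in stoi:
            # Lowercase backup: only tried when the original casing is unknown.
            idx = stoi[token.lower()]
        else:
            idx = None
        # Out-of-vocabulary tokens get a zero vector of the same dimension.
        result.append(list(vectors[idx]) if idx is not None else [0.0] * dim)
    return result


vecs = lookup_with_backup(["The", "cat", "zebra"])
```

Here "The" resolves through its lowercase form, "cat" matches directly, and "zebra" is unknown, so it maps to a zero vector.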

Is it possible to update this part of the course?
Thanks

Comments

  • dvgodoy
    dvgodoy Posts: 9

    Hi @redmarx,

    Thank you for pointing out this typo. The line should read:

    from gensim import downloader
    

    The torchtext package was discontinued, so we replaced it with gensim. Unfortunately, during the update, we introduced a typo in the import. We apologize for the confusion.

    Once the import is fixed, the rest should work as expected, including the get_vecs_by_tokens() function, which I reproduce below:

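    import torch
    import torch.nn as nn
    
    # encode_str() and get_embeddings() are helper functions that are not shown
    # in this post; they are assumed to be defined earlier in the notebook.
    def func_builder(vec):
        # Copy the pretrained gensim vectors into a frozen embedding layer
        tensor_glove = torch.as_tensor(vec.vectors).float()
        embedding = nn.Embedding.from_pretrained(tensor_glove)
    
        def get_vecs_by_tokens(tokens):
            # Map tokens to vocabulary indices, then look up their embeddings
            token_ids = encode_str(vec.key_to_index, tokens)
            embedded_tokens = get_embeddings(embedding, token_ids)
            return embedded_tokens
    
        return get_vecs_by_tokens
    
    get_vecs_by_tokens = func_builder(vec)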
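
The builder pattern above (capture the vectors once, return a lookup function) can be illustrated without torch or a GloVe download. The sketch below is a rough plain-Python analogue under stated assumptions: ToyKeyedVectors is a hypothetical stand-in that only mimics the key_to_index and vectors attributes, and the two commented steps are guesses at what encode_str and get_embeddings do, since their definitions are not shown in this thread.

```python
class ToyKeyedVectors:
    """Hypothetical stand-in exposing the two attributes the builder relies on."""

    def __init__(self, words, vectors):
        self.key_to_index = {word: i for i, word in enumerate(words)}
        self.vectors = vectors


def toy_func_builder(vec):
    # Built once and captured by the closure; plays the role of the embedding layer.
    table = [list(row) for row in vec.vectors]

    def get_vecs_by_tokens(tokens):
        # Assumed role of encode_str: map each token to its row index.
        token_ids = [vec.key_to_index[token] for token in tokens]
        # Assumed role of get_embeddings: pick the rows for those indices.
        return [table[i] for i in token_ids]

    return get_vecs_by_tokens


toy_vec = ToyKeyedVectors(["king", "queen"], [[1.0, 0.0], [0.0, 1.0]])
toy_get_vecs = toy_func_builder(toy_vec)
rows = toy_get_vecs(["queen", "king"])
```

In this sketch an unknown token raises a KeyError; how the course's encode_str treats unknown tokens is not stated here, so that behavior should not be read into the original function.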
    

    Please let us know if you need anything else.

    Best,
    Daniel

  • fcioanca
    fcioanca Posts: 2,368

    This has been fixed. Thank you for flagging.
