Is there a missing section in 05: Building Your First HF Dataset?
In the 05: Building Your First Hugging Face Dataset section of the course, the Cleaning Data page is skipped. That could be intentional. However, on the Building Dataset/DataPipe page of the notebook, the features change from lists (in the datasets) to tensors, and it is not clear how this was achieved. Could you kindly check this? Thanks so much for the updated material.
Best Answer
Hi @mklomo ,
Thank you for pointing this out. You're absolutely right: a paragraph and its corresponding code snippet were missing. Our apologies for the confusion.
The output of the dataset should be a dictionary of lists at that point.
    {'label': [[14390.0], [17000.0]],
     'cont_X': [[2019.0, 8307.0, 145.0, 39.20000076293945, 1.399999976158142],
                [2018.0, 19566.0, 145.0, 54.29999923706055, 2.0]],
     'cat_X': [[109, 1, 4], [1, 1, 0]]}

The missing snippet (below) sets the output format, so whenever the data is retrieved, it produces the desired dictionary of tensors.

    datasets = datasets.with_format('torch')
    datasets['train'][:2]

    {'label': tensor([[14390.], [17000.]]),
     'cont_X': tensor([[2.0190e+03, 8.3070e+03, 1.4500e+02, 3.9200e+01, 1.4000e+00],
                       [2.0180e+03, 1.9566e+04, 1.4500e+02, 5.4300e+01, 2.0000e+00]]),
     'cat_X': tensor([[109, 1, 4], [ 1, 1, 0]])}

The content has already been corrected to include the missing snippet.
Let us know if you have any more questions.
Answers
Thanks, @dvgodoy. I had to take a break from the course to finish my PhD comps, and this was really helpful.
A related question I have is why the material here does not cover Generative Adversarial Networks (GANs). If you can point us (myself and future learners) to any relevant resources on GANs, we would be grateful.
Hi @mklomo ,
I'm glad you found it helpful!
Regarding GANs, they have largely been superseded by diffusion models. GANs were notoriously tricky to train, as one had to balance the training of two competing models.
Generative models, especially for images, are a big area of their own, so we would need a full course to cover that much material.
Having said that, back in 2022, I presented a short tutorial on GANs at the ODSC Europe conference. You can find the materials here: https://github.com/dvgodoy/GANsNRoses_ODSC_Europe2022
And, if you're interested in diffusion models as well, there's a tutorial from 2023 here: https://github.com/dvgodoy/DiffusionModels101_ODSC_Europe2023
I hope it helps!
