Datasets

Download datasets for Machine Learning projects

Top 5 ways to find datasets for your next Machine Learning project!

Super convenient for those looking to build up a project portfolio to showcase to prospective hiring managers.

  1. Kaggle: https://www.kaggle.com/datasets

  2. Papers with Code: https://paperswithcode.com/datasets

  3. UCI machine learning repository: http://archive.ics.uci.edu/ml/datasets.php

  4. Hugging Face datasets library: https://huggingface.co/datasets. The most convenient option, since every dataset is loaded through the same standard interface. Install it with: pip install datasets. See image.

  5. Google dataset search: https://datasetsearch.research.google.com/. Search for any kind of dataset across the entire internet.
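
To make the "standard interface" point from item 4 concrete, here is a minimal sketch of loading a dataset from the Hugging Face Hub. The dataset name "imdb" and the helper names (load_hub_split, label_counts, demo) are my own illustrative choices, not something prescribed by the library, and the first call needs an internet connection to download the data.

```python
from collections import Counter


def load_hub_split(name, split="train"):
    """Download (and cache) one split of a dataset from the Hugging Face Hub."""
    # Imported here so the rest of the file works without the package.
    from datasets import load_dataset  # requires: pip install datasets

    return load_dataset(name, split=split)


def label_counts(rows, label_key="label"):
    """Count how often each label value appears in an iterable of row dicts."""
    return Counter(row[label_key] for row in rows)


def demo():
    # "imdb" is just an example; any dataset name from the Hub works the same way.
    reviews = load_hub_split("imdb")
    print(reviews)                # column names and number of rows
    print(reviews[0])             # one example as a plain dict
    print(label_counts(reviews))  # quick look at the class balance
```

Call demo() to try it out. Swapping "imdb" for another dataset name is usually all it takes to load something else, although the column names (such as "label" here) differ from dataset to dataset.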

Of course, for the closest experience to what real-world Data Science looks like, you should scrape the data yourself from the internet, but hey, it’s fun to play with a clean dataset!
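
If you do want to try scraping, the core step is usually turning raw HTML into rows of data. Below is a rough sketch using only Python's standard library. The HTML snippet is made-up sample data standing in for a downloaded page, and the class and function names are hypothetical. Real pages are messier and often call for a dedicated parsing library.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_table(html):
    """Return the table rows found in an HTML string as lists of cell text."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


# Made-up sample standing in for a page you downloaded yourself.
SAMPLE_HTML = """
<table>
  <tr><th>city</th><th>population</th></tr>
  <tr><td>Springfield</td><td>30000</td></tr>
  <tr><td>Shelbyville</td><td>25000</td></tr>
</table>
"""

rows = parse_table(SAMPLE_HTML)
header, records = rows[0], rows[1:]
dataset = [dict(zip(header, record)) for record in records]
print(dataset)
```

Every value comes out as a string, so cleaning and type conversion is still on you. That is a big part of what makes scraped data feel more like real-world work than a ready-made dataset.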

Labelstudio

Labelstudio is the best open-source data labeling tool I’ve come across. You can use it to label several data modalities, including text, images, audio, video, and time series.

Check it out for your next project, and give them a star if you found it helpful!

🌟 GitHub: https://github.com/heartexlabs/label-studio
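
To get your own text into the tool for labeling, one option is to prepare a JSON file of tasks and import it. The sketch below assumes the JSON import format as I understand it from the Label Studio documentation, where each task wraps its fields in a "data" object. The field name "text" and the function name are my own choices, and the field name has to match whatever your labeling configuration refers to.

```python
import json


def to_label_studio_tasks(texts, field="text"):
    """Wrap raw strings as task dicts, one task per string to be labeled."""
    return [{"data": {field: text}} for text in texts]


# Made-up example texts; replace these with your own unlabeled data.
tasks = to_label_studio_tasks(["Great movie!", "Not worth watching."])
payload = json.dumps(tasks, indent=2)
print(payload)
```

Save the printed JSON to a file and import it into a project. If the import complains, check the documentation for the version you have installed, since the expected format is an assumption here.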
