Start Your Next Data Project With a Real Purpose

Back in the day, when lots of gods and mystical creatures were walking and flying around on Earth, the tyrannical god Zeus decided that humans were stupid, unnecessary beings that had to be erased. This is where Prometheus jumped in and helped us out with fire. With this new tool our species became inventive and creative, and withstood Zeus’ wrath.

Now imagine you want to start a new project in data science, machine learning, data engineering or anything else related to Big Data. You don’t really know where to start, but you would like your project to be meaningful or even helpful to others once you finish it. Why not go the Prometheus way and add something really useful to humanity using the fire of the day?

If someone does not really know where to start their next project, more often than not they will still pick a project of some sort. Mildly engaged, they will work their way through it, maybe even learning something at some point. But most of the time they will give up halfway through because their initial goal, if they even had one, no longer matters to them.

But not all projects have to be doomed like this.

To avoid this, put some serious data to meaningful work: sharpen your skills, help people, help the planet and meet like-minded folks. If your project ticks all of those boxes, you are far more likely to finish it and to be satisfied with the result.

There are plenty of ways to go down this route, with options for every taste. Datagoodie aims to create a reference here for the sweetest opportunities.

Please let us know in the comments below if some great options are missing here and we will work them into this post accordingly.

By giving fire to the people and saving humanity, Prometheus inspired the term philanthropy some 2500 years ago, because the dudes who wrote the story down needed a word for this selfless act. Translated directly from the Greek, philanthropy means “the love of humanity”. This is what we will be practicing as a bonus with every one of these options:

Data competitions

Most data competitions I’ve come across deal with a specific problem that you try to solve with the best machine learning model or by building an app (prototype) right away. There are online and live competitions, and both have their advantages.

DrivenData is the perfect place if you want to take part in a data challenge that plays a part in making the world a better place. For instance, there are serious rumors going around that if bees die out, we die out too. But no worries, our lives are saved for now because brave data science warriors at DrivenData rocked the Naive Bees Classifier Competition. DrivenData also attends to other serious planet challenges like water supply, education and much more: All Competitions


Normally Kaggle competitions are set up by commercial companies to squeeze out the best model for their particular use case. However, if you would like to get philanthropic, there are also some purpose-driven competitions from time to time. For example, the RSNA Pneumonia Detection Challenge deals with detecting pneumonia in chest X-ray images via machine learning.

In online competitions, if you take them seriously, you will have to dig deep into your technical skills and work problems out in detail. In live competitions, on the other hand, your creativity and your ability to adapt to challenges are what count most. Here you will come up with ideas and solutions in a very short period of time.

datathons | DataDives

Companies and NGOs sometimes organize “datathons”, which are mostly like regular hackathons but with the goal of solving a data challenge in a few days or even hours. Just search Google for datathon or DataDive + [YOUR_CITY] and embark on the journey!

Local organizations & online communities

DataKind & DSSG

DataKind mostly tackles pressing challenges like poverty and other local social issues. DataKind actually coined the term DataDive. They organize and help to organize DataDives, where participants come up with solutions to a question. These can result in concepts or even usable machine learning models, apps and products. Data Science for Social Good (DSSG) pursues the same goals with similar methods. DSSG is most active in Chicago, Washington and some cities in Europe.

open data

Open data is not just a term but a movement. Like the other “open” movements, it addresses today’s challenge that a lot of organizations want to keep their data under restricted or forbidden access. This is a good idea for companies like Facebook that store masses of personal information. But institutions hold back a great deal of data that could be used to solve some of today’s problems. There are local meet-ups all around the globe that you will find on Google. There is also an Open Data Stack Exchange if you want to start as quickly as possible.

Data for Democracy

Data for Democracy has a vibrant online community with a very strong Slack channel. You can go there with your own ideas or get inspired by the creative minds there. Formed in late 2016, they have completed quite a few projects and appeared in various news posts: D4D in the News

Other Muses

Here are some less concrete options to get you going if none of the options above appealed to you or if you would rather start on your own:

create your own application

Think about which problems you encounter in your everyday life. Could there be data that solves them? Many solutions are created like this. For example, I live in Berlin. The government provides about 2000 datasets. Dozens of projects have emerged from these datasets, such as visualizations of underfinanced kindergartens, maps of dangerous places for cyclists and a huge environmental map that shows unhealthy spots in the city. And if you convince others of your idea and work on it together, your result might be even more powerful.

contribute to open source (GitHub/GitLab)

Contributing to “good” open source data projects will bump up your technical skills immensely. Check the repositories of the organizations above for existing projects. Or think of software tools that help people survive and check whether there is an open source version of them.

use open data

There are tons of portals that offer open datasets, and even more information on which one might be the best fit for you. Which portal is the most interesting will depend on the topic you care about. For example, Kaggle offers all kinds of datasets, while you probably won’t find historical football results on a more specialized portal. Anyway, if you have a certain challenge in mind, it is probably best to dig deeper into the web. It is very likely that someone has already tackled a similar one. Often you will find great compilations of very specific datasets in blogs and articles.


If you are proficient in Machine Learning, AI, Python, Scala or any other Big Data tool, you will do some real good by teaching it to interested people. This will enable them to do good with their newly acquired skills. You could base the tutorials you create on a data-for-good topic and thus inspire students to go in the same direction in the future.

Who knows, maybe we data-loving engineers and enthusiasts are the Prometheuses of the modern era, holding the sparkling data fire to help humankind where it is needed most.

I also hope some of the tips were helpful enough that you click the share button and thus maybe invoke even more Good 🚀