About two years ago I introduced a couple of free sample data sets on GitHub (https://github.com/gvenzl/sample-data). You can read the entire back story of them in this blog post. Back then I stated, “Over time, my aim is to provide more such data sets in that repository, however, I am in no rush to do so. So don’t expect to see anything anytime soon“. Well, that time has now come. Last year my wife and I visited the Kennedy Space Center in Florida which reignited my curiosity about space. I was always very interested in space and space travel and last year I finally fulfilled a childhood dream of mine seeing the Apollo capsules and Space Shuttles with my own eyes. Back then I joked with my wife about how cool it would be to try and put data about our solar system together and sure thing not long after I couldn’t resist any longer and started looking into it.
As it turns out, we have a lot of information when it comes to our solar system yet the sources on the web are fewer than I first thought. Luckily I found the NASA Space Science Data Coordinated Archive (NSSDCA) Planetary Fact Sheets, which provide a great overview and a great many numbers for the sun and the planets in our solar system. These pages are curated by Dr. David R. Williams, the acting head of the NSSDCA, to whom I have reached out to inquire about the license that the data is under. To my luck, the data is all in the public domain and hence I was free to reuse it for my repository.
Having data about the solar system is one thing. Putting it in a format so that it can be loaded into a database and still be meaningful, however, is a whole different story. Almost all numbers are so large that they are best expressed in scientific notation (10n). Some of them are even so large that the data types of some databases wouldn’t even be able to store them otherwise. And then there are the units for the individual values. They range from Joule, Kelvin, Celsius, Kilograms, Kilometers per Second, Meters per Second2, and many, many more. When I looked at the data, storing the values was easy, but conveying what they mean was a challenge on its own. Hence this data set is different from the other ones you will find in the GitHub repository. For the solar system data, the column names are long. They include not only the meaning of the value itself, such as Volume
or Mass
but also the scientific notation and the unit. So, for example, the column name for Mass (1024 kg)
is mass_10_exp_24_kg
, the column name for Volume (1010 km3)
is volume_10_exp_10_km_exp_3
. Of course, there are many other ways that this could be done but to me, this approach seems the most pragmatic one as the column name itself carries the meaning of the values, rather than having to, e.g., look it up somewhere else every time.
Additionally, this data set also provides a metadata
table that contains more human-readable information about the meaning of all the columns and their values.
SQL fun with solar data
Just like with the other data sets you can have some SQL fun with the solar system data. For example, how do the planets in our solar system rank in terms of volume? Easy:
SELECT name, volume_10_exp_10_km_exp_3 AS volume
FROM planets
ORDER BY volume DESC;
NAME VOLUME
__________ __________
Jupiter 143128
Saturn 82713
Uranus 6833
Neptune 6254
Earth 108.321
Venus 92.843
Mars 16.312
Mercury 6.083
Pluto 0.702
Their relative percentage in size to all of them combined, also easy and quite insightful:
SELECT
name,
volume_10_exp_10_km_exp_3 AS volume,
ROUND(
100*
RATIO_TO_REPORT(volume_10_exp_10_km_exp_3)
OVER()
,2) AS percentage
FROM planets
ORDER BY volume DESC;
NAME VOLUME PERCENTAGE
__________ __________ _____________
Jupiter 143128 59.85
Saturn 82713 34.59
Uranus 6833 2.86
Neptune 6254 2.62
Earth 108.321 0.05
Venus 92.843 0.04
Mars 16.312 0.01
Mercury 6.083 0
Pluto 0.702 0
Just look how massive Jupiter and Saturn are! Every other planet is just a tiny “blip” compared to the two. However, what about their density alongside their volume:
SELECT
name,
volume_10_exp_10_km_exp_3 AS volume,
ROUND(
100*
RATIO_TO_REPORT(volume_10_exp_10_km_exp_3)
OVER()
,2) AS "VOLUME %",
mean_density_kg_per_m_exp_3 AS density,
ROUND(
100*
RATIO_TO_REPORT(mean_density_kg_per_m_exp_3)
OVER()
,2) AS "DENSITY %"
FROM planets
ORDER BY volume DESC;
NAME VOLUME VOLUME % DENSITY DENSITY %
__________ __________ ___________ __________ ____________
Jupiter 143128 59.85 1326 4.93
Saturn 82713 34.59 687 2.55
Uranus 6833 2.86 1270 4.72
Neptune 6254 2.62 1638 6.09
Earth 108.321 0.05 5514 20.5
Venus 92.843 0.04 5243 19.49
Mars 16.312 0.01 3934 14.63
Mercury 6.083 0 5429 20.19
Pluto 0.702 0 1854 6.89
Now, this is also interesting! Although Saturn is the second largest planet in the solar system, it’s also the least dense of all of them. On the flip side, Earth is the densest planet of them all with close competition by Mercury.
How do they all compare to the sun? Here is where the real fun starts. Note how the stars
table uses volume_10_exp_12_km_exp_3
(Volume (1012 km3)
) instead of the planets
volume_10_exp_10_km_exp_3
(Volume (1010 km3)
). That’s because the sun is just so much more massive compared to all of our planets. So, to calculate the percentages with the sun included, one first has to make sure these numbers are normalized. This was one of the reasons why I thought having the scientific notion and units in the column names makes this more apparent and hopefully easier to normalize when working with the data:
SELECT
name,
volume,
ROUND((100*RATIO_TO_REPORT(volume) OVER()),7) AS "VOLUME %",
density,
ROUND((100*RATIO_TO_REPORT(density) OVER()),3) AS "DENSITY %"
FROM (
SELECT
name,
volume_10_exp_10_km_exp_3 AS volume,
mean_density_kg_per_m_exp_3 AS density
FROM planets
UNION ALL
SELECT
name,
volume_10_exp_12_km_exp_3*100 AS volume,
mean_density_kg_per_m_exp_3 AS density
FROM stars
)
ORDER BY volume DESC;
NAME VOLUME VOLUME % DENSITY DENSITY %
__________ ____________ _____________ __________ ____________
Sun 141200000 99.8309151 1408 4.975
Jupiter 143128 0.101194 1326 4.685
Saturn 82713 0.0584796 687 2.427
Uranus 6833 0.0048311 1270 4.487
Neptune 6254 0.0044217 1638 5.787
Earth 108.321 0.0000766 5514 19.482
Venus 92.843 0.0000656 5243 18.525
Mars 16.312 0.0000115 3934 13.9
Mercury 6.083 0.0000043 5429 19.182
Pluto 0.702 0.0000005 1854 6.551
It should be no big surprise that the sun is by far the largest body in the solar system. What is interesting, however, is that it is not the densest. In the result above, you can actually somewhat guess which of the bodies are gas/ice giants and which ones are solid bodies.
Ok, to round up, one last one that also surprised me. Everybody knows that Saturn is the planet with the rings around it, yes? But did you know that it is not the only one? Turns out, there are a couple of planets that have rings:
SELECT name
FROM planets
WHERE has_ring_system='Y';
NAME
__________
Jupiter
Saturn
Uranus
Neptune
Final words
I hope that you enjoy the addition of our solar system data to the data sets just as much as I do.
A special Thanks goes to Dr. David R. Williams who not only responded to my initial request but also took the time to answer some of my amateur astronomy questions!