Some interesting things I learned today on my road to a successful career in data science:
- Correlation does not imply causation.
- If X predicts Y, it does not mean that X causes Y.
- Prediction is hard, especially about the future.
- Causal effects are usually estimated as averages over a population, so they may not hold for every individual.
- The most important thing in data science is the question; the second most important is the data.
- Often the data will limit or enable the questions, but having data alone won't save you if you don't have a question.
- Beware of data dredging: if you test enough hypotheses on the same data, some will look significant by chance alone.
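
To see for myself why prediction is not causation, I put together a small simulation. It is only a sketch under made-up assumptions: the hidden variable `z`, the noise levels, and the sample size are all invented for illustration. `x` and `y` are both driven by `z`, so they are correlated, but setting `x` by hand (an "intervention") leaves `y` untouched.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient, standard library only."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    var_x = sum((a - mean_x) ** 2 for a in xs)
    var_y = sum((b - mean_y) ** 2 for b in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000

# Hypothetical hidden common cause (a confounder).
z = [rng.gauss(0, 1) for _ in range(n)]

# Observational world: x and y both depend on z, never on each other.
x = [zi + rng.gauss(0, 1) for zi in z]
y = [zi + rng.gauss(0, 1) for zi in z]
r_observed = pearson(x, y)  # clearly positive: x "predicts" y

# Interventional world: we set x ourselves, ignoring z.
# y is generated exactly as before, because x was never one of its causes.
x_do = [rng.gauss(0, 1) for _ in range(n)]
y_do = [zi + rng.gauss(0, 1) for zi in z]
r_intervened = pearson(x_do, y_do)  # close to zero

print(f"observed correlation:   {r_observed:.2f}")
print(f"after setting x by hand: {r_intervened:.2f}")
```

With these settings the observed correlation should come out near 0.5 and the post-intervention one near 0, which is the whole point: `x` is a decent predictor of `y` and still has no effect on it.
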
"The data may not contain the answer. The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data... no matter how big the data are."
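
The data dredging warning can be made concrete with another small sketch. Everything here is an assumption for illustration: 200 candidate predictors, 30 observations, all pure random noise. The cutoff of 0.361 is the approximate two-sided 5% critical value of the Pearson correlation for 30 observations.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient, standard library only."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    var_x = sum((a - mean_x) ** 2 for a in xs)
    var_y = sum((b - mean_y) ** 2 for b in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(1)  # fixed seed so the run is repeatable
n_obs = 30          # observations per variable (illustrative)
n_predictors = 200  # number of hypotheses we "dredge" through (illustrative)
critical_r = 0.361  # approx. |r| needed for p < 0.05 when n_obs = 30

# The outcome is pure noise: there is nothing real to find.
outcome = [rng.gauss(0, 1) for _ in range(n_obs)]

# Test every noise predictor against the same outcome.
false_hits = 0
for _ in range(n_predictors):
    predictor = [rng.gauss(0, 1) for _ in range(n_obs)]
    if abs(pearson(predictor, outcome)) > critical_r:
        false_hits += 1

print(f"'significant' predictors found in pure noise: {false_hits} of {n_predictors}")
```

About 5% of the tests (roughly 10 of 200) are expected to clear the bar even though no relationship exists. Reporting only those hits, without saying how many tests were run, is what data dredging looks like in practice.
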