This list contains tools I have used before but do not use on a daily basis. For the tools I use every day, check out my Daily Software.

Data Prep

Tabula

Tabula is a handy tool to extract data from PDF files.

If you’ve ever tried to do anything with data provided to you in PDFs, you know how painful this is: you can’t easily copy-and-paste rows of data out of PDF files. Tabula allows you to extract that data in CSV format, through a simple web interface. - website
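Once Tabula has given you a CSV, the next step is usually to load it into a script. Here is a minimal sketch using only Python's standard library. It assumes the export has a header row; the function name `load_table` and the sample data are made up for illustration and are not part of Tabula.

```python
import csv
import io


def load_table(csv_text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    reader = csv.reader(io.StringIO(csv_text))
    # Assumption: the first row of the export holds the column names.
    header = [name.strip() for name in next(reader)]
    rows = []
    for fields in reader:
        cells = [cell.strip() for cell in fields]
        # Skip rows where every cell is empty.
        if not any(cells):
            continue
        # zip stops at the shorter sequence, so extra cells are dropped.
        rows.append(dict(zip(header, cells)))
    return rows


# Hypothetical sample standing in for a Tabula export.
sample = "city,population\nSpringfield, 30000\n,\nShelbyville,25000\n"

if __name__ == "__main__":
    for row in load_table(sample):
        print(row)
```

To read a real export, pass the contents of the file instead of `sample`, for example `load_table(open("export.csv", encoding="utf-8").read())`, where `export.csv` is whatever name you saved the file under.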

Reporting

Tableau Public

Tableau, in my experience, is probably the fastest way to either map geographical data or make a beautiful interactive dashboard. Just import a file or hook up a database connection and start dragging and dropping data points onto plots.

Platforms

KNIME

I have used KNIME to do basic data transformation, automate ETL pipelines, and build full reporting web apps. Unfortunately, you need to pay to host KNIME Server somewhere to take full advantage of web access to workflows, but you can still do everything else with the desktop version.

Programming

The software above is mostly drag-and-drop. To really break into the data science world, I recommend learning a programming language, and Python in particular. See my notes on it below, and pick up some skills on a site like Codecademy.

Anaconda

Want to start with Python? Just download Anaconda.

With over 4.5 million users, Anaconda is the world’s most popular Python data science platform. Anaconda, Inc. continues to lead open source projects like Anaconda, NumPy and SciPy that form the foundation of modern data science. Anaconda’s flagship product, Anaconda Enterprise, allows organizations to secure, govern, scale and extend Anaconda to deliver actionable insights that drive businesses and industries forward. - website
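After installing, a quick sanity check is to confirm that the data science packages are visible to your Python interpreter. The sketch below uses only the standard library. The helper name `missing_packages` is my own, and the package list is an assumption: NumPy and SciPy are mentioned above, while pandas and matplotlib are added here as typical examples.

```python
import importlib.util


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    # find_spec looks a module up without importing it and returns None
    # when a top-level module cannot be found.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed list of packages to check; adjust to taste.
    wanted = ["numpy", "scipy", "pandas", "matplotlib"]
    missing = missing_packages(wanted)
    if missing:
        print("Not found:", ", ".join(missing))
    else:
        print("All packages found.")
```

If anything shows up as not found, the script is probably running under a different Python than the one Anaconda installed.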

Jupyter

If you installed Anaconda, you have Jupyter. Now just fire up a notebook from the command line with jupyter notebook, navigate to localhost:8888, and start programming, visualizing, and taking notes (e.g. in Markdown, HTML, or LaTeX). For more, see Jupyter Quickstart. Note that you can program in languages other than Python by installing different kernels.

The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine learning, and much more. - website
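To make the "documents that contain live code ... and narrative text" idea concrete: a notebook file is just JSON holding a list of cells. The sketch below builds a two-cell notebook by hand, assuming the version 4 notebook layout. The function names and the file name `first_notebook.ipynb` are hypothetical, and in practice you would normally create notebooks from the Jupyter interface rather than like this.

```python
import json


def build_notebook(markdown_text, code_text):
    """Return a dict with one markdown cell followed by one code cell."""
    return {
        "nbformat": 4,
        "nbformat_minor": 4,
        "metadata": {},
        "cells": [
            # Narrative text lives in a markdown cell.
            {"cell_type": "markdown", "metadata": {}, "source": markdown_text},
            # Code cells also carry an execution counter and their outputs,
            # both empty here because the cell has never been run.
            {
                "cell_type": "code",
                "metadata": {},
                "execution_count": None,
                "outputs": [],
                "source": code_text,
            },
        ],
    }


def write_notebook(path, notebook):
    """Save the notebook dict as a JSON file."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(notebook, handle, indent=1)


if __name__ == "__main__":
    notebook = build_notebook("# My notes", "print(1 + 1)")
    write_notebook("first_notebook.ipynb", notebook)
```

Running the script writes the file to the current folder, where it should show up in the file list at localhost:8888.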

Rodeo

Rodeo is a nice Python IDE, basically RStudio for Python.

Rodeo is a development environment that’s lightweight and intuitive, yet customizable to its core - your own personal home base for exploring and interpreting data. - website