High-Dimensional Data Visualization

Unfortunately, our imagination sucks once you go beyond 3 dimensions.

Therefore, for “high-dimensional data visualization” you can adjust one of two things: either the visualization or the data.

Adjusting the visualization:

You can use color, shape, size, and other properties of 2D and 3D objects to encode additional dimensions. This lets you go further in high-dimensional visualization, but with more than about 6 dimensions the result becomes hard to understand.
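As a minimal sketch of this idea (with synthetic, illustrative data), a single 2D scatter plot can carry five dimensions at once: two via position, one via color, one via marker size, and one via marker shape:

```python
# Encoding 5 dimensions of synthetic data in one 2D scatter plot:
# position (x, y), color, size, and marker shape. The data here is
# randomly generated purely for illustration.
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n = 100
x, y = rng.normal(size=n), rng.normal(size=n)   # dims 1-2: position
color = rng.uniform(size=n)                     # dim 3: color
size = 20 + 200 * rng.uniform(size=n)           # dim 4: marker size
shape = rng.integers(0, 2, size=n)              # dim 5: marker shape

fig, ax = plt.subplots()
for marker, mask in (("o", shape == 0), ("^", shape == 1)):
    sc = ax.scatter(x[mask], y[mask], c=color[mask], s=size[mask],
                    marker=marker, cmap="viridis", vmin=0, vmax=1)
fig.colorbar(sc, ax=ax, label="dimension 3")
fig.savefig("five_dims.png")
```

Beyond this point the channels start to interfere with each other (size masks shape, color is hard to read on small markers), which is exactly why six or so dimensions is a practical ceiling.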

There are also visualization techniques designed specifically for high-dimensional data, such as:

  • Parallel Coordinates
  • Glyph Plot
  • Andrews Plot
  • Scatterplot Matrix
  • Arc Diagram
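As a quick taste of the first technique, here is a minimal parallel-coordinates sketch using pandas and the Iris dataset (the dataset choice is just an illustrative assumption):

```python
# Parallel coordinates: one vertical axis per feature, one polyline per
# sample. Lines of the same class tend to bundle together visually.
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
from pandas.plotting import parallel_coordinates
from sklearn.datasets import load_iris

iris = load_iris(as_frame=True)
df = iris.frame.rename(columns={"target": "species"})
df["species"] = df["species"].map(dict(enumerate(iris.target_names)))

ax = parallel_coordinates(df, class_column="species", colormap="viridis")
ax.set_title("Iris: one polyline per sample, one axis per feature")
plt.gcf().savefig("parallel_coords.png")
```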

For more on these, I strongly recommend reading the paper “A Tour Through the Visualization Zoo”.

Another solution is to change the data and rely on traditional 2D and 3D visualization techniques (e.g. the scatterplot).

Other techniques are mentioned in the answer by Lors Soren.

Adjusting the data:

Here you have basically 3 options:

  • Feature Selection
  • Feature Extraction
  • Manifold Learning

Feature Selection

Feature selection lets you choose the “best” features from your data, which you then visualize. “Best” can be defined by your cost function; for example, it could be the “informativeness” of a particular feature under the Information Gain criterion.

Feature Extraction

Feature extraction lets you create new features; you can project your data into the space of these new features and then visualize them.

A popular and quite basic feature-extraction method is PCA, which, based on the SVD, chooses the directions of the eigenvectors corresponding to the largest eigenvalues. Essentially, it first picks the most informative direction onto which to project the data, then picks the second direction orthogonal to the first, and so on for as many dimensions as you want.
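A minimal sketch of PCA-based feature extraction with scikit-learn (Iris is an illustrative choice): project the 4D data onto the two directions with the largest eigenvalues and draw the result as an ordinary 2D scatter plot.

```python
# PCA: extract the 2 most informative orthogonal directions, project the
# data onto them, and visualize the projection in 2D.
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, y = load_iris(return_X_y=True)
pca = PCA(n_components=2)       # keep the top-2 eigenvector directions
X_2d = pca.fit_transform(X)     # shape: (150, 2)

fig, ax = plt.subplots()
ax.scatter(X_2d[:, 0], X_2d[:, 1], c=y, cmap="viridis")
ax.set_xlabel("PC 1")
ax.set_ylabel("PC 2")
fig.savefig("iris_pca.png")
print(pca.explained_variance_ratio_)  # how much variance each PC keeps
```

Checking `explained_variance_ratio_` tells you how much information the 2D picture actually retains from the original space.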

Manifold Learning

One option, t-SNE, was mentioned by William Chen; it is very popular and widely used. Others are worth mentioning too, such as MDS or Spectral Embedding.

Comparisons of different manifold learning techniques reducing data from 3D to 2D show that many of them also work really nicely in higher dimensions.
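Such a comparison can be sketched with scikit-learn; here all three methods mentioned above embed a small sample of the 64-dimensional digits dataset into 2D (the dataset and subset size are illustrative assumptions, chosen so t-SNE stays fast):

```python
# Compare three manifold-learning methods: each maps the same 64D data
# down to 2D coordinates that can be drawn with a plain scatter plot.
from sklearn.datasets import load_digits
from sklearn.manifold import MDS, TSNE, SpectralEmbedding

X, y = load_digits(return_X_y=True)
X, y = X[:200], y[:200]  # small subset to keep the slower methods quick

embeddings = {
    "t-SNE": TSNE(n_components=2, perplexity=30, random_state=0),
    "MDS": MDS(n_components=2, random_state=0),
    "Spectral": SpectralEmbedding(n_components=2, random_state=0),
}
results = {name: est.fit_transform(X) for name, est in embeddings.items()}
for name, emb in results.items():
    print(name, emb.shape)  # each embedding is (200, 2)
```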


Author: Aditya Bhuyan

I am an IT Professional with close to two decades of experience. I mostly work in open source application development and cloud technologies. I have expertise in Java, Spring and Cloud Foundry.
