Principal Component Analysis (PCA) Reduction

PCA (Principal Component Analysis) reduction is a technique for shrinking high-dimensional vectors into fewer dimensions while preserving as much of the important information (the variance between the vectors) as possible.

PCA reduction takes vectors of n dimensions, for example 512, down to just 2, which you can then save to CSV. You can reduce to any smaller number of dimensions; for example, 3072 can be reduced to 384. Here we used 2 because it's easy to pop into a CSV and plot.
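In matrix terms, the reduction can be sketched roughly as below. The symbols are notation introduced here for illustration, not something from the original post: X holds m vectors of n dimensions as rows, and k is the target number of dimensions. It follows the center-then-SVD approach that the PcaCsvExporter code in this post uses.

```latex
\begin{aligned}
\bar{x} &= \frac{1}{m}\sum_{i=1}^{m} x_i
  && \text{mean of the } m \text{ row vectors of } X \in \mathbb{R}^{m \times n} \\
X_c &= X - \mathbf{1}\,\bar{x}^{\top}
  && \text{center every column on zero} \\
X_c &= U \Sigma V^{\top}
  && \text{singular value decomposition} \\
Z &= X_c V_k \in \mathbb{R}^{m \times k}
  && V_k = \text{first } k \text{ columns of } V
\end{aligned}
```

For the example in this post, m = 9 words, n = 512 and k = 2, so each row of Z is one (x, y) pair for the CSV.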

CSV Example Data

This can then be plotted on a graph using tools like https://app.flourish.studio/

Plotted Example Data

So although the x,y values by themselves don't mean anything, once plotted on a graph we can see the groupings: related words sit close together, so a meaning can be inferred.

Example Code

Using the values above, create a collection of 512-dimension vectors with OpenAI's text-embedding-3-small model.

using Microsoft.Extensions.AI;
using OpenAI.Embeddings;

var apiKey = "REDACTED-KEY";
var vectors = new List<(string Word, float[] Vector)>();

// Wrap the OpenAI embedding client in the Microsoft.Extensions.AI abstraction.
var embedder = new EmbeddingClient(
        model: "text-embedding-3-small",
        apiKey: apiKey)
    .AsIEmbeddingGenerator();

var words = new[] { "cat", "mouse", "lion", "tiger", "helicopter", "train", "blue", "carrot", "space" };
for (var i = 0; i < words.Length; i++)
{
    // One request per word, asking for a 512-dimension embedding.
    // The options type is fully qualified because both imported
    // namespaces define an EmbeddingGenerationOptions.
    var embedding = await embedder.GenerateAsync(
        [words[i]],
        new Microsoft.Extensions.AI.EmbeddingGenerationOptions
        {
            Dimensions = 512
        });

    // Keep the word next to its raw float vector.
    var vector = embedding[0].Vector.ToArray();
    vectors.Add((words[i], vector));
}

To pop these into a CSV like the one above, run them through PcaCsvExporter:

var csvPath = Chatbot.API.PcaCsvExporter.SaveReducedVectorsToCsv(vectors);
Console.WriteLine($"Saved PCA CSV to: {csvPath}");

The PCA code itself was 100% vibe coded, but it worked, so I moved on with my life :D

using System.Globalization;
using System.IO;
using System.Linq;
using MathNet.Numerics.LinearAlgebra;

namespace Chatbot.API;

public static class PcaCsvExporter
{
    public static string SaveReducedVectorsToCsv(
        IReadOnlyList<(string Word, float[] Vector)> vectors,
        string? outputDirectory = null)
    {
        if (vectors.Count < 2)
        {
            throw new ArgumentException("At least two vectors are required for PCA.", nameof(vectors));
        }

        var featureCount = vectors[0].Vector.Length;
        if (featureCount < 2)
        {
            throw new ArgumentException("Vectors must have at least two dimensions.", nameof(vectors));
        }

        if (vectors.Any(item => item.Vector.Length != featureCount))
        {
            throw new ArgumentException("All vectors must have the same dimensionality.", nameof(vectors));
        }

        // Copy the float vectors into a (samples x features) matrix of doubles.
        var sampleCount = vectors.Count;
        var matrix = Matrix<double>.Build.Dense(sampleCount, featureCount);

        for (var row = 0; row < sampleCount; row++)
        {
            for (var column = 0; column < featureCount; column++)
            {
                matrix[row, column] = vectors[row].Vector[column];
            }
        }

        var centeredMatrix = CenterColumns(matrix);
        var projected = ProjectToTwoDimensions(centeredMatrix);

        var directory = outputDirectory ?? Directory.GetCurrentDirectory();
        Directory.CreateDirectory(directory);

        // A random file name means repeated runs never overwrite each other.
        var filePath = Path.Combine(directory, $"{Guid.NewGuid()}.csv");
        using var writer = new StreamWriter(filePath);

        writer.WriteLine("Word,X,Y");

        for (var row = 0; row < sampleCount; row++)
        {
            // Invariant culture keeps '.' as the decimal separator on any locale.
            // Note: the word is written as-is, so a word containing a comma would break the row.
            writer.WriteLine(string.Create(
                CultureInfo.InvariantCulture,
                $"{vectors[row].Word},{projected[row, 0]:0.00},{projected[row, 1]:0.00}"));
        }

        return filePath;
    }

    // Subtracts each column's mean so every feature is centered on zero.
    private static Matrix<double> CenterColumns(Matrix<double> matrix)
    {
        var centered = matrix.Clone();

        for (var column = 0; column < centered.ColumnCount; column++)
        {
            var mean = centered.Column(column).Average();

            for (var row = 0; row < centered.RowCount; row++)
            {
                centered[row, column] -= mean;
            }
        }

        return centered;
    }

    // Projects the centered data onto its first two principal directions.
    private static Matrix<double> ProjectToTwoDimensions(Matrix<double> centeredMatrix)
    {
        var svd = centeredMatrix.Svd(computeVectors: true);

        // The columns of V (the rows of VT) are the principal directions,
        // ordered by decreasing singular value; keep the first two.
        var rightSingularVectors = svd.VT.Transpose();
        var principalComponents = rightSingularVectors.SubMatrix(0, rightSingularVectors.RowCount, 0, 2);

        return centeredMatrix * principalComponents;
    }
}
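Two dimensions will always throw some information away, and one way to see how much is to look at the explained variance ratio of each component. The snippet below is only a rough sketch, assuming the same MathNet.Numerics package as the exporter: ExplainedVarianceRatios is a hypothetical helper that is not part of PcaCsvExporter, and the toy matrix is made-up data rather than real embeddings.

```csharp
using System;
using System.Linq;
using MathNet.Numerics.LinearAlgebra;

// Made-up toy data (4 samples x 3 features), NOT the embeddings from this post.
var data = Matrix<double>.Build.DenseOfArray(new double[,]
{
    { 2.0, 0.0, 1.0 },
    { 4.0, 1.0, 1.0 },
    { 6.0, 0.0, 1.0 },
    { 8.0, 1.0, 1.0 },
});

var ratios = ExplainedVarianceRatios(data);
for (var i = 0; i < ratios.Length; i++)
{
    Console.WriteLine($"Component {i + 1}: {ratios[i]:P1} of the variance");
}

// Hypothetical helper: share of the total variance captured by each principal component.
static double[] ExplainedVarianceRatios(Matrix<double> samples)
{
    // Center each column on zero, as the exporter does before its SVD.
    var columnMeans = samples.ColumnSums() / samples.RowCount;
    var centered = samples.Clone();
    for (var row = 0; row < centered.RowCount; row++)
    {
        centered.SetRow(row, samples.Row(row) - columnMeans);
    }

    // Squared singular values are proportional to the variance along each component.
    var singularValues = centered.Svd(computeVectors: false).S;
    var squared = singularValues.Select(s => s * s).ToArray();
    var total = squared.Sum();
    if (total == 0.0)
    {
        throw new ArgumentException("All samples are identical, so there is no variance to explain.", nameof(samples));
    }

    return squared.Select(value => value / total).ToArray();
}
```

Adding up the first two ratios gives the fraction of the variance that an X,Y projection keeps. If that number is low for a real set of embeddings, the plot can still be a handy picture, but the distances on it should be read with more caution.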

The resulting file was:

Word,X,Y
cat,-0.37,0.22
mouse,0.06,0.29
lion,-0.23,0.33
tiger,-0.29,0.20
helicopter,-0.15,-0.48
train,-0.12,-0.43
blue,0.52,0.21
carrot,0.02,-0.32
space,0.56,-0.04
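To put a number on the grouping you see in the plot, you can measure the straight-line distance between points. This is a small sketch using four rows copied from the file above; the Distance helper and the choice of Euclidean distance are my own assumptions for illustration, not part of the exporter.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// X,Y values copied from the CSV above.
var points = new Dictionary<string, (double X, double Y)>
{
    ["cat"] = (-0.37, 0.22),
    ["tiger"] = (-0.29, 0.20),
    ["helicopter"] = (-0.15, -0.48),
    ["train"] = (-0.12, -0.43),
};

PrintDistance("cat", "tiger");        // cat -> tiger: 0.08
PrintDistance("helicopter", "train"); // helicopter -> train: 0.06
PrintDistance("cat", "helicopter");   // cat -> helicopter: 0.73

void PrintDistance(string first, string second)
{
    var distance = Distance(points[first], points[second]);
    Console.WriteLine($"{first} -> {second}: {distance.ToString("0.00", CultureInfo.InvariantCulture)}");
}

// Hypothetical helper: Euclidean distance between two 2D points.
static double Distance((double X, double Y) a, (double X, double Y) b)
{
    var dx = a.X - b.X;
    var dy = a.Y - b.Y;
    return Math.Sqrt(dx * dx + dy * dy);
}
```

The animals sit close to each other, the vehicles sit close to each other, and the two groups are far apart. Bear in mind these are distances in the reduced 2D space, so they only approximate how close the words are in the original 512 dimensions.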