Trying and failing to interpret sentence embeddings
I was born with congenital anosmia, i.e., I cannot and have never been able to smell. Farts, flowers, cookies, and perfume: I have no personal experience of any of these smells. Yet I can tell you that farts take over a room and cookies smell like home. This is all picked up through context. It’s picked up from watching my friends retch at the stench of the small animal that died in the vents of my middle school. For me, a smell is defined by its relation to other smells and the emotive descriptions of others. This is not altogether different from a sentence embedding.
I want to eventually build a system that can help me interpret smells. Now, I know I could ask an LLM (or a friend) to describe the smell of something, but I do wonder if vector addition could provide some unexpected insights. Which smells are quite similar but distant in context? I’d also like to try using reduced vectors to generate music or some other synesthetic output.
In this post, I’m going to explore vector addition and vector rotations as a means of modifying and interpreting these embeddings. My explorations are (mostly) a failure, although hopefully my process will save someone else some time.
If you have any ideas or corrections, please email me at ted@timbrell.dev
Background
My inspiration for this comes from “Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings”. Except in this case, I’m using sentence embeddings rather than word embeddings. If I want to embed smells, I’ll need to be able to input something like “the smell of red wine.”
from openai import OpenAI
import numpy as np
import heapq
import pandas as pd
import os
openai_client = OpenAI()
def get_embedding_openai(names):
    response = openai_client.embeddings.create(
        input=names, model="text-embedding-3-small"
    )
    return np.array([d.embedding for d in response.data])

king, queen, man, woman, prince, princess = get_embedding_openai(
    [
        "king of england",
        "queen of england",
        "man",
        "woman",
        "prince of england",
        "princess of england",
    ]
)
son, daughter, actor, actress, steward, stewardess = get_embedding_openai([
    "son",
    "daughter",
    "actor",
    "actress",
    "steward",
    "stewardess",
])
Q: Wait aren’t you supposed to be using smells?
It’s pretty hard to reason about something you can’t experience. I might know fresh coffee smells good in the morning but I can’t tell how similar/dissimilar that is to the smell of dew in the morning. I’ll get to smells in a later post.
I’m already making a jump from word embeddings to sentence embeddings so I believe it’s worth revisiting gender before moving on. It also has the benefit of being easy to generate examples for and is built into the English language. I’ll be using cosine similarity and Euclidean distance to get a sense of the distance between the vectors.
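def cosine_similarity(vec1, vec2):
    vec1 = np.array(vec1)
    vec2 = np.array(vec2)
    dot_product = np.dot(vec1, vec2)
    magnitude_vec1 = np.linalg.norm(vec1)
    magnitude_vec2 = np.linalg.norm(vec2)
    # Guard against division by zero for zero-length vectors.
    if magnitude_vec1 == 0 or magnitude_vec2 == 0:
        return 0.0
    return dot_product / (magnitude_vec1 * magnitude_vec2)

def euc_dist(a, b):
    # Note: this sums absolute differences, i.e. the L1 (Manhattan) distance,
    # not the L2 norm. The distances reported in this post use this definition.
    return sum(abs(a - b))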
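One thing worth keeping in mind when reading the distance numbers: the function above sums absolute differences, which is the L1 (Manhattan) distance. For unit-length vectors the L2 distance is tied directly to cosine similarity by d = sqrt(2 - 2 * cos), so it would carry no extra information, whereas the L1 figure does. A minimal sketch on random unit vectors (not real embeddings; the dimension of 1536 is an assumption matching the default size of text-embedding-3-small):

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 1536  # assumed embedding size; any dimension works for this check

a = rng.normal(size=dim)
b = rng.normal(size=dim)
a /= np.linalg.norm(a)
b /= np.linalg.norm(b)

cos = float(np.dot(a, b))
l2 = float(np.linalg.norm(a - b))    # true Euclidean distance
l1 = float(np.sum(np.abs(a - b)))    # what sum(abs(a - b)) computes

# For unit vectors: ||a - b||^2 = 2 - 2 * cos(a, b)
l2_from_cos = float(np.sqrt(2.0 - 2.0 * cos))

print(f"{cos=:.4f} {l2=:.4f} {l2_from_cos=:.4f} {l1=:.4f}")
```

Plugging the king/queen cosine similarity of 0.756197 into the same identity gives roughly 0.698, which agrees with the king - queen offset magnitude printed further down.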
Simple example: vector offsets and addition
Let’s try to get the vector for “King” from the vector for “Queen”.
male_offset = man - woman
added_queen = queen + male_offset
print(f"{cosine_similarity(king, queen)=}")
print(f"{cosine_similarity(king, added_queen)=}")
cosine_similarity(king, queen)=np.float64(0.7561968293567973)
cosine_similarity(king, added_queen)=np.float64(0.7436281583952487)
print(f"{euc_dist(king, queen)=}")
print(f"{euc_dist(king, added_queen)=}")
euc_dist(king, queen)=np.float64(21.68610072977549)
euc_dist(king, added_queen)=np.float64(23.88454184561374)
Well, that’s annoying. Unlike what I’d expect from the word embedding paper, the vector for “Queen” plus the gender offset is further away from the vector for “King” both in angle and Euclidean distance.
I’m also surprised by just how little the similarity metrics moved. Then again, the geometry is unclear here. The vector offset might be going in the wrong direction or under/overshooting.
f"man - woman offset magnitude: {np.linalg.norm(male_offset)}", f"King - queen offset magnitude {np.linalg.norm(king - queen)}"
('man - woman offset magnitude: 0.7648199511941038',
'King - queen offset magnitude 0.6982881772713763')
So it’s moving, roughly, the same distance as it would need to reach the “king” vector.
f"{np.arccos(cosine_similarity(added_queen, queen))} radians between added_queen and queen"
f"{np.arccos(cosine_similarity(king, queen))} radians between king and queen"
'0.7318444725115928 radians between added_queen and queen'
'0.7133150836556438 radians between king and queen'
And changing the angle by roughly the same amount as expected… just not in the right direction.
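One way to make "not in the right direction" concrete is to split the applied offset into the part that points along the needed direction (target minus source) and the part orthogonal to it. The sketch below uses made-up 3-D vectors rather than the real embeddings, and decompose_offset is a hypothetical helper, not something from the code above.

```python
import numpy as np

def decompose_offset(source, target, offset):
    """Split `offset` into components along and orthogonal to (target - source)."""
    needed = target - source
    needed_unit = needed / np.linalg.norm(needed)
    along = float(np.dot(offset, needed_unit))   # signed length of the useful part
    orthogonal = offset - along * needed_unit    # part that moves us sideways
    return along, float(np.linalg.norm(orthogonal))

# Toy stand-ins for queen, king, and (man - woman); the values are made up.
source = np.array([1.0, 0.0, 0.0])
target = np.array([1.0, 1.0, 0.0])
offset = np.array([0.0, 0.5, 0.5])

along, sideways = decompose_offset(source, target, offset)
print(f"useful component: {along:.3f}, sideways component: {sideways:.3f}")
```

With the real vectors, a cosine similarity of about 0.456 between the man - woman and king - queen offsets (reported in the table that follows) would mean most of the offset's length is spent sideways.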
Let’s take a look at the cosine similarity between these gendered offsets
gender_vectors = [
    man - woman,
    king - queen,
    prince - princess,
    son - daughter,
    actor - actress,
    steward - stewardess,
]
# Normalize each offset by its own length (cosine similarity is unaffected by scale).
for idx in range(len(gender_vectors)):
    gender_vectors[idx] /= np.linalg.norm(gender_vectors[idx])
res = np.zeros(shape=(len(gender_vectors), len(gender_vectors)))
for r in range(len(gender_vectors)):
    for c in range(len(gender_vectors)):
        res[r, c] = cosine_similarity(gender_vectors[r], gender_vectors[c])
pd.DataFrame(res)
offset index | 0 | 1 | 2 | 3 | 4 | 5 |
---|---|---|---|---|---|---|
0 | 1.000000 | 0.455847 | 0.438728 | 0.244890 | 0.276235 | 0.214461 |
1 | 0.455847 | 1.000000 | 0.657469 | 0.222574 | 0.386973 | 0.355544 |
2 | 0.438728 | 0.657469 | 1.000000 | 0.229321 | 0.337566 | 0.385467 |
3 | 0.244890 | 0.222574 | 0.229321 | 1.000000 | 0.154418 | 0.111242 |
4 | 0.276235 | 0.386973 | 0.337566 | 0.154418 | 1.000000 | 0.232568 |
5 | 0.214461 | 0.355544 | 0.385467 | 0.111242 | 0.232568 | 1.000000 |
Despite the thought that these are just gendered versions of the same concept… the offsets point in quite different directions. son - daughter differs from steward - stewardess by 1.46 radians (or 84 degrees).
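The angle quoted here comes straight from the cosine similarity in the table. As a quick check of the arithmetic, using the rounded table value:

```python
import numpy as np

# (son - daughter) vs (steward - stewardess), row 3 / column 5 of the table above
cos_sim = 0.111242

angle_rad = float(np.arccos(cos_sim))
angle_deg = float(np.degrees(angle_rad))
print(f"{angle_rad:.2f} rad, {angle_deg:.1f} deg")
```

Anything near 90 degrees means the two offsets are close to orthogonal, i.e. they share almost no common direction.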
Rotation
I’m not up to date on research into embeddings, but I find the use of vector addition for these analyses odd. I know that these vectors are generated through a series of additions and activations, but if these models normalize everything to a unit vector and compare everything with cosine similarity, are we not inherently saying that it’s the angles that matter?
To that end, what if I rotate the “queen” vector along the plane spanned by the “man” and “woman” vectors? The vectors for “king” and “queen” have to be offset from our vectors for “man” and “woman”. Sentence embeddings capture more concepts than the N dimensions of the vector could represent one per axis, so some correlation between axes is required to encode everything. A rotation, while more expensive, could help when there is an angular difference between the initial vector pair and the compared vector pair.
Below, I try rotating our “queen” vector with the rotation matrix found from getting to “man” from “woman”.
def compute_nd_rotation_matrix(a, b):
    a_norm = a / np.linalg.norm(a)
    b_norm = b / np.linalg.norm(b)
    cos_theta = np.dot(a_norm, b_norm)
    cos_theta = np.clip(cos_theta, -1.0, 1.0)
    angle = np.arccos(cos_theta)
    # Component of b orthogonal to a; together with a_norm it spans the rotation plane.
    v = b_norm - np.dot(b_norm, a_norm) * a_norm
    v_norm = np.linalg.norm(v)
    if v_norm < 1e-8:  # a and b are collinear
        # Return a (matrix, angle) pair so callers can always unpack two values.
        return np.eye(len(a)), 0.0
    v = v / v_norm
    identity = np.eye(len(a))
    outer_aa = np.outer(a_norm, a_norm)
    outer_av = np.outer(a_norm, v)
    outer_va = np.outer(v, a_norm)
    outer_vv = np.outer(v, v)
    R = (
        identity
        + np.sin(angle) * (outer_va - outer_av)
        + (np.cos(angle) - 1) * (outer_vv + outer_aa)
    )
    return R, angle

gender_rotation, gender_angle = compute_nd_rotation_matrix(woman, man)
rotated_queen = np.dot(gender_rotation, queen)
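A quick sanity check on this construction, sketched with random vectors rather than embeddings: the matrix should be orthogonal, should carry the direction of a onto the direction of b, and should leave anything orthogonal to the plane untouched. plane_rotation is a compact restatement of the same formula written only for this check.

```python
import numpy as np

def plane_rotation(a, b):
    """Rotation in span(a, b) that turns the direction of a into the direction of b."""
    a_hat = a / np.linalg.norm(a)
    b_hat = b / np.linalg.norm(b)
    angle = np.arccos(np.clip(np.dot(a_hat, b_hat), -1.0, 1.0))
    v = b_hat - np.dot(b_hat, a_hat) * a_hat  # part of b orthogonal to a
    v /= np.linalg.norm(v)                    # assumes a and b are not collinear
    R = (
        np.eye(len(a))
        + np.sin(angle) * (np.outer(v, a_hat) - np.outer(a_hat, v))
        + (np.cos(angle) - 1.0) * (np.outer(a_hat, a_hat) + np.outer(v, v))
    )
    return R, a_hat, b_hat, v

rng = np.random.default_rng(1)
a, b, x = rng.normal(size=(3, 8))
R, a_hat, b_hat, v = plane_rotation(a, b)

# Remove the in-plane part of x so that it is orthogonal to the rotation plane.
x_perp = x - np.dot(x, a_hat) * a_hat - np.dot(x, v) * v

is_orthogonal = np.allclose(R.T @ R, np.eye(8))
maps_a_to_b = np.allclose(R @ a_hat, b_hat)
keeps_out_of_plane = np.allclose(R @ x_perp, x_perp)
print(is_orthogonal, maps_a_to_b, keeps_out_of_plane)
```

The last property is the reason the rotation can only ever change the part of "queen" that lies in the man/woman plane.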
def highlight_max(s):
    is_max = s == s.max()
    return ["font-weight: bold" if v else "" for v in is_max]

def highlight_min(s):
    is_min = s == s.min()
    return ["font-weight: bold" if v else "" for v in is_min]

def compute_results(*, target, source, offset, rotation):
    target_norm = target / np.linalg.norm(target)
    source_norm = source / np.linalg.norm(source)
    added_source = source_norm + offset
    added_source /= np.linalg.norm(added_source)
    rotated_source = np.dot(rotation, source_norm)
    rotated_vector_metrics = {
        "cosine_similarity": cosine_similarity(target_norm, rotated_source),
        "euclidean_distance": euc_dist(target_norm, rotated_source),
    }
    summed_vector_metrics = {
        "cosine_similarity": cosine_similarity(target_norm, added_source),
        "euclidean_distance": euc_dist(target_norm, added_source),
    }
    original_vector_metrics = {
        "cosine_similarity": cosine_similarity(target_norm, source_norm),
        "euclidean_distance": euc_dist(target_norm, source_norm),
    }
    df = pd.DataFrame(
        {
            "Original Vector": original_vector_metrics,
            "Summed Vector": summed_vector_metrics,
            "Rotated Vector": rotated_vector_metrics,
        }
    ).T
    return df

def style_results(df):
    styled_df = df.style.apply(highlight_max, subset=["cosine_similarity"])
    styled_df.apply(highlight_min, subset=["euclidean_distance"])
    return styled_df

style_results(
    compute_results(
        target=king, source=queen, offset=male_offset, rotation=gender_rotation
    )
)
Queen -> King | cosine_similarity | euclidean_distance |
---|---|---|
Original Vector | 0.756197 | 21.686100 |
Summed Vector | 0.743628 | 22.333814 |
Rotated Vector | 0.800727 | 19.759377 |
np.arccos(0.756197)- np.arccos(0.800727)
np.float64(0.07102636138332474)
The rotation helps! Though, it only moves the vector 0.07 radians (4 degrees) closer.
Let’s try with other gendered titles.
Below I try with “prince” and “princess”,
style_results(compute_results(target=prince, source=princess, rotation=gender_rotation, offset=male_offset))
Princess -> Prince | cosine_similarity | euclidean_distance |
---|---|---|
Original Vector | 0.798122 | 19.734372 |
Summed Vector | 0.752733 | 21.830848 |
Rotated Vector | 0.831910 | 17.951380 |
This yields similar results, although this is just another title for royalty.
style_results(compute_results(target=son, source=daughter, rotation=gender_rotation, offset=male_offset))
Daughter -> Son | cosine_similarity | euclidean_distance |
---|---|---|
Original Vector | 0.506902 | 30.924019 |
Summed Vector | 0.505972 | 31.099625 |
Rotated Vector | 0.519920 | 30.628173 |
style_results(compute_results(target=actor, source=actress, rotation=gender_rotation, offset=male_offset))
Actress -> Actor | cosine_similarity | euclidean_distance |
---|---|---|
Original Vector | 0.618884 | 27.222238 |
Summed Vector | 0.565936 | 29.213596 |
Rotated Vector | 0.644523 | 26.165533 |
style_results(compute_results(target=steward, source=stewardess, rotation=gender_rotation, offset=male_offset))
Stewardess -> Steward | cosine_similarity | euclidean_distance |
---|---|---|
Original Vector | 0.753096 | 21.497546 |
Summed Vector | 0.633975 | 26.397621 |
Rotated Vector | 0.760474 | 21.201120 |
I have two takeaways here, 1) that the summed vector is worse in every pairing and 2) that the rotated vector is encoding some aspect of gender (but the improvement is quite small). Let’s explore each of these.
1. The summed vector is further from the target than the original vector for all title pairs.
I find this result suspicious. Let’s try scaling the offset vector to see if I can get a better result.
from scipy.optimize import minimize

def objective(k, source, offset, target):
    adjusted_vector = source + k * offset
    return -cosine_similarity(adjusted_vector, target)

options = []
for target, source in [
    (king, queen),
    (prince, princess),
    (son, daughter),
    (actor, actress),
    (steward, stewardess),
]:
    initial_k = 0.0
    result = minimize(objective, initial_k, args=(source, male_offset, target))
    optimal_k = result.x[0]
    options.append(optimal_k)
    print(optimal_k)
average_k = sum(options) / len(options)
print(f"Average K: {average_k}")
0.4442216918664982
0.3758034405295738
0.45508087793089264
0.3248524288805894
0.18004962701645405
Average K: 0.3560016132448016
Above, I’m printing the individual best-fit scalar modifier for our gender offset vector for each pair. We can see it’s overshooting in every case.
For simplicity, let’s average the optimal scalar and recompute the similarity stats. This is not optimal as I should be minimizing on the batch and then testing on out-of-sample data.
It goes without saying that a single, consistent magnitude for the offset would have been nice. In the case where we don’t have a known target, that would allow us to naively add or subtract the gender offset from a source vector and have confidence in its meaning.
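The fit-then-test split mentioned above could look roughly like the following. This is a sketch on synthetic vectors, with a plain grid search standing in for scipy.optimize.minimize; fit_shared_k and the toy data are assumptions for illustration, not the procedure used in this post.

```python
import numpy as np

GRID = np.linspace(0.0, 1.5, 151)  # candidate scalars for the offset

def cos_sim(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def fit_shared_k(pairs, offset):
    """Pick the single k that maximizes mean cosine similarity over all pairs."""
    scores = [
        np.mean([cos_sim(source + k * offset, target) for target, source in pairs])
        for k in GRID
    ]
    return float(GRID[int(np.argmax(scores))])

# Synthetic (target, source) pairs: each target is its source plus a scaled,
# noisy copy of a shared direction. Real embeddings would replace these.
rng = np.random.default_rng(2)
dim = 32
offset = rng.normal(size=dim)
pairs = []
for _ in range(5):
    source = rng.normal(size=dim)
    target = source + 0.4 * offset + 0.1 * rng.normal(size=dim)
    pairs.append((target, source))

# Fit on four pairs, evaluate on the one that was held out.
train, held_out = pairs[:-1], pairs[-1]
k = fit_shared_k(train, offset)

target, source = held_out
before = cos_sim(source, target)
after = cos_sim(source + k * offset, target)
print(f"shared k={k:.2f}, held-out cosine: {before:.3f} -> {after:.3f}")
```

With only five real pairs, a leave-one-out loop over all of them would probably be a fairer test than a single held-out pair.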
for target, source in [
    (king, queen),
    (prince, princess),
    (son, daughter),
    (actor, actress),
    (steward, stewardess),
]:
    display(style_results(compute_results(target=target, source=source, rotation=gender_rotation, offset=male_offset * average_k)))
    print()
Queen -> King | cosine_similarity | euclidean_distance |
---|---|---|
Original Vector | 0.756197 | 21.686100 |
Summed Vector | 0.801380 | 19.744261 |
Rotated Vector | 0.800727 | 19.759377 |
Princess -> Prince | cosine_similarity | euclidean_distance |
---|---|---|
Original Vector | 0.798122 | 19.734372 |
Summed Vector | 0.833044 | 18.017618 |
Rotated Vector | 0.831910 | 17.951380 |
Daughter -> Son | cosine_similarity | euclidean_distance |
---|---|---|
Original Vector | 0.506902 | 30.924019 |
Summed Vector | 0.537458 | 29.976789 |
Rotated Vector | 0.519920 | 30.628173 |
Actress -> Actor | cosine_similarity | euclidean_distance |
---|---|---|
Original Vector | 0.618884 | 27.222238 |
Summed Vector | 0.638717 | 26.398096 |
Rotated Vector | 0.644523 | 26.165533 |
Stewardess -> Steward | cosine_similarity | euclidean_distance |
---|---|---|
Original Vector | 0.753096 | 21.497546 |
Summed Vector | 0.753192 | 21.533765 |
Rotated Vector | 0.760474 | 21.201120 |
There we go! Our summed vector now beats, or at least matches, our original vector’s cosine similarity. The summed vector now also roughly matches the performance of the rotated vector, although it required an additional optimization step with K chosen in-sample.
Our largest outlier pair when optimizing for our scalar K was “steward” and “stewardess”. The optimal scalar K for that pair is half the average. Still, the addition does almost no harm in terms of distance from the target, though the rotated vector actually makes progress in approaching the target vector.
I recognize that only using the “man” and “woman” vectors to generate the offset is a bit silly. Using a broader collection of gendered words, sentences, titles, etc., and averaging them to create average “man” and “woman” vectors before taking the offset is best practice. However, I’m going to have limited data once I get to smells so I’m trying to keep this simple.
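The averaging approach described here can be sketched as follows, assuming two groups of already-embedded vectors. The arrays below are random stand-ins, and mean_offset is a hypothetical helper, not part of the code in this post.

```python
import numpy as np

def mean_offset(group_a, group_b):
    """Offset from the centroid of group_b to the centroid of group_a."""
    return np.mean(group_a, axis=0) - np.mean(group_b, axis=0)

def cos_sim(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

rng = np.random.default_rng(3)
dim = 16
shared = rng.normal(size=dim)  # the direction the two groups truly differ by

# Each "female" vector is random; its "male" partner adds the shared direction
# plus pair-specific noise, mimicking offsets that only partly agree.
female = rng.normal(size=(6, dim))
male = female + shared + rng.normal(size=(6, dim))

single = male[0] - female[0]          # offset from one pair only
averaged = mean_offset(male, female)  # offset from all six pairs

single_score = cos_sim(single, shared)
averaged_score = cos_sim(averaged, shared)
print(f"single pair vs shared direction: {single_score:.3f}")
print(f"averaged    vs shared direction: {averaged_score:.3f}")
```

Averaging tends to cancel pair-specific noise, which is the motivation for it, but it does need several reliable pairs per concept.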
Now that I’ve resolved the issue with vector addition, let’s go back to rotations!
2. The rotated vector is closer to the target!
The rotated vector is helping a little but isn’t closing much of the gap between the vectors. As we saw earlier, we’re rotating by roughly the correct amount, but on the wrong plane.
I think I’m encoding some concept of gender in the rotation, but am I accounting for it entirely? While I might say there are no differences between a King and a Queen, the studies on bias in LLMs show us that isn’t the case. I expect some difference in the transformed vectors no matter what naive transformations are performed. Gender stereotypes encoded into the embedding of “prince of England” might not be encoded into the general embedding for “man” (or are lessened through averaging). So, is the remaining distance due to other features/meanings or am I failing to account for general aspects of gender?
To start with, let’s make this a fair comparison with the offset and optimize the angle (magnitude) of rotation. After all, my hypothesis is that the two-dimensional plane can be treated as a feature, and its angle as a magnitude.
A scalar product for our angle wouldn’t be all that interpretable so instead, I’ll optimize for the angle of rotation directly.
def compute_nd_rotation_matrix(a, b, angle=None):
    a_norm = a / np.linalg.norm(a)
    b_norm = b / np.linalg.norm(b)
    cos_theta = np.dot(a_norm, b_norm)
    cos_theta = np.clip(cos_theta, -1.0, 1.0)
    # Use the angle between a and b unless the caller supplies one.
    if angle is None:
        angle = np.arccos(cos_theta)
    v = b_norm - np.dot(b_norm, a_norm) * a_norm
    v_norm = np.linalg.norm(v)
    if v_norm < 1e-8:  # a and b are collinear
        # Return a (matrix, angle) pair so callers can always unpack two values.
        return np.eye(len(a)), 0.0
    v = v / v_norm
    identity = np.eye(len(a))
    outer_aa = np.outer(a_norm, a_norm)
    outer_av = np.outer(a_norm, v)
    outer_va = np.outer(v, a_norm)
    outer_vv = np.outer(v, v)
    R = (
        identity
        + np.sin(angle) * (outer_va - outer_av)
        + (np.cos(angle) - 1) * (outer_vv + outer_aa)
    )
    return R, angle
def objective(m, source, base_source, base_target, target):
    R, angle = compute_nd_rotation_matrix(base_source, base_target, m)
    adjusted_vector = np.dot(R, source)
    return -cosine_similarity(adjusted_vector, target)

options = []
for target, source in [
    (king, queen),
    (prince, princess),
    (son, daughter),
    (actor, actress),
    (steward, stewardess),
]:
    initial_m = gender_angle
    result = minimize(objective, initial_m, args=(source, woman, man, target))
    optimal_m = result.x[0]
    options.append(optimal_m)
    print(optimal_m)
print()
average_m = sum(options) / len(options)
print(f"Average Optimized Angle: {average_m} (radians)")
print(f"Original Angle: {gender_angle} (radians)")
print(f"Difference: {abs(average_m - gender_angle)} radians, {np.rad2deg(abs(average_m - gender_angle))} degrees")
1.1203259132974637
1.0205728689773772
0.4497845291476582
0.5237429339776948
0.4732922638975175
Average Optimized Angle: 0.7175437018595423 (radians)
Original Angle: 0.7848061954720583 (radians)
Difference: 0.06726249361251602 radians, 3.8538570035228257 degrees
For idiots like myself, this makes the range for the optimal angle [26, 64] degrees. A lot like our offset, it would have been nice if this range were small.
The angle between the “man” and “woman” vectors is quite similar to the average of our optimized angles on our pairs! The average optimized angle is 91% of the original angle (under four degrees off!) whereas the average optimized offset scalar is 36% of the original magnitude. If this were to hold for concepts other than gender, rotations might be easier to work with.
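Since a plane rotation only changes the in-plane part of a vector, the angle that maximizes cosine similarity with a target has a closed form: rotate until the in-plane projections of source and target line up. The sketch below checks that against a brute-force scan on random vectors. best_plane_angle and the other helper names are assumptions for illustration, not the optimizer used above.

```python
import numpy as np

def plane_basis(a, b):
    """Orthonormal basis (e1, e2) of span(a, b), with e1 along a."""
    e1 = a / np.linalg.norm(a)
    e2 = b - np.dot(b, e1) * e1
    e2 /= np.linalg.norm(e2)
    return e1, e2

def rotate_in_plane(x, e1, e2, angle):
    """Rotate x by `angle` in the (e1, e2) plane, from e1 toward e2."""
    p, q = np.dot(x, e1), np.dot(x, e2)
    rest = x - p * e1 - q * e2  # untouched by the rotation
    c, s = np.cos(angle), np.sin(angle)
    return rest + (c * p - s * q) * e1 + (s * p + c * q) * e2

def best_plane_angle(source, target, e1, e2):
    """Angle that aligns the in-plane projection of source with that of target."""
    src = np.arctan2(np.dot(source, e2), np.dot(source, e1))
    tgt = np.arctan2(np.dot(target, e2), np.dot(target, e1))
    return float(tgt - src)

rng = np.random.default_rng(4)
a, b, source, target = rng.normal(size=(4, 10))
e1, e2 = plane_basis(a, b)

closed_form = best_plane_angle(source, target, e1, e2)

# Brute force: rotation preserves length, so maximizing the dot product
# with the target is the same as maximizing cosine similarity.
grid = np.linspace(-np.pi, np.pi, 20001)
scores = [np.dot(rotate_in_plane(source, e1, e2, g), target) for g in grid]
brute = float(grid[int(np.argmax(scores))])

# Compare modulo 2*pi, since angles wrap around.
gap = float(np.angle(np.exp(1j * (closed_form - brute))))
print(f"closed form {closed_form:.4f}, brute force {brute:.4f}, gap {gap:.1e}")
```

This also makes the limitation visible: whatever angle is chosen, the out-of-plane part (rest) is never touched, so the achievable similarity is capped by how much of the source-to-target difference lies in the man/woman plane.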
Let’s check how the optimized similarity metrics performed with our optimized angle…
for target, source in [
    (king, queen),
    (prince, princess),
    (son, daughter),
    (actor, actress),
    (steward, stewardess),
]:
    optimized_rotation_df = compute_results(
        target=target,
        source=source,
        rotation=compute_nd_rotation_matrix(woman, man, average_m)[0],
        offset=male_offset * average_k
    ).T
    df = compute_results(
        target=target,
        source=source,
        rotation=gender_rotation,
        offset=male_offset * average_k
    ).T
    df["Optimized Rotated Vector"] = optimized_rotation_df["Rotated Vector"]
    display(style_results(df.T))
    print()
Queen -> King | cosine_similarity | euclidean_distance |
---|---|---|
Original Vector | 0.756197 | 21.686100 |
Summed Vector | 0.801380 | 19.744261 |
Rotated Vector | 0.800727 | 19.759377 |
Optimized Rotated Vector | 0.798603 | 19.853226 |
Princess -> Prince | cosine_similarity | euclidean_distance |
---|---|---|
Original Vector | 0.798122 | 19.734372 |
Summed Vector | 0.833044 | 18.017618 |
Rotated Vector | 0.831910 | 17.951380 |
Optimized Rotated Vector | 0.830565 | 18.004223 |
Daughter -> Son | cosine_similarity | euclidean_distance |
---|---|---|
Original Vector | 0.506902 | 30.924019 |
Summed Vector | 0.537458 | 29.976789 |
Rotated Vector | 0.519920 | 30.628173 |
Optimized Rotated Vector | 0.525843 | 30.419578 |
Actress -> Actor | cosine_similarity | euclidean_distance |
---|---|---|
Original Vector | 0.618884 | 27.222238 |
Summed Vector | 0.638717 | 26.398096 |
Rotated Vector | 0.644523 | 26.165533 |
Optimized Rotated Vector | 0.648405 | 26.021171 |
Stewardess -> Steward | cosine_similarity | euclidean_distance |
---|---|---|
Original Vector | 0.753096 | 21.497546 |
Summed Vector | 0.753192 | 21.533765 |
Rotated Vector | 0.760474 | 21.201120 |
Optimized Rotated Vector | 0.762675 | 21.091211 |
Welp, small_rotation == small_similarity_change is hardly surprising.
Similarity improves in the three pairs with a smaller optimal rotation angle and drops in the two pairs with a larger optimal rotation angle.
Running out of steam
The theory behind using rotations rather than offsets is that the offset pushes the vector off the spherical geometry of the vector embedding on a tangent that might not make sense for the initial vector… If a rotation makes sense, the plane it rotates on should be similar for all pairs.
… so are the planes of rotation similar?
def compute_plane_similarity(A, B, C, D):
    def orthonormal_basis(vec1, vec2):
        vec1_norm = vec1 / np.linalg.norm(vec1)
        vec2_proj = vec2 - np.dot(vec2, vec1_norm) * vec1_norm
        vec2_norm = vec2_proj / np.linalg.norm(vec2_proj)
        return np.stack([vec1_norm, vec2_norm], axis=1)

    plane1 = orthonormal_basis(A, B)
    plane2 = orthonormal_basis(C, D)
    M = np.dot(plane1.T, plane2)
    # Singular values of M are the cosines of the principal angles between the planes.
    _, singular_values, _ = np.linalg.svd(M)
    angles = np.arccos(np.clip(singular_values, -1.0, 1.0))
    total = 1
    for angle in angles:
        total *= np.cos(angle)
    return total

pairs = [
    (man, woman),
    (king, queen),
    (prince, princess),
    (son, daughter),
    (actor, actress),
    (steward, stewardess),
]
res = np.zeros(shape=(len(pairs), len(pairs)))
for r in range(len(pairs)):
    for c in range(len(pairs)):
        target_a, source_a = pairs[r]
        target_b, source_b = pairs[c]
        res[r, c] = compute_plane_similarity(source_a, target_a, source_b, target_b)
pd.DataFrame(res)
pair index | 0 | 1 | 2 | 3 | 4 | 5 |
---|---|---|---|---|---|---|
0 | 1.000000 | 0.120395 | 0.106358 | 0.149667 | 0.162984 | 0.082681 |
1 | 0.120395 | 1.000000 | 0.555729 | 0.059285 | 0.092489 | 0.097893 |
2 | 0.106358 | 0.555729 | 1.000000 | 0.060170 | 0.077986 | 0.099463 |
3 | 0.149667 | 0.059285 | 0.060170 | 1.000000 | 0.072818 | 0.037598 |
4 | 0.162984 | 0.092489 | 0.077986 | 0.072818 | 1.000000 | 0.091440 |
5 | 0.082681 | 0.097893 | 0.099463 | 0.037598 | 0.091440 | 1.000000 |
Nope, they are not aligned at all… It would have been great to have checked this first.
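For reference, the score computed above is the product of the cosines of the principal angles between the two planes, which is just the product of the singular values of the 2x2 matrix of basis dot products. A small sketch with hand-built planes in 4-D (the helper names are assumptions, restating the same idea) shows the extremes:

```python
import numpy as np

def orthonormal_basis(u, v):
    """Gram-Schmidt basis for span(u, v), returned as columns."""
    e1 = u / np.linalg.norm(u)
    e2 = v - np.dot(v, e1) * e1
    e2 /= np.linalg.norm(e2)
    return np.stack([e1, e2], axis=1)

def plane_similarity(a, b, c, d):
    """Product of cosines of the principal angles between span(a, b) and span(c, d)."""
    m = orthonormal_basis(a, b).T @ orthonormal_basis(c, d)
    singular_values = np.linalg.svd(m, compute_uv=False)
    return float(np.prod(np.clip(singular_values, 0.0, 1.0)))

# Standard basis vectors of R^4.
x, y, z, w = np.eye(4)

same = plane_similarity(x, y, x + y, x - y)  # same plane, different basis
disjoint = plane_similarity(x, y, z, w)      # fully orthogonal planes
shared_axis = plane_similarity(x, y, x, z)   # planes share one direction only

print(same, disjoint, shared_axis)
```

Note that the product is a strict measure: two planes that share one direction exactly but differ in the other still score 0, so low values in the table do not by themselves rule out partial overlap.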
Does rotation work for other concepts?
for target, source, pairs in [
    (
        "parent",
        "child",
        [
            ("father", "son"),
            ("mother", "daughter"),
            ("dog", "puppy"),
        ],
    ),
    (
        "Group",
        "Individual",
        [
            ("a people", "a person"),
            ("nation", "citizen"),
        ],
    ),
    (
        "Positive",
        "Negative",
        [
            ("happy", "sad"),
            ("love", "hate"),
            ("success", "failure"),
            ("hot", "cold"),
            ("light", "dark"),
        ],
    ),
    (
        "Plural noun",
        "Singular noun",
        [
            ("people", "person"),
            ("children", "child"),
            ("mice", "mouse"),
        ],
    ),
]:
    target_v, source_v = get_embedding_openai([target, source])
    pair_vecs = [get_embedding_openai(pair) for pair in pairs]
    rotation, angle = compute_nd_rotation_matrix(source_v, target_v)
    offset = target_v - source_v
    print(f"Processing {source} to {target}")
    for t, s in pair_vecs:
        display(
            style_results(
                compute_results(target=t, source=s, rotation=rotation, offset=offset)
            )
        )
        print()
    print()
    print()
Processing child to parent
Son -> Father | cosine_similarity | euclidean_distance |
---|---|---|
Original Vector | 0.468058 | 31.761523 |
Summed Vector | 0.543565 | 29.534604 |
Rotated Vector | 0.443090 | 32.606579 |
Daughter -> Mother | cosine_similarity | euclidean_distance |
---|---|---|
Original Vector | 0.669960 | 25.355998 |
Summed Vector | 0.648269 | 25.751596 |
Rotated Vector | 0.644994 | 25.854934 |
Puppy -> Dog | cosine_similarity | euclidean_distance |
---|---|---|
Original Vector | 0.559153 | 29.110067 |
Summed Vector | 0.353922 | 35.056743 |
Rotated Vector | 0.464196 | 31.915750 |
Processing Individual to Group
A person -> A people | cosine_similarity | euclidean_distance |
---|---|---|
Original Vector | 0.687920 | 24.467766 |
Summed Vector | 0.549779 | 29.304673 |
Rotated Vector | 0.648763 | 26.148103 |
Citizen -> Nation | cosine_similarity | euclidean_distance |
---|---|---|
Original Vector | 0.524459 | 30.635595 |
Summed Vector | 0.316418 | 36.517791 |
Rotated Vector | 0.442958 | 33.354509 |
Processing Negative to Positive
Sad -> Happy | cosine_similarity | euclidean_distance |
---|---|---|
Original Vector | 0.609937 | 27.556616 |
Summed Vector | 0.660815 | 25.557532 |
Rotated Vector | 0.657954 | 25.656030 |
Hate -> Love | cosine_similarity | euclidean_distance |
---|---|---|
Original Vector | 0.373151 | 34.858771 |
Summed Vector | 0.388448 | 33.971496 |
Rotated Vector | 0.402705 | 33.769119 |
Failure -> Success | cosine_similarity | euclidean_distance |
---|---|---|
Original Vector | 0.597542 | 27.482831 |
Summed Vector | 0.618126 | 27.209885 |
Rotated Vector | 0.632441 | 26.262374 |
Cold -> Hot | cosine_similarity | euclidean_distance |
---|---|---|
Original Vector | 0.554468 | 29.435530 |
Summed Vector | 0.514706 | 30.383623 |
Rotated Vector | 0.565492 | 28.948688 |
Dark -> Light | cosine_similarity | euclidean_distance |
---|---|---|
Original Vector | 0.448411 | 32.322260 |
Summed Vector | 0.458061 | 32.072349 |
Rotated Vector | 0.480533 | 31.395405 |
Processing Singular noun to Plural noun
Person -> People | cosine_similarity | euclidean_distance |
---|---|---|
Original Vector | 0.847886 | 17.133105 |
Summed Vector | 0.720367 | 23.143023 |
Rotated Vector | 0.839824 | 17.431185 |
Child -> Children | cosine_similarity | euclidean_distance |
---|---|---|
Original Vector | 0.832203 | 18.139276 |
Summed Vector | 0.671063 | 25.081717 |
Rotated Vector | 0.825308 | 18.384035 |
Mouse -> Mice | cosine_similarity | euclidean_distance |
---|---|---|
Original Vector | 0.561507 | 28.809626 |
Summed Vector | 0.453895 | 32.022517 |
Rotated Vector | 0.559407 | 28.808987 |
Useful maybe?
From applying this rotation strategy to other concepts, we can see that for all but “Positive/Negative”, these rotations hurt more than they help (as do the offsets).
Is it possible that positive/negative and gender are so baked into language that they have a special place in the embedding? It is more likely that I’m simply reading the tea leaves.
While this exploration was unsuccessful, I’m glad I did it. It’s good to know that the vector addition interpretations for word embeddings don’t necessarily apply to sentence embeddings in a way that is immediately interpretable.
Limitations
- We don’t know if these findings apply to all models. From brief testing, the rotation for gender falls apart when using “all-MiniLM-L6-v2” from Hugging Face.
- Perhaps averaged vectors could lead to substantially better results. From brief testing, this resulted in fewer cases where the vector was further away from the target but wasn’t a silver bullet.