Features
Possible features for training, validation and test data.
- src.features.features.POS(sentence: str)
Returns the number of nouns, adjectives and verbs in a sentence.
- Parameters
sentence (str) – Sentence as string
- Returns
nouns (int): Number of nouns
adj (int): Number of adjectives
verbs (int): Number of verbs
- Return type
nouns (int), adj (int), verbs (int)
- src.features.features.characters(sentence: str)
Returns the length of the sentence, not counting whitespace.
- Parameters
sentence (str) – Sentence as string
- Returns
Length of sentence as int
- Return type
(int)
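The description above can be illustrated with a minimal sketch. The name `characters_sketch` is hypothetical and the body is an assumption about the behaviour, not the package's actual code; it treats spaces, tabs and newlines alike as whitespace.

```python
def characters_sketch(sentence: str) -> int:
    """Count the characters in a sentence, skipping all whitespace.

    Hypothetical stand-in for illustration only.
    """
    # str.isspace() is True for spaces, tabs, newlines and similar characters.
    return sum(1 for ch in sentence if not ch.isspace())


print(characters_sketch("The quick brown fox"))  # 16
```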
- src.features.features.cosine_similarity_score(embedding_1: list, embedding_2: list)
Calculates cosine similarity between two word embeddings.
- Parameters
embedding_1 (list) – List containing a word embedding
embedding_2 (list) – List containing a word embedding
- Returns
Cosine similarity between two word embeddings
- Return type
cosine_similarity (float)
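As a rough illustration of the quantity described above, cosine similarity is the dot product of the two vectors divided by the product of their lengths. The sketch below uses only the standard library; the name `cosine_similarity_sketch` and the choice to return 0.0 for a zero vector are assumptions, and the package itself may rely on a different implementation.

```python
import math


def cosine_similarity_sketch(embedding_1: list, embedding_2: list) -> float:
    """Cosine similarity of two equal-length vectors (hypothetical helper)."""
    # Both lists are assumed to have the same length; zip() would silently
    # truncate the longer one otherwise.
    dot = sum(a * b for a, b in zip(embedding_1, embedding_2))
    norm_1 = math.sqrt(sum(a * a for a in embedding_1))
    norm_2 = math.sqrt(sum(b * b for b in embedding_2))
    if norm_1 == 0.0 or norm_2 == 0.0:
        # Assumption: similarity with a zero vector is reported as 0.0.
        return 0.0
    return dot / (norm_1 * norm_2)


# Orthogonal vectors share no direction, so their similarity is zero.
print(cosine_similarity_sketch([1.0, 0.0], [0.0, 1.0]))  # 0.0
```

Values close to 1 indicate vectors pointing in nearly the same direction, and values close to -1 indicate opposite directions.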
- src.features.features.difference(doc_count, query_count)
Returns the absolute difference between doc_count and query_count
- Parameters
doc_count (int) –
query_count (int) –
- Returns
Absolute difference between doc_count and query_count
- Return type
(int)
- src.features.features.euclidean_distance_score(embedding_1: list, embedding_2: list)
Calculates the Euclidean distance between two word embeddings.
- Parameters
embedding_1 (list) – List containing a word embedding
embedding_2 (list) – List containing a word embedding
- Returns
Euclidean distance between two word embeddings
- Return type
euclidean_distances (float)
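For reference, the Euclidean distance is the square root of the summed squared differences between corresponding components. The following is a minimal standard-library sketch under the assumption that both lists have the same length; `euclidean_distance_sketch` is a hypothetical name, not the package's function.

```python
import math


def euclidean_distance_sketch(embedding_1: list, embedding_2: list) -> float:
    """Straight-line distance between two equal-length vectors (hypothetical helper)."""
    # Sum the squared component-wise differences, then take the square root.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(embedding_1, embedding_2)))


# Classic 3-4-5 right triangle.
print(euclidean_distance_sketch([0, 0], [3, 4]))  # 5.0
```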
- src.features.features.jaccard(token_vector_1: list, token_vector_2: list)
Calculates the Jaccard coefficient between two lists of tokens
- Parameters
token_vector_1 (list) – List containing tokens
token_vector_2 (list) – List containing tokens
- Returns
Jaccard coefficient as float
- Return type
(float)
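The Jaccard coefficient is commonly defined as the size of the intersection of two sets divided by the size of their union. The sketch below applies that definition to token lists; the name `jaccard_sketch`, the decision to ignore duplicate tokens, and the 0.0 result for two empty lists are all assumptions made for illustration.

```python
def jaccard_sketch(token_vector_1: list, token_vector_2: list) -> float:
    """Jaccard coefficient of two token lists (hypothetical helper)."""
    # Assumption: duplicates are ignored by converting the lists to sets.
    set_1, set_2 = set(token_vector_1), set(token_vector_2)
    union = set_1 | set_2
    if not union:
        # Assumption: two empty token lists yield 0.0 instead of dividing by zero.
        return 0.0
    return len(set_1 & set_2) / len(union)


# Two shared tokens ("b", "c") out of four distinct tokens overall.
print(jaccard_sketch(["a", "b", "c"], ["b", "c", "d"]))  # 0.5
```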
- src.features.features.manhattan_distance_score(embedding_1: list, embedding_2: list)
Calculates the Manhattan distance between two word embeddings.
- Parameters
embedding_1 (list) – List containing a word embedding
embedding_2 (list) – List containing a word embedding
- Returns
Manhattan distance between two word embeddings
- Return type
manhattan_distances (float)
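The Manhattan (L1) distance sums the absolute differences between corresponding components. A minimal sketch, assuming equal-length inputs and using the hypothetical name `manhattan_distance_sketch`:

```python
def manhattan_distance_sketch(embedding_1: list, embedding_2: list) -> float:
    """Sum of absolute component-wise differences (hypothetical helper)."""
    # float() keeps the result type consistent even for integer inputs.
    return float(sum(abs(a - b) for a, b in zip(embedding_1, embedding_2)))


# |1 - 4| + |2 - 0| + |3 - 3| = 3 + 2 + 0
print(manhattan_distance_sketch([1, 2, 3], [4, 0, 3]))  # 5.0
```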
- src.features.features.polarisation(sentence: str)
Returns the polarisation of a sentence
- Parameters
sentence (str) – Sentence as string
- Returns
Polarisation score as a float within the range [-1.0, 1.0]
- Return type
polarisation (float)
- src.features.features.relative_difference(doc_count, query_count)
Returns the relative difference between doc_count and query_count
- Parameters
doc_count (int) –
query_count (int) –
- Returns
Relative difference between doc_count and query_count
- Return type
(float)
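The two count-based features can be sketched as follows. The absolute difference follows directly from its description. The relative difference is not defined in this reference, so the normalisation used here (dividing by the larger of the two counts, with 0.0 when both are zero) is purely an assumption, and both function names are hypothetical.

```python
def difference_sketch(doc_count: int, query_count: int) -> int:
    """Absolute difference between the two counts (hypothetical helper)."""
    return abs(doc_count - query_count)


def relative_difference_sketch(doc_count: int, query_count: int) -> float:
    """Absolute difference scaled by the larger count (hypothetical helper).

    Assumption: the normalisation by max(doc_count, query_count) is one
    possible definition, not necessarily the one the package uses.
    """
    largest = max(doc_count, query_count)
    if largest == 0:
        # Assumption: two zero counts are treated as identical.
        return 0.0
    return abs(doc_count - query_count) / largest


print(difference_sketch(3, 4))           # 1
print(relative_difference_sketch(3, 4))  # 0.25
```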