Features

Possible features for training, validation and test data.

src.features.features.POS(sentence: str)[source]

Returns the number of nouns, adjectives and verbs in a sentence

Parameters

sentence (str) – Sentence as string

Returns

nouns (int): Number of nouns; adj (int): Number of adjectives; verbs (int): Number of verbs

Return type

nouns (int), adj (int), verbs (int)
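
The counting step can be sketched as follows. The source does not say which tagger POS uses, so this sketch assumes the sentence has already been tagged with Universal POS tag names; `count_pos` and the sample input are hypothetical.

```python
from collections import Counter

def count_pos(tagged_tokens):
    """Return (nouns, adjectives, verbs) for a list of (token, tag) pairs.

    Assumption: tags follow the Universal POS scheme (NOUN, ADJ, VERB).
    """
    # Counter returns 0 for tags that never occur, so no special cases are needed.
    counts = Counter(tag for _, tag in tagged_tokens)
    return counts["NOUN"], counts["ADJ"], counts["VERB"]

# Hand-tagged sample input, used only for illustration.
tagged = [("The", "DET"), ("quick", "ADJ"), ("fox", "NOUN"), ("jumps", "VERB")]
nouns, adj, verbs = count_pos(tagged)
print(nouns, adj, verbs)  # 1 1 1
```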

src.features.features.characters(sentence: str)[source]

Returns the length of the sentence, not counting whitespace

Parameters

sentence (str) – Sentence as string

Returns

Length of sentence as int

Return type

(int)
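
A minimal sketch of such a character count, assuming "whitespace" covers spaces, tabs and newlines. The name `count_characters` is hypothetical and not the project's implementation.

```python
def count_characters(sentence: str) -> int:
    """Count the characters in a sentence, skipping all whitespace."""
    # str.split() with no arguments splits on any run of whitespace,
    # so joining the pieces drops spaces, tabs and newlines alike.
    return len("".join(sentence.split()))

print(count_characters("The quick brown fox"))  # 16
```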

src.features.features.cosine_similarity_score(embedding_1: list, embedding_2: list)[source]

Calculates cosine similarity between two word embeddings.

Parameters
  • embedding_1 (list) – List containing a word embedding

  • embedding_2 (list) – List containing a word embedding

Returns

Cosine similarity between two word embeddings

Return type

cosine_similarity (float)
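
For reference, cosine similarity is the dot product of the two vectors divided by the product of their lengths. A minimal pure-Python sketch, not necessarily how the function above is implemented; `cosine_similarity_sketch` is a hypothetical name, and returning 0.0 for a zero vector is an assumption.

```python
import math

def cosine_similarity_sketch(embedding_1: list, embedding_2: list) -> float:
    """Cosine of the angle between two equally long vectors."""
    dot = sum(a * b for a, b in zip(embedding_1, embedding_2))
    norm_1 = math.sqrt(sum(a * a for a in embedding_1))
    norm_2 = math.sqrt(sum(b * b for b in embedding_2))
    if norm_1 == 0 or norm_2 == 0:
        # Assumption: treat a zero vector as having no similarity.
        return 0.0
    return dot / (norm_1 * norm_2)

# Orthogonal vectors have similarity 0.0; parallel vectors are close to 1.0.
print(cosine_similarity_sketch([1.0, 0.0], [0.0, 1.0]))  # 0.0
```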

src.features.features.difference(doc_count, query_count)[source]

Returns the absolute difference between doc_count and query_count

Parameters
  • doc_count (int) –

  • query_count (int) –

Returns

Absolute difference between doc_count and query_count

Return type

(int)
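
A minimal sketch of the absolute difference, which is the same regardless of argument order. The name `absolute_difference` is hypothetical.

```python
def absolute_difference(doc_count: int, query_count: int) -> int:
    """Return |doc_count - query_count|."""
    return abs(doc_count - query_count)

print(absolute_difference(3, 10))  # 7
print(absolute_difference(10, 3))  # 7
```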

src.features.features.euclidean_distance_score(embedding_1: list, embedding_2: list)[source]

Calculates the Euclidean distance between two word embeddings.

Parameters
  • embedding_1 (list) – List containing a word embedding

  • embedding_2 (list) – List containing a word embedding

Returns

Euclidean distance between two word embeddings

Return type

euclidean_distances (float)
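
The Euclidean distance is the square root of the summed squared component differences. A pure-Python sketch under the assumption that both embeddings have the same length; `euclidean_distance_sketch` is a hypothetical name.

```python
import math

def euclidean_distance_sketch(embedding_1: list, embedding_2: list) -> float:
    """Straight-line distance between two equally long vectors."""
    squared = sum((a - b) ** 2 for a, b in zip(embedding_1, embedding_2))
    return math.sqrt(squared)

print(euclidean_distance_sketch([0.0, 0.0], [3.0, 4.0]))  # 5.0
```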

src.features.features.jaccard(token_vector_1: list, token_vector_2: list)[source]

Calculates the Jaccard coefficient between two lists of tokens

Parameters
  • token_vector_1 (list) – List containing tokens

  • token_vector_2 (list) – List containing tokens

Returns

Jaccard coefficient as float

Return type

(float)
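
The Jaccard coefficient is the size of the intersection of two sets divided by the size of their union. A minimal sketch, assuming duplicate tokens are ignored and that two empty lists score 0.0; neither choice is confirmed by the source, and `jaccard_sketch` is a hypothetical name.

```python
def jaccard_sketch(token_vector_1: list, token_vector_2: list) -> float:
    """Jaccard coefficient of two token lists, treated as sets."""
    set_1, set_2 = set(token_vector_1), set(token_vector_2)
    union = set_1 | set_2
    if not union:
        # Assumption: two empty token lists share nothing measurable.
        return 0.0
    return len(set_1 & set_2) / len(union)

# Shared tokens: {"b", "c"}; all tokens: {"a", "b", "c", "d"}.
print(jaccard_sketch(["a", "b", "c"], ["b", "c", "d"]))  # 0.5
```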

src.features.features.manhattan_distance_score(embedding_1: list, embedding_2: list)[source]

Calculates the Manhattan distance between two word embeddings.

Parameters
  • embedding_1 (list) – List containing a word embedding

  • embedding_2 (list) – List containing a word embedding

Returns

Manhattan distance between two word embeddings

Return type

manhattan_distances (float)
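
The Manhattan (L1) distance sums the absolute component differences. A pure-Python sketch, assuming equally long embeddings; `manhattan_distance_sketch` is a hypothetical name.

```python
def manhattan_distance_sketch(embedding_1: list, embedding_2: list) -> float:
    """Sum of absolute component-wise differences between two vectors."""
    return float(sum(abs(a - b) for a, b in zip(embedding_1, embedding_2)))

# |1 - 4| + |2 - 0| + |3 - 3| = 3 + 2 + 0
print(manhattan_distance_sketch([1.0, 2.0, 3.0], [4.0, 0.0, 3.0]))  # 5.0
```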

src.features.features.polarisation(sentence: str)[source]

Returns the polarisation of a sentence

Parameters

sentence (str) – Sentence as string

Returns

float within the range [-1.0, 1.0]

Return type

polarisation (float)

src.features.features.relative_difference(doc_count, query_count)[source]

Returns the relative difference between doc_count and query_count

Parameters
  • doc_count (int) –

  • query_count (int) –

Returns

Relative difference between doc_count and query_count

Return type

(float)
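
The source does not state how the relative difference is normalised. The sketch below assumes one common choice, the absolute difference divided by the larger of the two counts, and assumes 0.0 when both counts are zero; `relative_difference_sketch` is a hypothetical name and the actual function may use a different denominator.

```python
def relative_difference_sketch(doc_count: int, query_count: int) -> float:
    """Absolute difference scaled by the larger count (assumed definition)."""
    largest = max(doc_count, query_count)
    if largest == 0:
        # Assumption: two zero counts do not differ.
        return 0.0
    return abs(doc_count - query_count) / largest

print(relative_difference_sketch(10, 5))  # 0.5
```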

src.features.features.subjectivity(sentence: str)[source]

Returns the subjectivity of a sentence

Parameters

sentence (str) – Sentence as string

Returns

float within the range [0.0, 1.0] where 0.0 is very objective and 1.0 is very subjective

Return type

subjectivity (float)

src.features.features.words(sentence: str)[source]

Returns the number of words in a sentence

Parameters

sentence (str) – Sentence as string

Returns

Number of words in a sentence as int

Return type

(int)
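
A minimal sketch of a word count, assuming words are separated by whitespace and punctuation stays attached to its word. The actual function may tokenise differently, and `count_words` is a hypothetical name.

```python
def count_words(sentence: str) -> int:
    """Count whitespace-separated tokens in a sentence."""
    # split() with no arguments ignores leading, trailing and repeated whitespace.
    return len(sentence.split())

print(count_words("The quick brown fox"))  # 4
```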