Generator

Functionality to generate features.

src.features.generator.create_BM2_feature(features: pandas.core.frame.DataFrame, collection: pandas.core.frame.DataFrame, queries: pandas.core.frame.DataFrame)[source]

Creates BM25 features for query-collection combinations

Parameters
  • features (pd.DataFrame) – Dataframe containing feature data

  • collection (pd.DataFrame) – Dataframe containing collection data

  • queries (pd.DataFrame) – Dataframe containing queries data

Returns

Dataframe “features” with new column “bm25” appended

Return type

features (pd.DataFrame)
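
For orientation, the kind of score a “bm25” column usually holds can be sketched in plain Python. This is a minimal illustration of Okapi BM25 and is not taken from the project's code: the function name bm25_score, the parameters k1 and b, the whitespace tokenization, and the smoothed IDF variant are all assumptions.

```python
import math
from collections import Counter


def bm25_score(query_tokens, doc_tokens, corpus, k1=1.5, b=0.75):
    """Okapi BM25 score of one tokenized document for one tokenized query.

    corpus is a list of tokenized documents (lists of str). The name,
    signature and defaults are hypothetical, chosen for illustration.
    """
    n_docs = len(corpus)
    avg_len = sum(len(d) for d in corpus) / n_docs
    term_freqs = Counter(doc_tokens)
    score = 0.0
    for term in set(query_tokens):
        # Number of documents that contain the term.
        doc_freq = sum(1 for d in corpus if term in d)
        # Smoothed IDF variant that stays non-negative.
        idf = math.log((n_docs - doc_freq + 0.5) / (doc_freq + 0.5) + 1.0)
        tf = term_freqs[term]
        # Length normalization: long documents are penalized via b.
        norm = tf + k1 * (1.0 - b + b * len(doc_tokens) / avg_len)
        score += idf * tf * (k1 + 1.0) / norm
    return score


corpus = [
    "the cat sat on the mat".split(),
    "dogs chase cats".split(),
    "the weather is nice today".split(),
]
query = "cat mat".split()
scores = [bm25_score(query, doc, corpus) for doc in corpus]
```

Only the first document shares tokens with the query, so it is the only one that receives a non-zero score.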

src.features.generator.create_POS_features(features: pandas.core.frame.DataFrame, collection: pandas.core.frame.DataFrame, queries: pandas.core.frame.DataFrame)[source]

Creates Part of Speech features for query and collection data (nouns, adjectives, verbs)

Parameters
  • features (pd.DataFrame) – Dataframe containing feature data

  • collection (pd.DataFrame) – Dataframe containing collection data

  • queries (pd.DataFrame) – Dataframe containing queries data

Returns

Dataframe “features” with new columns “doc_nouns”, “doc_adjectives”, “doc_verbs”, “query_nouns”, “query_adjectives”, “query_verbs” appended

Return type

features (pd.DataFrame)

src.features.generator.create_all(features: pandas.core.frame.DataFrame, collection: pandas.core.frame.DataFrame, queries: pandas.core.frame.DataFrame, tfidf=None, glove=None, bert=None, w2v=None)[source]
Creates all implemented embeddings (bert, glove, tfidf, word2vec) and features (cosine, euclidean, manhattan, jaccard, sentence, interpretation, BM25, POS).

Parameters
  • features (pd.DataFrame) – Dataframe containing feature data

  • collection (pd.DataFrame) – Dataframe containing collection data

  • queries (pd.DataFrame) – Dataframe containing query data

  • tfidf (TFIDF object) – TFIDF object to reuse; a new object of class tfidf is created if None

  • glove (Glove object) – Glove object to reuse; a new object of class Glove is created if None

  • bert (Bert object) – Bert object to reuse; a new object of class Bert is created if None

  • w2v (word2vec object) – word2vec object to reuse; a new object of class word2vec is created if None

Returns

Dataframe containing feature data

Return type

features (pd.DataFrame)

src.features.generator.create_bert_embeddings(data: pandas.core.frame.DataFrame, bert=None, name: str = '')[source]

Creates bert embeddings

Parameters
  • data (pd.DataFrame) – Dataframe containing data to be embedded

  • bert (Bert object) – Bert object to reuse; a new object of class Bert is created if None

  • name (str) – String added to the name of the .pkl file that is created to store the “data” dataframe

Returns

Object of class Bert, and data (pd.DataFrame): Dataframe “data” with new column “preprocessed” appended

Return type

bert (Bert object)

src.features.generator.create_bert_feature(features: pandas.core.frame.DataFrame, collection: pandas.core.frame.DataFrame, queries: pandas.core.frame.DataFrame, path_collection: str = 'data/embeddings/bert_collection_embeddings.pkl', path_query: str = 'data/embeddings/bert_query_embeddings.pkl')[source]

Creates bert features (cosine, euclidean, manhattan)

Parameters
  • features (pd.DataFrame) – Dataframe containing feature data

  • collection (pd.DataFrame) – Dataframe containing collection data

  • queries (pd.DataFrame) – Dataframe containing queries data

  • path_collection (str) – Path to “bert_collection_embeddings.pkl” file

  • path_query (str) – Path to “bert_query_embeddings.pkl” file

Returns

Dataframe “features” with new columns “bert_cosine”, “bert_euclidean”, “bert_manhattan” appended

Return type

features (pd.DataFrame)
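
The three measures named in the “*_cosine”, “*_euclidean” and “*_manhattan” columns can be illustrated on a single query/document vector pair. The sketch below is a plain-Python assumption of how such values are typically computed; the helper names are hypothetical, and whether the project stores cosine similarity or cosine distance is not stated in this reference.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between u and v (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def euclidean_distance(u, v):
    """Straight-line (L2) distance between u and v."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def manhattan_distance(u, v):
    """Sum of absolute coordinate differences (L1) between u and v."""
    return sum(abs(a - b) for a, b in zip(u, v))


# Toy embeddings: the document vector is the query vector scaled by 2.
query_vec = [1.0, 0.0, 2.0]
doc_vec = [2.0, 0.0, 4.0]

cos = cosine_similarity(query_vec, doc_vec)
euc = euclidean_distance(query_vec, doc_vec)
man = manhattan_distance(query_vec, doc_vec)
```

Because the two toy vectors point in the same direction, the cosine value ignores their difference in length while the two distances do not.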

src.features.generator.create_glove_embeddings(data: pandas.core.frame.DataFrame, glove=None, name: str = '')[source]

Creates glove embeddings

Parameters
  • data (pd.DataFrame) – Dataframe containing data to be embedded

  • glove (Glove object) – Glove object to reuse; a new object of class Glove is created if None

  • name (str) – String added to the name of the .pkl file that is created to store the “data” dataframe

Returns

Object of class Glove, and data (pd.DataFrame): Dataframe “data” with new column “preprocessed” appended

Return type

glove (Glove object)

src.features.generator.create_glove_embeddings_tf_idf_weighted(data: pandas.core.frame.DataFrame, glove=None, name: str = '')[source]

Creates tfidf weighted glove embeddings

Parameters
  • data (pd.DataFrame) – Dataframe containing data to be embedded

  • glove (Glove object) – Glove object to reuse; a new object of class Glove is created if None

  • name (str) – String added to the name of the .pkl file that is created to store the “data” dataframe

Returns

Object of class Glove, and data (pd.DataFrame): Dataframe “data” with new column “glove_tfidf” appended

Return type

glove (Glove object)
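
A tfidf weighted embedding is commonly built by averaging the word vectors of a text with each word weighted by its tf-idf value. The following sketch shows that idea under stated assumptions: the helper names, the IDF formula, and the toy two-dimensional vectors are hypothetical and do not come from the Glove class used here.

```python
import math
from collections import Counter


def idf_weights(tokenized_docs):
    """Inverse document frequency per term, log(N / df) + 1."""
    n_docs = len(tokenized_docs)
    doc_freq = Counter(t for doc in tokenized_docs for t in set(doc))
    return {t: math.log(n_docs / c) + 1.0 for t, c in doc_freq.items()}


def weighted_embedding(tokens, vectors, idf, dim):
    """Average of word vectors, each weighted by term count times IDF.

    Tokens without a vector are skipped; an all-zero vector is returned
    when no token has a vector.
    """
    term_counts = Counter(t for t in tokens if t in vectors)
    total = [0.0] * dim
    weight_sum = 0.0
    for term, count in term_counts.items():
        weight = count * idf.get(term, 1.0)
        for i, x in enumerate(vectors[term]):
            total[i] += weight * x
        weight_sum += weight
    if weight_sum == 0.0:
        return total
    return [x / weight_sum for x in total]


# Toy 2-d word vectors standing in for real GloVe vectors.
vectors = {"cat": [1.0, 0.0], "the": [0.0, 1.0]}
docs = [["the", "cat"], ["the", "dog"], ["the", "bird"]]
idf = idf_weights(docs)
embedding = weighted_embedding(docs[0], vectors, idf, dim=2)
```

An unweighted average of “the” and “cat” would give equal components; with the weighting, the rarer word “cat” dominates the result.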

src.features.generator.create_glove_feature(features: pandas.core.frame.DataFrame, collection: pandas.core.frame.DataFrame, queries: pandas.core.frame.DataFrame, path_collection: str = 'data/embeddings/glove_collection_embeddings.pkl', path_query: str = 'data/embeddings/glove_query_embeddings.pkl')[source]

Creates glove features (cosine, euclidean, manhattan)

Parameters
  • features (pd.DataFrame) – Dataframe containing feature data

  • collection (pd.DataFrame) – Dataframe containing collection data

  • queries (pd.DataFrame) – Dataframe containing queries data

  • path_collection (str) – Path to “glove_collection_embeddings.pkl” file

  • path_query (str) – Path to “glove_query_embeddings.pkl” file

Returns

Dataframe “features” with new columns “glove_cosine”, “glove_euclidean”, “glove_manhattan” appended

Return type

features (pd.DataFrame)

src.features.generator.create_interpretation_features(features: pandas.core.frame.DataFrame, collection: pandas.core.frame.DataFrame, queries: pandas.core.frame.DataFrame)[source]

Creates interpretation features for query and collection data (subjectivity, polarity)

Parameters
  • features (pd.DataFrame) – Dataframe containing feature data

  • collection (pd.DataFrame) – Dataframe containing collection data

  • queries (pd.DataFrame) – Dataframe containing queries data

Returns

Dataframe “features” with new columns “subjectivity_doc”, “polarity_doc”, “subjectivity_query”, “polarity_query” appended

Return type

features (pd.DataFrame)

src.features.generator.create_jaccard_feature(features: pandas.core.frame.DataFrame, collection: pandas.core.frame.DataFrame, queries: pandas.core.frame.DataFrame)[source]

Creates jaccard features for query-collection combinations

Parameters
  • features (pd.DataFrame) – Dataframe containing feature data

  • collection (pd.DataFrame) – Dataframe containing collection data

  • queries (pd.DataFrame) – Dataframe containing queries data

Returns

Dataframe “features” with new column “jaccard” appended

Return type

features (pd.DataFrame)
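
As an illustration of the measure behind a “jaccard” column, the Jaccard similarity of two texts is the size of the intersection of their token sets divided by the size of the union. The sketch below assumes lowercasing and whitespace tokenization; the function name is hypothetical and the project's preprocessing may differ.

```python
def jaccard_similarity(query, document):
    """Jaccard similarity of the token sets of two strings."""
    query_tokens = set(query.lower().split())
    doc_tokens = set(document.lower().split())
    union = query_tokens | doc_tokens
    if not union:
        # Both texts are empty; define the similarity as 0.0.
        return 0.0
    return len(query_tokens & doc_tokens) / len(union)


# Two shared tokens ("deep", "models") out of four distinct tokens.
score = jaccard_similarity("deep learning models", "Deep neural models")
```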

src.features.generator.create_sentence_features(features: pandas.core.frame.DataFrame, collection: pandas.core.frame.DataFrame, queries: pandas.core.frame.DataFrame)[source]

Creates sentence features for query-collection combinations (words_difference, words_rel_difference, char_difference, char_rel_difference)

Parameters
  • features (pd.DataFrame) – Dataframe containing feature data

  • collection (pd.DataFrame) – Dataframe containing collection data

  • queries (pd.DataFrame) – Dataframe containing queries data

Returns

Dataframe “features” with new columns “words_doc”, “words_query”, “words_difference”, “words_rel_difference”, “char_doc”, “char_query”, “char_difference”, “char_rel_difference” appended

Return type

features (pd.DataFrame)
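
The column names suggest simple length statistics for each query-document pair. The sketch below shows one plausible reading and is an assumption throughout: the function name is hypothetical, the differences are taken as document minus query, and the relative differences are divided by the document length, none of which is confirmed by this reference.

```python
def sentence_features(query, document):
    """Word and character counts plus their differences for one pair.

    Assumed definitions: difference = doc - query, and
    relative difference = difference / doc (0.0 for an empty document).
    """
    words_doc = len(document.split())
    words_query = len(query.split())
    char_doc = len(document)
    char_query = len(query)
    words_difference = words_doc - words_query
    char_difference = char_doc - char_query
    return {
        "words_doc": words_doc,
        "words_query": words_query,
        "words_difference": words_difference,
        "words_rel_difference": words_difference / words_doc if words_doc else 0.0,
        "char_doc": char_doc,
        "char_query": char_query,
        "char_difference": char_difference,
        "char_rel_difference": char_difference / char_doc if char_doc else 0.0,
    }


row = sentence_features("bm25 ranking", "bm25 is a ranking function")
```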

src.features.generator.create_tfidf_embeddings(data: pandas.core.frame.DataFrame, tfidf=None, name: str = '')[source]

Creates tfidf embeddings

Parameters
  • data (pd.DataFrame) – Dataframe containing data to be embedded

  • tfidf (TFIDF object) – TFIDF object to reuse; a new object of class tfidf is created if None

  • name (str) – String added to the name of the .pkl file that is created to store the “data” dataframe

Returns

Object of class TFIDF, and data (pd.DataFrame): Dataframe “data” with new column “preprocessed” appended

Return type

tfidf (TFIDF object)

src.features.generator.create_tfidf_feature(features: pandas.core.frame.DataFrame, collection: pandas.core.frame.DataFrame, queries: pandas.core.frame.DataFrame, path_collection: str = 'data/embeddings/tfidf_collection_embeddings.pkl', path_query: str = 'data/embeddings/tfidf_query_embeddings.pkl')[source]

Creates tfidf features (cosine, euclidean, manhattan)

Parameters
  • features (pd.DataFrame) – Dataframe containing feature data

  • collection (pd.DataFrame) – Dataframe containing collection data

  • queries (pd.DataFrame) – Dataframe containing queries data

  • path_collection (str) – Path to “tfidf_collection_embeddings.pkl” file

  • path_query (str) – Path to “tfidf_query_embeddings.pkl” file

Returns

Dataframe “features” with new columns “tfidf_cosine”, “tfidf_euclidean”, “tfidf_manhattan” appended

Return type

features (pd.DataFrame)

src.features.generator.create_w2v_embeddings(data: pandas.core.frame.DataFrame, w2v=None, name: str = '')[source]

Creates word2vec embeddings

Parameters
  • data (pd.DataFrame) – Dataframe containing data to be embedded

  • w2v (word2vec object) – word2vec object to reuse; a new object of class word2vec is created if None

  • name (str) – String added to the name of the .pkl file that is created to store the “data” dataframe

Returns

Object of class word2vec, and data (pd.DataFrame): Dataframe “data” with new column “preprocessed” appended

Return type

w2v (word2vec object)

src.features.generator.create_w2v_embeddings_tf_idf_weighted(data: pandas.core.frame.DataFrame, w2v=None, name: str = '')[source]

Creates tfidf weighted word2vec embeddings

Parameters
  • data (pd.DataFrame) – Dataframe containing data to be embedded

  • w2v (word2vec object) – word2vec object to reuse; a new object of class word2vec is created if None

  • name (str) – String added to the name of the .pkl file that is created to store the “data” dataframe

Returns

Object of class word2vec, and data (pd.DataFrame): Dataframe “data” with new column “w2v_tfidf” appended

Return type

w2v (word2vec object)

src.features.generator.create_w2v_feature(features: pandas.core.frame.DataFrame, collection: pandas.core.frame.DataFrame, queries: pandas.core.frame.DataFrame, path_collection: str = 'data/embeddings/w2v_collection_embeddings.pkl', path_query: str = 'data/embeddings/w2v_query_embeddings.pkl')[source]

Creates word2vec features (cosine, euclidean, manhattan)

Parameters
  • features (pd.DataFrame) – Dataframe containing feature data

  • collection (pd.DataFrame) – Dataframe containing collection data

  • queries (pd.DataFrame) – Dataframe containing queries data

  • path_collection (str) – Path to “w2v_collection_embeddings.pkl” file

  • path_query (str) – Path to “w2v_query_embeddings.pkl” file

Returns

Dataframe “features” with new columns “w2v_cosine”, “w2v_euclidean”, “w2v_manhattan” appended

Return type

features (pd.DataFrame)

src.features.generator.create_w2v_tfidf_feature(features: pandas.core.frame.DataFrame, collection: pandas.core.frame.DataFrame, queries: pandas.core.frame.DataFrame, path_collection: str = 'data/embeddings/w2v_tfidf_collection_embeddings.pkl', path_query: str = 'data/embeddings/w2v_tfidf_query_embeddings.pkl')[source]

Creates tfidf weighted word2vec features (cosine, euclidean, manhattan)

Parameters
  • features (pd.DataFrame) – Dataframe containing feature data

  • collection (pd.DataFrame) – Dataframe containing collection data

  • queries (pd.DataFrame) – Dataframe containing queries data

  • path_collection (str) – Path to “w2v_tfidf_collection_embeddings.pkl” file

  • path_query (str) – Path to “w2v_tfidf_query_embeddings.pkl” file

Returns

Dataframe “features” with new columns “w2v_tfidf_cosine”, “w2v_tfidf_euclidean”, “w2v_tfidf_manhattan” appended

Return type

features (pd.DataFrame)