Generator
Functionality to generate features.
- src.features.generator.create_BM2_feature(features: pandas.core.frame.DataFrame, collection: pandas.core.frame.DataFrame, queries: pandas.core.frame.DataFrame)[source]
Creates BM25 features for query-collection combinations
- Parameters
features (pd.DataFrame) – Dataframe containing feature data
collection (pd.DataFrame) – Dataframe containing collection data
queries (pd.DataFrame) – Dataframe containing queries data
- Returns
Dataframe “features” with new column “bm25” appended
- Return type
features (pd.DataFrame)
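For orientation, the kind of score a “bm25” column typically holds can be sketched in plain Python. This is a minimal illustration of the standard Okapi BM25 formula, not the implementation behind this function: the helper name `bm25_score`, the whitespace tokenization, and the parameter values `k1=1.5` and `b=0.75` are all assumptions.

```python
import math
from collections import Counter


def bm25_score(query_tokens, doc_tokens, corpus, k1=1.5, b=0.75):
    """Okapi BM25 score of one tokenized document for one tokenized query.

    corpus is the list of all tokenized documents; it supplies the document
    frequencies and the average document length. k1 and b are assumed values.
    """
    n_docs = len(corpus)
    avg_len = sum(len(d) for d in corpus) / n_docs
    term_freq = Counter(doc_tokens)
    score = 0.0
    for term in set(query_tokens):
        doc_freq = sum(1 for d in corpus if term in d)
        if doc_freq == 0:
            continue  # term unseen in the corpus contributes nothing
        # Smoothed idf variant that stays non-negative for frequent terms.
        idf = math.log((n_docs - doc_freq + 0.5) / (doc_freq + 0.5) + 1.0)
        tf = term_freq[term]
        # Length normalization: long documents are penalized via b.
        norm = tf + k1 * (1 - b + b * len(doc_tokens) / avg_len)
        score += idf * tf * (k1 + 1) / norm
    return score


# Toy collection and query (hypothetical data).
corpus = [
    "the cat sat on the mat".split(),
    "dogs chase cats in the park".split(),
    "stock markets fell sharply today".split(),
]
query = "cat mat".split()
scores = [bm25_score(query, doc, corpus) for doc in corpus]
```

Only the first document contains the query terms verbatim, so it is the only one with a positive score; without stemming, “cats” does not match “cat”.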
- src.features.generator.create_POS_features(features: pandas.core.frame.DataFrame, collection: pandas.core.frame.DataFrame, queries: pandas.core.frame.DataFrame)[source]
Creates Part of Speech features for query and collection data (nouns, adjectives, verbs)
- Parameters
features (pd.DataFrame) – Dataframe containing feature data
collection (pd.DataFrame) – Dataframe containing collection data
queries (pd.DataFrame) – Dataframe containing queries data
- Returns
Dataframe “features” with new columns “doc_nouns”, “doc_adjectives”, “doc_verbs”, “query_nouns”, “query_adjectives”, “query_verbs” appended
- Return type
features (pd.DataFrame)
- src.features.generator.create_all(features: pandas.core.frame.DataFrame, collection: pandas.core.frame.DataFrame, queries: pandas.core.frame.DataFrame, tfidf=None, glove=None, bert=None, w2v=None)[source]
Creates all implemented embeddings (bert, glove, tfidf, word2vec) and features (cosine, euclidean, manhattan, jaccard, sentence, interpretation, BM25, POS).
- Parameters
features (pd.DataFrame) – Dataframe containing feature data
collection (pd.DataFrame) – Dataframe containing collection data
queries (pd.DataFrame) – Dataframe containing query data
tfidf (TFIDF object) – Creates new object of class tfidf if None
glove (Glove object) – Creates new object of class Glove if None
bert (Bert object) – Creates new object of class Bert if None
w2v (word2vec object) – Creates new object of class word2vec if None
- Returns
Dataframe containing feature data
- Return type
features (pd.DataFrame)
- src.features.generator.create_bert_embeddings(data: pandas.core.frame.DataFrame, bert=None, name: str = '')[source]
Creates bert embeddings
- Parameters
data (pd.DataFrame) – Dataframe containing data to be embedded
bert (Bert object) – Bert object to use; creates new object of class Bert if None
name (str) – String added to the name of the .pkl file in which the “data” Dataframe is stored
- Returns
Object of class Bert
data (pd.DataFrame) – Dataframe “data” with new column “preprocessed” appended
- Return type
bert (Bert object)
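The role of the name parameter can be illustrated with a small sketch of storing data as a .pkl file. The file name pattern `<method>_<name>_embeddings.pkl` is an assumption inferred from the default paths listed for create_bert_feature; the helpers `store_embeddings` and `load_embeddings` are hypothetical, and a plain dict stands in for the Dataframe.

```python
import pickle
import tempfile
from pathlib import Path


def store_embeddings(data, method, name, directory):
    """Pickle data to <directory>/<method>_<name>_embeddings.pkl (assumed pattern)."""
    path = Path(directory) / f"{method}_{name}_embeddings.pkl"
    with path.open("wb") as fh:
        pickle.dump(data, fh)
    return path


def load_embeddings(path):
    """Read a previously pickled object back from disk."""
    with Path(path).open("rb") as fh:
        return pickle.load(fh)


# Hypothetical stand-in for the embedded data.
data = {"doc_id": [1, 2], "embedding": [[0.1, 0.2], [0.3, 0.4]]}

with tempfile.TemporaryDirectory() as tmp:
    path = store_embeddings(data, method="bert", name="collection", directory=tmp)
    stored_name = path.name
    restored = load_embeddings(path)
```

Under this assumed pattern, passing name='collection' or name='query' would yield the two file names that the feature functions expect by default.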
- src.features.generator.create_bert_feature(features: pandas.core.frame.DataFrame, collection: pandas.core.frame.DataFrame, queries: pandas.core.frame.DataFrame, path_collection: str = 'data/embeddings/bert_collection_embeddings.pkl', path_query: str = 'data/embeddings/bert_query_embeddings.pkl')[source]
Creates bert features (cosine, euclidean, manhattan)
- Parameters
features (pd.DataFrame) – Dataframe containing feature data
collection (pd.DataFrame) – Dataframe containing collection data
queries (pd.DataFrame) – Dataframe containing queries data
path_collection (str) – Path to “bert_collection_embeddings.pkl” file
path_query (str) – Path to “bert_query_embeddings.pkl” file
- Returns
Dataframe “features” with new columns “bert_cosine”, “bert_euclidean”, “bert_manhattan” appended
- Return type
features (pd.DataFrame)
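The three measures named in the column suffixes can be sketched for a single query-document pair as follows. This is a minimal illustration with plain lists as vectors; whether the “cosine” column holds a similarity or a distance is not stated here, so the sketch assumes similarity, and the function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def euclidean_distance(u, v):
    """Straight-line (L2) distance between u and v."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def manhattan_distance(u, v):
    """Sum of absolute coordinate differences (L1 distance)."""
    return sum(abs(a - b) for a, b in zip(u, v))


# Toy embeddings (hypothetical values): doc_vec points in the same
# direction as query_vec but is twice as long.
query_vec = [1.0, 0.0, 2.0]
doc_vec = [2.0, 0.0, 4.0]

cosine = cosine_similarity(query_vec, doc_vec)
euclidean = euclidean_distance(query_vec, doc_vec)
manhattan = manhattan_distance(query_vec, doc_vec)
```

The example shows why the measures complement each other: the cosine value ignores vector length, while the Euclidean and Manhattan distances do not.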
- src.features.generator.create_glove_embeddings(data: pandas.core.frame.DataFrame, glove=None, name: str = '')[source]
Creates glove embeddings
- Parameters
data (pd.DataFrame) – Dataframe containing data to be embedded
glove (Glove object) – Glove object to use; creates new object of class Glove if None
name (str) – String added to the name of the .pkl file in which the “data” Dataframe is stored
- Returns
Object of class Glove
data (pd.DataFrame) – Dataframe “data” with new column “preprocessed” appended
- Return type
glove (Glove object)
- src.features.generator.create_glove_embeddings_tf_idf_weighted(data: pandas.core.frame.DataFrame, glove=None, name: str = '')[source]
Creates tfidf weighted glove embeddings
- Parameters
data (pd.DataFrame) – Dataframe containing data to be embedded
glove (Glove object) – Glove object to use; creates new object of class Glove if None
name (str) – String added to the name of the .pkl file in which the “data” Dataframe is stored
- Returns
Object of class Glove
data (pd.DataFrame) – Dataframe “data” with new column “glove_tfidf” appended
- Return type
glove (Glove object)
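One common way to build a tfidf weighted embedding is to average the word vectors of a text, weighting each word by its tfidf value. The sketch below illustrates that idea under stated assumptions: the idf formula, the helper names, and the two-dimensional toy vectors are hypothetical and not taken from the Glove class.

```python
import math
from collections import Counter


def tfidf_weights(doc_tokens, corpus):
    """Per-term tfidf weight for one document (assumed idf: log(N / df) + 1)."""
    n_docs = len(corpus)
    counts = Counter(doc_tokens)
    weights = {}
    for term, count in counts.items():
        doc_freq = max(sum(1 for d in corpus if term in d), 1)
        idf = math.log(n_docs / doc_freq) + 1.0
        weights[term] = (count / len(doc_tokens)) * idf
    return weights


def weighted_embedding(doc_tokens, corpus, vectors, dim):
    """tfidf weighted average of the word vectors of doc_tokens."""
    weights = tfidf_weights(doc_tokens, corpus)
    total = [0.0] * dim
    weight_sum = 0.0
    for term, weight in weights.items():
        if term not in vectors:
            continue  # out-of-vocabulary words are skipped
        for i, value in enumerate(vectors[term]):
            total[i] += weight * value
        weight_sum += weight
    if weight_sum == 0:
        return total  # no known words: zero vector
    return [value / weight_sum for value in total]


# Toy word vectors and corpus (hypothetical values).
vectors = {"cat": [1.0, 0.0], "the": [0.0, 1.0]}
corpus = [["the", "cat"], ["the", "dog"], ["the", "bird"]]
embedding = weighted_embedding(["the", "cat"], corpus, vectors, dim=2)
```

Because “the” occurs in every document, its weight is low and the resulting vector lies closer to the vector of “cat” than a plain average would.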
- src.features.generator.create_glove_feature(features: pandas.core.frame.DataFrame, collection: pandas.core.frame.DataFrame, queries: pandas.core.frame.DataFrame, path_collection: str = 'data/embeddings/glove_collection_embeddings.pkl', path_query: str = 'data/embeddings/glove_query_embeddings.pkl')[source]
Creates glove features (cosine, euclidean, manhattan)
- Parameters
features (pd.DataFrame) – Dataframe containing feature data
collection (pd.DataFrame) – Dataframe containing collection data
queries (pd.DataFrame) – Dataframe containing queries data
path_collection (str) – Path to “glove_collection_embeddings.pkl” file
path_query (str) – Path to “glove_query_embeddings.pkl” file
- Returns
Dataframe “features” with new columns “glove_cosine”, “glove_euclidean”, “glove_manhattan” appended
- Return type
features (pd.DataFrame)
- src.features.generator.create_interpretation_features(features: pandas.core.frame.DataFrame, collection: pandas.core.frame.DataFrame, queries: pandas.core.frame.DataFrame)[source]
Creates interpretation features for query and collection data (subjectivity, polarity)
- Parameters
features (pd.DataFrame) – Dataframe containing feature data
collection (pd.DataFrame) – Dataframe containing collection data
queries (pd.DataFrame) – Dataframe containing queries data
- Returns
Dataframe “features” with new columns “subjectivity_doc”, “polarity_doc”, “subjectivity_query”, “polarity_query” appended
- Return type
features (pd.DataFrame)
- src.features.generator.create_jaccard_feature(features: pandas.core.frame.DataFrame, collection: pandas.core.frame.DataFrame, queries: pandas.core.frame.DataFrame)[source]
Creates jaccard features for query-collection combinations
- Parameters
features (pd.DataFrame) – Dataframe containing feature data
collection (pd.DataFrame) – Dataframe containing collection data
queries (pd.DataFrame) – Dataframe containing queries data
- Returns
Dataframe “features” with new column “jaccard” appended
- Return type
features (pd.DataFrame)
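The Jaccard measure for one query-document pair is the size of the intersection of the two token sets divided by the size of their union. A minimal sketch, assuming whitespace tokenization and a hypothetical helper name:

```python
def jaccard_similarity(query_tokens, doc_tokens):
    """|intersection| / |union| of the two token sets; 0.0 if both are empty."""
    query_set, doc_set = set(query_tokens), set(doc_tokens)
    union = query_set | doc_set
    if not union:
        return 0.0
    return len(query_set & doc_set) / len(union)


# Shared tokens: "is", "a", "cat"; all distinct tokens: 5.
score = jaccard_similarity("what is a cat".split(), "a cat is a pet".split())
```

Repeated tokens count once because the comparison is made on sets, so the score for this pair is 3 / 5.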
- src.features.generator.create_sentence_features(features: pandas.core.frame.DataFrame, collection: pandas.core.frame.DataFrame, queries: pandas.core.frame.DataFrame)[source]
Creates sentence features for query-collection combinations (words_difference, words_rel_difference, char_difference, char_rel_difference)
- Parameters
features (pd.DataFrame) – Dataframe containing feature data
collection (pd.DataFrame) – Dataframe containing collection data
queries (pd.DataFrame) – Dataframe containing queries data
- Returns
Dataframe “features” with new columns “words_doc”, “words_query”, “words_difference”, “words_rel_difference”, “char_doc”, “char_query”, “char_difference”, “char_rel_difference” appended
- Return type
features (pd.DataFrame)
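How such length-based columns could be derived for one pair is sketched below. The exact definitions are not given here, so the sketch assumes that a difference is the document count minus the query count and that the relative difference divides by the document count; the helper name is hypothetical.

```python
def sentence_features(query, doc):
    """Word and character counts of a query-document pair and their differences.

    Assumed definitions: difference = doc - query,
    relative difference = difference / doc count.
    """
    words_query, words_doc = len(query.split()), len(doc.split())
    char_query, char_doc = len(query), len(doc)
    words_diff = words_doc - words_query
    char_diff = char_doc - char_query
    return {
        "words_doc": words_doc,
        "words_query": words_query,
        "words_difference": words_diff,
        "words_rel_difference": words_diff / words_doc if words_doc else 0.0,
        "char_doc": char_doc,
        "char_query": char_query,
        "char_difference": char_diff,
        "char_rel_difference": char_diff / char_doc if char_doc else 0.0,
    }


# Hypothetical query and document text.
row = sentence_features("what is bm25", "bm25 is a ranking function")
```

The returned keys mirror the eight column names listed above, so a row like this could be expanded into the features Dataframe.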
- src.features.generator.create_tfidf_embeddings(data: pandas.core.frame.DataFrame, tfidf=None, name: str = '')[source]
Creates tfidf embeddings
- Parameters
data (pd.DataFrame) – Dataframe containing data to be embedded
tfidf (TFIDF object) – TFIDF object to use; creates new object of class TFIDF if None
name (str) – String added to the name of the .pkl file in which the “data” Dataframe is stored
- Returns
Object of class TFIDF
data (pd.DataFrame) – Dataframe “data” with new column “preprocessed” appended
- Return type
tfidf (TFIDF object)
- src.features.generator.create_tfidf_feature(features: pandas.core.frame.DataFrame, collection: pandas.core.frame.DataFrame, queries: pandas.core.frame.DataFrame, path_collection: str = 'data/embeddings/tfidf_collection_embeddings.pkl', path_query: str = 'data/embeddings/tfidf_query_embeddings.pkl')[source]
Creates tfidf features (cosine, euclidean, manhattan)
- Parameters
features (pd.DataFrame) – Dataframe containing feature data
collection (pd.DataFrame) – Dataframe containing collection data
queries (pd.DataFrame) – Dataframe containing queries data
path_collection (str) – Path to “tfidf_collection_embeddings.pkl” file
path_query (str) – Path to “tfidf_query_embeddings.pkl” file
- Returns
Dataframe “features” with new columns “tfidf_cosine”, “tfidf_euclidean”, “tfidf_manhattan” appended
- Return type
features (pd.DataFrame)
- src.features.generator.create_w2v_embeddings(data: pandas.core.frame.DataFrame, w2v=None, name: str = '')[source]
Creates word2vec embeddings
- Parameters
data (pd.DataFrame) – Dataframe containing data to be embedded
w2v (word2vec object) – word2vec object to use; creates new object of class word2vec if None
name (str) – String added to the name of the .pkl file in which the “data” Dataframe is stored
- Returns
Object of class word2vec
data (pd.DataFrame) – Dataframe “data” with new column “preprocessed” appended
- Return type
w2v (word2vec object)
- src.features.generator.create_w2v_embeddings_tf_idf_weighted(data: pandas.core.frame.DataFrame, w2v=None, name: str = '')[source]
Creates tfidf weighted word2vec embeddings
- Parameters
data (pd.DataFrame) – Dataframe containing data to be embedded
w2v (word2vec object) – word2vec object to use; creates new object of class word2vec if None
name (str) – String added to the name of the .pkl file in which the “data” Dataframe is stored
- Returns
Object of class word2vec
data (pd.DataFrame) – Dataframe “data” with new column “w2v_tfidf” appended
- Return type
w2v (word2vec object)
- src.features.generator.create_w2v_feature(features: pandas.core.frame.DataFrame, collection: pandas.core.frame.DataFrame, queries: pandas.core.frame.DataFrame, path_collection: str = 'data/embeddings/w2v_collection_embeddings.pkl', path_query: str = 'data/embeddings/w2v_query_embeddings.pkl')[source]
Creates word2vec features (cosine, euclidean, manhattan)
- Parameters
features (pd.DataFrame) – Dataframe containing feature data
collection (pd.DataFrame) – Dataframe containing collection data
queries (pd.DataFrame) – Dataframe containing queries data
path_collection (str) – Path to “w2v_collection_embeddings.pkl” file
path_query (str) – Path to “w2v_query_embeddings.pkl” file
- Returns
Dataframe “features” with new columns “w2v_cosine”, “w2v_euclidean”, “w2v_manhattan” appended
- Return type
features (pd.DataFrame)
- src.features.generator.create_w2v_tfidf_feature(features: pandas.core.frame.DataFrame, collection: pandas.core.frame.DataFrame, queries: pandas.core.frame.DataFrame, path_collection: str = 'data/embeddings/w2v_tfidf_collection_embeddings.pkl', path_query: str = 'data/embeddings/w2v_tfidf_query_embeddings.pkl')[source]
Creates tfidf weighted word2vec features (cosine, euclidean, manhattan)
- Parameters
features (pd.DataFrame) – Dataframe containing feature data
collection (pd.DataFrame) – Dataframe containing collection data
queries (pd.DataFrame) – Dataframe containing queries data
path_collection (str) – Path to “w2v_tfidf_collection_embeddings.pkl” file
path_query (str) – Path to “w2v_tfidf_query_embeddings.pkl” file
- Returns
Dataframe “features” with new columns “w2v_tfidf_cosine”, “w2v_tfidf_euclidean”, “w2v_tfidf_manhattan” appended
- Return type
features (pd.DataFrame)