Training and Evaluation

Training and Evaluation of models.

class src.models.training.Evaluation(previous_results: str = 'data/results/results.pkl')[source]

Bases: object

A class to perform model evaluations.

previous_results

Path to previously stored results

Type

str

Methods:

__call__(X_y_train: pd.DataFrame, X_test: pd.DataFrame, qrels: pd.DataFrame, k: int = 50, components_pca: int = 0, model=GaussianNB(), pairwise_model=None, pairwise_top_k: int = 50, pairwise_train: bool = True, name: str = None, save_result: bool = True):

Trains the given model and evaluates it on the test data.

hyperparameter_optimization(model, search_space, X_y_train: pd.DataFrame, X_test: pd.DataFrame, X_val: pd.DataFrame, qrels: pd.DataFrame, qrels_val: pd.DataFrame, k: int = 50, components_pca: int = 0, pairwise_model=None, pairwise_top_k: int = 50, pairwise_train: bool = True, trials: int = 50, name: str = None, save_result: bool = True):

Performs hyperparameter optimization.

feature_selection(model, X_y_train: pd.DataFrame, X_test: pd.DataFrame, qrels: pd.DataFrame, k: int = 50, components_pca: int = 0, save_results: bool = True, name: str = None):

Performs feature selection.

compute_metrics(model, X: pd.DataFrame, y, X_test, test_pair, qrels: pd.DataFrame, k: int = 50, components_pca: int = 0, pairwise_model=None, pairwise_top_k: int = 50, pairwise_train: bool = True, name: str = None, save_result: bool = False):

Calculates metrics and saves them in a dataframe locally.

calculate_ranks(results: pd.DataFrame):

Returns relevant documents with their corresponding rank

average_precision_score(results: pd.DataFrame):

Calculates Average Precision

mean_average_precision_score(results: pd.DataFrame):

Calculates Mean Average Precision for a set of queries

metrics(results: pd.DataFrame, k: int = None):

Calculates accuracy, precision, recall and f1 globally and in the top-k area

normalized_discounted_cumulative_gain(results: pd.DataFrame):

Calculates Normalized Discounted Cumulative Gain

mean_normalized_discounted_cumulative_gain_score(results: pd.DataFrame):

Calculates Mean Normalized Discounted Cumulative Gain

mean_reciprocal_rank(results: pd.DataFrame, threshold: int = 3):

Calculates Mean Reciprocal Rank

average_precision_score(results: pandas.core.frame.DataFrame)[source]

Calculates average precision score.

Parameters

results (pd.DataFrame) –

Return type

AP (float)
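For orientation, average precision for a single query can be sketched as below. This is a minimal illustration, not the library's implementation: the helper name `average_precision_sketch` and its list-of-labels input are hypothetical, since the column layout of `results` is not documented here. The sketch also assumes normalisation by the number of relevant documents actually retrieved.

```python
from typing import Sequence


def average_precision_sketch(relevance: Sequence[int]) -> float:
    """AP for one query; `relevance` holds 0/1 labels in ranked order, best first.

    Hypothetical helper for illustration only.
    """
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            # Precision at the cut-off where this relevant document appears.
            precision_sum += hits / rank
    # Assumption: normalise by the relevant documents that were retrieved.
    return precision_sum / hits if hits else 0.0


ap = average_precision_sketch([1, 0, 1, 0, 0])
```

Averaging this value over all queries gives the mean average precision.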

calculate_ranks(results: pandas.core.frame.DataFrame)[source]

Calculates ranks.

Parameters

results (pd.DataFrame) –

Return type

ranks (pd.DataFrame)
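The idea of pairing relevant documents with their rank can be sketched as follows. The tuple layout `(doc_id, score, relevant)` and the helper name are assumptions made for this example; the actual method works on a `pd.DataFrame` whose columns are not documented here.

```python
from typing import List, Tuple


def relevant_ranks_sketch(
    scored: List[Tuple[str, float, int]]
) -> List[Tuple[str, int]]:
    """Return (doc_id, rank) for relevant documents; rank 1 is the highest score.

    Hypothetical helper: each input row is (doc_id, score, relevant).
    """
    # Sort by model score, best first, then number the positions from 1.
    ordered = sorted(scored, key=lambda row: row[1], reverse=True)
    return [
        (doc_id, rank)
        for rank, (doc_id, _score, relevant) in enumerate(ordered, start=1)
        if relevant
    ]


ranks = relevant_ranks_sketch([("d1", 0.2, 1), ("d2", 0.9, 0), ("d3", 0.5, 1)])
```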

compute_metrics(model, X: pandas.core.frame.DataFrame, y, X_test, test_pair, qrels: pandas.core.frame.DataFrame, k: int = 50, components_pca: int = 0, pairwise_model=None, pairwise_top_k: int = 50, pairwise_train: bool = True, name: Optional[str] = None, save_result: bool = False)[source]

Computes metrics.

Parameters
  • model –

  • X (pd.DataFrame) –

  • y (pd.Series) –

  • X_test (pd.DataFrame) –

  • test_pair (pd.DataFrame) –

  • qrels (pd.DataFrame) –

  • k (int) –

  • components_pca (int) –

  • pairwise_model (str) –

  • pairwise_top_k (int) –

  • pairwise_train (Boolean) –

  • name (str) –

  • save_result (Boolean) –

Return type

MRR (float)

feature_selection(model, X_y_train: pandas.core.frame.DataFrame, X_test: pandas.core.frame.DataFrame, qrels: pandas.core.frame.DataFrame, k: int = 50, components_pca: int = 0, save_results: bool = True, name: Optional[str] = None)[source]

Performs feature selection.

Parameters
  • model –

  • X_y_train (pd.DataFrame) –

  • X_test (pd.DataFrame) –

  • qrels (pd.DataFrame) –

  • k (int) –

  • components_pca (int) –

  • name (str) –

  • save_results (Boolean) –

Return type

Selected Features (list)

hyperparameter_optimization(model, search_space, X_y_train: pandas.core.frame.DataFrame, X_test: pandas.core.frame.DataFrame, X_val: pandas.core.frame.DataFrame, qrels: pandas.core.frame.DataFrame, qrels_val: pandas.core.frame.DataFrame, k: int = 50, components_pca: int = 0, pairwise_model=None, pairwise_top_k: int = 50, pairwise_train: bool = True, trials: int = 50, name: Optional[str] = None, save_result: bool = True)[source]

Performs hyperparameter optimization.

Parameters
  • model –

  • search_space –

  • X_y_train (pd.DataFrame) –

  • X_test (pd.DataFrame) –

  • X_val (pd.DataFrame) –

  • qrels (pd.DataFrame) –

  • qrels_val (pd.DataFrame) –

  • k (int) –

  • components_pca (int) –

  • pairwise_model (str) –

  • pairwise_top_k (int) –

  • pairwise_train (Boolean) –

  • trials (int) –

  • name (str) –

  • save_result (Boolean) –

Returns

MRR and nDCG

Return type

tuple (float)

mean_average_precision_score(results: pandas.core.frame.DataFrame)[source]

Calculates mean average precision score.

Parameters

results (pd.DataFrame) –

Return type

MAP (float)

mean_normalized_discounted_cumulative_gain_score(results: pandas.core.frame.DataFrame)[source]

Calculates mean normalized discounted cumulative gain score.

Parameters

results (pd.DataFrame) –

Return type

Mean of nDCG (float)

mean_reciprocal_rank(results: pandas.core.frame.DataFrame, threshold: int = 3)[source]

Calculates mean reciprocal rank.

Parameters
  • results (pd.DataFrame) –

  • threshold (int) –

Return type

MRR (float)
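As a rough sketch of the metric itself, mean reciprocal rank averages 1/rank of the first relevant document over all queries. The helper below is hypothetical and takes a plain dictionary of ranked 0/1 labels per query. It does not model the `threshold` parameter, whose meaning is not documented here.

```python
from typing import Dict, List


def mean_reciprocal_rank_sketch(relevance_by_query: Dict[str, List[int]]) -> float:
    """MRR over queries; each value is a ranked list of 0/1 labels, best first.

    Hypothetical helper for illustration only.
    """
    if not relevance_by_query:
        return 0.0
    total = 0.0
    for labels in relevance_by_query.values():
        # 1-based position of the first relevant document, or None if absent.
        first = next(
            (rank for rank, rel in enumerate(labels, start=1) if rel), None
        )
        if first is not None:
            total += 1.0 / first
    # Queries without any relevant document contribute 0 to the mean.
    return total / len(relevance_by_query)


mrr = mean_reciprocal_rank_sketch(
    {"q1": [0, 1, 0], "q2": [1, 0, 0], "q3": [0, 0, 0]}
)
```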

metrics(results: pandas.core.frame.DataFrame, k: Optional[int] = None)[source]

Calculates metrics (accuracy, precision, recall, f1).

Parameters
  • results (pd.DataFrame) –

  • k (int) –

Returns
  • accuracy (float) – Accuracy score of the model.

  • precision (float) – Precision score of the model.

  • recall (float) – Recall score of the model.

  • f_score (float) – F-score of the model.
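The four scores can be sketched from binary labels and predictions as shown below. The helper name and its list inputs are assumptions for illustration. Restricting to the first `k` entries is one plausible reading of "the top-k area" and presumes the inputs are already in ranked order.

```python
from typing import Optional, Sequence, Tuple


def binary_metrics_sketch(
    y_true: Sequence[int], y_pred: Sequence[int], k: Optional[int] = None
) -> Tuple[float, float, float, float]:
    """Return (accuracy, precision, recall, f1) for 0/1 labels.

    Hypothetical helper; if `k` is given, only the first k entries are scored.
    """
    if k is not None:
        y_true, y_pred = y_true[:k], y_pred[:k]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    total = len(pairs)
    # Guard every ratio against an empty denominator.
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return accuracy, precision, recall, f1


acc, prec, rec, f1 = binary_metrics_sketch([1, 0, 1, 1, 0], [1, 1, 0, 1, 0], k=4)
```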

normalized_discounted_cumulative_gain(results: pandas.core.frame.DataFrame)[source]

Calculates normalized discounted cumulative gain.

Parameters

results (pd.DataFrame) –

Return type

nDCG (float)
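For reference, nDCG divides the discounted cumulative gain of the produced ranking by that of the ideal ranking. The sketch below uses the common log2 discount and a plain list of relevance grades in ranked order. Both helper names are hypothetical, and the class may use a different gain or discount.

```python
import math
from typing import Sequence


def dcg_sketch(relevance: Sequence[float]) -> float:
    """Discounted cumulative gain with a log2(rank + 1) discount."""
    return sum(
        rel / math.log2(rank + 1) for rank, rel in enumerate(relevance, start=1)
    )


def ndcg_sketch(relevance: Sequence[float]) -> float:
    """nDCG for one query; `relevance` is in ranked order, best first."""
    # The ideal ranking places the most relevant documents first.
    ideal = dcg_sketch(sorted(relevance, reverse=True))
    return dcg_sketch(relevance) / ideal if ideal > 0 else 0.0


score = ndcg_sketch([0, 1, 1])
```

The mean nDCG score is then the average of this value over all queries.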