Datasets
Used to download, import, and sample the datasets.
- src.data.dataset.download(remote_url: Optional[str] = None, path: Optional[str] = None)[source]
Downloads the dataset file at remote_url and stores it under path.
- Parameters
remote_url (str) – URL to dataset
path (str) – Location to store downloaded data.
- Returns
None
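As an illustration of the download step described above, here is a minimal sketch using only the Python standard library. It is not the module's actual implementation: the name download_sketch, the choice to name the local file after the last URL segment, and the streaming copy are all assumptions made for this example.

```python
import shutil
import urllib.request
from pathlib import Path


def download_sketch(remote_url: str, path: str) -> Path:
    """Hypothetical helper: fetch remote_url and store it under the directory `path`."""
    target_dir = Path(path)
    target_dir.mkdir(parents=True, exist_ok=True)
    # Assumption: the local file is named after the last segment of the URL.
    target = target_dir / remote_url.rstrip("/").split("/")[-1]
    # Stream the response to disk so large archives are not held in memory.
    with urllib.request.urlopen(remote_url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target
```

Unlike the documented function, which returns None, the sketch returns the local path so a caller can pass it straight to an unzip step.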
- src.data.dataset.download_dataset(datasets: Optional[list] = None, path: str = 'data/TREC_Passage')[source]
Runs the download and unzip steps for each required file
- Parameters
datasets (list) – List of required files
path (str) – Location to store downloaded data.
- Returns
None
- src.data.dataset.import_collection(path: str = 'data/TREC_Passage', qrels_val: Optional[list] = None, qrels_test: Optional[list] = None, triples: Optional[list] = None, samples: int = 0)[source]
Imports data from collection.tsv file
- Parameters
path (str) – Location of dataset
qrels_val (list) –
triples (list) –
samples (int) – Specify number of rows to be imported from dataset
- Returns
Data frame containing IDs and Passages from collection dataset
- Return type
df (pd.DataFrame)
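The samples parameter can be pictured as a row limit applied while reading the tab-separated collection file. The sketch below assumes a two-column layout (passage ID, passage text) and assumes that samples = 0 means "read everything"; neither is confirmed by this reference. The function name read_collection_sketch and the column names pid and passage are hypothetical.

```python
import csv
import io

import pandas as pd


def read_collection_sketch(source, samples: int = 0) -> pd.DataFrame:
    """Hypothetical helper: read a two-column TSV of (passage ID, passage text)."""
    # Assumption: samples == 0 means "no limit"; otherwise keep the first `samples` rows.
    nrows = samples if samples > 0 else None
    return pd.read_csv(
        source,
        sep="\t",
        header=None,                # the file is assumed to have no header row
        names=["pid", "passage"],   # hypothetical column names
        nrows=nrows,
        quoting=csv.QUOTE_NONE,     # treat quote characters in passages literally
    )


# Usage with an in-memory stand-in for collection.tsv
demo = io.StringIO("0\tfirst passage\n1\tsecond passage\n2\tthird passage\n")
df = read_collection_sketch(demo, samples=2)
```

Limiting rows at read time, rather than loading the full file and slicing, keeps memory use low when only a small sample is needed.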
- src.data.dataset.import_qrels(path: str = 'data/TREC_Passage', samples: int = 5)[source]
Imports data from 2019qrels-pass.txt as the validation set and from 2020qrels-pass.txt as the test set
- Parameters
path (str) – Location of dataset
samples (int) – Specify number of rows to be imported from dataset
- Returns
df_val (pd.DataFrame) – Data frame containing validation set
df_test (pd.DataFrame) – Data frame containing test set
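For orientation, the sketch below reads a qrels file into a data frame. It assumes the usual TREC layout of four whitespace-separated fields per line (query ID, an unused iteration field, passage ID, relevance grade). The function name read_qrels_sketch and the column names are hypothetical, and the real function may parse the files differently.

```python
import io

import pandas as pd

# Hypothetical column names for the four assumed TREC qrels fields.
QRELS_COLUMNS = ["qid", "iteration", "pid", "relevance"]


def read_qrels_sketch(source, samples: int = 5) -> pd.DataFrame:
    """Hypothetical helper: read the first `samples` judgments from a qrels file."""
    return pd.read_csv(
        source,
        sep=r"\s+",        # fields are separated by runs of whitespace
        header=None,
        names=QRELS_COLUMNS,
        nrows=samples,
    )


# Usage with an in-memory stand-in for a qrels file
demo_qrels = io.StringIO("19335 Q0 1017759 0\n19335 Q0 1082489 2\n47923 Q0 100 1\n")
qrels_df = read_qrels_sketch(demo_qrels, samples=2)
```

Calling such a helper once per file (2019 for validation, 2020 for test) would yield the two data frames described above.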
- src.data.dataset.import_queries(path: str = 'data/TREC_Passage', collection: Optional[list] = None)[source]
Imports train queries
- Parameters
path (str) – Location of dataset
collection (list) –
- Returns
Query train IDs and content
- Return type
df (pd.DataFrame)
- src.data.dataset.import_training_set(path: str = 'data/TREC_Passage', samples: int = 200)[source]
Imports data from qidpidtriples.train.full.2.tsv as the training set
- Parameters
path (str) – Location of dataset
samples (int) – Specify number of rows to be imported from dataset
- Returns
Data frame containing training set
- Return type
df (pd.DataFrame)
- src.data.dataset.import_val_test_queries(path: str = 'data/TREC_Passage', qrels_val: Optional[list] = None, qrels_test: Optional[list] = None)[source]
Imports validation and test queries
- Parameters
path (str) – Location of dataset
qrels_val (list) –
qrels_test (list) –
- Returns
val_df (pd.DataFrame) – Query validation IDs and content
test_df (pd.DataFrame) – Query test IDs and content