cogdata.tasks package
cogdata.tasks.image_text_tokenization_task module
class cogdata.tasks.image_text_tokenization_task.ImageTextTokenizationTask(saver, img_sizes, **kwargs)
Bases: cogdata.tasks.base_task.BaseTask
Handles tokenization.
__init__(saver, img_sizes, **kwargs) → None
Configure the saver.
get_transform_fn(transform=None)
Parameters
transform (torchvision.transforms) – A torchvision transform. Do not include ToTensor().
Returns
A transform function for images.
Return type
function
process(sub_datasets, progress_record=None, dataset_dir='', **kwargs)
Use CUDA to process batched data from the dataloader, save the results via the Saver, report progress at intervals (every 1/5000?), and finally commit the saver.
Parameters
sub_datasets ([Dataset]) – All datasets in the processing list.
progress_record (ProgressBar) – The progress bar for this task.
dataset_dir (str) – The path of the dataset folder.
Returns
0 if processing succeeds.
Return type
int
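The control flow described for process (handle each batch, save it, report progress at intervals, commit once at the end, return 0) might look roughly like the sketch below. It is not the cogdata implementation: ListSaver, run_batches, and the doubling "tokenizer" are hypothetical stand-ins, the CUDA work is replaced by a plain Python function, and reading "1/5000" as "about once per 1/5000 of the batches" is an assumption.

```python
from typing import Callable, List, Sequence


class ListSaver:
    """Hypothetical stand-in for a saver: buffers items, flushes on commit()."""

    def __init__(self) -> None:
        self.buffer: List[int] = []
        self.committed: List[int] = []

    def save(self, items: Sequence[int]) -> None:
        self.buffer.extend(items)

    def commit(self) -> None:
        self.committed.extend(self.buffer)
        self.buffer.clear()


def run_batches(
    batches: Sequence[Sequence[int]],
    saver: ListSaver,
    report: Callable[[float], None],
    tokenize: Callable[[Sequence[int]], List[int]] = lambda b: [x * 2 for x in b],
) -> int:
    """Process every batch, report progress periodically, commit once at the end."""
    total = len(batches)
    # Assumed meaning of "1/5000": report about once per 1/5000 of the batches.
    step = max(1, total // 5000)
    for i, batch in enumerate(batches, start=1):
        saver.save(tokenize(batch))
        if i % step == 0 or i == total:
            report(i / total)  # fraction of batches completed so far
    saver.commit()  # final commit, mirroring "final commit saver"
    return 0  # 0 signals success, matching the documented return value
```

Calling run_batches with a list of batches, a ListSaver, and a callback such as a list's append method collects the reported fractions; the last one is 1.0 once every batch has been saved.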
read_text(txt_files, mode)
Read a text dict from text files.
Parameters
txt_files ([str]) – Names of all the text files.
mode (str) – The mode of the text; supported values include json, txt, json_ks, tsv, and dict.
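To illustrate what reading a text dict in different modes could involve, here is a minimal sketch covering only the json and tsv modes. It is not the library's code: read_text_sketch is a hypothetical name, and the file layouts are assumptions (a JSON file holding one object that maps keys to text, and a TSV file with the key in the first column and the text in the second). The txt, json_ks, and dict modes are left out because their layouts are not described here.

```python
import csv
import json
from typing import Dict, List


def read_text_sketch(txt_files: List[str], mode: str) -> Dict[str, str]:
    """Merge key -> text mappings from several files (hypothetical helper)."""
    if mode not in ("json", "tsv"):
        # Other documented modes are intentionally not covered by this sketch.
        raise ValueError(f"mode {mode!r} is not covered by this sketch")

    text_dict: Dict[str, str] = {}
    for path in txt_files:
        with open(path, encoding="utf-8", newline="") as f:
            if mode == "json":
                # Assumed layout: one JSON object mapping keys to text.
                text_dict.update(json.load(f))
            else:
                # Assumed layout: key<TAB>text on each line.
                for row in csv.reader(f, delimiter="\t"):
                    if len(row) >= 2:
                        text_dict[row[0]] = row[1]
    return text_dict
```

Entries from later files overwrite earlier ones that share a key, which is a choice made for this sketch rather than documented behavior.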