cogdata.data_savers package

cogdata.data_savers.base_saver module

class cogdata.data_savers.base_saver.BaseSaver(output_path, *args, **kwargs)

Bases: abc.ABC

abstract commit()

Commit all buffered samples.

classmethod merge(files, output_path)

Merge files into one.

abstract save(*args)

Save a sample; the write may be buffered until commit() is called.

classmethod split(input_path, output_path, n)

Split input_path into n files in output_path.

suffix = '.dat'
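
The save/commit contract above can be illustrated with a small sketch. It does not import cogdata: BaseSaverSketch is a stand-in that only mirrors the documented abstract methods and the suffix attribute, and LineSaver is a hypothetical subclass invented for this example. merge and split are left out for brevity.

```python
import os
import tempfile
from abc import ABC, abstractmethod


class BaseSaverSketch(ABC):
    """Stand-in mirroring the documented BaseSaver interface (not the real class)."""

    suffix = ".dat"

    def __init__(self, output_path, *args, **kwargs):
        self.output_path = output_path

    @abstractmethod
    def save(self, *args):
        """Save a sample; may be buffered."""

    @abstractmethod
    def commit(self):
        """Commit all buffered samples."""


class LineSaver(BaseSaverSketch):
    """Hypothetical saver: buffers text lines in memory, writes them on commit()."""

    suffix = ".txt"

    def __init__(self, output_path, *args, **kwargs):
        super().__init__(output_path, *args, **kwargs)
        self._buffer = []

    def save(self, line):
        # Nothing touches the disk here; the sample is only buffered.
        self._buffer.append(line)

    def commit(self):
        # Flush every buffered sample, then empty the buffer.
        with open(self.output_path, "a", encoding="utf-8") as f:
            for line in self._buffer:
                f.write(line + "\n")
        self._buffer.clear()


def demo_line_saver():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "samples" + LineSaver.suffix)
        saver = LineSaver(path)
        saver.save("first")
        saver.save("second")
        exists_before_commit = os.path.exists(path)
        saver.commit()
        with open(path, encoding="utf-8") as f:
            return exists_before_commit, f.read().splitlines()
```

In this sketch the output file does not exist until commit() runs, which is the behaviour a caller should plan for whenever save() is allowed to buffer.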

cogdata.data_savers.binary_saver module

class cogdata.data_savers.binary_saver.BinarySaver(output_path, dtype='int32', **kwargs)

Bases: cogdata.data_savers.base_saver.BaseSaver

Save data as binary files.

commit()

Commit all buffered samples.

mapping = {'bool': <class 'torch.BoolTensor'>, 'float32': <class 'torch.FloatTensor'>, 'int32': <class 'torch.IntTensor'>, 'int64': <class 'torch.LongTensor'>, 'uint8': <class 'torch.ByteTensor'>}

Maps each supported dtype name to a torch tensor class.

max_buffer_size = 10737418240

Equal to 10 × 1024³, i.e. 10 GiB if the value is in bytes.

classmethod merge(files, output_path, overwrite=False)

Merge files into one.

Parameters
  • files ([file pointer]) – Files to be merged.

  • output_path (str) – Path of output file.

save(data)

Write a tensor to the binary file.

Parameters
  • data (Tensor) – The tensor to write to the binary file.

classmethod split(input_path, output_dir, n, **kwargs)

Split input_path into n files in output_dir.

Parameters
  • input_path (str) – The path of the input binary file.

  • output_dir (str) – The root folder of n output files.

  • n (int) – The number of output files.

suffix = '.bin'
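
As a rough illustration of what splitting a flat binary file into n parts involves, here is a standard-library sketch. It is not BinarySaver.split: the function name split_binary, the record_bytes parameter, and the part_{i}.bin naming are all assumptions made for this example, and it assumes the file is a headerless sequence of fixed-size records.

```python
import os
import tempfile


def split_binary(input_path, output_dir, n, record_bytes=4):
    """Split input_path into n files under output_dir, cutting on record boundaries.

    record_bytes is a hypothetical parameter; 4 matches one int32 value.
    """
    os.makedirs(output_dir, exist_ok=True)
    total_records = os.path.getsize(input_path) // record_bytes
    # Spread records as evenly as possible; the first `extra` parts get one more.
    base, extra = divmod(total_records, n)
    out_paths = []
    with open(input_path, "rb") as src:
        for i in range(n):
            count = base + (1 if i < extra else 0)
            out_path = os.path.join(output_dir, f"part_{i}.bin")
            with open(out_path, "wb") as dst:
                # Reads one whole part into memory; a real tool would stream in chunks.
                dst.write(src.read(count * record_bytes))
            out_paths.append(out_path)
    return out_paths


def demo_split():
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "data.bin")
        with open(src, "wb") as f:
            f.write(bytes(range(40)))  # 10 four-byte records
        parts = split_binary(src, os.path.join(tmp, "parts"), 3)
        return [os.path.getsize(p) for p in parts]
```

Cutting only on record boundaries keeps every part readable with the same dtype as the input, which is the main thing any splitter for fixed-width binary data has to guarantee.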

cogdata.data_savers.tar_saver module

class cogdata.data_savers.tar_saver.TarSaver(output_path, mode='w:', **kwargs)

Bases: cogdata.data_savers.base_saver.BaseSaver

Save data as tar files.

commit()

Commit all buffered samples.

classmethod merge(files, output_path, overwrite=False)

Merge files into one.

Parameters
  • files ([file pointer]) – Files to be merged.

  • output_path (str) – Path of output file.

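save(fp, full_filename, file_size)

Add the contents of fp to the tar file.

Parameters
  • fp (file pointer) – The processed data file.

  • full_filename (str) – The name under which fp is stored in the tar file.

  • file_size – The size of the data in fp.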

classmethod split(input_path, output_dir, n)

Split input_path into n files in output_dir.

Parameters
  • input_path (str) – The path of the input tar file.

  • output_dir (str) – The root folder of n output files.

  • n (int) – The number of output files.

suffix = '.tar'
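
The save(fp, full_filename, file_size) signature lines up with how Python's standard tarfile module adds a member from an open file object, so a minimal sketch may help. It uses only the standard library and makes no claim about TarSaver's internals: add_member and demo_tar are hypothetical helpers written for this example. The mode string 'w:' is the tarfile mode for writing without compression.

```python
import io
import tarfile


def add_member(tar, fp, full_filename, file_size):
    """Add file_size bytes read from fp to an open tar archive as full_filename."""
    info = tarfile.TarInfo(name=full_filename)
    # tarfile needs the member size up front because it is written into the header.
    info.size = file_size
    tar.addfile(info, fileobj=fp)


def demo_tar():
    archive = io.BytesIO()
    payload = b"hello"
    # 'w:' opens the archive for uncompressed writing.
    with tarfile.open(fileobj=archive, mode="w:") as tar:
        add_member(tar, io.BytesIO(payload), "images/0001.txt", len(payload))
    archive.seek(0)
    with tarfile.open(fileobj=archive, mode="r:") as tar:
        names = tar.getnames()
        content = tar.extractfile(names[0]).read()
    return names, content
```

Passing the size explicitly is what allows the data to come from any readable stream rather than a file on disk, since the tar header cannot be written without it.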