Inventors:
Ralph Shnelvar - Boulder CO, 80302
International Classification:
G06F 17/30
Abstract:
A method for storing data from a data source in a storage device of a data repository by reading all source allocation units, restructuring the data into data units having a size corresponding to the repository allocation units, and generating a hash value for the data of each data unit read from the data source. For each data unit, a data table is searched for a table entry having a matching hash value, wherein each table entry contains the hash value of a data unit stored in a repository allocation unit and a repository allocation unit pointer to the corresponding repository allocation unit. When the hash value of a data unit does not match the hash value of any table entry in the data table, the data of the data unit is written into a newly allocated repository allocation unit and a new table entry is written to the data table. When the hash value of a data unit matches the hash value of a table entry in the data table, the data of the corresponding repository allocation unit is compared with the data of the data unit. If the data of the data unit matches the data of the repository allocation unit, the data unit is discarded.
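The flow described in the abstract can be illustrated with a minimal in-memory sketch. It is not the patented implementation. The class and method names, the 4096-byte default unit size, and the choice of SHA-256 as the hash function are all assumptions made for illustration. The abstract also does not say what happens when the hash values match but the data differs; this sketch assumes such a data unit is stored in a new repository allocation unit.

```python
import hashlib


class Repository:
    """Hypothetical in-memory model of the data repository."""

    def __init__(self, unit_size=4096):
        self.unit_size = unit_size  # size of a repository allocation unit (assumed)
        self.units = []             # repository allocation units; list index acts as the pointer
        self.table = {}             # data table: hash value -> pointers of units with that hash

    def store_unit(self, data):
        """Store one data unit and return the pointer to the unit that holds it."""
        digest = hashlib.sha256(data).digest()  # SHA-256 is an assumption
        for ptr in self.table.get(digest, []):
            # Hash matched a table entry: compare the actual data before discarding.
            if self.units[ptr] == data:
                return ptr  # duplicate data unit is discarded
        # No matching entry (or, by assumption, same hash but different data):
        # write to a newly allocated unit and add a new table entry.
        self.units.append(data)
        ptr = len(self.units) - 1
        self.table.setdefault(digest, []).append(ptr)
        return ptr

    def store(self, source):
        """Restructure source bytes into unit-sized data units and store each one."""
        return [
            self.store_unit(source[i:i + self.unit_size])
            for i in range(0, len(source), self.unit_size)
        ]


repo = Repository(unit_size=4)
pointers = repo.store(b"AAAABBBBAAAACC")
print(pointers)         # [0, 1, 0, 2]
print(len(repo.units))  # 3
```

In the usage lines, the third data unit repeats the first, so it is discarded and its pointer refers back to unit 0; only three repository allocation units are written for four data units.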