Compression Experiments on Term-Document Index

Sorkun M. C., Ozbey C.

2017 International Conference on Computer Science and Engineering (UBMK), Antalya, Turkey, 5-8 October 2017, pp. 435-439

  • Publication Type: Conference Paper / Full Text
  • Doi Number: 10.1109/ubmk.2017.8093432
  • City: Antalya
  • Country: Turkey
  • Page Numbers: pp.435-439
  • Galatasaray University Affiliated: No


The growing size of the data used in natural language processing imposes time and space constraints, so it is important to store and access data efficiently. This study presents experiments on storing a term-document index, which will be used in a natural language processing project, efficiently in memory. To this end, the indexed data is compressed first with run-length coding and then with Huffman coding. Compression experiments were conducted with new variants of Huffman coding that are structured to suit the index data used in the study. The results were compared with those of commonly used compression tools and were found to be successful.
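
The two-stage pipeline named in the abstract (run-length coding followed by Huffman coding) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not describe the modified Huffman variants, so textbook Huffman coding is used here, and the binary term-document row, the function names, and the choice to treat each (value, run length) pair as one Huffman symbol are all assumptions made for illustration.

```python
import heapq
from collections import Counter
from itertools import groupby


def run_length_encode(bits):
    """Collapse a sequence into (value, run_length) pairs."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(bits)]


def build_huffman_codes(symbols):
    """Return a prefix-free code table {symbol: bitstring} (standard Huffman)."""
    freq = Counter(symbols)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit code.
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: partial code}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Prefix the two lightest subtrees with 0 and 1, then merge them.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def compress_row(bits):
    """Assumed pipeline: run-length encode, then Huffman-code the runs."""
    runs = run_length_encode(bits)
    codes = build_huffman_codes(runs)
    return "".join(codes[r] for r in runs), codes


def decompress_row(encoded, codes):
    """Invert compress_row using the same code table."""
    decode = {c: s for s, c in codes.items()}
    bits, buffer = [], ""
    for ch in encoded:
        buffer += ch
        if buffer in decode:
            value, length = decode[buffer]
            bits.extend([value] * length)
            buffer = ""
    return bits


# Hypothetical row of a term-document incidence matrix (1 = term occurs).
row = [0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0]
encoded, codes = compress_row(row)
print(len(row), "entries ->", len(encoded), "code bits")
```

The code table must be stored alongside the encoded bits for decoding, so any real saving depends on how often the same runs repeat across the index; the sketch does not account for that overhead.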