Compression Experiments on Term-Document Index


Sorkun M. C., Ozbey C.

2017 International Conference on Computer Science and Engineering (UBMK), Antalya, Türkiye, 5-8 October 2017, pp. 435-439

  • Publication Type: Conference Paper / Full-Text Paper
  • DOI Number: 10.1109/ubmk.2017.8093432
  • City of Publication: Antalya
  • Country of Publication: Türkiye
  • Pages: pp. 435-439
  • Affiliated with Galatasaray Üniversitesi: No

Abstract

The growing size of the data used in natural language processing imposes time and space constraints, so it is important to both store and access data efficiently. This study presents experiments on storing the term-document index, which will be used in a natural language processing project, efficiently in memory. To this end, the index data is compressed first with Run-Length coding and then with Huffman coding. Compression experiments were conducted with new variants of Huffman coding adapted to the structure of the index data used in the study. The results were compared with those of commonly used compression tools, and the proposed approach proved successful.
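
The two-stage pipeline named in the abstract (Run-Length coding followed by Huffman coding) can be illustrated with a minimal sketch. It rests on assumptions that are not stated in the abstract: each term's row in the index is treated as a 0/1 document-incidence vector, the Huffman symbols are the (value, run length) pairs produced by the first stage, and plain textbook Huffman coding stands in for the paper's modified variants, whose details are not given here. All function names are hypothetical.

```python
import heapq
from collections import Counter
from itertools import groupby


def run_length_encode(bits):
    """Collapse a 0/1 sequence into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(bits)]


def build_huffman_codes(symbols):
    """Return a prefix-free bit string for each distinct symbol."""
    freq = Counter(symbols)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit code.
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie breaker, {symbol: code so far}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Merging two subtrees prepends one bit to every code inside them.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def compress_row(bits):
    """Assumed pipeline: run-length encode, then Huffman-code the runs."""
    runs = run_length_encode(bits)
    codes = build_huffman_codes(runs)
    encoded = "".join(codes[run] for run in runs)
    return encoded, codes


def decompress_row(encoded, codes):
    """Invert compress_row using the same code table."""
    decode = {code: symbol for symbol, code in codes.items()}
    bits, buffer = [], ""
    for ch in encoded:
        buffer += ch
        if buffer in decode:
            value, length = decode[buffer]
            bits.extend([value] * length)
            buffer = ""
    return bits


if __name__ == "__main__":
    # Hypothetical sparse row: one term across 21 documents.
    row = [0] * 6 + [1] + [0] * 6 + [1, 1] + [0] * 6
    encoded, codes = compress_row(row)
    print(len(row), "bits in,", len(encoded), "bits out")
    print(decompress_row(encoded, codes) == row)
```

Run-length coding suits sparse rows because long stretches of zeros collapse into single symbols, and Huffman coding then gives the most frequent runs the shortest codes. A real implementation would also need to account for the cost of storing the code table, which this sketch ignores.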