Compression Experiments on Term-Document Index

Sorkun M. C., Ozbey C.

2017 International Conference on Computer Science and Engineering (UBMK), Antalya, Türkiye, 5-8 October 2017, pp. 435-439 (Full-Text Paper)

Publication Type: Conference Paper / Full-Text Paper
DOI: 10.1109/ubmk.2017.8093432
City of Publication: Antalya
Country of Publication: Türkiye
Pages: pp. 435-439
Affiliated with Galatasaray Üniversitesi: No

Abstract

The growing size of the data used in natural language processing imposes time and space constraints, so it is important to both store and access data efficiently. This study presents experiments on storing the term-document index, which will be used in a natural language processing project, efficiently in memory. For this purpose, the indexed data is compressed using run-length coding followed by Huffman coding. Compression experiments were conducted with new versions of Huffman coding arranged in a structure suited to the index data used in the study. The results were compared with those of commonly used compression tools and were found to be successful.
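
The two-stage pipeline named in the abstract (run-length coding, then Huffman coding) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not describe their modified Huffman variants, so the code below uses textbook Huffman coding. The binary presence-vector layout of an index row and all function names are assumptions made for illustration only.

```python
import heapq
from collections import Counter
from itertools import groupby


def run_length_encode(values):
    """Collapse a sequence into (value, run_length) pairs."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(values)]


def run_length_decode(runs):
    """Expand (value, run_length) pairs back into the original sequence."""
    out = []
    for value, length in runs:
        out.extend([value] * length)
    return out


def build_huffman_codes(symbols):
    """Map each distinct symbol to a prefix-free bit string (textbook Huffman)."""
    freq = Counter(symbols)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit code.
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: partial code}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def huffman_encode(symbols, codes):
    """Concatenate the code of every symbol into one bit string."""
    return "".join(codes[s] for s in symbols)


def huffman_decode(bitstring, codes):
    """Decode a bit string produced by huffman_encode with the same code table."""
    reverse = {c: s for s, c in codes.items()}
    out, current = [], ""
    for bit in bitstring:
        current += bit
        if current in reverse:
            out.append(reverse[current])
            current = ""
    return out


# Hypothetical index row for one term: 1 where the term occurs in a document.
row = [0] * 12 + [1] * 3 + [0] * 12 + [1] * 3 + [0] * 10

runs = run_length_encode(row)            # stage 1: run-length coding
codes = build_huffman_codes(runs)        # stage 2: Huffman codes over the runs
encoded = huffman_encode(runs, codes)

# Round trip: Huffman decode, then run-length decode.
restored = run_length_decode(huffman_decode(encoded, codes))
print(len(row), "input bits ->", len(encoded), "payload bits; lossless:", restored == row)
```

The payload size printed here excludes the code table, which a real encoder would also have to store or transmit, so it overstates the saving on short inputs. Sparse rows with long runs of zeros are the case where a run-length stage is expected to help most.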