Montgomery multiplication is an efficient method for modular arithmetic. Typically, it is used for modular arithmetic over integer rings to avoid the expensive division required for the modulo reduction. In this work, we consider modular arithmetic over rings of Gaussian integers. Gaussian integers are a subset of the complex numbers whose real and imaginary parts are integers. In many cases, Gaussian integer rings are isomorphic to ordinary integer rings. We demonstrate that the concept of Montgomery multiplication can be extended to Gaussian integers. Because the real and imaginary parts are calculated independently, the computational complexity of the multiplication is reduced compared with ordinary integer modular arithmetic. This concept is suitable for coding applications as well as for asymmetric cryptographic systems, such as elliptic curve cryptography or the Rivest-Shamir-Adleman (RSA) system.
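For reference, the following is a minimal sketch of standard integer Montgomery multiplication (the REDC algorithm), not the Gaussian-integer variant proposed above; the paper's extension applies the same idea to the real and imaginary parts separately. Function names and the toy modulus are illustrative.

```python
def montgomery_multiply(a, b, n):
    """Standard integer Montgomery multiplication (REDC); returns a*b mod n
    for an odd modulus n. The Gaussian-integer variant applies the same
    idea to the real and imaginary parts independently."""
    k = n.bit_length()
    r = 1 << k                       # R = 2^k > n, gcd(R, n) = 1 for odd n
    n_neg_inv = pow(-n, -1, r)       # n' = -n^{-1} mod R (Python 3.8+)

    def redc(t):                     # t * R^{-1} mod n, for 0 <= t < n*R
        m = (t * n_neg_inv) & (r - 1)   # m = t * n' mod R (cheap bit mask)
        u = (t + m * n) >> k            # exact division by R (cheap shift)
        return u - n if u >= n else u

    a_bar = (a * r) % n              # map operands into the Montgomery domain
    b_bar = (b * r) % n
    prod_bar = redc(a_bar * b_bar)   # Montgomery product = a*b*R mod n
    return redc(prod_bar)            # map the result back out

assert montgomery_multiply(123, 456, 1009) == 123 * 456 % 1009
```

The point of the construction is that both reduction steps inside `redc` use only a bit mask and a shift, replacing the division by `n` that a direct modulo reduction would need.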
Many resource-constrained systems still rely on symmetric cryptography for verification and authentication. Asymmetric cryptographic systems provide higher security levels, but are computationally intensive. Hence, embedded systems can benefit from hardware assistance, i.e., coprocessors optimized for the required public-key operations. In this work, we propose an elliptic curve cryptography coprocessor design for resource-constrained systems. Many such coprocessor designs consider only special (Solinas) prime fields, which enable low-complexity modular arithmetic. Other implementations support arbitrary prime curves using the Montgomery reduction, but typically require more time for the point multiplication. We present a coprocessor design that has low area requirements and enables a trade-off between performance and flexibility. The point multiplication can be performed either with fast arithmetic based on Solinas primes or with a slower, but flexible, Montgomery modular arithmetic.
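To illustrate why Solinas primes enable low-complexity modular arithmetic, the sketch below reduces a product modulo the NIST prime P-192 = 2^192 - 2^64 - 1 using only shifts and additions. This is a textbook illustration under an assumed example prime, not the proposed coprocessor's datapath.

```python
K, M = 192, 64
P = (1 << K) - (1 << M) - 1          # NIST P-192, a Solinas prime

def solinas_reduce(x):
    """Reduce x mod P with shifts and adds only, using the identity
    2^192 = 2^64 + 1 (mod P) to fold the high bits back."""
    while x >> K:                    # while a high part remains
        hi, lo = x >> K, x & ((1 << K) - 1)
        x = lo + (hi << M) + hi      # hi * 2^192 -> hi * 2^64 + hi
    while x >= P:                    # at most one final subtraction here
        x -= P
    return x

a = (1 << 150) + 12345
b = P - 6789
assert solinas_reduce(a * b) == a * b % P
```

A Montgomery-based datapath works for arbitrary prime moduli but pays for that flexibility with extra multiplications per reduction, which is the performance/flexibility trade-off the coprocessor exposes.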
The Lempel-Ziv-Welch (LZW) algorithm is an important dictionary-based data compression approach that is used in many communication and storage systems. The parallel dictionary LZW (PDLZW) algorithm speeds up the LZW encoding by using multiple dictionaries, where each dictionary stores only strings of a single length. This simplifies the parallel search in the dictionaries for hardware implementations. The compression gain of the PDLZW depends on the partitioning of the address space, i.e., on the sizes of the parallel dictionaries. However, there is no universal partitioning that is optimal for all data sources. This work proposes an address space partitioning technique that optimizes the compression rate of the PDLZW using a Markov model for the data. Numerical results for address spaces with 512, 1024, and 2048 entries demonstrate that the proposed partitioning improves the performance of the PDLZW compared with the original proposal.
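The following Python sketch illustrates the PDLZW dictionary structure and address layout under an assumed partition; the function name and the example partition sizes are hypothetical, and a hardware encoder would search all dictionaries in parallel rather than in a loop.

```python
def pdlzw_encode(data, sizes):
    """PDLZW-style encoding with len(sizes) parallel dictionaries.
    sizes[k] is the capacity of the dictionary holding strings of
    length k + 2; the 256 single-byte strings occupy addresses 0..255."""
    base = [256 + sum(sizes[:k]) for k in range(len(sizes))]  # address offsets
    dicts = [dict() for _ in sizes]      # dicts[k]: length-(k+2) string -> address
    out, i = [], 0
    while i < len(data):
        match_len, addr = 1, data[i]     # fall back to the single-byte root
        for k in range(len(sizes)):      # searched in parallel in hardware
            s = data[i:i + k + 2]
            if len(s) == k + 2 and s in dicts[k]:
                match_len, addr = k + 2, dicts[k][s]
        out.append(addr)
        ext = data[i:i + match_len + 1]  # matched string plus the next byte
        k = len(ext) - 2
        if 0 <= k < len(sizes) and ext not in dicts[k] and len(dicts[k]) < sizes[k]:
            dicts[k][ext] = base[k] + len(dicts[k])
        i += match_len
    return out

print(pdlzw_encode(b"abababab", [128, 64]))  # hypothetical 448-entry partition
```

Because each dictionary owns a fixed address subrange, choosing the sizes `sizes[k]` is exactly the address space partitioning problem the proposed technique optimizes.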
In this work, we investigate a hybrid decoding approach that combines algebraic hard-input decoding of binary block codes with soft-input decoding. In particular, we propose an acceptance criterion that determines the reliability of a candidate codeword. For many received words, this criterion indicates that the hard-input decoding result is sufficiently reliable, so the costly soft-input decoding can be omitted. The proposed acceptance criterion significantly reduces the decoding complexity. In our simulations, we combine the algebraic hard-input decoding with ordered statistics decoding, which enables near-maximum-likelihood soft-input decoding for codes of small to medium block length.
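As an illustrative stand-in for such an acceptance test (the paper's actual criterion is specific to the algebraic decoder, so the threshold rule and names below are hypothetical), one could accept the hard-input result when the channel reliability mass of the bits it flipped is small:

```python
import numpy as np

def accept_hard_decision(llr, candidate, threshold):
    """Hypothetical acceptance test: accept the algebraic hard-input result
    if the channel reliability of the bit positions it flipped is small.
    llr[i] > 0 is interpreted as bit 0; candidate is the 0/1 codeword."""
    hard = (llr < 0).astype(int)         # hard decisions from the channel
    flipped = hard != candidate          # positions changed by the decoder
    cost = np.abs(llr[flipped]).sum()    # total reliability of the flips
    return cost <= threshold

llr = np.array([3.1, -0.2, 1.7, -2.4, 0.3])
candidate = np.array([0, 0, 0, 1, 1])    # hypothetical decoder output
print(accept_hard_decision(llr, candidate, threshold=1.0))  # True
```

Whenever the test accepts, the expensive ordered statistics decoding stage is skipped, which is where the complexity reduction comes from.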
The Lempel–Ziv–Welch (LZW) algorithm is an important dictionary-based data compression approach that is used in many communication and storage systems. The parallel dictionary LZW (PDLZW) algorithm speeds up the LZW encoding by using multiple dictionaries, which simplifies the parallel search in the dictionaries. However, the compression gain of the PDLZW depends on the partitioning of the address space, i.e., on the sizes of the parallel dictionaries. This work proposes an address space partitioning technique that optimizes the compression rate of the PDLZW. Numerical results for address spaces with 512, 1024, and 2048 entries demonstrate that the proposed address partitioning improves the performance of the PDLZW compared with the original proposal. These address space sizes are suitable for flash storage systems. Moreover, the PDLZW has relatively high memory requirements, which dominate the cost of a hardware implementation. This work therefore also proposes a recursive dictionary structure and a word partitioning technique that significantly reduce the memory size of the parallel dictionaries.
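To make the memory argument concrete, the following back-of-the-envelope estimate compares a flat entry layout, where each entry stores its full string, with a recursive layout, where each entry stores only a back-pointer to its length-minus-one prefix plus one byte. The accounting and the example partition are illustrative assumptions, not the paper's exact figures.

```python
from math import ceil, log2

def dictionary_memory_bits(sizes, recursive):
    """Estimate the total dictionary memory in bits. A flat entry for a
    length-(k+2) string stores all its bytes; a recursive entry stores a
    back-pointer to its length-(k+1) prefix plus one byte."""
    addr_bits = ceil(log2(256 + sum(sizes)))     # width of a dictionary address
    total = 0
    for k, n in enumerate(sizes):
        entry_bits = (addr_bits + 8) if recursive else 8 * (k + 2)
        total += n * entry_bits
    return total

sizes = [256, 128, 64, 32, 16, 8]                # hypothetical 760-entry partition
print(dictionary_memory_bits(sizes, recursive=False))  # flat layout: 11712
print(dictionary_memory_bits(sizes, recursive=True))   # recursive layout: 9072
```

The saving grows with the maximum string length, since a flat entry grows linearly with the string length while a recursive entry has constant width.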
The Burrows–Wheeler transform (BWT) is a reversible block-sorting transform that is an integral part of many data compression algorithms. This work proposes a memory-efficient pipelined decoder for the BWT. In particular, we consider the limited context order BWT, which has low memory requirements and enables fast encoding. However, decoding the limited context order BWT is typically much slower than encoding. The proposed decoder pipeline provides a fast inverse BWT by splitting the decoding into several processing stages that are executed in parallel.
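For context, the textbook sequential inverse BWT via the LF mapping is sketched below; the proposed decoder splits this kind of computation into parallel pipeline stages for the limited context order variant. The example uses the classic "banana$" case.

```python
def inverse_bwt(last_column, primary_index):
    """Sequential inverse BWT via the LF mapping: start from the row that
    holds the original string and emit its characters in reverse order."""
    counts, ranks = {}, []
    for c in last_column:                 # rank of each char within its kind
        ranks.append(counts.get(c, 0))
        counts[c] = counts.get(c, 0) + 1
    first, total = {}, 0
    for c in sorted(counts):              # start of each char run in column F
        first[c] = total
        total += counts[c]
    out, i = [], primary_index
    for _ in range(len(last_column)):
        out.append(last_column[i])
        i = first[last_column[i]] + ranks[i]   # the LF mapping
    return ''.join(reversed(out))

assert inverse_bwt("annb$aa", 4) == "banana$"  # BWT of "banana$"
```

The sequential data dependence of the LF walk is precisely what makes the inverse transform slow, which motivates decomposing the decoding into pipelined stages.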