IOUArray to ByteSring, as quickly as possible


December 2018


97 раз


Мне нужно , чтобы мутировать элементы в массиве фиксированного размера из Word8очень быстро. Для этого я использую IOUArray. Мне нужно отправить этот массив через соединение WebSocket. Функция sendBinaryDataиз пакета WebSockets требует ByteString. Мне нужно преобразовать из одного представления в другое. Я использую эту функцию в настоящее время:

arrayToBS :: IOUArray Int Word8 -> IO (BS.ByteString)
arrayToBS = (fmap BS.pack) . getElems

Эта функция преобразует элементы массива в [Word8]перед упаковкой , что список в виде массива байт, и от профилирования я могу видеть , что это довольно медленно. Мне было интересно , если есть способ ускорить эту функцию, или , возможно , отправить массив через подключение WebSocket напрямую?

Массив, который я использую в настоящее время:

size = 1000;
numBytes = size * size * 4

newBuffer :: IO (IOUArray Int Word8)
newBuffer = newArray (0, numBytes) 200 :: IO (IOUArray Int Word8)

и за исключением того, из отчета об исполнении:

COST CENTRE MODULE SRC                        %time %alloc

arrayToBS   Lib    src/Lib.hs:28:1-37          88.1   99.0
newBuffer   Lib    src/Lib.hs:(23,1)-(25,12)    9.9    0.8

В идеале arrayToBSбыло бы гораздо быстрее , чем создание массива. Если я изменю sizeдо 100:

COST CENTRE         MODULE                          SRC                                                %time %alloc

arrayToBS           Lib                             src/Lib.hs:21:1-37                           100.0   86.1
mkEncodeTable.table Data.ByteString.Base64.Internal Data/ByteString/Base64/Internal.hs:105:5-75    0.0    8.0
mkEncodeTable.ix    Data.ByteString.Base64.Internal Data/ByteString/Base64/Internal.hs:104:5-43    0.0    1.1

2 ответы


Disclaimer: I'm not very familiar with these low level primitives so this might be unsafe in some cases.

You will at the very least need to copy the data over once since, as @user2407038 remarks, the underlying data stored in an IOUArray is an unpinned array, so we can't count on GHC not moving the array around. The reverse direction (ByteString to IOArray) is however possible without a copy.

{-# LANGUAGE UnboxedTuples, MagicHash #-}

import Data.ByteString.Internal (ByteString(..))
import Data.Array.IO.Internals  (IOUArray(..))
import Data.Array.Base          (STUArray(..))
import Data.Word                (Word8)

import Foreign.ForeignPtr (mallocForeignPtrBytes, withForeignPtr)
import GHC.IO             (IO(..))
import GHC.Exts           (copyMutableByteArrayToAddr#, Ptr(..), Int(..))

arrayToBS :: IOUArray Int Word8 -> IO ByteString
arrayToBS (IOUArray (STUArray _ _ [email protected](I# n') mutByteArr)) = do
  bytes <- mallocForeignPtrBytes n
  withForeignPtr bytes $ \(Ptr addr) -> IO $ \state ->
    (# copyMutableByteArrayToAddr# mutByteArr 0# addr n' state, () #)
  pure (PS bytes 0 n)

Here is a test of this working (remember that the ascii code for 'A' is 65):

ghci> iou <- newListArray (-2,9) [65,67..] :: IO (IOUArray Int Word8)
ghci> arrayToBS iou

Ok, thanks to user2407038 I have something (Note that I have never played with primitives or unboxed types before):

import Control.Monad.ST
import qualified Data.ByteString as BS
import Data.Word
import Data.Array.ST
import Data.Array.Base
import Data.ByteString.Internal
import GHC.Prim
import GHC.Exts
import GHC.ForeignPtr

bs2Addr# :: BS.ByteString -> Addr#
bs2Addr# (PS fptr offset len) = case fptr of
  (ForeignPtr addr _ ) -> addr

arrayPrim (STUArray _ _ _ x) = x

unbox :: Int -> Int#
unbox (I# n#) = n#

copy :: Int -> IO BS.ByteString
copy len = do
  -- Get the length as unboxed
  let len# = unbox len

  -- Bytestring to copy to, filled with 0s initially
  let bs = BS.pack (replicate len 0)

  -- Create a new STUArray. I don't know why it needs to be length * 2.
  arr <- stToIO (newArray (0, len * 2) 255 :: ST s (STUArray s Int Word8))

  -- MutableByteArray#
  let mArrPrim = arrayPrim arr

  -- Addr#
  let addr = bs2Addr# bs

  -- I don't know what the 2nd and 4th Int# arguments are suppose to be.
  let _ = copyMutableByteArrayToAddr# mArrPrim len# addr len# realWorld#
  return bs

I am using STUArray here instead of IOUArray now because I couldn't find the IOUArray constructor.

Results of profiling this code with 4000000 element array:

    Sun Aug 20 20:49 2017 Time and Allocation Profiling Report  (Final)

       shoot-exe +RTS -N -p -RTS

    total time  =        0.05 secs   (47 ticks @ 1000 us, 1 processor)
    total alloc = 204,067,640 bytes  (excludes profiling overheads)

COST CENTRE MODULE SRC                        %time %alloc     Lib    src/Lib.hs:32:7-36          66.0   96.0
copy        Lib    src/Lib.hs:(27,1)-(45,11)   34.0    3.9

This is the code that I compared it against:

arrayToBS :: (STUArray s Int Word8) -> ST s (BS.ByteString)
arrayToBS = (fmap BS.pack) . getElems

slowCopy :: Int -> IO BS.ByteString
slowCopy len = do
  arr <- stToIO (newArray (0, len - 1) 255 :: ST s (STUArray s Int Word8))
  stToIO $ arrayToBS arr

And its profiling report:

    Sun Aug 20 20:48 2017 Time and Allocation Profiling Report  (Final)

       shoot-exe +RTS -N -p -RTS

    total time  =        0.55 secs   (548 ticks @ 1000 us, 1 processor)
    total alloc = 1,604,073,872 bytes  (excludes profiling overheads)

COST CENTRE MODULE SRC                        %time %alloc

arrayToBS   Lib    src/Lib.hs:48:1-37          98.2   99.7
slowCopy    Lib    src/Lib.hs:(51,1)-(53,24)    1.6    0.2

OK, new version is faster. They both produce the same output. However, I would still like to know what the #Int parameters to copyMutableByteArrayToAddr# and why I have to multiply the length of the array in the fast version by 2. I'll play around a bit more and update this answer if I find out.

Update: Alec's answer

For those curious, this is the result of profiling Alec's answer:

    Sun Aug 20 21:13 2017 Time and Allocation Profiling Report  (Final)

       shoot-exe +RTS -N -p -RTS

    total time  =        0.01 secs   (7 ticks @ 1000 us, 1 processor)
    total alloc =   8,067,696 bytes  (excludes profiling overheads)

COST CENTRE   MODULE SRC                          %time %alloc

newBuffer     Other  src/Other.hs:23:1-33          85.7   49.6
arrayToBS.\.\ Other  src/Other.hs:19:5-69          14.3    0.0
arrayToBS     Other  src/Other.hs:(16,1)-(20,21)    0.0   49.6

Looks like that's the one to use.