What error should be return if input reader is short than expected?

Migrated from What error should be return if input reader is short than expected? · Issue #715 · beyondstorage/go-storage · GitHub

Given code like this:

Write(path string, r io.Reader, size int64, pairs ...Pair) (n int64, err error)

If r only has 1MB data, but the user input size is 4MB, do we need to return an error for it?

Due to our size support, we only read/write the size specified by size. So it’s OK if our Reader / Writer have more data than expected. But it’s broken if the user specifies a size that is more than Reader / Writer. We need to handle all the following cases.

Read

func(path string, w io.Writer) → read from path into w.

It’s the user’s responsibility to make sure w is able to write all data, so we don’t need to care about it.

Now, we have the length for path and size for this operation.

case description
length >= size It’s OK, we only read size bytes
length < size data is not enough, we need to handle it.

Write

func(path string, r io.Reader, size int64) → write from r into path.

Well, here is a design decision here, do we need to make it the user’s responsibility to make sure r has enough data?

In some cases, the user doesn’t know how much data is in r. Like we met in go-stream:

	for {
		size, err := br.s.upper.Write(p, r, chunkSize, ps...)
		// No matter we read success or not, we both need to send data.
		n += size
		if err != nil && errors.Is(err, io.EOF) {
			break
		}
		if err != nil {
			return n, err
		}
	}

In this code example, r is steam and the user doesn’t when it stopped. If we don’t support this kind of pattern for the user, the user needs to read data into a buffer, and then copy data from the buffer.

However, we can’t.

Because we MUST make sure that the Reader only be consumed once. And it’s impossible to know how much data is in r without reading or seeking it.

If we return an error or handle them internally (which is impossible for services like s3, they will use size for content-length), the data will be consumed and can’t be back again.

Now, we have these var for this operations:

  • the maximum size for path
  • size for this op.
  • length for this reader.

The user MUST make sure that size <= length, the relation between length and maximum does not matter.

So there are the following cases:

case description
size <= length It’s OK, we only write size bytes
size > length data is no enough, we need to return an error
size <= maximum It’s OK, we only write size bytes
size > maximum data is too large for write, we need to return an error.