Thomas Russell: Programmer & Designer

Automatic allocation on the heap or stack

In response to one of the Google+ comments on my post on how to create a Tensor class, I have decided to write an article on how to modify CTensor to use a storage class which automatically allocates on the heap or the stack, depending on whether a stack overflow is likely.

This is quite a long post, so I’ve provided a table of contents here for easier navigation:

  1. Current CTensor concerns
  2. How does automatic allocation help us?
  3. Writing our automatic storage class
  4. Finishing the stack-allocated class
  5. Finishing the heap-allocated class
  6. The Indexing Operator (operator[])
  7. Conclusion

Current CTensor concerns

In the Google+ discussion mentioned above, Nikolay Orlyuk raises two key points:

  1. Whilst large CTensor objects cannot be safely allocated on the stack, small ones could be; for those small objects, the extra layer of indirection that results from always allocating the array on the heap introduces unnecessary inefficiency.
  2. If we were to create an array of CTensor objects, the data would not all be contiguous, which creates extra work for the memory allocator, as it cannot simply reserve one large block of memory of the correct size.

Both points are valid, and we will address the first of them in this article. However, as far as I know (I would love to be corrected on this), the only way to combat the second issue is to force allocation of the internal array on the stack. The side effect of doing so is that the programmer would then be forced to allocate large CTensor objects (which most multidimensional tensors would be) on the heap, resulting in extra syntax and an increased potential for programmer error.

How does automatic allocation help us?

One of the wonderful things about C++ is its versatility: with access both to low-level techniques such as manual memory management and to high-level features such as template metaprogramming, we can often make clever improvements to run-time efficiency with a bit of compile-time magic.

In this case, we will change the internal storage type of the CTensor object to a class which allocates its data on either the stack or the heap, depending on its size requirements.

Writing our automatic storage class

The first thing we will do, before even declaring our automatic storage class, is use the C preprocessor to create a compile-time constant that can easily be changed by our end-user.

#ifndef MAX_STACK_SIZE
	#define MAX_STACK_SIZE 10000
#endif // MAX_STACK_SIZE

This sets a default maximum number of elements for which we will allocate on the stack, but it can easily be overridden by a user (say, to 5000) by writing #define MAX_STACK_SIZE 5000 before including the header file.
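
As a self-contained sketch of the override mechanism (everything is collapsed into one file here, and the max_stack_size helper is added purely so the chosen value can be observed; in practice the #ifndef block lives at the top of the header):

```cpp
#include <cassert>

// The user sets their own limit *before* the header is included:
#define MAX_STACK_SIZE 5000

// ---- what would normally sit at the top of the header file ----
#ifndef MAX_STACK_SIZE
	#define MAX_STACK_SIZE 10000 // default, used only if the user did not override it
#endif // MAX_STACK_SIZE
// ---------------------------------------------------------------

// Helper added purely for this demonstration, to observe the chosen value:
constexpr unsigned long max_stack_size() { return MAX_STACK_SIZE; }
```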

We now turn to our friend SFINAE for help. We begin by declaring our base template class; it is never defined, so it cannot be instantiated, but it allows us to partially specialize for the two cases where the size is greater than, or at most, the specified maximum safe stack size. We then create our template specializations, giving us the following barebones:

// Helper Data Storage Class (automatically puts it on the stack if possible):
template <typename T, std::size_t Size, typename Enable = void>
class CAutoStorage; // Undefined primary template for SFINAE

template <typename T, std::size_t Size>
class CAutoStorage<T, Size, typename std::enable_if<(Size > MAX_STACK_SIZE)>::type> {
	// Allow the converting constructors below to access the private data of
	// differently-typed instantiations (CAutoStorage<U, Size>):
	template <typename, std::size_t, typename> friend class CAutoStorage;

private:
	T*			m_ptValues;
};

template <typename T, std::size_t Size>
class CAutoStorage<T, Size, typename std::enable_if<(Size <= MAX_STACK_SIZE)>::type> {
	template <typename, std::size_t, typename> friend class CAutoStorage;

private:
	T			m_tValues[Size];
};

We have now created two separate classes, one of which just has an internal pointer to T as its data member, and the other has a stack-allocated array of size Size.
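We can already verify that the compiler picks the intended specialization. The following sketch adds a hypothetical uses_heap marker to each specialization (not part of the real class) purely so the choice is observable:

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

#ifndef MAX_STACK_SIZE
	#define MAX_STACK_SIZE 10000
#endif // MAX_STACK_SIZE

template <typename T, std::size_t Size, typename Enable = void>
class CAutoStorage; // Undefined primary template

template <typename T, std::size_t Size>
class CAutoStorage<T, Size, typename std::enable_if<(Size > MAX_STACK_SIZE)>::type>
{
public:
	static constexpr bool uses_heap = true;  // marker added for this demonstration

private:
	T* m_ptValues;
};

template <typename T, std::size_t Size>
class CAutoStorage<T, Size, typename std::enable_if<(Size <= MAX_STACK_SIZE)>::type>
{
public:
	static constexpr bool uses_heap = false; // marker added for this demonstration

private:
	T m_tValues[Size];
};
```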

Finishing the stack-allocated class

We will now divert our attention to the stack-allocated version of the CAutoStorage class. We can use the default constructor and destructor, but we can improve the copy-constructor by introducing two slightly modified versions, catering for trivially copyable and non-trivially copyable types respectively. Our constructor block then looks something like the following:

CAutoStorage() = default;
~CAutoStorage() = default;

CAutoStorage( const CAutoStorage& storage, std::true_type )
{
	std::memcpy( std::addressof( m_tValues[0] ), std::addressof( storage.m_tValues[0] ), Size * sizeof(T) );
}

CAutoStorage( const CAutoStorage& storage, std::false_type )
{
	for( std::size_t s = 0; s < Size; ++s )
	{
		m_tValues[s] = storage.m_tValues[s];
	}
}

CAutoStorage( const CAutoStorage& storage ) : CAutoStorage( storage, typename std::is_trivially_copyable<T>::type() ) {}
CAutoStorage( CAutoStorage&& storage ) : CAutoStorage( storage ) {}

template <typename U>
CAutoStorage( const CAutoStorage<U, Size>& storage )
{
	static_assert(std::is_convertible<U, T>::value, "Cannot copy-construct CAutoStorage from non-convertible type");
	
	for( std::size_t s = 0; s < Size; ++s )
	{
		m_tValues[s] = static_cast<T>(storage.m_tValues[s]);
	}
}

Here we have made use of the C++11 delegating-constructor feature. We can use a similar trick to define the assignment operators:

private:
	CAutoStorage&	equal_helper( const CAutoStorage& storage, std::true_type )
	{
		std::memcpy( std::addressof( m_tValues[0] ), std::addressof( storage.m_tValues[0] ), Size * sizeof(T) );
		return *this;
	}

	CAutoStorage&	equal_helper( const CAutoStorage& storage, std::false_type )
	{
		for( std::size_t s = 0; s < Size; ++s )
		{
			m_tValues[s] = storage.m_tValues[s];
		}

		return *this;
	}

public:
	CAutoStorage&	operator=( const CAutoStorage& storage )
	{
		return equal_helper( storage, typename std::is_trivially_copyable<T>::type() );
	}

	CAutoStorage&	operator=( CAutoStorage&& storage )
	{
		return equal_helper( storage, typename std::is_trivially_copyable<T>::type() );
	}

	template <typename U>
	CAutoStorage&	operator=(const CAutoStorage<U, Size>& storage)
	{
		static_assert(std::is_convertible<U, T>::value, "Cannot assign from a CAutoStorage of inconvertible type");

		for( std::size_t s = 0; s < Size; ++s )
		{
			m_tValues[s] = static_cast<T>(storage.m_tValues[s]);
		}

		return *this;
	}

This specialization is now almost done; all that is left is to add indexing operators [], but as they are the same for both specializations, we will include their implementation in a separate section.
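Putting the pieces together, here is a self-contained sketch of this specialization's copy machinery (renamed CStackStorage, with the SFINAE parameter and converting constructor dropped for brevity; the indexing operators from the later section are included so the class can be exercised):

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <type_traits>

template <typename T, std::size_t Size>
class CStackStorage
{
public:
	CStackStorage() = default;

	// Tag dispatch: trivially copyable types can be bulk-copied with memcpy...
	CStackStorage( const CStackStorage& storage, std::true_type )
	{
		std::memcpy( std::addressof( m_tValues[0] ),
		             std::addressof( storage.m_tValues[0] ),
		             Size * sizeof(T) );
	}

	// ...while everything else gets an element-wise copy:
	CStackStorage( const CStackStorage& storage, std::false_type )
	{
		for( std::size_t s = 0; s < Size; ++s )
			m_tValues[s] = storage.m_tValues[s];
	}

	// The public copy-constructor delegates to the right overload at compile time:
	CStackStorage( const CStackStorage& storage )
		: CStackStorage( storage, typename std::is_trivially_copyable<T>::type() ) {}

	T&       operator[]( std::size_t sPos )       { return m_tValues[sPos]; }
	const T& operator[]( std::size_t sPos ) const { return m_tValues[sPos]; }

private:
	T m_tValues[Size];
};
```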

Finishing the heap-allocated class

The heap-allocated specialization of the class is slightly more complex, as it has to deal with memory management. However, we also have the opportunity to improve the move-constructor and move-assignment operator to make moving more efficient at run-time. Our constructors for the class look like the following:

CAutoStorage()
{
	m_ptValues = new T[Size]();
}

CAutoStorage( const CAutoStorage& storage, std::true_type ) : CAutoStorage()
{
	std::memcpy( m_ptValues, storage.m_ptValues, Size * sizeof(T) );
}

CAutoStorage( const CAutoStorage& storage, std::false_type ) : CAutoStorage()
{
	for( std::size_t s = 0U; s < Size; ++s )
	{
		m_ptValues[s] = storage.m_ptValues[s];
	}
}

CAutoStorage( const CAutoStorage& storage ) : CAutoStorage( storage, typename std::is_trivially_copyable<T>::type() ) {}

template <typename U>
CAutoStorage( const CAutoStorage<U, Size>& storage ) : CAutoStorage()
{
	static_assert(std::is_convertible<U, T>::value, "Cannot copy-construct from automatic storage of inconvertible type");
	for( std::size_t s = 0U; s < Size; ++s )
	{
		m_ptValues[s] = static_cast<T>(storage.m_ptValues[s]);
	}
}

CAutoStorage( CAutoStorage&& storage ) : m_ptValues( nullptr )
{
	std::swap( m_ptValues, storage.m_ptValues );
}

~CAutoStorage()
{
	delete[] m_ptValues; // delete[] on a null pointer is a safe no-op
}

The assignment operators are the same as for the stack-allocated specialization (modulo the member name), with the exception of the move-assignment operator, which looks as follows:

CAutoStorage&	operator=(CAutoStorage&& storage)
{
	std::swap( m_ptValues, storage.m_ptValues );
	return *this;
}
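
To see why the swap-based move operations are cheap (a pointer swap instead of copying Size elements), here is a stand-alone sketch of just the heap specialization's move machinery (renamed CHeapStorage; copying is omitted, and owns_buffer is a helper added purely for observation):

```cpp
#include <cassert>
#include <cstddef>
#include <utility> // std::swap, std::move

template <typename T, std::size_t Size>
class CHeapStorage
{
public:
	CHeapStorage() { m_ptValues = new T[Size](); } // value-initialized buffer

	CHeapStorage( const CHeapStorage& ) = delete;  // copying elided in this sketch

	// Move-construction: steal the source's buffer; the source is left holding
	// our nullptr, which its destructor can safely delete[]:
	CHeapStorage( CHeapStorage&& storage ) : m_ptValues( nullptr )
	{
		std::swap( m_ptValues, storage.m_ptValues );
	}

	// Move-assignment: swap buffers; our old buffer is freed when the
	// (expiring) source is destroyed:
	CHeapStorage& operator=( CHeapStorage&& storage )
	{
		std::swap( m_ptValues, storage.m_ptValues );
		return *this;
	}

	~CHeapStorage() { delete[] m_ptValues; }

	T& operator[]( std::size_t sPos ) { return m_ptValues[sPos]; }

	bool owns_buffer() const { return m_ptValues != nullptr; } // added for the test

private:
	T* m_ptValues;
};
```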

The Indexing Operator (operator[])

We can implement the indexing operators for both classes with essentially the same definition; only the member name differs between the two specializations. For the heap-allocated version:

T&			operator[]( std::size_t sPos )
{
	// We could even implement bounds checking if we wanted!
	return m_ptValues[sPos];
}

const T&	operator[]( std::size_t sPos ) const
{
	return m_ptValues[sPos];
}
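
As hinted in the comment above, bounds checking is easy to add. One hypothetical approach, in the spirit of std::array::at, is a separate checked accessor alongside the unchecked operator[] (sketched here on a minimal CCheckedStorage wrapper, not the full CAutoStorage):

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

template <typename T, std::size_t Size>
class CCheckedStorage
{
public:
	// Unchecked access, as in the article:
	T& operator[]( std::size_t sPos ) { return m_tValues[sPos]; }

	// Checked access: throws rather than invoking undefined behaviour
	// when the index is out of range:
	T& at( std::size_t sPos )
	{
		if( sPos >= Size )
			throw std::out_of_range( "CCheckedStorage::at - index out of range" );
		return m_tValues[sPos];
	}

private:
	T m_tValues[Size];
};
```

Keeping the checked path in a separate function preserves the zero-overhead operator[] for callers who have already validated their indices.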

Conclusion

And that's it: we've finished our CAutoStorage class, which allows for automatic allocation on the heap or the stack depending on the size given to it at compile time. We can now use it for the internal data member of the CTensor class to alleviate the first of the problems noted above.

ZIP containing .h file: AutomaticStorageClass

Designed and Produced by Thomas Russell © 2014-2017
