Do not mix the code that contains the add-in functions with the code that uses the GPU: always keep the two sets of code in separate source files. Declare the CUDA functions in a header file, and include that header in the add-in functions' source file.
Your CUDA code (in .cu files) cannot include the XLL+ headers, because they use C++ language features, such as recursion, that are not supported in CUDA code.
Similarly, your add-in functions' source files must not use CUDA language extensions, which the C++ compiler cannot handle.
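One way to arrange this is a small C-style header that both sides can include. The sketch below is illustrative; the file and function names (`curve_gpu.h`, `gpu_price_bond`) are hypothetical, not part of XLL+ or the CUDA toolkit:

```cpp
// curve_gpu.h - the only file shared between the add-in code and the .cu code.
// It uses no XLL+ types and no CUDA extensions, so both compilers accept it.
#ifndef CURVE_GPU_H
#define CURVE_GPU_H

#ifdef __cplusplus
extern "C" {
#endif

struct YieldCurve;  /* flat structure passed to the GPU (defined elsewhere) */

/* Defined in curve_gpu.cu: copies the curve to the device, runs the
   kernel, and returns the result to the caller. */
double gpu_price_bond(const struct YieldCurve* curve, int settlement_date);

#ifdef __cplusplus
}
#endif

#endif /* CURVE_GPU_H */
```

The add-in source file sees only ordinary C declarations, and the .cu file sees no XLL+ types, so each file compiles cleanly with its own compiler.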
If you pass structured data between the host and the GPU's global memory, you will find it much easier to pack it into a "flat" format than to support pointers within the structure. For example, the code below defines a format that holds a yield curve of variable size:
struct YieldCurvePoint {
    int date;
    double yield;
};

struct YieldCurve {
    unsigned int count;
    // points[1] is a variable-length trailing array; the macro below
    // computes the full size of a YieldCurve holding 'count' points.
    YieldCurvePoint points[1];
};

#define YIELDCURVE_SIZE(count) \
    (sizeof(YieldCurve) + (sizeof(YieldCurvePoint) * ((count) - 1)))
To allocate and populate the structure from C++ code, you might use the following class:
class YieldCurveHolder {
public:
    YieldCurveHolder() : data(0) {}

    YieldCurveHolder(const std::vector<long>& dates,
                     const std::vector<double>& yields)
    {
        unsigned int count = static_cast<unsigned int>(dates.size());
        allocate(count);
        data->count = count;
        for (unsigned int i = 0; i < count; i++) {
            data->points[i].date = static_cast<int>(dates[i]);
            data->points[i].yield = yields[i];
        }
    }

    ~YieldCurveHolder() { deallocate(); }

    void allocate(unsigned int count) {
        data = (YieldCurve*)new char[YIELDCURVE_SIZE(count)];
    }

    void deallocate() {
        if (data) {
            delete[] (char*)data;
            data = 0;
        }
    }

public:
    YieldCurve* data;
};
This helper class ensures that memory is correctly allocated and reliably freed. It simplifies the code that uses the structure, which is particularly important when exceptions may be thrown.