Small writes to SMB network share are slow on Windows, fast over CIFS Linux mount
The C++ endl is defined to output '\n' followed by a flush. flush() is an expensive operation, so you should generally avoid using endl as your default end of line as it can create exactly the performance issue you are seeing (and not just with SMB, but with any ofstream with an expensive flush including local spinning rust or even the latest NVMe at some ridiculously high rate of output).
Replacing endl with "\n" will fix the performance problem above by allowing the system to buffer as intended. Be aware, though, that some libraries may flush on "\n" anyway, in which case you have more headaches (see https://stackoverflow.com/questions/21129162/tell-endl-not-to-flush for a solution overriding the sync() method).
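To make the difference concrete, here is a minimal sketch that counts how many times the stream asks its buffer to flush. CountingBuf, flushes_with_endl and flushes_with_newline are hypothetical names invented for this illustration, and an in-memory stringbuf stands in for the real ofstream, so it only shows the library-level flush calls, not what the OS or SMB client does with them.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>

// Hypothetical helper: a string buffer that counts sync() calls.
// ostream::flush() (and therefore std::endl) ends up calling sync().
class CountingBuf : public std::stringbuf {
public:
    std::size_t flushes = 0;

protected:
    int sync() override {
        ++flushes;
        return std::stringbuf::sync();
    }
};

// Writes `lines` lines terminated by std::endl; returns the flush count.
std::size_t flushes_with_endl(int lines) {
    assert(lines >= 0);
    CountingBuf buf;
    std::ostream out(&buf);
    for (int i = 0; i < lines; ++i) {
        out << "line " << i << std::endl;  // newline plus flush, every line
    }
    return buf.flushes;
}

// Writes `lines` lines terminated by '\n', then flushes once at the end.
std::size_t flushes_with_newline(int lines) {
    assert(lines >= 0);
    CountingBuf buf;
    std::ostream out(&buf);
    for (int i = 0; i < lines; ++i) {
        out << "line " << i << '\n';  // newline only, no flush
    }
    out.flush();  // single explicit flush
    return buf.flushes;
}
```

With 1000 lines, flushes_with_endl(1000) reports 1000 flushes while flushes_with_newline(1000) reports 1. On a stream where each flush costs a network round trip, that ratio is the slowdown you would expect to see.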
Now to complicate things, flush() is only defined for what happens within the library buffers. The effect of flush on operating system, disk, and other external buffers is not defined. For Microsoft .NET, "When you call the FileStream.Flush method, the operating system I/O buffer is also flushed." (https://msdn.microsoft.com/en-us/library/2bw4h516(v=vs.110).aspx) If the Visual Studio C++ runtime behaves the same way, this makes flush particularly expensive, as it will round-trip the write all the way out to the physical media at the far end of your remote server, which is what you are seeing. GCC, on the other hand, says "A last reminder: there are usually more buffers involved than just those at the language/library level. Kernel buffers, disk buffers, and the like will also have an effect. Inspecting and changing those are system-dependent." (https://gcc.gnu.org/onlinedocs/libstdc++/manual/streambufs.html) Your Ubuntu traces would seem to indicate that the operating system / network buffers are not flushed by the library flush(). System-dependent behaviour is all the more reason to avoid endl and excessive flushing. If you are using VC++, you might try switching to a Windows GCC derivative to see how the system-dependent behaviours react, or alternatively use Wine to run the Windows executable on Ubuntu.
More generally, you need to think about your requirements to determine whether flushing every line is appropriate. endl is generally suitable for interactive streams such as the display (we need the user to actually see our output, and not in bursts), but generally not suitable for other types of streams, including files, where the flushing overhead can be significant. I've seen apps flush after every 1-, 2-, 4- and 8-byte write... it's not pretty to see the OS grind through millions of IOs to write a 1 MB file.
As an example a log file may need flushing every line if you are debugging a crash because you need to flush the ofstream before the crash occurs; while another log file may not need flushing every line if it is just producing verbose informational logging that is expected to flush automatically before the application terminates. It need not be either/or as you could derive a class with a more sophisticated flush algorithm to suit specific requirements.
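As a rough sketch of what such a flush policy could look like: the class below flushes immediately for error messages and otherwise only every N lines. It wraps a std::ostream rather than deriving from one, purely to keep the example short. PolicyLogger, Level and demo_pending_lines are hypothetical names made up for this illustration, not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>

enum class Level { Info, Error };

// Hypothetical logger: flush at once on errors, otherwise every N lines.
class PolicyLogger {
public:
    PolicyLogger(std::ostream& out, std::size_t flush_every)
        : out_(out), flush_every_(flush_every) {
        assert(flush_every_ > 0);
    }

    // Flush whatever is still pending on normal shutdown.
    ~PolicyLogger() { out_.flush(); }

    void log(Level level, const std::string& message) {
        out_ << message << '\n';  // '\n', not endl: no implicit flush
        ++pending_;
        if (level == Level::Error || pending_ >= flush_every_) {
            out_.flush();
            pending_ = 0;
        }
    }

    // Number of lines written since the last flush.
    std::size_t pending() const { return pending_; }

private:
    std::ostream& out_;
    std::size_t flush_every_;
    std::size_t pending_ = 0;
};

// Usage sketch: with flush_every = 2, three info lines leave one pending.
std::size_t demo_pending_lines() {
    std::ostringstream sink;  // stands in for a std::ofstream
    PolicyLogger logger(sink, 2);
    logger.log(Level::Info, "starting");
    logger.log(Level::Info, "still running");  // second line triggers a flush
    logger.log(Level::Info, "one more");
    return logger.pending();
}
```

Lines that are still pending at the moment of a crash are lost in this scheme, so for crash debugging you would set flush_every to 1 or log at Error level.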
Compare your case with the contrasting case of people who need to ensure their data is completely persisted to disk and not vulnerable in an operating system buffer (https://stackoverflow.com/questions/7522479/how-do-i-ensure-data-is-written-to-disk-before-closing-fstream).
Note that as written, outFile.flush() is superfluous as it flushes an already flushed ofstream. To be pedantic, you should have used endl alone or preferably "\n" with outFile.flush() but not both.
The performance of remote file operations, such as read/write, over the SMB protocol can be affected by the size of the buffers allocated by servers and clients. The buffer size determines the number of round trips needed to send a fixed amount of data. Every time a request and its response are exchanged between client and server, the time taken is at least the latency between the two sides, which can be very significant on a Wide Area Network (WAN).
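A back-of-the-envelope sketch of that relationship: the number of round trips is the payload size divided by the buffer size, rounded up, and each round trip costs at least one latency. round_trips and min_transfer_ms are hypothetical helper names, and the model deliberately ignores protocol overhead, pipelining and bandwidth, so treat the result as a lower bound only.

```cpp
#include <cassert>
#include <cstdint>

// Round trips needed to move total_bytes when each request carries
// at most buffer_bytes (ceiling division).
std::uint64_t round_trips(std::uint64_t total_bytes, std::uint64_t buffer_bytes) {
    assert(buffer_bytes > 0);
    return (total_bytes + buffer_bytes - 1) / buffer_bytes;
}

// Lower bound on transfer time: one latency per round trip.
double min_transfer_ms(std::uint64_t total_bytes, std::uint64_t buffer_bytes,
                       double latency_ms) {
    return static_cast<double>(round_trips(total_bytes, buffer_bytes)) * latency_ms;
}
```

For example, moving 1,048,576 bytes over a link with 50 ms latency takes 256 round trips (at least 12,800 ms) with a 4096-byte buffer, but only 17 round trips (at least 850 ms) with a 65535-byte buffer.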
SMB buffer -- The MaxBufferSize can be configured through the following registry setting:
Range: 1024 to 65535 (Choose value as per your requirement above 5000)
BUT SMB SIGNING affects the maximum buffer size allowed. Thus we need to disable SMB signing as well to achieve our goal. The following registry value needs to be created on the server side and, if possible, on the client side as well.
Data: 0 (disable), 1 (enable)
I don't have enough reputation to leave a comment (which I think would be better given the level of verification on this answer).
I notice that one big difference between your Linux and Windows traces is that you're using SMB1 on Linux and SMB2 on Windows. Perhaps the batch oplock mechanism in Samba's SMB1 implementation performs better than its SMB2 exclusive lease implementation. In both cases these should allow for some amount of client-side caching.
1) Perhaps try setting a lower max protocol level in Samba to try out Windows with SMB1
2) Validate that exclusive oplocks or leases are taken out
Hope this helps :)