Very clean explanation of piece-wise linear approximation using a ReLU network. While there's no difference on the training set, the predictions of CPWL (continuous piecewise-linear) and CC (continuous-curve) functions differ whenever the input vector is not in the training set, which is usually the case in the real world. This becomes a problem especially when the input lies outside the range of the training examples (e.g., x1 → infinity while the expected output is known to be bounded). Splines do a decent job of mitigating 'exploding predictions' outside the training range, while still fitting a good model inside the 'convex hull' that envelops the training input-output vectors. A rough sketch of this contrast is below.
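A minimal sketch of the extrapolation issue (assuming scikit-learn and SciPy; the tanh target, the (32, 32) hidden layers, and the x=100 query point are arbitrary choices of mine, not anything from the original post): fit the same bounded target with a ReLU MLP and with a cubic spline whose extrapolation is clamped to the boundary value, then query far outside the training range.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline
from sklearn.neural_network import MLPRegressor

# Bounded 1-D target sampled only on [-3, 3]
x_train = np.linspace(-3, 3, 200)
y_train = np.tanh(x_train)

# CPWL model: a small ReLU network. Far outside the training range its
# prediction just follows the last linear piece, so it can keep drifting.
relu_net = MLPRegressor(hidden_layer_sizes=(32, 32), activation='relu',
                        max_iter=5000, random_state=0)
relu_net.fit(x_train.reshape(-1, 1), y_train)

# Cubic spline with 'const' extrapolation: queries outside [-3, 3] are
# pinned to the boundary values, so the prediction stays bounded.
spline = UnivariateSpline(x_train, y_train, k=3, s=0, ext='const')

x_far = 100.0  # far outside the training range; tanh(100) is essentially 1
print("ReLU net :", relu_net.predict([[x_far]])[0])  # often drifts away from 1
print("spline   :", float(spline(x_far)))            # stays ~ 1.0
```

Inside [-3, 3] both models fit the data about equally well; the difference only shows up once you leave the envelope of the training inputs.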
Smooth non-linear activations, used in conjunction with linear or ReLU neurons, can unlock continuous-curve approximations. Here's my take on this (though the sigmoid is not the most efficient activation for polynomial approximation): https://medium.com/@snaveenmathew/manufacturing-polynomials-using-a-sigmoid-neural-network-693f6abc2aee
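For what it's worth, here's a tiny numpy sketch of one way to 'manufacture' x^2 from three sigmoid units sharing a small input weight, via a second difference. This is just an illustration of the general idea; the bias a, the scale eps, and the construction itself are my choices here, not necessarily the one used in the linked post.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Bias 'a' away from 0 (where the sigmoid has non-zero curvature) and a small
# input weight 'eps' so the local Taylor expansion of the sigmoid applies.
a, eps = 1.0, 0.05
curv = sigmoid(a) * (1 - sigmoid(a)) * (1 - 2 * sigmoid(a))  # sigma''(a)

x = np.linspace(-2, 2, 9)
# Second difference of three sigmoid units sharing the small weight eps:
# sigmoid(a + eps*x) + sigmoid(a - eps*x) - 2*sigmoid(a) ~ sigma''(a) * (eps*x)**2
hidden = sigmoid(a + eps * x) + sigmoid(a - eps * x) - 2 * sigmoid(a)
x_squared_hat = hidden / (curv * eps ** 2)

print(np.round(x_squared_hat, 2))  # close to x**2: [4, 2.25, 1, 0.25, 0, ...]
```

The quadratic term of the sigmoid's expansion around a is what survives the second difference, so the combination behaves like x^2 on a small interval; higher-degree terms can be isolated in a similar spirit.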