keras accuracy doesn't improve more than 59 percent

It seems to me that your data is not varied enough for a neural network: you have a lot of similar values in your dataset, and that might be one reason for the low accuracy. Try a simple regressor first rather than a neural network.
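For example, a quick baseline comparison could look like this (a minimal sketch, assuming your features are in X and the target in y):

from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import GradientBoostingRegressor

# compare two simple baselines using 5-fold cross-validation
for model in (LinearRegression(), GradientBoostingRegressor()):
    scores = cross_val_score(model, X, y, cv=5, scoring='r2')
    print(type(model).__name__, scores.mean())

If these simple models already match or beat your network, that tells you the problem is the data rather than the architecture.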

If you want to use a neural network anyway, you should change the following:

Generally, for regression you should set the activation function of your last layer to 'linear' (or 'relu'); sigmoid is usually used for the hidden layers.
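For instance, a small Keras model for regression could look like this (a sketch only; n_features stands for your number of input columns and is an assumption here):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential([
    Dense(64, activation='sigmoid', input_shape=(n_features,)),  # hidden layers: sigmoid (or relu)
    Dense(64, activation='sigmoid'),
    Dense(1, activation='linear'),                               # linear output for regression
])
model.compile(optimizer='adam', loss='mse', metrics=['mae'])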

Try changing that first. If it does not help, also try different strategies such as:

  1. Increase the batch size
  2. Increase the number of epochs
  3. Apply whitening to your dataset before training (pre-processing step).
  4. Decrease the learning rate; consider a learning-rate scheduler (see the sketch after the whitening example below).

For whitening you can do:

from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split

# split first so the test set does not influence the whitening statistics
# (y is the regression target)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

pca = PCA(whiten=True)
pca.fit(X_train)                    # fit the PCA model on the training data only
X_train = pca.transform(X_train)
X_test = pca.transform(X_test)      # use the same fitted pca model for the test set
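For point 4, a lower learning rate plus a scheduler could be set up roughly like this (a sketch, assuming the model from the example above; the exact values are only a starting point, not tuned for your data):

from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import ReduceLROnPlateau

model.compile(optimizer=Adam(learning_rate=1e-4), loss='mse', metrics=['mae'])

# halve the learning rate whenever the validation loss stops improving
scheduler = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=5, min_lr=1e-6)

model.fit(X_train, y_train,
          validation_split=0.2,
          epochs=200,               # point 2: more epochs
          batch_size=256,           # point 1: larger batch size
          callbacks=[scheduler])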

You have a lot of zeros in your dataset. Here is the fraction of zero values per column (between 0 and 1); a sketch of how to compute this follows the list:

0.6611697598907094 WORK_EDUCATION
0.5906196483663051 SHOP
0.15968546556987515 OTHER
0.4517919980835284 AM
0.3695455825652879 PM
0.449195697003247 MIDDAY
0.8160996565242585 NIGHT
0.03156998520561604 AVG_VEH_CNT
1.618641571247746e-05 work_traveltime
2.2660981997468445e-05 shop_traveltime
0.6930343378622924 work_tripmile
0.605410795044367 shop_tripmile
0.185622578107549 TRPMILES_sum
3.237283142495492e-06 TRVL_MIN_sum
0.185622578107549 TRPMILES_mean
0.469645614614391 HBO
0.5744850291841075 HBSHOP
0.8137429143965219 HBW
0.5307266729469959 NHB
0.2017960446874565 DWELTIME_mean
1.618641571247746e-05 TRVL_MIN_mean
0.6959996892208183 work_dweltime
0.6099365168775757 shop_dweltime
0.0009258629787537107 firsttrip_time
0.002949164942813393 lasttrip_time
0.7442934791405661 age_2.0
0.7541995655566023 age_3.0
0.7081200773063214 age_4.0
0.9401296855626884 age_5.0
0.3490503429901489 KNN_result
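These fractions can be computed with something like the following (a sketch, assuming the data sits in a pandas DataFrame called df):

import pandas as pd

# fraction of zero values per column, between 0 and 1
zero_fraction = (df == 0).mean()
for col, frac in zero_fraction.items():
    print(frac, col)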

In short: NNs are rarely the best models for either small amounts of data or data that is already compactly represented by a few non-heterogeneous columns. Often enough, boosted methods or a GLM would produce better results for a similar amount of effort.

What can you do with your model? Counterintuitively, sometimes restricting the network's capacity can be beneficial, especially when the number of network parameters exceeds the number of training points. You can reduce the number of neurons, in your case by setting the layer sizes to 16 or so while also removing layers; introduce regularization (label smoothing, weight decay, etc.); or add more derived features (e.g., columns in log or binary scale).
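As an illustration, a deliberately small, regularized network could look something like this (a sketch only; the layer size, dropout rate, and weight-decay factor are assumptions to be tuned, and n_features again stands for your number of input columns):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.regularizers import l2

small_model = Sequential([
    Dense(16, activation='relu', kernel_regularizer=l2(1e-4), input_shape=(n_features,)),
    Dropout(0.2),                            # extra regularization
    Dense(1, activation='linear'),           # single linear output for regression
])
small_model.compile(optimizer='adam', loss='mse', metrics=['mae'])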

Another approach would be to look for NN architectures designed for your type of data, for example Self-Normalizing Neural Networks or Wide & Deep Learning for Recommender Systems.

If you get to try only one thing, I would recommend doing a grid search over the learning rate or trying a few different optimizers.
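A minimal version of that search can be a plain loop (a sketch, assuming a hypothetical build_model() helper that returns a freshly initialized copy of your network):

from tensorflow.keras.optimizers import Adam, RMSprop, SGD

results = {}
for lr in (1e-2, 1e-3, 1e-4):
    for opt_cls in (Adam, RMSprop, SGD):
        model = build_model()                 # hypothetical helper rebuilding the network
        model.compile(optimizer=opt_cls(learning_rate=lr), loss='mse')
        history = model.fit(X_train, y_train, validation_split=0.2,
                            epochs=50, batch_size=256, verbose=0)
        results[(opt_cls.__name__, lr)] = min(history.history['val_loss'])

# best (optimizer, learning rate) combinations first
print(sorted(results.items(), key=lambda kv: kv[1]))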

How do you make a better decision about which model to use? Look through finished kaggle.com competitions, find datasets similar to the one at hand, and then check out the techniques used by the top-placed solutions.