Some general recommendations

  • Go for a quick analysis first; it should be doable in real time
  • If the problem is larger, break development down into small chunks until the analysis algorithm is finalized
  • Design the algorithm so that reading from the file system is minimized
  • Be aware what the slowest part of your analysis code is, and why
  • Save intermediate results to disk so they can be read back in later (in Python, use pickle or numpy.savez, or possibly HDF5 files)
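
The last two points can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: it times one analysis step and stores its result with pickle, so that a second run reads the saved result instead of recomputing it. The names expensive_analysis, load_or_compute and intermediate.pkl are hypothetical placeholders; for large numerical arrays, numpy.savez or an HDF5 file would likely be the better container.

```python
import os
import pickle
import tempfile
import time


def expensive_analysis(n):
    """Hypothetical stand-in for a slow analysis step."""
    return [i * i for i in range(n)]


def load_or_compute(cache_file, n):
    """Return the saved result if it exists, otherwise compute and save it."""
    if os.path.exists(cache_file):
        with open(cache_file, "rb") as fh:
            return pickle.load(fh)
    result = expensive_analysis(n)
    with open(cache_file, "wb") as fh:
        pickle.dump(result, fh)
    return result


# Assumed location for the intermediate file (a fresh temporary directory).
cache_file = os.path.join(tempfile.mkdtemp(), "intermediate.pkl")

t0 = time.perf_counter()
first = load_or_compute(cache_file, 200_000)   # computes and writes to disk
t1 = time.perf_counter()
second = load_or_compute(cache_file, 200_000)  # reads the saved result
t2 = time.perf_counter()

print(f"compute and save: {t1 - t0:.4f} s")
print(f"read back:        {t2 - t1:.4f} s")
```

Timing each stage separately like this shows where the time goes. Note that the saved file is not invalidated automatically: if the analysis code or its inputs change, the file has to be deleted or renamed by hand.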

Large datasets

  • Read-in takes a long time; data does not fit in memory
  • Do development on a smaller simulation
  • If the data is split across several files, process each file individually
  • The same principle applies to time series
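
As a rough sketch of processing each file individually, the example below reduces every file to a small summary (a sum and a count) and combines the summaries afterwards, so only one file is held in memory at a time. The file names snapshot_000.txt and so on, the one-number-per-line format, and the functions process_file and mean_over_files are assumptions made for illustration; real simulation output would need its own reader.

```python
import glob
import os
import tempfile


def process_file(path):
    """Reduce one file to a small summary: (sum of values, number of values)."""
    total, count = 0.0, 0
    with open(path) as fh:
        for line in fh:  # stream line by line instead of loading the whole file
            line = line.strip()
            if line:
                total += float(line)
                count += 1
    return total, count


def mean_over_files(paths):
    """Combine the per-file summaries into one overall mean."""
    total, count = 0.0, 0
    for path in paths:
        file_sum, file_count = process_file(path)
        total += file_sum
        count += file_count
    if count == 0:
        raise ValueError("no data found in the given files")
    return total / count


# Create three small demo files holding the values 0..11 (hypothetical data).
tmpdir = tempfile.mkdtemp()
for i in range(3):
    file_name = os.path.join(tmpdir, f"snapshot_{i:03d}.txt")
    with open(file_name, "w") as fh:
        fh.write("\n".join(str(float(v)) for v in range(i * 4, (i + 1) * 4)))

paths = sorted(glob.glob(os.path.join(tmpdir, "snapshot_*.txt")))
overall_mean = mean_over_files(paths)
print(overall_mean)
```

The same pattern carries over to a time series stored as one file per time step: each step is reduced on its own and only the small per-step results are kept.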