As I understand it, virtual memory is the combination of physical memory and swap space.
Virtual memory is used to isolate the memory of each application, and each application thinks it has the whole memory (swap + physical memory) to itself.
If primary memory is fully utilized, swap will be used.
If an application requests more memory than is available, the CPU will spend more time swapping memory between swap space and primary memory. This causes thrashing, and eventually the OOM killer will be invoked to kill one or more processes.
The MMU translates the virtual memory address to physical memory address.
I'm wondering what happens to virtual memory if no swap is configured.
Will the virtual memory then be the same size as the physical memory?
Also, would we then never face page faults or thrashing, since without swap there is no need to move memory between the two?
Can anyone please point out if I'm misunderstanding any of the concepts above?
Your understanding is correct only insofar as any virtual memory that is actually in use has to be backed by physical media where the data is stored.
However, the size of virtual memory each process can use is not determined by the amount of backing store a system has, but rather the architecture of the microprocessor the process is running on, and in part, the operating system kernel that performs the virtual memory management.
For example, even though the x86_64 architecture allows for a 64-bit virtual address space in theory, most x86_64 processors implement only 48-bit virtual addresses in practice. That is then further limited by the operating system: of the 256TB that 48 bits give you, Windows before the 8.1 release supported only 8TB of user-space virtual memory per process.
On 32-bit x86 systems, on the other hand, addresses are 32 bits wide, which gives you a total of 4GB of directly addressable space. Since the kernel also has to be mapped into every process's address space (so that processes can communicate with it), most 32-bit operating systems split those 4GB between user space and kernel space: 3GB for user space and 1GB for the kernel by default on Linux, 2GB/2GB by default on Windows.
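The sizes above follow directly from the address width: an n-bit virtual address can name 2^n bytes. A minimal Python sketch of that arithmetic, using binary units (so "256TB" is 256 TiB); the 128 TiB user-space figure assumes x86_64 Linux with 4-level paging, and the 3GB/1GB split is the traditional 32-bit Linux default:

```python
# Sizes of the virtual address spaces discussed above, computed from address widths.
GiB = 2**30
TiB = 2**40

def address_space_bytes(address_bits: int) -> int:
    """Bytes addressable with an address of the given width (one byte per address)."""
    return 2 ** address_bits

x86_64_space = address_space_bytes(48)  # 48-bit virtual addresses (typical x86_64 CPU)
x86_32_space = address_space_bytes(32)  # 32-bit x86

# Assumption: x86_64 Linux with 4-level paging, where the lower half is user space.
x86_64_linux_user = x86_64_space // 2

# Traditional 3GB/1GB user/kernel split of the 32-bit space on Linux.
user_32 = 3 * GiB
kernel_32 = x86_32_space - user_32

print(f"x86_64, 48-bit: {x86_64_space // TiB} TiB total, "
      f"{x86_64_linux_user // TiB} TiB user space on Linux")
print(f"x86, 32-bit: {x86_32_space // GiB} GiB total, "
      f"{user_32 // GiB} GiB user / {kernel_32 // GiB} GiB kernel")
```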
The important thing to take away here is that each process gets the full address space that a given architecture/operating system combination supports, regardless of how much memory there actually is in a system.
There are three simple reasons behind this.
One is that the life of a program developer is greatly simplified by using this abstraction.
Imagine you are writing a simple program that reads a string from standard input. To store that string, you need to allocate some memory, so you ask for it with malloc(). (Strictly speaking, malloc() is a C library function rather than a system call; it gets memory from the kernel through system calls such as brk() or mmap() when it needs more.) What malloc() returns is the (virtual) memory address where that string will be stored. All good so far.
Now imagine that this string is longer than what a single physical memory page can hold (4KB, the base page size on x86). So at some point (at the 4097th byte, if the string starts exactly at a page boundary), accessing the rest of the string would require you to jump to a completely different address in physical memory. That's where things get complicated without VM.
You would then need to know where the second page is in physical memory, because chances are it is not right next to where your first page was allocated; in fact, it usually isn't (due to memory fragmentation and other reasons). Imagine how much this would complicate your life as a programmer.
Thanks to VM, the scattered physical layout of data in the system is remapped to contiguous addresses which are unique to your application, so you can continue to access it in a uniform way and let the kernel and the CPU do the remapping.
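To make the remapping concrete, here is a toy Python model of address translation. The page table contents (virtual pages 16 and 17 mapped to physical frames 907 and 42) are made up for illustration. A real MMU walks multi-level page tables in hardware, but splitting an address into a page number and an offset is the same idea:

```python
# Toy model of virtual-to-physical translation (hypothetical page table, not a real MMU).
PAGE_SIZE = 4096  # x86 base page size

# Hypothetical page table: virtual page number -> physical frame number.
# Virtual pages 16 and 17 are contiguous; their frames (907 and 42) are not.
page_table = {16: 907, 17: 42}

def translate(vaddr: int) -> int:
    """Split a virtual address into page number and offset, then look up the frame."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    frame = page_table[vpn]  # a missing entry would be a page fault on a real system
    return frame * PAGE_SIZE + offset

def pages_spanned(start: int, length: int) -> list[int]:
    """Virtual page numbers touched by the byte range [start, start + length)."""
    first = start // PAGE_SIZE
    last = (start + length - 1) // PAGE_SIZE
    return list(range(first, last + 1))

# A 5000-byte string starting at the beginning of virtual page 16.
string_start = 16 * PAGE_SIZE
print(pages_spanned(string_start, 5000))  # [16, 17]

# Byte 4095 is the last one on the first page, byte 4096 the first on the second:
# the virtual addresses are adjacent, the physical ones are far apart.
print(hex(translate(string_start + 4095)), hex(translate(string_start + 4096)))
```

The program just keeps incrementing the virtual address across the page boundary; only the translation step knows that the second half of the string lives in an unrelated physical frame.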
The second reason is safety. Never mind all the advanced memory-access isolation features implemented in modern processors; the most basic technique is to give each process a namespace of sorts, such that whatever memory address it tries to access only makes sense in the context of that process and no other.
Take virtual memory address X for example. If you try to access data at address X from process A, it is remapped to physical memory address P(a). If you try to access data at address X from process B, the physical address is P(b), where (usually) P(a) != P(b). This (in theory) makes it impossible for processes to access each other's data in an unauthorised manner.
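The same toy model, extended to two processes that each have their own hypothetical page table. Nothing here is a real kernel API; it only shows why address X is meaningless outside the process that uses it:

```python
# Toy model: each process has its own (hypothetical) page table, so the same
# virtual address resolves to a different physical address in each process.
PAGE_SIZE = 4096

page_tables = {
    "A": {5: 120},  # process A: virtual page 5 -> physical frame 120
    "B": {5: 730},  # process B: virtual page 5 -> physical frame 730
}

def translate(process: str, vaddr: int) -> int:
    """Translate vaddr using only the given process's page table."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    table = page_tables[process]
    if vpn not in table:
        # No mapping: a real MMU would raise a page fault here.
        raise LookupError(f"page fault: {process} has no mapping for page {vpn}")
    return table[vpn] * PAGE_SIZE + offset

X = 5 * PAGE_SIZE + 0x10  # the same virtual address X in both processes
p_a = translate("A", X)
p_b = translate("B", X)
print(hex(p_a), hex(p_b))  # 0x78010 0x2da010
```

Process A can only ever reach the frames listed in its own table, so there is simply no address it could use to name frame 730.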
(You can see this same approach, albeit at a completely different level, in Linux containers these days. There they are called namespaces: PID namespaces, for example, separate the process IDs of container A from those of container B, so that no signal sent by a process in container A can affect processes running in container B, or indeed the host system. Namespaces are used for many other resources too, but their primary purpose is isolating groups of processes from each other.)
Another important point is that processes often request memory to be allocated to them but then end up using only a small part of it.
Which brings us to the third reason: optimisation.
Virtual memory allows for techniques such as memory-mapping important files (like shared libraries), so that the data in those files needs to be loaded into memory only once, yet many processes can use it at the same time. To each of those processes it appears as if it "owns" the address range the data is loaded at, when in fact physical memory holds only one copy of it, mapped at various virtual addresses in various processes.
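You can observe this physical sharing of a mapped file from Python on Linux or another Unix. This sketch maps one page of a temporary file with `MAP_SHARED`, forks, and lets the child write to the mapping. Shared libraries are normally mapped privately (read-only code, copy-on-write data), so processes don't see each other's writes; `MAP_SHARED` is used here only because a write makes the single shared copy observable:

```python
# Unix-only sketch: parent and child map the same file with MAP_SHARED.
# The kernel keeps one copy of the page (in the page cache), mapped into both.
import mmap
import os
import tempfile

with tempfile.TemporaryFile() as f:
    f.write(b"original".ljust(mmap.PAGESIZE, b"\0"))  # one page of file data
    f.flush()  # make sure the file is at least one page long before mapping it
    shared = mmap.mmap(f.fileno(), mmap.PAGESIZE, flags=mmap.MAP_SHARED)

    pid = os.fork()
    if pid == 0:
        # Child: write through its own mapping of the file, then exit immediately.
        shared[:8] = b"modified"
        os._exit(0)

    os.waitpid(pid, 0)
    # Parent: the child's write is visible, because both mappings refer to
    # the same physical page rather than to private copies.
    seen_by_parent = bytes(shared[:8])
    shared.close()

print(seen_by_parent)  # b'modified'
```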
Similarly, when a process calls fork() to perform work in parallel, only a small amount of physical memory is actually allocated for the new process (mainly page tables and the kernel's bookkeeping structures for it). However, the new process gets its own complete address space, in which it sees all the reservations, data structures, etc. that its parent had made up to that point. So the amount of virtual memory allocated grows by the size of the process that called fork(), but physical memory usage changes only slightly until the new process starts modifying the data it shares with its parent. Such a write triggers so-called copy-on-write: the kernel makes a private copy of the affected page for the writing process, so that the data seen by the other processes sharing that page stays consistent.
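A small Unix-only Python sketch of the visible effect of copy-on-write: `ctypes` gives us the virtual address of a buffer, and a pipe carries the child's view back to the parent. It shows only the outcome (same virtual address, different contents after a write); the page copying itself happens inside the kernel and is not directly observable from here:

```python
# Unix-only sketch: after fork(), parent and child see the buffer at the same
# virtual address, but the child's write goes to its own copy of the page,
# so the parent's data is unchanged.
import ctypes
import os

buf = ctypes.create_string_buffer(b"parent", 16)
parent_addr = ctypes.addressof(buf)

read_fd, write_fd = os.pipe()
pid = os.fork()
if pid == 0:
    # Child: overwrite the buffer, then report its address and contents.
    os.close(read_fd)
    buf.value = b"child"
    os.write(write_fd, f"{ctypes.addressof(buf)} {buf.value.decode()}".encode())
    os._exit(0)

# Parent: read the child's report (a single small write, so it arrives whole).
os.close(write_fd)
child_addr_str, child_value = os.read(read_fd, 1024).decode().split()
os.close(read_fd)
os.waitpid(pid, 0)

child_addr = int(child_addr_str)
print(hex(parent_addr), buf.value)   # parent: original data
print(hex(child_addr), child_value)  # child: same virtual address, new data
```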
This is, of course, a simplified explanation. I am deliberately omitting some confusing details and corner cases, and there are many other mechanisms and considerations involved in fully understanding how VM relates to the physical memory in a system.
But in short - a memory reservation is not the same as actually committing something to memory.
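A Linux-only Python sketch of reservation versus commit: it maps a large anonymous region, writes to only a few of its pages, and compares the growth of `VmSize` (virtual size) with `VmRSS` (resident size) as reported in `/proc/self/status`. The 256 MiB size is arbitrary, and the sketch assumes the kernel's default overcommit settings so that the large mapping is allowed; physical pages are only allocated when first written:

```python
# Linux-only sketch: reserve a large anonymous mapping, touch a few pages,
# and compare the growth of virtual size (VmSize) with resident memory (VmRSS).
import mmap

def vm_status_kib() -> dict[str, int]:
    """Read VmSize and VmRSS (in KiB) for this process from /proc/self/status."""
    values = {}
    with open("/proc/self/status") as status:
        for line in status:
            key, _, rest = line.partition(":")
            if key in ("VmSize", "VmRSS"):
                values[key] = int(rest.split()[0])  # value is reported in kB
    return values

RESERVE = 256 * 2**20  # 256 MiB reservation (arbitrary example size)
TOUCHED_PAGES = 4

before = vm_status_kib()
region = mmap.mmap(-1, RESERVE, flags=mmap.MAP_PRIVATE)  # anonymous, private mapping
for i in range(TOUCHED_PAGES):
    region[i * mmap.PAGESIZE] = 1  # first write to a page makes the kernel back it
after = vm_status_kib()

vsize_growth = (after["VmSize"] - before["VmSize"]) * 1024
rss_growth = (after["VmRSS"] - before["VmRSS"]) * 1024
print(f"virtual size grew by {vsize_growth // 2**20} MiB")
print(f"resident size grew by {rss_growth // 1024} KiB")
region.close()
```

The virtual size should grow by roughly the full 256 MiB, while the resident size should grow only by a few pages plus some interpreter noise.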
When processes store too much data in memory, all the symptoms you described may occur, but that is not directly related to how virtual memory works or why we have it.
Hope this helps,