Contemporary software engineering tools employ Natural Language Processing (NLP) and Information Retrieval (IR) methods to provide automated support. These methods exploit the semantic and syntactic knowledge embedded in the textual content of source code to discover important information about the system. This information can then be used in several essential software engineering activities, such as traceability, refactoring, and reverse engineering. However, as software evolves, new and inconsistent terminology gradually finds its way into the project, causing the textual content of source code to drift away from natural language. Furthermore, source code is highly repetitive, often homogeneous, and suffers from data sparsity and vocabulary mismatch problems. Therefore, applying NLP and IR methods to source code without adjustment can be detrimental. Motivated by these observations, in this proposal we suggest a novel text-processing paradigm adjusted for software. Our main objectives are (1) to introduce an effective, scalable, and computationally efficient paradigm for processing and analyzing the textual content of source code, and (2) to integrate the proposed paradigm into working prototypes that support several essential software engineering activities. To achieve these objectives, we will conduct a series of analytical experiments on industrial software systems to establish the main constructs of our paradigm. In addition, human studies will be conducted to assess the usability and effectiveness of the proposed tools. The broader significance of this research arises from the economic impact of designing and developing software engineering tools that enhance software developers’ productivity and ability to produce high-quality software.